K-means separated neural networks training with application to backgammon evaluations


Øystein Johansen

December 19, 2007

Abstract

This study examines whether a k-means clustering method can be utilized to classify backgammon positions into several clusters, such that each of these clusters has its own neural network to estimate the winning probabilities. In this way the different phases of the game can be covered by different neural nets, without specifying a special rule for each phase of the game. Each neural net will have less responsibility, and the size of the neural nets can be kept relatively small. The neural nets have been trained both with Sutton's TD(λ) algorithm and with a supervised learning algorithm. The same training algorithms have also been applied to a rule based classification configuration, so that the two classification methods can be compared. The results are still in favour of the rule based classification, but they also indicate that the k-means separation method is usable, promising and worth further investigation.

Contents

1 Introduction
  1.1 Backgammon
  1.2 Computer backgammon: a brief history
  1.3 Evaluations, equity and selection of the best move
  1.4 GNU Backgammon: Current evaluation techniques and position classification
    1.4.1 Lookahead searches
    1.4.2 Cubeful evaluations
  1.5 Scope of this study
2 Experimental setup and code implementation
  2.1 Neural networks
    2.1.1 Output units
    2.1.2 Input units
    2.1.3 Activation function
    2.1.4 Backpropagation
  2.2 Code structure and modelling
  2.3 Benchmark method
    2.3.1 Database benchmark
    2.3.2 Head-to-head benchmark player

3 Training algorithms and k-means separation
  3.1 TD(λ)-training
  3.2 Supervised training
  3.3 K-means separation
    3.3.1 Complexity of k-means
4 Results and analysis
  4.1 Benchmarks of GNU Backgammon's neural networks
  4.2 Database benchmark compared to head-to-head
  4.3 Training results before introduction of k-means separation
    4.3.1 TD(λ) training results
    4.3.2 Supervised training results
  4.4 Results of k-means clustering
  4.5 Training of k-means separated neural networks
    4.5.1 TD(λ) training results
    4.5.2 Supervised training results
  4.6 The final verdict: Best k-means versus the reference nets
5 Discussion
  5.1 TD(λ) training vs. supervised training
  5.2 K-means separation: Improvement or not?
  5.3 Further work
6 Conclusion
A Algorithm implementations
  A.1 TD(λ) training algorithm
  A.2 Supervised training algorithm
B Description of tools developed
C Simplified class diagram of evaluator engine

1 Introduction

Backgammon is a popular game, and it is gaining attention from more and more players all over the world. The possibility to make a computer program predict the outcome of a backgammon position is very valuable for any backgammon player who takes the game seriously. It is therefore important to be able to develop software that can give good estimates of the possible outcomes of a position. Serious players are willing to pay a relatively high price for a good software tool that can predict the probabilities of the game in a given position. A player who studies the game with a good software tool will learn more about the game, and in this way the player can get a small edge on his opponent in a tournament.

1.1 Backgammon

Backgammon is an ancient game. The exact origins remain unknown, but according to Crawford and Jacoby [10] the most ancient possible ancestor of the game found so far dates back about 5000 years.

Figure 1: The normal initial position of a backgammon game. Black should move his checkers clockwise to his home board (points 1 to 6). White should move his checkers anti-clockwise to his home board (points 19 to 24).

However, backgammon with the rules we know today is much newer. The scoring system and the doubling cube were probably invented in 1926 or 1927 in New York [15]. Backgammon has increased in popularity over the last decades. The possibility to play the game over the Internet and the increasing number of tournaments held all over the world are probably the main reasons for this increased popularity.

Backgammon is a two player board game, played with fifteen checkers (sometimes called men) for each player. The board consists of 24 points, where the checkers can be placed. These 24 points are usually of triangular shape and arranged in four quadrants with six points in each quadrant. The quadrants are divided by a bar, and checkers that are hit are placed on the bar. See figure 1.

The game is turn based and is played with a pair of dice. The player on turn rolls the dice and makes a move according to the dice rolled. The object of the game is to collect all checkers in the home board quadrant and then bear the checkers off the board. The player who first gets all his checkers off the board wins the game. Checkers can not be borne off unless all the player's remaining checkers are inside his home quadrant.

Backgammon is basically a race. The players move in opposite directions on the board, so the checkers interfere with each other in the race to get home. A single checker on a point (often called a blot) can be hit. A hit checker is placed on the bar and has to enter from the bar again before it can resume the race home. Being hit is therefore a setback in the race. A point which is occupied

with two or more checkers is said to be owned by the player. The opponent can not use this point. Owning several constructive points can have a major obstructive effect, and can therefore slow the opponent down in his race.

There are three types of wins in backgammon. A normal win is simply getting all checkers off before the opponent. This win gives the player one point. If a player can get all his checkers off before the opponent gets any of his checkers off, the player wins two points. This is called a gammon. The third type of win is if a player can get all his checkers off while the opponent still has all his checkers on the board and at least one checker remaining in the winner's home quadrant or on the bar. This win is called a backgammon and gives the winner three points.

The doubling cube is also an element in backgammon. If a player believes he has a good position he can offer to double the stakes of the game. The opponent must then consider whether he will accept this offer and continue playing for twice the stakes, or whether he would rather resign the game at the current stake.

1.2 Computer backgammon: a brief history

Making computer backgammon programs is different from making computer chess programs. Chess programs can do brute force searches in the game tree. It is possible to see deep into the tree, and alpha-beta pruning speeds up the searches such that the computer chess player can find a good move. Computer backgammon players must take into account each possible dice roll, of which there are 21 distinct ones, and find the best corresponding move for each of these 21 rolls. To find the best move for each roll, all the possible moves after each dice roll must be considered. On average there are about 20 legal moves in a position with a given dice roll. The fan-out of the search tree will therefore be much wider than in chess. This limits how deep it is possible to search the game tree.

One of the first attempts to make a computer backgammon player must have been BKG by Hans Berliner [4, 2]. This backgammon player was developed in the late 1970s. In 1979, BKG played the world champion of the same year, Luigi Villa. BKG won a 5 point match by 7-1 [3]. Later Berliner admitted that the program was really lucky and made some terribly bad moves, but Villa was not able to take advantage of the program's weaknesses. BKG was a knowledge based computer program with heuristic rules for how to evaluate each position. It is hard to estimate how strong this program was compared to today's programs, but it was probably just playing at an amateur level.

Expert Backgammon, by Tom Weaver, was a commercial program available for Macintosh in the mid 1980s. It was also knowledge based, but one of the big features of Expert Backgammon was its ability to do rollouts. Rollouts are simply Monte Carlo simulations of a position, which give a really good estimate of how good a position is and what the probable outcome is. Before this software, rollouts were done manually by two human players and were considered time consuming and boring. Computerized rollouts were much faster and more reliable.

The next successful attempt to make a computer player was Neurogammon, by Gerald Tesauro, a researcher at IBM [18]. It used an artificial neural network for evaluating backgammon positions, and this must be considered a milestone in the development of computer backgammon. Neurogammon was trained with supervised training based on inputs from an expert player.

This program became the best computer program for backgammon, but it still played at the level of a strong amateur player.

Gerald Tesauro continued his research on computer backgammon and constructed TD-Gammon. TD-Gammon was based on Neurogammon and used neural nets for the evaluations, but the difference was that TD-Gammon was trained with a Temporal Difference Learning algorithm, TD(λ) [19, 20]. It is a reinforcement learning technique which is described in detail by Sutton [17]. TD(λ) learning simply means that the program learns as it plays against itself. TD-Gammon achieved a master level of play, and it is fascinating that there was zero knowledge (except for the rules) coded into the system. It was purely trained by self play [21, 22].

JellyFish, by Fredrik Dahl, was the first commercial backgammon playing program that used neural nets for evaluating positions. Kit Woolsey, a world class player, said JellyFish 1.0 was better than TD-Gammon in some respects, but overall weaker [16]. JellyFish 2.0 was released in March 1996, and version 3.0 followed later. The development of JellyFish ceased in 1999, and the latest version available is version 3.5. JellyFish was also trained by a reinforcement learning algorithm, but not the exact TD(λ) algorithm used by TD-Gammon; the details about this are not known.

Snowie was the next commercial program that played backgammon. It was first released in the late 1990s. It was really strong, and it had an appealing user interface. Even though it was expensive compared to JellyFish, it became the choice for most serious backgammon players. It was so strong that evaluations and rollouts performed by Snowie were considered the only true evaluations.

GNU Backgammon (gnubg) was an open source effort that was started by Gary Wong in 1998 [24]. It is free software, and several developers joined the project; it soon became just as strong as Snowie, if not stronger. The neural networks were mainly trained by Joseph Heled, who has documented much of his work on his web page [9] and in the GNU Backgammon source code. This study is largely based on the work of Heled in the GNU Backgammon project, and some of the details of GNU Backgammon are therefore described thoroughly.

1.3 Evaluations, equity and selection of the best move

Common to all computer backgammon programs is that they do not consider moves as such; they evaluate positions. An evaluation of a backgammon position is simply an estimation of the probability of each outcome of the game. Each possible outcome is a random variable, and an evaluation is the process of predicting or estimating the values of these random variables. When an estimate for the probabilities of the possible outcomes of the game has been found, it is trivial to calculate the equity of the position. The equity is the expected number of points a player wins per game (ppg). The expected number of points a player wins is of course the same as the expected number of points the opponent loses. Therefore the equity of a player is always the negative of the opponent's equity. Since a player can win or lose one, two or three points, the equity will always be between -3.0 and 3.0.
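To make this concrete, the cubeless equity is just the expected score over the six mutually exclusive outcomes (single win or loss, gammon, backgammon). The small sketch below is only an illustration; the function and argument names are not part of any existing evaluator interface.

/* Cubeless equity from the six mutually exclusive outcome probabilities.
 * Single wins and losses score 1 point, gammons 2, backgammons 3. */
double cubeless_equity(double p_win1, double p_wing, double p_winbg,
                       double p_lose1, double p_loseg, double p_losebg)
{
    return 1.0 * (p_win1 - p_lose1)
         + 2.0 * (p_wing - p_loseg)
         + 3.0 * (p_winbg - p_losebg);
}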

When a computer backgammon player selects a move, it takes the current position, finds all legal moves, and then evaluates the resulting position after each of these moves. The resulting position that gives the best equity is the position that resulted from the best move. So the best move is found by evaluating the resulting position after each legal move from the current position with the given dice roll.

1.4 GNU Backgammon: Current evaluation techniques and position classification

GNU Backgammon is a backgammon program that plays and analyses backgammon games and positions. Its evaluations are considered to be among the best of all computer backgammon programs. This study tries to recreate the training done in the GNU Backgammon project, and hopefully the quality of the evaluations can even be improved.

The evaluation engine in GNU Backgammon is based on several techniques to get a good evaluation of each position. The positions are classified into five different evaluation classes according to the position on the board. These different classes are evaluated in different ways.

Two sided bearoff: Both players are bearing off their checkers, and both players have six or fewer checkers remaining on the board. These situations are handled by a two sided database. For each possible position there is a corresponding winning chance. This bearoff database is pre-calculated by a recursive algorithm, and this type of evaluation will therefore give an exact result.

One sided bearoff: Both players are bearing off, but at least one of the players has more than six checkers left. This type of position is evaluated by another database. This database stores, for one player at a time, the distribution of the number of rolls needed to get all checkers off from each position. The winning probability is then given by combining these distributions. This method is described by Boro [5], and it is known to give results that are very close to the exact probabilities.

Race positions: This is the type of position where the contact between the players' checkers has been broken and no checkers can be hit or blocked anymore. These positions are evaluated by the race neural network. This neural network has 214 input units, 128 hidden units and 5 outputs. This neural net was developed and trained by Heled [9], and it is considered to be very strong and reliable.

Crashed positions: This type of position occurs when one of the players has six or fewer free checkers to play, while the rest of his checkers are either borne off or stacked on the ace point or deuce point (the traditional names for the one-point and two-point; see the points numbered 1 and 2 in figure 1). This type of position is also evaluated by a neural network. The neural net has 250 input units, 128 hidden units and 5 output units. This neural net is known to have some weaknesses, and can make big mistakes in certain positions.

Contact positions: This is the class that is most common in a backgammon game. It covers any contact position that does not fall into the crashed definition above. Heled reports that 79% of all positions fall into this category,

based on online games played at FIBS between computers and humans. This position class is also evaluated by a neural network. This neural network has 250 input units, 128 hidden units and 5 output units. This net is strong, but it is known to make some mistakes in certain positions.

1.4.1 Lookahead searches

To improve the evaluations, GNU Backgammon can also do a lookahead search. It is the brute force game tree search known from computer chess, applied to backgammon. Since the fan-out of the game tree is much wider in backgammon than in chess, the search is shallow compared to the similar search in chess. A lookahead search simply looks at all possible rolls by the opponent and then averages the evaluations of the resulting positions after the best move for each roll. This can be done for an arbitrary number of subsequent moves. This is a method that is known to work well for chess, but in backgammon the search tree has a high branching factor and the search becomes expensive. To improve this there is also a set of pruning neural nets. These are smaller neural nets with only 200 input units, 10 hidden units, and the same 5 output units. These neural networks are applied at the top level of the search when doing a lookahead search. For deeper nodes in the search tree, there are move filters that cancel out bad moves that should not be evaluated further.

1.4.2 Cubeful evaluations

Backgammon is often played with a doubling cube. The player on roll can offer a double to his opponent, suggesting that the game continues for twice the stakes. The opponent can accept this suggestion, take the cube, or he may choose to pass the suggestion and thereby lose the game at the current stake. This is sometimes called dropping the cube. The cube gives a skewness to the equity, since if a player can double where the opponent will pass, the true equity after this will be 1.0 for the player and -1.0 for his opponent. Janowski [11] has studied this in detail, and his research serves as a basis for the cubeful evaluations in GNU Backgammon. It is basically an adjustment to the cubeless evaluation based on the cube ownership. This method is considered to be state of the art for handling cube evaluations in backgammon, even though several other methods have been implemented and tried.

1.5 Scope of this study

This study will try to build a set of training tools to create a new independent backgammon evaluation engine based on some of the techniques and code from GNU Backgammon. The tools will be developed and neural nets will be trained. Since the bearoff databases are considered to be accurate, and the race neural net is very strong and reliable, the effort should be put into the contact and crashed position types. Since most positions are evaluated with the contact neural net, it is believed that improving this specific neural net would give the best overall performance gain.

Another approach to improving the overall quality of the evaluations is to define different position classes than the current contact and crashed classes. The

classification between these two classes is simply rule based, and this classification could be done with a k-means classification instead. Such a k-means classification also means that the positions that are now evaluated by the crashed and contact evaluators can be evaluated by several new neural nets. This study therefore has two parts:

Try to recreate and retrain a neural network that handles contact positions, and hopefully get something that is stronger than the current GNU Backgammon neural net for this position class. This is to verify that the training algorithm works.

Reclassify the positions now handled by the crashed and contact neural nets with a k-means method. Then retrain these new neural nets based on these new classifications.

The study is also limited to evaluations where the cube is not considered. This means that all evaluations are made under the assumption that the game will be played to the end, and not terminated by a cube offer and a drop. Every point scored is considered to have the same value for the player, as if each point scored were connected to a stake. This is sometimes called cubeless money game. It is believed that a good cubeless money game evaluation will give good enough cubeful evaluations with the current cubeful evaluation techniques suggested by Janowski [11]. The study will also only consider static evaluations without any lookahead in a search tree. If the static neural net evaluations are improved, the lookahead evaluations will also improve.

2 Experimental setup and code implementation

This section describes some of the technologies used in this study and some of the decisions taken to build the computational model. It gives a very short description of neural networks; for a more extensive description of neural nets, refer to text books like Kartalopoulos [12] or Haykin [8]. This section will describe the neural nets used, the benchmarking methods, and how the software model is designed and implemented.

2.1 Neural networks

Artificial neural networks, or just neural networks, is a technology utilised in the development of artificial intelligence. The neural net technology is inspired by how the biological brain of humans and animals works. A biological brain is composed of millions of neurons, and these neurons are connected to each other by axons and dendrites. The connections between them are adaptive, which means that the connection structure is dynamically changing. Changes of the connections are what we call learning. In an artificial neural network it is similar, however here it is just numeric values that connect the neurons.

One of the most used structures is the multilayer neural network. The neurons are usually modelled in three layers, where the first layer is called the input layer, the intermediate layer is called the hidden layer and the last layer is called the output layer. See figure 2. The neurons in this model are sometimes called units or nodes. The nodes of each layer

Figure 2: Structure of a multilayer neural network. This network has three input units, four hidden units and three output units. The units, or neurons, are connected through adjustable weights. Adjusting the weights is considered training of the neural network.

are connected to each other with adjustable weights. The process of adjusting these weights is called training. The numeric value going into a unit is the sum, over all its input connections, of the connected value times the weight of that connection. The numeric value of the neuron is then adjusted with a nonlinear activation function.

The neural networks used in this study are based on the implementation found in GNU Backgammon. However, some features have been removed and some other features have been added. The neural net is now written as a class with the gobject system. The neural networks are quite standard three layer neural networks. There are five output units, 128 hidden layer units and 250 input units. The activation function is a standard sigmoid function.

2.1.1 Output units

There are five outputs of the neural networks, each representing the probability of an outcome. Since there are six different outcomes of a backgammon game, and a game must end with one of them, only five outputs are needed. The five different outputs are defined as in table 1. Note that there is no output for the probability of losing, since this would be redundant. The probability of losing is simply the probability of not winning, which is easily calculated as one minus the probability of winning (the first output).

Output  Description
0       Probability of winning (any type of win)
1       Probability of winning a gammon or a backgammon
2       Probability of winning a backgammon
3       Probability of losing a gammon or a backgammon
4       Probability of losing a backgammon

Table 1: Description of neural net outputs.

2.1.2 Input units

The calculation of the input vector for the neural networks from a backgammon position description is the same as in GNU Backgammon. There are four base inputs for each point of the board. These four inputs are all zero when there is no checker on the point. The first base input is set to 1.0 if there is a blot (single checker) on the point, and is zero if there is more than a single checker on the point. The second base input is set to 1.0 if there are exactly two checkers on the point, and is otherwise zero. The third base input is set to 1.0 if there are three or more checkers on the point. The last base input for a point increases by 0.5 for each checker beyond three; if there are three or fewer checkers on the point, this input is kept at zero. These base inputs are more or less the same as described by Tesauro [19].

In addition to the base inputs, there are 25 handcrafted inputs. To be able to test the current GNU Backgammon neural networks, these handcrafted inputs were kept the same as in GNU Backgammon. These inputs are pip counts, a blocking value, a contact value, a momentum of checker distribution, some inputs for anchors, and some inputs for the number of hitting dice rolls and how much the race is set back if hit. For details about the additional handcrafted inputs, check the GNU Backgammon source code [23].

So, there are 24 points on the board, plus the bar, and these 25 points have four base inputs each, which makes 100 inputs. There are 25 additional handcrafted inputs. These 125 inputs are calculated for both players, so the total number of inputs to the neural net is 250.

2.1.3 Activation function

The activation function is a quite standard sigmoid function, shown in equation 1. The β values for the sigmoid functions are 0.1 for the hidden layer activation and 1.0 for the output layer activation. Some small experiments were done to see if these values could be adjusted to other values that would make the weights converge faster; however, the effect of the β values was found to be minimal.

f(x) = 1 / (1 + exp(-βx))    (1)

Since the sigmoid is called very often, it is quite important that this function is efficient and fast. To increase its performance, the function has been discretized into a lookup table.
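As an illustration of this discretization, the sketch below precomputes the sigmoid on an evenly spaced grid and interpolates linearly between the table entries. The table size and clamping range here are assumptions made for the example, not the values used in GNU Backgammon or in the tools developed in this study; since β only scales the argument, a single table for β = 1 suffices and the caller passes βx.

#include <math.h>

#define SIG_ENTRIES 1024        /* assumed table resolution             */
#define SIG_MAX     10.0f       /* assumed clamping range [-10, 10]     */

static float sig_table[SIG_ENTRIES + 1];

/* Fill the table with f(x) = 1/(1 + exp(-x)); callers pass beta * x. */
static void sigmoid_init(void)
{
    for (int i = 0; i <= SIG_ENTRIES; i++) {
        float x = -SIG_MAX + 2.0f * SIG_MAX * i / SIG_ENTRIES;
        sig_table[i] = 1.0f / (1.0f + expf(-x));
    }
}

/* Table lookup with linear interpolation, clamped outside the range. */
static float sigmoid(float x)
{
    if (x <= -SIG_MAX) return sig_table[0];
    if (x >=  SIG_MAX) return sig_table[SIG_ENTRIES];

    float pos  = (x + SIG_MAX) / (2.0f * SIG_MAX) * SIG_ENTRIES;
    int   idx  = (int) pos;
    float frac = pos - idx;
    return sig_table[idx] + frac * (sig_table[idx + 1] - sig_table[idx]);
}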

Figure 3: Simplified class diagram of the Evaluator abstract class and its three implementations: NetEvaluator (composed of a NeuralNet, an InputCalculator and a PostProcessor), DBEvaluator (with a Database) and OverEvaluator.

2.1.4 Backpropagation

For adjusting the weights during training, a standard backpropagation algorithm is used. A simple experiment with a momentum adjustment to the backpropagation was done at an early stage, but no effect on the weight convergence was observed. Momentum adjustment was therefore abandoned.

2.2 Code structure and modelling

Before the new training tools were developed, an object oriented design of the system was made. All code is now written in the C programming language, and the only non-standard library used is GLib [6]. GLib is a utility library for C which provides basic data structures like linked lists, tree structures, hash tables, etc. This library is the foundation of the GTK+ toolkit system. In addition to the data structures, GLib also contains gobject and gtype. These provide the C programming language with an object system. With this object system it is possible to write object oriented code with known principles like inheritance, polymorphism and encapsulation.

There are two fundamentally different ways of evaluating a position: database evaluation and neural network evaluation. In addition there is also an evaluation when a game is over, and these situations get an evaluation type of their own. The obvious design would be to have an abstract class, Evaluator, with the method evaluate(), and three implementations of this abstract class, as shown in figure 3. In this way common evaluation behaviour can be implemented in the abstract class instead of in each implementation.

The NetEvaluator class is composed of three other classes: NeuralNet, InputCalculator and PostProcessor. An evaluation with an instance of this class begins with calculating the input vector for the neural net from the board. This calculation is performed by the InputCalculator instance. The evaluation itself is the calculation performed by the neural net. After the neural net evaluation, the PostProcessor checks the neural net output to make sure it makes sense. In some situations the neural net can report impossible results, like a higher probability of winning a gammon than the probability of winning. The PostProcessor instance will correct errors like these. The composition of the NetEvaluator is shown in figure 4.

Figure 4: Simplified class diagram of the NetEvaluator class, composed of a NeuralNet, an InputCalculator and a PostProcessor.

Figure 5: Simplified class diagram of the Engine class. The class has a collection of all the different evaluators and an appurtenant classifier.

The NeuralNet class is simply implemented as one class with all operations such as evaluate, train, load and save. However, the activation function should have been separated out of the NeuralNet class, since the use of different activation functions can be a subject for further studies.

The collection of different evaluators, together with an appurtenant classifier instance, is gathered in a class called Engine. The simplified class diagram for the Engine class is shown in figure 5. A full (simplified) class diagram of the whole evaluation system can be found in appendix C.

2.3 Benchmark method

With the tools developed as described above, it is really simple to breed a new neural network. The big problem is to breed a neural network that is better than previous nets. It is therefore necessary to be able to benchmark a neural net in an effective way. The best test would be to let one neural net play a high number of games against a reference neural net. However, the natural variance in high level backgammon is quite high. The number of games to be played in such a test must probably be above a million to get statistically significant results. Such a test would therefore be time consuming and would not be practical for benchmarking.

2.3.1 Database benchmark

A better approach is to have a collection of positions where a dice roll is given, and where the best move has been found by performing a Monte Carlo simulation of the resulting position after each legal move. In that way a database can store a position and an accompanying dice roll where the best move is known, and in addition, for every other move, the equity loss (ppg) incurred by not playing the best move. Fortunately such a database already exists from the GNU Backgammon project. Heled [9] collected such a benchmark database of positions for each position class. The database for contact positions contains positions with an accompanying dice roll and Monte Carlo simulated results for the best moves. For each position in the database there are usually only five or six move candidates that have been simulated; however, these are the five or six best moves.

The benchmark is calculated by looping through all these positions, finding the best move according to the current neural net evaluator, and comparing this move with the best move stored in the database. If the evaluator finds the same best move as the best move in the database, no error is added to the running total error. If another move is found to be best by the evaluator, the running error increases by the equity difference between the best move and the move chosen. When all positions in the database have been processed, the total error is reported as the benchmark. Since this benchmark is based on comparing different positions relative to each other, it is called the relative error benchmark.

The same database also contains some positions where no dice roll is given; there is just a Monte Carlo simulation of the five outputs. In Heled's work these positions and results were used for benchmarking the cube evaluation. In this study these positions are used to get an absolute error benchmark. The benchmarking algorithm returns the relative error, the absolute error, and the number of positions where the evaluator chose the best move according to the benchmark database. We should distinguish between absolute error and relative error:

Relative error: The error in how different positions are evaluated compared to each other. Is one position better than another? This can be measured by an algorithm that selects the best move given a position and a dice roll, which is exactly what the benchmark algorithm does.

Absolute error: The error in the evaluation of the position itself. How precisely does the net predict the outcome of the game compared to a rollout of the same position? (Rollout is the backgammon lingo for a Monte Carlo simulation; these simulations are usually quite accurate.) Fortunately the benchmark database also contains positions with a rollout result, and this is used as a measure of the absolute error.

Important: We are most interested in having the relative error as low as possible. We want our evaluator to make the right move. If it can find the right move, a rollout can find the true value of the position. This is also how human experts evaluate when making a move decision. A human will compare the different resulting positions after each move, and compare these positions to each other. The human will not try to estimate the true winning chance after each move and then compare the winning chances.
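A sketch of this relative-error benchmark loop is shown below. The BenchPos record, the Engine type and the helpers engine_find_best_move() and moves_equal() are hypothetical stand-ins for the tools developed in this study, and a chosen move that falls outside the stored candidate set is simply ignored here.

typedef struct { int from[4], to[4]; } Move;   /* up to four checker moves */

typedef struct {
    int    dice[2];          /* the accompanying dice roll                 */
    int    n_candidates;     /* number of rolled-out candidate moves       */
    Move   move[8];          /* candidates; move[0] is the best move       */
    double equity_loss[8];   /* ppg lost by playing each candidate instead */
    /* ... board representation omitted ...                                */
} BenchPos;

extern Move engine_find_best_move(struct Engine *engine, const BenchPos *pos);

static int moves_equal(const Move *a, const Move *b)
{
    for (int i = 0; i < 4; i++)
        if (a->from[i] != b->from[i] || a->to[i] != b->to[i])
            return 0;
    return 1;
}

/* Relative error benchmark: sum the equity losses of the moves the
 * evaluator picks whenever they differ from the rolled-out best move. */
double benchmark_relative_error(struct Engine *engine, const BenchPos *db,
                                int n, int *n_correct)
{
    double total_error = 0.0;
    *n_correct = 0;

    for (int i = 0; i < n; i++) {
        Move chosen = engine_find_best_move(engine, &db[i]);

        if (moves_equal(&chosen, &db[i].move[0])) {
            (*n_correct)++;                       /* best move found: no error */
            continue;
        }
        for (int j = 1; j < db[i].n_candidates; j++)
            if (moves_equal(&chosen, &db[i].move[j]))
                total_error += db[i].equity_loss[j];
    }
    return total_error;
}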

2.3.2 Head-to-head benchmark player

In addition to the benchmark based on the database, a head-to-head benchmark was developed. It is a small tool that can set two different evaluation engines up against each other for a specified number of games. These games were played without any cube actions, and the different outcomes were reported together with how many points per game one evaluation engine scores against the other. Standard deviations for all numbers were also calculated. In this way the database benchmark could be verified, and a relation between the benchmark error rates and ppg could be developed. This relation can be found in the results section.

3 Training algorithms and k-means separation

3.1 TD(λ)-training

It is possible to make a neural net based backgammon evaluator learn how to play good backgammon with zero knowledge (except for the rules of the game), just by letting it play against itself. The method is a reinforcement learning algorithm called Temporal Difference (TD) learning. This algorithm was introduced by Sutton [17]. It was first applied to backgammon by Tesauro [19]. The algorithm updates the weights in a neural net according to equation 2.

w_{t+1} - w_t = α (Y_{t+1} - Y_t) Σ_{k=1}^{t} λ^{t-k} ∇_w Y_k    (2)

The left side of equation 2 is simply the weight change. α is a learning rate parameter, and Y_{t+1} and Y_t are the evaluation outputs at time steps t+1 and t. A time step in this sense is one move by a player. ∇_w Y_k is the gradient of the output with respect to the weights. λ is a parameter in the range from 0.0 to 1.0, which controls how much temporal credit is assigned to errors for time steps (moves) further back in the game. A λ value of 0.0 will only assign credit for the error in the immediately previous position, while λ = 1.0 will assign the same credit to all errors through that game. Intermediate values of λ give a geometrically decreasing temporal credit assignment for earlier time steps.

Several sources [22] state that a good value for λ is 0.0. This simplifies a lot: there is then only temporal credit assigned for the previous time step, and only one term in the sum in equation 2. Notice how equation 2 reduces to the backpropagation equation when λ = 0. The backpropagation algorithm can then be utilised directly to update the weights. The target values for the input at time step t are simply the outputs of the estimator at time step t+1, namely Y_{t+1}. This makes for a really simple implementation of the algorithm. The implementation can be found in appendix A.1.
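A minimal sketch of this λ = 0 update is given below, assuming a NeuralNet class with the evaluate() and train() operations from figure 4. The exact function signatures, and the handling of perspective when the side to move changes, are simplifications made for the illustration; this is not the actual tool code (see appendix A.1).

#include <glib.h>
#include <string.h>

/* TD(0) self-play update: the net's own evaluation of the position one
 * time step later is used as the backpropagation target for the position
 * at time step t.  At the end of a game the observed outcome is the target. */
void td0_update(NeuralNet *nn,
                const float *input_t,        /* inputs for the position at time t   */
                const float *input_t1,       /* inputs for the position at time t+1 */
                gboolean     game_over,
                const float  final_outcome[5],
                float        alpha)
{
    float target[5];

    if (game_over)
        memcpy(target, final_outcome, sizeof target);
    else
        neuralnet_evaluate(nn, input_t1, target);   /* Y_{t+1} as target */

    neuralnet_train(nn, input_t, target, alpha);    /* one backprop step */
}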

3.2 Supervised training

Heled [9] used a database of training positions that was increased stepwise. It started out with a small database of positions, and the net was trained until no further improvement was seen. Then more positions were added to the training database as the training progressed. In the training in this study, however, the database contains all six hundred thousand training positions right from the beginning, and the number of positions in the database is never increased.

In his article, Heled describes his supervised algorithm:

A switch to supervised training (ST) addresses the first drawback. A set of positions is chosen, and the net is trained in epochs (a pass through all of the training data with a fixed α) over and over, trying to find the best fit for the data set as a whole. It was quite an effort to make supervised training work. I ended up with a method which works for me, but it is still unclear why it does. I use ridiculous values of α in the range 30 to 1, coupled with randomizing the order of data in each epoch.

1. set α to α_start
2. select a random order for the data
3. perform one epoch
4. err = NN error
5. if err is smaller by p% than err_previous, return to stage 3
6. decrease α. if α < α_min, set it to α_start
7. if err < err_previous, return to stage 3
8. return to stage 2

So, while there is a p% improvement, another pass is performed with the same settings. When there is a small improvement, α is clamped down some more. When the error increases, the data is reordered. Typical values are α_start = 20, p = 0.5% and α_min = 1 or 0.5. Obviously a big α lets us escape from small local minima, but it is not clear to me why it converges at all.

The algorithm in this study is quite similar to this. The difference is that in step 6, α is decreased by a decrease factor, by default set to 0.9 as found in Heled's code, but if the minimum α is reached, α is kept at this low value instead of being reset to α_start. The implementation instead has an increase of α at step 8 of the above algorithm. This increase is called the increase factor. The idea of increasing α is to create a small disturbance such that a training session can get out of a local minimum. The implementation of the supervised learning algorithm can be found in appendix A.2.

During an epoch (one run through all positions in the training database), the weights are updated for each position in the database. This is referred to as stochastic training in Duda [7], as opposed to batch training. Duda [7] recommends stochastic training for most applications, especially ones employing large redundant training sets. The database with training positions for the contact neural net contains about six hundred thousand positions. The database used in this study is the same as Heled used in his training. The positions in this database are chosen from self play, and the corresponding probability values are simulated with Monte Carlo simulations (rollouts).
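A sketch of this training schedule is given below, with the control flow following the numbered steps above. TrainPos, shuffle_positions() and run_epoch() (one stochastic pass over the data, returning the error) are hypothetical helpers, p is given as a fraction (0.005 for 0.5%), and the termination on a fixed epoch count is an assumption; this is an illustration, not the tool code in appendix A.2.

#include <glib.h>

void supervised_train(NeuralNet *nn, TrainPos *data, int n,
                      double alpha_start, double p, double decrease_factor,
                      double increase_factor, double alpha_min, int max_epochs)
{
    double alpha    = alpha_start;                   /* step 1 */
    double err_prev = G_MAXDOUBLE;

    shuffle_positions(data, n);                      /* step 2 */

    for (int epoch = 0; epoch < max_epochs; epoch++) {
        double err = run_epoch(nn, data, n, alpha);  /* steps 3-4 */

        if (err < err_prev * (1.0 - p)) {            /* step 5: big improvement   */
            err_prev = err;
            continue;                                /* same settings again       */
        }
        alpha *= decrease_factor;                    /* step 6 */
        if (alpha < alpha_min)
            alpha = alpha_min;                       /* kept low, not reset       */

        if (err < err_prev) {                        /* step 7: small improvement */
            err_prev = err;
            continue;
        }
        alpha *= increase_factor;                    /* step 8: small disturbance */
        shuffle_positions(data, n);                  /* and reorder the data      */
    }
}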

3.3 K-means separation

Our proposed method for improving the overall evaluations is to separate the evaluation into several neural nets. Among the current evaluators, the evaluations from the bearoff databases and the race neural network are close to perfect and need no further improvement. As shown in figure 6a, the current classifier for positions is rule based, and classifies positions into these position types:

Game over
Two sided bearoff
One sided bearoff
Race positions
Contact positions
Crashed positions

The first is simply the class returned when a game is over and there is a winner. The corresponding evaluator just gives the points to the winner. The two bearoff classes are evaluated by lookup databases. Only the last three items in this list are evaluated by neural nets. The race position class has a different input calculation, and the positions are conceptually different from the other two neural net classes. The race position neural network is also really good. It is therefore natural to keep the race position evaluations as they are today.

The crashed and contact position classes may benefit from a further subdivision, and the idea is to provide such a division with a k-means method as shown in figure 6b. The idea is to rewrite the classifier such that the positions that are now classified as contact or crashed will be classified by a new k-means classifier. These new classes will simply be labelled contact0, contact1, contact2 and contact3. See figure 6b.

The data vector to classify from is a subset of the input vector to the neural net. The input vector to the neural net is composed of 200 base inputs and 50 additional inputs, which are basically handcrafted features of the position such as pip count, number of hitting rolls, pips set back if hit, blocking value, etc. It is these additional inputs that are used for the k-means classification.

A simple program was written to extract all the positions from the training databases. The training data for the k-means classification are simply the positions from the training data sets of both the crashed training database and the contact training database. For each position in these databases the 50 additional handcrafted features were calculated and saved to a file.

The k-means algorithm used to generate the codebook was the implementation from Scipy. Scipy is a set of algorithms and tools for the Python programming language. It enriches the Python language with matrix operations like those known from commercial programs such as MATLAB. It also has some analytical packages available. The k-means algorithm used here is from the package scipy.cluster.vq.
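At evaluation time, classifying a position with the resulting codebook amounts to picking the cluster centre (µ vector) closest to the position's 50 handcrafted features. The sketch below illustrates this step; the constants and the function name are illustrative, not the actual classifier code.

#include <float.h>

#define N_FEATURES 50    /* the handcrafted inputs used for clustering */
#define N_CLUSTERS 4     /* contact0 .. contact3 */

/* Return the index of the codebook vector with the smallest squared
 * Euclidean distance to the feature vector of the position. */
int kmeans_classify(const float features[N_FEATURES],
                    const float codebook[N_CLUSTERS][N_FEATURES])
{
    int    best = 0;
    double best_dist = DBL_MAX;

    for (int c = 0; c < N_CLUSTERS; c++) {
        double dist = 0.0;
        for (int i = 0; i < N_FEATURES; i++) {
            double d = (double) features[i] - codebook[c][i];
            dist += d * d;
        }
        if (dist < best_dist) {
            best_dist = dist;
            best = c;
        }
    }
    return best;
}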

Figure 6: Decision tree for the standard rule based classifier (a), and for the newly implemented classifier with k-means classification (b).

Each iteration of the k-means algorithm, with the described dataset, took about 30 seconds. The calculation was set to compute a codebook (a set of µ vectors) over 1680 iterations. This takes about 15 hours, which was all the time available at that point. The algorithm was set to classify into four different classes. Of course it could be more or fewer than four, but five or more classes would demand even more training, and three is only one more than the existing configuration, so four new classes sounded like the best number for this study. The optimum number of classes can be examined further. The interactive Python/Scipy code for the k-means classifier training can be seen in listing 1.

Listing 1: Interactive Python code to generate the k-means codebook

from scipy.io import read_array
a = read_array("features.txt", atype='f', rowsize=100)
# The above reading takes half an hour
from scipy.cluster.vq import kmeans, vq
codebook, dist = kmeans(a, 4, thresh=1.0e-10, iter=1680)

The distortion (the sum of squared errors) could have been decreased further with more iterations, but the available time and the progress of the project limited the number of iterations. More information about the k-means algorithm that is used can be found in the Scipy documentation [14].

3.3.1 Complexity of k-means

There are several good features of k-means clustering. There is no need to name the classes; the algorithm finds the classes on its own. It is also proven that the clustering algorithm must converge to a minimum distortion. After some number of iterations there will be no changes in the set of µ, and the algorithm terminates.

The computational complexity of k-means is linear with respect to all input sizes. The complexity is O(ndcT), where n is the number of training samples, d is the dimension of each sample, c is the number of clusters and T is the number of iterations. See also Pattern Classification by R. Duda et al. [7].

The problem is to find out how many iterations are needed before the k-means algorithm terminates with a stable set of µ. This has been studied by D. Arthur and S. Vassilvitskii [1]. Their article states that the number of iterations needed for full convergence can be superpolynomial; the lower bound for the number of iterations is indicated to be 2^Ω(√n). This is much higher than the 1680 iterations done in this study, but it looks like the classifier can find certain concepts after just these relatively few iterations.

4 Results and analysis

This section summarises some of the results from head-to-head benchmarking, the k-means separation, and the training sessions for the standard classification scheme and the k-means classification scheme. The results from the training sessions are given as the error rates from the database benchmarking.

Relative error benchmark:
  # correct moves
  # total moves
  err/tot ratio %
  Total rel. error
  Average error
  Error rate

Absolute error benchmark:
  # total positions
  Total equity diff
  Abs. error rate

Table 2: Benchmark of the current GNU Backgammon neural net evaluator.

4.1 Benchmarks of GNU Backgammon's neural networks

It is natural to first find the benchmark values of Heled's [9] evaluation engine. In this way it is possible to compare directly whether something is done right or wrong. The benchmark results are shown in table 2. These neural nets are considered the reference neural nets.

4.2 Database benchmark compared to head-to-head

From a supervised training session before the k-means separation, some of the neural nets were matched head-to-head against the best net from Heled's work. From the training session, 24 contact nets were selected, with benchmark relative errors ranging upwards from 1207. Heled's best neural net benchmarks to a relative error of 1122.7. Each net was matched in a series of cubeless money games, terminated at the state where a race position occurs. The results were logged. To make the numbers comparable, the relative benchmark score of Heled's reference net (1122.7) is subtracted, and a linear regression is made on the benchmark error difference of the two competing nets.

Figure 7 shows the difference in the conventional benchmark relative errors compared to the equity lost against Heled's best neural net. The error bars indicate the 95% confidence interval. The dashed line is the regression line. It is assumed that the line intersects the origin, since two equal nets should logically be even. The slope of the regression line is of the order of 10^-3 ppg per unit of relative error difference. This linear relation is simple and can be used to estimate how many points per game one neural net will lose to another based on the database relative error difference.

This result does not only give a relation between the benchmark and ppg lost. It also verifies the benchmark: since the correlation can be seen, the benchmark database must be valid.

4.3 Training results before introduction of k-means separation

The first experiments performed were to try training with the same neural network configuration as the current GNU Backgammon evaluator. This was done to verify that the algorithms worked properly, and that a new evaluator could be retrained to the same strength with the newly implemented algorithms.

Figure 7: The difference in benchmarked relative error compared to points per game lost.

4.3.1 TD(λ) training results

Training with TD(λ) was one of the first experiments performed. This was even before the logging feature of the tool set was implemented, and a figure of the first initial TD-training can therefore not be shown. This training method reached its lowest relative and absolute error benchmarks after a long run. Unfortunately the number of games played to achieve this error rate is lost, but the training was run for over two weeks, so the real number of games must have been above 100 million. The learning rate α was set to 0.15 for all updates. A bootstrap method was used to indicate if there was any improvement, and at the time the training was stopped, there was no indication of improvement.

Achim Müller [13] also performed some TD-training. He tried to train from a different starting position. The starting position used in his training session was the position known as Nackgammon. Nackgammon is the same as backgammon, but both players start with four checkers back instead of two. This leads to more complex positions. This training converged to a relative error of 1616 after 110 million games. The α was set to 0.5 for this training.

Another TD-training session was performed later in the study, and the figure from this training session is shown in figure 8.

Figure 8: TD(λ) training. Decreasing relative error with the number of training games played.

The difference from the initial TD-training is the learning rate α, which was set to 0.5 during this training. The best of these training sessions gave a new contact neural net. According to the relation between relative error and ppg lost, it is possible to estimate how many points per game this neural net would lose against Heled's reference neural net.

4.3.2 Supervised training results

With this training method, the neural net scored better. A net that was trained in advance for about 1800 epochs managed to reach a new low in relative error, but after further epochs there was absolutely no improvement, and this training was stopped. A figure of this training is given in figure 9. This is still the lowest relative benchmark observed in this study. According to the relation found, it is estimated that this neural net will lose only very little, in points per game, against the reference neural net. This is extremely close to the reference net. It could be verified by a head-to-head test, but such a test would have to be played over millions of games to give a statistically significant result. This head-to-head test was therefore not performed.

4.4 Results of k-means clustering

The developed k-means method is now used to classify the positions in the training database. The result is quite promising. The classification algorithm has been coded in C, using the codebook values from the k-means algorithm.


More information

46.1 Introduction. Foundations of Artificial Intelligence Introduction MCTS in AlphaGo Neural Networks. 46.

46.1 Introduction. Foundations of Artificial Intelligence Introduction MCTS in AlphaGo Neural Networks. 46. Foundations of Artificial Intelligence May 30, 2016 46. AlphaGo and Outlook Foundations of Artificial Intelligence 46. AlphaGo and Outlook Thomas Keller Universität Basel May 30, 2016 46.1 Introduction

More information

Google DeepMind s AlphaGo vs. world Go champion Lee Sedol

Google DeepMind s AlphaGo vs. world Go champion Lee Sedol Google DeepMind s AlphaGo vs. world Go champion Lee Sedol Review of Nature paper: Mastering the game of Go with Deep Neural Networks & Tree Search Tapani Raiko Thanks to Antti Tarvainen for some slides

More information

CS 4700: Foundations of Artificial Intelligence

CS 4700: Foundations of Artificial Intelligence CS 4700: Foundations of Artificial Intelligence selman@cs.cornell.edu Module: Adversarial Search R&N: Chapter 5 1 Outline Adversarial Search Optimal decisions Minimax α-β pruning Case study: Deep Blue

More information

Foundations of Artificial Intelligence

Foundations of Artificial Intelligence Foundations of Artificial Intelligence 6. Board Games Search Strategies for Games, Games with Chance, State of the Art Joschka Boedecker and Wolfram Burgard and Bernhard Nebel Albert-Ludwigs-Universität

More information

Programming Project 1: Pacman (Due )

Programming Project 1: Pacman (Due ) Programming Project 1: Pacman (Due 8.2.18) Registration to the exams 521495A: Artificial Intelligence Adversarial Search (Min-Max) Lectured by Abdenour Hadid Adjunct Professor, CMVS, University of Oulu

More information

CSC321 Lecture 23: Go

CSC321 Lecture 23: Go CSC321 Lecture 23: Go Roger Grosse Roger Grosse CSC321 Lecture 23: Go 1 / 21 Final Exam Friday, April 20, 9am-noon Last names A Y: Clara Benson Building (BN) 2N Last names Z: Clara Benson Building (BN)

More information

Adversarial Search. CS 486/686: Introduction to Artificial Intelligence

Adversarial Search. CS 486/686: Introduction to Artificial Intelligence Adversarial Search CS 486/686: Introduction to Artificial Intelligence 1 Introduction So far we have only been concerned with a single agent Today, we introduce an adversary! 2 Outline Games Minimax search

More information

Presentation Overview. Bootstrapping from Game Tree Search. Game Tree Search. Heuristic Evaluation Function

Presentation Overview. Bootstrapping from Game Tree Search. Game Tree Search. Heuristic Evaluation Function Presentation Bootstrapping from Joel Veness David Silver Will Uther Alan Blair University of New South Wales NICTA University of Alberta A new algorithm will be presented for learning heuristic evaluation

More information

Foundations of Artificial Intelligence

Foundations of Artificial Intelligence Foundations of Artificial Intelligence 6. Board Games Search Strategies for Games, Games with Chance, State of the Art Joschka Boedecker and Wolfram Burgard and Frank Hutter and Bernhard Nebel Albert-Ludwigs-Universität

More information

Playing Othello Using Monte Carlo

Playing Othello Using Monte Carlo June 22, 2007 Abstract This paper deals with the construction of an AI player to play the game Othello. A lot of techniques are already known to let AI players play the game Othello. Some of these techniques

More information

CS440/ECE448 Lecture 11: Stochastic Games, Stochastic Search, and Learned Evaluation Functions

CS440/ECE448 Lecture 11: Stochastic Games, Stochastic Search, and Learned Evaluation Functions CS440/ECE448 Lecture 11: Stochastic Games, Stochastic Search, and Learned Evaluation Functions Slides by Svetlana Lazebnik, 9/2016 Modified by Mark Hasegawa Johnson, 9/2017 Types of game environments Perfect

More information

By David Anderson SZTAKI (Budapest, Hungary) WPI D2009

By David Anderson SZTAKI (Budapest, Hungary) WPI D2009 By David Anderson SZTAKI (Budapest, Hungary) WPI D2009 1997, Deep Blue won against Kasparov Average workstation can defeat best Chess players Computer Chess no longer interesting Go is much harder for

More information

Experiments on Alternatives to Minimax

Experiments on Alternatives to Minimax Experiments on Alternatives to Minimax Dana Nau University of Maryland Paul Purdom Indiana University April 23, 1993 Chun-Hung Tzeng Ball State University Abstract In the field of Artificial Intelligence,

More information

Game Playing. Philipp Koehn. 29 September 2015

Game Playing. Philipp Koehn. 29 September 2015 Game Playing Philipp Koehn 29 September 2015 Outline 1 Games Perfect play minimax decisions α β pruning Resource limits and approximate evaluation Games of chance Games of imperfect information 2 games

More information

Game Playing AI Class 8 Ch , 5.4.1, 5.5

Game Playing AI Class 8 Ch , 5.4.1, 5.5 Game Playing AI Class Ch. 5.-5., 5.4., 5.5 Bookkeeping HW Due 0/, :59pm Remaining CSP questions? Cynthia Matuszek CMSC 6 Based on slides by Marie desjardin, Francisco Iacobelli Today s Class Clear criteria

More information

Game Playing: Adversarial Search. Chapter 5

Game Playing: Adversarial Search. Chapter 5 Game Playing: Adversarial Search Chapter 5 Outline Games Perfect play minimax search α β pruning Resource limits and approximate evaluation Games of chance Games of imperfect information Games vs. Search

More information

Adversarial Search and Game Playing

Adversarial Search and Game Playing Games Adversarial Search and Game Playing Russell and Norvig, 3 rd edition, Ch. 5 Games: multi-agent environment q What do other agents do and how do they affect our success? q Cooperative vs. competitive

More information

CS 188: Artificial Intelligence

CS 188: Artificial Intelligence CS 188: Artificial Intelligence Adversarial Search Instructor: Stuart Russell University of California, Berkeley Game Playing State-of-the-Art Checkers: 1950: First computer player. 1959: Samuel s self-taught

More information

COMP219: Artificial Intelligence. Lecture 13: Game Playing

COMP219: Artificial Intelligence. Lecture 13: Game Playing CMP219: Artificial Intelligence Lecture 13: Game Playing 1 verview Last time Search with partial/no observations Belief states Incremental belief state search Determinism vs non-determinism Today We will

More information

ARTIFICIAL INTELLIGENCE (CS 370D)

ARTIFICIAL INTELLIGENCE (CS 370D) Princess Nora University Faculty of Computer & Information Systems ARTIFICIAL INTELLIGENCE (CS 370D) (CHAPTER-5) ADVERSARIAL SEARCH ADVERSARIAL SEARCH Optimal decisions Min algorithm α-β pruning Imperfect,

More information

Adversarial Search and Game- Playing C H A P T E R 6 C M P T : S P R I N G H A S S A N K H O S R A V I

Adversarial Search and Game- Playing C H A P T E R 6 C M P T : S P R I N G H A S S A N K H O S R A V I Adversarial Search and Game- Playing C H A P T E R 6 C M P T 3 1 0 : S P R I N G 2 0 1 1 H A S S A N K H O S R A V I Adversarial Search Examine the problems that arise when we try to plan ahead in a world

More information

Today. Types of Game. Games and Search 1/18/2010. COMP210: Artificial Intelligence. Lecture 10. Game playing

Today. Types of Game. Games and Search 1/18/2010. COMP210: Artificial Intelligence. Lecture 10. Game playing COMP10: Artificial Intelligence Lecture 10. Game playing Trevor Bench-Capon Room 15, Ashton Building Today We will look at how search can be applied to playing games Types of Games Perfect play minimax

More information

School of EECS Washington State University. Artificial Intelligence

School of EECS Washington State University. Artificial Intelligence School of EECS Washington State University Artificial Intelligence 1 } Classic AI challenge Easy to represent Difficult to solve } Zero-sum games Total final reward to all players is constant } Perfect

More information

How AI Won at Go and So What? Garry Kasparov vs. Deep Blue (1997)

How AI Won at Go and So What? Garry Kasparov vs. Deep Blue (1997) How AI Won at Go and So What? Garry Kasparov vs. Deep Blue (1997) Alan Fern School of Electrical Engineering and Computer Science Oregon State University Deep Mind s vs. Lee Sedol (2016) Watson vs. Ken

More information

Adversarial Search. CS 486/686: Introduction to Artificial Intelligence

Adversarial Search. CS 486/686: Introduction to Artificial Intelligence Adversarial Search CS 486/686: Introduction to Artificial Intelligence 1 AccessAbility Services Volunteer Notetaker Required Interested? Complete an online application using your WATIAM: https://york.accessiblelearning.com/uwaterloo/

More information

Intuition Mini-Max 2

Intuition Mini-Max 2 Games Today Saying Deep Blue doesn t really think about chess is like saying an airplane doesn t really fly because it doesn t flap its wings. Drew McDermott I could feel I could smell a new kind of intelligence

More information

CS 331: Artificial Intelligence Adversarial Search II. Outline

CS 331: Artificial Intelligence Adversarial Search II. Outline CS 331: Artificial Intelligence Adversarial Search II 1 Outline 1. Evaluation Functions 2. State-of-the-art game playing programs 3. 2 player zero-sum finite stochastic games of perfect information 2 1

More information

DeepStack: Expert-Level AI in Heads-Up No-Limit Poker. Surya Prakash Chembrolu

DeepStack: Expert-Level AI in Heads-Up No-Limit Poker. Surya Prakash Chembrolu DeepStack: Expert-Level AI in Heads-Up No-Limit Poker Surya Prakash Chembrolu AI and Games AlphaGo Go Watson Jeopardy! DeepBlue -Chess Chinook -Checkers TD-Gammon -Backgammon Perfect Information Games

More information

Games CSE 473. Kasparov Vs. Deep Junior August 2, 2003 Match ends in a 3 / 3 tie!

Games CSE 473. Kasparov Vs. Deep Junior August 2, 2003 Match ends in a 3 / 3 tie! Games CSE 473 Kasparov Vs. Deep Junior August 2, 2003 Match ends in a 3 / 3 tie! Games in AI In AI, games usually refers to deteristic, turntaking, two-player, zero-sum games of perfect information Deteristic:

More information

A Quoridor-playing Agent

A Quoridor-playing Agent A Quoridor-playing Agent P.J.C. Mertens June 21, 2006 Abstract This paper deals with the construction of a Quoridor-playing software agent. Because Quoridor is a rather new game, research about the game

More information

Game-Playing & Adversarial Search

Game-Playing & Adversarial Search Game-Playing & Adversarial Search This lecture topic: Game-Playing & Adversarial Search (two lectures) Chapter 5.1-5.5 Next lecture topic: Constraint Satisfaction Problems (two lectures) Chapter 6.1-6.4,

More information

Plakoto. A Backgammon Board Game Variant Introduction, Rules and Basic Strategy. (by J.Mamoun - This primer is copyright-free, in the public domain)

Plakoto. A Backgammon Board Game Variant Introduction, Rules and Basic Strategy. (by J.Mamoun - This primer is copyright-free, in the public domain) Plakoto A Backgammon Board Game Variant Introduction, Rules and Basic Strategy (by J.Mamoun - This primer is copyright-free, in the public domain) Introduction: Plakoto is a variation of the game of backgammon.

More information

CS 771 Artificial Intelligence. Adversarial Search

CS 771 Artificial Intelligence. Adversarial Search CS 771 Artificial Intelligence Adversarial Search Typical assumptions Two agents whose actions alternate Utility values for each agent are the opposite of the other This creates the adversarial situation

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Adversarial Search Vibhav Gogate The University of Texas at Dallas Some material courtesy of Rina Dechter, Alex Ihler and Stuart Russell, Luke Zettlemoyer, Dan Weld Adversarial

More information

Game Playing State-of-the-Art CSE 473: Artificial Intelligence Fall Deterministic Games. Zero-Sum Games 10/13/17. Adversarial Search

Game Playing State-of-the-Art CSE 473: Artificial Intelligence Fall Deterministic Games. Zero-Sum Games 10/13/17. Adversarial Search CSE 473: Artificial Intelligence Fall 2017 Adversarial Search Mini, pruning, Expecti Dieter Fox Based on slides adapted Luke Zettlemoyer, Dan Klein, Pieter Abbeel, Dan Weld, Stuart Russell or Andrew Moore

More information

Last update: March 9, Game playing. CMSC 421, Chapter 6. CMSC 421, Chapter 6 1

Last update: March 9, Game playing. CMSC 421, Chapter 6. CMSC 421, Chapter 6 1 Last update: March 9, 2010 Game playing CMSC 421, Chapter 6 CMSC 421, Chapter 6 1 Finite perfect-information zero-sum games Finite: finitely many agents, actions, states Perfect information: every agent

More information

Pengju

Pengju Introduction to AI Chapter05 Adversarial Search: Game Playing Pengju Ren@IAIR Outline Types of Games Formulation of games Perfect-Information Games Minimax and Negamax search α-β Pruning Pruning more Imperfect

More information

MyPawns OppPawns MyKings OppKings MyThreatened OppThreatened MyWins OppWins Draws

MyPawns OppPawns MyKings OppKings MyThreatened OppThreatened MyWins OppWins Draws The Role of Opponent Skill Level in Automated Game Learning Ying Ge and Michael Hash Advisor: Dr. Mark Burge Armstrong Atlantic State University Savannah, Geogia USA 31419-1997 geying@drake.armstrong.edu

More information

Prepared by Vaishnavi Moorthy Asst Prof- Dept of Cse

Prepared by Vaishnavi Moorthy Asst Prof- Dept of Cse UNIT II-REPRESENTATION OF KNOWLEDGE (9 hours) Game playing - Knowledge representation, Knowledge representation using Predicate logic, Introduction tounit-2 predicate calculus, Resolution, Use of predicate

More information

Adversarial Search Aka Games

Adversarial Search Aka Games Adversarial Search Aka Games Chapter 5 Some material adopted from notes by Charles R. Dyer, U of Wisconsin-Madison Overview Game playing State of the art and resources Framework Game trees Minimax Alpha-beta

More information

Artificial Intelligence Search III

Artificial Intelligence Search III Artificial Intelligence Search III Lecture 5 Content: Search III Quick Review on Lecture 4 Why Study Games? Game Playing as Search Special Characteristics of Game Playing Search Ingredients of 2-Person

More information

Computing Science (CMPUT) 496

Computing Science (CMPUT) 496 Computing Science (CMPUT) 496 Search, Knowledge, and Simulations Martin Müller Department of Computing Science University of Alberta mmueller@ualberta.ca Winter 2017 Part IV Knowledge 496 Today - Mar 9

More information

Game-playing AIs: Games and Adversarial Search FINAL SET (w/ pruning study examples) AIMA

Game-playing AIs: Games and Adversarial Search FINAL SET (w/ pruning study examples) AIMA Game-playing AIs: Games and Adversarial Search FINAL SET (w/ pruning study examples) AIMA 5.1-5.2 Games: Outline of Unit Part I: Games as Search Motivation Game-playing AI successes Game Trees Evaluation

More information

Using Artificial intelligent to solve the game of 2048

Using Artificial intelligent to solve the game of 2048 Using Artificial intelligent to solve the game of 2048 Ho Shing Hin (20343288) WONG, Ngo Yin (20355097) Lam Ka Wing (20280151) Abstract The report presents the solver of the game 2048 base on artificial

More information

CS 188: Artificial Intelligence Spring Game Playing in Practice

CS 188: Artificial Intelligence Spring Game Playing in Practice CS 188: Artificial Intelligence Spring 2006 Lecture 23: Games 4/18/2006 Dan Klein UC Berkeley Game Playing in Practice Checkers: Chinook ended 40-year-reign of human world champion Marion Tinsley in 1994.

More information

Game Design Verification using Reinforcement Learning

Game Design Verification using Reinforcement Learning Game Design Verification using Reinforcement Learning Eirini Ntoutsi Dimitris Kalles AHEAD Relationship Mediators S.A., 65 Othonos-Amalias St, 262 21 Patras, Greece and Department of Computer Engineering

More information

Othello/Reversi using Game Theory techniques Parth Parekh Urjit Singh Bhatia Kushal Sukthankar

Othello/Reversi using Game Theory techniques Parth Parekh Urjit Singh Bhatia Kushal Sukthankar Othello/Reversi using Game Theory techniques Parth Parekh Urjit Singh Bhatia Kushal Sukthankar Othello Rules Two Players (Black and White) 8x8 board Black plays first Every move should Flip over at least

More information

Automated Suicide: An Antichess Engine

Automated Suicide: An Antichess Engine Automated Suicide: An Antichess Engine Jim Andress and Prasanna Ramakrishnan 1 Introduction Antichess (also known as Suicide Chess or Loser s Chess) is a popular variant of chess where the objective of

More information

Teaching a Neural Network to Play Konane

Teaching a Neural Network to Play Konane Teaching a Neural Network to Play Konane Darby Thompson Spring 5 Abstract A common approach to game playing in Artificial Intelligence involves the use of the Minimax algorithm and a static evaluation

More information

Adversarial Search. Human-aware Robotics. 2018/01/25 Chapter 5 in R&N 3rd Ø Announcement: Slides for this lecture are here:

Adversarial Search. Human-aware Robotics. 2018/01/25 Chapter 5 in R&N 3rd Ø Announcement: Slides for this lecture are here: Adversarial Search 2018/01/25 Chapter 5 in R&N 3rd Ø Announcement: q Slides for this lecture are here: http://www.public.asu.edu/~yzhan442/teaching/cse471/lectures/adversarial.pdf Slides are largely based

More information

CS 188: Artificial Intelligence Spring Announcements

CS 188: Artificial Intelligence Spring Announcements CS 188: Artificial Intelligence Spring 2011 Lecture 7: Minimax and Alpha-Beta Search 2/9/2011 Pieter Abbeel UC Berkeley Many slides adapted from Dan Klein 1 Announcements W1 out and due Monday 4:59pm P2

More information

Announcements. CS 188: Artificial Intelligence Spring Game Playing State-of-the-Art. Overview. Game Playing. GamesCrafters

Announcements. CS 188: Artificial Intelligence Spring Game Playing State-of-the-Art. Overview. Game Playing. GamesCrafters CS 188: Artificial Intelligence Spring 2011 Announcements W1 out and due Monday 4:59pm P2 out and due next week Friday 4:59pm Lecture 7: Mini and Alpha-Beta Search 2/9/2011 Pieter Abbeel UC Berkeley Many

More information

Adversarial Search and Game Playing. Russell and Norvig: Chapter 5

Adversarial Search and Game Playing. Russell and Norvig: Chapter 5 Adversarial Search and Game Playing Russell and Norvig: Chapter 5 Typical case 2-person game Players alternate moves Zero-sum: one player s loss is the other s gain Perfect information: both players have

More information

Adversarial Search. CMPSCI 383 September 29, 2011

Adversarial Search. CMPSCI 383 September 29, 2011 Adversarial Search CMPSCI 383 September 29, 2011 1 Why are games interesting to AI? Simple to represent and reason about Must consider the moves of an adversary Time constraints Russell & Norvig say: Games,

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Jeff Clune Assistant Professor Evolving Artificial Intelligence Laboratory AI Challenge One 140 Challenge 1 grades 120 100 80 60 AI Challenge One Transform to graph Explore the

More information

CMSC 671 Project Report- Google AI Challenge: Planet Wars

CMSC 671 Project Report- Google AI Challenge: Planet Wars 1. Introduction Purpose The purpose of the project is to apply relevant AI techniques learned during the course with a view to develop an intelligent game playing bot for the game of Planet Wars. Planet

More information

CS 188: Artificial Intelligence Spring 2007

CS 188: Artificial Intelligence Spring 2007 CS 188: Artificial Intelligence Spring 2007 Lecture 7: CSP-II and Adversarial Search 2/6/2007 Srini Narayanan ICSI and UC Berkeley Many slides over the course adapted from Dan Klein, Stuart Russell or

More information

game tree complete all possible moves

game tree complete all possible moves Game Trees Game Tree A game tree is a tree the nodes of which are positions in a game and edges are moves. The complete game tree for a game is the game tree starting at the initial position and containing

More information

Adversarial Search: Game Playing. Reading: Chapter

Adversarial Search: Game Playing. Reading: Chapter Adversarial Search: Game Playing Reading: Chapter 6.5-6.8 1 Games and AI Easy to represent, abstract, precise rules One of the first tasks undertaken by AI (since 1950) Better than humans in Othello and

More information

CS-E4800 Artificial Intelligence

CS-E4800 Artificial Intelligence CS-E4800 Artificial Intelligence Jussi Rintanen Department of Computer Science Aalto University March 9, 2017 Difficulties in Rational Collective Behavior Individual utility in conflict with collective

More information

Theory and Practice of Artificial Intelligence

Theory and Practice of Artificial Intelligence Theory and Practice of Artificial Intelligence Games Daniel Polani School of Computer Science University of Hertfordshire March 9, 2017 All rights reserved. Permission is granted to copy and distribute

More information

Humanization of Computational Learning in Strategy Games

Humanization of Computational Learning in Strategy Games 1 Humanization of Computational Learning in Strategy Games By Benjamin S. Greenberg S.B., C.S. M.I.T., 2015 Submitted to the Department of Electrical Engineering and Computer Science in Partial Fulfillment

More information

Monte Carlo Tree Search

Monte Carlo Tree Search Monte Carlo Tree Search 1 By the end, you will know Why we use Monte Carlo Search Trees The pros and cons of MCTS How it is applied to Super Mario Brothers and Alpha Go 2 Outline I. Pre-MCTS Algorithms

More information

Announcements. Homework 1. Project 1. Due tonight at 11:59pm. Due Friday 2/8 at 4:00pm. Electronic HW1 Written HW1

Announcements. Homework 1. Project 1. Due tonight at 11:59pm. Due Friday 2/8 at 4:00pm. Electronic HW1 Written HW1 Announcements Homework 1 Due tonight at 11:59pm Project 1 Electronic HW1 Written HW1 Due Friday 2/8 at 4:00pm CS 188: Artificial Intelligence Adversarial Search and Game Trees Instructors: Sergey Levine

More information

TTIC 31230, Fundamentals of Deep Learning David McAllester, April AlphaZero

TTIC 31230, Fundamentals of Deep Learning David McAllester, April AlphaZero TTIC 31230, Fundamentals of Deep Learning David McAllester, April 2017 AlphaZero 1 AlphaGo Fan (October 2015) AlphaGo Defeats Fan Hui, European Go Champion. 2 AlphaGo Lee (March 2016) 3 AlphaGo Zero vs.

More information

Game playing. Chapter 5. Chapter 5 1

Game playing. Chapter 5. Chapter 5 1 Game playing Chapter 5 Chapter 5 1 Outline Games Perfect play minimax decisions α β pruning Resource limits and approximate evaluation Games of chance Games of imperfect information Chapter 5 2 Types of

More information

Artificial Intelligence Adversarial Search

Artificial Intelligence Adversarial Search Artificial Intelligence Adversarial Search Adversarial Search Adversarial search problems games They occur in multiagent competitive environments There is an opponent we can t control planning again us!

More information

U.S. TOURNAMENT BACKGAMMON RULES* (Honest, Fair Play And Sportsmanship Will Take Precedence Over Any Rule - Directors Discretion)

U.S. TOURNAMENT BACKGAMMON RULES* (Honest, Fair Play And Sportsmanship Will Take Precedence Over Any Rule - Directors Discretion) U.S. TOURNAMENT BACKGAMMON RULES* (Honest, Fair Play And Sportsmanship Will Take Precedence Over Any Rule - Directors Discretion) 1.0 PROPRIETIES 1.1 TERMS. TD-Tournament Director, TS-Tournament Staff

More information

1 Introduction. w k x k (1.1)

1 Introduction. w k x k (1.1) Neural Smithing 1 Introduction Artificial neural networks are nonlinear mapping systems whose structure is loosely based on principles observed in the nervous systems of humans and animals. The major

More information

Backgammon Basics And How To Play

Backgammon Basics And How To Play Backgammon Basics And How To Play Backgammon is a game for two players, played on a board consisting of twenty-four narrow triangles called points. The triangles alternate in color and are grouped into

More information

Local Search. Hill Climbing. Hill Climbing Diagram. Simulated Annealing. Simulated Annealing. Introduction to Artificial Intelligence

Local Search. Hill Climbing. Hill Climbing Diagram. Simulated Annealing. Simulated Annealing. Introduction to Artificial Intelligence Introduction to Artificial Intelligence V22.0472-001 Fall 2009 Lecture 6: Adversarial Search Local Search Queue-based algorithms keep fallback options (backtracking) Local search: improve what you have

More information

AN ABSTRACT OF THE THESIS OF

AN ABSTRACT OF THE THESIS OF AN ABSTRACT OF THE THESIS OF Paul Lewis for the degree of Master of Science in Computer Science presented on June 1, 2010. Title: Ensemble Monte-Carlo Planning: An Empirical Study Abstract approved: Alan

More information