A Study of Machine Learning Methods using the Game of Fox and Geese

A Study of Machine Learning Methods using the Game of Fox and Geese

Kenneth J. Chisholm & Donald Fleming
School of Computing, Napier University, 10 Colinton Road, Edinburgh EH10 5DT, Scotland, U.K.

Abstract: The game of Fox and Geese is solved using retrograde analysis. A neural network trained by a co-evolutionary genetic algorithm, with the help of the resulting expert-knowledge database, proved to be a very capable Fox and Geese player and quickly learned to beat its training opponents.

Key-Words: Game theory, rote-learning, neural networks, genetic algorithms, co-evolution.

1 Introduction

1.1 The Game of Fox and Geese

Fox and Geese is a derivative of draughts (checkers) and is played on a standard 8 by 8 draughts or chess board. The black player has four pieces (the Geese), which are initially placed on the four dark squares at the top of the board. The white player has a single piece (the Fox), which is normally placed either at the bottom of the board on the second dark square from the left (figure 1), or on any free dark square chosen by the white player.

Fig. 1. The standard starting position for Fox and Geese.

Black's pieces (the Geese) can move one square at a time, but only down the board, so their options are limited to the two squares diagonally ahead. White's Fox can move diagonally one square in any direction. There is no taking or jumping in Fox and Geese, so an occupied square is blocked to both players. The object for the Fox is to break past the line of Geese and reach one of the four dark squares at the top of the board, where the Geese are initially placed. The aim for the Geese is to hem the Fox in so that it can no longer make a legal move. There are no drawn games in Fox and Geese: if the Fox fails to break through the line of Geese, it will eventually be pinned to the bottom of the board and will lose the game (see Perham, 1998 or Berlekamp, Conway and Guy, 1982 for full details [2][9]).
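These movement rules translate directly into a move generator. The following is a minimal sketch, an illustration rather than the authors' code, assuming squares are (row, col) pairs with row 0 at the bottom (the Fox's side) and row 7 the Geese's starting row; diagonal steps automatically keep a piece on the dark squares.

def goose_moves(goose, occupied):
    """Geese step one square diagonally, downwards only (towards row 0)."""
    r, c = goose
    steps = [(r - 1, c - 1), (r - 1, c + 1)]
    return [(nr, nc) for nr, nc in steps
            if 0 <= nr < 8 and 0 <= nc < 8 and (nr, nc) not in occupied]

def fox_moves(fox, occupied):
    """The Fox steps one square diagonally in any of the four directions."""
    r, c = fox
    steps = [(r + dr, c + dc) for dr in (-1, 1) for dc in (-1, 1)]
    return [(nr, nc) for nr, nc in steps
            if 0 <= nr < 8 and 0 <= nc < 8 and (nr, nc) not in occupied]

Here occupied is the set of squares held by the other pieces; since there is no jumping, a blocked square is simply unavailable to both sides.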
2 A Simple Rote-Learning Player

A rote-learning algorithm, closely based on the technique used by Samuel [11], was used to improve the play of a basic AI Fox and Geese program. Previously encountered board positions are recalled from a database of moves in order to increase the look-ahead ability of a mini-max search tree. Samuel's rote-learning method is easily adapted to Fox and Geese: the rules are similar to those of draughts and the playing board is the same, so the mini-max algorithm carries over directly. The only part of the AI programming that is really specific to Fox and Geese is the design of the board evaluation function [16].

2.1 A Board Evaluation Function for Fox and Geese

For this implementation a range of values between -100 and 100 was chosen, allowing a score to be stored efficiently in one byte of memory. A score of -100 represents an overwhelming advantage for the Geese, and a score of 100 means the same for the Fox. As the AI algorithm would eventually be supplanted to some extent by the accumulated knowledge from previous games, there was no need to create a highly complex board evaluation function that performs at master level. The design of the evaluation function reflects these simple needs, and values are based on only a few features of the board. First, the evaluation function checks for a winning position and returns -100 or 100 if it finds one. Otherwise, the score is calculated by starting from an initial value of -50, adding two points for every dark square reachable by the white piece, and adding a further two points for every row the Fox has advanced from the bottom of the board. This second feature, increasing the score as the Fox advances up the board, is included to encourage the Fox to press forward as much as possible.
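The whole evaluation function can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: "reachable" is read here as a flood-fill over empty squares (the paper does not say whether reachability is one step or transitive), and coordinates follow the earlier sketch (row 0 at the bottom, row 7 the Geese's home row).

DIAG = [(1, 1), (1, -1), (-1, 1), (-1, -1)]

def empty_neighbours(sq, geese):
    r, c = sq
    return [(r + dr, c + dc) for dr, dc in DIAG
            if 0 <= r + dr < 8 and 0 <= c + dc < 8
            and (r + dr, c + dc) not in geese]

def reachable(fox, geese):
    """All squares the Fox could reach by repeated diagonal steps."""
    seen, frontier = set(), [fox]
    while frontier:
        for sq in empty_neighbours(frontier.pop(), geese):
            if sq not in seen:
                seen.add(sq)
                frontier.append(sq)
    return seen

def evaluate(fox, geese):
    """Score in [-100, 100]; -100 is an overwhelming Geese advantage."""
    if fox[0] == 7:
        return 100                         # Fox reached the top row: win
    if not empty_neighbours(fox, geese):
        return -100                        # Fox cannot move: Geese win
    score = -50 + 2 * len(reachable(fox, geese)) + 2 * fox[0]
    return max(-100, min(100, score))

For example, evaluate((0, 2), {(7, 0), (7, 2), (7, 4), (7, 6)}) scores the standard starting position; this function then supplies the leaf values for the mini-max search described below.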

A simple mini-max AI player using this board evaluation function with a search limit of 6 ply can play a fairly competent game of Fox and Geese. The algorithm does, however, perform better for the Geese than for the Fox. When the AI program takes both roles, the Geese usually win, although at lower search depths (below 6 ply) the Geese's advantage is diminished considerably and the Fox can sometimes win. The AI Geese can also defeat most human opponents, whereas a fairly competent human player can easily beat an AI Fox. The temptation to tinker further with the evaluation function (and improve the Fox's performance) was resisted, as it was felt to be adequate for the machine learning experiments to come.

3 A Complete Solution of Fox and Geese

A simple form of retrograde analysis was used to construct a database of valid boards and moves for Fox and Geese [7][13]. First, in a relatively straightforward manner, all possible boards were enumerated and stored. These were then edited down to two smaller databases consisting of boards with legal white moves and boards with legal black moves.

3.1 Assigning Values to the Expert Database

Having collected and ordered all playing positions in Fox and Geese, it only remained to assign a game-theoretic value to each position. A simple implementation, similar to the retrograde analysis employed by Schaeffer et al. for the Chinook checkers program [7][13], was used for this purpose. In this case, however, a forward-moving analysis was found to be most suitable, and each position is resolved by examining its successors. A first pass through the databases resolves all known terminal positions (those where the Fox is either trapped or has reached the top row of the board). Subsequent iterations through the database simulate all succeeding moves from any positions whose values are as yet unknown. These iterations determined that the game-theoretic value of Fox and Geese is a win for the Geese, and the game is thus strongly solved, as defined by Allis [1]. This significant result is believed by the authors to be the first complete solution of the game of Fox and Geese.
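This pass-until-stable resolution amounts to computing a fixed point over the position databases. Below is a compact sketch of that loop, not the authors' code: positions (a list of all legal positions), mover(p), successors(p) and terminal_value(p), which returns 'FOX', 'GEESE' or None, are hypothetical stand-ins for the databases and move simulator described above.

def solve(positions, terminal_value, mover, successors):
    value = {p: terminal_value(p) for p in positions}   # first pass: terminals
    changed = True
    while changed:                       # repeat until no new position resolves
        changed = False
        for p in positions:
            if value[p] is not None:
                continue
            side = mover(p)
            succ = [value[s] for s in successors(p)]
            if side in succ:             # one move to a won position suffices
                value[p] = side
            elif succ and all(v is not None and v != side for v in succ):
                value[p] = 'GEESE' if side == 'FOX' else 'FOX'
            changed = changed or value[p] is not None
    return value    # the paper's result corresponds to value[start] == 'GEESE'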
4 A Neural Network Player: F&G-NN/BP

An artificial neural network was trained using back propagation to perform the role of a board evaluation function in a standard mini-max search algorithm. The database of perfect moves provided the expert knowledge for training. Each time the neural network is called upon to value a board at the leaves of a search tree, the raw output of the network (between 0 and 1) is compared to the value found in the perfect-moves database for that board. The database result is assigned a value of 1 for Fox wins and 0 for Geese wins, so that it can be compared directly with the network's output. If the difference between the network output and the database value falls outside a tolerance of ±0.1, the network's weights are updated by back propagation, using the network output and the database value as the actual and target values respectively. The learning rate (α) is set to 0.5, and no momentum or other optimisation measures are used.
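A runnable sketch of this setup follows: a 33-20-1 feed-forward network (the topology given in section 6) whose output scores a leaf, updated by plain back propagation only when the output misses the database label by more than 0.1. Sigmoid units and the ±1 initial weight range are assumptions; the paper specifies the topology, the 0.1 tolerance and α = 0.5, but not these details.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Net:
    def __init__(self, rng):
        self.w1 = rng.uniform(-1, 1, (20, 33))   # input (32 squares + side)
        self.w2 = rng.uniform(-1, 1, (1, 20))    # hidden -> single output

    def forward(self, x):
        self.x, self.h = x, sigmoid(self.w1 @ x)
        self.out = sigmoid(self.w2 @ self.h)[0]  # raw output in (0, 1)
        return self.out

    def backprop(self, target, alpha=0.5):
        """One plain gradient step on squared error; no momentum."""
        d_out = (self.out - target) * self.out * (1.0 - self.out)
        d_h = (self.w2[0] * d_out) * self.h * (1.0 - self.h)
        self.w2 -= alpha * d_out * self.h[None, :]
        self.w1 -= alpha * np.outer(d_h, self.x)

def leaf_value(net, x, db_label):
    """db_label: 1.0 for a Fox win, 0.0 for a Geese win in the database."""
    out = net.forward(x)
    if abs(out - db_label) > 0.1:        # outside tolerance: adjust weights
        net.backprop(db_label)
    return out

# e.g. net = Net(np.random.default_rng(0)); leaf_value(net, np.zeros(33), 1.0)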

4.1 F&G-NN/BP Experiments

The neural network was trained by playing 200 games against the simple mini-max AI opponent using a look-ahead of 6 ply. For each game the winner is recorded, as is the proportion of plays made by the network player which are valued as an eventual win in the perfect-moves database. Also recorded are the number of times back propagation is used to adjust the network weights during each game and the average error for each game. The error for each network activation is the difference between the target and actual network output (a value between 0 and 1). The average error is simply the total network error for one game divided by the total number of network activations. This experiment, and all the other experiments, was conducted 10 times in order to reduce the potential for misleading results caused by the random initialisation of the network weights. The roles of the AI player and the neural network were then switched and the experiments were performed a further 10 times.

4.2 F&G-NN/BP Results

The results show that the neural network player was able to quickly supplant the simple mini-max AI player. Playing as either the Fox or the Geese, the neural network is able to beat the simple AI player within a few games of the start of training, and after the first ten games it was winning more than half of its games in either role. The neural network plays better as the Fox: the average number of wins (out of ten) starts above eight and varies throughout training between eight and ten, and by the end of training the network playing as the Fox wins almost every game against the opposing AI player.

Fig. 2. Back propagation. Average no. of wins per 10 games for the Fox. Number of wins (out of 10) for every 10 games of the 200-game training run. (This graph represents the average from the 10 runs.)

The neural network playing the Geese performs less strongly, only winning an average of around six of its first ten games. After the initial ten games performance fluctuates, but the last sixty games show a steady improvement towards an average of more than eight wins out of ten.

Fig. 3. Back propagation. Average no. of wins per 10 games for the Geese. (This graph represents the average from 10 training runs.)

The slightly erratic nature of the learning process, expressed in terms of game wins and losses, contrasts with the results gained from testing the average network output error for each game. Here, the results show a smooth decrease towards an extremely low error rate.

Fig. 4. Back propagation. Average percentage error for the Fox. Percentage error is calculated by summing the errors from the network (target output minus actual output) and dividing the result by the total number of network activations. (A similar graph is obtained for the Geese.)

This disparity is probably caused by the neural network's non-static evaluation of board positions at different points in a mini-max search tree. Although the accuracy of the network outputs is steadily increasing, the adjustment of the network during mini-max searches leads to erratic search results. Once the fluctuation of the network weights has reduced to a background level, the game-playing performance of the neural network player stabilises towards more consistent results.

5 Machine Learning using GAs

There have recently been a number of board game implementations using genetic algorithms in some way [6]. Chisholm and Bradbeer have used a genetic algorithm to control and optimise the board evaluation function of a draughts program [5]. The algorithm uses crossover and selection to develop optimal weights for board evaluation, each weight representing the importance assigned to a feature of the board, such as the number of pieces left or the number of kings on the board [5]. Some researchers, such as Carling, have used genetic algorithms to train neural networks to play board games [3]. Richards, Moriarty, McQuesten and Miikkulainen have experimented with this approach [10]. Rather than evolving whole networks by adapting the weights between nodes, Richards et al. developed a system in which promising neurones are bred and combined into new networks every generation. Their system, called SANE, has been applied to the task of playing Go with considerable success. An alternative approach to training neural networks is described by Chellapilla and Fogel [4]. Here, instead of using a reinforcement method such as back propagation or temporal difference learning, Chellapilla and Fogel create a set of network weights by co-evolution [4].

6 A Neural Net/GA Player: F&G-NN/GA

A genetic algorithm was used to breed a set of optimal weights for an artificial neural network. The network is a standard multi-layer feed-forward network consisting of 33 input units (32 for the state of each square on the board and one for who is to move), a hidden layer of 20 units and a single output node. A pool of ten randomly created network players competes in a tournament using a co-evolutionary strategy. Poorer players are replaced with offspring bred from two successful players, together with new randomly initialised players.

6.1 Implementation

The genetic algorithm used in this implementation has the task of breeding optimal weights for the artificial neural network. The network performs the role of a board evaluation function, used by a mini-max search tree to give values to board positions encountered during a search [8][12].

A binary encoding was not used in this system. Instead, the network weights themselves each form a gene of the GA chromosome; the chromosome, or string, is simply the entire collection of network weights. New offspring are bred from two parent sets of weights, with crossover points chosen randomly at intervals of between one and five weights along a string. Because of this method of crossover, weights are grouped into sub-strings by the network nodes they feed into, rather than the nodes they emanate from. As the output layer of the network consists of a single node, the weights between the 20-node hidden layer and the output are grouped into one string of 20 weights, rather than 20 strings of one weight each. During crossover, each weight the offspring receives from a parent has a 0.1% chance of mutation. Mutated weights are re-initialised to a random value between -1 and +1.
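One plausible reading of this scheme is multi-point crossover that switches the active parent every one to five genes. The sketch below is an illustration under that assumption, not the authors' code; each chromosome is a flat list of floats of equal length.

import random

MUTATION_RATE = 0.001    # 0.1% chance per inherited weight

def crossover(mum, dad, rng=random):
    """Breed one child: copy alternating segments of 1-5 weights from each
    parent, then rarely re-initialise a weight in [-1, +1]."""
    parents, active, child, i = (mum, dad), 0, [], 0
    while i < len(mum):
        span = rng.randint(1, 5)               # segment of 1-5 weights
        child.extend(parents[active][i:i + span])
        active = 1 - active                    # switch parent at each point
        i += span
    return [rng.uniform(-1.0, 1.0) if rng.random() < MUTATION_RATE else w
            for w in child]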

6.2 The F&G-NN/GA Fitness Function

The solution chosen here is to use the perfect-moves database (see section 3) to supply a level of fitness based on the number of winning moves a player makes during the course of a game. A pool of ten players is used, randomly initialised with each network weight set to a random value between -1 and +1. Every generation, each player plays one game against an AI mini-max opponent. The initial fitness value given to each player is the proportion of correct moves (moves which will definitely lead to an eventual win) to total moves made, expressed as a percentage. To this total is added either a winning bonus of 150 points or, for a losing player, 3 points for every move made in the game. The bonus for losing players is designed to promote long games and prevent learning from stalling at local minima: a player who plays four moves and makes only one (fatal) mistake scores a fitness of 75, and is unlikely to improve further if the fitness function reflects only the proportion of winning moves. The bonuses encourage longer games and more generalised good play, while the winning bonus ensures that winning, as the main goal of the board game problem, is rewarded above all other fitness criteria.

The rules for breeding new players are as follows. The four best players are kept on for the next generation. The rest are replaced: four by offspring and two by new randomly initialised players. The subtlety lies in which pool members are chosen for breeding. The existing players are replaced by offspring in reverse order of their fitness ranking, so that the soon-to-be-replaced players in fifth, sixth and seventh position can still take part in the breeding. For each replacement, offspring are bred from two parents chosen randomly from all players ranked above the new offspring. This allows players which do not make the top four a limited opportunity to breed before they are replaced; the higher such players rank, the more offspring they can parent before replacement. The top four players are, of course, candidates for breeding all four offspring. This limited inclusion of lower-ranking players in the breeding process is intended to reduce the risk of the GA settling in local minima [6]: if the breeding networks' weights are too similar, the resulting offspring may be near carbon copies of the originals. Although mutation may eventually reintroduce diversity, mixing the pedigree of new offspring increases the variety of weights in the pool, so any local minimum will eventually be escaped by a new combination of weights.

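Read literally, the fitness and replacement rules can be sketched as below. This is my reading of the text, not the authors' code: breed could be the crossover sketch from section 6.1, fresh is a hypothetical generator of random weight vectors, and the worst-slot-first replacement order is an assumption.

import random

WIN_BONUS = 150
LOSS_BONUS_PER_MOVE = 3      # rewards long games for the loser

def fitness(correct_moves, total_moves, won):
    """Percentage of database-certified winning moves, plus the bonuses."""
    score = 100.0 * correct_moves / total_moves
    return score + (WIN_BONUS if won else LOSS_BONUS_PER_MOVE * total_moves)

def next_generation(ranked, breed, fresh, rng=random):
    """ranked: weight vectors sorted best-first. Keep the top four; refill
    slots 4-7 with offspring, worst slot first, with parents drawn from the
    ranks above the slot; slots 8-9 become fresh random players."""
    pool = list(ranked)
    for slot in (7, 6, 5, 4):                  # reverse fitness order
        mum, dad = rng.sample(pool[:slot], 2)  # parents ranked above the slot
        pool[slot] = breed(mum, dad)
    pool[8], pool[9] = fresh(), fresh()
    return pool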
6.3 The F&G-NN/GA Experiments

Each generation, the pool of ten players plays one game each against an AI mini-max opponent. After each generation the fitness function ranks the players on their performance, and breeding takes place based on these results. Each experiment lasts 200 generations and was repeated ten times to reduce the chance of anomalous results. A further ten runs were then performed with the GA players and the AI opponent switching roles. The number of games won by the GA was recorded, as was the percentage of winning moves played in each game.

6.4 F&G-NN/GA Results

The genetic algorithm breeds players able to beat the AI player. As the Fox, the GA player is particularly strong and very quickly learns to beat the AI player (see figure 5).

Fig. 5. Neural Network/GA. Average no. of wins out of 10 for the Fox. This graph shows the number of wins out of 10 for every 10 generations of the 200-generation training run, taken from the highest-ranking player in the pool of ten. (Averaged over ten runs.)

It takes the GA considerably longer to breed network weights capable of defeating a Fox AI player: two of the ten runs did not win any games at all during the training period. On average, however, the results show a smooth learning curve (figure 6) and a consistent improvement in performance, culminating in a player strong enough to win eight games out of ten.

Fig. 6. Neural Network/GA. Average no. of wins out of 10 for the Geese. (Results are based on the average from 10 training runs.)

The results showing the percentage of winning moves made per game also show a steady increase in performance throughout training (see figures 7 and 8). This is

unsurprising, as the percentage of winning moves is one of the main features of the GA fitness function. Nevertheless, these results give a good indication that the quality of play increases fairly steadily throughout training.

Fig. 7. Neural Network/GA. Average percentage of winning moves for the Fox. This graph shows the percentage of moves in each game which are held in the perfect-moves database as eventual winning positions. Results are an average from 10 training runs.

Fig. 8. Neural Network/GA. Average percentage of winning moves for the Geese.

7 Conclusions

The three learning algorithms implemented, rote learning, neural network/BP and neural network/GA, were all able to improve their ability to play the board game Fox and Geese. Table 1 shows the total number of games won by each algorithm during training, and the results show a clear measure of success for each learning technique. The neural network/GA shows the best aptitude for learning as both Fox and Geese. The rote-learning algorithm displays a poorer performance, especially in those games where it played as the Geese.

TABLE 1. Total (average) number of wins during 200 training games in each starting role (final totals out of 400).

Training Method        Fox    Geese    Total
Rote learning
Neural Network/BP
Neural Network/GA

As may be noticed from table 1, there was a very high number of wins for the neural network/GA Fox. This is because the simple AI Geese did not play well at the relatively low ply search used in these experiments (a consequence of cpu-time constraints). Fortunately, as stated in section 4.2 and shown in figure 4, winning was not the sole fitness criterion used in the training process. The rote-learning algorithm clearly has problems overcoming the native deficiencies of its AI board evaluation function, and shows a general lack of flexibility in approaching the learning task. Considering that, unlike the neural network/BP and neural network/GA, the rote-learning system has a pre-programmed board evaluation function and does not have to learn the game from scratch, its learning (although clearly demonstrated during the Fox runs, at least) could be described as incremental at best. On the other hand, although some game knowledge is pre-existent in the rote-learning algorithm's board evaluation function, this system is the only one of the methods that learns unsupervised: the neural network/BP and the neural network/GA are helped to learn the game by having access to the contents of a perfect-moves database. In this respect, the rote-learning results are more significant than they at first appear.

The genetic algorithm is clearly established as an effective technique for training a neural network. Excellent results are achieved from training runs as both the Fox and the Geese, and the GA's results are far better than rote learning's, even though its playing ability was learned from scratch. Perhaps detracting a little from these results, though, is the supervised nature of the learning. The back-propagation neural network also applies itself to the problem of playing Fox and Geese with great success: showing the most wins of all, it easily supplants its simple AI competitor whether playing as the Fox or the Geese.

7.1 Machine Learning Behaviours

It is interesting to note that all three of the machine-learning techniques perform much better when playing as the Fox (see Table 2).
This may at first seem at odds with the remarks in section 2.1, which suggest that for experienced Fox and Geese players the Geese have the advantage, but the following discussion clarifies these results.

TABLE 2. Proportion of the total wins achieved as Fox and as Geese.

Training Method        % of Total Wins
                       Fox      Geese
Rote learning          82.1%    17.9%
Neural Network/BP      54.1%    45.9%
Neural Network/GA      62.5%    37.5%

Table 2 shows the spread of all the games won by the machine learning algorithms during training runs of 200 games playing the Fox and 200 games as the Geese. Results represent a percentage of the total number of

victories from the 400 games (and are averaged over 10 training runs). Although all three algorithms show an ability to play the game, and can usually defeat a simple AI opponent, none of them can really be described as an expert player. During a game of Fox and Geese, the onus is on the Geese to preserve a defensive line of pieces. For a Fox, the tactics involved in playing against an expert are much more complex than those used against a less experienced player; if the Geese player is prone to occasional mistakes, it is not overly difficult to capitalise on one of these errors and win the game. This is what appears to be happening here. The various machine learning players are competent enough to play well as the Fox, adopting the simple tactic of pushing against the line of Geese and waiting for an opportunity to slip through. Taking the more complex role of the Geese is more problematic for the learning algorithms: the task of keeping a tight formation of pieces is harder, and each mistake can potentially lead to a quick defeat. The assertion in section 2.1 that the simple AI player performs better as the Geese is qualified by observing that lower search depths may expose flaws in the AI player's game. Both machine-learning implementations are trained at relatively low search depths (4 ply for the neural network/GA system and 6 ply for rote learning), so the AI Geese player may well be prone to making mistakes which the opposition can capitalise on.

References

[1] Allis, L.V. 1994. Searching for Solutions in Games and Artificial Intelligence. Ph.D. thesis, Department of Computer Science, University of Limburg.
[2] Berlekamp, E.R., Conway, J.H. and Guy, R.K. 1982. Winning Ways for Your Mathematical Plays. London: Academic Press.
[3] Carling, A. Introducing Neural Networks. London: John Wiley & Sons.
[4] Chellapilla, K. and Fogel, D.B. 1999. Co-evolving checkers playing programs using only win, lose, or draw. In Proceedings of SPIE's AeroSense'99: Applications and Science of Computational Intelligence II.
[5] Chisholm, K.J. and Bradbeer, P.V.G. 1997. Machine Learning Using a Genetic Algorithm to Optimise a Draughts Program Board Evaluation Function. In Proceedings of the IEEE International Conference on Evolutionary Computation (ICEC'97), Indianapolis, USA.
[6] Goldberg, D. 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley.
[7] Lake, R., Schaeffer, J. and Lu, P. 1994. Solving Large Retrograde Analysis Problems Using a Network of Workstations. In H.J. van den Herik, I.S. Herschberg and J.W.H.M. Uiterwijk (eds.), Advances in Computer Chess VII. Maastricht: University of Limburg.
[8] Levy, D. and Newborn, M. 1991. How Computers Play Chess. New York: W.H. Freeman.
[9] Perham, M. (ed.) 1998. The Encyclopedia of Games. London: Aurum Press.
[10] Richards, N., Moriarty, D., McQuesten, P. and Miikkulainen, R. 1997. Evolving neural networks to play Go. In Proceedings of the 7th International Conference on Genetic Algorithms, East Lansing, MI.
[11] Samuel, A.L. 1959. Some Studies in Machine Learning Using the Game of Checkers. IBM Journal of Research and Development, vol. 3, no. 3.
[12] Shannon, C.E. 1950. Programming a Computer for Playing Chess. Philosophical Magazine, 41(4).
[13] Schaeffer, J. and Lake, R. 1996. Solving the Game of Checkers. In Richard J. Nowakowski (ed.), Games of No Chance. Cambridge: Cambridge University Press.
[14] Sutton, R.S. 1988. Learning to Predict by the Methods of Temporal Differences. Machine Learning, 3.
[15] Tesauro, G. and Sejnowski, T.J. 1989. A parallel network that learns to play backgammon. Artificial Intelligence, 39.
[16] Turing, A.M., Strachey, C., Bates, M.A. and Bowden, B.V. 1953. Digital Computers Applied to Games. In Bowden, B.V. (ed.), Faster Than Thought. London: Pitman.
