The Role of Opponent Skill Level in Automated Game Learning

Ying Ge and Michael Hash
Advisor: Dr. Mark Burge
Armstrong Atlantic State University, Savannah, Georgia, USA

Abstract

This paper explores the role of opponent skill level in automated game learning. The game of checkers was chosen as our experimental model. Using the standard linear polynomial approach to evaluate the board state and a Least Mean Square (LMS) rule to update the weights, we trained a naive checkers program by having it play against three opponents: a naive random computer player, another naive player that learned the game with exactly the same algorithm, and an expert computer player. The results show that the program learned effectively with this algorithm, but an over-fitting problem occurred after approximately 2 to 3 games in all three scenarios. A smaller update rate significantly delayed the over-fitting problem. We also observed that playing against an expert actually slowed down the learning process for a naive player. This is somewhat analogous to human learning: a naive chess player will learn little by playing the master Kasparov. Our finding was that, in our current model, playing against a random player with a small update rate is a very effective way for a beginner to learn the game. In addition, the learning process can be improved by incorporating the minimax look-ahead algorithm.

1 Introduction

This experiment was initially conducted to help understand a model of machine learning based on Mitchell's design [1]. We modified a random checkers player program written in C at the University of Massachusetts. We then applied the linear polynomial approach to evaluate the board state after each move and the Least Mean Square (LMS) rule to update the weights of that polynomial. Our modified learning players played against each other on a UNIX machine for several days and nights.
A player from the top five was chosen to be our expert player for this paper.

1.1 Learning Model

To teach the computer to pick the best next move from a given board state, we needed a representation of the current board state and an evaluation of the future board state after the move. There are some interesting board features based on relative piece positions. For example, a board position can be expressed in terms of the first and higher moments of the white and black pieces separately about two orthogonal axes on the board. Our program uses a simple and common representation. The board was represented by the values of six features:

$x_1$: the number of my pawns
$x_2$: the number of opponent pawns
$x_3$: the number of my kings
$x_4$: the number of opponent kings
$x_5$: the number of my pieces threatened by the opponent
$x_6$: the number of opponent pieces threatened by me

The evaluation of the board was represented by a linear combination of these six features:

$V(b) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + w_5 x_5 + w_6 x_6$,

where $w_i$ are the weights to be learned. We arbitrarily defined the value of the final board state to be $V(b) = 1$ (won the game), $V(b) = -1$ (lost the game), and $V(b) = 0$ (tie). Our computer player picked the next move that led to the largest board value $V(b)$, since $V(b) = 1$ was defined as winning. For each game played, we recorded all of the board states until the end and updated the weights using the difference between the training board value $V_{train}(b)$ and the current estimate derived from

$V(b) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + w_5 x_5 + w_6 x_6$.

We used the LMS weight update rule

$w_i \leftarrow w_i + \eta \, (V_{train}(b) - V(b)) \, x_i$,

where $\eta$ is the update rate.

1.2 Experiment Approaches

Throughout our experiment, we kept human learning in mind. Thinking of how humans learn games, we let the naive computer program learn checkers in three ways: by playing against a random player that chose its next legal move at random, by playing against a learning player that used the same algorithm as ours, and finally by playing against an expert player that had won most of its games against many different types of players on a network over several days and nights. The way the machine learned shows both differences from and similarities to human players.

2 Learning Results

The results in this document are presented primarily as graphs. The learned weights for each of the six features are shown: my pawn, opponent pawn, my king, opponent king, my threatened, and opponent threatened. To observe performance, the numbers of won, lost, and tied games were also graphed. For each of the following three learning experiences, 1, 5, and 1 games were tested. Only the pertinent results are presented.

2.1 Playing Against A Random Player

The random player picked its next move at random from the legal moves. It had no sense of which boards were more likely to lead to a win. The results show that the machine learned very quickly and started to defeat the opponent after only a few games.
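The evaluation function, greedy move choice, and LMS update from Section 1.1 can be sketched in a few lines. This is a minimal illustration in Python, not the authors' C program: the function names, the example board, and the two update rates are hypothetical, and updating $w_0$ with a constant input of 1 is an assumption, since the paper does not say how the constant term was handled.

```python
from typing import List, Sequence


def evaluate(weights: Sequence[float], features: Sequence[float]) -> float:
    """V(b) = w0 + w1*x1 + ... + w6*x6 for one board's feature vector."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def pick_move(weights: Sequence[float],
              successors: Sequence[Sequence[float]]) -> int:
    """Index of the successor board with the largest V(b)."""
    return max(range(len(successors)),
               key=lambda i: evaluate(weights, successors[i]))


def lms_update(weights: Sequence[float], features: Sequence[float],
               v_train: float, rate: float) -> List[float]:
    """One LMS step: w_i <- w_i + rate * (V_train(b) - V(b)) * x_i."""
    error = v_train - evaluate(weights, features)
    inputs = [1.0, *features]  # assumption: x0 = 1 so that w0 is updated too
    return [w + rate * error * x for w, x in zip(weights, inputs)]


# Hypothetical board: 3 my pawns, 2 opponent pawns, 1 my king,
# 0 opponent kings, 0 of my pieces threatened, 1 opponent piece threatened.
board = (3, 2, 1, 0, 0, 1)
start = [0.0] * 7  # w0..w6, all zero before any learning

# The same training target (a win, V_train = 1) with two illustrative rates.
small_step = lms_update(start, board, v_train=1.0, rate=0.01)
large_step = lms_update(start, board, v_train=1.0, rate=0.1)
```

On this made-up board, a single step at the larger rate pushes the estimate past the target of 1, while the smaller rate moves it only part of the way. That is consistent with the steadier behaviour the paper reports for a small update rate, although one step on one board is not evidence about over-fitting across many games.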
The weights changed in the correct direction. My Pawns and My Kings were weighted positively, while Opponent Pawns and Opponent Kings were weighted negatively. My Threatened was weighted negatively. This means that the program had learned the piece advantages on the board: My Kings and My Pawns were desirable, and threatened pieces were not. However, an over-fitting phenomenon was observed: the key weights, My Pawns and Opponent Pawns, began to head in the wrong direction after approximately 2-3 games and then fluctuated until the end. The game results graph shows that although the learner won more games in total than the random opponent, its performance started to decay once the weights were over-fitted.

Figure 1: All six weights adjusting for 1 games against random player.

Figure 2: The distribution of games won for 1 against random player.

Figure 3: All six weights adjusting for 5 games against random player.

Figure 4: The distribution of games won for 5 against random player.

Figure 6: The distribution of games won for 5 against learning player.

2.2 Playing Against A Learning Player

In the second experiment, we cloned a new learning player with exactly the same learning algorithm and data representation scheme. This time, neither player consistently won more games than the other, which was consistent with the prediction derived from the human situation: when two competitors share the same strategy and strength, no clear winner emerges. The over-fitting problem still existed. The weights started to fluctuate after about 25 games. In addition, the weights became unreasonable.

2.3 Playing Against An Expert Player

The idea for this experiment came from the fact that human beginners learn very little when playing experts, because they are beaten so quickly. We therefore conducted this experiment to see how a machine learns when playing against an expert. Not surprisingly, the outcome was similar to the real-world situation. Our player was beaten badly all the way through. Especially in the early stage, the weights were slow to move in the desired direction. This indicates that our player was not certain which board states were bad states, probably because of the early losses. The weights graph shows that the player later learned to play better, but its overall performance never exceeded the expert's.

Figure 5: All six weights adjusting for 5 games against learning player.

Figure 7: All six weights adjusting for 1 games against expert player.

Figure 8: The distribution of games won for 1 against expert player.

Figure 10: The distribution of games won for 1 against random player with small update rate.

2.4 Fixing The Over-Fitting Problem

Over-fitting is not an uncommon problem in artificial intelligence models. For example, neural networks and decision trees both suffer from over-fitting the training data [1]. There are several ways to address the problem. For instance, a decay schedule can be used to reduce the update rate as the number of games played increases. What we did in this experiment was simply to use a much smaller update rate all the way through (we changed the update rate from .11 to .111). Although this delayed the over-fitting problem, it lengthened the learning time, since every improvement was very small and cautious. The following graphs against a random player show the steady improvement of the learning weights. No over-fitting problem was observed even after playing 1 games.

3 Improving The Algorithm

From the above, we learned that a small update rate can improve learning by avoiding over-fitting. In addition, the algorithm can be further improved by incorporating the minimax look-ahead method. When our computer player picks the next legal move, instead of calculating only the immediate resulting board values and picking the best one, it looks several moves ahead and calculates the board values that result several steps later. Theoretically, it could look ahead until the game is over (win, lose, or tie) and pick the winning path through this enormous decision tree.

Figure 9: All six weights adjusting for 1 games against random player with small update rate.

Figure 11: A tree of moves which might be investigated during the look-ahead procedure. The actual branchings are much more numerous than those shown, and the tree is apt to extend to as many as 2 levels.
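The look-ahead idea in Section 3 can be sketched as a depth-limited minimax search. The following is a minimal Python illustration under stated assumptions, not the authors' implementation: `successors` and `evaluate` are hypothetical callbacks, the toy tree and its values are made up, and `evaluate` is assumed to score every board from the learner's point of view, as $V(b)$ does.

```python
from typing import Callable, Dict, List


def minimax(state: str, depth: int, my_turn: bool,
            successors: Callable[[str], List[str]],
            evaluate: Callable[[str], float]) -> float:
    """Back up board values from `depth` plies ahead to `state`."""
    children = successors(state)
    if depth == 0 or not children:
        return evaluate(state)  # search horizon or end of game
    values = [minimax(c, depth - 1, not my_turn, successors, evaluate)
              for c in children]
    # I pick the largest value on my turn; the opponent picks the smallest.
    return max(values) if my_turn else min(values)


def best_move(state: str, depth: int,
              successors: Callable[[str], List[str]],
              evaluate: Callable[[str], float]) -> str:
    """Successor of `state` with the best backed-up value (depth >= 1)."""
    return max(successors(state),
               key=lambda c: minimax(c, depth - 1, False, successors, evaluate))


# Toy two-ply tree with made-up static values (not real checkers boards).
TREE: Dict[str, List[str]] = {"root": ["a", "b"],
                              "a": ["a1", "a2"],
                              "b": ["b1", "b2"]}
VALUE: Dict[str, float] = {"a": 0.4, "b": 0.0,
                           "a1": 0.3, "a2": -0.5,
                           "b1": 0.1, "b2": 0.2}


def toy_successors(state: str) -> List[str]:
    return TREE.get(state, [])


def toy_evaluate(state: str) -> float:
    return VALUE.get(state, 0.0)


greedy_choice = best_move("root", 1, toy_successors, toy_evaluate)
lookahead_choice = best_move("root", 2, toy_successors, toy_evaluate)
```

In this toy tree the one-ply search prefers move `a` because its immediate value is higher, while the two-ply search prefers `b` because the opponent's best reply to `a` is worse for the learner. The sketch visits every node up to the depth limit and includes no pruning.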

Figure 12: Simplified diagram showing how the evaluations are backed up through the "tree" of possible moves to arrive at the best next move. The evaluation starts at (3).

When we humans play games, we try to look ahead as many moves as possible. The average person can look ahead about 2 moves; Kasparov can probably look ahead 4 or 5 moves. How far can a machine look ahead? It depends on the complexity of the game, and computational space and time are limited in a real game. Alpha-beta pruning can reduce the complexity from $O(b^d)$ to as little as $O(b^{d/2})$, where $b$ is the average branching factor of each node and $d$ is the depth of the tree [2].

4 Conclusion

The above experiments revealed some familiar issues in machine learning. Over-fitting is a common problem, and a careful choice of update rate can solve or ease it. Human learning usually does not suffer from the over-fitting that is common in machine learning. However, machine players and human players show some similarities in the way they learn to play the game. For instance, playing against experts does not help a beginner learn effectively. Because of the early defeats, beginners seem to hesitate to take any dramatic steps.

References

[1] Mitchell, Tom M. Machine Learning.
[2] Russell, Stuart and Norvig, Peter. Artificial Intelligence: A Modern Approach.
