DVONN and Game-playing Intelligent Agents
Paul Kilgo
CPSC 810: Introduction to Artificial Intelligence
Dr. Dennis Stevenson, School of Computing, Clemson University
Fall 2012

Abstract

Artificial intelligence is a broad discipline of computer science dealing with creating applications that behave in a human-like way. For many, the first exposure to the idea, and often the first notion that comes to mind when asked about the subject, is a computer opponent for a game. For as long as computer games have existed, so has the interest in creating a challenge for the human, whether it be an algorithm that generates difficult yet feasible puzzles or an algorithm that replaces the need for a human player. This paper is a case study of implementing a computer player for the strategy board game DVONN. It presents a short survey of game-playing AIs, a description of the DVONN game, some background on the theory of the underlying search algorithms, a description of the implementation, results of an experiment, and concluding remarks.

1 Introduction

John McCarthy defines artificial intelligence as the science and engineering of making intelligent machines [6], with intelligence defined in terms of human intelligence. In this respect, the task of creating a computer opponent to replace a human in a board game falls within the realm of artificial intelligence. Designing computer opponents for board games has long been an interest of computer scientists. Discrete-time (turn-taking) board games in particular have semantics simple enough to translate readily into a computer program. However, the simplicity of the game semantics does not imply the game is simple for a computer to play: a computer opponent is easily overwhelmed by the massive state space of any board game that is even mildly challenging for humans.
Thus, many game-playing intelligent agents (IAs) employ an optimizing search technique to find a solution using the game semantics. It is impossible to search all possible states, so much of the challenge of designing these algorithms comes from deciding the best way to prune branches from the search tree. A naïve solution crawls the game tree to a fixed number of moves, called plies. One soon finds that this strategy is insufficient, as some moves put the player at a greater advantage over the opponent and should be explored to a greater depth. Here the concept of quiescence is introduced, which allows the IA to explore to greater depth those moves which are more likely to lead to a win. The rest of this paper is laid out as follows. First, we present a short survey of game-playing IAs and some of the techniques which make them fast, efficient, and challenging. Next, we give the reader a quick introduction to the strategy game of DVONN. We then introduce the
minimax algorithm and formally state the intended operation of the implementation. Some brief notes on the implementation are given, and the results of an experiment are presented along with their analysis. Finally, we give concluding remarks regarding this implementation.

2 Game-Playing IAs

Perhaps the best known examples of IAs playing board games are in Chess. Deep Blue, which defeated Chess Grandmaster Garry Kasparov, was one of the more famous Chess-playing IAs. Campbell et al. [2] present an interesting historical review of how Deep Blue worked, and many of the techniques discussed in this paper were used in Deep Blue. However, some things Deep Blue had which this implementation does not include are hardware move generation, a parallel architecture, and a team of IBM engineers. Deep Blue could also search many ply deeper than this implementation can (without being terminated due to the impatience of the operator).

Go is another favorite game for IAs to tackle. MoGo [5] is a more recent Computer Go implementation which has bested some notable players. It makes use of Monte Carlo techniques for its move selection, and likewise exploits a parallel architecture to speed up move selection.

There are currently a few DVONN-playing computer programs, but little information is available regarding their implementation. A few of these programs are shipped as part of DVONN computer games [1, 7, 8]. Others [3, 4] are among the top contenders on online board-gaming sites, winning as much as 75% of their games against their challengers.

3 DVONN Basics

DVONN is a pure strategy game for two players, falling into the same category of board games as Chess, Go, and Othello. The basic goal of DVONN is to have captured more game tiles than your opponent by the end of the game.

Figure 1: Layout of the DVONN game board. (Courtesy: Wikimedia Commons)
A capture can be performed by moving a stack of pieces under your control onto another stack of pieces. A stack is under a player's control if the top piece is the player's color. However, a stack may not be moved if it is surrounded on all six sides by other stacks. DVONN is played on a hexagonal grid, with the board laid out as illustrated in Figure 1. A player may move any stack under his control toward the northeast, northwest, east, west, southeast, or southwest. A stack must move exactly as many spaces as there are game tiles in the stack. The only valid destination for a moved stack is another space with a stack on it; a stack may not move to an empty space, or off of the game board.

On a standard DVONN board there are twenty-three white pieces, twenty-three black pieces, and three red pieces. The black and white pieces belong to the players; the red pieces are called control pieces. To better understand the function of these pieces, envision the board as a graph: each hexagon is a node, and a shared border between two hexagons is an edge. After each move, a crawl of the graph is performed from the position of each control piece (if two control pieces are in one stack, they are considered one control piece). Any stacks which were not visited on the crawl are removed from the board.

A DVONN game is played in three discrete phases: the placement phase, the movement phase, and the end game. In the following sections, each of these turn-taking schemes is described.

3.1 Placement Phase

In this phase, the board starts out empty; the white player is given two red pieces and twenty-three white pieces, and the black player is given the remaining red piece and the black pieces. The players take turns placing a tile on an empty space on the board, being obligated to place the red pieces first. This process continues until the board is filled.

3.2 Movement Phase

For the movement phase, the white player begins again. Each player takes a turn moving a legal piece to a legal spot on the board, per the definitions of a legal move presented above. The bulk of the game takes place in this phase. The general strategy is to capture your opponent's pieces while maintaining your own by staying near a control tower and keeping your own pieces in a group.

3.3 End Game

Nearing the end, there may be no legal moves for a player to make. In this case, that player is skipped and the next player may make a move. This process continues until neither player is able to make a legal move; a player is obligated to make a move if he is able. Figure 2 shows an example of a game which is now terminated.

Figure 2: The game is over because neither White nor Black can move his pieces.

4 The Minimax Algorithm

Algorithm 1 Naïve Minimax
Require: node, depth ≥ 0
Ensure: α is the maximum attainable score
  if depth = 0 or node.isTerminal() then
    return node.rank()
  if node.isMaxPlayer() then
    α ← -∞
  else
    α ← +∞
  child ← node.firstChild()
  while child ≠ nil do
    score ← minimax(child, depth - 1)
    if node.isMaxPlayer() then
      α ← max(α, score)
    else
      α ← min(α, score)
    child ← child.nextSibling()
  end while
  return α

The Minimax algorithm is the basis for the algorithm that the implementation uses. Its basic function is to minimize the possible loss that can be incurred in a decision-making tree. This is done by applying a heuristic goodness metric to states in our game tree.
The Minimax algorithm seeks out decisions that maximize this heuristic. A pure minimax search is fixed ply. Given that in a DVONN game there can be many possible pieces to move, each in possibly six different directions, our game tree has a very high branching factor. Thus, we are interested in the various kinds of improvements we can build upon the Minimax algorithm. For a given game with branching factor b at each possible state, and a ply depth of d, the running time of the naïve minimax is O(b^d), since all possible branches are explored.
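As a concrete illustration of Algorithm 1, the fixed-ply search can be sketched as follows. Python is used here purely for illustration (the paper's implementation is in Common Lisp), and the Node class is a hypothetical stand-in for a game state; a real cutoff at a non-terminal state would evaluate the ranking heuristic instead of a stored rank.

```python
class Node:
    """Hypothetical game-tree node: a leaf carries a rank, an internal
    node carries children; `maximizing` marks whose turn it is."""
    def __init__(self, rank=None, children=(), maximizing=True):
        self.rank = rank
        self.children = list(children)
        self.maximizing = maximizing

def minimax(node, depth):
    """Fixed-ply minimax as in Algorithm 1: every branch is explored,
    giving the O(b^d) running time noted above."""
    if depth == 0 or not node.children:
        return node.rank
    scores = [minimax(child, depth - 1) for child in node.children]
    return max(scores) if node.maximizing else min(scores)
```

For example, a maximizing root whose two minimizing children lead to leaves ranked (3, 5) and (2, 9) yields min values 3 and 2, so the root's value is 3.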
4.1 Alpha-beta Pruning

Some moves in a Minimax search do not need to be completely evaluated. If, while searching a node's children, we encounter a move which is worse than one of the moves encountered in the ancestor nodes or siblings, we can cease to evaluate the remaining children, because the rest will have no effect on the outcome of the Minimax algorithm. Alpha-beta pruning is highly desirable for a Minimax search since it improves the running time on average while still returning the same solution. The only necessary change is the addition of α and β, the best scores found so far for the maximizing and minimizing players respectively. In the worst case, no nodes are pruned and we are left with a running time which is exactly the same as the simple Minimax algorithm, O(b^d). In the best case, all of the first player's moves must be explored, but only one of the second player's moves must be examined to refute all the rest, with this pattern repeating down the game tree, for a final running time of O(b^(d/2)).

4.2 Quiescence

A fixed-ply search strategy does not work as well as one would like: it assumes that all moves are equally worth exploring, which is almost never how humans operate. For example, in a game of Chess, a human would not seriously consider a move which certainly allows his opponent to deliver checkmate on the next move. For the case of DVONN, the quiescence of a move (more plainly, how uninteresting it is) helps the IA make a move within reasonable time. At the beginning of a DVONN game there are many possible moves and they are almost all identically beneficial, so there is not much use in exploring all of them to great depth. When the IA finds a move which is interesting enough, it can explore that branch in more depth than it would the others.
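The pruning rule just described can be sketched in Python (again for illustration only; the paper's implementation is in Common Lisp). Node is the same hypothetical tree interface as before: it returns the same value as plain minimax, but abandons a node's remaining children as soon as β ≤ α.

```python
import math

class Node:
    """Hypothetical tree node: leaves carry a rank, internal nodes children."""
    def __init__(self, rank=None, children=(), maximizing=True):
        self.rank = rank
        self.children = list(children)
        self.maximizing = maximizing

def alphabeta(node, depth, alpha=-math.inf, beta=math.inf):
    """Alpha-beta search: once beta <= alpha, no rational opponent would
    allow play to reach the remaining siblings, so they are skipped."""
    if depth == 0 or not node.children:
        return node.rank
    if node.maximizing:
        for child in node.children:
            alpha = max(alpha, alphabeta(child, depth - 1, alpha, beta))
            if beta <= alpha:
                break  # prune remaining siblings
        return alpha
    for child in node.children:
        beta = min(beta, alphabeta(child, depth - 1, alpha, beta))
        if beta <= alpha:
            break  # prune remaining siblings
    return beta
```

On the earlier example tree, once the first minimizing child has established α = 3, the second child is abandoned as soon as one of its leaves scores 2 ≤ α, without examining its remaining leaf.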
Algorithm 2 Alpha-Beta Pruning
Require: node, depth ≥ 0
Ensure: α is the maximum attainable score
  if depth = 0 or node.isTerminal() then
    return node.rank()
  child ← node.firstChild()
  while child ≠ nil do
    score ← search(child, α, β, depth - 1)
    if node.isMaxPlayer() then
      α ← max(α, score)
    else
      β ← min(β, score)
    if β ≤ α then
      break
    child ← child.nextSibling()
  end while
  if node.isMaxPlayer() then
    return α
  else
    return β

Algorithm 3 Quiescence
Require: parent, child, depth ≥ 0
  if |parent.rank() - child.rank()| ≥ ɛ then
    return search(child, α, β, depth)
  else
    return search(child, α, β, depth - 1)
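Algorithm 3's depth test might be folded into a search routine as sketched below (Python for illustration; the paper's implementation is in Common Lisp). EPSILON is a hypothetical value for the ɛ parameter, and `search` is assumed to be an alpha-beta routine with Algorithm 2's signature.

```python
EPSILON = 3  # hypothetical threshold; the paper leaves the value of ɛ a parameter

def quiescent_step(parent_rank, child_rank, child, alpha, beta, depth, search):
    """Algorithm 3: a 'noisy' move (rank swing of at least EPSILON between
    parent and child) is searched without a depth charge; a quiet move
    pays one ply. `search` is an alpha-beta routine like Algorithm 2's."""
    if abs(parent_rank - child_rank) >= EPSILON:
        return search(child, alpha, beta, depth)      # noisy: free ply
    return search(child, alpha, beta, depth - 1)      # quiet: charge a ply
```

In practice one would also cap the total number of extensions, since a long chain of noisy moves could otherwise keep the recursion from ever running out of depth.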
There must be a heuristic for determining how interesting a move is. When a move is more interesting, we say it is noisy, and otherwise quiet. When a noisy node is found, we should explore it in more depth. Algorithm 3 shows how quiescence is used in this implementation, with the appropriate changes made to Algorithm 2 to call it as a subroutine. For this implementation we define a parameter, ɛ, which is a threshold on the change in rank between the parent state and the child state. If the change exceeds the threshold, we do not charge a depth penalty for the child node.

5 Implementation

An IA which employs the above algorithms was implemented in Common Lisp. For the ranking function, a simple score difference between the player and the opponent is used. For noisiness, a simple threshold check is used between the ranks of the parent and child states. The final algorithm used is very close to Algorithm 3 as presented in this paper. This solution implements part of an IA and the game semantics for DVONN; the tile-placing phase of the game was omitted to allow more focus on the move-making phase. An example of two instances of this IA facing each other may be found in Appendix A. The implementation also supports playing with a human player, albeit completely in text mode. For information on executing this implementation, please see the README file in the implementation directory; it contains the necessary information to play a match against the IA.

6 Experiment

In order to evaluate the effectiveness of the implementation, several different tests were designed around two criteria:

1. The IA performs better than a random player.
2. Quiescence search performs better with lower thresholds.

It should be noted that we are not especially interested in timing performance, but rather in the quality of the move selection.
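The score-difference ranking function described in Section 5 is essentially one line. A minimal sketch, assuming a hypothetical mapping from player name to that player's current DVONN score (pieces in stacks the player controls):

```python
def rank(scores, player, opponent):
    """Score-difference ranking heuristic: positive values favour `player`.
    `scores` is a hypothetical mapping, e.g. {"black": 24, "white": 22}."""
    return scores[player] - scores[opponent]
```

The noisiness test is then just a comparison of this value between a parent state and a child state against the threshold ɛ.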
To test these criteria, several different IA players were pitted against one another, varying the fixed ply and the quiescence threshold for each in order to observe the result. Each game was repeated for 40 trials (each time with a randomized board), with the exception of the control case of two random players, which was repeated 120 times due to the cheapness of the computation. For each game, only the final score of each player was measured. An additional statistic, called Rank, is introduced: it is the difference between the black and white players' scores, and is also the metric by which the minimax algorithm ranks states for the black player. Each case is constructed such that the black player is the intended winner. In Table 1 a special notation is used to denote the two parameters of the players: (Ply, Threshold), where Ply is the fixed ply of the player and Threshold is the quiescence threshold. So, for example, a (0, ∞) player is a random player, since it has a fixed ply of zero and a positively infinite quiescence threshold. For evaluating criterion 1, we use the hypothesis H0: Rank > 0 on the (0, ∞) vs. (1, ∞) case. We can see that the 95% confidence interval is well above the zero mark, and therefore we can accept the hypothesis. For evaluating criterion 2, we use the hypothesis H0: Rank > 0 on the remaining cases (save for the control case of two random players). We find that the hypothesis holds in some cases, but is inconclusive in most of them. We also have the rather bizarre result that white is, with statistical significance, favored to win a (1, 8) vs. (1, 6) match. There is some trend present with a lower threshold, so we might find that a threshold of at least 3 provides
reasonable results for the time of computation, and also allows us to accept H0.

Table 1: A comparison of different IAs pitted against one another.

Game Type           Wins (W)  Wins (B)  Ties  Score (W)  Score (B)  Rank
(1, 8) vs. (1, 3)                             ±          ±          ± 1.24
(1, 8) vs. (1, 4)                             ±          ±          ± 1.13
(1, 8) vs. (1, 5)                             ±          ±          ± 1.42
(1, 8) vs. (1, 6)                             ±          ±          ± 1.28
(1, 8) vs. (1, 7)                             ±          ±          ± 1.31
(1, ∞) vs. (1, 4)                             ±          ±          ± 1.36
(0, ∞) vs. (1, ∞)                             ±          ±          ± 1.37
(0, ∞) vs. (0, ∞)                             ±          ±          ± 2.37

7 Conclusion

The minimax search algorithm, when given proper modifications, is a reasonable and (when parametrized carefully) inexpensive way to create a challenging computer opponent. At the very least, it is better than a random opponent, and with the proper adjustment of the quiescence threshold, it can perform better than a plain fixed-ply search. The rules of DVONN do not prohibit the use of standard game-playing IA techniques, and the minimax algorithm can work quite well for it. It is difficult to tell how well this algorithm works against humans since, for the bulk of the timeline of this project, the minimax algorithm would errantly return random moves (and still beat its human opponents!); future work for this IA might therefore be a more formal study of its play against humans. Also, this implementation uses a ranking heuristic which is not very sophisticated and does not fully capture the more complex strategies of DVONN. For example, a seasoned player might capture a control point and move it away from a reservoir of his opponent's pieces so that they will be removed from the board, or keep his pieces grouped together so that he has more control over whether his own are removed. However, the chosen ranking function does not support such strategy, as it seeks only to maximize its score. Therefore, another study could be conducted to test different ranking strategies to see which perform better than others.
A Sample Game

This section presents a sample game between a 1-ply quiescence-enabled player (black) and a fixed 1-ply player (white), in an attempt to demonstrate the difference which quiescence can make. Pay particular attention to moves 18 and on, where black ultimately takes a valuable piece which white leaves vulnerable.

A.1 White Move

White: 23 Black: 23
White moves (7 4) E.
Figure 3: Bar chart of scores and ranks with associated confidence intervals.

A.2 Black Move

White: 24 Black: 22
Black moves (9 4) W.

A.3 White Move

White: 22 Black: 24
White moves (8 3) SW.

A.4 Black Move

White: 25
Black moves (4 4) E.
A.5 White Move

White: 24 Black: 22
White moves (6 4) W.

A.6 Black Move

White: 26 Black: 20
Black moves (4 3) SE.

A.7 White Move

White: 23 Black: 23
White moves (6 3) NE.

A.8 Black Move

White: 24 Black: 22
Black moves (5 4) NW.

A.9 White Move

White: 23 Black: 23
White moves (4 0) W.

A.10 Black Move

White: 28 Black: 18
Black moves (2 0) E.
A.11 White Move

White: 22 Black: 24
White moves (3 1) NW.

A.12 Black Move

White: 29 Black: 17
Black moves (3 3) NE.

A.13 White Move

White: 28 Black: 18
White moves (5 2) W.

A.14 Black Move

White: 30 Black: 16
Black moves (0 3) NW.

A.15 White Move

White: 29 Black: 17
White moves (8 2) E.

A.16 Black Move

White: 30 Black: 16
Black moves (9 3) NW.
A.17 White Move

White: 28 Black: 18
White moves (8 1) SE.

A.18 Black Move

White: 31 Black: 15
Black moves (10 2) W.

A.19 White Move

White: 27 Black: 19
White moves (7 2) E.

A.20 Black Move

White: 27 Black: 14
Black moves (3 2) NW.

A.21 White Move

White: 26 Black: 15
White moves (9 2) W.

A.22 Black Move

White: 27 Black: 14
Black moves (0 2) E.
A.23 White Move

White: 19 Black: 22
White moves (4 2) W.

A.24 Black Move

White: 20
Black moves (2 1) E.

A.25 White Move

White: 12
White moves (5 1) W.

A.26 Black Move

White: 15 Black: 18
Black moves (1 1) SW.

A.27 White Move

White: 11 Black: 22
White moves (7 1) NE.

A.28 Black Move

White: 12
Black moves (9 0) W.
A.29 White Move

White: 10 Black: 23
White moves (7 0) SW.

A.30 Black Move

White: 10 Black: 20
Black moves (1 2) E.

A.31 White Move

White: 8 Black: 20
White moves (5 0) E.

A.32 Black Move

White: 4 Black: 20
Black moves (3 4) W.

A.33 White Move

White: 4
White moves (6 1) SW.

A.34 Black Move

White: 3
Black moves (1 4) E.
A.35 White Move

White: 3
White moves (5 3) W.

A.36 Black Move

White: 4 Black: 14
Black moves (1 3) NE.

A.37 White Move

White: 4 Black: 14
White cannot move.

A.38 Black Move

White: 4 Black: 14

References

[1] Matthias Bodenstein. Dvonner. Accessed 19 November.
[2] Murray Campbell, A. Joseph Hoane Jr., and Feng-hsiung Hsu. Deep Blue. Artificial Intelligence, 134(1-2):57-83, 2002.
[3] FatPhil. Rororo the bot. Accessed 19 November.
[4] Jan. Jan's program. Accessed 19 November.
[5] C.-S. Lee, M.-H. Wang, G. Chaslot, J.-B. Hoock, A. Rimmel, O. Teytaud, S.-R. Tsai, S.-C. Hsu, and T.-P. Hong. The computational intelligence of MoGo revealed in Taiwan's computer Go tournaments. IEEE Transactions on Computational Intelligence and AI in Games, 1(1):73-89, 2009.
[6] John McCarthy. What is artificial intelligence? Accessed 18 November.
[7] Martin Trautmann. Holtz. Accessed 19 November.
[8] Nivo Zero. ddvonn. Accessed 19 November.
More informationgame tree complete all possible moves
Game Trees Game Tree A game tree is a tree the nodes of which are positions in a game and edges are moves. The complete game tree for a game is the game tree starting at the initial position and containing
More informationGames CSE 473. Kasparov Vs. Deep Junior August 2, 2003 Match ends in a 3 / 3 tie!
Games CSE 473 Kasparov Vs. Deep Junior August 2, 2003 Match ends in a 3 / 3 tie! Games in AI In AI, games usually refers to deteristic, turntaking, two-player, zero-sum games of perfect information Deteristic:
More informationCOMP219: Artificial Intelligence. Lecture 13: Game Playing
CMP219: Artificial Intelligence Lecture 13: Game Playing 1 verview Last time Search with partial/no observations Belief states Incremental belief state search Determinism vs non-determinism Today We will
More informationToday. Types of Game. Games and Search 1/18/2010. COMP210: Artificial Intelligence. Lecture 10. Game playing
COMP10: Artificial Intelligence Lecture 10. Game playing Trevor Bench-Capon Room 15, Ashton Building Today We will look at how search can be applied to playing games Types of Games Perfect play minimax
More informationCS 4700: Artificial Intelligence
CS 4700: Foundations of Artificial Intelligence Fall 2017 Instructor: Prof. Haym Hirsh Lecture 10 Today Adversarial search (R&N Ch 5) Tuesday, March 7 Knowledge Representation and Reasoning (R&N Ch 7)
More informationAI Approaches to Ultimate Tic-Tac-Toe
AI Approaches to Ultimate Tic-Tac-Toe Eytan Lifshitz CS Department Hebrew University of Jerusalem, Israel David Tsurel CS Department Hebrew University of Jerusalem, Israel I. INTRODUCTION This report is
More informationGame playing. Chapter 6. Chapter 6 1
Game playing Chapter 6 Chapter 6 1 Outline Games Perfect play minimax decisions α β pruning Resource limits and approximate evaluation Games of chance Games of imperfect information Chapter 6 2 Games vs.
More informationLecture 14. Questions? Friday, February 10 CS 430 Artificial Intelligence - Lecture 14 1
Lecture 14 Questions? Friday, February 10 CS 430 Artificial Intelligence - Lecture 14 1 Outline Chapter 5 - Adversarial Search Alpha-Beta Pruning Imperfect Real-Time Decisions Stochastic Games Friday,
More informationPlaying Games. Henry Z. Lo. June 23, We consider writing AI to play games with the following properties:
Playing Games Henry Z. Lo June 23, 2014 1 Games We consider writing AI to play games with the following properties: Two players. Determinism: no chance is involved; game state based purely on decisions
More informationCS 188: Artificial Intelligence Spring 2007
CS 188: Artificial Intelligence Spring 2007 Lecture 7: CSP-II and Adversarial Search 2/6/2007 Srini Narayanan ICSI and UC Berkeley Many slides over the course adapted from Dan Klein, Stuart Russell or
More informationCSE 473: Artificial Intelligence. Outline
CSE 473: Artificial Intelligence Adversarial Search Dan Weld Based on slides from Dan Klein, Stuart Russell, Pieter Abbeel, Andrew Moore and Luke Zettlemoyer (best illustrations from ai.berkeley.edu) 1
More informationGame Playing: Adversarial Search. Chapter 5
Game Playing: Adversarial Search Chapter 5 Outline Games Perfect play minimax search α β pruning Resource limits and approximate evaluation Games of chance Games of imperfect information Games vs. Search
More informationCS188 Spring 2010 Section 3: Game Trees
CS188 Spring 2010 Section 3: Game Trees 1 Warm-Up: Column-Row You have a 3x3 matrix of values like the one below. In a somewhat boring game, player A first selects a row, and then player B selects a column.
More informationAnnouncements. Homework 1 solutions posted. Test in 2 weeks (27 th ) -Covers up to and including HW2 (informed search)
Minimax (Ch. 5-5.3) Announcements Homework 1 solutions posted Test in 2 weeks (27 th ) -Covers up to and including HW2 (informed search) Single-agent So far we have look at how a single agent can search
More informationFoundations of Artificial Intelligence
Foundations of Artificial Intelligence 6. Board Games Search Strategies for Games, Games with Chance, State of the Art Joschka Boedecker and Wolfram Burgard and Frank Hutter and Bernhard Nebel Albert-Ludwigs-Universität
More informationAdversarial Search Lecture 7
Lecture 7 How can we use search to plan ahead when other agents are planning against us? 1 Agenda Games: context, history Searching via Minimax Scaling α β pruning Depth-limiting Evaluation functions Handling
More informationCS 2710 Foundations of AI. Lecture 9. Adversarial search. CS 2710 Foundations of AI. Game search
CS 2710 Foundations of AI Lecture 9 Adversarial search Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square CS 2710 Foundations of AI Game search Game-playing programs developed by AI researchers since
More informationCS 387: GAME AI BOARD GAMES. 5/24/2016 Instructor: Santiago Ontañón
CS 387: GAME AI BOARD GAMES 5/24/2016 Instructor: Santiago Ontañón santi@cs.drexel.edu Class website: https://www.cs.drexel.edu/~santi/teaching/2016/cs387/intro.html Reminders Check BBVista site for the
More informationAdversarial Search. CMPSCI 383 September 29, 2011
Adversarial Search CMPSCI 383 September 29, 2011 1 Why are games interesting to AI? Simple to represent and reason about Must consider the moves of an adversary Time constraints Russell & Norvig say: Games,
More informationCSC 380 Final Presentation. Connect 4 David Alligood, Scott Swiger, Jo Van Voorhis
CSC 380 Final Presentation Connect 4 David Alligood, Scott Swiger, Jo Van Voorhis Intro Connect 4 is a zero-sum game, which means one party wins everything or both parties win nothing; there is no mutual
More informationAdversarial Search. Hal Daumé III. Computer Science University of Maryland CS 421: Introduction to Artificial Intelligence 9 Feb 2012
1 Hal Daumé III (me@hal3.name) Adversarial Search Hal Daumé III Computer Science University of Maryland me@hal3.name CS 421: Introduction to Artificial Intelligence 9 Feb 2012 Many slides courtesy of Dan
More informationGame playing. Outline
Game playing Chapter 6, Sections 1 8 CS 480 Outline Perfect play Resource limits α β pruning Games of chance Games of imperfect information Games vs. search problems Unpredictable opponent solution is
More informationExperiments on Alternatives to Minimax
Experiments on Alternatives to Minimax Dana Nau University of Maryland Paul Purdom Indiana University April 23, 1993 Chun-Hung Tzeng Ball State University Abstract In the field of Artificial Intelligence,
More informationADVERSARIAL SEARCH. Today. Reading. Goals. AIMA Chapter , 5.7,5.8
ADVERSARIAL SEARCH Today Reading AIMA Chapter 5.1-5.5, 5.7,5.8 Goals Introduce adversarial games Minimax as an optimal strategy Alpha-beta pruning (Real-time decisions) 1 Questions to ask Were there any
More informationGame playing. Chapter 5. Chapter 5 1
Game playing Chapter 5 Chapter 5 1 Outline Games Perfect play minimax decisions α β pruning Resource limits and approximate evaluation Games of chance Games of imperfect information Chapter 5 2 Types of
More informationCOMP3211 Project. Artificial Intelligence for Tron game. Group 7. Chiu Ka Wa ( ) Chun Wai Wong ( ) Ku Chun Kit ( )
COMP3211 Project Artificial Intelligence for Tron game Group 7 Chiu Ka Wa (20369737) Chun Wai Wong (20265022) Ku Chun Kit (20123470) Abstract Tron is an old and popular game based on a movie of the same
More informationGame Playing AI Class 8 Ch , 5.4.1, 5.5
Game Playing AI Class Ch. 5.-5., 5.4., 5.5 Bookkeeping HW Due 0/, :59pm Remaining CSP questions? Cynthia Matuszek CMSC 6 Based on slides by Marie desjardin, Francisco Iacobelli Today s Class Clear criteria
More informationGame-Playing & Adversarial Search Alpha-Beta Pruning, etc.
Game-Playing & Adversarial Search Alpha-Beta Pruning, etc. First Lecture Today (Tue 12 Jul) Read Chapter 5.1, 5.2, 5.4 Second Lecture Today (Tue 12 Jul) Read Chapter 5.3 (optional: 5.5+) Next Lecture (Thu
More informationSearch Depth. 8. Search Depth. Investing. Investing in Search. Jonathan Schaeffer
Search Depth 8. Search Depth Jonathan Schaeffer jonathan@cs.ualberta.ca www.cs.ualberta.ca/~jonathan So far, we have always assumed that all searches are to a fixed depth Nice properties in that the search
More informationCS 387/680: GAME AI BOARD GAMES
CS 387/680: GAME AI BOARD GAMES 6/2/2014 Instructor: Santiago Ontañón santi@cs.drexel.edu TA: Alberto Uriarte office hours: Tuesday 4-6pm, Cyber Learning Center Class website: https://www.cs.drexel.edu/~santi/teaching/2014/cs387-680/intro.html
More informationTheory and Practice of Artificial Intelligence
Theory and Practice of Artificial Intelligence Games Daniel Polani School of Computer Science University of Hertfordshire March 9, 2017 All rights reserved. Permission is granted to copy and distribute
More informationGame playing. Chapter 5, Sections 1 6
Game playing Chapter 5, Sections 1 6 Artificial Intelligence, spring 2013, Peter Ljunglöf; based on AIMA Slides c Stuart Russel and Peter Norvig, 2004 Chapter 5, Sections 1 6 1 Outline Games Perfect play
More information2 person perfect information
Why Study Games? Games offer: Intellectual Engagement Abstraction Representability Performance Measure Not all games are suitable for AI research. We will restrict ourselves to 2 person perfect information
More information46.1 Introduction. Foundations of Artificial Intelligence Introduction MCTS in AlphaGo Neural Networks. 46.
Foundations of Artificial Intelligence May 30, 2016 46. AlphaGo and Outlook Foundations of Artificial Intelligence 46. AlphaGo and Outlook Thomas Keller Universität Basel May 30, 2016 46.1 Introduction
More informationArtificial Intelligence Lecture 3
Artificial Intelligence Lecture 3 The problem Depth first Not optimal Uses O(n) space Optimal Uses O(B n ) space Can we combine the advantages of both approaches? 2 Iterative deepening (IDA) Let M be a
More informationComputer Science and Software Engineering University of Wisconsin - Platteville. 4. Game Play. CS 3030 Lecture Notes Yan Shi UW-Platteville
Computer Science and Software Engineering University of Wisconsin - Platteville 4. Game Play CS 3030 Lecture Notes Yan Shi UW-Platteville Read: Textbook Chapter 6 What kind of games? 2-player games Zero-sum
More informationCMSC 671 Project Report- Google AI Challenge: Planet Wars
1. Introduction Purpose The purpose of the project is to apply relevant AI techniques learned during the course with a view to develop an intelligent game playing bot for the game of Planet Wars. Planet
More informationCS 188: Artificial Intelligence Spring Announcements
CS 188: Artificial Intelligence Spring 2011 Lecture 7: Minimax and Alpha-Beta Search 2/9/2011 Pieter Abbeel UC Berkeley Many slides adapted from Dan Klein 1 Announcements W1 out and due Monday 4:59pm P2
More informationGame Playing Beyond Minimax. Game Playing Summary So Far. Game Playing Improving Efficiency. Game Playing Minimax using DFS.
Game Playing Summary So Far Game tree describes the possible sequences of play is a graph if we merge together identical states Minimax: utility values assigned to the leaves Values backed up the tree
More informationCS 380: ARTIFICIAL INTELLIGENCE
CS 380: ARTIFICIAL INTELLIGENCE ADVERSARIAL SEARCH 10/23/2013 Santiago Ontañón santi@cs.drexel.edu https://www.cs.drexel.edu/~santi/teaching/2013/cs380/intro.html Recall: Problem Solving Idea: represent
More informationFoundations of AI. 5. Board Games. Search Strategies for Games, Games with Chance, State of the Art. Wolfram Burgard and Luc De Raedt SA-1
Foundations of AI 5. Board Games Search Strategies for Games, Games with Chance, State of the Art Wolfram Burgard and Luc De Raedt SA-1 Contents Board Games Minimax Search Alpha-Beta Search Games with
More informationLecture 5: Game Playing (Adversarial Search)
Lecture 5: Game Playing (Adversarial Search) CS 580 (001) - Spring 2018 Amarda Shehu Department of Computer Science George Mason University, Fairfax, VA, USA February 21, 2018 Amarda Shehu (580) 1 1 Outline
More informationIntuition Mini-Max 2
Games Today Saying Deep Blue doesn t really think about chess is like saying an airplane doesn t really fly because it doesn t flap its wings. Drew McDermott I could feel I could smell a new kind of intelligence
More informationFoundations of AI. 6. Board Games. Search Strategies for Games, Games with Chance, State of the Art
Foundations of AI 6. Board Games Search Strategies for Games, Games with Chance, State of the Art Wolfram Burgard, Andreas Karwath, Bernhard Nebel, and Martin Riedmiller SA-1 Contents Board Games Minimax
More informationFoundations of Artificial Intelligence
Foundations of Artificial Intelligence 6. Board Games Search Strategies for Games, Games with Chance, State of the Art Joschka Boedecker and Wolfram Burgard and Bernhard Nebel Albert-Ludwigs-Universität
More informationArtificial Intelligence. Topic 5. Game playing
Artificial Intelligence Topic 5 Game playing broadening our world view dealing with incompleteness why play games? perfect decisions the Minimax algorithm dealing with resource limits evaluation functions
More informationContents. Foundations of Artificial Intelligence. Problems. Why Board Games?
Contents Foundations of Artificial Intelligence 6. Board Games Search Strategies for Games, Games with Chance, State of the Art Wolfram Burgard, Bernhard Nebel, and Martin Riedmiller Albert-Ludwigs-Universität
More informationFoundations of Artificial Intelligence Introduction State of the Art Summary. classification: Board Games: Overview
Foundations of Artificial Intelligence May 14, 2018 40. Board Games: Introduction and State of the Art Foundations of Artificial Intelligence 40. Board Games: Introduction and State of the Art 40.1 Introduction
More information1 Introduction. 1.1 Game play. CSC 261 Lab 4: Adversarial Search Fall Assigned: Tuesday 24 September 2013
CSC 261 Lab 4: Adversarial Search Fall 2013 Assigned: Tuesday 24 September 2013 Due: Monday 30 September 2011, 11:59 p.m. Objectives: Understand adversarial search implementations Explore performance implications
More informationIntroduction to Artificial Intelligence CS 151 Programming Assignment 2 Mancala!! Due (in dropbox) Tuesday, September 23, 9:34am
Introduction to Artificial Intelligence CS 151 Programming Assignment 2 Mancala!! Due (in dropbox) Tuesday, September 23, 9:34am The purpose of this assignment is to program some of the search algorithms
More informationGames and Adversarial Search II
Games and Adversarial Search II Alpha-Beta Pruning (AIMA 5.3) Some slides adapted from Richard Lathrop, USC/ISI, CS 271 Review: The Minimax Rule Idea: Make the best move for MAX assuming that MIN always
More information