Red Shadow. FPGA Trax Design Competition


Design Competition placing: Red Shadow (Qing Lu, Bruce Chiu-Wing Sham, Francis C.M. Lau), for coming third equal in the FPGA Trax Design Competition, International Conference on Field Programmable Technology, 7-9 December 2015. Date: 9 December 2015. Signed by the General Chair and the Design Competition Chair.

An Architecture-Algorithm Co-Design of Artificial Intelligence for Trax Player

Qing Lu, Chiu-Wing Sham and Francis C. M. Lau
Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong
Email: qing.v.lu@polyu.edu.hk, bruce.sham@polyu.edu.hk, encmlau@polyu.edu.hk

Abstract: Trax is a two-player game with simple rules but strategic depth. This article proposes an FPGA-based artificial intelligence for its endless version, called Supertrax. An implementable algorithm is developed by combining several strategies and techniques operating at various levels of software and hardware. These methods build on heuristics, multi-level pattern recognition, Monte-Carlo Tree Search, and path-based scheduling. A dedicated architecture is also described to accommodate this algorithm. The proposal contributes a novel approach to this subject, and its performance will be demonstrated in the design competition held at FPT '15.

I. INTRODUCTION

Trax is a modern strategic game for two players who move alternately. At each turn, one player places a tile of either curves or straights onto a vacant position on the board. Each tile carries one section of track in each of two colors, typically black and white, one representing either player. All tiles must be placed edge to edge so that tracks of the same color join up. The game is won when either side has a path forming a loop or a line. In almost all tournament games, a variation called Supertrax is played, which sets no limit on the board size and hence never ends in a draw.

The ostensible simplicity of the rules nonetheless endows Trax with tremendous complexity. The bridge over this gap is the forced-play rule, which allows the players to place multiple tiles in one turn. As a result, the depth of the board state is not in line with the number of turns taken, and a structural representation for it can hardly be developed by intuition.
What is more challenging, there are no definitive rubrics for evaluating states, since the potential to win can rarely be assessed other than perceptually. This work describes an artificial intelligence (AI) for Trax in which the algorithm and the architecture are co-designed. The organization is as follows. Section II develops the basic strategies and algorithms according to the characteristics of the game from a software perspective. Section III then proposes an edge-representation method that facilitates an efficient implementation of these algorithms. Based on this method, Section IV elaborates how the algorithms are mapped onto a hardware architecture at a block-to-block level. Finally, Section V concludes the work so far and discusses issues still open for investigation. Some of the terminology and strategies are quoted or developed from [1]. Given the lack of precedent work, our idea offers a new approach to this topic. Furthermore, the design can be implemented on an FPGA and entered in the design competition held by FPT '15.

II. STRATEGY AND ALGORITHM

A. Move Priority

Given a board state, the first priority goes to moves that can complete a win. If no such move is available, moves that prevent an immediate loss are the most valuable. This intuitive check-list continues until neither side has an identifiable threat which, once activated, could lead to a win in a finite number of steps. Accordingly, the conditions under which decisions are made fall into three categories: attacking, defending and waiting. Fig. 1 shows the hierarchy of decision priorities based on the existence of activated threats in play. The potential to win or lose gives a heuristic way of evaluating moves.
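As a software sketch of this check-list, candidate moves can be ranked by the earliest threat they activate or block; the `Move` structure and its stage fields below are our own illustrative assumptions, not part of the actual design:

```python
# Illustrative sketch of the move-priority rule of Section II-A.
# A move is ranked by the earliest activated threat it creates or blocks:
# 1-stage wins first, then 1-stage loss prevention, then 2-stage wins, etc.
# Moves with no identifiable threat fall into the "waiting" category.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Move:
    win_stage: Optional[int]    # k if the move activates a k-stage win, else None
    loss_stage: Optional[int]   # k if it blocks a k-stage loss, else None
    waiting_score: float = 0.0  # heuristic score used for waiting moves

def priority_key(m: Move):
    """Smaller key = higher priority: alpha(1) > beta(1) > alpha(2) > ..."""
    if m.win_stage is not None:
        return (2 * m.win_stage - 2, 0.0)    # alpha(k) maps to 0, 2, 4, ...
    if m.loss_stage is not None:
        return (2 * m.loss_stage - 1, 0.0)   # beta(k) maps to 1, 3, 5, ...
    return (float("inf"), -m.waiting_score)  # waiting moves ranked by score

def best_move(moves):
    return min(moves, key=priority_key)
```

For example, a 1-stage win outranks blocking a 1-stage loss, which outranks a 2-stage win, with waiting moves last, matching the ordering of Fig. 1.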
If we assign a k-stage win a score α(k) and a k-stage loss a score β(k), the above priority list can be interpreted as the relations

α(1) > β(1) > α(2) > β(2) > α(3) > ⋯ (1)

where α(i) > 0 > β(j) for all i, j. The more attacking or defending patterns a player recognizes, the better its chances of securing a win or avoiding a loss.

Fig. 1: The priority list for making decisions (attacking: 1-, 2- and 3-stage wins; waiting; defending: 3-, 2- and 1-stage losses).

978-1-4673-9091-0/15/$31.00 © 2015 IEEE

B. Multi-Level Pattern Recognition

The attacking and defending patterns in a Trax game are distinctive in that their potential is highly conditional. Take an L attack, for example.

As illustrated in Fig. 2a, a 1×1 corner can potentially form a loop, but it is easily defused, so by itself it is not a win-threat. However, if another corner exists at the specific position that forms an L shape with it, the suggested move then makes a two-loop (Fig. 2b). Alternatively, a double path, as shown in Fig. 2c, can play the same role as the second corner. Nevertheless, there is a latent danger of the advantage being overturned if the black paths are connected at the left side, as in Fig. 2d, because the forced tiles would give Black an immediate opportunity to complete a loop. In this case the pattern contains both a 2-stage win and a 1-stage loss, and it should therefore be tagged with a very negative score.

Fig. 2: Example of hierarchical pattern combinations: (a) a 1×1 corner, (b) two 1×1 corners, (c) a 1×1 corner and a double path, and (d) a 1×1 corner and a double path with a large opponent corner.

This characteristic allows a hierarchical order of processing in pattern recognition, so the pattern space is decomposed and searched step by step. In the above example, we start by noticing a 1×1 corner. The related positions are then checked to see whether there is a second corner or a side-by-side double path to make an L. In the latter case, the connection of the black paths is of concern and is studied. Depending on the result, particular further steps are taken. This checking procedure can be represented by a tree structure, and it is indeed a fragment of the holistic pattern-recognition algorithm (Fig. 3).

Fig. 3: A fragment of the pattern-search tree (nodes: check the 2×1 tiles in the positions that form an L; 1×1 corner and vacancy; side-by-side double path; connected / not connected; check if the opponent paths are connected; 1-stage loss).
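The checking procedure above can be sketched as a small tree of predicates; the board interface used here (`has_corner`, `has_double_path`, `paths_connected`) is hypothetical and stands in for the real pattern recognizers:

```python
# Sketch of a fragment of the pattern-search tree (Fig. 3). Each check
# narrows the pattern and the leaves classify it, e.g. as a 2-stage win
# or a 1-stage loss. All board predicates here are hypothetical stand-ins.

def classify_L_pattern(board, corner_pos):
    """Classify a position already known to contain a 1x1 corner."""
    l_pos = board.l_position(corner_pos)      # where a second piece makes an L
    if board.has_corner(l_pos):
        return "2-stage win"                  # two corners: a two-loop (Fig. 2b)
    if board.has_double_path(l_pos):
        # A double path acts like the second corner (Fig. 2c), unless the
        # opponent's paths connect and the forced tiles finish their loop.
        if board.paths_connected(corner_pos, color="black"):
            return "1-stage loss"             # Fig. 2d: advantage overturned
        return "2-stage win"
    return "no threat"                        # a lone corner is easily defused

class StubBoard:
    """Minimal stand-in board, for illustration only."""
    def __init__(self, corner=False, double=False, connected=False):
        self.corner, self.double, self.connected = corner, double, connected
    def l_position(self, pos): return pos
    def has_corner(self, pos): return self.corner
    def has_double_path(self, pos): return self.double
    def paths_connected(self, pos, color): return self.connected
```

Arranging such checks as a tree is what lets the hardware recognizers of Section IV evaluate different branches in parallel.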
C. Monte-Carlo Tree Search

While threats with a small number of stages are identifiable, the patterns in the waiting region are much more difficult to recognize and evaluate because of their complexity. We therefore resort to the Monte Carlo method [2], which is commonly applied to approximate an optimal solution when deterministic results are unattainable. In particular, the variation called Monte-Carlo Tree Search (MCTS) is very useful in games whose moves follow a tree organization. With MCTS, a game AI can search much deeper in a given time than a full traversal can, at the cost of small errors. MCTS is suitable for evaluating the potential of waiting moves because such moves can hardly be resolved within just a few steps. Instead, we predefine their values by training, which proceeds by generating a random solution for each turn until an attacking or defending pattern is formed. The result is then used to calibrate each of the preceding solutions. The more samples are trained, the more accurately their strength is measured. By such training, we can define the waiting part of the priority list as closely as possible to the true potential to win or lose.

D. Path-Based Scheduling

Given the predefined rubrics for pattern evaluation, scheduling the solution search is still problematic because the space grows after every turn. Observably, the useful information always lies on the edge of the playing area, where the ends of every existing path are located. By contrast, the details of inner tiles are negligible as long as we know which two ends are connected by the same path. We therefore search the playing area along its paths, which brings two conspicuous benefits. First, every possible solution must extend some existing path, so the search is complete and efficient. Second, at each turn no more than one path can be added to the playing area, since forced tiles cannot introduce new paths.
This means that the search complexity is linearly bounded by the number of turns taken. Specifically, each path is investigated through the positions of its two ends. The distance between them clearly reveals the path's potential to join up inward into a loop or stretch outward into a line. The example given in Fig. 4 has two 14-section paths for the White player, which together indicate a double threat: one is a potential loop and the other a potential line. This fatal threat is easily identified by comparing the distance between the ends of these paths and examining the size of the playing area.
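A minimal sketch of this endpoint test, assuming each path is stored as its two end coordinates plus a section count; the thresholds are illustrative choices of ours, not the competition settings:

```python
# Sketch of the path-based threat test of Section II-D. Each path is kept
# only as its two end positions and its length in sections; the interior
# tiles are ignored. Thresholds below are illustrative assumptions.

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def loop_potential(path, max_gap=2):
    """A path may close inward into a loop if its two ends are near each other."""
    return manhattan(path["end1"], path["end2"]) <= max_gap

def line_potential(path, area_width):
    """A path may stretch outward into a line if its ends already span most
    of the playing area (a Trax line joins opposite edges of the area)."""
    return abs(path["end1"][0] - path["end2"][0]) >= area_width - 2

# Example: two long White paths, one almost closing, one almost spanning.
p_loop = {"end1": (3, 4), "end2": (4, 4), "sections": 14}
p_line = {"end1": (0, 2), "end2": (9, 5), "sections": 14}
```

Because only end positions are inspected, the per-turn work stays proportional to the number of paths, which matches the linear bound stated above.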

Fig. 4: The double threat formed by two 14-section paths (the loop ends and line ends are marked).

III. EDGE REPRESENTATION

Concisely, we represent the board by encoding the edges of its grid. Similar techniques have been used to handle the routing problem in VLSI design methodologies [3], [4]. Accordingly, all the sections of track are recorded by two-bit codes on the corresponding adjacent edges. In our representation method, the horizontal and vertical edges are denoted by X and Y, respectively, and positioned in two orthogonal coordinate systems. We assign two bits to each edge X/Y: the first bit, x1/y1, for the occupancy status and the second bit, x2/y2, for the color. Referring to Fig. 5, the four edges enclosing one tile are X(i, j) on the top, Y(j, i) on the left, X(i, j+1) on the bottom and Y(j, i+1) on the right. According to the rules, each edge X/Y(i, j) can join one of six proximate edges, namely X/Y(i, j−1), Y/X(j−1, i), Y/X(j−1, i+1) and X/Y(i, j+1), Y/X(j, i), Y/X(j, i+1). Note that these edges belong to two adjacent tile positions, with X(i, j) being the center edge shared by both positions.

Fig. 5: The edge-based representation system (X: horizontal edges; Y: vertical edges; X1/X2 and Y1/Y2 denote the first and second coordinates of edges X and Y).

We can then scan the possible solutions from the edges' perspective by regarding a new tile as equivalent to extending a path from the center edge to one of the six proximate edges that is untapped. The possibility of a path extension is therefore determined by the occupancy status of both related edges. For example, if X(i, j) can extend to Y(j, i), we must have x1(i, j) = 1 and y1(j, i) = 0. After either player launches a move, the forced play needs to be performed by simulation. For the same example, there are six symmetric cases in which X(i, j) is forced. One of them is when edge X(i, j−1) joins Y(j−1, i).
In this case, the following Boolean condition is true:

x1(i, j−1) · y1(j−1, i) = 1 and x2(i, j−1) = y2(j−1, i). (2)

Specially, an illegal move occurs when the three proximate edges enclosing one position are all occupied by the same color, so that the tracks there cannot join the others. We can easily detect it by finding

y1(j, i) · x1(i, j+1) · y1(j, i+1) = 1 and y2(j, i) = x2(i, j+1) = y2(j, i+1), (3)

or

y1(j−1, i+1) · x1(i, j−1) · y1(j−1, i) = 1 and y2(j−1, i+1) = x2(i, j−1) = y2(j−1, i). (4)

IV. ARCHITECTURE

The architecture that fulfills our algorithm is illustrated in Fig. 6. It mainly consists of the board simulator (BS), the solution analyzer (SA) and the main processor (MP). The strategy is programmed into the MP, which directs the operation of the BS and the SA. When the universal asynchronous receiver/transmitter (UART) receives a data package, the move translator decodes it into our representation and sends it to the MP. Under the MP's control, the BS updates its state as the data describe and gets ready for the scan of possible solutions. Following the path-based search scheduling, all paths are stored in memory by their end positions. After a path is retrieved from memory, each end in turn is taken as the center edge and its six proximate edges are examined, of which at least two are possible solutions. Next, the BS computes the prospective board state that would result from the candidate solution and gives a sample to the SA for evaluation. The SA analyzes the board state and computes a score for it. During the evaluation, the MP keeps adjusting the region of investigation when the playing area is too large. When a possible solution has been evaluated, the BS returns to its original state and the MP checks the next proximate edge until all 12 relevant edges have been tried. After that, the next path is read out and the above procedure repeated.
When all possible solutions have been scanned, or the time is up, the analysis stops and the best solution found so far is fed to the MP. The MP computes the updated state and rewrites the memory, followed by resetting all the other blocks. Finally, the solution is translated by the move translator (MT) and sent to the host computer through the UART.

A. Board Simulator

According to the representation system, the BS can be assembled from edge processors (EPs), each corresponding to one edge.
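As a behavioral sketch of the per-edge checks, the extension, forced-play and illegal-move conditions of Section III, Eqs. (2)-(4), can be written as bit predicates; the dictionary layout below is an illustrative assumption of ours, not the actual register organization:

```python
# Sketch of the edge encoding of Section III as bit predicates. Each edge
# carries an occupancy bit (x1/y1) and a color bit (x2/y2); unoccupied
# edges default to 0. The dict-based storage is illustrative only.

x1, x2 = {}, {}   # horizontal edges X(i, j): occupancy, color
y1, y2 = {}, {}   # vertical edges   Y(j, i): occupancy, color

def b(d, k):
    """Read a bit; unoccupied edges default to 0."""
    return d.get(k, 0)

def can_extend_X_to_Y(i, j):
    """Edge X(i, j) may extend to the untapped proximate edge Y(j, i)."""
    return b(x1, (i, j)) == 1 and b(y1, (j, i)) == 0

def forces_X(i, j):
    """One of the six symmetric cases forcing X(i, j), cf. Eq. (2):
    X(i, j-1) and Y(j-1, i) are both occupied, with the same color."""
    return (b(x1, (i, j - 1)) == 1 and b(y1, (j - 1, i)) == 1
            and b(x2, (i, j - 1)) == b(y2, (j - 1, i)))

def illegal_at(i, j):
    """Illegal move, cf. Eq. (3): three proximate edges enclosing one
    position are all occupied with the same color."""
    occ = b(y1, (j, i)) and b(x1, (i, j + 1)) and b(y1, (j, i + 1))
    same = b(y2, (j, i)) == b(x2, (i, j + 1)) == b(y2, (j, i + 1))
    return bool(occ) and same
```

In the hardware, each EP would evaluate such predicates locally from the bits held by its six proximate EPs.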

Fig. 6: The architecture overview (UART/RS232, move translator, main processor, path memory, board simulator, and a solution analyzer whose pattern recognizers cover 2-/3-stage wins and 1-/2-stage losses, together with a general rater, a score synthesizer and a solution register).

Each of the EPs stores the current state of its edge and computes the next. Following the relationship among the edges, every EP communicates with its six proximate EPs. For the four EPs enclosing one position, a switching node routes their connections. Fig. 7 shows how the BS is related to the virtual board and how the states are sampled. At each time, only a regional state is investigated and sampled to the SA. To move the region, the EPs perform a row-/column-wise shift operation. When the investigation region is right, the sampled state may match a predefined pattern. In Fig. 7, the five highlighted EPs match the pattern of an L-threat, which will be identified by the SA.

Fig. 7: Regional investigation of the board simulator and its implementation structure.

B. Solution Analyzer

The SA comprises the pattern recognizers (PRs), the general rater (GR), the score synthesizer (SS) and the solution register (SR). The PRs are arranged in a structure following the specification of the pattern-search tree, so the identification of different patterns can proceed in parallel. At the end, recognition results are output from the depth-k PRs, each indicating one principal pattern of threat. Meanwhile, the GR provides a summary of the potential to win or lose based on the training results stored in it. With the same architecture, the AI can be modified by loading different initial training results, so the intelligence of the player is easily improved. All the information concerning threats and potentials is combined into a score by the SS before being sent to the SR. The SR always keeps track of the best solution found so far and is replaced only when a higher score is generated.
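The SS/SR behavior can be sketched as follows; the threat weights are illustrative assumptions only, not the trained values used in the design:

```python
# Sketch of the score synthesizer and solution register of Section IV-B.
# Pattern-recognizer outputs and the general rater's potential estimate
# are combined into one score; the register keeps the best solution seen.
# All weights below are illustrative assumptions.

THREAT_SCORES = {            # cf. the priority list of Fig. 1
    "1-stage win": 1000, "2-stage win": 100, "3-stage win": 10,
    "3-stage loss": -10, "2-stage loss": -100, "1-stage loss": -1000,
}

def synthesize(threats, potential):
    """Combine recognized threats with the general rater's estimate."""
    return sum(THREAT_SCORES[t] for t in threats) + potential

class SolutionRegister:
    def __init__(self):
        self.best, self.best_score = None, float("-inf")
    def offer(self, solution, score):
        if score > self.best_score:   # replace only on a strictly higher score
            self.best, self.best_score = solution, score
```

A pattern that carries both a 2-stage win and a 1-stage loss, as in Fig. 2d, thus nets a very negative score, matching the tagging rule of Section II-B.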
V. CONCLUSION

In this report, an AI for Trax is designed with collaboratively developed algorithm and architecture. Overall, the design provides a flexible framework that accommodates adaptation of the player's intelligence. To strengthen this AI, however, several issues still need to be investigated. First, within a given stage limit, many patterns and possible variations have not yet been found, and an algorithm may be developed to help find them. Second, the rating formula should be refined to reflect the balance of all factors. Third, the implementation trade-offs affecting the AI's performance will be studied and optimized accordingly.

REFERENCES

[1] D. Bailey, Trax: Strategy for Beginners, 2nd ed. D.G. Bailey, 1997.
[2] Wikipedia, "Monte Carlo method," https://en.wikipedia.org/wiki/Monte_Carlo_method.
[3] J. Lou, S. Thakur, S. Krishnamoorthy, and H. S. Sheng, "Estimating routing congestion using probabilistic analysis," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 21, no. 1, pp. 32-41, 2002.
[4] C.-W. Sham, E. F. Young, and J. Lu, "Congestion prediction in early stages of physical design," ACM Transactions on Design Automation of Electronic Systems (TODAES), vol. 14, no. 1, p. 12, 2009.