2048: An Autonomous Solver


Final Project in Introduction to Artificial Intelligence

Omri Lifshitz, Ehud Rot, Ido Dan

ABSTRACT. Our goal in this project was to create an automatic solver for the well-known game 2048 and to analyze how different heuristics and search algorithms perform when applied to solving the game autonomously. In our work we compare the alpha-beta pruning and expectimax algorithms, as well as different heuristics, and see how they perform in solving the game.

1. PROBLEM DESCRIPTION

1.1 Background. The game 2048 is a simplistic yet enjoyable game. It is played on a 4x4 grid, where each cell can hold a single tile and each tile carries a number that is a power of two. The game starts with two tiles on the grid, each with the value 2 or 4. Each turn the player moves all of the tiles on the board in one of four directions (up, down, left or right), but only if moving the tiles actually changes the board. When two tiles with the same number are moved into each other, they merge into a new tile whose value is the sum of the two previous tiles (this happens only if the two tiles carry the same number). After each move a new tile is added in a random location on the grid (one of the unoccupied cells, chosen with equal probability); the new tile has the value 2 with probability 0.9 and the value 4 with probability 0.1. The goal of the game is to keep merging tiles until reaching the value 2048. The game ends when one of the following occurs:
- A score of 2048 is achieved.
- There are no more available moves: the player cannot make any move that changes the board and is stuck in the current state.

1.2 Game state. Each state in the game is represented by a Python dictionary mapping a (row, column) tuple to a tile object. To fully describe the game at any given point, all we need to know is the position of the tiles on the grid: this dictates all the possible moves, the places where the random tile can appear, and all the information needed to continue the game. We therefore represent the state of the game at any given time by describing the grid at that time.

2. IMPLEMENTATION

2.1 Gameplay. Our implementation of the game is based on an open-source Python version of 2048 that we found online [1]; the original game was created by Gabriele Cirulli.
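A minimal sketch of the state representation described in 1.2, including the clone operation and turn flag discussed next (the class and method names here are illustrative, not our exact code):

```python
import copy
import random

GRID_SIZE = 4

class GameState:
    """Sketch of the game state: a dict mapping (row, col) to a tile
    value, plus a Boolean saying whether the computer moves next."""
    def __init__(self, grid=None, computer_turn=False):
        self.grid = grid if grid is not None else {}
        self.computer_turn = computer_turn

    def empty_cells(self):
        return [(r, c) for r in range(GRID_SIZE) for c in range(GRID_SIZE)
                if (r, c) not in self.grid]

    def add_random_tile(self):
        """The computer's half of a turn: a 2 (p=0.9) or a 4 (p=0.1)
        placed in a uniformly random empty cell."""
        cell = random.choice(self.empty_cells())
        self.grid[cell] = 2 if random.random() < 0.9 else 4
        self.computer_turn = False

    def clone(self):
        """Deep copy, so search can look ahead without damaging
        the current state."""
        return GameState(copy.deepcopy(self.grid), self.computer_turn)

# A new game starts with two random tiles:
state = GameState()
state.add_random_tile()
state.add_random_tile()
print(len(state.grid))  # 2
```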
One crucial part of the implementation was the ability to split each move into two parts: the player's move and the random computer move (adding a tile to one of the empty cells in the grid). To do so, we re-implemented the functions that control the movement of the tiles according to the player's choice, since the original method performed both parts at once. We also added a field to the game that records whose turn it is to play: a Boolean variable stating whether the next move should be made by the computer or by the player (the use of this variable is explained below). Another crucial part of the re-implementation was the ability to clone the grid and the game state. Our algorithms assess future states, so we need to see what the grid will look like several moves ahead without damaging the current state, and then return to the current state and play the best move possible.

[1] The game was originally created as a web game, so its code was in JavaScript. We wanted to implement our code in Python, a language we were familiar with, and therefore used the version we found.

2.2 Alpha-Beta Pruning.

2.2.1 Rationale. Each move the player makes is followed by a random move made by the computer, and each move leads to a new state. According to the scoring heuristic chosen, each of these new states has its own score (some of these scores may be equal) and affects the progress of the game in a different way. The rationale behind using alpha-beta pruning is that we prepare for the "worst-case" scenario: each time we consider the worst position in which the new tile could be placed, and by doing so we know that the random positioning of the tile cannot hurt us.

2.2.2 Implementation. We implemented the alpha-beta pruning algorithm as seen in class, making our player the "max" player and the computer the "min" player. Pruning was used to speed up the search and avoid checking redundant states: using our heuristics (explained below) we find the worst-case scenario and expand only the states whose heuristic value equals it. This means we never expand states whose heuristic gives a better result, which saves a lot of time. As for our "max" player, we penalize the agent with a very large value if a state loses the game, to make sure the player avoids losing until no other option is available. Another feature we implemented is changing the depth of the game tree according to the number of empty cells left on the grid. There are two main reasons for doing so: saving time and making smarter choices. When only a few empty cells remain, the implications of every move are more critical and each move might be the be-all and end-all. In addition, randomness plays a bigger role when fewer places are left, and therefore we need to think our moves through more carefully.

2.3 Expectimax.

2.3.1 Rationale. First we must notice that the game has a probabilistic aspect: each tile the computer places after the player's turn is positioned randomly, with equal probability for each of the unoccupied cells.
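Before continuing, the alpha-beta search of section 2.2 can be sketched as follows. This is a simplified sketch: the state API (is_loss, moves, place_options, computer_turn) is hypothetical, and it expands every tile placement at "min" nodes, whereas our actual implementation expands only the heuristic worst-case placements:

```python
LOSS_PENALTY = -10**9  # large penalty so the agent avoids losing states

def alphabeta(state, depth, alpha, beta, heuristic):
    """Our player is "max"; the computer's tile placement is "min"."""
    if state.is_loss():
        return LOSS_PENALTY
    if depth == 0:
        return heuristic(state)
    if state.computer_turn:                    # "min": tile placements
        value = float('inf')
        for child in state.place_options():
            value = min(value, alphabeta(child, depth - 1, alpha, beta, heuristic))
            beta = min(beta, value)
            if beta <= alpha:
                break                          # prune
        return value
    value = float('-inf')                      # "max": our agent's moves
    for child in state.moves():
        value = max(value, alphabeta(child, depth - 1, alpha, beta, heuristic))
        alpha = max(alpha, value)
        if beta <= alpha:
            break                              # prune
    return value

def search_depth(empty_cells, normal=2, endgame=4):
    """Deeper search when few empty cells remain (the 2-4 combination
    used later in our experiments)."""
    return endgame if empty_cells <= 3 else normal
```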
We chose to implement the expectimax algorithm, which takes a chance element (the positioning of the new tile) into consideration. This algorithm is usually used in zero-sum games with a probabilistic element (such as backgammon). Here we apply it to a game with a single player and a chance element, to see whether it can account for both the randomness in the problem and the player's desire to maximize his score.

2.3.2 Implementation. We implemented the expectimax algorithm learned in class, but in this implementation there is no adversary playing against us, only a chance node representing the random move that the computer makes in the actual game.

2.4 Game Heuristics.

2.4.1 Empty cell heuristic. This heuristic counts how many cells in the grid are left empty. The rationale is that when there are more empty cells, the chance of failure is much smaller. In addition, we want to encourage our agent to merge as many cells as possible (because that causes tile values to grow); merging increases the number of empty cells, so this heuristic also leads the agent to merge cells.

2.4.2 Score heuristic. This heuristic sums the logarithmic (base 2) values of the tiles on the grid and returns the negation of the result (multiplies the sum by -1). The idea is to encourage the agent to merge cells, preferably those containing larger values: merging two tiles of value 4 changes the sum from 4 to 3, but merging two tiles of value 64 changes it from 12 to 7, so the agent will want to merge and will prioritize the cells with the larger values. The reason we want to merge the cells with the higher values is that this helps increase the value of the highest tile and progresses the game towards the goal of 2048.

2.4.3 Highest tile heuristic. This heuristic encourages the agent to merge cells and reach the highest tile possible. The rationale is that the goal of the game is to reach a 2048 tile, so the agent should try to obtain the highest-valued tile it possibly can.

2.4.4 Gradient heuristic. This heuristic measures the difference between each tile and its neighbors, sums these differences, and returns the negation of the sum. The idea is to encourage the agent to organize the grid so that tiles with similar (or relatively close) values sit close together, allowing them to be merged; once the grid is disorganized it is hard to merge cells, causing the grid to fill up and eventually leading to failure.

2.4.5 Direction heuristic. This heuristic holds four counters, one for each direction on each axis (left/right, down/up), to accumulate the differences between cells in each direction. It iterates over all occupied cells, checks the difference between the current cell and its neighbors to the right and below, and adds the difference to the matching counter (if the cell to the right is larger, the difference is added to the counter representing that the right-hand values are larger, and so on). The purpose of this heuristic is to organize the grid and give it a "direction" so it will be easier to merge a large number of cells.

3. RESULT ANALYSIS

3.1 Measuring depth success. In this part we examine the influence of the pruning depth on the game parameters over a single run. All runs are conducted with alpha-beta pruning, using the best weighting of heuristics we found during testing.
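The four heuristics of section 2.4 can be sketched over the (row, col) -> value grid dictionary. This is a sketch: in particular, how the direction heuristic combines its four counters into a single score is our illustrative choice (taking the largest), since the text above does not fix it:

```python
import math

GRID_SIZE = 4

def empty_cell_heuristic(grid):
    """Number of unoccupied cells: more room means less chance of failure."""
    return GRID_SIZE * GRID_SIZE - len(grid)

def score_heuristic(grid):
    """Negated sum of base-2 logs: merging two 64s turns 6 + 6 = 12
    into log2(128) = 7, so big merges improve the value most."""
    return -sum(math.log2(v) for v in grid.values())

def highest_tile_heuristic(grid):
    """Value of the highest tile on the board."""
    return max(grid.values(), default=0)

def gradient_heuristic(grid):
    """Negated sum of differences between each tile and its right/bottom
    neighbors: an organized board scores closer to zero."""
    total = 0
    for (r, c), v in grid.items():
        for nbr in ((r, c + 1), (r + 1, c)):
            if nbr in grid:
                total += abs(v - grid[nbr])
    return -total

def direction_heuristic(grid):
    """Four counters, one per direction on each axis; each neighbor
    difference is added to the counter of the larger side. Taking the
    max rewards a board ordered in one dominant direction (assumption)."""
    counters = {'left': 0, 'right': 0, 'up': 0, 'down': 0}
    for (r, c), v in grid.items():
        right = grid.get((r, c + 1))
        if right is not None:
            counters['right' if right > v else 'left'] += abs(right - v)
        down = grid.get((r + 1, c))
        if down is not None:
            counters['down' if down > v else 'up'] += abs(down - v)
    return max(counters.values())
```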
The parameters examined are the base-2 log of the highest tile on the board, the game's score (implemented in the original game; a more accurate measure of the agent's success when the 2048 tile is not reached), and the running time of the game (until it ends with a win or a loss). The parameters are averaged over 10 runs at depths 0-4 of the maximizing player. We also examined a special depth which is a combination of depths 2 and 4, as described above. Analyzing the results led us to some characteristics of the agent's behavior.

Chart 1 - The game's score vs. the alpha-beta pruning depth.

Chart 2 - Average base-2 log of the highest tile on the board vs. the alpha-beta depth. As can be seen, beyond depth 2 the average is not influenced much by the depth.

Chart 3 - The success rate of the alpha-beta agent over different search depths. As seen, higher depths give a significantly better success rate.

Chart 4 - The running time (on a base-10 logarithmic scale) at different search depths. The running time depends exponentially on the search depth.

3.1.1 Success dependence. The first, most trivial yet important conclusion is the strong dependence of the success rate on the search depth. As seen in chart 3, the success rate depends approximately linearly on the search depth; therefore, achieving high success rates is possible only when running deeper searches. This makes sense, since a deeper search means looking further into the predicted future of the game and taking actions that lead to higher tiles and a better board order. The strength of the dependence is interesting; we believe it is so strong because in this game, wrong moves in crucial situations may lead to bad board states which are sometimes irreversible. Moreover, winning this game is fairly complex even for humans for exactly the same reason: you must see enough steps ahead, otherwise you lose. Another nice result is that we achieve no success at all with depth-0 or depth-1 searches. This is in line with the previous conclusion: if you act "greedily" you will not win.

3.1.2 Exponential time. Another result we expected is the exponential growth of the game's running time. As seen in chart 4, the running time grows linearly on a log scale, i.e. its growth is exponential. This growth is due to the branching of the search tree: increasing the depth by 1 multiplies the number of leaves by 4, the number of actions available to the maximizing agent. As we continue to increase the depth, the running time of the search becomes unreasonable.

3.1.3 Score and highest tile. These results are neither very surprising nor very interesting. The score, the highest tile and the success rate are strongly connected and are greater at the same depths.

3.1.4 The time/success ratio. The interesting conclusion from charts 3 and 4 is the following: the ratio between the success rate and the running time has a maximum. This means that if your goal is to reach as many victories as possible in a given time, the best thing to do is not to run the deepest search but several shorter runs, such as the 2-4 combination or the depth-2 or depth-3 search. In this case the success rate per run does not grow, but the number of successes in a finite time is maximal.

3.2 Comparison with expectimax. In this part we compare the two search algorithms described above. To perform the comparison in reasonable time, we ran both algorithms with the same depth: the 2-4 combination described above, which, as mentioned, gives the best time ratio along with a fine success rate. The parameters compared are the running time, the highest tile and the score.
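The depth choice implied by the time/success trade-off of section 3.1.4 can be expressed in a few lines. The numbers below are illustrative placeholders, not our measured results; with real data this picks the depth giving the most successes per unit of time:

```python
# depth: (success rate, average seconds per run) -- illustrative values
runs = {2: (0.10, 30.0), 3: (0.30, 120.0), 4: (0.60, 900.0)}

# The deepest search wins the most often, but the shallower ones can
# yield more total victories in a fixed time budget.
best_depth = max(runs, key=lambda d: runs[d][0] / runs[d][1])
print(best_depth)
```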

Chart 5 - Base-2 log of the best tile's value at different depths (expectimax vs. alpha-beta with the 2-4 combination). The tiles that alpha-beta reaches are much higher than those of expectimax.

As seen in chart 5, expectimax reaches a much lower highest tile and performs much worse than alpha-beta pruning. This creates a problem in comparing the running time: since expectimax failed much earlier, it made fewer moves, which caused a much shorter running time. To overcome this we looked at the relative time/score value, which better reflects the running time compared to the success of the agent. As can be seen in charts 5 and 6, the alpha-beta search gives a much better result in a much shorter relative time.

Chart 6 - The time/score ratio at different search depths; alpha-beta performs much better than expectimax at all depths.

At first glance this result is surprising; we would expect a better result from the search that considers the stochastic behavior of the game, i.e. expectimax. Our explanation of these results is divided into two parts: the long relative time the run takes, and the poor result. The reason for the long relative running time lies in the fact that the expectimax search tree grows much faster than the alpha-beta one. The expectimax algorithm checks every possible option for the computer to position the next tile; this gives the tree a huge branching factor, which at the beginning of the game can reach up to 15 for the "chance" player. In contrast, the alpha-beta pruning expands only the worst-case tile for the player, leading to a much faster run. We believe the reason for the poor result is that expectimax is usually used in zero-sum games; 2048 is not such a game, and therefore expectimax does not improve our search significantly.
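A back-of-the-envelope version of the branching argument above, for one full turn (player move plus computer tile). The numbers are illustrative upper bounds, not measurements; if expectimax also expands both possible tile values (2 and 4), its figure doubles again:

```python
player_moves = 4   # up to 4 directions per player ply
empty_cells = 15   # up to 15 empty cells early in the game

expectimax_branch = player_moves * empty_cells  # every placement expanded
alphabeta_branch = player_moves * 1             # only the worst-case placement

print(expectimax_branch, alphabeta_branch)  # 60 vs 4 states per full turn
```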

3.3 Heuristic factors. In this part we show the importance of correctly combining our heuristics by determining the heuristic factors. The importance of weighting the heuristics correctly is obvious: each heuristic leads to certain features of the tile layout, but none of them can produce a 2048 tile on its own. We separate our heuristics into two groups that share common features: "the arrangers" and "the achievers".

3.3.1 Arrangers. The arrangers group contains the heuristics that lead to a more ordered board. All the heuristics in this group grade the board only by relations between tiles and never by a tile's value alone. Without them, the board turns messy and tiles are not positioned next to tiles they can merge with, leading to a more "random" board and failure. Alone, "arranging" heuristics would not lead to finishing the game and will mostly fail with a very low highest tile. The direction and gradient heuristics fit here, because their only purpose is to keep the board ordered: in a certain direction, and with low differences between tiles (as seen in figures 2 and 1, respectively).

3.3.2 Achievers. The achievers group contains the heuristics that lead to higher tiles. All the heuristics in it grade boards with higher tiles better, with no reference to the order of the tiles. The achieving heuristics relate only to the value of each tile and not to the relation between values. Without these heuristics the game ends very quickly, because the actions taken would not encourage high tiles, leading to many tiles with similar values and quick board filling. Alone, these heuristics promote relatively "greedy" actions that lead to an unordered board (as seen in figures 3 and 4). The highest tile and score heuristics fit in this group; they lead to a high-valued tile and to a high score (which in turn leads to higher-value tiles). Taking these two groups into consideration, we can clearly see that the "empty cells" heuristic fits both groups, as it encourages tile merging and therefore both a better-ordered board and higher tiles.

Figure 1 - A state of the board during a run with only the gradient heuristic. The board tends to have identical neighboring tiles.

Figure 2 - A state of the board during a run with only the direction heuristic. The board tends to be ordered in a certain direction.

Figure 3 - A state of the board during a run with only the highest tile heuristic. The board is not ordered and has only one high tile.

Figure 4 - A state of the board during a run with only the score heuristic. The board has "islands" of high tiles that are not necessarily close together.

As we did in the previous section, we compared the running time, highest tile and score achieved by the different groups of heuristics in order to understand the correct "game plan" (how the factors affect the results of the game). Our comparisons were made using alpha-beta pruning, since we saw that this algorithm leads to better results both in terms of time (it takes less time to run) and in terms of highest tile achieved and success rate.

We started by running the game with each of the heuristic groups and looked at the score, time and highest tile achieved. The first thing we looked at was the time/score ratio, in order to see which of these groups is faster. The reason we looked at the ratio is that some games ended more quickly than others, and we wanted a measure of the time the heuristic takes rather than the time of the game (which varied greatly between games as a function of the game's result). The ratios for the two heuristic groups are displayed below:

Chart 7 - The time/score ratio for the achievers and arrangers heuristic groups, using depths 2 and 4.

As we can see from chart 7, the achievers give us a better score-per-time ratio. This conclusion makes sense, as the score is based on the values of the tiles on the board and not on the way they are organized. Therefore, achiever heuristics try to maximize the value of the tiles and reach a higher score in less time. Our next step was to see how the average value of the highest tile was affected by each heuristic group, i.e. which gave a higher tile on average. After running the game 10 times with each group [2], the achievers' average highest tile (measured as the base-2 log of the tile's value) was higher than the arrangers' average of 9.5. This means the achievers group led to better results in terms of the highest tile as well. These two tests led us to understand that the achievers group matters more than the arrangers group, as both its time/score ratio and its average highest tile are better. However, it was clear to us that the two heuristic groups can be complementary, and that we need to factor in both of them.
The reason for this is that when the grid is more organized, the "achieving" heuristics can also perform better, and thus it makes sense to factor in both groups. We also got the following success rates for the two heuristic groups: achievers 10%, arrangers 0%. In the end, the factors we chose are:

Empty factor: 25
Highest tile factor: 10
Score factor: 14
Gradient factor: 1
Direction factor: 15

[2] We calculated this by running the game and, when it ended (after winning or losing), taking the base-2 log of the highest tile left on the grid.

The results are shown below:

Chart 8 - The time/score ratio for the different heuristic groups, including the final heuristic, using depths 2 and 4.

We can see that the time/score ratio for the final heuristic is slightly larger than that of the achievers group. This makes sense, as we merged the two groups and thus caused the time to grow. However, the average highest tile is now 10.2, meaning that the average highest tile did increase. In addition, the success rate was 40%, so this parameter also grew.

4. CONCLUSIONS

From our work it is quite clear that reaching a tile of 256 even with a bad heuristic is not a difficult task (it can be done just by trying to survive); reaching the higher-valued tiles is the difficult part of the game. We saw that the heuristics can be split into two groups: heuristics whose goal is to organize the grid in a way that makes it easier to play, and heuristics whose goal is to increase the score whenever possible (even if that is not the best move in the long run). The best results came from a combination of these two groups, as seen in the previous section. We tried two different methods to handle the probabilistic factor in the game: expectimax and alpha-beta pruning. Because of the large number of possible moves in each state, we had to deal with a very large branching factor, and thus needed pruning to overcome it. From our results it is clear that alpha-beta pruning works better than expectimax in this case, both in terms of runtime and of success rate. This means we can treat the game as a two-player game in which the computer always tries to do what is worst for us and acts as a "min" player.
However, even when using pruning, the branching factor is still very large and thus we were limited to very small depths (up to 4). Our project shows that it is possible to create an autonomous solver for the game 2048!
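As a summary, the factors chosen in section 3.3 combine into a single evaluation as a weighted sum. A sketch (the individual heuristic functions are assumed to be defined elsewhere; here `heuristics` maps each name to its function):

```python
# The factors we settled on in section 3.3.
WEIGHTS = {
    'empty': 25,
    'highest_tile': 10,
    'score': 14,
    'gradient': 1,
    'direction': 15,
}

def evaluate(grid, heuristics):
    """Weighted sum of the heuristic values for a given grid."""
    return sum(WEIGHTS[name] * heuristics[name](grid) for name in WEIGHTS)
```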

5. RUNNING THE CODE

5.1 Initializing the run. Open the file "RunGame.py" and set the constants to the values you want. All the results of your run will be saved in the file you entered. The constants are:

RESULTS_ADDR - the file to save the results in.
SEARCH_DEPTH - the depth of the search when there are more than 3 empty tiles on the board.
UNDER_3_SEARCH_DEPTH - the depth of the search when there are 3 or fewer empty tiles on the board.
RUN_EXPECTI - whether to run expectimax or alpha-beta search.
NUM_OF_ITERATIONS - the number of times to run the game in automatic mode.

After setting all the constants, execute the code.

5.2 Playing. When the game starts you can play manually, let the agent play step by step, or let it run automatically. Manual play uses the arrow keys, as in the original game; an agent step is taken by pressing the "Home" key; and letting the agent run until it finishes all iterations (the number of iterations is set at initialization) is done by pressing the "End" key.

5.3 Extracting data. The data of all iterations is kept in the file given at initialization. Each line holds the values for one iteration in the following format:

<highest tile's value>,<board's final score>,<running time (seconds)>

The data is saved only after exiting the game window by clicking the "ciao" button.

REFERENCES

1. The 2048 Python implementation created by Raphael Seban.
2. Another AI implementation of this game, from which we took some ideas (mainly for some of the heuristics).
3. The original 2048 game, created by Gabriele Cirulli.


More information

Documentation and Discussion

Documentation and Discussion 1 of 9 11/7/2007 1:21 AM ASSIGNMENT 2 SUBJECT CODE: CS 6300 SUBJECT: ARTIFICIAL INTELLIGENCE LEENA KORA EMAIL:leenak@cs.utah.edu Unid: u0527667 TEEKO GAME IMPLEMENTATION Documentation and Discussion 1.

More information

CS221 Project Final Report Gomoku Game Agent

CS221 Project Final Report Gomoku Game Agent CS221 Project Final Report Gomoku Game Agent Qiao Tan qtan@stanford.edu Xiaoti Hu xiaotihu@stanford.edu 1 Introduction Gomoku, also know as five-in-a-row, is a strategy board game which is traditionally

More information

CS 188: Artificial Intelligence Spring Announcements

CS 188: Artificial Intelligence Spring Announcements CS 188: Artificial Intelligence Spring 2011 Lecture 7: Minimax and Alpha-Beta Search 2/9/2011 Pieter Abbeel UC Berkeley Many slides adapted from Dan Klein 1 Announcements W1 out and due Monday 4:59pm P2

More information

Free Cell Solver. Copyright 2001 Kevin Atkinson Shari Holstege December 11, 2001

Free Cell Solver. Copyright 2001 Kevin Atkinson Shari Holstege December 11, 2001 Free Cell Solver Copyright 2001 Kevin Atkinson Shari Holstege December 11, 2001 Abstract We created an agent that plays the Free Cell version of Solitaire by searching through the space of possible sequences

More information

Today. Types of Game. Games and Search 1/18/2010. COMP210: Artificial Intelligence. Lecture 10. Game playing

Today. Types of Game. Games and Search 1/18/2010. COMP210: Artificial Intelligence. Lecture 10. Game playing COMP10: Artificial Intelligence Lecture 10. Game playing Trevor Bench-Capon Room 15, Ashton Building Today We will look at how search can be applied to playing games Types of Games Perfect play minimax

More information

Adversarial Search. CS 486/686: Introduction to Artificial Intelligence

Adversarial Search. CS 486/686: Introduction to Artificial Intelligence Adversarial Search CS 486/686: Introduction to Artificial Intelligence 1 AccessAbility Services Volunteer Notetaker Required Interested? Complete an online application using your WATIAM: https://york.accessiblelearning.com/uwaterloo/

More information

Module 3. Problem Solving using Search- (Two agent) Version 2 CSE IIT, Kharagpur

Module 3. Problem Solving using Search- (Two agent) Version 2 CSE IIT, Kharagpur Module 3 Problem Solving using Search- (Two agent) 3.1 Instructional Objective The students should understand the formulation of multi-agent search and in detail two-agent search. Students should b familiar

More information

Solving Problems by Searching: Adversarial Search

Solving Problems by Searching: Adversarial Search Course 440 : Introduction To rtificial Intelligence Lecture 5 Solving Problems by Searching: dversarial Search bdeslam Boularias Friday, October 7, 2016 1 / 24 Outline We examine the problems that arise

More information

Game-playing: DeepBlue and AlphaGo

Game-playing: DeepBlue and AlphaGo Game-playing: DeepBlue and AlphaGo Brief history of gameplaying frontiers 1990s: Othello world champions refuse to play computers 1994: Chinook defeats Checkers world champion 1997: DeepBlue defeats world

More information

CS 771 Artificial Intelligence. Adversarial Search

CS 771 Artificial Intelligence. Adversarial Search CS 771 Artificial Intelligence Adversarial Search Typical assumptions Two agents whose actions alternate Utility values for each agent are the opposite of the other This creates the adversarial situation

More information

Comp th February Due: 11:59pm, 25th February 2014

Comp th February Due: 11:59pm, 25th February 2014 HomeWork Assignment 2 Comp 590.133 4th February 2014 Due: 11:59pm, 25th February 2014 Getting Started What to submit: Written parts of assignment and descriptions of the programming part of the assignment

More information

For slightly more detailed instructions on how to play, visit:

For slightly more detailed instructions on how to play, visit: Introduction to Artificial Intelligence CS 151 Programming Assignment 2 Mancala!! The purpose of this assignment is to program some of the search algorithms and game playing strategies that we have learned

More information

CS 4700: Foundations of Artificial Intelligence

CS 4700: Foundations of Artificial Intelligence CS 4700: Foundations of Artificial Intelligence selman@cs.cornell.edu Module: Adversarial Search R&N: Chapter 5 1 Outline Adversarial Search Optimal decisions Minimax α-β pruning Case study: Deep Blue

More information

Set 4: Game-Playing. ICS 271 Fall 2017 Kalev Kask

Set 4: Game-Playing. ICS 271 Fall 2017 Kalev Kask Set 4: Game-Playing ICS 271 Fall 2017 Kalev Kask Overview Computer programs that play 2-player games game-playing as search with the complication of an opponent General principles of game-playing and search

More information

Mutliplayer Snake AI

Mutliplayer Snake AI Mutliplayer Snake AI CS221 Project Final Report Felix CREVIER, Sebastien DUBOIS, Sebastien LEVY 12/16/2016 Abstract This project is focused on the implementation of AI strategies for a tailor-made game

More information

mywbut.com Two agent games : alpha beta pruning

mywbut.com Two agent games : alpha beta pruning Two agent games : alpha beta pruning 1 3.5 Alpha-Beta Pruning ALPHA-BETA pruning is a method that reduces the number of nodes explored in Minimax strategy. It reduces the time required for the search and

More information

Lecture 14. Questions? Friday, February 10 CS 430 Artificial Intelligence - Lecture 14 1

Lecture 14. Questions? Friday, February 10 CS 430 Artificial Intelligence - Lecture 14 1 Lecture 14 Questions? Friday, February 10 CS 430 Artificial Intelligence - Lecture 14 1 Outline Chapter 5 - Adversarial Search Alpha-Beta Pruning Imperfect Real-Time Decisions Stochastic Games Friday,

More information

5.4 Imperfect, Real-Time Decisions

5.4 Imperfect, Real-Time Decisions 5.4 Imperfect, Real-Time Decisions Searching through the whole (pruned) game tree is too inefficient for any realistic game Moves must be made in a reasonable amount of time One has to cut off the generation

More information

Optimal Yahtzee performance in multi-player games

Optimal Yahtzee performance in multi-player games Optimal Yahtzee performance in multi-player games Andreas Serra aserra@kth.se Kai Widell Niigata kaiwn@kth.se April 12, 2013 Abstract Yahtzee is a game with a moderately large search space, dependent on

More information

Practice Session 2. HW 1 Review

Practice Session 2. HW 1 Review Practice Session 2 HW 1 Review Chapter 1 1.4 Suppose we extend Evans s Analogy program so that it can score 200 on a standard IQ test. Would we then have a program more intelligent than a human? Explain.

More information

Algorithms for Data Structures: Search for Games. Phillip Smith 27/11/13

Algorithms for Data Structures: Search for Games. Phillip Smith 27/11/13 Algorithms for Data Structures: Search for Games Phillip Smith 27/11/13 Search for Games Following this lecture you should be able to: Understand the search process in games How an AI decides on the best

More information

Artificial Intelligence. Minimax and alpha-beta pruning

Artificial Intelligence. Minimax and alpha-beta pruning Artificial Intelligence Minimax and alpha-beta pruning In which we examine the problems that arise when we try to plan ahead to get the best result in a world that includes a hostile agent (other agent

More information

AI Agent for Ants vs. SomeBees: Final Report

AI Agent for Ants vs. SomeBees: Final Report CS 221: ARTIFICIAL INTELLIGENCE: PRINCIPLES AND TECHNIQUES 1 AI Agent for Ants vs. SomeBees: Final Report Wanyi Qian, Yundong Zhang, Xiaotong Duan Abstract This project aims to build a real-time game playing

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Adversarial Search Vibhav Gogate The University of Texas at Dallas Some material courtesy of Rina Dechter, Alex Ihler and Stuart Russell, Luke Zettlemoyer, Dan Weld Adversarial

More information

CandyCrush.ai: An AI Agent for Candy Crush

CandyCrush.ai: An AI Agent for Candy Crush CandyCrush.ai: An AI Agent for Candy Crush Jiwoo Lee, Niranjan Balachandar, Karan Singhal December 16, 2016 1 Introduction Candy Crush, a mobile puzzle game, has become very popular in the past few years.

More information

CS 188: Artificial Intelligence

CS 188: Artificial Intelligence CS 188: Artificial Intelligence Adversarial Search Instructor: Stuart Russell University of California, Berkeley Game Playing State-of-the-Art Checkers: 1950: First computer player. 1959: Samuel s self-taught

More information

More on games (Ch )

More on games (Ch ) More on games (Ch. 5.4-5.6) Announcements Midterm next Tuesday: covers weeks 1-4 (Chapters 1-4) Take the full class period Open book/notes (can use ebook) ^^ No programing/code, internet searches or friends

More information

Introduction to Artificial Intelligence CS 151 Programming Assignment 2 Mancala!! Due (in dropbox) Tuesday, September 23, 9:34am

Introduction to Artificial Intelligence CS 151 Programming Assignment 2 Mancala!! Due (in dropbox) Tuesday, September 23, 9:34am Introduction to Artificial Intelligence CS 151 Programming Assignment 2 Mancala!! Due (in dropbox) Tuesday, September 23, 9:34am The purpose of this assignment is to program some of the search algorithms

More information

Tetris: A Heuristic Study

Tetris: A Heuristic Study Tetris: A Heuristic Study Using height-based weighing functions and breadth-first search heuristics for playing Tetris Max Bergmark May 2015 Bachelor s Thesis at CSC, KTH Supervisor: Örjan Ekeberg maxbergm@kth.se

More information

Artificial Intelligence Adversarial Search

Artificial Intelligence Adversarial Search Artificial Intelligence Adversarial Search Adversarial Search Adversarial search problems games They occur in multiagent competitive environments There is an opponent we can t control planning again us!

More information

Programming Project 1: Pacman (Due )

Programming Project 1: Pacman (Due ) Programming Project 1: Pacman (Due 8.2.18) Registration to the exams 521495A: Artificial Intelligence Adversarial Search (Min-Max) Lectured by Abdenour Hadid Adjunct Professor, CMVS, University of Oulu

More information

Adversarial Search Lecture 7

Adversarial Search Lecture 7 Lecture 7 How can we use search to plan ahead when other agents are planning against us? 1 Agenda Games: context, history Searching via Minimax Scaling α β pruning Depth-limiting Evaluation functions Handling

More information

ADVERSARIAL SEARCH. Today. Reading. Goals. AIMA Chapter Read , Skim 5.7

ADVERSARIAL SEARCH. Today. Reading. Goals. AIMA Chapter Read , Skim 5.7 ADVERSARIAL SEARCH Today Reading AIMA Chapter Read 5.1-5.5, Skim 5.7 Goals Introduce adversarial games Minimax as an optimal strategy Alpha-beta pruning 1 Adversarial Games People like games! Games are

More information

COMP219: Artificial Intelligence. Lecture 13: Game Playing

COMP219: Artificial Intelligence. Lecture 13: Game Playing CMP219: Artificial Intelligence Lecture 13: Game Playing 1 verview Last time Search with partial/no observations Belief states Incremental belief state search Determinism vs non-determinism Today We will

More information

Last update: March 9, Game playing. CMSC 421, Chapter 6. CMSC 421, Chapter 6 1

Last update: March 9, Game playing. CMSC 421, Chapter 6. CMSC 421, Chapter 6 1 Last update: March 9, 2010 Game playing CMSC 421, Chapter 6 CMSC 421, Chapter 6 1 Finite perfect-information zero-sum games Finite: finitely many agents, actions, states Perfect information: every agent

More information

CS 440 / ECE 448 Introduction to Artificial Intelligence Spring 2010 Lecture #5

CS 440 / ECE 448 Introduction to Artificial Intelligence Spring 2010 Lecture #5 CS 440 / ECE 448 Introduction to Artificial Intelligence Spring 2010 Lecture #5 Instructor: Eyal Amir Grad TAs: Wen Pu, Yonatan Bisk Undergrad TAs: Sam Johnson, Nikhil Johri Topics Game playing Game trees

More information

Games CSE 473. Kasparov Vs. Deep Junior August 2, 2003 Match ends in a 3 / 3 tie!

Games CSE 473. Kasparov Vs. Deep Junior August 2, 2003 Match ends in a 3 / 3 tie! Games CSE 473 Kasparov Vs. Deep Junior August 2, 2003 Match ends in a 3 / 3 tie! Games in AI In AI, games usually refers to deteristic, turntaking, two-player, zero-sum games of perfect information Deteristic:

More information

Computing Science (CMPUT) 496

Computing Science (CMPUT) 496 Computing Science (CMPUT) 496 Search, Knowledge, and Simulations Martin Müller Department of Computing Science University of Alberta mmueller@ualberta.ca Winter 2017 Part IV Knowledge 496 Today - Mar 9

More information

Adversarial Search: Game Playing. Reading: Chapter

Adversarial Search: Game Playing. Reading: Chapter Adversarial Search: Game Playing Reading: Chapter 6.5-6.8 1 Games and AI Easy to represent, abstract, precise rules One of the first tasks undertaken by AI (since 1950) Better than humans in Othello and

More information

Adversarial Search and Game- Playing C H A P T E R 6 C M P T : S P R I N G H A S S A N K H O S R A V I

Adversarial Search and Game- Playing C H A P T E R 6 C M P T : S P R I N G H A S S A N K H O S R A V I Adversarial Search and Game- Playing C H A P T E R 6 C M P T 3 1 0 : S P R I N G 2 0 1 1 H A S S A N K H O S R A V I Adversarial Search Examine the problems that arise when we try to plan ahead in a world

More information

Adversarial Search. Rob Platt Northeastern University. Some images and slides are used from: AIMA CS188 UC Berkeley

Adversarial Search. Rob Platt Northeastern University. Some images and slides are used from: AIMA CS188 UC Berkeley Adversarial Search Rob Platt Northeastern University Some images and slides are used from: AIMA CS188 UC Berkeley What is adversarial search? Adversarial search: planning used to play a game such as chess

More information

CS 4700: Artificial Intelligence

CS 4700: Artificial Intelligence CS 4700: Foundations of Artificial Intelligence Fall 2017 Instructor: Prof. Haym Hirsh Lecture 10 Today Adversarial search (R&N Ch 5) Tuesday, March 7 Knowledge Representation and Reasoning (R&N Ch 7)

More information

Adversary Search. Ref: Chapter 5

Adversary Search. Ref: Chapter 5 Adversary Search Ref: Chapter 5 1 Games & A.I. Easy to measure success Easy to represent states Small number of operators Comparison against humans is possible. Many games can be modeled very easily, although

More information

CS 380: ARTIFICIAL INTELLIGENCE ADVERSARIAL SEARCH. Santiago Ontañón

CS 380: ARTIFICIAL INTELLIGENCE ADVERSARIAL SEARCH. Santiago Ontañón CS 380: ARTIFICIAL INTELLIGENCE ADVERSARIAL SEARCH Santiago Ontañón so367@drexel.edu Recall: Problem Solving Idea: represent the problem we want to solve as: State space Actions Goal check Cost function

More information

CSE 573: Artificial Intelligence Autumn 2010

CSE 573: Artificial Intelligence Autumn 2010 CSE 573: Artificial Intelligence Autumn 2010 Lecture 4: Adversarial Search 10/12/2009 Luke Zettlemoyer Based on slides from Dan Klein Many slides over the course adapted from either Stuart Russell or Andrew

More information

Game Playing Beyond Minimax. Game Playing Summary So Far. Game Playing Improving Efficiency. Game Playing Minimax using DFS.

Game Playing Beyond Minimax. Game Playing Summary So Far. Game Playing Improving Efficiency. Game Playing Minimax using DFS. Game Playing Summary So Far Game tree describes the possible sequences of play is a graph if we merge together identical states Minimax: utility values assigned to the leaves Values backed up the tree

More information

Intuition Mini-Max 2

Intuition Mini-Max 2 Games Today Saying Deep Blue doesn t really think about chess is like saying an airplane doesn t really fly because it doesn t flap its wings. Drew McDermott I could feel I could smell a new kind of intelligence

More information

CS 188 Fall Introduction to Artificial Intelligence Midterm 1

CS 188 Fall Introduction to Artificial Intelligence Midterm 1 CS 188 Fall 2018 Introduction to Artificial Intelligence Midterm 1 You have 120 minutes. The time will be projected at the front of the room. You may not leave during the last 10 minutes of the exam. Do

More information

Games (adversarial search problems)

Games (adversarial search problems) Mustafa Jarrar: Lecture Notes on Games, Birzeit University, Palestine Fall Semester, 204 Artificial Intelligence Chapter 6 Games (adversarial search problems) Dr. Mustafa Jarrar Sina Institute, University

More information

MyPawns OppPawns MyKings OppKings MyThreatened OppThreatened MyWins OppWins Draws

MyPawns OppPawns MyKings OppKings MyThreatened OppThreatened MyWins OppWins Draws The Role of Opponent Skill Level in Automated Game Learning Ying Ge and Michael Hash Advisor: Dr. Mark Burge Armstrong Atlantic State University Savannah, Geogia USA 31419-1997 geying@drake.armstrong.edu

More information

ADVERSARIAL SEARCH. Today. Reading. Goals. AIMA Chapter , 5.7,5.8

ADVERSARIAL SEARCH. Today. Reading. Goals. AIMA Chapter , 5.7,5.8 ADVERSARIAL SEARCH Today Reading AIMA Chapter 5.1-5.5, 5.7,5.8 Goals Introduce adversarial games Minimax as an optimal strategy Alpha-beta pruning (Real-time decisions) 1 Questions to ask Were there any

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence CS482, CS682, MW 1 2:15, SEM 201, MS 227 Prerequisites: 302, 365 Instructor: Sushil Louis, sushil@cse.unr.edu, http://www.cse.unr.edu/~sushil Games and game trees Multi-agent systems

More information

Adversarial Search. Robert Platt Northeastern University. Some images and slides are used from: 1. CS188 UC Berkeley 2. RN, AIMA

Adversarial Search. Robert Platt Northeastern University. Some images and slides are used from: 1. CS188 UC Berkeley 2. RN, AIMA Adversarial Search Robert Platt Northeastern University Some images and slides are used from: 1. CS188 UC Berkeley 2. RN, AIMA What is adversarial search? Adversarial search: planning used to play a game

More information

Announcements. CS 188: Artificial Intelligence Spring Game Playing State-of-the-Art. Overview. Game Playing. GamesCrafters

Announcements. CS 188: Artificial Intelligence Spring Game Playing State-of-the-Art. Overview. Game Playing. GamesCrafters CS 188: Artificial Intelligence Spring 2011 Announcements W1 out and due Monday 4:59pm P2 out and due next week Friday 4:59pm Lecture 7: Mini and Alpha-Beta Search 2/9/2011 Pieter Abbeel UC Berkeley Many

More information

Game Tree Search. CSC384: Introduction to Artificial Intelligence. Generalizing Search Problem. General Games. What makes something a game?

Game Tree Search. CSC384: Introduction to Artificial Intelligence. Generalizing Search Problem. General Games. What makes something a game? CSC384: Introduction to Artificial Intelligence Generalizing Search Problem Game Tree Search Chapter 5.1, 5.2, 5.3, 5.6 cover some of the material we cover here. Section 5.6 has an interesting overview

More information

2 person perfect information

2 person perfect information Why Study Games? Games offer: Intellectual Engagement Abstraction Representability Performance Measure Not all games are suitable for AI research. We will restrict ourselves to 2 person perfect information

More information

Section Marks Agents / 8. Search / 10. Games / 13. Logic / 15. Total / 46

Section Marks Agents / 8. Search / 10. Games / 13. Logic / 15. Total / 46 Name: CS 331 Midterm Spring 2017 You have 50 minutes to complete this midterm. You are only allowed to use your textbook, your notes, your assignments and solutions to those assignments during this midterm.

More information

Playing Games. Henry Z. Lo. June 23, We consider writing AI to play games with the following properties:

Playing Games. Henry Z. Lo. June 23, We consider writing AI to play games with the following properties: Playing Games Henry Z. Lo June 23, 2014 1 Games We consider writing AI to play games with the following properties: Two players. Determinism: no chance is involved; game state based purely on decisions

More information

The game of Reversi was invented around 1880 by two. Englishmen, Lewis Waterman and John W. Mollett. It later became

The game of Reversi was invented around 1880 by two. Englishmen, Lewis Waterman and John W. Mollett. It later became Reversi Meng Tran tranm@seas.upenn.edu Faculty Advisor: Dr. Barry Silverman Abstract: The game of Reversi was invented around 1880 by two Englishmen, Lewis Waterman and John W. Mollett. It later became

More information

Game-Playing & Adversarial Search

Game-Playing & Adversarial Search Game-Playing & Adversarial Search This lecture topic: Game-Playing & Adversarial Search (two lectures) Chapter 5.1-5.5 Next lecture topic: Constraint Satisfaction Problems (two lectures) Chapter 6.1-6.4,

More information

CS 380: ARTIFICIAL INTELLIGENCE

CS 380: ARTIFICIAL INTELLIGENCE CS 380: ARTIFICIAL INTELLIGENCE ADVERSARIAL SEARCH 10/23/2013 Santiago Ontañón santi@cs.drexel.edu https://www.cs.drexel.edu/~santi/teaching/2013/cs380/intro.html Recall: Problem Solving Idea: represent

More information

CITS3001. Algorithms, Agents and Artificial Intelligence. Semester 2, 2016 Tim French

CITS3001. Algorithms, Agents and Artificial Intelligence. Semester 2, 2016 Tim French CITS3001 Algorithms, Agents and Artificial Intelligence Semester 2, 2016 Tim French School of Computer Science & Software Eng. The University of Western Australia 8. Game-playing AIMA, Ch. 5 Objectives

More information

An Intelligent Othello Player Combining Machine Learning and Game Specific Heuristics

An Intelligent Othello Player Combining Machine Learning and Game Specific Heuristics An Intelligent Othello Player Combining Machine Learning and Game Specific Heuristics Kevin Cherry and Jianhua Chen Department of Computer Science, Louisiana State University, Baton Rouge, Louisiana, U.S.A.

More information

Game Playing State-of-the-Art CSE 473: Artificial Intelligence Fall Deterministic Games. Zero-Sum Games 10/13/17. Adversarial Search

Game Playing State-of-the-Art CSE 473: Artificial Intelligence Fall Deterministic Games. Zero-Sum Games 10/13/17. Adversarial Search CSE 473: Artificial Intelligence Fall 2017 Adversarial Search Mini, pruning, Expecti Dieter Fox Based on slides adapted Luke Zettlemoyer, Dan Klein, Pieter Abbeel, Dan Weld, Stuart Russell or Andrew Moore

More information

Lecture 5: Game Playing (Adversarial Search)

Lecture 5: Game Playing (Adversarial Search) Lecture 5: Game Playing (Adversarial Search) CS 580 (001) - Spring 2018 Amarda Shehu Department of Computer Science George Mason University, Fairfax, VA, USA February 21, 2018 Amarda Shehu (580) 1 1 Outline

More information

More on games (Ch )

More on games (Ch ) More on games (Ch. 5.4-5.6) Alpha-beta pruning Previously on CSci 4511... We talked about how to modify the minimax algorithm to prune only bad searches (i.e. alpha-beta pruning) This rule of checking

More information

Adversarial Search and Game Playing. Russell and Norvig: Chapter 5

Adversarial Search and Game Playing. Russell and Norvig: Chapter 5 Adversarial Search and Game Playing Russell and Norvig: Chapter 5 Typical case 2-person game Players alternate moves Zero-sum: one player s loss is the other s gain Perfect information: both players have

More information

Problem 1. (15 points) Consider the so-called Cryptarithmetic problem shown below.

Problem 1. (15 points) Consider the so-called Cryptarithmetic problem shown below. ECS 170 - Intro to Artificial Intelligence Suggested Solutions Mid-term Examination (100 points) Open textbook and open notes only Show your work clearly Winter 2003 Problem 1. (15 points) Consider the

More information

A Quoridor-playing Agent

A Quoridor-playing Agent A Quoridor-playing Agent P.J.C. Mertens June 21, 2006 Abstract This paper deals with the construction of a Quoridor-playing software agent. Because Quoridor is a rather new game, research about the game

More information

Reinforcement Learning in Games Autonomous Learning Systems Seminar

Reinforcement Learning in Games Autonomous Learning Systems Seminar Reinforcement Learning in Games Autonomous Learning Systems Seminar Matthias Zöllner Intelligent Autonomous Systems TU-Darmstadt zoellner@rbg.informatik.tu-darmstadt.de Betreuer: Gerhard Neumann Abstract

More information

CSCI1410 Fall 2018 Assignment 2: Adversarial Search

CSCI1410 Fall 2018 Assignment 2: Adversarial Search CSCI1410 Fall 2018 Assignment 2: Adversarial Search Code Due Monday, September 24 Writeup Due Thursday, September 27 1 Introduction In this assignment, you will implement adversarial search algorithms

More information

Applications of Artificial Intelligence and Machine Learning in Othello TJHSST Computer Systems Lab

Applications of Artificial Intelligence and Machine Learning in Othello TJHSST Computer Systems Lab Applications of Artificial Intelligence and Machine Learning in Othello TJHSST Computer Systems Lab 2009-2010 Jack Chen January 22, 2010 Abstract The purpose of this project is to explore Artificial Intelligence

More information