Monte Carlo based battleship agent
Written by: Omer Haber, 313302010; Dror Sharf, 315357319

Introduction

The game of Battleship is a guessing game for two players which has been around for almost a century. In this paper we document our various attempts (and eventual success) to create an agent that can beat most humans.

The rules of the game

The game is played on four grids, two for each player. The grids are typically square, usually 10x10, and the individual squares in the grid are identified by letter and number. On one grid the player arranges ships and records the shots made by the opponent. On the other grid the player records their own shots.

Before play commences, each player secretly arranges their ships on their primary grid. Each ship occupies a number of consecutive squares on the grid, arranged either horizontally or vertically. The number of squares for each ship is determined by the type of the ship. The ships cannot overlap (i.e., only one ship can occupy any given square in the grid). The types and numbers of ships allowed are the same for each player, and may vary depending on the rules. Specifically, in our attempts we have used various numbers and types of ships to see under which rules our agent performs better.

After the ships have been positioned, the game proceeds in a series of rounds. In each round, each player takes a turn to announce a target square in the opponent's grid which is to be shot at. The opponent announces whether or not the square is occupied by a ship: if it is a "miss", the player marks their primary grid with a white peg; if it is a "hit", they mark it with a red peg. The attacking player notes the hit or miss on their own "tracking" grid with the appropriate color peg (red for "hit", white for "miss"), in order to build up a picture of the opponent's fleet.

Figure 1: example of a standard Battleship game.
When all of the squares of a ship have been hit, the ship is sunk, and the ship's owner announces this (e.g. "You sank my battleship!"). If all of a player's ships have been sunk, the game is over and their opponent wins. In our version of the game we have disregarded this property (the agent does not know whether it has sunk a ship or not).

Approach and Method

First of all, we state several properties of the game on which we based our solution:

Our 2 player game is not really for 2 players

The rules of the game state that each player makes a move in their own turn. However, we can see that every pair of grids in the game is dedicated solely to the purpose of searching for and destroying one player's fleet, and is not affected by the events on the other pair of grids (other than, of course, by the other player losing the game). With this we can understand that all that really matters is the number of moves it takes a player to bring down the other player's fleet. Therefore we have decided to focus our efforts on building the best agent for bringing down the enemy as fast as possible.

Our scoring system is designed to measure how fast our agent finishes destroying the enemy's fleet. The agent starts with n² points (for an n x n board), and for each shot our agent takes we subtract 1 point from the final score, meaning that if our agent shoots at every square on the board before finishing the game it gets precisely 0 points.

Not all squares were created equal

One can see that our game can be viewed as a partially observable Markov decision process (POMDP). Our agent receives only a partial observation of the system at each stage (whether a square it previously chose to bombard is a hit or a miss), and each action has a statistical result. However, some squares have a higher probability of holding a ship. For example, let us take a look at a simple 3x3 board with a single 1x2 ship.
We would like to calculate the precise chance of our ship occupying a particular square, given a uniformly random distribution of placements.

Figure 2: all the possible positions of a ship of length 2 on a 3x3 board.
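This exact calculation is easy to reproduce by brute force on a board this small. Below is a minimal sketch (the function name and interface are ours, not taken from the original implementation) that enumerates every legal placement of the ship and counts how often each square is covered:

```python
from collections import Counter

def placement_counts(n=3, ship_len=2):
    """For every square of an n x n board, count how many of the
    possible placements of a single ship of length ship_len cover it."""
    counts = Counter()
    # Horizontal placements
    for r in range(n):
        for c in range(n - ship_len + 1):
            for k in range(ship_len):
                counts[(r, c + k)] += 1
    # Vertical placements
    for r in range(n - ship_len + 1):
        for c in range(n):
            for k in range(ship_len):
                counts[(r + k, c)] += 1
    return counts

counts = placement_counts()
print(counts[(1, 1)], counts[(0, 0)])  # center vs. corner: 4 vs. 2
```

Running this confirms Figure 2: the center square is covered by 4 of the 12 possible placements, a corner by only 2.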
As one can see, the total number of placements covering the center square across all boards is double the number covering any particular corner. Therefore, if we assume that a player is likely to lay out their ships in a random manner, our first move should probably be shooting at the center.

We could theoretically try to do this calculation for every observation we have on the board: take all the possible boards, find the most likely square and fire at it. However, we are then faced with a computation that becomes intractable on even medium-sized boards: on a 10x10 board with 5 ships there are approximately 70 billion different layouts. Going through all of them and eliminating those which do not satisfy the constraints stemming from the history of observations is an impossible task with standard computational power, and we needed something more feasible.

The Monte Carlo method

The name "Monte Carlo methods" refers to a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results, usually for problems in which it is difficult or impossible to use exact mathematical methods. In our case, understanding the infeasibility of a full solution to our problem, we concluded that randomizing a large number of boards that agree with our observations of the world would usually give us a close approximation of the boat distribution over the squares of the board.

Put more precisely, our algorithm randomizes a certain number of boards at the beginning of the game, and on each turn eliminates the boards that do not agree with the observations of the real board, randomizing new boards until suitable replacements are found. After this is done, for each square we sum the number of boards that have a boat in that square. Once the scores for all the squares are calculated, we fire at the most probable one.

Figure 3: On the left, the board we play on.
Legend:
- Red square: discovered square with a ship (hit).
- Green square: discovered square without a ship (miss).
- Teal square: undiscovered square with a ship.
- Blue square: undiscovered square without a ship.
On the right, a heat map showing the distribution of probabilities, where red is most probable and black is least probable.

Satisfying the constraints

An issue that arises is the need to randomize a board in such a way that it satisfies the constraints stemming from the observations. One could think that simply randomizing boards until we find one that agrees with the observations is generally a good idea, but this is not the case. For example, suppose we have hit almost all the squares on the board, in such a manner that there is only one viable way to actually place the boats. In that case our algorithm is going to try to randomize a single board out of 70 billion different options. Again, we needed something more feasible.

Although this seems like just a question of feasibility, the running time was actually a major factor: the longer it took us to randomize a valid board, the fewer of them we could use in each iteration. This in turn made our Monte Carlo estimation less accurate and, therefore, less reliable.

In order to reduce the number of different options, we first decided not to allow our randomization algorithm to place boats on squares where we have already missed. That alone doesn't work so well, so we also tried giving priority to placing boats on squares where we know we have hit. To make sure we don't always try to place the same boat at the same spot, we also randomize the order of the boats we try to place.

Results

Time matters: backtracking vs. fresh random boards

As we've already discussed, the running time of our algorithm was a real issue, and since most of it was spent randomizing boards that agree with our observations, we tried several approaches to this problem. The two approaches found to be most effective were:
1. When randomizing a board and hitting a dead end, discard the board and start the whole randomization over from scratch.
2. Backtrack: take out the last ship that didn't fit and change its location so that it satisfies the constraints. After several unsuccessful tries, backtrack once more and repeat this process recursively.

In order to check the effectiveness of each method, we let both of them try to completely fill boards of different sizes with different ships, and compared the results.
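A sketch of the fresh-randomization approach, with the tweaks described above (never placing on misses, preferring placements through known hits, and shuffling the order of the boats), might look like the following. This is our reconstruction, not the original code; all names are illustrative:

```python
import random

def place_ships(n, ship_lens, hits, misses, attempts=200):
    """Try to place all ships so that every hit is covered and no miss is.
    The ship order is shuffled so we don't always seed the same boat on
    the same hit. Returns the set of occupied cells, or None on failure."""
    ships = list(ship_lens)
    for _ in range(attempts):
        random.shuffle(ships)
        occupied, uncovered, failed = set(), set(hits), False
        for length in ships:
            cells = _place_one(n, length, occupied, misses, uncovered)
            if cells is None:
                failed = True
                break
            occupied |= cells
            uncovered -= cells
        if not failed and not uncovered:
            return occupied
    return None

def _place_one(n, length, occupied, misses, uncovered, tries=100):
    for _ in range(tries):
        if uncovered:
            # Prefer a placement passing through a hit not yet covered.
            r, c = random.choice(sorted(uncovered))
            offset = random.randrange(length)
            if random.random() < 0.5:  # horizontal
                cells = {(r, c - offset + k) for k in range(length)}
            else:                      # vertical
                cells = {(r - offset + k, c) for k in range(length)}
        else:
            if random.random() < 0.5:  # horizontal
                r, c = random.randrange(n), random.randrange(n - length + 1)
                cells = {(r, c + k) for k in range(length)}
            else:                      # vertical
                r, c = random.randrange(n - length + 1), random.randrange(n)
                cells = {(r + k, c) for k in range(length)}
        if all(0 <= x < n and 0 <= y < n for x, y in cells) \
                and not cells & occupied and not cells & misses:
            return cells
    return None
```

On a dead end (some ship cannot be placed, or a hit remains uncovered), the whole board is discarded and the process restarts, which is exactly the first approach above.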
[Bar chart: average ratio between the time taken by the backtracking (recursive) approach and the fresh-random approach to fill boards of different sizes.]

Figure 4: Average ratio (Y axis) between the times taken to fill boards of different sizes (X axis). The red columns are there to give a sense of magnitude to the blue columns.

As we can see, the fresh-randomization approach fared between 1.15 and 9.5 times better in every situation it faced (aside from one case), and its advantage grew with the board size. This result probably stems from the fact that when we backtrack an already badly positioned ship, we spend a lot of time trying to solve an unsolvable problem before deciding to take out the first ship.

Effects of sample size

Another thing we wanted to test was the effectiveness of our algorithm as a function of the number of boards evaluated at each step. The following results were obtained on boards with the following properties:
- Size of 10x10 or 20x20, respectively
- One ship of size 5
- One ship of size 4
- Two ships of size 3
- One ship of size 2

The agents played the same 100 boards in each test.
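The "number of boards evaluated at each step" corresponds to the sample size in a per-step evaluation like the one sketched below. This is a simplified, self-contained reconstruction (the sampler here places ships uniformly and merely rejects boards that disagree with the hits, rather than using the prioritized placement described earlier); function names are ours:

```python
import random

def sample_board(n, ship_lens, misses):
    """One random layout avoiding the known misses; None if we get stuck."""
    occupied = set()
    for length in ship_lens:
        for _ in range(100):  # retries for this ship
            if random.random() < 0.5:  # horizontal
                r, c = random.randrange(n), random.randrange(n - length + 1)
                cells = {(r, c + k) for k in range(length)}
            else:                      # vertical
                r, c = random.randrange(n - length + 1), random.randrange(n)
                cells = {(r + k, c) for k in range(length)}
            if not cells & (occupied | misses):
                occupied |= cells
                break
        else:
            return None
    return occupied

def best_shot(n, ship_lens, hits, misses, samples=500):
    """Evaluate `samples` boards consistent with the observations and
    return the untried square covered by the most sampled boards."""
    heat, kept = {}, 0
    while kept < samples:
        board = sample_board(n, ship_lens, misses)
        if board is None or not hits <= board:  # must agree with every hit
            continue
        kept += 1
        for cell in board:
            heat[cell] = heat.get(cell, 0) + 1
    untried = {c: v for c, v in heat.items()
               if c not in hits and c not in misses}
    return max(untried, key=untried.get)
```

With no observations yet on the 3x3 example from earlier, `best_shot(3, [2], set(), set())` almost always recommends the center square, matching the exact calculation.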
Figure 5: Average score (Y axis) as a function of the number of evaluated boards at each step (X axis).

As we can see from the graph, there is about a 13% increase in the effectiveness of the algorithm (6 shots faster) on a 10x10 board, capping at around 47 shots. This bound stems from the fact that not much information can be extracted regarding small ships. For example, in the following scenario the ship of size 2 can fit pretty much anywhere on the board, leading to the algorithm spending many shots finding out where it is:

Figure 6: example of an extreme case in which the algorithm has difficulty finding the smallest ship.

On the 20x20 board we see a much larger increase in the score (which is the important statistic for us): about a 15% increase in the effectiveness of the algorithm (32 shots faster). This probably stems from the fact that larger samples are required for the Monte Carlo analysis to give reliable results on a 20x20 board.

Have we surpassed the humans?

Our initial goal was to create an algorithm that could beat most humans at the game; however, it turns out that playing thousands of games against a computer is quite a tiresome task, so we decided to build a heuristic that plays quite similarly to how most humans play and let it fare against our algorithm.
So we sat down with several people and asked them how they would play the game. Simply put, our heuristic tries to find ships randomly while keeping a margin of one square between shots, since the smallest ship has size 2. When the heuristic finds a ship, it tries its best to destroy it, aiming for nearby squares.

Figure 7: example of how the heuristic works; the checkerboard pattern seen near (5,8) turns out to work quite effectively.

The following results were obtained on boards with the following properties:
- Size of 10x10 or 20x20, respectively
- One ship of size 5
- One ship of size 4
- Two ships of size 3
- One ship of size 2

Each agent (completely random, heuristic, human and POMDP) played the same 100 boards (including the poor human).

[Bar charts of mean scores. 10x10 board: awesome AI 52, human 43, heuristic 40, with the random agent far behind. 20x20 board: awesome AI 248, human 229, heuristic 219, with the random agent again far behind.]

Figure 8: comparison of the mean scores of the different agents.
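The hunt/target behavior described above can be sketched as follows. This is an illustrative reconstruction, not our heuristic's actual code: it hunts on one parity of a checkerboard (sufficient because every ship of length at least 2 must touch both parities), and switches to firing at neighbours of known hits. Like the paper's agents, it does not know when a ship is sunk, so it simply fires until the pattern is exhausted:

```python
import random

def heuristic_agent(n, is_hit):
    """Human-like heuristic: hunt on a checkerboard pattern; on a hit,
    switch to target mode and fire at neighbours of the hit square.
    `is_hit(cell)` reports whether a shot at `cell` hits a ship."""
    hunt = [(r, c) for r in range(n) for c in range(n) if (r + c) % 2 == 0]
    random.shuffle(hunt)
    fired, shots = set(), []
    targets = []  # neighbours of hits still to be tried
    while hunt or targets:
        cell = targets.pop() if targets else hunt.pop()
        if cell in fired:
            continue
        fired.add(cell)
        shots.append(cell)
        if is_hit(cell):
            r, c = cell
            for nb in [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]:
                if 0 <= nb[0] < n and 0 <= nb[1] < n and nb not in fired:
                    targets.append(nb)
    return shots  # the full firing sequence
```

Because target squares are tried immediately after a hit, a found ship is finished off before the agent resumes hunting, which matches the behavior we observed in our human players.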
As we can see from Figure 8, on the 10x10 board our algorithm fares about 20% better than the human (around 9 shots faster per board) and about 30% better than our heuristic (around 12 shots faster per board), which means it is pretty effective and really not fun to play against. On the 20x20 board our algorithm fares about 8% better than the human agent (around 19 shots faster per board) and about 13% better than our heuristic (around 29 shots faster per board).

Conclusions

We started out on our journey to destroy yet another wonderful game humans tend to delve into for hours on end. We feel we have succeeded in our quest, using the power of statistical analysis. We have seen that some algorithms do better than others when it comes to satisfying the constraints in our problem, and that, as one could have expected, using a larger sample for our statistical analysis yields better results. We have seen, however, that there exists a cap which is quite hard to beat no matter what kind of algorithm you are using, assuming that the board you are trying to discover is completely random.

Further research could concentrate on obtaining results for a similar algorithm using a significantly higher number of randomized boards, finding an upper bound on the average number of shots needed to win the game, or finding layouts which are likely to maximize a player's chance of survival.

References

Silver, D. and Veness, J. "Monte-Carlo Planning in Large POMDPs".