Monte Carlo based battleship agent

Written by: Omer Haber (313302010) and Dror Sharf (315357319)

Introduction

The game of Battleship is a guessing game for two players that has been around for almost a century. In this paper we document our various attempts, and eventual success, at creating an agent that can beat most humans.

The rules of the game

The game is played on four grids, two for each player. The grids are typically square, usually 10x10, and the individual squares in the grid are identified by letter and number. On one grid the player arranges their ships and records the opponent's shots; on the other grid the player records their own shots.

Before play commences, each player secretly arranges their ships on their primary grid. Each ship occupies a number of consecutive squares on the grid, arranged either horizontally or vertically; the number of squares is determined by the type of the ship. Ships cannot overlap, i.e., only one ship can occupy any given square in the grid. The types and numbers of ships allowed are the same for each player, though they may vary depending on the rules; in our experiments we used various numbers and types of ships to see under which rules our agent performs better.

After the ships have been positioned, the game proceeds in a series of rounds. In each round, each player takes a turn to announce a target square in the opponent's grid which is to be shot at. The opponent announces whether or not the square is occupied by a ship: on a "miss" the defending player marks their primary grid with a white peg, and on a "hit" with a red peg. The attacking player notes the hit or miss on their own "tracking" grid with the appropriate color peg (red for "hit", white for "miss") in order to build up a picture of the opponent's fleet.

Figure 1: example of a standard Battleship game.
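These placement rules translate directly into code. The sketch below is a minimal illustration (the set-of-cells representation and function names are ours, chosen for exposition): a fleet is a set of occupied cells, and a ship of a given length may be placed horizontally or vertically as long as it stays on the grid and overlaps nothing already placed.

```python
def ship_cells(row, col, length, horizontal):
    """Cells covered by a ship whose top-left cell is (row, col)."""
    if horizontal:
        return [(row, col + i) for i in range(length)]
    return [(row + i, col) for i in range(length)]

def can_place(occupied, cells, n):
    """True if every cell lies on the n x n grid and is unoccupied."""
    return all(0 <= r < n and 0 <= c < n and (r, c) not in occupied
               for r, c in cells)
```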

When all of the squares of a ship have been hit, the ship is sunk, and the ship's owner announces this (e.g., "You sank my battleship!"). If all of a player's ships have been sunk, the game is over and their opponent wins. In our version of the game we have disregarded this announcement: the agent does not know whether it has sunk a ship or not.

Approach and Method

First of all, we state several properties of the game on which we based our solution.

Our 2-player game is not really for 2 players

The rules of the game state that each player makes a move on their own turn. However, every pair of grids in the game is dedicated solely to searching for and destroying one player's fleet, and is not affected by the events on the other pair of grids (other than, of course, the other player losing the game). It follows that all that really matters is the number of moves a player needs to bring down the other player's fleet, so we focused our efforts on building the best agent for bringing down the enemy as fast as possible. Our scoring system measures how quickly our agent destroys the enemy's fleet: the agent starts with n^2 points, and each shot it takes deducts 1 point from the final score, so an agent that shoots at every square on the board before finishing the game gets precisely 0 points.

Not all squares were created equal

Our game can be viewed as a partially observable Markov decision process (POMDP): the agent receives only a partial observation of the system at each stage (whether a previously chosen square is a hit or a miss), and each action has a statistical result. Crucially, some squares have a higher probability of holding a ship. For example, consider a simple 3x3 board with a single 1x2 ship. We would like to calculate the precise chance of the ship occupying a particular square, given a uniformly random placement.

Figure 2: all the possible positions of a ship of length 2 on a 3x3 board.
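The 3x3 example can be verified by brute force. The following sketch, building on the helpers above, enumerates every legal placement of the 1x2 ship and counts, for each square, how many placements cover it:

```python
from collections import Counter

def placement_counts(n, length):
    """For each square, count the single-ship placements that cover it."""
    counts = Counter()
    for row in range(n):
        for col in range(n):
            for horizontal in (True, False):
                cells = ship_cells(row, col, length, horizontal)
                if can_place(set(), cells, n):
                    for cell in cells:
                        counts[cell] += 1
    return counts

counts = placement_counts(3, 2)
for row in range(3):
    print([counts[(row, col)] for col in range(3)])
# Output:
# [2, 3, 2]
# [3, 4, 3]
# [2, 3, 2]
```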

As one can see, the total number of placements that put a ship on the center square is double the number that put a ship on any particular corner. Therefore, if we assume that a player lays out their ships at random, our first move should probably be at the center.

We could in theory repeat this calculation for every observation we make on the board: take all the possible boards, find the most likely square, and fire at it. However, we would then face a computation that becomes intractable even on medium-sized boards: a 10x10 board with 5 ships admits approximately 70 billion different layouts. Going through all of them and eliminating those which do not satisfy the constraints stemming from the history of observations is an impossible task with standard computational power, so we needed something more feasible.

The Monte Carlo method

The name "Monte Carlo methods" refers to a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results, usually for problems where precise mathematical methods are difficult or impossible to apply. In our case, having understood the infeasibility of a full solution, we concluded that randomizing a large number of boards that agree with our observations of the world would usually give a close approximation of the per-square boat distribution. More precisely, our algorithm randomizes a certain number of boards at the beginning of the game; on each turn it eliminates the boards that do not agree with the observations of the real board, randomizing new candidate boards until suitable ones are found. After this is done, for each square we count the number of sampled boards that have a boat on it, and we fire at the most probable square.
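In outline, the per-turn loop looks like the sketch below (an illustration of the idea, with our own function names); random_board, a generator of boards consistent with the observations so far, is sketched in the next section:

```python
from collections import Counter

def consistent(board, hits, misses):
    """A sampled board (a set of ship cells) agrees with the observations
    if it covers every observed hit and none of the observed misses."""
    return hits <= board and not (misses & board)

def refresh_pool(boards, hits, misses, sample_size, random_board):
    """Drop boards ruled out by the latest observation, then randomize
    new candidates until the pool is back at full size."""
    pool = [b for b in boards if consistent(b, hits, misses)]
    while len(pool) < sample_size:
        board = random_board()
        if consistent(board, hits, misses):
            pool.append(board)
    return pool

def choose_shot(boards, tried):
    """Fire at the untried square that holds a ship on the most boards."""
    counts = Counter(cell for board in boards for cell in board
                     if cell not in tried)
    return max(counts, key=counts.get)
```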

Figure 3: On the left, the board we play on. Legend: red square, a discovered square with a ship (hit); green square, a discovered square without a ship (miss); teal square, an undiscovered square with a ship; blue square, an undiscovered square without a ship. On the right, a heat map showing the distribution of probabilities, where red is most probable and black is least probable.

Satisfying the constraints

An issue that arises is the need to randomize boards in a way that satisfies the constraints stemming from the observations. One might think that simply randomizing boards until one agrees with the observations is good enough, but this is not the case. Suppose, for example, that we have hit almost every square on the board in such a way that only one viable arrangement of the boats remains; our algorithm would then be trying to randomize a single board out of roughly 70 billion options. Again, we needed something more feasible.

Although this may look like a pure feasibility question, running time was a major factor in its own right: the longer it took to randomize a valid board, the fewer boards we could use in each iteration, which in turn made our Monte Carlo estimate less accurate and therefore less reliable.

To reduce the number of options, we first disallowed placing boats on squares where we had already missed. That alone did not work well enough, so we also gave priority to placing boats on squares where we knew we had hit. Finally, to make sure we do not always try to place the same boat at the same spot, we randomize the order in which the boats are placed. A sketch of the resulting procedure is shown below.
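The sketch below illustrates the resulting board generator (our own reconstruction, not the code itself): boats are never placed on known misses, placements covering an as-yet-uncovered hit are tried first, and the boat order is shuffled. On a dead end this version simply throws the partial board away and starts over; the Results section next compares that restart strategy with a backtracking alternative.

```python
import random

def all_placements(length, n, occupied, misses):
    """Every legal placement of one ship, avoiding occupied cells and misses."""
    out = []
    for row in range(n):
        for col in range(n):
            for horizontal in (True, False):
                cells = ship_cells(row, col, length, horizontal)
                if can_place(occupied, cells, n) and not (set(cells) & misses):
                    out.append(cells)
    return out

def placements_through(cell, length, n, occupied, misses):
    """Legal placements of one ship that cover the given cell."""
    r0, c0 = cell
    out = []
    for horizontal in (True, False):
        for offset in range(length):
            row, col = (r0, c0 - offset) if horizontal else (r0 - offset, c0)
            cells = ship_cells(row, col, length, horizontal)
            if can_place(occupied, cells, n) and not (set(cells) & misses):
                out.append(cells)
    return out

def random_board(n, ship_lengths, hits=frozenset(), misses=frozenset()):
    """Randomize one board consistent with the observed hits and misses."""
    while True:                                  # dead end: fresh restart
        occupied = set()
        lengths = list(ship_lengths)
        random.shuffle(lengths)                  # randomize boat order
        for length in lengths:
            uncovered = list(hits - occupied)
            options = []
            if uncovered:                        # priority: cover a known hit
                options = placements_through(random.choice(uncovered),
                                             length, n, occupied, misses)
            if not options:
                options = all_placements(length, n, occupied, misses)
            if not options:
                break                            # stuck, restart from scratch
            occupied.update(random.choice(options))
        else:
            if hits <= occupied:                 # all observed hits covered
                return occupied
```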

Results

Time matters: backtracking vs. fresh random boards

As we have already discussed, the running time of our algorithm was a real issue, and since most of it was spent randomizing boards that agree with our observations, we tried several approaches to this problem. The two approaches found to be most effective were:

1. When randomizing a board and hitting a dead end, discard it, randomize another board, and start the whole process over.
2. Backtrack: take out the last ship that did not fit and change its location so that it satisfies the constraints; after several unsuccessful tries, backtrack once more, repeating this process recursively.

To check the effectiveness of each method, we let both of them try to completely fill boards of different sizes with different ships, and compared the results.

Figure 4: average ratio (Y axis) between the times taken by the recursive (backtracking) and random (fresh restart) methods to fill boards of different sizes (X axis); the red columns are there to give a sense of magnitude to the blue columns.

As we can see, the fresh randomization algorithm fared between 1.15 and 9.5 times better in every situation it faced (aside from one case), and its advantage grows with the board size. This result probably stems from the fact that when we backtrack over an already badly-positioned ship, we spend a lot of time trying to solve an unsolvable subproblem before finally deciding to take out the first ship.

Effects of sample size

Another thing we wanted to test was the effectiveness of our algorithm as a function of the number of boards evaluated at each step. The following results were obtained on boards with these properties:

- Size of 10x10 or 20x20, respectively
- One ship of size 5
- One ship of size 4
- Two ships of size 3
- One ship of size 2

The agents played the same 100 boards in each test.
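Putting the earlier sketches together, each test can be driven by a loop like the following (illustrative; play_game and its signature are ours). It plays one full game against a hidden board and returns the score, n^2 minus the number of shots; averaging the scores over the same 100 pre-generated boards for different pool sizes yields the curves in Figure 5.

```python
def play_game(true_board, n, ship_lengths, sample_size=100):
    """Play one game with the Monte Carlo agent; return n*n - shots."""
    hits, misses, tried = set(), set(), set()
    boards = [random_board(n, ship_lengths) for _ in range(sample_size)]
    shots = 0
    while len(hits) < sum(ship_lengths):         # fleet not yet destroyed
        boards = refresh_pool(
            boards, hits, misses, sample_size,
            lambda: random_board(n, ship_lengths,
                                 frozenset(hits), frozenset(misses)))
        shot = choose_shot(boards, tried)
        tried.add(shot)
        (hits if shot in true_board else misses).add(shot)
        shots += 1
    return n * n - shots
```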

Figure 5: Average score (Y axis) as a function of the number of evaluated boards at each step (X axis).

As we can see from the graph, there is about a 13% increase in the effectiveness of the algorithm on a 10x10 board (6 shots faster), capping at around 47 shots. This cap stems from the fact that not much information can be extracted about small ships. For example, in the scenario below, the ship of size 2 can fit almost anywhere on the board, so the algorithm spends many shots finding out where it is:

Figure 6: example of an extreme case; the algorithm has difficulties finding the smallest ship.

On the 20x20 board we see a much larger increase in the score (which is the statistic that matters to us): about a 15% increase in effectiveness (32 shots faster). These results probably stem from the fact that larger samples are required for the Monte Carlo analysis to give reliable results on a 20x20 board.

Have we surpassed the humans?

Our initial goal was to create an algorithm that could beat most humans at the game; however, it turns out that playing thousands of games against a computer is quite a tiresome task, so we instead built a heuristic that plays much like most humans do and let it compete against our algorithm.

So we sat down with several people and asked them how they would play the game. Simply put, our heuristic tries to find ships by firing randomly while keeping a margin of one square between shots, since the smallest ship has size 2. When the heuristic finds a ship, it tries its best to destroy it, aiming for nearby squares; a sketch of this hunt-and-target behavior follows below.

Figure 7: example of how the heuristic works; the checkerboard pattern seen near (5,8) turns out to work quite effectively.
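A simplified illustrative sketch of the hunt-and-target behavior (the function name and details are ours):

```python
import random

def heuristic_shot(n, hits, tried):
    """Hunt on a checkerboard parity (which intersects every ship of
    length >= 2); once a ship is hit, target its untried neighbors."""
    # Target mode: try the untried neighbors of squares we have hit.
    for (r, c) in hits:
        for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nb[0] < n and 0 <= nb[1] < n and nb not in tried:
                return nb
    # Hunt mode: a random untried square with (row + col) even.
    candidates = [(r, c) for r in range(n) for c in range(n)
                  if (r + c) % 2 == 0 and (r, c) not in tried]
    if not candidates:                           # parity exhausted: fall back
        candidates = [(r, c) for r in range(n) for c in range(n)
                      if (r, c) not in tried]
    return random.choice(candidates)
```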

The following results were obtained on boards with these properties:

- Size of 10x10 or 20x20, respectively
- One ship of size 5
- One ship of size 4
- Two ships of size 3
- One ship of size 2

Each agent (completely random, heuristic, human, and POMDP) had to play the same 100 boards (including the poor human).

Figure 8: comparison of the mean scores of the different agents (values reconstructed from the original bar charts):

Agent        Mean score, 10x10    Mean score, 20x20
awesome AI   52                   248
human        43                   229
heuristic    40                   219
random       5                    20

As these results show, on the 10x10 board our algorithm fares about 20% better than the human (around 9 shots faster per board) and about 30% better than our heuristic (around 12 shots faster per board), which means it is quite effective and really not fun to play against. On the 20x20 board our algorithm fares about 8% better than the human agent (around 19 shots faster per board) and about 13% better than our heuristic (around 29 shots faster per board).

Conclusions

We started out on our journey to destroy yet another wonderful game that humans delve into for hours on end, and we feel we have succeeded in our quest using the power of statistical analysis. We have seen that some algorithms do better than others at satisfying the constraints of our problem, and that, as one would expect, a larger sample for the statistical analysis yields better results. We have also seen that there is a cap that is quite hard to beat no matter what algorithm is used, assuming that the board being discovered is completely random.

Further research could concentrate on results for a similar algorithm using a significantly higher number of randomized boards, on finding an upper bound on the average number of shots needed to win the game, or on finding layouts that are likely to maximize a player's chance of survival.

References

David Silver and Joel Veness. "Monte-Carlo Planning in Large POMDPs." Advances in Neural Information Processing Systems (NIPS), 2010.