Training a Neural Network for Checkers

Daniel Boonzaaier
Supervisor: Adiel Ismail
June 2017

Thesis presented in fulfilment of the requirements for the degree of Bachelor of Science Honours at the University of the Western Cape

1 Declaration

I, Daniel Boonzaaier, declare that this thesis, Training a Neural Network for Checkers, is my own work, that it has not been submitted before for any degree or assessment at any other university, and that all the sources I have used or quoted have been indicated and acknowledged by means of complete references.

Signature: ..............
Date: .......
Printed Name: Daniel Boonzaaier

2 Abstract

This project will attempt to create a learning program that teaches itself how to play checkers using a neural network. Learning will be done by having the program play checkers numerous times and then evolve based on the outcomes of each game played.

Contents

1 Declaration
2 Abstract
3 Introduction
4 Proposal
5 Project Plan
6 User Requirements
7 Requirements Analysis
8 Interface
  8.1 Interface for training
  8.2 Interface for the game
9 High Level Design
  9.1 Interface
  9.2 Player
  9.3 Checkers
  9.4 Training
  9.5 Neural Network
  9.6 PSO (Particle Swarm Optimization)
10 Low Level Design
  10.1 The Checkers Game
    10.1.1 Play Checkers Game Algorithm for random moves
    10.1.2 Finding Checkers Moves
  10.2 Neural Network
    10.2.1 Neural Network's use within a game of checkers
    10.2.2 Details on the Neural Network
  10.3 Particle Swarm Optimization (PSO)
    10.3.1 PSO Algorithm
    10.3.2 Particle velocity function
    10.3.3 Function for particle's position
    10.3.4 Fitness value

3 Introduction

Machine learning is not a new concept in computer science. Arthur L. Samuel's "Some Studies in Machine Learning Using the Game of Checkers" was originally published in July 1959. Machine learning is about computer algorithms that allow a computer to learn, and is related to a branch of statistics called computational learning theory. Neural networks are one type of machine learning method; they use input nodes connected to a hidden layer via weights, which are in turn connected to an output layer via more weights. Checkers is a good game to train on because it is a game of complete information, which means that the complete state of the game is known to both players at all times.

4 Proposal

This project will attempt to create a learning program for the game of checkers that uses a neural network to autonomously learn to play checkers from scratch. In other words, the program will teach itself how to play checkers. The main focus of this project will be training a neural network by having the program play the game of checkers a multitude of times. By having the program play against itself, the goal is that the program will learn how to play from a predetermined set of rules. Care will have to be taken to ensure that the neural network for this program is not overtrained.

5 Project Plan

This project has four main parts that will take place throughout 2017, one in each quarter of the year. The first part, the analysis of the project, is the research into what the project will require and the analysis of those requirements from the standpoint of the user and the software. Researching past works related to the project, as well as related technologies and software, will help guide the project's development. The second part is the project's design and development. This entails the creation of a User Interface Specification and prototype, from which an Object-Oriented Analysis and then an Object-Oriented Design can be done. This will take a closer look at the setup of the neural network and other related software. The third part is the project's implementation, where the completed design will be used to create the learning program. The implementation will need full documentation. The final part is the project's testing, evaluation and presentation. Here the created program will be tested to determine whether it works according to expectations, and refined if needed.

6 User Requirements

The program shouldn't require much from the user. The user simply needs to determine whether or not the program has developed in its playing ability. The program will need to play checkers in some way and learn the best moves to make in order to win. How the neural network will work, its layout and its application will need to be well thought out, as will the interface, be it graphical or otherwise. The program simply needs to show that it has learned how to play checkers without outside help from a user. Playing against the program should be possible and could be developed further, but that is not the goal.

7 Requirements Analysis

There are previous works on autonomous game-playing systems that involve various games, checkers included. One such work, mentioned in the introduction, is Arthur L. Samuel's implementation. Another implementation was done by Nelis Franken and Andries P. Engelbrecht, who used Particle Swarm Optimisation in their game-playing program. Thus one implementation of the checkers learning/playing program may train the neural network using gradient descent with backpropagation, and a second may use the particle swarm optimisation technique to update the weights. All possible moves need to be analysed, and the system should determine which move is the best one to make in order to win. The program will need a look-ahead to determine possible moves. Testing the program should be as simple as checking whether the program follows the rules set out for the game of checkers and whether it has learned to play the game properly and at a decent level of competency.

8 Interface

This program will be interacted with in two different ways: the first during the training of the neural network, and the second in the checkers game itself.

8.1 Interface for training

There will not be a graphical user interface for the training portion. The interaction that takes place during training will not be accessed by a user and will only produce output showing what is happening during the training process. This output will be visible from within an integrated development environment (IDE) such as PyCharm.

Figure 1: Example of output for checkers games played.

8.2 Interface for the game

The game of checkers that a user can play will have a graphical user interface. It will have a simple start button and quit button on the first screen.

Figure 2: Simple Checkers Game GUI.

Once the start button is pressed, a game of checkers will begin in which the user plays against the trained neural network. The quit button will end the program. The game board will be shown; the user can click on a piece to move and then on the tile the player wishes to move said piece to. If the move is valid the piece will move, otherwise an error message will be shown telling the user that it is an invalid move.

Figure 3: Example of Checkers Game Board GUI.

When it is the program's turn to play, the board will be evaluated by the neural network and its selected move will be made. The player may exit the game at any time.
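As a rough illustration only, the start screen described above could be sketched in Python (the project's implied language, given PyCharm) using tkinter; the toolkit choice is an assumption here, not a project decision.

import tkinter as tk

def start_game():
    # Placeholder: the real program would hide this screen and draw the board.
    print("Starting a game of checkers...")

root = tk.Tk()
root.title("Checkers")
tk.Button(root, text="Start", width=16, command=start_game).pack(padx=40, pady=(40, 8))
tk.Button(root, text="Quit", width=16, command=root.destroy).pack(padx=40, pady=(8, 40))
root.mainloop()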

9 High Level Design

This high level design will attempt to give an explanation of the components that will make up this project, with a brief explanation of each component and of how the components will interact.

Figure 4: Interaction of Components.

9.1 Interface

The interface will be the playable game that a user will interact with. This interface will interact with the Checkers component, which controls the game being played and the rules of checkers.

9.2 Player

The Player refers to any user who will play the game of checkers. The player interacts with the interface, which in turn interacts with the Checkers component. The player will indicate to the interface what they want to do, and the interface will ask Checkers to determine whether the requested action is valid for the game.

9.3 Checkers

Checkers refers to the rules and computations behind the interface that make the game work. It will be responsible for the rules of the game and for the neural network that plays against a Player. Checkers will interact with the Neural Network whenever a game is being played, both during training and when a user plays against the program. Checkers also interacts with Training.

Figure 5: Basic Depiction of program playing checkers for Training.

9.4 Training

Training will be done before any user can play a game of checkers. Training involves playing games of checkers in which the neural network of each agent in the training phase determines which moves should be made. Once a number of games have been played this way, a score, or victory score, is calculated from the number of wins, losses and draws: plus one for a win, minus two for a loss and zero for a draw. Training interacts with the PSO to update the weights of all the agents' Neural Networks after each agent's score is calculated.

9.5 Neural Network

The Neural Network will work for any board by taking as input a thirty-two element vector of all board positions. The Neural Network will then output a score based on the positions on the board. This score is obtained for each possible move, and the move with the greatest score should be chosen as the best move for that Neural Network.

9.6 PSO (Particle Swarm Optimization)

The Particle Swarm Optimization will work on vectors made up of all the weights of each agent's Neural Network. The victory scores of the agents will be used to determine the global best. The weights in the vectors will be updated according to a velocity function that takes into account the global best and each agent's personal best.

10 Low Level Design

The low level design provides more specific details on the components discussed in the high level design. It attempts to give a clear definition of the algorithms used in the creation of the program.

10.1 The Checkers Game

The core of this project is the neural network that will be trained to play checkers. However, to do this, one must first have a checkers game to play. A game will be played using the following algorithm (a small code sketch follows the list):

10.1.1 Play Checkers Game Algorithm for random moves

1. Run through all of the current player's checkers pieces.
2. If the piece is a normal piece:
   (a) Check the positions on the left forward diagonal and right forward diagonal.
   (b) If a position is open, add it to the list of possible moves.
   (c) If a position contains an opponent's piece:
      i. Check the position diagonally behind the opponent's piece.
      ii. If that position contains any piece, ignore it.
      iii. If that position is open, check for further jumps until all jumps have been exhausted, repeating steps 2(a) and 2(c), then add the sequence to the list of possible jump moves.
3. If the piece is a King:
   (a) Check the positions on the left forward, right forward, left back and right back diagonals.
   (b) If a position is open, add it to the list of possible moves.
   (c) If a position contains an opponent's piece:
      i. Check the position diagonally behind/in front of the opponent's piece.
      ii. If that position contains any piece, ignore it.
      iii. If that position is open, check for further jumps until all jumps have been exhausted, repeating steps 3(a) and 3(c), then add the sequence to the list of possible jump moves.
4. If there are possibilities to jump and take an opponent's piece, one of these must be chosen at random.
5. Else choose a random move from all possible non-jump moves.
6. If all of the opponent's pieces have been removed from the board, the current player wins. Stop if the game has been completed.
7. Change player: if Player 1, change to Player 2; if Player 2, change to Player 1.
8. Continue the above steps until the game reaches a conclusion, or declare a draw after 100 moves.
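The following is a minimal, runnable sketch of this loop in Python, assuming the 32-square numbering of Figure 6 (section 10.1.2). Kings and jump/capture handling (steps 2(c), 3(c) and 4) are omitted for brevity, so games here end only when a player is blocked or when the 100-move limit is reached; the helper simple_moves and the board encoding are illustrative assumptions, not the project's code.

import random

def simple_moves(n, down):
    """Non-jump destinations for a piece on square n (0-31), moving down
    the board (increasing indices) or up (decreasing), per 10.1.2."""
    if down:
        dests = [n + 4] + ([n + 3] if n % 8 in (1, 2, 3) else
                           [n + 5] if n % 8 in (4, 5, 6) else [])
    else:
        dests = [n - 4] + ([n - 5] if n % 8 in (1, 2, 3) else
                           [n - 3] if n % 8 in (4, 5, 6) else [])
    return [d for d in dests if 0 <= d < 32]

def play_random_game(max_moves=100):
    board = [1] * 12 + [0] * 8 + [2] * 12   # 1/2 = players' pieces, 0 = empty
    player = 1
    for _ in range(max_moves):
        legal = [(s, d) for s in range(32) if board[s] == player
                 for d in simple_moves(s, down=(player == 1)) if board[d] == 0]
        if not legal:
            return 3 - player                # blocked: the other player wins
        src, dst = random.choice(legal)      # step 5: random non-jump move
        board[src], board[dst] = 0, player
        player = 3 - player                  # step 7: change player
    return 0                                 # step 8: draw after 100 moves

print(play_random_game())                    # 0 (draw), 1 or 2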

10.1.2 Finding Checkers Moves

A list of all possible moves will be created for the 32 positions on the board where pieces can occur. The possible moves for a given board can then be extracted from this list by comparing which positions are open and which contain pieces, to determine which moves are valid. The algorithm to create this list of all possible moves is as follows:

Figure 6: Checkers Board with positions 0-31.

P = [0, 1, 2, ..., 31]
moves = []
for all n in P
    if (n % 8) >= 0 and (n % 8) < 4 then
        right move = n + 4
        if n % 8 = 0 then left move = no move
        else left move = n + 3
    if (n % 8) >= 4 and (n % 8) < 8 then
        left move = n + 4
        if n % 8 = 7 then right move = no move
        else right move = n + 5
    append [n, [left move, right move]] to moves
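Translated directly into Python, constructing this table might look as follows (a sketch; the NO_MOVE sentinel and the bounds check for destinations past square 31, which the pseudocode leaves implicit, are illustrative choices):

NO_MOVE = None  # illustrative sentinel for "no move"

def build_move_table():
    """Build the list of forward move destinations for squares 0-31,
    following the pseudocode above."""
    moves = []
    for n in range(32):
        if n % 8 < 4:
            right = n + 4
            left = NO_MOVE if n % 8 == 0 else n + 3
        else:
            left = n + 4
            right = NO_MOVE if n % 8 == 7 else n + 5
        # Destinations past square 31 fall off the board.
        left = left if left is not NO_MOVE and left < 32 else NO_MOVE
        right = right if right is not NO_MOVE and right < 32 else NO_MOVE
        moves.append([n, [left, right]])
    return moves

move_table = build_move_table()
print(move_table[0])   # [0, [None, 4]]
print(move_table[9])   # [9, [12, 13]]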

10.2 Neural Network

The neural network that will be used as the brain of the checkers playing program will first be trained. This training will be done to ensure that the neural network is able to play the game of checkers, and to play it with decent competency. At minimum, the neural network should play better than an opponent that chooses moves at random. During training, the neural network will be used to evaluate board positions and output a score for each board. These scores will be attributed to each of the available moves, and the move with the highest score should be chosen as the best move and carried out. This step will continue throughout the entirety of each played game. This part of the training is referred to as the feed forward.

10.2.1 Neural Network's use within a game of checkers

The move that should be played will be obtained by using the neural network to evaluate all the possible valid moves, as follows:

maxscore = set to a very large negative number
moveslist = obtain all possible valid moves
index = will contain the index of the selected move
for each move in moveslist do:
    boardcopy = create a copy of the game's board
        (the game board is a vector of size 32; each element represents a
        place on the board, as can be seen in Figure 6)
    perform the current move and update the vector boardcopy
    value = use boardcopy as input for the neural network and obtain its
        output, a scalar value
    if value is greater than maxscore then
        set maxscore equal to value
        set index to the index of the current move
return index as the move that should be played
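A Python sketch of this selection is shown below; network is any callable mapping a 32-element board vector to a scalar score, and apply_move is a hypothetical helper that plays a move on a copy of the board (neither name is taken from the project).

def select_best_move(board, moves_list, network, apply_move):
    """Return the index of the move the network scores highest.

    board      -- list of 32 elements, one per playable square (Figure 6)
    moves_list -- all valid moves for the side to play
    network    -- callable: 32-element vector -> scalar score
    apply_move -- hypothetical helper: (board, move) -> new board vector
    """
    max_score = float("-inf")
    index = None
    for i, move in enumerate(moves_list):
        board_copy = apply_move(list(board), move)  # play the move on a copy
        value = network(board_copy)                 # feed forward: scalar out
        if value > max_score:
            max_score = value
            index = i
    return index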

10.2.2 Details on the Neural Network

Figure 7: Neural Network Example.

The vector of 32 board positions works as the input to the neural network. At each node in the hidden layer, an activation value is calculated via a summation function and a sigmoidal evaluation function. Given Figure 7 above, the summation function works as follows:

Sum = X1*W1 + X2*W2 + X3*W3 + ... + Xn*Wn,

where X stands for the values of the inputs and W stands for the values of the weights. The sigmoidal evaluation function is:

f(n) = 1 / (1 + e^(-n)),

where n stands for the summed input value of the node.
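To make the computation concrete, here is a small, self-contained sketch of this feed forward in plain Python; the hidden-layer size of 16 is an assumption for illustration, and since the document does not say whether the output node is also squashed, the output here is the raw weighted sum.

import math
import random

def sigmoid(n):
    # f(n) = 1 / (1 + e^(-n))
    return 1.0 / (1.0 + math.exp(-n))

def feed_forward(board, hidden_weights, output_weights):
    """Score a board: 32 inputs -> sigmoid hidden layer -> scalar output."""
    hidden = [sigmoid(sum(x * w for x, w in zip(board, ws)))
              for ws in hidden_weights]        # Sum = X1*W1 + ... + Xn*Wn
    return sum(h * w for h, w in zip(hidden, output_weights))

# Example with random weights: 32 inputs, 16 hidden nodes (assumed size).
hidden_w = [[random.uniform(-1, 1) for _ in range(32)] for _ in range(16)]
output_w = [random.uniform(-1, 1) for _ in range(16)]
print(feed_forward([0] * 32, hidden_w, output_w))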

10.3 Particle Swarm Optimization (PSO)

In order to improve the neural networks that are played against each other during training, their weights need to be updated after each iteration; in gradient-based training this is the role of backpropagation, but here a Particle Swarm Optimization (PSO) algorithm will be used instead. Updating the weights iteration after iteration will lead to the neural networks evolving, eventually producing one or more networks to be used in the final checkers game. The PSO will work on all the weights of each neural network in a population of a certain size. The weights of each network make up a vector, and each vector corresponds to a particle in the algorithm. Therefore, if the swarm is of size n, there will be n particles representing the weights of the n neural networks, plus a copy of each particle representing its personal best. Each particle is updated according to two best values. The first is the best value that the particle has achieved so far: the personal best, or pbest. The second is the best value among all the particles in the swarm: the global best, or gbest.

10.3.1 PSO Algorithm

Initialise the particles from the weights of the neural networks.
For each particle:
    Calculate the fitness value.
    If the fitness value is better than the particle's best fitness value, pbest, set the current value as the new pbest.
Choose the particle with the best pbest and set it as the gbest.
For each particle:
    Calculate the particle's velocity.
    Update the particle's position.
Repeat the above steps for a certain number of iterations or until a minimum error criterion is met (a Python sketch of these updates appears after section 10.3.4).

10.3.2 Particle velocity function

V = V + (C1 * rand(0,1) * (pbest - present)) + (C2 * rand(0,1) * (gbest - present)),

where C1 and C2 are learning factors, rand(0,1) is a random value between zero and one, and present is the particle's current position.

10.3.3 Function for particle's position

present = present + V.

10.3.4 Fitness value

The fitness value is calculated for each particle, which is a vector of one neural network's weights. The fitness value is calculated after each neural network has played a number of games against other randomly selected neural networks. After each game, the fitness value is updated by +1 for a win, -2 for a loss and 0 for a draw.
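Putting 10.3.1-10.3.4 together, a minimal sketch of the PSO update in Python might read as below; the constants C1 = C2 = 2.0, the swarm size and the toy fitness function are illustrative stand-ins (the real fitness comes from played games, as described in 10.3.4).

import random

C1, C2 = 2.0, 2.0       # learning factors (illustrative values)
DIMENSIONS = 10         # length of each weight vector (illustrative)
SWARM_SIZE = 20
ITERATIONS = 50

def fitness(position):
    """Stand-in fitness: the real program plays games and scores
    +1 per win, -2 per loss, 0 per draw (section 10.3.4)."""
    return -sum(x * x for x in position)     # toy objective to maximise

particles = [[random.uniform(-1, 1) for _ in range(DIMENSIONS)]
             for _ in range(SWARM_SIZE)]
velocities = [[0.0] * DIMENSIONS for _ in range(SWARM_SIZE)]
pbest = [p[:] for p in particles]            # copy of each particle
pbest_fit = [fitness(p) for p in particles]
gbest = max(pbest, key=fitness)[:]

for _ in range(ITERATIONS):
    for i, present in enumerate(particles):
        for d in range(DIMENSIONS):
            # V = V + C1*rand*(pbest - present) + C2*rand*(gbest - present)
            velocities[i][d] += (C1 * random.random() * (pbest[i][d] - present[d])
                                 + C2 * random.random() * (gbest[d] - present[d]))
            present[d] += velocities[i][d]   # present = present + V
        f = fitness(present)
        if f > pbest_fit[i]:                 # update the personal best
            pbest_fit[i], pbest[i] = f, present[:]
    gbest = pbest[max(range(SWARM_SIZE), key=lambda i: pbest_fit[i])][:]

print(gbest)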

References

[1] T. O. Ayodele. Types of Machine Learning Algorithms. University of Portsmouth, United Kingdom (2010).
[2] A. L. Samuel. Some Studies in Machine Learning Using the Game of Checkers. IBM Journal, Vol. 3, No. 3 (July 1959).
[3] R. Cranfill, C. Chang. What is Facebook's architecture?. Quora.com/What-is-Facebooks-architecture-6 (12/2014).
[4] K. Chellapilla, D. B. Fogel. Evolving Neural Networks to Play Checkers Without Relying on Expert Knowledge. IEEE Transactions on Neural Networks, Vol. 10, No. 6 (Nov 1999).
[5] N. Franken, A. P. Engelbrecht. Evolving intelligent game-playing agents. Proceedings of SAICSIT, pages 102-110 (2003).
[6] N. Franken, A. P. Engelbrecht. Comparing PSO structures to learn the game of checkers from zero knowledge. The 2003 Congress on Evolutionary Computation (2003).
[7] A. Singh, K. Deep. Use of Evolutionary Algorithms to Play the Game of Checkers: Historical Developments, Challenges and Future Prospects. Proceedings of the Third International Conference on Soft Computing for Problem Solving, Advances in Intelligent Systems and Computing 259 (2014).