Creating a Dominion AI Using Genetic Algorithms

Mok Ming Foong

Abstract

Dominion is a deck-building card game. It allows for complex strategies, involves randomness in card drawing, and has no obvious optimal solution, making it a good candidate for a machine learning study. Prior works use genetic algorithms in conjunction with an artificial neural network (ANN), but they rely on many features and do not explore the significance of individual features based on a player's understanding of the game. The purpose of this project is to investigate whether it is possible to create a strong agent for the game using fewer features chosen from prior knowledge. In this project, a genetic algorithm is used to create an agent to play the game. Its chromosome serves as a guide when the agent makes decisions in the Buy Phase of the game. Candidates in a generation are pitted against each other in a round robin, and the number of wins serves as a measure of fitness and is used in determining the next generation. The candidates were also pitted against a benchmark candidate to measure their fitness.

Introduction

Dominion is a deck-building card game. While numerous studies have been done on games such as Chess and Go, little emphasis is placed on modern games such as Dominion. Games such as Dominion allow for complex strategies and multiple playstyles, which may serve as a useful testbed for various machine learning approaches. The presence of circularities, where one type of deck has an advantage over another in a circular manner, also means that there is no easily obtainable optimal solution for this problem.

Deck-building games usually involve two key decision points in two of the game's key phases:

Action Phase - solving the problem of how to best play the hand for the current turn.
Buy Phase - solving the problem of how best to build the deck (acquire new cards) to optimize it for future turns.

For this project, an AI for the game was created using a genetic algorithm, but only for the deck-building portion of the game (the Buy Phase). A chromosome consists of 3 x 13 positive integers, where 3 integers represent 3 parameters of a specific card in the game. Using these chromosomes as a guide, the agent aims to build towards a target deck. Using a fixed card-playing strategy across all candidates for the Action Phase, each candidate plays against every other candidate. Based on the number of wins of each candidate, genetic operators are then applied to the chromosomes of these candidates. The fitness of each candidate is also measured against a previously generated candidate from another experiment run.

Related Work

Little prior work has been done on Dominion as a machine learning study. Two previous works [1, 4] use genetic algorithms in conjunction with an Artificial Neural Network (ANN) to train a Dominion AI. The complexity of decision making, due to the number of card effects in the game and the numerous considerations a real player would have to make, is one of the key reasons why a neural network is used. A genetic algorithm serves as a good optimization approach for a problem such as Dominion, where the presence of circularities means that there exists no single optimal solution, and different candidates are strong or weak depending on their opponent. Prior works usually simulate the candidates in a round-robin tournament and use the number of wins of each candidate to measure fitness. However, each work applies a different method of calculating a candidate's fitness. Many different features and neural network designs are investigated across these works, and there appears to be no single optimal model.
Features

A reduced number of cards was used in this study for a simpler implementation of the game, while still allowing for the development of complex strategies. 3 Gold Cards, 3 Victory Cards, and 7 Kingdom Cards were used. Each card has its own cost and card effect (refer to the Appendix). The 3 Gold Cards are Copper, Silver, and Gold. The 3 Victory Cards are Estate, Duchy, and Province. The 7 Kingdom Cards are Chapel, Village, Woodcutter, Smithy, Market, Laboratory, and Festival; each has a different card effect.

Each card contributes 3 positive integers to the chromosome, giving 39 parameters for these 13 cards. The parameters are:

Preference of buying the card over other cards - a higher preference means that the candidate will buy this card over other cards.
Turn delay before buying the card - the number of times that the opportunity to buy the card is skipped. This prevents the AI from buying cards that are only important at the end of the game (e.g. Victory Cards) at the beginning.
Card limit in deck - where possible, the agent should limit the number of copies of this card in the deck to this value.

These parameters are based on a personal understanding of the important considerations in the game. The chromosomes were randomly initialized, with constraints set on some of the parameters. Each candidate was then pitted against the other candidates in a round robin, and the distribution of wins was used to determine the next generation.

Methods

The Dominion agent plays the Action Phase and the Buy Phase in two different ways. The significantly different decision-making processes of these two phases make it easier and more meaningful to consider them separately. The Action Phase is played according to a fixed set of rules, while the Buy Phase is played using the chromosome as a guide. The scope of the genetic algorithm is limited to the Buy Phase for simplicity and ease of implementation.

Action Phase

For the Action Phase, the agent plays according to a fixed set of rules: cards in the hand are played in the order that maximizes the expected Gold value for the turn. Permutations of all possible card-playing orders are calculated, and the order which results in the highest expected Gold value is used. This ensures that the agent can consistently buy/acquire better cards. Expected Gold value is calculated from the current Gold in hand and the average Gold per card in the deck. Under these conditions, it is possible for a simulation to take unnecessarily long due to inefficient decision making by the agent, so a limit of 100 turns per game is set. This is more than double the turns a human player requires, and is sufficient for the simulation. A sketch of this strategy follows.
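To make this fixed strategy concrete, below is a minimal sketch of the play-order search. It is illustrative only: the `Card` type and its fields are hypothetical simplifications introduced here, and the Gold model follows the paper's description (current Gold in hand plus the deck's average Gold per card for each drawn card).

```python
from dataclasses import dataclass
from itertools import permutations

@dataclass(frozen=True)
class Card:
    """Hypothetical card model; only the effects the Gold estimate needs."""
    name: str
    gold: int = 0          # Treasure value (Copper=1, Silver=2, Gold=3)
    is_action: bool = False
    plus_cards: int = 0    # e.g. Smithy: +3 Cards
    plus_actions: int = 0  # e.g. Village: +2 Actions
    plus_gold: int = 0     # e.g. Market: +1 Gold

def expected_gold(hand, deck, order):
    """Expected Gold this turn if action cards are played in `order`.
    Each drawn card is valued at the deck's average Gold per card."""
    avg_gold = sum(c.gold for c in deck) / len(deck) if deck else 0.0
    gold = sum(c.gold for c in hand if not c.is_action)  # Treasures in hand
    actions = 1                                          # one action per turn
    for card in order:
        if actions == 0:
            break                       # no actions left; rest go unplayed
        actions += card.plus_actions - 1  # playing a card costs one action
        gold += card.plus_gold
        gold += card.plus_cards * avg_gold
    return gold

def best_play_order(hand, deck):
    """Brute-force all permutations of the action cards in hand (hands are
    small, so this is feasible) and keep the highest expected Gold."""
    action_cards = [c for c in hand if c.is_action]
    return max(permutations(action_cards),
               key=lambda order: expected_gold(hand, deck, order))
```

For example, with a Village and a Smithy in hand, this search plays the Village first, since opening with the Smithy would leave no action for the Village under the model above.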

Buy Phase

For the Buy Phase, the parameters in the chromosome are used as follows. Cards are bought (added to the deck) based on their preference score in the chromosome; should scores be equal, the card bought is selected at random. The buying of a card is further constrained by the card limit and turn delay parameters: the turn delay is reduced by 1 each time the opportunity to buy the card is skipped, down to a minimum of 0, and only when it reaches 0 will the agent buy the card.

Using these rules, the genetic algorithm is carried out as follows.

[Figure: Pipeline of the genetic algorithm]

Initialization

The 3 types of parameters mentioned above were initialized as follows:

Preference of buying a card was initialized with integers from 0 to 5.
Turn delay was initialized at 0.
Card limit was initialized with integers from 0 to 6.

These ranges are based on a personal understanding of what generally good values might be in the game. This was done to speed up the growth of the candidates, and should have little effect on the convergence of the solution given how small these values are relative to the changes made in the mutation phase.

Evaluation

All candidates play every other candidate in 3 simulated matches in a round-robin manner, using the fixed Action Phase strategy. Each candidate's number of wins across all matches is then calculated; this value is used with the genetic operators. Since the number of wins of a candidate is significantly affected by the fitness of the other candidates, another metric is needed to measure the effectiveness of the algorithm. For this, the fittest candidate from another run of the experiment is used as a benchmark: each candidate plays against the benchmark in 50 simulated games, and the number of wins is used as a measure of effectiveness.

Selection

The two candidates with the highest number of wins are kept as elites, and are not altered by any of the operators below.

Crossover

Each new candidate is generated via uniform crossover of two randomly selected parents. To ensure that fitter parents are used in the crossover, a sample space with duplicate parents is created, in which the number of copies of a candidate is proportionate to its number of wins. The two parents are then selected from this sample space to form the next generation.

Mutation

Finally, 20% of the new generation is mutated. For mutation, candidates are randomly selected and have a random parameter in the chromosome changed by a random value of +1 or -1. All random selections are uniform, so each outcome has an equal chance of being selected. A sketch of this pipeline follows.
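The sketch below ties the pipeline together under stated assumptions: the chromosome layout and initialization ranges are as described above, the parent pool duplicates each candidate once per win, and mutation is applied to each child with probability 0.2 (approximating "20% of the new generation"). Mutated values are floored at 0 on the assumption that parameters must stay non-negative; all function names are hypothetical.

```python
import random

NUM_CARDS = 13        # 3 Gold, 3 Victory, 7 Kingdom cards
POP_SIZE = 35
ELITES = 2
MUTATION_RATE = 0.2

def random_chromosome():
    """One candidate: a (preference, turn_delay, card_limit) triple per card,
    initialized in the ranges reported in the paper."""
    return [(random.randint(0, 5),    # buy preference
             0,                       # turn delay starts at 0
             random.randint(0, 6))    # card limit in deck
            for _ in range(NUM_CARDS)]

def uniform_crossover(a, b):
    """Uniform crossover: each card's triple comes from either parent
    with equal probability."""
    return [random.choice(pair) for pair in zip(a, b)]

def mutate(chrom):
    """Change one randomly chosen parameter by +1 or -1 (floored at 0)."""
    chrom = [list(genes) for genes in chrom]
    card = random.randrange(NUM_CARDS)
    slot = random.randrange(3)
    chrom[card][slot] = max(0, chrom[card][slot] + random.choice((1, -1)))
    return [tuple(genes) for genes in chrom]

def next_generation(population, wins):
    """wins[i] = round-robin wins of population[i]. The two fittest
    candidates survive unchanged; the rest are bred from a pool in which
    each candidate appears once per win (fitness-proportionate)."""
    ranked = sorted(range(len(population)), key=wins.__getitem__, reverse=True)
    new_pop = [population[i] for i in ranked[:ELITES]]        # elitism
    pool = [c for c, w in zip(population, wins) for _ in range(w)]
    if not pool:                      # degenerate case: no wins recorded
        pool = list(population)
    while len(new_pop) < POP_SIZE:
        child = uniform_crossover(random.choice(pool), random.choice(pool))
        if random.random() < MUTATION_RATE:
            child = mutate(child)
        new_pop.append(child)
    return new_pop
```

A full run would simply initialize `POP_SIZE` random chromosomes, then alternate a round-robin evaluation (which produces `wins`) with `next_generation` for the desired number of generations.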

Experimental Setup

The experiment was carried out with 35 candidates across 150 generations. The number of generations is high compared to the number of candidates; this is a workaround for the small population, which is itself due to the slow performance of the algorithm, as many games have to be simulated among all candidates in a round robin. These values were chosen after several rounds of testing and gave a good balance between convergence, learning, and performance.

[Figure: Score of the fittest candidate and the generation average when facing the benchmark candidate]

In the round robin, 3 games are played between each pair of candidates, and 50 games are played against the benchmark candidate. The benchmark candidate was generated from an experiment run with all parameters kept the same, but with only 50 generations. A low number of games is used to speed up the algorithm; it is possible that with a larger number of candidates, fewer games would need to be played between each pair. In the genetic algorithm, 2 of the 35 candidates are kept as elites, the mutation rate is set at 0.2, and crossover is uniform. The value for elitism is arbitrary, with the constraint that it be less than 10% of the number of candidates in a generation. The mutation rate is also arbitrary.

Results

The algorithm had mostly converged after 150 generations, resulting in candidates that achieve, on average, 20 wins out of 50 against the benchmark, with the generation-average curve intersecting that of the fittest candidate. The fittest candidate is able to beat the benchmark about 50% of the time. It is possible that running more generations would further improve the fittest candidate, but these results are sufficient to show the algorithm's effectiveness.

Discussion

The algorithm appears to perform well, with the fittest candidate beating the benchmark for the first time after 54 generations despite only 35 candidates per population. There is a large variance in the scores against the benchmark candidate. Apart from the random nature of card draws, this is likely because only 50 games were played against the benchmark; a smoother curve would likely appear if more games were played, as suggested by the average score curve. Due to the low number of candidates in a generation, it is possible that some genotypes remain unexplored even after 150 generations. In addition, the solution may converge to a local maximum, given that the sample space for the crossover was biased towards candidates with more wins. Because the benchmark was generated with the same algorithm as the experiment, it is also possible that the fittest candidate from the experiment and the benchmark candidate are solutions that revolve around the same local maximum, and may not be effective at all relative to the global maximum. There should ultimately be a benchmark that is generated separately and based on a commonly known good strategy (a minimal sketch of one such strategy follows).
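As an illustration of such a rule-based benchmark, here is a minimal, hypothetical sketch of the Big Money strategy suggested in the Conclusion. The buy thresholds follow the commonly taught version of the strategy, not anything specified in the paper.

```python
def big_money_buy(gold, provinces_left):
    """Classic Big Money buy rule: buy the best affordable Treasure,
    switching to Victory cards as the Province pile runs low (a common
    endgame refinement; exact thresholds vary between descriptions)."""
    if gold >= 8:
        return "Province"
    if gold >= 6:
        return "Duchy" if provinces_left <= 4 else "Gold"
    if gold >= 3:
        return "Estate" if provinces_left <= 2 else "Silver"
    return None  # buy nothing this turn
```

Such a benchmark is cheap to evaluate, independent of the evolved population, and known to perform well with the Treasure-heavy card set used here, which would address the shared-local-maximum concern raised above.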

Conclusion

The genetic algorithm appears to be a good method for creating an agent to play Dominion, as seen from the experiment and prior work. From the experiment, it seems that basing the model on only a few key features is sufficient to create an agent that can play Dominion. However, as mentioned in the Discussion, a different benchmark should be used to measure the effectiveness of the algorithm. One suggestion is to implement the Big Money strategy, a well-known beginner strategy that performs well with the cards used in the experiment. Multiple benchmarks can also be implemented to investigate the presence and effect of circularities.

In addition, with more computational resources, the number of candidates in a generation should be increased significantly. This should allow the algorithm to converge closer to a global maximum. The round robin can also be changed to reduce the number of games; one possible implementation is to pit each candidate against only a randomly sampled 10% of the other candidates. Lastly, since optimizing plays for the current turn (the Action Phase) is an equally important part of the game, another agent should be created to handle the decision making in the Action Phase, or in both phases. More cards can also be added to the experiment to allow for a closer representation of the full game.

References

1. Fynbo, R. B., & Nellemann, C. S. (2010). Developing an agent for Dominion using modern AI-approaches. M.Sc. IT, Media Technology and Games (MTG-T), Center for Computer Games Research.
2. Glimsdal, S. (2015, April). AIs for Dominion using Monte-Carlo Tree Search. In Current Approaches in Applied Artificial Intelligence: 28th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2015, Seoul, South Korea, June 10-12, 2015, Proceedings (Vol. 9101, p. 43). Springer.
3. Jansen, J. V., & Tollisen, R. (2014). An AI for Dominion based on Monte-Carlo methods.
4. Mahlmann, T., Togelius, J., & Yannakakis, G. N. (2012, June). Evolving card sets towards balancing Dominion. In 2012 IEEE Congress on Evolutionary Computation (pp. 1-8). IEEE.
5. Dominion Strategy. (n.d.). Retrieved December 16, 2016, from https://dominionstrategy.com