FINAL PROJECT: ARTIFICIAL INTELLIGENCE

VINDINIUM

Hosam Hakroush and Dmitry Levikov

FOUR LEGENDARY HEROES, FIGHTING FOR THE LAND OF VINDINIUM,
ROAMING IN THE DANGEROUS WOODS, SLASHING GOBLINS, STEALING GOLD MINES,
AND LOOKING FOR A TAVERN TO DRINK AWAY THEIR TIREDNESS

TABLE OF CONTENTS

Introduction
  Background
  Heroes
  Turns
  Mines
How To Run
Implementation
  Important Remarks
  Added/Changed Files
Reflex Agent
  Wiki
  Implementation
  Results
MiniMax Agent
  Wiki
  Implementation
  Results
Q_Learning Agent
  Wiki
  Implementation
  Results
  Comparison
Combined Agent
  Wiki
  Implementation
  Results
Conclusion

INTRODUCTION

BACKGROUND

Vindinium is a multiplayer, turn-based game in which each player develops, in a programming language of their choice, their ultimate legendary hero, which moves across a randomly generated 2D map. The objective is to amass as much gold as possible within a predetermined number of turns. To produce gold, players must take control of mines, which are guarded by goblins.

HEROES

Each hero moves one square per turn and has the following attributes:

- Life points (HP): starts at the maximum value of 100. If it drops to 0, the hero dies and immediately respawns at his initial position as given in the map's layout (see below).
- Gold: starts at 0. Gold represents the hero's goal in the game; the more the better, and the winner at the end of the game is the richest hero.

TURNS

Each bot must issue an order for its hero every turn. The possible orders are Stay, North, South, East and West. Once the order is issued, the hero moves accordingly (see below).

Movement

For each step the hero takes, he loses 1 HP; however, a hero's HP never drops below 1 from walking alone. If the hero:

- Steps into another hero, he stays in place and nothing happens; they fight when the current turn ends.
- Tries to step out of the map or onto a tree, nothing happens.
- Steps into a tavern, he stays in place, orders a beer and restores 50 HP for 2 gold coins (HP cannot exceed 100).
- Has a neighboring hero one diagonal or straight square away, he attacks him with his vicious sharp blade: the defending hero loses 20 HP, and if the defender dies, the attacker automatically gains control of all of his conquered mines.
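To make these rules concrete, here is a minimal sketch of how one move could be resolved in code. The board encoding ('#' tree, '[' tavern, '@' hero) and the apply_move function are purely our illustration, not part of the game's actual engine:

# Illustrative sketch of the movement rules above.
# Board encoding ('#' tree, '[' tavern, '@' hero) and the hero attributes
# (pos, gold, life) are our own assumptions, not the starter kit's API.

DIRS = {"Stay": (0, 0), "North": (-1, 0), "South": (1, 0),
        "East": (0, 1), "West": (0, -1)}

def apply_move(board, hero, order):
    # Move `hero` one square according to `order`, enforcing the rules.
    dy, dx = DIRS[order]
    y, x = hero.pos
    ny, nx = y + dy, x + dx
    rows, cols = len(board), len(board[0])
    if not (0 <= ny < rows and 0 <= nx < cols) or board[ny][nx] == "#":
        return                      # off the map or a tree: nothing happens
    tile = board[ny][nx]
    if tile == "[":                 # tavern: stay in place, drink for 2 gold
        if hero.gold >= 2:
            hero.gold -= 2
            hero.life = min(100, hero.life + 50)
        return
    if tile == "@":                 # another hero: stay, fight at end of turn
        return
    hero.pos = (ny, nx)
    hero.life = max(1, hero.life - 1)   # walking never drops HP below 1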

Maps

Maps are generated randomly, using a layout similar to Pac-Man's.

[Figure: an example of a simple map]

MINES

Each mine yields its owner 1 gold coin per turn. A mine can, however, be conquered by another hero as detailed above, after which all of its profits go to the new owner.

HOW TO RUN

The game did not ship with a ready-to-run client, and many configurations had to be made before one could see one's bot at work. We therefore changed the UI. To play a game, please follow these steps:

1. Unzip the code folder.
2. cd to that location.
3. Run the following command:

   python client.py [mode] [agent] [depth]

   where mode is one of the following:
   - arena: you play online, but you have to wait for other players to join your game.
   - training: you play offline against supplied bots.

   and agent is one of the following:
   - reflex: use our Reflex agent.
   - qlearning: use our Q_Learning agent.
   - combined: use our Combined agent.
   - minimax: use our MiniMax agent; it must be followed by a depth value.

4. Enter 1 to play the game.

After running, if you chose arena you will have to wait a bit until some players join. Then, once the game starts (even if it is against randomly assigned agents), the log prints a link to the online game; browse to that link and you can watch your hero play live.

*Note that the game requires Python 2.7 to run.
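For example, based on the options above, typical invocations might look like this (illustrative; the arguments follow the usage string given in step 3):

python client.py training reflex      # offline game with the Reflex agent
python client.py arena combined       # online arena game with the Combined agent
python client.py training minimax 2   # offline game with MiniMax at depth 2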

IMPLEMENTATION

Although the challenge offered a starter kit in Python, it lacked basic clarity, and reading the code thoroughly was a difficult task in itself. The kit included a random agent as an example of how to use the supplied functions and variables, along with a guide to moving your hero. Nevertheless, after weeks of sleepless nights we managed to build four working agents based on algorithms we learned in class during the semester. Our hard work paid off greatly: we are now even able to beat worldwide leaders.

IMPORTANT REMARKS

We quickly realized that we would have to use smart, complex agents to excel in this game, and, as stated earlier, the kit lacked certain essential components. We therefore defined many new helper files, the most important of which is the State class (defined in state.py), which represents each state in the game. A State object includes the following variables/methods:

- __init__(self, game): the State's initializer.
- generateSuccessor(self, the_hero, new_pos): generates a new State object describing the state that results from moving the_hero to position new_pos.
- game: the game itself, an object of the class Game.
- enemy_heroes: all the other heroes in the game, where a hero is an object of the inner class Game::Hero.
- enemy_locations: the locations of all the other heroes in the game, where a location is a tuple (x, y).
- nearest_enemy_pos, nearest_mine_pos, nearest_tavern_pos: the locations of the nearest enemy, nearest unowned mine and nearest tavern, found by a search algorithm defined in path_tools.py.
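To make the structure concrete, here is a minimal sketch of what such a State class might look like. The method bodies, the find_nearest helper and the Game attribute names (hero, heroes, mines, mines_locs, taverns_locs) are our illustration, not the project's verbatim code:

# Illustrative sketch of the State class described above; helper names and
# Game attributes are assumptions, not the project's exact code.
import copy

def find_nearest(start, targets):
    # Stand-in for the search in path_tools.py; here plain Manhattan distance.
    if not targets:
        return None
    return min(targets, key=lambda t: abs(t[0] - start[0]) + abs(t[1] - start[1]))

class State(object):
    def __init__(self, game):
        self.game = game                          # the Game object itself
        me = game.hero
        self.enemy_heroes = [h for h in game.heroes if h.id != me.id]
        self.enemy_locations = [h.pos for h in self.enemy_heroes]
        unowned = [m for m in game.mines_locs if game.mines.get(m) != me.id]
        self.nearest_enemy_pos = find_nearest(me.pos, self.enemy_locations)
        self.nearest_mine_pos = find_nearest(me.pos, unowned)
        self.nearest_tavern_pos = find_nearest(me.pos, game.taverns_locs)

    def generateSuccessor(self, the_hero, new_pos):
        # Return a new State describing the result of moving the_hero to new_pos.
        new_game = copy.deepcopy(self.game)
        new_game.heroes[the_hero.id - 1].pos = new_pos   # hero ids assumed 1-based
        return State(new_game)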

- actions: a list of Action objects. Action is a simple parent class (defined in actions.py) that contains the name of the action, where the action is located, and the path to that location. (The path is calculated by an algorithm we defined in path_tools.py which, given an initial position, uses a chosen heuristic to find the desired node quickly; we use the Manhattan distance as the default heuristic.) An action can be one of the following:
  o Tavern(Action): the closest tavern to us, where we can refill 50 HP for 2 gold coins.
  o Nothing(Action): do nothing.
  o Mine(Action): only if we do not yet own all the mines, a Mine object containing the closest unowned mine's details is appended to the actions list.
  o Fight(Action): for each other hero in the game, we add a Fight object containing details on how to reach and fight him.

After defining the State class and all the Action classes, we noticed that we cannot use gold alone as our score/goal in the game, because it misses many other aspects of what makes a state good or bad; for example, it is better to be close to a tavern when our HP is not full, and it is better for us when our foes are doing worse than we are. We therefore defined a state-evaluation function, evaluate_state (defined in heuristics.py): given a state, it evaluates it using many such aspects, including the ones in the examples above, and returns a score that our agents use. Also, the supplied files offered no way to run more than one game in a fast session (needed to teach our learning agent), so we had to develop a small emulator that pits 1 learning agent against 3 reflex agents (defined in emulator.py).
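As an illustration of the kind of weighted scoring described above, here is a minimal sketch of such an evaluation function. The specific features and weights are our own assumptions, not the actual contents of heuristics.py:

# Illustrative sketch of a state-evaluation function in the spirit of
# evaluate_state; the features and weights below are our assumptions.

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def evaluate_state(state):
    me = state.game.hero
    score = 10.0 * me.gold                        # gold is still the main goal
    score += 5.0 * me.mine_count                  # owned mines are future gold
    # prefer being near a tavern when HP is low
    if me.life < 50 and state.nearest_tavern_pos is not None:
        score -= manhattan(me.pos, state.nearest_tavern_pos)
    # prefer our foes doing worse than us
    score -= 2.0 * max(h.gold for h in state.enemy_heroes)
    return score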

ADDED/CHANGED FILES

File                 What it contains / what was changed
ai.py                Changed so that, after executing config.py, one of our agents is chosen.
actions.py           The Action classes.
combined_agent.py    Our Combined agent.
config.py            Using this file, one can choose which agent to play.
emulator.py          Our emulator.
heuristics.py        Two functions, evaluate_action and evaluate_state, which evaluate a given action/state.
minimax_agent.py     Our MiniMax agent.
path_tools.py        Path-finding functions.
q_learning_agent.py  1. Our learning agent. 2. SimpleExtractor, which returns all the features of a given game object. 3. IdentityExtractor, which does the same but returns a specific feature.
reflex_agent.py      Our Reflex agent.
state.py             The State class.
util.py              The Counter and PriorityQueue classes (as defined in the Pac-Man exercises).

REFLEX AGENT

WIKI

Reflex agents act only on the basis of the current percept, ignoring the rest of the percept history. Such an agent succeeds only if the environment is fully observable; when it faces a partially observable environment, infinite loops are often unavoidable.

IMPLEMENTATION

We worked very hard on our Reflex agent. As expected, it excelled at collecting gold, defending its mines and conquering others. It is based on computing scores with functions we defined ourselves and then choosing the action that yields the best score. We faced difficulties because the game has many aspects one has to take care of at once. Our Reflex agent does the following:

1. Gets all the legal moves using get_moves (defined in path_tools.py).
2. Calculates the maximum value achievable by any of the actions in state.actions, using our evaluate_action function.
3. Chooses a random action from among all the actions that achieve this maximum value.
4. Returns a tuple containing the optimal path, the optimal action, and the decision.
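A minimal sketch of these four steps, reusing the State and Action interfaces described earlier; evaluate_action is used with an assumed signature, and treating the first position on an action's path as the issued decision is our simplification:

# Illustrative sketch of the Reflex agent's decision step; evaluate_action's
# signature and the path/decision handling are assumptions.
import random

def reflex_get_move(state):
    actions = state.actions                       # Tavern/Nothing/Mine/Fight objects
    scores = [evaluate_action(state, a) for a in actions]
    best = max(scores)
    # break ties randomly among the best-scoring actions
    choice = random.choice([a for a, s in zip(actions, scores) if s == best])
    decision = choice.path[0] if choice.path else "Stay"   # first step on the path
    return choice.path, choice, decision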

RESULTS

Unsurprisingly, our Reflex agent worked very well and won against the offline bots almost every time:

[Figure: offline game scores]

In the arena, however, where we fought other developed bots, things got harder. Although we collected a very nice amount of gold and finished 1st or 2nd most of the time, upon facing zuborg or Mini-Me, the worldwide leaders (#1 and #2 respectively), we experienced some embarrassing moments, including results like these:

[Figure: arena scores against zuborg and Mini-Me]

Nevertheless, we accomplished what many could not: we beat zuborg, not once but twice.

[Figure: winning score screenshots]

MINIMAX AGENT

WIKI

Minimax (sometimes MinMax or MM) is a decision rule used in decision theory, game theory, statistics and philosophy for minimizing the possible loss in a worst-case (maximum loss) scenario. Originally formulated for two-player zero-sum game theory, covering both the cases where players take alternate moves and those where they make simultaneous moves, it has also been extended to more complex games and to general decision making in the presence of uncertainty.

IMPLEMENTATION

Our MiniMax agent does the following:

1. Uses generateEnemyMovePermutations (defined in the MiniMax class in ai.py) to generate all of the enemies' possible joint moves (5 possible moves for each of the 3 enemy heroes, totaling 5^3 = 125 permutations).
2. Uses get_moves (defined in path_tools.py) to get all of our hero's possible moves.
3. Uses recursion with a given depth to evaluate the possible outcomes of the next (depth) moves with the MiniMax algorithm: when it is the enemies' turn to play, they do their best to minimize the score, and when it is our hero's turn to play, he maximizes it. The score is evaluated using evaluate_action (defined in heuristics.py).
4. Returns a tuple containing the optimal path, the optimal action, and the decision.

RESULTS

Unfortunately, even after reaching a fully working MiniMax agent, it failed to achieve a higher score than our well-tuned Reflex agent. The processors in the computer labs could not go past a depth of 2: depth 3 was very slow and rarely finished a game, and depth 4 even got us the following error:

[Figure: the resulting error]
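Going back to step 3 of the implementation, the depth-limited recursion might look roughly like this. The helpers named here (get_moves, generateEnemyMovePermutations, evaluate_state) stand in for the project's functions and are used with assumed signatures:

# Illustrative depth-limited minimax over joint enemy moves, condensed from
# the description above; all helper signatures are assumptions.

def minimax(state, depth, maximizing):
    if depth == 0:
        return evaluate_state(state)
    if maximizing:                                # our hero's turn: maximize
        return max(minimax(state.generateSuccessor(state.game.hero, pos),
                           depth - 1, False)
                   for pos in get_moves(state, state.game.hero))
    # enemies' turn: they jointly pick the permutation that hurts us most
    best = float("inf")
    for perm in generateEnemyMovePermutations(state):
        succ = state
        for hero, pos in perm:                    # apply one joint enemy move
            succ = succ.generateSuccessor(hero, pos)
        best = min(best, minimax(succ, depth - 1, True))
    return best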

Q_LEARNING AGENT

WIKI

Q-learning is a model-free reinforcement learning technique. Specifically, Q-learning can be used to find an optimal action-selection policy for any given (finite) Markov decision process (MDP). It works by learning an action-value function that ultimately gives the expected utility of taking a given action in a given state and following the optimal policy thereafter. A policy is a rule that the agent follows in selecting actions, given the state it is in. When such an action-value function is learned, the optimal policy can be constructed by simply selecting the action with the highest value in each state. One of the strengths of Q-learning is that it is able to compare the expected utility of the available actions without requiring a model of the environment.

IMPLEMENTATION

Our Q_Learning agent does the following:

1. Gets all the possible legal actions using self.getLegalActions.
2. With probability epsilon chooses a random move; otherwise chooses the best move using self.getPolicy, which applies the Q-learning mechanism to pick the action yielding the highest Q-value, where the Q-value is calculated from previously learned features and weights.
3. Returns a tuple containing the optimal path, the optimal action, and the decision.
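A minimal sketch of epsilon-greedy selection with feature-based (approximate) Q-values, in the spirit of the steps above; the extractor interface, the weights dictionary and the update rule shown are our assumptions, not the exact code of q_learning_agent.py:

# Illustrative epsilon-greedy agent with approximate Q-values; the extractor
# (e.g. a SimpleExtractor-style callable) and weight storage are assumptions.
import random

class QLearningSketch(object):
    def __init__(self, extractor, epsilon=0.1, alpha=0.2, gamma=0.9):
        self.weights = {}                 # feature name -> learned weight
        self.extractor = extractor        # (state, action) -> {feature: value}
        self.epsilon, self.alpha, self.gamma = epsilon, alpha, gamma

    def getQValue(self, state, action):
        feats = self.extractor(state, action)
        return sum(self.weights.get(f, 0.0) * v for f, v in feats.items())

    def getPolicy(self, state, actions):
        return max(actions, key=lambda a: self.getQValue(state, a))

    def getAction(self, state, actions):
        if random.random() < self.epsilon:        # explore
            return random.choice(actions)
        return self.getPolicy(state, actions)     # exploit

    def update(self, state, action, next_state, next_actions, reward):
        # standard approximate Q-learning weight update
        future = 0.0
        if next_actions:
            future = max(self.getQValue(next_state, a) for a in next_actions)
        diff = (reward + self.gamma * future) - self.getQValue(state, action)
        for f, v in self.extractor(state, action).items():
            self.weights[f] = self.weights.get(f, 0.0) + self.alpha * diff * v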

RESULTS

Since the learning algorithm requires spectating previous games and learns from previous actions/moves, the Q_Learning agent was not practical in the online game, which requires fast responses whenever getMove is called. Although we won against the offline heroes, the agent failed online, where people mostly use heuristics and search algorithms, approaches that need far less memory and time. The Q_Learning agent was not created to be run in the online arena; it was mainly developed to be used by our Combined agent.

COMPARISON

We were almost certain that the Q_Learning agent was no match for our ferocious Reflex agent, but we wanted to be sure. So, after implementing the emulator (emulator.py), we ran it and had 3 Reflex agents fight 1 Q_Learning agent over multiple games. Here are some of the results:

[Figure: emulator results]
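For illustration, the emulator's main loop might look roughly like this (a sketch under assumed Game/agent interfaces; the learner's update calls are omitted for brevity):

# Illustrative emulator loop pitting one learning agent against three reflex
# agents; Game, State and the agent interfaces are assumptions, not emulator.py.

def run_games(n_games, learner, reflex_agents, make_game):
    wins = 0
    for _ in range(n_games):
        game = make_game()                         # fresh random map
        agents = [learner] + reflex_agents         # hero 1 learns, 2-4 are reflex
        while not game.finished:
            for hero, agent in zip(game.heroes, agents):
                state = State(game)
                path, action, decision = agent.get_move(state)
                game.apply(hero, decision)         # advance the game one move
        if max(game.heroes, key=lambda h: h.gold) is game.heroes[0]:
            wins += 1                              # the learner finished richest
    return wins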

COMBINED AGENT

WIKI

The Combined agent uses a combination of our two best agents, the Reflex agent and the Q_Learning agent, to implement a learning agent that is usable in the real online contest: since we obviously cannot pause the online game in order to learn what to do and what not to do, we simply learn on the run.

IMPLEMENTATION

Our Combined agent does the following whenever getMove is called:

1. Initially, the powerful Reflex agent is used.
2. While playing the game and moving, the agent learns and studies the behavior of the other heroes.
3. Once the learning process is complete, it starts using the Q_Learning agent (the other heroes are bots as well, so the learning process is bound to finish).

RESULTS

Since the Combined agent uses the Reflex agent and learns via the Q_Learning agent on the run, the resulting agent behaves very much like the Reflex agent. We beat every offline hero and won 60% of our online games. The Combined agent mostly relied on the Reflex agent, since the learning process took a long time and we rarely reached the learning agent's part; but when we did, our enemies dug their own graves.
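Going back to the implementation, a minimal sketch of this switch-over logic; observe and has_converged are hypothetical stand-ins for however the project tracks learning progress:

# Illustrative sketch of the Combined agent's getMove logic; observe() and
# has_converged() are hypothetical, not the project's actual methods.

class CombinedAgentSketch(object):
    def __init__(self, reflex_agent, q_agent):
        self.reflex_agent = reflex_agent
        self.q_agent = q_agent

    def get_move(self, state):
        # keep learning from observed play on every call
        self.q_agent.observe(state)
        if self.q_agent.has_converged():
            return self.q_agent.get_move(state)    # learning done: use Q-agent
        return self.reflex_agent.get_move(state)   # otherwise fall back to reflex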

We even won a game in which zuborg played alongside 2 bots and the 4th hero was Mini-Me:

[Figure: final scores of the game against zuborg and Mini-Me]

CONCLUSION

Although all of our agents work perfectly in theory, MiniMax and Q_Learning failed to achieve high scores because of the game's complex and difficult rules. MiniMax could not manage any depth greater than 2, which makes it a rather weak agent given that the average map in this game is about 12x12 and that the game has 4 heroes with 5 possible moves each, so a single full round already branches into 5^4 = 625 joint moves. Add to that the fact that we used an average computer (8 GB of RAM), and any depth below 4 would not be very helpful. This suggests that MiniMax is not of much use in high-level, complex games; it might help on a supercomputer, and even then we doubt it would beat our Combined agent. As for the Q_Learning agent, as mentioned earlier, it was never meant to fight in the arena on its own, but to be a smart tweak to the already excellent Reflex agent. Our ace, the Combined agent, beat plenty of enemies and amazed us in its runs; if we kept the bot running, we are confident it would reach the worldwide rankings.
