Project 2: Searching and Learning in Pac-Man

December 3, 2009

1 Quick Facts

In this project you have to code A* and Q-learning in the game of Pac-Man and answer some questions about your implementation. You will work in groups of 2 people. You have to turn in the source code and the documentation of your project; the documentation should answer the questions asked in each of the different parts of the project. The project is designed to be coded in Java, but you can use other languages [1].

For any questions, email me at: santi@iiia.csic.es

2 Pac-Man

Pac-Man is a Japanese arcade game developed by Namco (now Namco Bandai) and licensed for distribution in the U.S. by Midway, first released in Japan on May 22, 1980. Immensely popular in the United States from its original release to the present day, Pac-Man is universally considered one of the classics of the medium, virtually synonymous with video games, and an icon of 1980s popular culture. Figure 1 shows a screenshot of the original Pac-Man.

Figure 1: A screenshot of the original Pac-Man.

In the game, the player controls Pac-Man through a maze, eating dots. When all dots are eaten, Pac-Man is taken to the next stage. Four ghosts (known to most gamers as Blinky, Pinky, Inky and Clyde) roam the maze, trying to catch Pac-Man. If a ghost touches Pac-Man, a life is lost; when all lives have been lost, the game ends.

Near the corners of the maze are four larger, flashing dots known as energizers or power pellets, which provide Pac-Man with the temporary ability to eat the ghosts. The ghosts turn deep blue, reverse direction, and usually move more slowly when Pac-Man eats an energizer. When a ghost is eaten, its eyes return to the ghost pen, where it is regenerated in its normal color. Blue ghosts flash white before they become dangerous again. The amount of time the ghosts remain vulnerable varies from one board to the next, but the time period generally becomes shorter as the game progresses. In later stages, the ghosts do not change colors at all, but still reverse direction when an energizer is eaten.

In addition to dots and energizers, bonus items, usually referred to as fruits (though not all of them are fruits), appear near the center of the maze twice per level. These items score extra bonus points when eaten. The items change, and their bonus values increase, throughout the game.

The AI of the original Pac-Man was limited to the movement of the ghosts, and it was very well thought out. However, it used very basic techniques, for two main reasons: first, the hardware it ran on would not support any computationally intensive algorithm; and second, if the AI of the ghosts were improved, Pac-Man would become impossible to win. In this project you will implement more advanced AI techniques for Pac-Man, but from the reverse point of view: you will code the AI for Pac-Man himself instead of for the ghosts.

In this project you will have to implement and experiment with a collection of search and learning algorithms. The project is divided into two major parts: a first one where you will experiment with search algorithms, and a second where you will do so with reinforcement learning algorithms.

[1] If you are willing to code your own implementation of Pac-Man (or look for an alternative one), you can use any other programming language. Another option is to write code to connect the Java version of Pac-Man we provide with your favorite language.
3 Preliminaries: Setting Up Your Environment

The file PacManSrc.zip provided to you contains the source code of an open source Java implementation of Pac-Man designed specifically for testing AI algorithms. Using your favorite Java development environment (Eclipse or NetBeans are recommended), create a project containing the source code we provided. The main class is pacman.game; run it to start the game. Make sure you can run the game before proceeding with the project.

By default, a very simple Pac-Man AI is coded into the game. You can experiment with the values of the static variables at the top of the pacman.game file to change the speed of the game (movetime), the number of ghosts (defaultnumberghosts), and other parameters that will come in handy for testing your algorithms later in the project.

3.1 Creating a Pac-Man AI

Before starting with the main parts of this project, let us see how to create a simple Pac-Man AI: one so simple that it will just ask Pac-Man to move in a random direction.

1. Create a new class in the package player (call it whatever you prefer; I will call it RandomPacManPlayer) and make it implement the PacManPlayer interface.

2. Define the method public Move chooseMove(Game game) to look like this:

   public Move chooseMove(Game game) {
       Random r = new Random();
       Move[] moves = {Move.LEFT, Move.RIGHT, Move.UP, Move.DOWN};
       return moves[r.nextInt(4)];
   }

3. Now, go to the main method in the pacman.game class and change the line:

   PacManPlayer pacman = new SimplePacManPlayer();

   to:

   PacManPlayer pacman = new RandomPacManPlayer();

4. Run the game; your random Pac-Man should be controlling the game right now.
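Putting the steps above together, a complete random player might look like the following self-contained sketch. Note that the Move, Game, and PacManPlayer declarations here are minimal stand-ins for the framework's own types, included only so the example compiles on its own; in your project you would implement the provided interface directly.

```java
import java.util.Random;

// Stand-ins for the framework types, so this sketch is self-contained.
enum Move { LEFT, RIGHT, UP, DOWN }

class Game { /* placeholder for the framework's game state */ }

interface PacManPlayer {
    Move chooseMove(Game game);
}

// A player that ignores the game state and picks a random direction.
public class RandomPacManPlayer implements PacManPlayer {
    // Reuse one Random instance instead of creating a new one per move.
    private final Random random = new Random();
    private static final Move[] MOVES = Move.values();

    @Override
    public Move chooseMove(Game game) {
        return MOVES[random.nextInt(MOVES.length)];
    }
}
```

Reusing a single Random instance across calls is slightly cleaner than creating one inside the method, though both work for this toy player.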
4 Part A: Pac-Man Searches Shortest Paths

In this first part, we want you to implement an A* algorithm which Pac-Man will use to find the optimal path along which to eat the dots. To simplify matters, we will initially remove the ghosts from the game.

- Implement a first version of the PacManPlayer which, at each step, uses A* to find the shortest path to a dot in the map and starts moving in that direction. Design an appropriate heuristic, and make sure it is admissible.

- Implement a second version which, at each step, uses A* to find the shortest path that eats all the dots in the map (notice that the search space is much, much larger this time). Again, design an appropriate heuristic and make sure it is admissible.

- For each PacManPlayer, measure the number of nodes it explores during the search (min, max, and average), and compare it to the case where no heuristic is used (just run your PacManPlayer with a heuristic which always returns 0). Notice that the second implementation might take too long to be usable in a real game.

Optional improvements: why did we decide not to use ghosts? What happens when we add ghosts to the game? Can you describe, or even implement, a strategy which still uses A* but takes into account that there are ghosts in the game?

5 Part B: Pac-Man Learns to Play the Complete Game

In this second part, we want you to implement a Q-learning algorithm to help Pac-Man learn how to play the complete game. This time, we will add the ghosts back into the game.

In order to implement Q-learning, the very first thing you need to do is decide on a suitable state space. Notice that if you tried to have one state for each possible configuration of the game, you would end up with an enormous number of states, and Q-learning would never converge (even assuming you could hold the state table in memory). A possible way to reduce the state space is to consider just a window around Pac-Man.
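As an illustration of the first Part A task, here is a minimal, self-contained A* sketch that computes the length of the shortest path from Pac-Man to the nearest dot on a character-grid maze, using the Manhattan distance to the closest dot as an admissible heuristic. The maze encoding ('P' for Pac-Man, '.' for dots, '#' for walls) and all names are illustrative, not part of the provided framework.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class AStarSketch {

    // Returns the length of the shortest path from 'P' to any dot '.',
    // or -1 if no dot is reachable.
    public static int shortestPathToDot(char[][] maze) {
        int rows = maze.length, cols = maze[0].length;
        int sr = -1, sc = -1;
        List<int[]> dots = new ArrayList<>();
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++) {
                if (maze[r][c] == 'P') { sr = r; sc = c; }
                if (maze[r][c] == '.') dots.add(new int[]{r, c});
            }
        if (dots.isEmpty()) return 0;

        int[][] g = new int[rows][cols];
        for (int[] row : g) Arrays.fill(row, Integer.MAX_VALUE);
        // Open list ordered by f = g + h; entries are {f, row, col}.
        PriorityQueue<int[]> open = new PriorityQueue<>((a, b) -> a[0] - b[0]);
        g[sr][sc] = 0;
        open.add(new int[]{h(dots, sr, sc), sr, sc});
        int[] dr = {-1, 1, 0, 0}, dc = {0, 0, -1, 1};
        while (!open.isEmpty()) {
            int[] cur = open.poll();
            int r = cur[1], c = cur[2];
            if (maze[r][c] == '.') return g[r][c];      // goal: any dot
            if (cur[0] - h(dots, r, c) > g[r][c]) continue; // stale entry
            for (int i = 0; i < 4; i++) {
                int nr = r + dr[i], nc = c + dc[i];
                if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
                if (maze[nr][nc] == '#') continue;      // wall
                if (g[r][c] + 1 < g[nr][nc]) {
                    g[nr][nc] = g[r][c] + 1;
                    open.add(new int[]{g[nr][nc] + h(dots, nr, nc), nr, nc});
                }
            }
        }
        return -1;
    }

    // Heuristic: Manhattan distance to the closest dot. It never
    // overestimates the true cost, so A* with it is optimal.
    static int h(List<int[]> dots, int r, int c) {
        int best = Integer.MAX_VALUE;
        for (int[] d : dots)
            best = Math.min(best, Math.abs(d[0] - r) + Math.abs(d[1] - c));
        return best;
    }
}
```

In your player you would also keep the actual path (e.g. via a parent map) to extract the first move, and count the nodes popped from the open list for the statistics requested above.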
For example, let's say that we only consider the cells that are immediately north, south, east, and west of Pac-Man, and that we assume each cell can be in one of four conditions: empty, a wall, a dot, or a ghost (there might be a ghost over a dot, but let's ignore that for now). With that representation we would only have 4^4 = 256 states. This is better, but it might be too little information and not lead to any interesting strategy.

Can you come up with a state representation which captures the information in the game state that is useful for playing a good game of Pac-Man, but which generates a small number of states? (Try to stay in the thousands range.) Define your representation so that each state is identified by a unique integer.
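To make this concrete, here is a self-contained sketch of both pieces: mapping the four-cell window above to a unique integer in [0, 256) via base-4 encoding, and the standard tabular Q-learning update that such a state id would index into. All class, field, and method names here are illustrative, and the parameter values (alpha, gamma, epsilon) are typical defaults, not values prescribed by the project.

```java
import java.util.Random;

public class QLearningSketch {
    // Possible contents of each window cell.
    static final int EMPTY = 0, WALL = 1, DOT = 2, GHOST = 3;
    static final int NUM_STATES = 256;   // 4^4 window configurations
    static final int NUM_ACTIONS = 4;    // up, down, left, right

    final double[][] q = new double[NUM_STATES][NUM_ACTIONS];
    final double alpha = 0.1;            // learning rate
    final double gamma = 0.9;            // discount factor
    final double epsilon = 0.1;          // exploration rate
    final Random rng = new Random(0);

    // Encode the four cells as a base-4 number: a unique id in [0, 256).
    static int encode(int north, int south, int east, int west) {
        return ((north * 4 + south) * 4 + east) * 4 + west;
    }

    // Standard tabular Q-learning update:
    // Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    void update(int s, int a, double reward, int sNext) {
        double best = q[sNext][0];
        for (int i = 1; i < NUM_ACTIONS; i++) best = Math.max(best, q[sNext][i]);
        q[s][a] += alpha * (reward + gamma * best - q[s][a]);
    }

    // Epsilon-greedy action selection over the current Q-table.
    int act(int s) {
        if (rng.nextDouble() < epsilon) return rng.nextInt(NUM_ACTIONS);
        int best = 0;
        for (int i = 1; i < NUM_ACTIONS; i++) if (q[s][i] > q[s][best]) best = i;
        return best;
    }
}
```

A richer representation (e.g. adding the direction of the nearest dot or energizer) would use the same pattern: multiply out the number of values of each feature to get the state id, and size the table accordingly.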
Create a new PacManPlayer which will use reinforcement learning to learn how to play the game.

- Once you have decided on a suitable state representation, define a function which, given an instance of the State class, returns the integer representing the state it corresponds to.

- Decide on a proper reinforcement signal. For instance: +1 for eating a dot, -1 for being killed by a ghost (design your own; we are sure you can do better than this).

- Implement the learning operations of reinforcement learning over your state representation using Q-learning, i.e. the update Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). Notice that initially Pac-Man will play really badly (since it has not learned anything yet).

- Modify the game so that it lets your new PacManPlayer play multiple games in a row, and make sure that the Q-table learned in one game is transferred to the next (so that Pac-Man learns more and more with every game it plays).

- Make Pac-Man play a sequence of games, and record how many points he manages to score in each of them before being killed. Notice that Pac-Man might need hundreds (or even thousands) of games to learn, depending on how good the state representation you selected is.

Optional improvements: do you think the ghosts are smart? Why don't you apply Q-learning to the control of the ghosts as well? What happens when both Pac-Man and the ghosts use reinforcement learning at the same time? Do they converge to a stable behavior? Does Pac-Man manage to complete the game, or do the ghosts manage to always eat Pac-Man?

6 Part C: Final Questions

Compare the A* approach with the reinforcement learning approach. What are the benefits and drawbacks of each? Pac-Man is an apparently simple domain; do you see these techniques as applicable to larger domains (either computer games or real-life applications)?

7 Bibliography

Concerning the A* and Q-learning algorithms, you can use Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig. It is the reference book in artificial intelligence.
Concerning Java, I'd recommend Effective Java by Joshua Bloch, but most Java books will do. Additionally, feel free to use any other language you like (although you'll have to code your own Pac-Man game!).