HyperNEAT-GGP: A HyperNEAT-based Atari General Game Player. Matthew Hausknecht, Piyush Khandelwal, Risto Miikkulainen, Peter Stone


1 HyperNEAT-GGP: A HyperNEAT-based Atari General Game Player. Matthew Hausknecht, Piyush Khandelwal, Risto Miikkulainen, Peter Stone

2 Motivation
- Create a General Video Game Playing agent which learns from visual representations
- Little domain-specific knowledge
- Capable of playing different Atari 2600 games without reconfiguration

3 Introducing GVGP
- GGP agents are given a declarative representation of the game, including complete game dynamics
- In contrast, we seek to learn from a visual representation without knowing the dynamics
- General video game playing (GVGP) is a good challenge

4 Atari games
- Wildly varying dynamics
- Standard interface for control: 18 actions
- Two-player (multi-agent) capabilities
- Multiple standardized state representations
- Good open source emulation

6-14 HyperNEAT-GGP Architecture (one diagram, built up over nine slides)
- The Atari 2600 Emulator produces the Raw Game Screen
- Visual Processing turns the screen into Continuously Valued Firings for the Neural Network
- The CPPN supplies the Neural Network's connection weights
- The Neural Network's Continuously Valued Outputs feed Action Selection
- Action Selection returns an Action to the Atari 2600 Emulator

15 Fitness Evaluation
- [Diagram: Atari 2600 Emulator, Individual (Neural Network, CPPN), Game Score]
- At the end of the game, the score is given to the individual as fitness

16 Evolution
- [Diagram: each individual is assigned a fitness]
- Evolution then produces the next generation


18 HyperNEAT: Extension of NEAT. Kenneth Stanley, David D'Ambrosio, Jason Gauci. A Hypercube-Based Indirect Encoding for Evolving Large-Scale Neural Networks. Artificial Life, 2009

20 Input node firings are continuous valued and taken directly from the processed screen

21 Firings are propagated through the network forming continuously valued outputs
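A minimal sketch of this propagation step, assuming a single layer of weighted connections and a tanh activation (neither the layer structure nor the activation function is stated on the slides). The names `propagate`, `inputs`, and `weights` are hypothetical.

```python
import math

def propagate(inputs, weights):
    """Propagate input firings through one layer of weighted connections.

    inputs:  {source_node: firing value}
    weights: {(source_node, target_node): connection weight}
    Returns  {target_node: continuously valued output}.
    tanh is an assumed activation, chosen only for illustration.
    """
    sums = {}
    for (src, dst), w in weights.items():
        # Missing inputs are treated as silent (0.0).
        sums[dst] = sums.get(dst, 0.0) + w * inputs.get(src, 0.0)
    return {dst: math.tanh(total) for dst, total in sums.items()}

firings = {(0, 0): 1.0, (1, 0): 0.5}
weights = {((0, 0), "out"): 2.0, ((1, 0), "out"): -2.0}
outputs = propagate(firings, weights)
```

Here the single output node receives 2.0 * 1.0 - 2.0 * 0.5 = 1.0 before the activation is applied.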

22 How to determine connection weights?

23 Evolve the weights! (NEAT) Maybe we can do better...

24-28 [Figure, built up over five slides: the CPPN takes the coordinates (X1, Y1) and (X2, Y2) of two network nodes plus a bias input B] CPPN determines all connection weights
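The idea that one function produces every connection weight can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `cppn` function below is a hand-written stand-in (in HyperNEAT it would be evolved by NEAT), node coordinates are assumed to be normalized to [-1, 1], and the pruning `threshold` is an assumed parameter.

```python
import math

def grid_coords(width, height):
    """Coordinates of a width x height node grid, normalized to [-1, 1]."""
    return [(-1 + 2 * i / (width - 1), -1 + 2 * j / (height - 1))
            for j in range(height) for i in range(width)]

def cppn(x1, y1, x2, y2, bias=1.0):
    """Hypothetical fixed CPPN: a Gaussian of distance times a sinusoid, plus bias."""
    d = math.hypot(x2 - x1, y2 - y1)
    return math.exp(-d * d) * math.sin(x1 + x2) + 0.1 * bias

def substrate_weights(cppn_fn, src_coords, dst_coords, threshold=0.2):
    """Query the CPPN once per (source, target) pair of node coordinates.

    Connections whose weight magnitude is at or below `threshold` are
    left out (an assumed pruning rule).
    """
    weights = {}
    for s in src_coords:
        for t in dst_coords:
            w = cppn_fn(s[0], s[1], t[0], t[1])
            if abs(w) > threshold:
                weights[(s, t)] = w
    return weights

layer = grid_coords(16, 21)
weights = substrate_weights(cppn, layer, layer)
```

Because the weight depends only on the coordinates of the two endpoints, a small CPPN can describe a network with tens of thousands of potential connections.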

29 CPPN Evolved by NEAT
- Add nodes: Gaussian, sinusoidal, sigmoid, absolute value, linear
- Add links
- Change connection weights
- [Figure: CPPN with inputs X1, Y1, X2, Y2, B]

30 Advantages of HyperNEAT vs NEAT
- Indirect encoding
- Geometrically aware
- Learn a function and apply it regardless of the absolute location
- Ultimately allows better policies to be found more quickly


32 Visual Processing Framework Raw Game Screen: 160x210 pixels; 256 colors

33 Visual Processing Framework: Blob Detection
- Simple 8-connectivity check
- Each blob is assigned a velocity based on its location in the previous frame
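A minimal sketch of blob detection by 8-connectivity, assuming that a blob is a connected group of same-colored, non-background pixels and that velocity is the centroid displacement between frames. Both assumptions, the `background` color index, and all function names are hypothetical; the slides do not specify them.

```python
from collections import deque

def find_blobs(screen, background=0):
    """Group same-colored pixels into blobs using 8-connectivity.

    screen: 2D list of color indices.
    Returns a list of (color, [(row, col), ...]) in scan order.
    """
    rows, cols = len(screen), len(screen[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or screen[r][c] == background:
                continue
            color = screen[r][c]
            pixels, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                pr, pc = queue.popleft()
                pixels.append((pr, pc))
                # Visit all 8 neighbors (the pixel itself is already seen).
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = pr + dr, pc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and not seen[nr][nc]
                                and screen[nr][nc] == color):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            blobs.append((color, pixels))
    return blobs

def centroid(pixels):
    """Mean (row, col) position of a blob's pixels."""
    n = len(pixels)
    return (sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n)

def velocity(prev_pixels, cur_pixels):
    """Centroid displacement (d_row, d_col) between two frames."""
    (r0, c0), (r1, c1) = centroid(prev_pixels), centroid(cur_pixels)
    return (r1 - r0, c1 - c0)

screen = [
    [1, 0, 0, 0],
    [0, 1, 0, 2],
    [0, 0, 0, 2],
]
blobs = find_blobs(screen)
```

The two pixels of color 1 touch only diagonally, so they form one blob under 8-connectivity but would be two blobs under 4-connectivity.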

34 Visual Processing Framework: Object Detection
- Adjacent blobs with non-zero velocity are merged into objects

35 Visual Processing Framework: Class Detection
- Objects with sufficient pixel similarity are said to belong to the same "object class"
- 3 main classes: car left, car right, chicken

36 Self Detection
- Information gain approach to identify the object on screen most likely to be the "self"
- Intuitively, the object that responds to actions
- [Figure: self circled in red]

37 Atari-HyperNEAT Interface
- Raw screen reduced to a 16x21 grid
- Mapping from object classes to continuous values
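A minimal sketch of this interface, assuming the 160x210 screen is divided into 10x10-pixel tiles (which yields exactly 16x21) and that each object writes its class value into the tile containing its position. The tile rule, the class names, and the numeric class values are hypothetical.

```python
GRID_W, GRID_H, TILE = 16, 21, 10  # 160x210 screen split into 10-pixel tiles

def to_grid(objects, class_values):
    """Build the GRID_H x GRID_W input grid.

    objects:      list of (class_name, x, y) pixel positions.
    class_values: {class_name: continuous value}, an assumed mapping.
    Empty tiles hold 0.0; a later object overwrites an earlier one
    that falls in the same tile.
    """
    grid = [[0.0] * GRID_W for _ in range(GRID_H)]
    for cls, x, y in objects:
        col = min(int(x) // TILE, GRID_W - 1)
        row = min(int(y) // TILE, GRID_H - 1)
        grid[row][col] = class_values[cls]
    return grid

values = {"chicken": 1.0, "car_left": -1.0, "car_right": -0.5}
grid = to_grid([("chicken", 85, 205), ("car_left", 0, 0)], values)
```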


39 Action Selection
- Examine squares adjacent to "self"
- Take the directional action corresponding to the max-valued neighbor
- No-op if the "self" square is highest valued
- [Figure: the up action is selected in this case]
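The rule above can be sketched as follows, assuming only the four orthogonal neighbors are examined and that ties favor no-op. The slides do not say whether diagonal squares count, so that choice and the action names are assumptions.

```python
def select_action(outputs, self_row, self_col):
    """Pick an action from the grid of network outputs around the "self" square.

    outputs: 2D list of continuously valued outputs.
    Returns 'noop', 'up', 'down', 'left', or 'right'.
    """
    rows, cols = len(outputs), len(outputs[0])
    moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    best_action, best_value = "noop", outputs[self_row][self_col]
    for action, (dr, dc) in moves.items():
        r, c = self_row + dr, self_col + dc
        # A neighbor wins only if it is strictly higher than the current best.
        if 0 <= r < rows and 0 <= c < cols and outputs[r][c] > best_value:
            best_action, best_value = action, outputs[r][c]
    return best_action

outputs = [
    [0.1, 0.9, 0.1],
    [0.2, 0.5, 0.3],
    [0.0, 0.4, 0.0],
]
```

With "self" at the center, the square above holds the largest value, so `select_action(outputs, 1, 1)` picks the up action; with "self" on the 0.9 square itself, the result is no-op.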


41 Selected Games Examine 2 Atari games - Freeway and Asterix

42 Experimental Setup
- Run HyperNEAT-GGP for 250 generations with 100 individuals in each generation
- Individual evaluations performed in parallel
- Individual fitness = raw game score
- Compared against previously published results using Sarsa-Lambda with linear function approximation
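The shape of such a run (generations, population size, parallel evaluation, fitness as the only feedback) can be sketched as below. This is not NEAT or HyperNEAT: it swaps in a plain elitist evolutionary loop with Gaussian mutation over fixed-length genomes, purely to show the control flow. The `evaluate` callback stands in for playing one game and returning its score; every name and parameter other than the 250 generations and 100 individuals is an assumption.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def evolve(evaluate, genome_len=4, pop_size=100, generations=250, seed=0):
    """Elitist generational loop (a stand-in for NEAT, for illustration only).

    evaluate: function mapping a genome (list of floats) to a fitness score.
    Returns (best genome of the last generation, best fitness per generation).
    """
    rng = random.Random(seed)
    population = [[rng.gauss(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    history = []
    with ThreadPoolExecutor() as pool:
        for _ in range(generations):
            # Individuals are evaluated in parallel; fitness is the raw score.
            fitness = list(pool.map(evaluate, population))
            order = sorted(range(pop_size), key=lambda i: fitness[i], reverse=True)
            ranked = [population[i] for i in order]
            history.append(fitness[order[0]])
            # The top fifth survives unchanged; the rest are mutated copies.
            parents = ranked[: max(1, pop_size // 5)]
            children = [[x + rng.gauss(0, 0.1) for x in rng.choice(parents)]
                        for _ in range(pop_size - len(parents))]
            population = parents + children
    return ranked[0], history

# Toy fitness in place of a game score: higher is better, maximum at the origin.
best, history = evolve(lambda g: -sum(x * x for x in g),
                       pop_size=20, generations=20)
```

Because the best individuals are carried over unchanged and the toy fitness is deterministic, the best score per generation never decreases in this sketch; real game scores can be noisy, so that property would not necessarily hold there.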

43 Results
- Columns: Freeway, Asterix
- Rows: Sarsa-Lambda BASS*, Sarsa-Lambda DISCO*, Sarsa-Lambda RAM*, Random, HyperNEAT-GGP Avg, HyperNEAT-GGP Champ
- *Y. Naddaf. Game-independent AI agents for playing Atari 2600 console games. Master's thesis, University of Alberta, 2010.

44 Freeway Results

45 Asterix Results

46 Related Work
- Atari Game Playing: Y. Naddaf. Game-independent AI agents for playing Atari 2600 console games. Master's thesis, University of Alberta, 2010.
- GGP: M. Genesereth and N. Love. General game playing: Overview of the AAAI competition. AI Magazine, 26:62-72.
- Ms. Pac-Man: S. M. Lucas. Ms Pac-Man competition (screen capture mode). uk/staff/sml/pacman/cig2011results.html.
- Quake II: M. Parker and B. Bryant. Backpropagation without human supervision for visual control in Quake II. Proceedings of the 2009 IEEE Symposium on Computational Intelligence and Games (CIG'09).
- Pitfall: C. Diuk, A. Cohen, and M. L. Littman. An object-oriented representation for efficient reinforcement learning. In Proceedings of 25th International Conference on Machine Learning (ICML).
- HyperNEAT: P. Verbancsics and K. O. Stanley. Evolving static representations for task transfer. J. Mach. Learn. Res., August.
- HyperNEAT: J. Gauci and K. O. Stanley. A case study on the critical role of geometric regularity in machine learning. In Proceedings of the 23rd National Conference on Artificial Intelligence (AAAI).
- HyperNEAT: D. B. D'Ambrosio and K. O. Stanley. Generative encoding for multiagent learning. In GECCO '08: Proceedings of the 10th annual conference on Genetic and evolutionary computation, New York, NY, USA. ACM.
- HyperNEAT: J. Clune, B. E. Beckmann, C. Ofria, and R. T. Pennock. Evolving coordinated quadruped gaits with the HyperNEAT generative encoding. In Proceedings of the Eleventh conference on Congress on Evolutionary Computation, CEC'09, pages 2764-2771, Piscataway, NJ, USA. IEEE Press.

47 Conclusion
- Introduce HyperNEAT-GGP, a general Atari game playing agent
- Learns from the game screen, using HyperNEAT to evolve policies for gameplay
- Performance results exceed prior work (Sarsa-Lambda) for the games Asterix and Freeway
- In future work, extend this system to more games
- Represents a first step toward the challenge of general video game playing (GVGP) from visual representations

48 Questions? HyperNEAT-GGP Code: A.L.E. Code:

49 Self Detection Algorithm
    actions = set of actions applicable to this game
    current_blobs = set of blobs in the current game frame
    ActionHist = sequence of actions taken at times 0...n
    for blob b in current_blobs do
        vHist_b = sequence of velocities of blob b at times 0...n
        H_b = H(vHist_b)
        for action a in actions do
            vHist_(b|a) = [vHist_b[t] for all t s.t. ActionHist[t-1] == a]
            H_(b|a) = H(vHist_(b|a))
        end for
        InfoGain_b = H_b - sum over a of (p_a * H_(b|a))
    end for
    return arg_max over blobs (InfoGain_b)
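A runnable sketch of the algorithm above, under three assumptions that the slide does not spell out: H is the Shannon entropy of the empirical distribution, p_a is the observed frequency of action a among the preceding time steps, and H_b is computed over times 1...n so that both terms cover the same samples. All function names are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    if not values:
        return 0.0
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in Counter(values).values())

def info_gain(vel_hist, action_hist, actions):
    """Information gained about a blob's velocity from knowing the previous action.

    vel_hist[t] is the blob's velocity at time t; action_hist[t] is the
    action taken at time t. Velocities must be hashable (numbers or tuples).
    """
    steps = range(1, len(vel_hist))
    if not steps:
        return 0.0
    h_b = entropy([vel_hist[t] for t in steps])
    conditional = 0.0
    for a in actions:
        v_given_a = [vel_hist[t] for t in steps if action_hist[t - 1] == a]
        if v_given_a:
            p_a = len(v_given_a) / len(steps)  # assumed: empirical frequency
            conditional += p_a * entropy(v_given_a)
    return h_b - conditional

def detect_self(vel_hists, action_hist, actions):
    """Return the blob id whose velocity history has maximal information gain."""
    return max(vel_hists, key=lambda b: info_gain(vel_hists[b], action_hist, actions))

actions = ["up", "noop"]
action_hist = ["up", "noop", "up", "noop", "up"]
vel_hists = {
    "chicken": [0, -1, 0, -1, 0],  # moves exactly when "up" was pressed
    "car": [2, 2, 2, 2, 2],        # ignores the actions
}
self_blob = detect_self(vel_hists, action_hist, actions)
```

In this toy history the chicken's velocity is fully determined by the previous action (a gain of one bit), while the car's constant velocity yields no gain, so the chicken is selected as the "self".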
