DeepMind Lab

Charles Beattie, Joel Z. Leibo, Denis Teplyashin, Tom Ward, Marcus Wainwright, Heinrich Küttler, Andrew Lefrancq, Simon Green, Víctor Valdés, Amir Sadik, Julian Schrittwieser, Keith Anderson, Sarah York, Max Cant, Adam Cain, Adrian Bolton, Stephen Gaffney, Helen King, Demis Hassabis, Shane Legg and Stig Petersen

arXiv:1612.03801v2 [cs.AI] 13 Dec 2016

December 14, 2016

Abstract

DeepMind Lab is a first-person 3D game platform designed for research and development of general artificial intelligence and machine learning systems. DeepMind Lab can be used to study how autonomous artificial agents may learn complex tasks in large, partially observed, and visually diverse worlds. DeepMind Lab has a simple and flexible API enabling creative task designs and novel AI designs to be explored and quickly iterated upon. It is powered by a fast and widely recognised game engine, and tailored for effective use by the research community.

Introduction

General intelligence measures an agent's ability to achieve goals in a wide range of environments (Legg and Hutter, 2007). The only known examples of general-purpose intelligence arose from a combination of evolution, development, and learning, grounded in the physics of the real world and the sensory apparatus of animals. An unknown, but potentially large, fraction of animal and human intelligence is a direct consequence of the perceptual and physical richness of our environment, and is unlikely to arise without it (e.g. Locke, 1690; Hume, 1739).

One option is to directly study embodied intelligence in the real world itself using robots (e.g. Brooks, 1990; Metta et al., 2008). However, progress on that front will always be hindered by the too-slow passing of real time and the expense of the physical hardware involved. Realistic virtual worlds, on the other hand, if they are sufficiently detailed, can get the best of both, combining perceptual and physical near-realism with the speed and flexibility of software. Previous efforts to construct realistic virtual worlds as platforms for AI research have been stymied by the considerable engineering involved. To fill the gap, we present DeepMind Lab.

DeepMind Lab is a first-person 3D game platform built on top of id Software's Quake III Arena (id Software, 1999) engine. The world is rendered with rich science-fiction-style visuals. The available actions allow the agent to look around and move in 3D. Example tasks include navigating mazes, collecting fruit, traversing dangerous passages while avoiding falling off cliffs, bouncing through space using launch pads to move between platforms, laser tag, quickly learning and remembering randomly generated procedural environments, and tasks inspired by neuroscience experiments.

DeepMind Lab is already a major research platform within DeepMind. In particular, it has been used to develop asynchronous methods for reinforcement learning (Mnih et al., 2016) and unsupervised auxiliary tasks (Jaderberg et al., 2016), and to study navigation (Mirowski et al., 2016).

DeepMind Lab may be compared to other game-based AI research platforms that emphasise pixels-to-actions autonomous learning agents. The Arcade Learning Environment (Atari) (Bellemare et al., 2012), which we have used extensively at DeepMind, is neither 3D nor first-person. Among 3D platforms for AI research, DeepMind Lab is comparable to VizDoom (Kempka et al., 2016) and Minecraft (Johnson et al., 2016; Tessler et al., 2016). However, it pushes the envelope beyond what is possible on those platforms: DeepMind Lab has considerably richer visuals and more naturalistic physics, and its action space allows fine-grained pointing in a fully 3D world. Compared to VizDoom, DeepMind Lab is further removed from its origins in the first-person-shooter genre. This work is distinct from, and complementary to, other recent projects that run as plugins to access internal content in the Unreal engine (Qiu and Yuille, 2016; Lerer et al., 2016). Any of these systems can be used to generate static datasets for computer vision, as described e.g. in Mahendran et al. (2016) and Richter et al. (2016).

Artificial general intelligence (AGI) research in DeepMind Lab emphasises 3D vision from raw pixel inputs, first-person (egocentric) viewpoints, fine motor dexterity, navigation, planning, strategy, time, and fully autonomous agents that must learn for themselves which tasks to perform by exploring their environment. All of these factors make learning difficult, and each is a frontier research question in its own right. Putting them all together in one platform, as we have, is a significant challenge for the field.

DeepMind Lab Research Platform

DeepMind Lab is built on top of id Software's Quake III Arena (id Software, 1999) engine, using the ioquake3 (Nussel et al., 2016) version of the codebase, which is actively maintained by enthusiasts in the open-source community. DeepMind Lab also includes tools from q3map2 (GtkRadiant, 2016) and bspc (bspc, 2016) for level generation. The bot scripts are based on code from the OpenArena (OpenArena, 2016) project.

Tailored for machine learning

A custom set of assets was created to give the platform a unique and stylised look and feel, with a focus on rich visuals tailored for machine learning. A reinforcement learning API has been built on top of the game engine, providing agents with complex observations and accepting a rich set of actions.

Interaction with the platform is lock-stepped, with the engine stepped forward one simulation step at a time (or multiple steps with a repeated action, if desired), according to a user-specified frame rate. The game is thus effectively paused after an observation is provided until the agent supplies the next action(s) to take.
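This lock-stepped design means that simulation time is driven entirely by the agent, so slow deliberation never causes dropped frames. A minimal sketch of the resulting control flow, using only the calls from the Python API example in figure 4:

    import deepmind_lab
    import numpy as np

    env = deepmind_lab.Lab('seekavoid_arena_01', ['RGB_INTERLACED'])
    env.reset()

    noop = np.zeros([7], dtype=np.intc)  # one entry per action dimension
    for _ in range(100):
        obs = env.observations()
        # The engine stays paused here for as long as the agent takes
        # to choose an action; time advances only inside step().
        action = noop
        # num_steps repeats the action over several simulation steps at
        # the configured frame rate, accumulating the reward.
        reward = env.step(action, num_steps=4)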

Observations

At each step, the engine provides reward, pixel-based observations and, optionally, velocity information (figure 1):

1. The reward signal is a scalar value that is effectively the score of each level.

2. The platform provides access to the raw pixels as rendered by the game engine from the player's first-person perspective, formatted as RGB pixels. There is also an RGBD format, which additionally exposes per-pixel depth values, mimicking the range sensors used in robotics and biological stereo vision.

3. For certain research applications the agent's translational and angular velocities may be useful. These are exposed as two separate three-dimensional vectors.

Figure 1: Observations available to the agent. In our experience, reward and pixels are sufficient to train an agent, whereas depth and velocity information can be useful for further analysis.

Actions

Agents can provide multiple simultaneous actions to control movement (forward/back, strafe left/right, crouch, jump), looking (up/down, left/right) and tagging (in laser tag levels with opponent bots), as illustrated in figure 2.

Figure 2: The action space includes movement in three dimensions and look direction around two axes.
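As a concrete illustration of the observation and action spaces, the sketch below requests depth and velocity channels alongside RGB and issues a composite action. The observation names 'RGBD_INTERLACED', 'VEL.TRANS' and 'VEL.ROT', and the ordering of the seven action dimensions, are assumptions about the platform's defaults rather than something fixed by the text above.

    import deepmind_lab
    import numpy as np

    # Assumed observation names: interlaced RGB plus depth, and the two
    # three-dimensional velocity vectors described above.
    env = deepmind_lab.Lab('seekavoid_arena_01',
                           ['RGBD_INTERLACED', 'VEL.TRANS', 'VEL.ROT'])
    env.reset()

    # Assumed ordering of the seven action dimensions:
    # [look left/right, look up/down, strafe left/right,
    #  move back/forward, fire, jump, crouch]
    action = np.zeros([7], dtype=np.intc)
    action[0] = 20  # rotate the view to the right, in pixels per frame
    action[3] = 1   # move forward
    reward = env.step(action, num_steps=1)

    obs = env.observations()
    rgbd = obs['RGBD_INTERLACED']  # height x width x 4 (RGB + depth)
    v_trans = obs['VEL.TRANS']     # translational velocity, shape (3,)
    v_rot = obs['VEL.ROT']         # angular velocity, shape (3,)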

Example levels

Figures 7 and 8 show a gallery of screenshots from the first-person perspective of the agent. The levels can be divided into four categories:

1. Simple fruit-gathering levels with a static map (seekavoid_arena_01 and stairway_to_melon). The goal of these levels is to collect apples (small positive reward) and melons (large positive reward) while avoiding lemons (small negative reward); a minimal episode-return sketch follows this list.

2. Navigation levels with a static map layout (nav_maze_static_0{1,2,3} and nav_maze_random_goal_0{1,2,3}). These levels test the agent's ability to find its way to a goal in a fixed maze that remains the same across episodes. The starting location is random. In the random-goal variant, the location of the goal changes in every episode; the optimal policy is to find the goal's location at the start of each episode and then use long-term knowledge of the maze layout to return to it as quickly as possible from any location. The static variant is simpler in that the goal location is fixed for all episodes and only the agent's starting location changes, so the optimal policy does not require the initial step of exploring to find the current goal location. The specific layouts are shown in figure 3.

3. Procedurally generated navigation levels requiring effective exploration of a new maze generated on the fly at the start of each episode (random_maze). These levels test the agent's ability to explore a totally new environment. The optimal policy would begin by exploring the maze to rapidly learn its layout, and then exploit that knowledge to return to the goal as many times as possible before the end of the episode (three minutes).

4. Laser-tag levels requiring agents to wield laser-like science fiction gadgets to tag bots controlled by the game's in-built AI (lt_horseshoe_color, lt_chasm, lt_hallway_slope and lt_space_bounce_hard). A reward of 1 is delivered whenever the agent tags a bot by reducing its shield to 0. These levels approximate the usual gameplay of Quake III Arena. lt_hallway_slope features a sloped arena, requiring the agent to look up and down. lt_chasm and lt_space_bounce_hard contain pits that the agent must jump over and avoid falling into. In lt_horseshoe_color and lt_space_bounce_hard, the colours and textures of the bots are randomly generated at the start of each episode, which prevents agents from relying on colour for bot detection. These levels test fine control (for aiming), planning (to anticipate where bots are likely to move), strategy (to control key areas of the map, such as gadget spawn points) and robustness to the substantial visual complexity arising from the large number of independently moving objects (gadget projectiles and bots).
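Because the reward signal is the level score, the apple/melon/lemon scheme of category 1 can be observed directly by accumulating step rewards over an episode. A minimal sketch, assuming the Python bindings expose an is_running() query for episode termination:

    import deepmind_lab
    import numpy as np

    env = deepmind_lab.Lab('seekavoid_arena_01', ['RGB_INTERLACED'])
    env.reset()

    noop = np.zeros([7], dtype=np.intc)
    episode_return = 0.0
    # Assumed API: is_running() returns False once the episode ends.
    while env.is_running():
        # A real agent would choose an action from env.observations() here.
        episode_return += env.step(noop, num_steps=4)

    # Apples contribute small positive, melons large positive and
    # lemons small negative amounts to this total.
    print('episode return:', episode_return)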

Technical Details

The original game engine is written in C and, to ensure compatibility with future changes to the engine, it has been modified only where necessary. DeepMind Lab provides a simple C API and ships with Python bindings.

Figure 3: Top-down views of the static maze levels. Left: nav_maze_static_01; middle: nav_maze_static_02; right: nav_maze_static_03.

The platform includes an extensive level API, written in Lua, to allow custom level creation and mechanics. This approach has resulted in a highly flexible platform with minimal changes to the original game engine. DeepMind Lab supports Linux and has been tested on several major distributions.

API for agents and humans

The engine can be run either in a window, or headless for higher performance and for non-windowed environments such as a remote terminal. Rendering uses OpenGL and can make use of either a GPU or a software renderer.

A DeepMind Lab instance is initialised with the user's settings for level name, screen resolution and frame rate. After initialisation, a simple RL-style API is used to interact with the environment, as shown in figure 4.

    import deepmind_lab
    import numpy as np

    # Construct and start the environment.
    env = deepmind_lab.Lab('seekavoid_arena_01', ['RGB_INTERLACED'])
    env.reset()

    # Create an all-zeros vector for the actions.
    action = np.zeros([7], dtype=np.intc)

    # Advance the environment 4 frames while executing the action.
    reward = env.step(action, num_steps=4)

    # Retrieve the observations of the environment in its new state.
    obs = env.observations()  # dict of Numpy arrays
    rgb_i = obs['RGB_INTERLACED']
    assert rgb_i.shape == (240, 320, 3)

Figure 4: Python API example.

Level generation

Levels for DeepMind Lab are Quake III Arena levels. They are packaged into .pk3 files (which are ZIP files) and consist of a number of components, including level geometry, navigation information and textures. DeepMind Lab includes tools to generate maps from .map files. These can be cumbersome to edit by hand, but a variety of level editors are freely available, e.g. GtkRadiant (GtkRadiant, 2016).

In addition to built-in and user-provided levels, the platform offers text levels: simple, human-readable text files that specify walls, spawn points and other game mechanics, as shown in the example in figure 5. Refer to figure 6 for a render of the generated level.

    map = [[
    **************
    *            *    *******
    **           *        ***
    *****    I            ***
    *****   *             ***
    *****        *******
    *****        ******
    ******   H   *******
    *   I        P      *
    **************
    ]]

Figure 5: Example text level specification, where * is a wall piece, P is a spawn point, and H and I are doors.

Figure 6: A level with the layout generated from the text in figure 5.

Through the Lua-based level API, each level can be customised further with logic for bots, item pickups, custom observations, level restarts, reward schemes, in-game messages and many other aspects.
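As noted above, an instance is initialised with the level name, screen resolution and frame rate, and user-provided levels (including packaged text levels) are loaded by name. A sketch of such an initialisation follows; the config keyword and its string-valued 'width'/'height'/'fps' keys are assumptions about the Python bindings, and 'my_text_maze' is a hypothetical user-provided level name:

    import deepmind_lab

    # Assumed settings dict: values are strings, mirroring the
    # initialisation options described above (resolution, frame rate).
    env = deepmind_lab.Lab(
        'nav_maze_static_01',  # a built-in level; a packaged text level
                               # such as 'my_text_maze' would load the same way
        ['RGB_INTERLACED'],
        config={'width': '320', 'height': '240', 'fps': '60'})
    env.reset()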

Results and Performance

Tables 1 and 2 show the platform's performance at different resolutions for two typical levels included with the platform. The frame rates listed were computed by connecting an agent performing random actions via the Python API. This agent has insignificant overhead, so the results are dominated by engine simulation and rendering times. The benchmarks were run on a Linux desktop with a 6-core Intel Xeon 3.50GHz CPU and an NVIDIA Quadro K600 GPU.

                  CPU              GPU
    Resolution    RGB     RGBD     RGB     RGBD
    84 x 84       199.7   189.6    996.6   995.8
    160 x 120      86.8    85.4    973.2   989.2
    320 x 240      27.3    27.0    950.0   784.7

Table 1: Frame rate (frames/second) on the nav_maze_static_01 level.

                  CPU              GPU
    Resolution    RGB     RGBD     RGB     RGBD
    84 x 84       286.7   263.3    866.0   850.3
    160 x 120     237.7   263.6    903.7   767.9
    320 x 240      82.2    98.0    796.2   657.8

Table 2: Frame rate (frames/second) on the lt_space_bounce_hard level.

Machine learning results from early versions of the DeepMind Lab platform can be found in Mnih et al. (2016), Jaderberg et al. (2016) and Mirowski et al. (2016).
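For reference, a random-action throughput measurement in the spirit of the benchmark above might look as follows; the action layout, the config settings and the is_running() query are the same assumptions as in the earlier sketches:

    import time
    import numpy as np
    import deepmind_lab

    env = deepmind_lab.Lab('nav_maze_static_01', ['RGB_INTERLACED'],
                           config={'width': '84', 'height': '84'})
    env.reset()

    def random_action():
        # Random look/move commands; an agent this cheap adds negligible
        # overhead, so timings reflect simulation and rendering.
        action = np.zeros([7], dtype=np.intc)
        action[0] = np.random.randint(-50, 51)  # look left/right
        action[3] = np.random.randint(-1, 2)    # move back/forward
        return action

    num_frames = 10000
    start = time.time()
    for _ in range(num_frames):
        if not env.is_running():
            env.reset()
        env.step(random_action(), num_steps=1)
    print('frames/second:', num_frames / (time.time() - start))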

Conclusion

DeepMind Lab enables research in a 3D world with rich science fiction visuals and game-like physics. DeepMind Lab facilitates creative task development. A wide range of environments, tasks, and intelligence tests can be built with it. We are excited to see what the research community comes up with.

Acknowledgements

This work would not have been possible without the support of DeepMind and our many colleagues there who have helped mature the platform. In particular we would like to thank Thomas Köppe, Hado van Hasselt, Volodymyr Mnih, Dharshan Kumaran, Timothy Lillicrap, Raia Hadsell, Andrea Banino, Piotr Mirowski, Antonio Garcia, Timo Ewalds, Colin Murdoch, Chris Apps, Andreas Fidjeland, Max Jaderberg, Wojtek Czarnecki, Georg Ostrovski, Audrunas Gruslys, David Reichert, Tim Harley and Hubert Soyer.

References

Marc G. Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 2012.

Rodney A. Brooks. Elephants don't play chess. Robotics and Autonomous Systems, 6(1):3–15, 1990.

bspc. bspc, 2016. URL https://github.com/TTimo/bspc.

GtkRadiant. GtkRadiant, 2016. URL http://icculus.org/gtkradiant/.

David Hume. A Treatise of Human Nature. 1739.

id Software. Quake III Arena, 1999. URL https://github.com/id-Software/Quake-III-Arena.

Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z. Leibo, David Silver, and Koray Kavukcuoglu. Reinforcement learning with unsupervised auxiliary tasks. arXiv preprint arXiv:1611.05397, 2016.

Matthew Johnson, Katja Hofmann, Tim Hutton, and David Bignell. The Malmo platform for artificial intelligence experimentation. In International Joint Conference on Artificial Intelligence (IJCAI), 2016.

Michał Kempka, Marek Wydmuch, Grzegorz Runc, Jakub Toczek, and Wojciech Jaśkowski. ViZDoom: A Doom-based AI research platform for visual reinforcement learning. arXiv preprint arXiv:1605.02097, 2016.

Shane Legg and Marcus Hutter. Universal intelligence: A definition of machine intelligence. Minds and Machines, 17(4):391–444, 2007.

Adam Lerer, Sam Gross, and Rob Fergus. Learning physical intuition of block towers by example. arXiv preprint arXiv:1603.01312, 2016.

John Locke. An Essay Concerning Human Understanding. 1690.

A. Mahendran, H. Bilen, J. F. Henriques, and A. Vedaldi. ResearchDoom and CocoDoom: Learning computer vision with games. arXiv preprint arXiv:1610.02431, 2016.

Giorgio Metta, Giulio Sandini, David Vernon, Lorenzo Natale, and Francesco Nori. The iCub humanoid robot: an open platform for research in embodied cognition. In Proceedings of the 8th Workshop on Performance Metrics for Intelligent Systems, pages 50–56. ACM, 2008.

Piotr Mirowski, Razvan Pascanu, Fabio Viola, Hubert Soyer, Andy Ballard, Andrea Banino, Misha Denil, Ross Goroshin, Laurent Sifre, Koray Kavukcuoglu, et al. Learning to navigate in complex environments. arXiv preprint arXiv:1611.03673, 2016.

Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. arXiv preprint arXiv:1602.01783, 2016.

Ludwig Nussel, Thilo Schulz, Tim Angus, Tony J. White, and Zachary J. Slater. ioquake3, 2016. URL https://github.com/ioquake/ioq3.

OpenArena. The OpenArena project, 2016. URL http://www.openarena.ws.

Weichao Qiu and Alan Yuille. UnrealCV: Connecting computer vision to Unreal Engine. arXiv preprint arXiv:1609.01326, 2016.

Stephan R. Richter, Vibhav Vineet, Stefan Roth, and Vladlen Koltun. Playing for data: Ground truth from computer games. In European Conference on Computer Vision, pages 102–118. Springer, 2016.

Chen Tessler, Shahar Givony, Tom Zahavy, Daniel J. Mankowitz, and Shie Mannor. A deep hierarchical approach to lifelong learning in Minecraft. arXiv preprint arXiv:1604.07255, 2016.

Figure 7: Example images from the agent's egocentric viewpoint from several example DeepMind Lab levels.

Figure 8: Example images from the agent's egocentric viewpoint from several example DeepMind Lab levels.