Reinforcement Learning Simulations and Robotics

Models
- Partial observability and noise in sensors
- Policy search methods rather than value-function-based approaches
- Isolate key parameters by choosing an appropriate representation for the value function or policy
- Incorporate prior knowledge and transfer knowledge from simulations

Safety
- A key issue of the learning process on real robots; it largely does not apply to the rest of the RL community
- Perkins and Barto: RL agents based on Lyapunov functions
- The agent switches between underlying controllers, each of which is safe
- Always safe, and offers basic performance guarantees
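The Lyapunov-based idea can be sketched in a toy setting: if every base controller strictly decreases a Lyapunov function, then any switching sequence the agent picks is safe. Everything below (the 1-D system, the two gains, V(x) = x^2) is an illustrative assumption, not the Perkins-and-Barto construction itself.

```python
import random

# Hypothetical 1-D system x' = x + u with two base controllers, each of
# which strictly decreases the Lyapunov function V(x) = x^2. Because both
# are safe, ANY switching policy among them is safe.
def controller_slow(x):   # gentle pull toward the origin
    return -0.3 * x

def controller_fast(x):   # aggressive pull toward the origin
    return -0.8 * x

def V(x):
    return x * x

random.seed(0)
x = 5.0
for _ in range(50):
    # A learned policy is free to pick either controller each step; a
    # random choice stands in for an arbitrary switching sequence.
    u = random.choice([controller_slow, controller_fast])(x)
    x_next = x + u
    assert V(x_next) < V(x)   # the safety invariant holds regardless
    x = x_next
```

Regardless of how the agent switches, the state contracts toward the origin, which is exactly the "always safe with basic performance guarantees" property the slide refers to.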

Grid World Themed Movements
- Classical RL approaches use discrete states and actions
- Projected onto navigational tasks
- Use high-level actions like "move to the cell to the left"
- A lower-level controller takes care of accelerating, moving, and stopping while ensuring precision
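A minimal sketch of this discretization (grid size, rewards, and learning constants are all illustrative assumptions): tabular Q-learning over abstract cell-to-cell moves, where on a real robot each move would be handed to a low-level controller.

```python
import random

# Tabular Q-learning on a 4x4 grid with abstract moves. On a robot, a
# lower-level controller would execute each "move one cell" action.
random.seed(1)
SIZE, GOAL = 4, (3, 3)
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
Q = {((x, y), a): 0.0 for x in range(SIZE) for y in range(SIZE) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2

def move(s, a):
    dx, dy = ACTIONS[a]
    ns = (min(SIZE - 1, max(0, s[0] + dx)), min(SIZE - 1, max(0, s[1] + dy)))
    return ns, (1.0 if ns == GOAL else -0.04)   # small step cost shapes paths

for _ in range(2000):                           # training episodes
    s = (0, 0)
    while s != GOAL:
        a = (random.choice(list(ACTIONS)) if random.random() < eps
             else max(ACTIONS, key=lambda b: Q[(s, b)]))
        ns, r = move(s, a)
        Q[(s, a)] += alpha * (r + gamma * max(Q[(ns, b)] for b in ACTIONS) - Q[(s, a)])
        s = ns

s, path = (0, 0), []                            # greedy rollout after training
while s != GOAL and len(path) < 20:
    a = max(ACTIONS, key=lambda b: Q[(s, b)])
    s, _ = move(s, a)
    path.append(a)
```

After training, the greedy policy walks the shortest six-move path from (0, 0) to (3, 3); the continuous-control details never appear at this level of abstraction.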

Quick Reward Shaping
- Shaping rewards for quick success, since real-world experience is costly
- Specifying good reward functions requires domain knowledge and is difficult in practice
- Give intermediate rewards instead of a single binary success reward
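The binary-versus-intermediate contrast can be made concrete with a toy reaching task (the goal position and the progress term are illustrative assumptions):

```python
# Hypothetical reaching task: a binary reward is almost never seen, while a
# shaped reward gives a learning signal on every step.
GOAL = 10.0

def binary_reward(pos):
    return 1.0 if abs(pos - GOAL) < 0.1 else 0.0

def shaped_reward(pos, prev_pos):
    # progress toward the goal is rewarded long before the goal is reached
    progress = abs(prev_pos - GOAL) - abs(pos - GOAL)
    return progress + binary_reward(pos)

print(binary_reward(5.0))        # no signal far from the goal
print(shaped_reward(5.0, 4.0))   # moving closer is rewarded immediately
```

The shaping term is where the domain knowledge enters: choosing "distance to the goal" as the progress measure is itself a design decision that can mislead the learner if chosen badly.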

Tracking Solution
- Used to help convergence
- The dynamics of a robot can change over time: temperature, wear on gears or motors, and other external factors
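Why tracking helps can be shown with a one-line estimator under drifting dynamics (the drift, step sizes, and horizon are illustrative assumptions): a converging learner with a decaying step size averages over the whole history, while a tracking learner with a constant step size follows the change.

```python
# Contrast a converging estimator (decaying step size) with a tracking one
# (constant step size) when the quantity being estimated shifts mid-run,
# e.g. as motors heat up.
def run(step_size_fn, steps=2000):
    est = 0.0
    for t in range(1, steps + 1):
        true_value = 1.0 if t <= steps // 2 else 2.0   # dynamics shift here
        est += step_size_fn(t) * (true_value - est)
    return est

converging = run(lambda t: 1.0 / t)   # ends up at the historical average
tracking = run(lambda t: 0.05)        # follows the current dynamics
```

The converging estimate settles at 1.5, the average over both regimes, while the tracking estimate sits near the current value of 2.0.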

Building an Accurate Model
- Challenging: requires very many data samples
- Under-modeling errors accumulate, so the simulated robot can quickly diverge from the real-world system
- Transfer requires significant modifications if the model is not accurate
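How quickly under-modeling errors accumulate can be seen with two lines of arithmetic (the 5% error and the horizon are illustrative assumptions):

```python
# A 5% per-step modeling error compounds multiplicatively: after 100 steps
# the learned model's prediction is off by more than two orders of magnitude.
x_true = x_model = 1.0
for _ in range(100):
    x_true *= 1.05    # real dynamics
    x_model *= 1.00   # slightly wrong learned model
```

Since 1.05^100 is roughly 131, a simulated rollout this long says essentially nothing about the real trajectory, which is why long open-loop rollouts in a learned model are unreliable.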

Approximate Models
- Verifying and testing algorithms in simulation
- Establishing proximity to the theoretically optimal solution
- Calculating approximate gradients for local policy improvement
- Identifying strategies for collecting more data
- Performing mental rehearsal
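The "approximate gradients for local policy improvement" item can be sketched with finite differences against a model (the toy return function and step sizes are illustrative assumptions, not a method from the slides):

```python
# Estimate a policy gradient by finite differences through an approximate
# model, then ascend it for local policy improvement.
def approx_model_return(theta):
    # toy "return" of policy parameter theta under the model, peak at 2.0
    return -(theta - 2.0) ** 2

def finite_diff_gradient(J, theta, eps=1e-4):
    return (J(theta + eps) - J(theta - eps)) / (2 * eps)

theta = 0.0
for _ in range(100):   # gradient ascent on the model's return
    theta += 0.1 * finite_diff_gradient(approx_model_return, theta)
```

Each gradient estimate costs only two model rollouts instead of two real-robot trials, which is the point of doing this in an approximate model.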

Mental Rehearsal
- Practicing in simulation: the learning step runs in simulation
- Used after learning a forward model from real-world data
- Only the resulting policy is transferred to the robot
- Model-based methods are sample efficient but often require a great deal of memory
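A minimal end-to-end sketch of the rehearsal loop (the linear system, the probe-based identification, and the gain search are all illustrative assumptions): identify a forward model from a few real transitions, search for a policy entirely inside the model, and only then touch the real system again.

```python
# Mental rehearsal: learn a forward model, rehearse in it, transfer the policy.

def real_step(x, u):                  # the real system, unknown to the learner
    return 0.9 * x + 0.5 * u

# 1. Identify the model x' = a*x + b*u from two real probe transitions.
a_hat = real_step(1.0, 0.0)
b_hat = real_step(0.0, 1.0)

def model_step(x, u):
    return a_hat * x + b_hat * u

def rollout_cost(k, step, x0=1.0, horizon=20):
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k * x                    # linear feedback policy with gain k
        cost += x * x + 0.1 * u * u
        x = step(x, u)
    return cost

# 2. Rehearse: pick the best gain using ONLY the learned model.
k_best = min((k / 100 for k in range(300)),
             key=lambda k: rollout_cost(k, model_step))

# 3. Transfer: only the resulting policy runs on the real system.
print(rollout_cost(k_best, real_step) < rollout_cost(0.0, real_step))
```

All the trial-and-error happens against `model_step`; the real system is used only twice for identification and once for the final evaluation, which is the sample-efficiency argument for model-based methods.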

Mental Rehearsal Issues
- Simulation biases
- Stochasticity of the real world
- Efficient optimization when sampling from a simulator

Mental Rehearsal Solutions
- Add a stochastic model of the distribution to the simulation and average results over model uncertainty
- Artificially add noise to the simulation: this avoids policy over-fitting and smooths model errors
- Explicitly model uncertainty
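The noise-injection idea can be sketched as follows (the linear model, the fixed gain, and the noise level are illustrative assumptions): a policy is scored on a noisy copy of the model, so policies that exploit the model's exact dynamics look worse than robust ones.

```python
import random

# Score a fixed linear policy on the learned model with and without
# artificially injected noise.
random.seed(0)

def model_step(x, u, noise):
    return 0.9 * x + 0.5 * u + random.gauss(0.0, noise)

def average_cost(gain, noise, episodes=200, horizon=20):
    total = 0.0
    for _ in range(episodes):
        x = 1.0
        for _ in range(horizon):
            total += x * x
            x = model_step(x, -gain * x, noise)
    return total / episodes

clean_cost = average_cost(1.0, noise=0.0)
noisy_cost = average_cost(1.0, noise=0.2)
```

The noisy evaluation is strictly harsher, so a policy optimized against it cannot rely on the model being exactly right; averaging over episodes is the "average results over model uncertainty" step.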

Grounded Simulation Learning
An iterative optimization framework for speeding up robot learning using an imperfect simulator:
1. Behavior is optimized in simulation.
2. Behavior is tested on the robot and compared to the results expected from the simulation.
3. The simulator is modified, using a machine-learning approach, to come closer to reality.
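The three-step loop can be sketched schematically (the scalar "behavior", the additive simulator bias, and the grid search are all illustrative stand-ins; real GSL runs on a physical robot with a full simulator):

```python
# Schematic grounded-simulation-learning loop with a biased toy simulator.

def real_outcome(theta):            # what the robot actually does
    return theta

class Simulator:
    def __init__(self, bias):
        self.bias = bias            # imperfection: a systematic offset

    def outcome(self, theta):
        return theta + self.bias

    def fitness(self, theta):       # behavior goal: outcome close to 3.0
        return -(self.outcome(theta) - 3.0) ** 2

sim = Simulator(bias=1.0)
theta = 0.0
for _ in range(3):
    # 1. Optimize the behavior parameter in simulation (crude grid search).
    theta = max((t / 100 for t in range(500)), key=sim.fitness)
    # 2. Test on the "robot"; compare the real and predicted outcomes.
    error = sim.outcome(theta) - real_outcome(theta)
    # 3. Ground the simulator with the observed discrepancy.
    sim.bias -= error
```

After the first grounding step the simulator's bias is gone, so the second round of optimization in simulation already lands on the behavior that is best on the real system.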

GSL: Fitness Sim
- An imperfect simulation of the robot
- Evaluates the parametrized behavior of the robot
- The function must be modifiable; it is updated to make the simulation better match the real robot's behavior

GSL: Fitness Robot
- Evaluates the fitness of the parametrized behavior on the robot itself
- Only a small number of evaluations are possible

GSL: Explore Robot
- A small number of exploration runs can be executed on the real robot
- While exploring, collect states and actions relevant to the current parameterization of the behavior

GSL: Learn
- Learns a model of the effects of actions on the state of the real robot
- This model is used to modify Fitness Sim so that it better reflects the behavior of the real robot

GSL: Optimize
- Runs optimization in simulation to find better behavior parameters

Ball-in-a-Cup: Real Robot Example

Ball-in-a-Cup: Real Robot Results
- 42-45 episodes to get the ball into the cup
- 70-80 episodes to succeed consistently
- Always converged to the maximum after 100 episodes

Simulation in Robot RL
- Simulation matched recorded data very well, yet simulated policies usually missed on the real robot
- First improve a demonstrated policy in simulation, then perform only the fine-tuning on the real robot
- An importance sampler considers only the n best previous episodes
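The n-best idea can be illustrated with a crude stand-in for the importance sampler (the toy objective, exploration noise, and uniform averaging over the best episodes are all illustrative assumptions, not the actual weighting scheme): each update uses only the n best episodes seen so far, so poor episodes never drag the policy back.

```python
import random

# Update a scalar policy parameter from only the n best episodes so far.
random.seed(0)

def episode_return(theta):           # toy objective, maximized at theta = 1.5
    return -(theta - 1.5) ** 2

theta, n_best, history = 0.0, 5, []
for _ in range(100):
    sample = theta + random.gauss(0.0, 0.3)        # one exploratory episode
    history.append((episode_return(sample), sample))
    top = sorted(history, reverse=True)[:n_best]   # keep only the n best
    theta = sum(s for _, s in top) / len(top)      # update from them alone
```

Because the n-best set only ever improves, the parameter is always an average of the best behaviors observed, which keeps updates conservative when most episodes are poor.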

SARSA
- A popular base RL algorithm for robotics
- Compatible with Q-Value Reuse
- Transfer relies on a mapping between source and target tasks used by the Q-Value Reuse function

Q-Value Reuse
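A minimal sketch of Q-Value Reuse combined with SARSA (the chain task, the identity inter-task mapping, the frozen source values, and all constants are illustrative assumptions): action values are the sum of a frozen source-task table and a learned target-task table, and SARSA updates only the latter.

```python
import random

# SARSA with Q-Value Reuse on a toy 6-state chain; the goal is state 5.
random.seed(0)
STATES, ACTIONS, GOAL = 6, (-1, 1), 5
alpha, gamma, eps = 0.1, 0.9, 0.1

def move(s, a):
    ns = min(STATES - 1, max(0, s + a))
    return ns, (1.0 if ns == GOAL else 0.0), ns == GOAL

# Frozen values from a hypothetical source task that already prefers moving
# right; they jump-start action selection in the target task.
Q_source = {(s, a): (0.5 if a == 1 else 0.0) for s in range(STATES) for a in ACTIONS}
Q_target = {(s, a): 0.0 for s in range(STATES) for a in ACTIONS}

def q(s, a):                         # combined value used for all decisions
    return Q_source[(s, a)] + Q_target[(s, a)]

def choose(s):
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q(s, a))

for _ in range(1000):                # SARSA episodes in the target task
    s, a, done = 0, choose(0), False
    while not done:
        ns, r, done = move(s, a)
        na = choose(ns)
        td_target = r + (0.0 if done else gamma * q(ns, na))
        Q_target[(s, a)] += alpha * (td_target - q(s, a))   # source stays frozen
        s, a = ns, na
```

The source table biases early behavior toward the goal, while the target table absorbs the difference between the source values and the true target-task values; with an identity mapping the reuse is trivial, whereas real inter-task mappings translate states and actions between the two tasks.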

Transfer Methods

- Weak transfer: time spent in the source task does not count against the learner in the target task
- Strong transfer: source-task time does count

Two-Step Transfer
- Learns sequentially from multiple source tasks
- Extends the Q-Value Reuse function to the two-step setting