Reinforcement Learning Simulations and Robotics

1 Reinforcement Learning Simulations and Robotics

2 Models
- Partial observability and noise in the sensors
- Policy search methods rather than value function-based approaches
- Isolate key parameters by choosing an appropriate representation for the value function or policy
- Incorporate prior knowledge and transfer knowledge from simulations

3 Safety
- A key issue of the learning process on a real robot; largely not a concern for the rest of the RL community
- Perkins and Barto: RL agents based on Lyapunov functions
- The agent learns by switching between the underlying controllers
- Always safe, and offers basic performance guarantees
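
The Lyapunov-switching idea lends itself to a short sketch. This is a minimal illustration rather than Perkins and Barto's exact construction (there, each base controller is designed so that descent of the Lyapunov function holds by design); here the descent condition is checked with an assumed one-step dynamics function, and every name (safe_controller_set, step, V, Q) is illustrative.

```python
def safe_controller_set(state, controllers, step, V, margin=1e-3):
    """Indices of base controllers whose next state strictly decreases the
    Lyapunov candidate V; restricting the learner to this set keeps any
    switching policy safe."""
    safe = []
    for k, ctrl in enumerate(controllers):
        if V(step(state, ctrl(state))) <= V(state) - margin:  # descent condition
            safe.append(k)
    return safe

def safe_greedy_switch(state, Q, controllers, step, V):
    """Among the safe controllers, switch to the one the learner currently values most."""
    candidates = safe_controller_set(state, controllers, step, V)
    return max(candidates, key=lambda k: Q(state, k))
```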

4 Grid-World-Style Movements
- Classical RL approaches with discrete states and actions
- Well suited to navigational tasks
- Use abstract actions such as "move to the cell to the left"
- A lower-level controller takes care of accelerating, moving, and stopping while ensuring precision
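
As one classical example of this discrete setting, here is a minimal tabular Q-learning sketch on a small grid; the 5x5 grid, goal cell, and learning parameters are all illustrative, and the lower-level controller that would actually execute each cell-to-cell move is abstracted away.

```python
import numpy as np

N = 5                                            # illustrative 5x5 grid
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]     # right, left, down, up
GOAL = (4, 4)
Q = np.zeros((N, N, len(ACTIONS)))
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    """Abstract 'move to neighboring cell' action; reward 1 only at the goal."""
    dr, dc = ACTIONS[a]
    nxt = (min(max(s[0] + dr, 0), N - 1), min(max(s[1] + dc, 0), N - 1))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

for _ in range(500):                             # episodes
    s, done = (0, 0), False
    while not done:
        a = int(rng.integers(len(ACTIONS))) if rng.random() < eps else int(np.argmax(Q[s]))
        nxt, r, done = step(s, a)
        target = r + (0.0 if done else gamma * np.max(Q[nxt]))
        Q[s][a] += alpha * (target - Q[s][a])
        s = nxt
```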

5 Reward Shaping
- Reward quick success, since real-world experience is costly
- Specifying good reward functions requires domain knowledge and is difficult in practice
- Give intermediate rewards instead of a binary success signal
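
To make the contrast concrete, here is a hedged sketch of a binary task reward versus a shaped one with an intermediate progress term; the distance-to-goal term is an illustrative choice, not taken from the slides.

```python
import numpy as np

def binary_reward(state, goal, tol=0.05):
    """1 only on success; gives the learner almost no signal to follow early on."""
    return 1.0 if np.linalg.norm(np.asarray(state) - np.asarray(goal)) < tol else 0.0

def shaped_reward(state, goal, tol=0.05):
    """Same success bonus plus an intermediate term that rewards getting closer."""
    dist = np.linalg.norm(np.asarray(state) - np.asarray(goal))
    return (1.0 if dist < tol else 0.0) - 0.1 * dist
```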

6 Tracking Solutions
- Used to help convergence when the dynamics of the robot change over time
- Sources of change: temperature, wear on gears or motors, and other external factors

7 Building an Accurate Model
- Challenging: requires very many data samples
- Under-modeling errors accumulate, and the simulated robot can quickly diverge from the real-world system
- Transfer requires significant modifications if the model is not accurate

8 Approximate Models
- Verifying and testing algorithms in simulation
- Establishing proximity to the theoretically optimal solution
- Calculating approximate gradients for local policy improvement
- Identifying strategies for collecting more data
- Performing mental rehearsal

9 Mental Rehearsal
- "Practicing" in simulation: the simulated learning step
- Used after learning a forward model from real-world data
- Only the resulting policy is transferred to the robot
- Model-based methods are sample efficient but often require a great deal of memory
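
A minimal sketch of this loop under simple assumptions: fit a forward model from transitions collected on the robot, then roll the policy out entirely inside that model; only the policy parameters would go back to the hardware. The linear model, linear policy, and rollout evaluation are illustrative stand-ins, not the method from the slides.

```python
import numpy as np

def fit_forward_model(S, A, S_next):
    """Least-squares fit of s' ~ W^T [s; a; 1] from real-robot transitions."""
    X = np.hstack([S, A, np.ones((len(S), 1))])
    W, *_ = np.linalg.lstsq(X, S_next, rcond=None)
    return lambda s, a: np.hstack([s, a, 1.0]) @ W

def linear_policy(params, s):
    """Illustrative linear policy a = K s."""
    return params @ s

def rehearse(params, model, reward, s0, horizon=50):
    """Roll the current policy out in the learned model and return its return."""
    s, total = np.array(s0, dtype=float), 0.0
    for _ in range(horizon):
        a = linear_policy(params, s)
        s = model(s, a)
        total += reward(s, a)
    return total
```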

10 Mental Rehearsal: Issues
- Simulation biases
- Stochasticity of the real world
- Efficient optimization when sampling from a simulator

11 Mental Rehearsal: Solutions
- Add a stochastic model or distribution to your simulation and average results over model uncertainty
- Artificially add noise to the simulation: this avoids policy over-fitting and smooths out model errors
- Explicitly model uncertainty
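
Two of these fixes are easy to sketch: injecting artificial noise into each simulated transition, and averaging a policy's return over an ensemble of perturbed models. The function names and noise level are illustrative.

```python
import numpy as np

def noisy_step(sim_step, s, a, rng, sigma=0.01):
    """Wrap a deterministic simulator step with additive Gaussian state noise."""
    return sim_step(s, a) + rng.normal(0.0, sigma, size=np.shape(s))

def ensemble_return(policy, models, reward, s0, horizon=100, seed=0):
    """Average return over an ensemble of simulators; a policy that exploits one
    model's quirks scores poorly here, which discourages over-fitting."""
    rng = np.random.default_rng(seed)
    returns = []
    for sim_step in models:
        s, total = np.array(s0, dtype=float), 0.0
        for _ in range(horizon):
            a = policy(s)
            s = noisy_step(sim_step, s, a, rng)
            total += reward(s, a)
        returns.append(total)
    return float(np.mean(returns))
```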

12 Grounded Simulation Learning (GSL)
An iterative optimization framework for speeding up robot learning using an imperfect simulator:
1. Behavior is optimized in simulation
2. Behavior is tested on the robot and compared to the expected results from the simulation
3. The simulator is modified, using a machine-learning approach, to come closer to reality
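
The three steps map directly onto an outer loop. The skeleton below only mirrors the structure described above; the three callables it expects (optimize_in_sim, evaluate_on_robot, ground_simulator) are placeholders supplied by the caller, not an existing API.

```python
def grounded_simulation_learning(sim, robot, params,
                                 optimize_in_sim, evaluate_on_robot, ground_simulator,
                                 iterations=5):
    """Iterate: optimize in simulation, test briefly on the robot, then modify
    the simulator so it better matches what the robot actually did."""
    best_params, best_fitness = params, float("-inf")
    for _ in range(iterations):
        candidate = optimize_in_sim(sim, best_params)                # step 1
        fitness, trajectories = evaluate_on_robot(robot, candidate)  # step 2
        if fitness > best_fitness:
            best_params, best_fitness = candidate, fitness
        sim = ground_simulator(sim, trajectories)                    # step 3
    return best_params
```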

13 GSL: Fitness Sim
- An imperfect simulation of the robot
- Evaluates the parametrized behavior of the robot
- The function must be modifiable, since it is updated to better match the real robot's behavior

14 GSL: Fitness Robot
- Evaluates the fitness of the parametrized behavior on the robot itself
- Limited to a small number of evaluations

15 GSL: Explore Robot
- A small number of exploration runs can be performed on the real robot
- While exploring, collect the states and actions relevant to the current parameterization of the behavior

16 GSL: Learn
- Learns a model of the effects of actions on the state of the real robot
- This model is used to modify Fitness Sim so that it better reflects the behavior of the real robot

17 GSL: Optimize
- Optimization is run in simulation to find better parameters

18 Ball in Cup: Real Robot Example

19 Ball in Cup: Real Robot Results
- Episodes needed to get the ball in the cup
- Episodes needed to be consistent
- Always converged to the maximum after 100 episodes

20 Simulation in Robot RL
- The simulation matched the recorded data very well, yet policies learned in simulation usually missed on the real robot
- Approach: first improve a demonstrated policy in simulation, then perform only the fine-tuning on the real robot
- An importance sampler considers only the n best previous episodes
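
A hedged sketch of the "n best previous episodes" idea: keep a buffer of episode parameters and their returns, and form the next parameter estimate as a return-weighted average over the best n. The soft-max weighting is an illustrative choice rather than the exact sampler used in that work.

```python
import numpy as np

def update_from_best_episodes(episodes, n=10, temperature=1.0):
    """episodes: list of (theta, R) pairs, theta a parameter vector, R its return."""
    best = sorted(episodes, key=lambda e: e[1], reverse=True)[:n]
    thetas = np.array([theta for theta, _ in best])
    returns = np.array([R for _, R in best])
    weights = np.exp((returns - returns.max()) / temperature)
    weights /= weights.sum()
    return weights @ thetas          # importance-weighted parameter estimate
```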

21 SARSA
- A popular base RL algorithm for robotics
- Compatible with Q-value reuse, which maps the source task into the target task through the Q-value reuse function
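
For reference, a minimal sketch of the tabular SARSA update these systems build on; the table sizes and parameters are illustrative.

```python
import numpy as np

n_states, n_actions = 100, 4                      # illustrative sizes
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def eps_greedy(s):
    """SARSA is on-policy: the same exploratory policy chooses both a and a'."""
    return int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))

def sarsa_update(s, a, r, s_next, a_next):
    """Q(s,a) <- Q(s,a) + alpha * [r + gamma * Q(s',a') - Q(s,a)]"""
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
```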

22 Q-Value Reuse
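
A sketch of Q-value reuse in the form commonly given in the transfer-learning literature (for example Taylor and Stone): the frozen source-task table, accessed through inter-task state and action mappings, is added to a target-task table, and SARSA updates only the target table. The mapping functions here are assumed placeholders.

```python
def q_value_reuse(Q_source, Q_target, map_state, map_action, s, a):
    """Composite action value: frozen source-task knowledge plus the learned target table."""
    return Q_target[s, a] + Q_source[map_state(s), map_action(a)]

def sarsa_update_with_reuse(Q_source, Q_target, map_state, map_action,
                            s, a, r, s_next, a_next, alpha=0.1, gamma=0.95):
    """Only the target table is updated; the source table stays fixed."""
    q = q_value_reuse(Q_source, Q_target, map_state, map_action, s, a)
    q_next = q_value_reuse(Q_source, Q_target, map_state, map_action, s_next, a_next)
    Q_target[s, a] += alpha * (r + gamma * q_next - q)
```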

23 Transfer Methods

24

25 Weak transfer: time spent in the source task does not count against the learner in the target task. Strong transfer: source-task time does count.

26

27 Two-Step Transfer
- Learning proceeds sequentially through multiple source tasks
- Uses the Q-value reuse function extended to two steps
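
One plausible form of that two-step extension, stated as an assumption rather than taken from the slides: each earlier source table contributes through the composition of the inter-task mappings, and only the target table is ever updated.

```python
def two_step_q_value_reuse(Q_src1, Q_src2, Q_target,
                           map1_s, map1_a, map2_s, map2_a, s, a):
    """Assumed composition: the second source is reached through its own mapping,
    the first source through both mappings chained; only Q_target is learned."""
    return (Q_target[s, a]
            + Q_src2[map2_s(s), map2_a(a)]
            + Q_src1[map1_s(map2_s(s)), map1_a(map2_a(a))])
```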
