CSE-571 AI-based Mobile Robotics


Active Sensing and Reinforcement Learning

Approximation of POMDPs: Active Localization
- Localization so far: passive integration of sensor information
[Figure: occupancy map of the 19 m x 26.5 m test environment]

Active Localization: Idea
- Actions: target points relative to the robot (a two-dimensional search space)
- Choose the action based on its utility and its cost
- Goal: efficient, autonomous localization by active disambiguation

Utilities
- Given by the change in uncertainty
- Uncertainty is measured by the entropy of the belief:

    H(X) = -\sum_x Bel(x) \log Bel(x)

    U(a) = H(X) - E_a[H(X)]

- where the expected entropy after executing action a is

    E_a[H(X)] = -\sum_{z,x} p(z \mid x) Bel(x \mid a) \log \frac{p(z \mid x) Bel(x \mid a)}{p(z \mid a)}

Costs: Occupancy Probabilities
- Costs are based on occupancy probabilities:

    p_{occ}(a) = \sum_x Bel(x) \, p_{occ}(f_a(x))

Costs: Optimal Path
- Given by the cost-optimal path to the target point
- The cost-optimal path is determined through value iteration:

    C(a) = p_{occ}(a) + \min_b C(b)

Action Selection
- Choose the action with the best trade-off between expected utility and cost:

    a^* = \arg\max_a \, ( U(a) - \alpha \, C(a) )

- Execution: follow the cost-optimal path with reactive collision avoidance
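As a rough illustration of these formulas, here is a minimal Python sketch of entropy-based action selection over a grid belief. The `motion`, `sensor`, and `occ_cost` interfaces are hypothetical stand-ins, not the original implementation: `motion.predict` is assumed to shift the belief to the chosen target point, `sensor.likelihood(z)` to return p(z|x) over all cells, and `occ_cost(a)` to come from the value-iteration path costs above.

    import numpy as np

    def entropy(bel):
        # H(X) = -sum_x Bel(x) log Bel(x); skip zero-probability cells
        p = bel[bel > 0]
        return -np.sum(p * np.log(p))

    def expected_entropy(bel, a, motion, sensor):
        # E_a[H(X)]: average posterior entropy over possible measurements z
        bel_a = motion.predict(bel, a)           # Bel(x | a)
        exp_h = 0.0
        for z in sensor.measurements:
            post = sensor.likelihood(z) * bel_a  # p(z | x) Bel(x | a), unnormalized
            pz = post.sum()                      # p(z | a)
            if pz > 0:
                exp_h += pz * entropy(post / pz)
        return exp_h

    def select_action(bel, actions, motion, sensor, occ_cost, alpha=0.1):
        # a* = argmax_a ( U(a) - alpha C(a) ), with U(a) = H(X) - E_a[H(X)]
        h_now = entropy(bel)
        def score(a):
            utility = h_now - expected_entropy(bel, a, motion, sensor)
            return utility - alpha * occ_cost(a)
        return max(actions, key=score)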

Experimental Results
- Random navigation failed in 9 out of 10 test runs
- Active localization succeeded in all 20 test runs

RL for Active Sensing

Active Sensing
- Sensors have limited coverage and range
- Question: where to move / point the sensors?
- Typical scenario: uncertainty in only one type of state variable
  - Robot location [Fox et al., 98; Kroese & Bunschoten, 99; Roy & Thrun, 99]
  - Object / target location(s) [Denzler & Brown, 02; Kreucher et al., 04; Chung et al., 04]
- Predominant approach: minimize the expected uncertainty (entropy)

Active Sensing in Multi-State Domains
- Uncertainty in multiple, different state variables
  - RoboCup: robot and ball locations, relative goal location, ...
- Which uncertainties should be minimized?
- The importance of the uncertainties changes over time:
  - The ball location has to be known very accurately before a kick.
  - Accuracy is not important if the ball is on the other side of the field.
- Has to consider sequences of sensing actions!
- RoboCup: typically, hand-coded strategies are used.

Converting Beliefs to Augmented States
- State variables (e.g., robot pose r, goal g)
- Uncertainty variables (entropies of the belief)
- Belief -> augmented state
[Figure: (a)-(d) projecting the uncertainty of the goal orientation]

Why Reinforcement Learning?
- There is no accurate model of the robot and the environment.
- It is particularly difficult to assess how the (projected) entropies evolve over time.
- It is possible to simulate the robot and the noise in actions and observations.

Least-Squares Policy Iteration
- Model-free approach
- Approximates the Q-function by a linear function of state features:

    \hat{Q}(s, a; w) = \sum_{j=1}^{k} \phi_j(s, a) \, w_j

- No discretization needed
- No iterative procedure needed for policy evaluation
- Off-policy: can re-use samples
[Lagoudakis and Parr, 01/03]
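To make the linear architecture concrete, here is a small sketch (not the original code) of the Q-function with one feature block per discrete sensing action. The state attributes mirror the augmented state described on the next slide and are assumptions made for illustration only.

    import numpy as np

    K = 6  # features per action: d_b, theta_b, H_b, H_r, H_g, bias

    def features(s, a, num_actions):
        # Block encoding: copy the augmented state into the block of action a,
        # zeros elsewhere (a common choice for discrete action sets)
        phi = np.zeros(K * num_actions)
        x = np.array([s.d_ball, s.theta_ball, s.H_ball, s.H_robot, s.H_goal, 1.0])
        phi[a * K:(a + 1) * K] = x
        return phi

    def q_hat(s, a, w, num_actions):
        # Q_hat(s, a; w) = sum_j phi_j(s, a) w_j
        return float(features(s, a, num_actions) @ w)

    def greedy(s, w, num_actions):
        # pi(s) = argmax_a Q_hat(s, a; w)
        return max(range(num_actions), key=lambda a: q_hat(s, a, w, num_actions))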

Least-Squares Policy Iteration (see the Python sketch below)

    π' ← π_0
    repeat
        π ← π'
        Estimate the Q-function from samples S:  w ← LSTDQ(S, π)
        Update the policy:  π'(s) = argmax_{a ∈ A} Q̂(s, a; w)
    until π ≈ π'

Application: Active Sensing for Goal Scoring
- Task: an AIBO trying to score goals
- Sensing actions: looking at the ball, at the goals, or at the markers
- Fixed motion control policy: uses the most likely states to dock the robot to the ball, then kicks the ball into the goal
- Find the sensing strategy that best supports the given control policy
[Figure: field with the robot, the ball, the goal, and the markers]

Augmented State Space and Features
- State variables: distance to the ball, ball orientation
- Uncertainty variables: entropy of the ball location, entropy of the robot location, entropy of the goal orientation
- Features: the augmented state plus a constant,

    \phi(s, a) = ( d_b, \theta_b, H_b, H_r, H_g, 1 )

[Figure: robot, ball, and goal with distance and orientation annotated]

Experiments
- The strategy is learned in simulation
- An episode ends when the robot:
  - scores (reward +5)
  - misses (reward -1.5 to -0.1)
  - loses track of the ball (reward -5)
  - fails to dock / accidentally kicks the ball away (reward -5)
- The learned strategy is applied to the real robot
- Compared with two hand-coded strategies:
  - Panning: the robot periodically scans
  - Pointing: the robot periodically looks up at the markers/goals
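A compact sketch of the LSPI loop above, in the standard Lagoudakis-Parr formulation: LSTDQ builds the linear system A w = b from a fixed sample set S of (s, a, r, s') tuples and the current greedy policy, and the outer loop repeats until the weights stop changing. The `phi` feature function and the sample format are assumptions carried over from the sketch above.

    import numpy as np

    def lstdq(samples, phi, policy, k, gamma=0.95):
        # Solve A w = b, with
        # A = sum phi(s,a) (phi(s,a) - gamma phi(s', pi(s')))^T,  b = sum r phi(s,a)
        A = 1e-6 * np.eye(k)   # small ridge term keeps A invertible
        b = np.zeros(k)
        for s, a, r, s_next in samples:
            f = phi(s, a)
            f_next = phi(s_next, policy(s_next)) if s_next is not None else np.zeros(k)
            A += np.outer(f, f - gamma * f_next)
            b += r * f
        return np.linalg.solve(A, b)

    def lspi(samples, phi, actions, k, gamma=0.95, eps=1e-3, max_iter=50):
        # Repeat { w <- LSTDQ(S, pi); pi(s) <- argmax_a phi(s,a).w } until pi' ~ pi.
        # Off-policy: the same sample set S is re-used in every iteration.
        w = np.zeros(k)
        greedy = lambda s: max(actions, key=lambda a: float(phi(s, a) @ w))
        for _ in range(max_iter):
            w_new = lstdq(samples, phi, greedy, k, gamma)
            if np.linalg.norm(w_new - w) < eps:
                break
            w = w_new
        return w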

Rewards and Success Ratio (Simulation)
[Plots: average reward and success ratio over training episodes for the Learned, Pointing, and Panning strategies; the learned strategy dominates both hand-coded ones]

Learned Strategy
- Initially, the robot learns to dock (it only looks at the ball)
- Then, the robot learns to look at the goal and the markers
- The robot looks at the ball when docking
- Briefly before docking, it adjusts by looking at the goal
- It prefers looking at the goal instead of the markers for location information

Results on Real Robots (45 episodes of goal kicking)

    Strategy   Goals   Misses   Avg. miss distance   Kick failures
    Learned     31      10          6 ± 0.3 cm            4
    Pointing    22      19          9 ± 2.2 cm            4
    Panning     15      21         22 ± 9.4 cm            9

Adding Opponents: Learning with Opponents
- Additional features: ball velocity, knowledge about the other robots
- The robot learned to look at the ball when an opponent is close to it, and thereby avoids losing track of it.
[Plot: lost-ball ratio over training episodes for learning with pre-trained data, learning from scratch, and the pre-trained strategy]
[Figure: robot, opponent, goal, and ball with velocity v_b]

Summary
- Learned effective sensing strategies that make good trade-offs between uncertainties
- Results on a real robot show improvements over carefully tuned, hand-coded strategies
- The augmented MDP (with projections) is a good approximation for RL
- LSPI is well suited for RL on augmented state spaces