Perception and Perspective in Robotics


Perception and Perspective in Robotics
Paul Fitzpatrick, MIT CSAIL, USA

experimentation helps perception
Rachel: We have got to find out if [ugly naked guy]'s alive.
Monica: How are we going to do that? There's no way.
Joey: Well there is one way. His window's open. I say we poke him. (brandishes the Giant Poking Device)

robots can experiment
Robot: We have got to find out where this object's boundary is.
Camera: How are we going to do that? There's no way.
Robot: Well there is one way. Looks reachable. I say we poke it. (brandishes the Giant Poking Limb)

the root of all vision: poking
object segmentation; affordance exploitation (rolling); edge catalog; object detection (recognition, localization, contact-free segmentation); manipulator detection (robot, human)

theoretical goal: a virtuous circle
Use the constraint of a familiar activity to discover unfamiliar entities used within it
Reveal the structure of unfamiliar activities by tracking familiar entities into and through them
The circle runs between familiar activities and familiar entities (objects, actors, properties, ...)

practical goal: adaptive robots
Motivated by fallibility: complex action and perception will fail; we need simpler fall-back methods that resolve ambiguity and learn from errors
Motivated by transience: the robot's task may change from day to day, and ambient conditions change, so it is best to build in adaptivity from the very beginning
Motivated by infants: perceptual development outpaces motor development, so infants are able to explore despite sloppy control

giant poking device: Cog
Head (7 DOFs), right arm (6 DOFs), left arm (6 DOFs), torso (3 DOFs), stand (0 DOFs)

giant poking device: Cog

talk overview
Learning from an activity: poking, to learn to recognize objects, manipulators, etc.; chatting, to learn the names of objects
Learning a new activity: searching for an object
Then back to learning from the activity

virtuous circle: poking → objects, ...

virtuous circle: poking, chatting ("ball!") → objects, words, names, ...

virtuous circle: poking, chatting, search → objects, words, names, ...

talk overview
Learning from an activity: poking, to learn to recognize objects, manipulators, etc.; chatting, to learn the names of objects
Learning a new activity: searching for an object
Then back to learning from the activity

poking: object segmentation; affordance exploitation (rolling); edge catalog; object detection (recognition, localization, contact-free segmentation); manipulator detection (robot, human)

poking: object segmentation

Active Segmentation: segmenting objects through action

Active Segmentation: segmenting objects by coming into contact with them

a simple scene? Edges of the table and cube overlap. The cube has a misleading surface pattern. The colors of the cube and table are poorly separated. Maybe some cruel grad student faked the cube with paper, or glued it to the table.

active segmentation

Sandini et al., 1993

where to poke?
Visual attention system: the robot selects a region to fixate based on salience (bright colors, motion, etc.); the region won't generally correspond to the extent of an object
Poking activation: the region is stationary and reachable (right distance, not too high up), with distance measured through binocular disparity
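
The reachability test above can be sketched with the standard stereo relation between binocular disparity and depth, Z = f·B/d. The focal length, baseline, and reach limit below are illustrative assumptions, not Cog's actual parameters.

```python
# Sketch: distance from binocular disparity, used to decide whether a
# fixated region "looks reachable". focal_px, baseline_m, and
# max_reach_m are invented numbers, not the robot's calibration.

def depth_from_disparity(disparity_px, focal_px=500.0, baseline_m=0.07):
    """Standard stereo relation: Z = f * B / d."""
    if disparity_px <= 0:
        return float("inf")  # no measurable disparity: treat as far away
    return focal_px * baseline_m / disparity_px

def looks_reachable(disparity_px, max_reach_m=0.6):
    return depth_from_disparity(disparity_px) <= max_reach_m

print(depth_from_disparity(70.0))  # 0.5 (meters)
print(looks_reachable(70.0))       # True
print(looks_reachable(30.0))       # False: about 1.17 m away
```

A real activation test would also reject regions that are moving or too high up, as the slide notes.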

visual attention system
Person approaches (robot attracted to skin color), shakes object (attracted to bright color, movement), moves object (smooth pursuit), hides object (attracted to skin color), stands up (smooth pursuit)
(Collaboration with Brian Scassellati, Giorgio Metta, Cynthia Breazeal)

tracking

poking activation

evidence for segmentation
Areas where motion is observed upon contact: classify as foreground
Areas where motion is observed immediately before contact: classify as background
Textured areas where no motion was observed: classify as background
Textureless areas where no motion was observed: no information
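
These four rules form a small decision table; a minimal sketch (the function and label names are assumptions, not from the original system):

```python
# Per-pixel evidence rules from the slide, as a pure function. Inputs
# say whether motion was seen at the pixel just before contact or on
# contact, and whether the neighbourhood is textured.

def classify_pixel(moved_on_contact, moved_before_contact, textured):
    if moved_before_contact:
        return "background"  # arm or its shadow: motion is discounted
    if moved_on_contact:
        return "foreground"  # novel motion revealed by the impact
    if textured:
        return "background"  # texture but no motion: a static surface
    return "unknown"         # textureless and static: no information

print(classify_pixel(True, False, True))    # foreground
print(classify_pixel(True, True, False))    # background (the arm itself)
print(classify_pixel(False, False, False))  # unknown
```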

minimum cut
The graph has a foreground node, a background node, and one node per pixel. Each pixel node has an allegiance to the foreground, an allegiance to the background, and pixel-to-pixel allegiances to its neighbors. Allegiance = the cost of assigning two nodes to different layers (foreground versus background).
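
The construction can be sketched with a toy pure-Python max-flow/min-cut; the four-pixel "image" and every weight below are invented for the demonstration.

```python
from collections import defaultdict, deque

# Toy min-cut segmenter: a "fg" terminal, a "bg" terminal, one node per
# pixel, and capacities playing the role of allegiances. Edmonds-Karp
# max-flow; the min cut then separates foreground from background.

def min_cut_foreground(cap, source="fg", sink="bg"):
    flow = defaultdict(lambda: defaultdict(int))

    def augmenting_path():
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in cap[u]:
                if v not in parent and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    if v == sink:
                        return parent
                    queue.append(v)
        return None

    while True:
        parent = augmenting_path()
        if parent is None:
            break
        # bottleneck capacity along the path, then push that much flow
        v, bottleneck = sink, float("inf")
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v] - flow[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            flow[parent[v]][v] += bottleneck
            flow[v][parent[v]] -= bottleneck
            v = parent[v]

    # nodes still reachable in the residual graph lie on the source side
    side, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        for v in cap[u]:
            if v not in side and cap[u][v] - flow[u][v] > 0:
                side.add(v)
                queue.append(v)
    return side

# 4 pixels in a row: strong foreground evidence at p0, strong background
# evidence at p3, weak evidence in between, and neighbour allegiances
# that drag the ambiguous middle pixels to the nearer layer.
cap = defaultdict(lambda: defaultdict(int))
def allegiance(u, v, cost):
    cap[u][v] = cap[v][u] = cost
allegiance("fg", "p0", 10); allegiance("fg", "p1", 2)
allegiance("bg", "p3", 10); allegiance("bg", "p2", 2)
allegiance("p0", "p1", 5); allegiance("p1", "p2", 1); allegiance("p2", "p3", 5)

fg = min_cut_foreground(cap)
print(sorted(p for p in fg if p.startswith("p")))  # ['p0', 'p1']
```

The cut lands on the weak p1-p2 allegiance, so the ambiguous p1 follows its strongly-foreground neighbour p0.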

grouping (on synthetic data): proposed segmentation; figure points (known motion); ground points (stationary, or gripper)

point of contact
Motion spreads continuously from frame to frame (the arm or its shadow); at contact, motion spreads suddenly, faster than the arm itself

point of contact
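
The cue above, motion area jumping suddenly at contact rather than spreading smoothly, can be sketched as a threshold on frame-to-frame growth; the jump ratio and the toy series are assumptions.

```python
# Contact detection sketch: the arm's motion area grows smoothly frame
# to frame, while contact suddenly adds the struck object's area.

def find_contact_frame(motion_area, jump_ratio=2.0):
    """First frame whose motion area grows by more than jump_ratio."""
    for t in range(1, len(motion_area)):
        prev, cur = motion_area[t - 1], motion_area[t]
        if prev > 0 and cur / prev > jump_ratio:
            return t
    return None

# arm creeps in over frames 0-5; the struck object starts moving at frame 6
area = [10, 12, 15, 17, 20, 22, 80, 85, 84]
print(find_contact_frame(area))  # 6
```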

segmentation examples (side tap, back slap): impact event; motion caused (red = novel, purple/blue = discounted); segmentation (green/yellow)

segmentation examples

segmentation examples: car, table

boundary fidelity
Scatter plot of a measure of anisotropy ((square root of) the second Hu moment, 0 to 0.2) against a measure of size (area at poking distance, 0 to 0.3) for cube, car, bottle, and ball segmentations.
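
The anisotropy measure, the (square root of the) second Hu moment, can be computed directly from normalized central moments of a binary mask; this numpy sketch uses toy masks rather than real segmentations.

```python
import numpy as np

# sqrt of the second Hu moment of a binary mask, from scratch:
# I2 = (eta20 - eta02)**2 + 4*eta11**2, with eta_pq the scale-normalized
# central moments. Near zero for round masks, large for elongated ones.

def anisotropy(mask):
    ys, xs = np.nonzero(mask)
    m00 = len(xs)
    dx, dy = xs - xs.mean(), ys - ys.mean()
    eta20 = (dx * dx).sum() / m00**2
    eta02 = (dy * dy).sum() / m00**2
    eta11 = (dx * dy).sum() / m00**2
    return np.sqrt((eta20 - eta02) ** 2 + 4 * eta11**2)

square = np.ones((20, 20), dtype=bool)  # compact, symmetric
bar = np.ones((4, 40), dtype=bool)      # elongated
print(anisotropy(square) < anisotropy(bar))  # True
```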

signal to noise

object segmentation poking edge catalog

Appearance Catalog: exhaustively characterizing the appearance of a low-level feature

sampling oriented regions

sample samples

most frequent samples (1st, 21st, 41st, 61st, 81st, 101st, 121st, 141st, 161st, 181st, 201st, 221st)

selected samples

some tests (red = horizontal, green = vertical)

natural images (orientation labels quantized to ±22.5° bands)

object segmentation poking edge catalog object detection (recognition, localization, contact-free segmentation)

Open Object Recognition: detecting and recognizing familiar objects, enrolling unfamiliar objects

object recognition
Geometry-based: objects and images modeled as sets of point/surface/volume elements; an example real-time method stores geometric relationships in a hash table
Appearance-based: objects and images modeled as sets of features closer to the raw image; an example real-time method uses histograms of simple features (e.g. color)
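
The appearance-based route can be illustrated with color-histogram matching by histogram intersection; the "images" below are toy arrays of quantized color labels, so everything here is an assumption for demonstration.

```python
import numpy as np

# Color-histogram matching sketch: model an object as a normalized
# histogram of quantized colors, compare by histogram intersection
# (1.0 means identical color distributions).

def color_hist(img, n_colors=8):
    h = np.bincount(img.ravel(), minlength=n_colors).astype(float)
    return h / h.sum()

def intersection(h1, h2):
    return np.minimum(h1, h2).sum()

rng = np.random.default_rng(0)
ball = rng.choice([2, 3], size=(16, 16))        # mostly colors 2 and 3
ball_again = rng.choice([2, 3], size=(16, 16))  # same object, new view
cube = rng.choice([6, 7], size=(16, 16))        # different colors

print(intersection(color_hist(ball), color_hist(ball_again)))  # near 1
print(intersection(color_hist(ball), color_hist(cube)))        # 0.0
```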

geometry + appearance
Angles + colors + relative sizes: invariant to scale, translation, and in-plane rotation
Advantages: more selective, fast. Disadvantages: edges can be occluded, and it is a 2D method. Property: no need for offline training

details of features
Distinguishing elements: the angle between regions (edges); the position of the regions relative to their projected intersection point (normalized for scale and orientation); the color at three sample points along the line between the region centroids
Output of a feature match: predicts the approximate center and scale of the object, if a match exists
Weighting for combining features: votes are summed at each possible position of the center, with a consistency check for scale, and weighted by the frequency of occurrence of the feature in the object examples and by edge length
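
A heavily simplified sketch of this voting idea: each stored model feature carries an (angle, color) signature plus an offset from the feature to the object's center; matching scene features vote for a center, and the peak of the vote map localizes the object. The encoding, tolerances, and data are invented, and scale handling is omitted.

```python
from collections import Counter

# Feature-voting localization sketch. A "feature" is a dict with an
# angle, a color sample, and (for model features) the offset from the
# feature to the object's center.

def matches(model_f, scene_f, angle_tol=10, color_tol=30):
    return (abs(model_f["angle"] - scene_f["angle"]) <= angle_tol
            and abs(model_f["color"] - scene_f["color"]) <= color_tol)

def localize(model_feats, scene_feats):
    votes = Counter()
    for s in scene_feats:
        for m in model_feats:
            if matches(m, s):
                # a matched feature predicts where the center should be
                center = (s["x"] + m["offset"][0], s["y"] + m["offset"][1])
                votes[center] += 1
    return votes.most_common(1)[0] if votes else None

model = [{"angle": 90, "color": 200, "offset": (5, 0)},
         {"angle": 0, "color": 180, "offset": (0, 5)}]
scene = [{"angle": 88, "color": 205, "x": 40, "y": 30},  # object edge
         {"angle": 2, "color": 175, "x": 45, "y": 25},   # object edge
         {"angle": 45, "color": 20, "x": 10, "y": 70}]   # clutter
print(localize(model, scene))  # ((45, 30), 2): two consistent votes
```

Two independent features agree on the center (45, 30), while the clutter feature matches nothing, which is the selectivity advantage the slide claims for combining geometry with appearance.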

localization example: look for this (object) in this (scene)

localization example

localization example: just using geometry versus geometry + appearance

other examples

just for fun: look for this (object) in this (scene); result

real object in real images

yellow on yellow

multiple objects: camera image; implicated edges, found and grouped; response for each object

extending the attention system
Components: low-level salience filters; object recognition/localization (wide); object recognition/localization (foveal); poking sequencer; tracker; egocentric map; motor control (arm); motor control (eyes, head, neck)

attention

open object recognition
Robot's current view versus recognized object (as seen during poking): the robot pokes and segments the ball; sees the ball but thinks it is the cube; after enrollment, correctly differentiates ball and cube

open object recognition

poking: object segmentation; edge catalog; object detection (recognition, localization, contact-free segmentation); manipulator detection (robot, human)

finding manipulators: analogous to finding objects
Object: a physically coherent structure; to find one, poke around and see what moves together
Actor: something that acts on objects; to find one, see what pokes objects

similar human and robot actions: the object connects robot and human action

catching manipulators in the act: the manipulator approaches the object... contact!

modeling manipulators

manipulator recognition

poking: object segmentation; affordance exploitation (rolling); edge catalog; object detection (recognition, localization, contact-free segmentation); manipulator detection (robot, human)

Affordance Recognition: switching from object-centric perception to recognizing action opportunities (collaboration with Giorgio Metta)

what is an affordance? A leaf affords rest/walking to an ant, but not to an elephant

exploring affordances

objects roll in different ways
a bottle: it rolls along its side
a toy car: it rolls forward
a toy cube: it doesn't roll easily
a ball: it rolls in any direction

preferred direction of motion
Histograms of frequency of occurrence against the difference between the angle of motion and the principal axis of the object (0 to 90 degrees):
Bottle, pointedness = 0.13: rolls at right angles to its principal axis
Car, pointedness = 0.07: rolls along its principal axis
Cube, pointedness = 0.03
Ball, pointedness = 0.02
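
The statistic behind these histograms can be sketched by binning the difference between each observed motion direction and the object's principal axis; the angle samples below are invented stand-ins for a car-like and a bottle-like object.

```python
import numpy as np

# Rolling-affordance profile: histogram of the difference between the
# angle of motion and the object's principal axis, over [0, 90] degrees.

def rolling_profile(angle_diffs_deg, bins=9):
    hist, _ = np.histogram(angle_diffs_deg, bins=bins, range=(0, 90))
    return hist / hist.sum()  # frequency of occurrence per 10-degree bin

car_like = [3, 5, 8, 2, 11, 4, 6]       # mostly rolls along its axis
bottle_like = [85, 88, 79, 90, 83, 86]  # mostly rolls at right angles

print(rolling_profile(car_like)[0])      # most of the mass in the first bin
print(rolling_profile(bottle_like)[-1])  # most of the mass in the last bin
```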

affordance exploitation
Caveat: this work uses an early version of object detection (not the one presented today)

mimicry test
Invoking the object's natural rolling affordance versus going against it: demonstration by a human; mimicry in a similar situation; mimicry when the object is rotated

mimicry test

poking: object segmentation; affordance exploitation (rolling); edge catalog; object detection (recognition, localization, contact-free segmentation); manipulator detection (robot, human)

talk overview
Learning from an activity: poking, to learn to recognize objects, manipulators, etc.; chatting, to learn the names of objects
Learning a new activity: searching for an object
Then back to learning from the activity

open speech recognition
The vocabulary can be extended at any time
Assumes the active vocabulary is small
Isolated words only
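
An extensible isolated-word recognizer in this spirit can be sketched as nearest-template matching over a vocabulary that grows at runtime; real systems compare acoustic feature sequences, while each "utterance" below is just a toy feature vector, so the class and all its numbers are assumptions.

```python
import math

# Open-vocabulary isolated-word recognition sketch: enroll templates at
# any time, recognize by nearest template, and reject utterances that
# are far from every template (an unknown word).

class OpenVocabulary:
    def __init__(self):
        self.templates = {}  # word -> feature vector

    def enroll(self, word, features):
        self.templates[word] = features  # vocabulary extended at runtime

    def recognize(self, features, reject_threshold=1.0):
        if not self.templates:
            return None
        word, dist = min(((w, math.dist(t, features))
                          for w, t in self.templates.items()),
                         key=lambda pair: pair[1])
        return word if dist <= reject_threshold else None

v = OpenVocabulary()
v.enroll("ball", [0.9, 0.1, 0.2])
v.enroll("car", [0.1, 0.8, 0.3])
print(v.recognize([0.85, 0.15, 0.2]))  # ball
print(v.recognize([9.0, 9.0, 9.0]))    # None: unknown word
```

The rejection threshold is what keeps the vocabulary "open": an unmatched utterance can trigger enrollment of a new word rather than a forced misrecognition.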

keeping track of objects
The EgoMap is a short-term memory of objects and their locations, so that out of sight is not out of mind
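
A minimal EgoMap-style store might remember where each named object was last seen in egocentric coordinates and forget stale entries; the interface, coordinates, and staleness window are all assumptions.

```python
import time

# Short-term object memory sketch: name -> (azimuth, elevation, time).
# recall() returns the last known direction unless the memory is stale.

class EgoMap:
    def __init__(self, max_age_s=60.0):
        self.entries = {}
        self.max_age_s = max_age_s

    def saw(self, name, azimuth, elevation, now=None):
        t = time.monotonic() if now is None else now
        self.entries[name] = (azimuth, elevation, t)

    def recall(self, name, now=None):
        if name not in self.entries:
            return None
        azimuth, elevation, t = self.entries[name]
        t_now = time.monotonic() if now is None else now
        if t_now - t > self.max_age_s:
            return None  # eventually, out of sight is out of mind again
        return azimuth, elevation

m = EgoMap()
m.saw("ball", azimuth=30.0, elevation=-10.0, now=0.0)
print(m.recall("ball", now=5.0))    # (30.0, -10.0)
print(m.recall("ball", now=120.0))  # None: the memory went stale
```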

keeping track of objects

speech and space: chatting

talk overview
Learning from an activity: poking, to learn to recognize objects, manipulators, etc.; chatting, to learn the names of objects
Learning a new activity: searching for an object
Then back to learning from the activity

Tomasello's experiments
Designed experiments to challenge the constraint-based theory of language acquisition in infants
Wants to show that infants learn words through a real understanding of activity (the "flow of interaction"), not hacks
Great test cases! They get beyond direct association (but where does knowledge of activity come from?)

let's go find the toma!
The infant plays with a set of objects
Then the adult says "let's go find the toma!" (a nonce word)
The adult acts out a search, going to several objects before finally finding the toma
Later, the infant is tested to see which object it thinks is the toma
Several variants exist (e.g. the toma is placed in an inaccessible location while the infant watches, and the adult is upset when trying to get it)

let's go find the toma!

goal
Have the robot learn about the search activity from examples of looking for known objects
Then apply that to a "find the toma"-like scenario

virtuous circle: poking, chatting
Discover the car, ball, and cube through poking; discover their names through chatting → car, ball, cube, and their names

virtuous circle: poking, chatting, search
Follow named objects into the search activity, and observe the structure of search → car, ball, cube, and their names

virtuous circle: poking, chatting, searching
Discover an object through poking; learn its name ("toma") indirectly during search → car, ball, cube, toma, and their names

learning about search

what the robot learns
"Find" is followed by mention of an absent object
"Yes" is said when a previously absent object comes into view

how it learns this
Look for reliable event/state combinations and sequences
Events: hearing a word; seeing an object
States: recent events; situation evaluations (object corresponding to a word not present, mismatch between word and object, etc.)
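
The "reliable combinations" idea can be sketched as counting which (earlier event, later event) pairs hold in every example episode; the episodes and event encoding below are invented, far simpler than the robot's real state set.

```python
from collections import Counter

# Find event pairs (a, b) such that b occurs after a in every episode.

def reliable_sequences(episodes, min_support=1.0):
    follows = Counter()
    for ep in episodes:
        pairs = set()
        for i, a in enumerate(ep):
            for b in ep[i + 1:]:
                pairs.add((a, b))
        follows.update(pairs)  # count each pair at most once per episode
    n = len(episodes)
    return {pair for pair, c in follows.items() if c / n >= min_support}

episodes = [
    ["hear:find", "hear:ball", "see:cube", "see:ball", "hear:yes"],
    ["hear:find", "hear:car", "see:ball", "see:car", "hear:yes"],
    ["hear:find", "hear:cube", "see:cube", "hear:yes"],
]
rules = reliable_sequences(episodes)
print(("hear:find", "hear:yes") in rules)  # True: every search ends in "yes"
```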

finding the toma

caveats
Much, much less sophisticated than infants! The cues the robot is sensitive to are very impoverished
Slightly different from Tomasello's experiment
State was saved between stages; it wasn't one complete continuous run

conclusions: why do this?
It uses all the alternative essences of intelligence: development, social interaction, embodiment, integration
It points the way to really flexible robots: today the robot should sort widgets from wombats (neither of which it has seen before); who knows what it will have to do tomorrow

conclusions: contributions
active segmentation: through contact
appearance catalog: for oriented features
open object recognition: for correction, enrollment
affordance recognition: for rolling
open speech recognition: for isolated words
virtuous circle of development: learning about and through activity

conclusions: the future
Dexterous manipulation
Object perception (visual, tactile, acoustic): during dexterous manipulation, and during failed manipulation
Integration with a useful platform: socially enabled, mobile
