Feel the beat: using cross-modal rhythm to integrate perception of objects, others, and self


Transcription:

Feel the beat: using cross-modal rhythm to integrate perception of objects, others, and self
Paul Fitzpatrick and Artur M. Arsenio, CSAIL, MIT

Modal and amodal features

Modal and amodal features (following Lewkowicz)

Motivation
- Tools and toys are often used in a manner composed of some repeated motion: consider hammers, saws, brushes, files.
- Rhythmic information across the visual and acoustic sensory modalities has complementary properties.
- Features extracted from visual and acoustic processing are what is needed to build an object recognition system.

Talk Outline Hardware Matching sound and vision Priming for attention Differentiation Integration The self and others

Cog's Perceptual System
- 3 cameras on an active vision head
- microphone array above the torso
- proprioceptive feedback from all joints
- periodically moving object (hammer) + periodically generated sound (banging)

Interacting with the robot

Making sense of the senses Bang, Bang! Who is he?

Talk Outline Hardware Matching sound and vision Priming for attention Differentiation Integration The self and others

Matching sound and vision
[Figure: sound spectrogram (frequency in kHz and energy vs. time in ms) alongside the hammer position trace]
The sound intensity peaks once per visual period of the hammer (CIRAS 2003).

Matching algorithm
- Estimate the signal period (histogram technique from CIRAS 2003)
- Cluster rising and falling intervals, guided by the scale of the estimated period
- Merge sufficiently close clusters
- Segment full periods in the signal
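The first step, period estimation, can be sketched as follows. This is a simplified illustration using a histogram over the intervals between rising zero crossings, not a reproduction of the exact CIRAS 2003 technique; the function name and parameters are hypothetical.

```python
import numpy as np

def estimate_period(signal, fs):
    """Estimate the dominant period of a roughly periodic 1-D signal.

    Sketch of a histogram-based estimator: collect intervals between
    successive rising zero crossings of the mean-removed signal, then
    take the most populated histogram bin as the period estimate.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    # indices where the signal crosses zero going upward
    rising = np.where((x[:-1] < 0) & (x[1:] >= 0))[0]
    if len(rising) < 2:
        return None  # not enough cycles observed
    intervals = np.diff(rising)  # samples between successive crossings
    counts, edges = np.histogram(intervals, bins=20)
    k = np.argmax(counts)
    period_samples = 0.5 * (edges[k] + edges[k + 1])  # bin midpoint
    return period_samples / fs  # period in seconds
```

On a clean 5 Hz sinusoid sampled at 1 kHz, this recovers a period close to 0.2 s; the histogram step matters mainly for noisy, irregular signals, where spurious crossings produce outlier intervals that a single dominant bin ignores.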

Binding Sounds to Toys

Playing a tambourine
The appearance and sound of the tambourine are bound together as the robot sees and hears it shaking.
[Figure panels: object segmentation; sound segmentation (window divided into 4x4 images); multiple object tracking; Cog's view; object recognition (window divided into 2x2 images); tambourine segmentations]

Robustness
- to random visual disturbances
- to auditory disturbances: when a person talks, the sound is not matched to the object!

Talk Outline Hardware Matching sound and vision Priming for attention Priming visual foreground with sound Priming acoustic foreground with vision Matching multiple sources Differentiation Integration The self and others

Priming visual foreground with sound
One object (the car) is making noise; another object (the ball) is in view.
Problem: which object goes with the sound?
Solution: match periods of motion and sound.

Comparing periods
[Figure: car position, ball position, and sound spectrogram (frequency in kHz and energy) vs. time in ms]
The sound intensity peaks twice per visual period of the car.

Matching with acoustic distraction
[Figure: car position, snake position, and sound spectrogram (frequency in kHz) vs. time]

Matching multiple sources
Two objects making sounds with distinct spectra.
Problem: which object goes with which sound?
Solution: match periods of motion and sound.

Binding periodicity features
[Figure: rattle position, car position, and sound spectrogram (frequency in kHz) vs. time]
The sound intensity peaks twice per visual period of the car. For the cube rattle, the sound/visual signals have different ratios according to the frequency bands.

Cross-modal association errors

Talk Outline Hardware Matching sound and vision Priming for attention Differentiation Visual Recognition Sound Recognition Integration The self and others

Visual Object Segmentation/Recognition
[Figure panels: object segmentation; object recognition]
See Arsenio, MIT PhD thesis, 2004, for visual object segmentation/recognition.

Sound Segmentation
Goal: extract acoustic signatures from repetitive data.
Problem: STFTs are applied for spectral analysis, but are not ideal for irregular signals.
Solution: build histograms of hypothesized periods.
[Figure: spectrograms (frequency in kHz vs. time in ms) of 7 random sound samples for each of 4 objects; from top to bottom: hammer, cube rattle, car, and snake rattle.]

Sound Recognition
Recognition rate: 82%.
[Figure: average sound images (kHz vs. normalized time) and the eigenobjects corresponding to the three highest eigenvalues]
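The eigenobject representation is essentially principal component analysis over flattened sound images. A minimal sketch, assuming nearest-template classification in the reduced space; the function names and the specific classifier are assumptions for illustration, not the authors' exact pipeline.

```python
import numpy as np

def fit_eigenspace(sound_images, n_components=3):
    """Build an 'eigenobjects' basis: flatten each spectrogram-like
    sound image, remove the mean image, and keep the top principal
    directions (computed here via SVD)."""
    X = np.array([img.ravel() for img in sound_images], dtype=float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]  # mean image + eigenobject basis

def project(img, mean, basis):
    """Coordinates of one sound image in the eigenobject space."""
    return basis @ (np.ravel(img) - mean)

def recognize(img, mean, basis, templates):
    """Classify by the nearest class template in eigenspace.
    templates: dict mapping label -> projected average sound image."""
    z = project(img, mean, basis)
    return min(templates, key=lambda lbl: np.linalg.norm(z - templates[lbl]))
```

A usage pattern: fit the basis on training sound images of all objects, store each object's average projection as its template, then classify new sound images by nearest template.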

Talk Outline Hardware Matching sound and vision Priming for attention Differentiation Integration Cross-modal segmentation/recognition Cross-modal enhancement of detection The self and others

Cross-modal object recognition
- Causes sound when changing direction after striking the object; quiet when changing direction to strike again
- Causes sound while moving rapidly with wheels spinning; quiet when changing direction
- Causes sound when changing direction, often quiet during the remainder of the trajectory (although bells vary)

Cross-modal object recognition
[Figure: log(visual peak energy / acoustic peak energy) vs. log(acoustic period / visual period) for the car, hammer, cube rattle, and snake rattle]
Cross-modal recognition rate: 92%. Dynamic programming is applied to match previously segmented sensory signals: visual trajectories to the sound energy signal.
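The dynamic-programming match between a visual trajectory and a sound energy signal can be illustrated with textbook dynamic time warping. This is a generic sketch, not necessarily the authors' exact cost function or path constraints.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping cost between two 1-D sequences.

    Classic dynamic program: D[i, j] is the cheapest alignment of the
    first i samples of `a` with the first j samples of `b`, where each
    step pairs samples at absolute-difference cost and may advance
    either sequence or both. Allows one signal to be locally stretched
    or compressed relative to the other."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # advance a only
                                 D[i, j - 1],      # advance b only
                                 D[i - 1, j - 1])  # advance both
    return D[n, m]
```

Because the warp absorbs timing differences, a trajectory and a sound envelope with the same shape but different local speeds still align at low cost, which is what makes this a natural matcher for signals sampled in different modalities.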

Cross-modal recognition confusion table

Cross-modal enhancement of detection
[Figure: two spectrograms (frequency in kHz vs. time in ms) with the signals in phase]

Signals out of phase!
[Figure: two spectrograms (frequency in kHz vs. time in ms) with the signals out of phase]

Talk Outline Hardware Matching sound and vision Priming for attention Differentiation Integration The self and others Learning about people Learning about the self

Cross-modal rhythm to integrate perception of others: control experiment
Experiment 1: the robot sees a person shaking their head; there is no periodic sound.
Experiment 2: the robot sees a person shaking their head and saying "no".

Cross-modal rhythm to integrate perception of others
Jumping and clapping; a small visual/sound delay gap (network delay).

Binding Sound and Proprioceptive Data: detecting one's own rhythms
[Figure panels: visual segmentation; detected correlations; multiple object tracking; sound segmentation; Cog's view; object recognition]

Binding Vision, Sound and Proprioceptive Data
The visual image is segmented, the sound is detected, and both are bound to the motion of the arm; the robot looks toward its arm as a human moves it. [Video]

Binding Vision, Sound and Proprioceptive Data
[Figure panels: visual segmentation; detected correlations; multiple object tracking; sound segmentation; Cog's view; object recognition]

Cog's mirror image

So, how does Cog perceive himself?

The robot's experience of an event
[Figure panels: visual object segmentation; detection of cross-modal correlations; multiple object tracking; sound segmentation (window divided into a set of 4x4 images, each containing the spectrogram over one period of the signal); Cog's view; object recognition (window divided into a set of 2x2 images drawn from the class assigned to the object)]

Conclusions
- Amodal features are key to detecting relationships across senses
- Useful for learning to recognize an object in different senses (e.g. by its appearance or its sound)
- There are features for object recognition that exist only in relationships across senses and do not exist in any one sense
- Useful both for perception of external objects and of the robot's own body, by incorporating proprioception as another sense