Multichannel Robot Speech Recognition Database: MChRSR

José Novoa, Juan Pablo Escudero, Josué Fredes, Jorge Wuth, Rodrigo Mahu and Néstor Becerra Yoma
Speech Processing and Transmission Laboratory, Universidad de Chile
Av. Tupper 2007, P.O. Box 412-3, Santiago, Chile
E-mail: nbecerra@ing.uchile.cl
Tel: +56-2-29784205

Abstract

In real human-robot interaction (HRI) scenarios, speech recognition represents a major challenge due to robot noise, background noise and a time-varying acoustic channel. This document describes the procedure used to obtain the Multichannel Robot Speech Recognition Database (MChRSR). It is composed of 12 hours of multichannel evaluation data recorded in a real mobile HRI scenario. The database was recorded with a PR2 robot performing different translational and azimuthal movements. Accordingly, 16 evaluation sets were obtained by re-recording the clean test set of the Aurora-4 database under different movement conditions.

1. Database Recording

The experimental setup used in the database recording employs a PR2 robot, a state-of-the-art mobile manipulation robot with a Microsoft Xbox 360 Kinect sensor mounted on top. We re-recorded the clean test set of the Aurora-4 database [1] in a meeting room, considering different relative movements between the speech source and the robot. A TANNOY 501a loudspeaker was used as the audio source. The recordings were captured with the PR2's Microsoft Kinect sensor, which has a four-microphone array, while the robot was simultaneously performing translational and head-rotation movements. Before recording each robot movement condition, the background noise was measured; the maximum equivalent sound pressure level (Leq) over ten minutes was 39 dBA.
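
As a rough illustration of this re-recording procedure, the following minimal sketch plays each clean Aurora-4 utterance through the loudspeaker while simultaneously capturing the four Kinect channels. The file layout, the 16 kHz sample rate handling and the use of the python-sounddevice and soundfile packages are assumptions for illustration; the paper does not specify the software actually used on the robot.

```python
# Minimal re-recording loop (hypothetical paths and tooling; the actual
# recordings were made through the PR2/Kinect stack).
import glob

import sounddevice as sd
import soundfile as sf

FS = 16000           # assumed sample rate of the Aurora-4 clean test set
KINECT_CHANNELS = 4  # the Kinect sensor exposes a four-microphone array

for path in glob.glob("aurora4_clean_test/*.wav"):
    clean, fs = sf.read(path)
    assert fs == FS
    # Play the clean utterance through the loudspeaker while capturing
    # the four Kinect channels simultaneously.
    rerec = sd.playrec(clean, samplerate=FS, channels=KINECT_CHANNELS)
    sd.wait()  # block until playback/recording finishes
    sf.write(path.replace("clean", "rerecorded"), rerec, FS)
```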

2. Robot Movements

During the recording of the different sets of the evaluation database, the robot performed two types of movements typically found in an HRI scenario: robot translation and azimuthal rotation of the robot's head.

2.1 Robot Translation

The first movement defined for the robot was translational: the robot moved towards and away from the loudspeaker between points P1 and P2 in Fig. 1. Three values were selected for the robot displacement velocity: 0.30 m/s, 0.45 m/s and 0.60 m/s. These values were inspired by the discussions in [2], where a robot approached a seated person at 0.25 m/s and 0.40 m/s; under those conditions, none of the human participants found the robot speeds too fast or disturbing. The selected velocities were multiplied by the speed factor function shown in Fig. 2. An additional test set was recorded with the robot stationary at point P1. Thus, four robot displacement conditions were considered for the data recording: a static condition at position P1, and three translational movements between points P1 and P2.

Figure 1: Recording diagram on the meeting room floor plan. In the static condition, i.e. without robot translation, the robot was positioned at P1. In the dynamic conditions, the robot moved towards and away from the loudspeaker between points P1 and P2, which are located 1 m and 3 m from the loudspeaker, respectively.

Figure 2: Speed factor function. T corresponds to the one-way travel time from point P1 to point P2 in Fig. 1, or vice versa.
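
A sketch of how such a back-and-forth motion could be commanded is given below. It is not the authors' code: Fig. 2 defines the speed factor only graphically, so a trapezoidal 0 → 1 → 0 ramp is assumed here, and the ROS topic name and control loop are likewise assumptions.

```python
# Hedged sketch of the base motion between P1 and P2 (assumed profile/topic).
import rospy
from geometry_msgs.msg import Twist

D = 3.0 - 1.0   # distance between P1 (1 m) and P2 (3 m) from the loudspeaker
V_MAX = 0.60    # one of the selected velocities: 0.30, 0.45 or 0.60 m/s
RAMP = 0.2      # fraction of the trip spent ramping up/down (assumed)
T = D / (V_MAX * (1.0 - RAMP))  # one-way travel time for this profile

def speed_factor(t):
    """Trapezoidal 0 -> 1 -> 0 factor over one one-way trip (assumed shape)."""
    if t < RAMP * T:
        return t / (RAMP * T)
    if t > (1.0 - RAMP) * T:
        return max(0.0, (T - t) / (RAMP * T))
    return 1.0

rospy.init_node("translation_sweep")
pub = rospy.Publisher("/base_controller/command", Twist, queue_size=1)
rate, direction, t0 = rospy.Rate(20), 1.0, rospy.get_time()
while not rospy.is_shutdown():
    t = rospy.get_time() - t0
    if t > T:                              # end of a one-way trip: reverse
        direction, t0 = -direction, rospy.get_time()
        continue
    cmd = Twist()                          # all other velocity components zero
    cmd.linear.x = direction * V_MAX * speed_factor(t)
    pub.publish(cmd)
    rate.sleep()
```

With this profile the average speed factor over a trip is 1 - RAMP, so T is scaled accordingly to keep the travelled distance equal to the 2 m between P1 and P2.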

2.2 Robot Head Azimuthal Rotation

The robot rotated its head in an azimuthal sweep, as shown in Fig. 3a. The head moved periodically between -150º and 150º at three different angular velocities, while the front of the robot faced the loudspeaker, which was situated at 0º. The three selected values for the angular velocity were 0.28 rad/s, 0.42 rad/s and 0.56 rad/s. These values correspond to the angular velocity necessary for the robot's head to follow a virtual target placed two meters from the robot and moving with a tangential velocity equal to 2 km/h, 3 km/h and 4 km/h, respectively, as shown in Fig. 3b. Additionally, a fourth head motion condition was generated by orienting the head towards the loudspeaker and setting the head angular velocity to zero.

Figure 3: a) Angle swept by the azimuthal movement of the robot's head during the recording of the utterances. The loudspeaker was located at 0º. The robot's head moved periodically from -150º to 150º at different angular velocities; recordings with a static head were performed at 0º (i.e., oriented towards the source). b) The selected angular velocities of the robot's head correspond to the velocity necessary to follow with the head a virtual target located two meters away and moving with tangential velocities equal to 2 km/h, 3 km/h and 4 km/h.
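
These angular velocities follow from the circular-motion relation ω = v / r: a head tracking a target at distance r moving tangentially at speed v must rotate at v / r. A quick check with the values from the text:

```python
# Verify the angular velocities of Section 2.2: w = v / r.
R = 2.0  # virtual target distance in meters

for v_kmh in (2.0, 3.0, 4.0):
    v = v_kmh / 3.6  # km/h -> m/s
    print(f"{v_kmh} km/h -> {v / R:.2f} rad/s")

# Prints 0.28, 0.42 and 0.56 rad/s, matching the selected values.
```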

Figure 4 shows the clean spectrogram and the resulting spectrograms for a given utterance recorded with the Kinect microphones under the most severe motion condition, i.e. with the angular and displacement velocities equal to 0.56 rad/s and 0.60 m/s, respectively.

Figure 4: Original and resulting spectrograms for a given utterance recorded with the Kinect microphones: a) the original clean utterance; b), c), d) and e) the four Kinect microphones when the angular and displacement velocities were made equal to 0.56 rad/s and 0.60 m/s, respectively.
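
A comparison like Fig. 4 can be reproduced with standard tools. The sketch below plots a spectrogram of the clean utterance and of each re-recorded Kinect channel; the file names are hypothetical and the STFT parameters are illustrative, since the paper does not state how its figures were generated.

```python
# Spectrograms of the clean utterance vs. the four re-recorded channels.
import matplotlib.pyplot as plt
import numpy as np
import soundfile as sf
from scipy.signal import spectrogram

clean, fs = sf.read("clean_utterance.wav")
rerec, _ = sf.read("rerecorded_utterance.wav")  # shape: (samples, 4)

signals = [clean] + [rerec[:, ch] for ch in range(rerec.shape[1])]
titles = ["clean"] + [f"Kinect ch {i + 1}" for i in range(rerec.shape[1])]
fig, axes = plt.subplots(len(signals), 1, sharex=True, figsize=(8, 10))
for ax, x, title in zip(axes, signals, titles):
    f, t, Sxx = spectrogram(x, fs=fs, nperseg=512, noverlap=384)
    ax.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-10), shading="auto")  # dB scale
    ax.set_title(title)
    ax.set_ylabel("Hz")
axes[-1].set_xlabel("Time (s)")
plt.tight_layout()
plt.show()
```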

Summarizing, 16 evaluation sets were generated by re-recording the clean test set of the Aurora-4 database under different translational and azimuthal motion conditions, which correspond to 12 hours of multichannel evaluation data. The MChRSR data is available at http://www.lptv.cl/en/hri-asr/. Further inquiries can be addressed directly to the last author of this paper.

Acknowledgements

The research reported here was funded by Grants Conicyt-Fondecyt 1151306 and ONRG N 62909-17-1-2002. José Novoa was supported by Grant CONICYT-PCHA/Doctorado Nacional/2014-21140711.

References

[1] G. Hirsch, "Experimental Framework for the Performance Evaluation of Speech Recognition Front-ends on a Large Vocabulary Task," Version 2.0, AU/417/02, ETSI STQ Aurora DSR Working Group, 2002.

[2] K. Dautenhahn, M. Walters, S. Woods, K. L. Koay, C. L. Nehaniv, A. Sisbot, R. Alami and T. Siméon, "How may I serve you?: A robot companion approaching a seated person in a helping context," in Proceedings of the ACM Conference on Human-Robot Interaction, Salt Lake City, UT, USA, 2006.