Reverberant Sound Localization with a Robot Head Based on Direct-Path Relative Transfer Function

Xiaofei Li, Laurent Girin, Fabien Badeig, Radu Horaud
PERCEPTION Team, INRIA Grenoble Rhone-Alpes
October 12th, 2016

Sound Localization with a Robot Head

- Considered scenario
  - Humanoid robot NAO (version 5)
  - The speaker direction relative to the robot should be estimated

[Figures: microphone array (NAO robot); sound localization scene]

Sound Localization with a Robot Head

- Challenges
  - Room reverberation
  - Robot ego-noise and ambient noise
- Proposed method
  - Estimation of the direct-path relative transfer function (DP-RTF)
  - Sound source localization (direction of arrival, DoA) calculated from the DP-RTF
  - Robustness to noise increased by spectral subtraction

Microphone Signals

- Two-channel microphone signal: $x(n) = a(n) * s(n)$, $y(n) = b(n) * s(n)$
  - $x(n)$, $y(n)$: microphone signals
  - $s(n)$: source signal
  - $a(n)$, $b(n)$: room impulse responses, including direct-path sound propagation and reflections. (The direct-path propagation indicates the sound direction.)
- Applying the STFT gives the convolutive transfer function (CTF) model: $x_{p,k} = a_{p,k} * s_{p,k}$, $y_{p,k} = b_{p,k} * s_{p,k}$
  - $p$, $k$: frame and frequency indices; here $*$ denotes convolution along the frame index $p$
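The time-domain signal model can be sketched with toy data. This is a minimal illustration, not the setup used in the talk: the helper `toy_rir`, the tap delays, the tail decay, and the white-noise source are all assumptions chosen for readability. It also checks the two-channel cross-relation (each microphone signal convolved with the other channel's impulse response gives the same result), which is the starting point of the DP-RTF estimation slide.

```python
import numpy as np

def toy_rir(direct_delay, length=512, seed=0):
    """Hypothetical toy RIR: unit direct-path tap plus a decaying random tail."""
    rng = np.random.default_rng(seed)
    h = np.zeros(length)
    h[direct_delay] = 1.0                       # direct-path propagation
    tail = np.arange(length - direct_delay - 1)
    # Reflections: small random taps with exponential decay (assumed shape).
    h[direct_delay + 1:] = 0.2 * rng.standard_normal(tail.size) * np.exp(-tail / 80.0)
    return h

rng = np.random.default_rng(42)
s = rng.standard_normal(4000)          # stand-in source signal s(n)
a = toy_rir(direct_delay=20, seed=1)   # RIR to microphone 1
b = toy_rir(direct_delay=24, seed=2)   # RIR to microphone 2

x = np.convolve(a, s)                  # x(n) = a(n) * s(n)
y = np.convolve(b, s)                  # y(n) = b(n) * s(n)

# The lag between the two direct-path taps carries the direction information.
direct_path_lag = int(np.argmax(np.abs(b))) - int(np.argmax(np.abs(a)))

# Cross-relation: x * b equals y * a because convolution commutes.
cross_relation_holds = np.allclose(np.convolve(x, b), np.convolve(y, a))
```

In this toy case the direct-path lag is the difference of the two chosen tap delays; in a real room the direct-path tap is not known and has to be estimated, which is what the DP-RTF approach addresses.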

Convolutive Transfer Function (CTF)

- Problem: the multiplicative transfer function (MTF) assumption does not hold if the DFT size is shorter than the room impulse response (RIR).
- In such cases the CTF is needed. It is given by the convolution
  $x_{p,k} = \sum_{p'=0}^{q-1} a_{p',k}\, s_{p-p',k}$,
  where the CTF length $q$ depends on the length of the RIR.

Direct-Path Relative Transfer Function

- The CTF $a_{p,k}$, with frame index $p = 0, \dots, q-1$, is composed of
  - $a_{0,k}$: the direct-path transfer function (at frame 0)
  - $a_{p,k}$, $p = 1, \dots, q-1$: (unwanted) reverberation
- The direct-path relative transfer function (DP-RTF) is given by the ratio $b_{0,k} / a_{0,k}$. It
  - contains information about the source direction (through the phase difference between numerator and denominator)
  - is robust to reverberation (since the late reverberant part is excluded)
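How the phase of the DP-RTF (the ratio of the two direct-path transfer functions) encodes direction can be illustrated with an idealized case. The sketch below assumes free-field propagation, where each direct path is a pure delay with a gain; it ignores the filtering of the robot head, and the function name, delays, and gains are hypothetical values chosen for the example.

```python
import cmath
import math

def direct_path_tf(freq_hz, delay_s, gain=1.0):
    """Idealized direct-path transfer function: a gain and a pure delay (assumption)."""
    return gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)

freq = 500.0                     # analysis frequency in Hz
tau_a, tau_b = 2.00e-3, 2.25e-3  # hypothetical propagation delays to the two microphones

# DP-RTF: ratio of the direct-path transfer functions of the two channels.
dp_rtf = direct_path_tf(freq, tau_b, gain=0.9) / direct_path_tf(freq, tau_a, gain=1.0)

# Phase difference -> time difference of arrival (valid while the phase does not wrap).
tdoa = -cmath.phase(dp_rtf) / (2 * math.pi * freq)

# Magnitude -> inter-channel level difference in dB.
level_diff_db = 20 * math.log10(abs(dp_rtf))
```

The phase-to-delay conversion is unambiguous only while the phase difference stays within half a period, so at higher frequencies or larger microphone spacings the phase wraps and a single frequency no longer determines the delay.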

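DP-RTF Estimation

- Estimation from noise-free microphone signals
  - Two-channel convolutive relation: $x_{p,k} * b_{p,k} = y_{p,k} * a_{p,k}$
  - Dividing by $a_{0,k}$ and rearranging the terms leads to a set of linear equations:
    $y_{p,k} = \mathbf{z}_{p,k}^\top \mathbf{g}_k$
    with
    $\mathbf{z}_{p,k} = [x_{p,k}, \dots, x_{p-q+1,k}, y_{p-1,k}, \dots, y_{p-q+1,k}]^\top$,
    $\mathbf{g}_k = [b_{0,k}/a_{0,k}, \dots, b_{q-1,k}/a_{0,k}, -a_{1,k}/a_{0,k}, \dots, -a_{q-1,k}/a_{0,k}]^\top$.
  - Multiplying both sides by $y_{p,k}^*$ and taking the expectation leads to an expression in terms of the cross- and auto-power spectral densities (PSDs):
    $\phi_{yy}(p,k) = \boldsymbol{\phi}_{zy}(p,k)^\top \mathbf{g}_k$
  - At frequency $k$, the DP-RTF is the first entry of $\mathbf{g}_k$. It is estimated by solving an overdetermined set of these linear equations, one per frame $p$.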
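The least-squares step can be sketched for a single frequency bin on synthetic, noise-free data. This is an illustration under stated assumptions, not the authors' implementation: the function name `estimate_dp_rtf`, the block-averaging window `win` used to form PSD estimates, the toy CTF coefficients, and the white complex source are all hypothetical.

```python
import numpy as np

def estimate_dp_rtf(x, y, q, win=8):
    """Least-squares DP-RTF estimate at one frequency bin (noise-free sketch).

    x, y : complex STFT coefficients over frames (1-D arrays).
    q    : assumed CTF length in frames.
    win  : frames averaged per PSD estimate (hypothetical choice).
    """
    rows, targets = [], []
    for p in range(q - 1, len(x)):
        # z_p = [x_p, ..., x_{p-q+1}, y_{p-1}, ..., y_{p-q+1}]
        z = np.concatenate([x[p - q + 1:p + 1][::-1], y[p - q + 1:p][::-1]])
        rows.append(z * np.conj(y[p]))        # instantaneous cross-spectrum z y*
        targets.append(y[p] * np.conj(y[p]))  # instantaneous auto-spectrum y y*
    rows, targets = np.array(rows), np.array(targets)

    # Average `win` consecutive frames: each block gives one PSD equation.
    n_eq = len(rows) // win
    phi_zy = rows[:n_eq * win].reshape(n_eq, win, -1).mean(axis=1)
    phi_yy = targets[:n_eq * win].reshape(n_eq, win).mean(axis=1)

    # Overdetermined system phi_yy = phi_zy g, solved in the least-squares sense.
    g, *_ = np.linalg.lstsq(phi_zy, phi_yy, rcond=None)
    return g[0]  # first entry of g is b_0 / a_0

# Synthetic CTF data for one bin (all values are made up for the example).
rng = np.random.default_rng(0)
q, n_frames = 3, 400
s = rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)
a = np.array([1.0, 0.4 + 0.2j, 0.1j])
b = np.array([0.8 * np.exp(-0.5j), -0.3, 0.2 - 0.1j])
x = np.convolve(a, s)[:n_frames]   # x_p = sum_p' a_p' s_{p-p'}
y = np.convolve(b, s)[:n_frames]

dp_rtf = estimate_dp_rtf(x, y, q)
```

With noise-free data the equations hold exactly, so the estimate matches `b[0] / a[0]` up to numerical precision. The short averaging window matters here: averaging a stationary source over very long blocks would make all equations nearly identical and the system poorly conditioned.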

Noisy Recordings

- DP-RTF estimation in the presence of noise
  - Noisy microphone signal: $\hat{y}(n) = y(n) + v(n)$
  - Source and noise signals are assumed to be uncorrelated.
  - PSD of the noisy signal: $\phi_{\hat{y}\hat{y}}(p,k) = \phi_{yy}(p,k) + \phi_{vv}(p,k)$
  - Clean PSDs can be obtained by spectral subtraction.
  - The noise PSDs are easily estimated for stationary noise.
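Spectral subtraction on an auto-PSD can be sketched as follows. The example assumes stationary noise whose PSD is estimated from a separate noise-only segment; the function name, the flooring at zero, and the toy signal powers are assumptions made for illustration and are not taken from the slides.

```python
import numpy as np

def spectral_subtraction(phi_noisy, phi_noise, floor=0.0):
    """Subtract the noise PSD from the noisy PSD; clip at `floor` (assumed safeguard)."""
    return np.maximum(np.asarray(phi_noisy) - np.asarray(phi_noise), floor)

rng = np.random.default_rng(0)
n_frames = 20000

def complex_noise(scale):
    """Toy complex STFT coefficients at one frequency bin."""
    return scale * (rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames))

y = complex_noise(1.0)        # clean signal, expected power 2.0
v = complex_noise(0.7)        # noise during the recording, expected power 0.98
v_only = complex_noise(0.7)   # separate noise-only segment used for the noise PSD

phi_noisy = np.mean(np.abs(y + v) ** 2)    # PSD estimate of the noisy signal
phi_noise = np.mean(np.abs(v_only) ** 2)   # noise PSD estimate (stationary noise)
phi_clean = spectral_subtraction(phi_noisy, phi_noise)
```

Because the source and noise are uncorrelated, `phi_clean` comes out close to the clean signal power. The floor only matters for auto-PSDs, which cannot be negative; a cross-PSD is complex-valued and would be subtracted without clipping.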

Calculation of Sound Source Location

- DP-RTF feature vector $\mathbf{c}$: concatenates the DP-RTFs across microphone pairs and frequencies.
- Calculation of the sound direction $\mathbf{d}$
  - Probabilistic piecewise-linear regression $\mathbf{d} = f(\mathbf{c})$ [Deleforge et al., IEEE Trans. 2015].
  - The regression model $f$ is learned from training data (feature-direction pairs) $\{\mathbf{c}_i, \mathbf{d}_i\}_{i=1,\dots,I}$.
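The supervised mapping from feature vectors to directions can be illustrated with a much simpler stand-in. The sketch below uses a nearest-neighbour lookup in place of the probabilistic piecewise-linear regression named on the slide, and it generates training features from an idealized free-field model of one microphone pair. The function names, microphone spacing, frequency grid, and training grid are all hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s
MIC_SPACING = 0.1       # m, hypothetical spacing of one microphone pair
FREQS = np.array([250.0, 500.0, 750.0, 1000.0])  # Hz, hypothetical frequency grid

def free_field_dp_rtf(azimuth_deg):
    """Idealized DP-RTFs of one pair: pure inter-microphone delay (assumption)."""
    delay = MIC_SPACING * np.sin(np.deg2rad(azimuth_deg)) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * FREQS * delay)[np.newaxis, :]  # shape (pairs, freqs)

def feature_vector(dp_rtfs):
    """Concatenate DP-RTFs across pairs and frequencies into one real-valued vector."""
    flat = np.asarray(dp_rtfs).ravel()
    return np.concatenate([flat.real, flat.imag])

def nearest_neighbour_direction(c, train_features, train_directions):
    """Return the training direction whose feature vector is closest to c."""
    dists = np.linalg.norm(train_features - c, axis=1)
    return train_directions[int(np.argmin(dists))]

# Training set {c_i, d_i}: azimuths on a 5-degree grid.
train_directions = np.arange(-90, 91, 5)
train_features = np.array([feature_vector(free_field_dp_rtf(d)) for d in train_directions])

# Localize a test source at 32 degrees azimuth.
c_test = feature_vector(free_field_dp_rtf(32.0))
estimate = nearest_neighbour_direction(c_test, train_features, train_directions)
```

A lookup like this can only return directions that appear in the training set, which is one reason a regression model that interpolates between training pairs is preferable in practice.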

Experiments with the NAO Robot

- Experimental environments
  - Cafeteria, office, laboratory, and meeting room.
  - Reverberation times T60: 0.24 s, 0.47 s, 0.52 s, and 1.04 s, respectively.
- Noise signals
  - Mainly the stationary fan noise of the robot head.
  - The signal-to-noise ratio (SNR) is about 5 dB.
- Related methods
  - MTF-based RTF estimator (RTF-MTF) [Li et al., ICASSP 2015].
  - Coherence test (RTF-CT) [Mohan et al., IEEE Trans. 2008].
  - SRP-PHAT [Do et al., ICASSP 2007].

Experiments with the NAO Robot

- Results for the laboratory room
  - Azimuth angles from -120° to 120° (T60 of approx. 0.5 s)
  - The proposed method shows the best results.
  - The related methods fail especially at large azimuths, which are closer to the wall, because of the strong reflections.

Experiments with the NAO Robot

- Audio-visual: localize the speaker position in the camera image
- Metric: average absolute localization error in degrees, for azimuth (Azi.) and elevation (Ele.)

            Cafeteria     Office        Laboratory    Meeting Room
            Azi.   Ele.   Azi.   Ele.   Azi.   Ele.   Azi.   Ele.
RTF-MTF     0.45   1.57   0.62   2.14   1.44   2.31   1.87   3.66
RTF-CT      0.44   1.50   0.64   2.25   1.61   2.36   1.77   3.44
SRP-PHAT    0.77   1.95   1.03   2.80   1.41   3.33   2.04   3.52
Proposed    0.47   1.47   0.55   1.87   0.82   1.84   0.95   2.12

- The proposed localization method achieves the lowest error in every condition except cafeteria azimuth, with the largest gains at high reverberation times.
- Azimuth results are better than elevation results, since the coplanar microphone array has a low elevation resolution.
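The metric, average absolute localization error in degrees, can be computed as in the sketch below. The function name and the sample angles are made up for the example. Wrapping the difference into [-180°, 180°) is an added assumption so that angles near ±180° are compared correctly; it has no effect on the azimuth range reported in the slides.

```python
def mean_abs_angle_error(estimated_deg, true_deg):
    """Average absolute angular error in degrees, with wrap-around (assumption)."""
    errors = []
    for est, ref in zip(estimated_deg, true_deg):
        diff = (est - ref + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
        errors.append(abs(diff))
    return sum(errors) / len(errors)

# Hypothetical estimates and ground-truth azimuths in degrees.
azimuth_error = mean_abs_angle_error([10.0, -170.0], [12.0, 175.0])
```

In this example the second pair differs by 15° across the ±180° boundary rather than by 345°, so the two errors are 2° and 15°.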

Conclusions

- A direct-path RTF estimator for sound source localization.
- Robust to reverberation and noise.
- More details are available in the extended paper: X. Li et al., "Estimation of the direct-path RTF for supervised sound-source localization," IEEE/ACM Trans. ASLP, 2016.
- In future studies, the extension to the multiple-speaker case could be investigated.