Local Relative Transfer Function for Sound Source Localization

Local Relative Transfer Function for Sound Source Localization

Xiaofei Li (1), Radu Horaud (1), Laurent Girin (1,2), Sharon Gannot (3)
(1) INRIA Grenoble Rhône-Alpes, {firstname.lastname@inria.fr}
(2) GIPSA-Lab & Univ. Grenoble Alpes
(3) Faculty of Engineering, Bar-Ilan University

September 1, 2015

Outline

1 Introduction
2 Problem formulation and usual RTF
3 Local relative transfer function
4 Sound source localization using the local-RTF vector
5 Experiments
6 Conclusions

Introduction

Task & scenario
- Sound source localization.
- Microphone array with an arbitrary topology.
- Single static desired speech source.

Baseline method & challenge
- Relative transfer function (RTF), used as a function of the direction of arrival.
- Challenge: it is hard to select a good reference channel in a complex acoustic environment.

Proposed method
- To avoid a potentially bad unique reference channel, we propose the local RTF, which uses a local reference channel.
- A biased local-RTF estimator and an unbiased estimator.

Problem formulation

In the STFT domain, the signals received by the M microphones are approximated as

    x(\omega, l) \approx h(\omega) s(\omega, l) + n(\omega, l)

- \omega and l are the frequency-bin and time-frame indices.
- s(\omega, l) is the source signal.
- x(\omega, l) = [x_1(\omega, l), ..., x_M(\omega, l)]^T is the sensor signal vector.
- n(\omega, l) = [n_1(\omega, l), ..., n_M(\omega, l)]^T is the sensor noise vector.
- h(\omega) = [h_1(\omega), ..., h_M(\omega)]^T is the acoustic transfer function (ATF) vector.
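As a quick illustration of this narrowband (multiplicative transfer function) approximation, the sketch below simulates M-channel STFT observations from a random source and random ATFs. All names, dimensions, and the noise level are illustrative choices, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
M, Omega, L = 4, 257, 100          # channels, frequency bins, time frames (illustrative)

# Illustrative ATF vector h(omega): one complex gain per channel and frequency.
h = rng.standard_normal((Omega, M)) + 1j * rng.standard_normal((Omega, M))

# Source STFT s(omega, l) and sensor noise n(omega, l).
s = rng.standard_normal((Omega, L)) + 1j * rng.standard_normal((Omega, L))
n = 0.1 * (rng.standard_normal((Omega, M, L)) + 1j * rng.standard_normal((Omega, M, L)))

# x(omega, l) ~= h(omega) s(omega, l) + n(omega, l), applied per frequency bin.
x = h[:, :, None] * s[:, None, :] + n      # shape (Omega, M, L)
```

The later estimator sketches assume this (Omega, M, L) layout for the observed STFT frames.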

Relative transfer function

RTF definition
- ATF ratio r_m(\omega) = h_m(\omega) / h_1(\omega), where the first channel is taken as the reference.

RTF estimation
1 The cross-spectral method: \hat{r}_m(\omega) = \hat{\Phi}_{x_m x_1}(\omega) / \hat{\Phi}_{x_1 x_1}(\omega), where \hat{\Phi}_{x_m x_1}(\omega) and \hat{\Phi}_{x_1 x_1}(\omega) are the cross- and auto-PSD of the sensor signals.
2 An unbiased estimator based on the nonstationarity of speech [Gannot01] (1). In [Gannot01] it is proved that the RTF estimation error is inversely proportional to the SNR at the reference channel.

(1) S. Gannot, et al. Signal enhancement using beamforming and nonstationarity with applications to speech, IEEE Trans. Signal Proc., vol. 49, no. 8, pp. 1614-1626, 2001.
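A minimal sketch of the cross-spectral estimator, assuming the (Omega, M, L) STFT layout of the previous snippet and approximating the PSDs by plain averages over time frames; the unbiased estimator of [Gannot01], which exploits speech nonstationarity, is not reproduced here.

```python
import numpy as np

def rtf_cross_spectral(x, ref=0):
    """Estimate r_m(omega) = h_m(omega) / h_ref(omega) from STFT frames x of shape (Omega, M, L)."""
    x_ref = x[:, ref, :]                                            # reference channel
    phi_cross = np.mean(x * np.conj(x_ref)[:, None, :], axis=-1)    # Phi_{x_m x_ref}, (Omega, M)
    phi_ref = np.mean(np.abs(x_ref) ** 2, axis=-1)                  # Phi_{x_ref x_ref}, (Omega,)
    return phi_cross / phi_ref[:, None]                             # r_hat, shape (Omega, M)
```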

Local relative transfer function: Definition 1

Ideally, we should select the channel with the highest SNR as the reference. However, it is hard to precisely estimate the SNR at each channel in a complex environment.

As an alternative solution, we define the local-RTF

    a_m(\omega) = \frac{|h_m(\omega)|}{\|h(\omega)\|} e^{j(\arg[h_m(\omega)] - \arg[h_{m-1}(\omega)])}

where \arg[\cdot] is the phase of a complex number and \|\cdot\| is the \ell_2-norm.

- Local phase difference & normalized level.
- Avoids a potentially bad global reference channel.
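A small sketch of how the local-RTF could be computed from a known ATF matrix, following the definition above: the level is the channel magnitude normalized by the per-frequency \ell_2-norm, and the phase is the difference with respect to the local reference channel m-1. Wrapping the first channel around to use the last one as its local reference is an illustrative assumption, not something stated on the slide.

```python
import numpy as np

def local_rtf_from_atf(h):
    """Local-RTF a_m(omega) from an ATF matrix h of shape (Omega, M)."""
    # Normalized level |h_m(omega)| / ||h(omega)||.
    level = np.abs(h) / np.linalg.norm(h, axis=1, keepdims=True)
    # Local phase difference arg[h_m] - arg[h_{m-1}] (wrap-around for m = 1, illustrative choice).
    local_phase = np.angle(h) - np.angle(np.roll(h, 1, axis=1))
    return level * np.exp(1j * local_phase)
```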

Local relative transfer function: Definition 2

The corresponding local-RTF vector is a(\omega) = [a_1(\omega), ..., a_M(\omega)]^T.

It is NOT an actual transfer function vector that can be directly used for beamforming. It is rather a robust feature expected to be appropriate for sound source localization, due to its lower sensitivity to noise compared to the regular RTF vector.

Local relative transfer function: Biased estimator

The local-RTF of the m-th channel can be estimated by the cross-spectral method:

    \hat{a}_m(\omega) = \sqrt{\frac{\hat{\Phi}_{x_m x_m}(\omega)}{\sum_{m'=1}^{M} \hat{\Phi}_{x_{m'} x_{m'}}(\omega)}} \, e^{j\arg[\hat{\Phi}_{x_m x_{m-1}}(\omega)]}

This estimator is biased, but at high SNR the bias is small. It is suitable for high-SNR scenarios, thanks to its small bias there and its low computational load.
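A sketch of this biased cross-spectral estimator under the same (Omega, M, L) layout. The PSDs are again plain frame averages, the wrap-around local reference is the same illustrative assumption as above, and the square root keeps the estimated level consistent with the magnitude-ratio definition of the local-RTF.

```python
import numpy as np

def local_rtf_biased(x):
    """Biased local-RTF estimate from observed STFT frames x of shape (Omega, M, L)."""
    phi_auto = np.mean(np.abs(x) ** 2, axis=-1)                      # Phi_{x_m x_m}, (Omega, M)
    # Level: sqrt of the channel auto-PSD normalized by the sum over channels.
    level = np.sqrt(phi_auto / phi_auto.sum(axis=1, keepdims=True))
    # Phase: angle of the cross-PSD with the local reference channel m-1 (wrap-around, illustrative).
    x_prev = np.roll(x, 1, axis=1)
    phi_cross = np.mean(x * np.conj(x_prev), axis=-1)                # Phi_{x_m x_{m-1}}, (Omega, M)
    return level * np.exp(1j * np.angle(phi_cross))
```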

Local relative transfer function: Unbiased estimator (1)

Inspired by [Cohen04] (2), we propose an unbiased local-RTF estimator. [Cohen04] provides:

- \hat{\rho}_m(\omega): an unbiased estimate of the ATF ratio \rho_m(\omega) = h_m(\omega) / h_{m-1}(\omega).
- \hat{\Phi}_{s_m s_m}(\omega, l): a PSD estimate of the image source h_m(\omega) s(\omega, l).
- \hat{\Phi}_{s_m s_m}(\omega) = \frac{1}{L} \sum_{l=1}^{L} \hat{\Phi}_{s_m s_m}(\omega, l): the power of the image-source signal averaged over frames.

(2) I. Cohen. Relative transfer function identification using speech signals, IEEE Trans. Speech and Audio Proc., vol. 12, no. 5, pp. 451-459, 2004.

Local relative transfer function: Unbiased estimator (2)

Based on \hat{\rho}_m(\omega) and \hat{\Phi}_{s_m s_m}(\omega), the local-RTF is estimated as

    \hat{a}_m(\omega) = \sqrt{\frac{\hat{\Phi}_{s_m s_m}(\omega)}{\sum_{m'=1}^{M} \hat{\Phi}_{s_{m'} s_{m'}}(\omega)}} \, e^{j\arg[\hat{\rho}_m(\omega)]}

The estimation error of this estimator depends on the estimation accuracy of \hat{\rho}_m(\omega) and \hat{\Phi}_{s_m s_m}(\omega); a detailed analysis can be found in [Cohen04]. This unbiased estimator is more suitable for low SNRs.
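Only the combination step is sketched below; it assumes the per-channel quantities from [Cohen04] (the unbiased ATF-ratio estimates rho_hat and the frame-averaged image-source powers phi_s) have already been computed by some other routine, which is not reproduced here. As above, the square root keeps the level consistent with the magnitude-ratio definition.

```python
import numpy as np

def local_rtf_unbiased(rho_hat, phi_s):
    """Combine [Cohen04]-style estimates into a local-RTF estimate.

    rho_hat : (Omega, M) complex, unbiased estimates of h_m / h_{m-1}
    phi_s   : (Omega, M) real, frame-averaged image-source powers Phi_{s_m s_m}
    """
    level = np.sqrt(phi_s / phi_s.sum(axis=1, keepdims=True))
    return level * np.exp(1j * np.angle(rho_hat))
```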

Sound source localization using the local-RTF vector

Concatenate the local-RTF vectors across frequencies:

    \hat{a} = [\hat{a}^T(0), ..., \hat{a}^T(\omega), ..., \hat{a}^T(\Omega-1)]^T

Lookup-table dataset {a_k, d_k}_{k=1}^{K}, where a_k and d_k denote the feature vector and the source direction.

Localization method
- Lookup: find the I best directions {a_{k_i}, d_{k_i}}_{i=1}^{I}, i.e. the table entries whose feature vectors are closest to \hat{a}.
- Interpolation: weighted mean

    \hat{d} = \frac{\sum_{i=1}^{I} \|\hat{a} - a_{k_i}\|^{-1} d_{k_i}}{\sum_{i=1}^{I} \|\hat{a} - a_{k_i}\|^{-1}}

  where the reciprocal of the feature difference \|\hat{a} - a_{k_i}\|^{-1} is taken as the weight.
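A compact sketch of this lookup-and-interpolation step: given the concatenated local-RTF feature, find the I closest lookup-table entries and average their directions with inverse-distance weights. The table layout (rows of features and of azimuth/elevation pairs) and the Euclidean distance on the concatenated complex vectors are illustrative assumptions.

```python
import numpy as np

def localize(a_hat, table_feats, table_dirs, num_best=4):
    """Estimate a source direction from a concatenated local-RTF feature vector.

    a_hat       : (Omega*M,) complex, concatenated local-RTF estimate
    table_feats : (K, Omega*M) complex, lookup-table feature vectors a_k
    table_dirs  : (K, 2) real, corresponding directions d_k (azimuth, elevation)
    """
    dists = np.linalg.norm(table_feats - a_hat, axis=1)   # feature differences ||a_hat - a_k||
    best = np.argsort(dists)[:num_best]                   # the I best (closest) entries
    weights = 1.0 / np.maximum(dists[best], 1e-12)        # reciprocal distances as weights
    return (weights[:, None] * table_dirs[best]).sum(axis=0) / weights.sum()
```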

Experiments: Audio-visual data set

- Lookup table: 432 source directions in the camera field-of-view.
- Test data: the speech signal is emitted from 108 other directions in the camera field-of-view.

Figure: (left) Dummy head with four microphones (red circles) and cameras. (right) The lookup source directions.

Experiments: Noise and comparison method

Two types of noise are added to the test data at various SNRs.
- Environmental noise, recorded in a noisy office environment; it includes people moving, office devices, sounds from outside (passing cars, street noise), etc.
- Directional WGN, emitted by a loudspeaker placed in a direction beyond the camera field-of-view, in the same noisy office.

Comparison method (Regular RTF): RTF with a unique reference channel, derived from [Cohen04], using the reference channel with the highest input SNR (3).

(3) Note that the input SNR is computed using the estimated noise and speech powers provided by [Cohen04].

Experiments: Results for environmental noise

Localization errors (4) for the biased estimator (Local-RTF 1), the unbiased estimator (Local-RTF 2), and the comparison method (Regular RTF). Values marked with * are the minimum error at each SNR.

SNR (dB)   Local-RTF 1        Local-RTF 2        Regular RTF
           Azi.    Ele.       Azi.    Ele.       Azi.    Ele.
  10       0.83*   0.51       0.85    0.47*      0.96    0.76
   5       0.83*   0.56       0.86    0.47*      0.95    0.82
   0       0.85*   0.62       0.89    0.46*      1.02    0.74
  -5       1.00*   0.76       1.02    0.51*      1.20    1.05
 -10       1.53    1.22       1.51*   0.75*      1.79    1.30

- Local-RTF 1 vs 2: the biased estimator has performance comparable to the unbiased estimator at high SNRs, but a larger elevation error at low SNRs.
- Local-RTF 2 vs Regular RTF: Regular RTF performs worse than the proposed method, due to its imprecise input-SNR estimation.

(4) The absolute angle error (in degrees) in azimuth (Azi.) and elevation (Ele.).

Experiments: Results for directional WGN

SNR (dB)   Local-RTF 1        Local-RTF 2        Regular RTF
           Azi.    Ele.       Azi.    Ele.       Azi.    Ele.
  10       0.80    0.49       0.82    0.49       0.80    0.87
   5       1.24    0.65       0.80    0.54       0.87    0.80
   0       3.39    1.31       0.91    0.56       1.11    0.64
  -5       8.33    2.74       1.40    0.77       1.31    0.75
 -10       11.2    3.87       3.82    1.48       1.64    1.00

- Local-RTF 1 vs 2: compared to the unbiased estimator, the biased estimator performs slightly better at 10 dB SNR, but deteriorates abruptly as the SNR decreases, because the directional noise introduces a larger estimation bias.
- Local-RTF 2 vs Regular RTF: Regular RTF performs better when the SNR is low (-5 and -10 dB). This indicates that the highest-SNR channel is correctly selected by Regular RTF, because (1) the noise directivity induces a large noise-power difference among channels at low SNRs, and (2) the noise signal is relatively stationary.

Conclusions

- Local-RTF and two estimators are proposed.
- Experiments show that the local-RTF is more robust than the regular RTF when the noise power cannot be precisely estimated.

Thank you very much! Q & A

xiaofei.li@inria.fr