Linear and Parametric Microphone Array Processing


Part 5 - Joint Linear and Parametric Spatial Processing

Emanuël A. P. Habets 1 and Sharon Gannot 2
1 International Audio Laboratories Erlangen, Germany (a joint institution of the University of Erlangen-Nuremberg and Fraunhofer IIS)
2 Faculty of Engineering, Bar-Ilan University, Israel

ICASSP 2013, Vancouver, Canada

Overview: 1. Motivation, 2. Informed Spatial Filtering, 3. Examples

1. Motivation

Classical linear spatial filtering:
+ High amount of noise-plus-interference reduction
+ Controllable tradeoff between speech distortion and noise reduction
+ Controllable tradeoff between different noise types
- Not very robust w.r.t. estimation errors, position changes, etc.
- Relatively slow response time

Parametric spatial filtering:
+ Fast response time
+ Relatively robust w.r.t. estimation errors, position changes, etc.
+ Possibility to manipulate parameters (e.g., virtual source displacement)
- Inherent tradeoff between speech distortion and noise reduction
- Model violations can introduce audible artifacts [Thiergart and Habets, 2012]
- Relatively poor interference reduction due to the tradeoff and the model violations


2. Informed Spatial Filtering

The main idea behind informed spatial filtering is to incorporate relevant information about the specific problem into the design of the filters and into the estimation of the required statistics.

Figure: Informed filtering approach. The microphone signals feed an informed multichannel spatial filter that produces the processed signals; the second-order statistics and the parameters (e.g., diffuseness, DOA) are estimated from the microphone signals and control the filter.

2. Informed Spatial Filtering

A selection of parameters that can be used (see Part 4):

- Signal-to-diffuse ratio (SDR):

  Γ(k, m, p_i) = P_dir(k, m, p_i) / P_diff(k, m),

  where P_dir is the power of the direct component at position p_i and P_diff is the power of the diffuse component (assuming a spatially homogeneous sound field).
- Time- and frequency-dependent direction-of-arrival estimates.
- Time- and frequency-dependent interaural level differences.
- Time- and frequency-dependent interaural phase differences.
- ...

3. Examples: Example A: Extracting Coherent Sound Sources; Example B: Dereverberation in the SH Domain; Example C: Directional Filtering; Example D: Source Extraction.

3.1 Example A: Extracting Coherent Sound Sources

Signal model: y(k, m) = x(k, m) + v(k, m).
Assumption: the desired signals are strongly coherent across the array.
Aim: estimate X_1(k, m) using a parametric multichannel Wiener filter [Benesty et al., 2011]:

h_PMWF(k, m) = [Φ_v^{-1}(k, m) Φ_x(k, m) / (λ(k, m) + tr{Φ_v^{-1}(k, m) Φ_x(k, m)})] u_1

Figure: Mapping from the input signal-to-diffuse ratio (in dB) to the tradeoff parameter λ [Taseska and Habets, 2012].
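The PMWF formula above can be sketched in a few lines of NumPy. The toy statistics below (a rank-1 speech PSD matrix with an all-ones steering vector and white sensor noise) are illustrative assumptions, not values from the slides:

```python
import numpy as np

def pmwf_weights(phi_v, phi_x, lam, ref=0):
    """Parametric multichannel Wiener filter for one time-frequency bin.

    phi_v : (M, M) noise PSD matrix
    phi_x : (M, M) desired-signal PSD matrix
    lam   : tradeoff parameter (larger -> more noise reduction)
    ref   : index of the reference microphone (the u_1 selection vector)
    """
    B = np.linalg.solve(phi_v, phi_x)            # Phi_v^{-1} Phi_x
    u = np.zeros(phi_x.shape[0]); u[ref] = 1.0
    return B @ u / (lam + np.trace(B).real)

# Toy example: M = 3 microphones, rank-1 speech PSD, white noise.
M = 3
d = np.ones(M, dtype=complex)                    # hypothetical steering vector
phi_x = 2.0 * np.outer(d, d.conj())              # speech PSD matrix (power 2 at each mic)
phi_v = 0.5 * np.eye(M)                          # noise PSD matrix
h = pmwf_weights(phi_v, phi_x, lam=1.0)          # lam = 1 gives the classical MWF

# The output SNR should exceed the input SNR (2 / 0.5 = 4) at the reference mic.
out_speech = np.abs(h.conj() @ phi_x @ h)
out_noise = np.abs(h.conj() @ phi_v @ h)
print(out_speech / out_noise)
```

For this rank-1 example the filter reduces to a scaled delay-and-sum combiner, and the output SNR equals M times the input SNR.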

Proposed Solution [Taseska and Habets, 2012]

Figure: Block diagram of the proposed system. The microphone signals y(k, m) are filtered by a parametric multichannel Wiener filter to obtain Z(k, m); the noise PSD matrix Φ_v(k, m), the speech presence probability P[H_1 | y(k, m)], and the signal-to-diffuse ratio Γ(k, m) are estimated to control the filter.

Algorithm Summary

High-level description of the proposed algorithm [Taseska and Habets, 2012]:
1. Compute the signal-to-diffuse ratio (SDR) using [Thiergart et al., 2012].
2. Compute the a priori speech presence probability (SPP) based on the SDR.
3. Compute the multichannel a posteriori SPP [Souden et al., 2010].
4. Update the noise PSD matrix using the a posteriori SPP.
5. Compute the tradeoff parameter for the parametric multichannel Wiener filter (PMWF) based on the SDR:
   - When the SDR is high, we decrease the amount of speech distortion.
   - When the SDR is low, we increase the amount of noise reduction.
6. Compute and apply the parametric multichannel Wiener filter.
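Steps 4 and 5 can be sketched as follows. The mapping `lambda_from_sdr` is a hypothetical monotone placeholder for f(Γ) (the actual mapping used in [Taseska and Habets, 2012] is not reproduced here), and the smoothing constant is an arbitrary choice:

```python
import numpy as np

def lambda_from_sdr(sdr_db, lo=-15.0, hi=15.0, lam_max=5.0):
    """Hypothetical smooth mapping f(Gamma): high SDR -> small lambda (less
    speech distortion), low SDR -> large lambda (more noise reduction).
    Only the qualitative behaviour matches the algorithm summary."""
    t = np.clip((sdr_db - lo) / (hi - lo), 0.0, 1.0)
    return lam_max * (1.0 - t)

def update_noise_psd(phi_v, y, spp, alpha=0.92):
    """Step 4: recursive noise PSD matrix update driven by the a posteriori
    speech presence probability; the update is strong when speech is absent."""
    alpha_eff = alpha + (1.0 - alpha) * spp      # spp = 1 -> no update
    return alpha_eff * phi_v + (1.0 - alpha_eff) * np.outer(y, y.conj())

rng = np.random.default_rng(0)
M = 4
phi_v = np.eye(M, dtype=complex)
for _ in range(200):                             # noise-only frames, spp ~ 0
    y = rng.standard_normal(M) + 1j * rng.standard_normal(M)
    phi_v = update_noise_psd(phi_v, y, spp=0.0)
# The diagonal should approach the true noise power E{|y_i|^2} = 2.
print(np.diag(phi_v).real)
print(lambda_from_sdr(-20.0), lambda_from_sdr(20.0))
```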

Results (1)

Figure: Performance evaluation: PESQ improvement versus input SNR for stationary diffuse noise (left) and diffuse babble speech (right), comparing λ = 0, λ = 1, λ = f(Γ), and λ = f(Γ) + SPP [Taseska and Habets, 2012].

Results (2)

Figure: Performance evaluation: segmental SNR gain versus input SNR for stationary diffuse noise (left) and diffuse babble speech (right), comparing λ = 0, λ = 1, λ = f(Γ), and λ = f(Γ) + SPP [Taseska and Habets, 2012].

Results (3)

Figure: Spectrograms obtained using M = 4 microphone signals corrupted by sensor noise and babble speech (input SNR = 10 dB): (a) first microphone signal, (b) MVDR, (c) parametric MWF, (d) parametric MWF with MC-SPP. Audio examples.

3.2 Example B: Dereverberation in the SH Domain

Assumed signal model with stacked spherical harmonic components:

p(k, m) = x(k, m) + d(k, m) + ṽ(k, m) = γ(k, m) X(k, m) + ũ(k, m),

with γ(k, m) = x(k, m) / X(k, m) = y(Ω_dir) / Y(Ω_dir) = γ_dir, where Y is the zero-order spherical harmonic and Ω_dir is the DOA.

Figure: Spherical harmonics up to order 3.

Proposed Solution [Braun et al., 2013]

Desired signal: the direct signal component X(k, m), which corresponds to the sound pressure measured at the center of the array in the absence of the spherical microphone array.
Assumption: the direct, diffuse and noise components are mutually uncorrelated.
Proposed solution: the (rank-1) MWF provides an MMSE estimate of X(k, m). For practical reasons, we split the MWF into an MVDR filter followed by a single-channel Wiener filter:

h_MWF(k, m) = φ_X(k, m) Φ_ũ^{-1}(k, m) γ_dir / (φ_X(k, m) γ_dir^H Φ_ũ^{-1}(k, m) γ_dir + 1)
            = [Φ_ũ^{-1}(k, m) γ_dir / (γ_dir^H Φ_ũ^{-1}(k, m) γ_dir)] · [φ_X / (φ_X + (γ_dir^H Φ_ũ^{-1}(k, m) γ_dir)^{-1})],

where the first factor is h_MVDR(k, m) and the second factor is the single-channel Wiener gain H_W(k, m).
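The decomposition can be verified numerically. Below, a randomly generated Hermitian positive-definite matrix stands in for Φ_ũ and a random vector for γ_dir; these are synthetic placeholders, not quantities from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 9                                            # (L+1)^2 SH channels for order L = 2
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
phi_u = A @ A.conj().T + N * np.eye(N)           # interference PSD matrix (Hermitian, PD)
gamma = rng.standard_normal(N) + 1j * rng.standard_normal(N)  # direct-sound vector gamma_dir
phi_x = 3.0                                      # direct-sound PSD

# Rank-1 MWF in closed form.
w = np.linalg.solve(phi_u, gamma)                # Phi_u^{-1} gamma
h_mwf = phi_x * w / (phi_x * (gamma.conj() @ w) + 1.0)

# Same filter, split into an MVDR filter followed by a single-channel Wiener gain.
denom = (gamma.conj() @ w).real                  # gamma^H Phi_u^{-1} gamma (real, > 0)
h_mvdr = w / denom
H_w = phi_x / (phi_x + 1.0 / denom)
h_split = H_w * h_mvdr

print(np.max(np.abs(h_mwf - h_split)))           # ~0: the two forms coincide
```

The MVDR part is distortionless towards γ_dir (h_MVDR^H γ_dir = 1), while the Wiener gain controls the residual noise, which is why the split form is convenient in practice.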

Parameter-based PSD Matrix Estimation

Required information:
- Direction of arrival (DOA), which determines γ_dir.
- Interference PSD matrix: Φ_ũ(k, m) = Φ_d(k, m) + Φ_ṽ(k, m).

Assumed model for the diffuse sound component: Φ_d(k, m) = φ_D(k, m) I_{(L+1)^2}.
The diffuse sound PSD is calculated using an estimate of the diffuseness Ψ:

φ_D(k, m) = Ψ(k, m) [φ_P(k, m) - φ_ṽ(k, m)]

Figure: Block diagram of the proposed system: STFT analysis, spherical harmonic transform (SHT), diffuseness estimation, diffuse and residual interference PSD estimation, filtering, and inverse STFT.

Results

Figure: Spectrograms obtained using simulated signals [Jarrett et al., 2012] (source-array distance 2 m, SNR = 20 dB, T60 = 400 ms): (a) reference X(k, m), (b) received P(k, m), (c) processed with the MVDR filter, (d) processed with the MWF. Audio examples.

3.3 Example C: Directional Filtering

Flexible sound acquisition in noisy and reverberant environments with rapidly changing acoustic scenes is a common problem in modern communication systems. A filter is proposed that provides an arbitrary response for J sources being simultaneously active per time and frequency. The proposed filter provides an optimal tradeoff between white noise gain (WNG) and directivity index (DI). The filter exploits instantaneous information about the sound field (narrowband DOAs, diffuse-to-noise ratio), which allows a nearly immediate adaptation to changes in the acoustic scene.

Problem Formulation

Signal model: based on a multi-wave sound field model, the M microphone signals can be expressed as

y(k, m) = Σ_{j=1}^{J} x^(j)(k, m) + d(k, m) + v(k, m),

i.e., as the sum of J plane waves, diffuse sound d, and sensor noise v.

Aim: capture the J plane waves (J < M) with a desired arbitrary gain while attenuating the sensor noise and the reverberation. The desired signal is

Z(k, m) = Σ_{j=1}^{J} G(k, φ_j) X_1^(j)(k, m)

and is estimated using an informed LCMV filter:

Ẑ(k, m) = h_iLCMV^H(k, m) y(k, m).

Figure: Example of two arbitrary directivity functions G_1 and G_2 over the DOA φ, with source positions φ_A and φ_B.

Proposed Solution (1)

The proposed informed LCMV filter is given by:

h_iLCMV = argmin_h h^H [Φ_d(k, m) + Φ_v(k, m)] h   s.t.   h^H(k, m) a(k, φ_j) = G(k, φ_j),   j ∈ {1, 2, ..., J},

where a(k, φ_j) denotes the steering vector for the jth plane wave at time m and frequency k. For the assumed signal model, we can alternatively minimize h^H [Ψ(k, m) Γ_d(k) + I] h, where Ψ(k, m) denotes the instantaneous diffuse-to-noise ratio (DNR) and Γ_d(k) denotes the coherence matrix of the diffuse sound field. The filter is updated for each time and frequency given instantaneous parametric information (DOAs, DNR). The filter requires knowledge of the DNR, which can be estimated using an auxiliary filter (see poster session AASP-P8 on Friday or [Thiergart and Habets, 2013]).
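A minimal sketch of this filter in NumPy, using the standard closed-form LCMV solution h = Φ^{-1} A (A^H Φ^{-1} A)^{-1} g* for the constraint set above. The array geometry, frequency, DNR value, and DOAs are made-up illustration parameters:

```python
import numpy as np

def diffuse_coherence(freq, mic_pos, c=343.0):
    """Spatial coherence matrix Gamma_d of a spherically isotropic (diffuse)
    field for omnidirectional microphones: sinc of the inter-mic distance."""
    d = np.linalg.norm(mic_pos[:, None, :] - mic_pos[None, :, :], axis=-1)
    return np.sinc(2.0 * freq * d / c)           # np.sinc(x) = sin(pi x)/(pi x)

def ilcmv_weights(dnr, Gamma_d, A, g):
    """Informed LCMV: minimize h^H (dnr * Gamma_d + I) h subject to
    h^H a_j = g_j for each plane-wave steering vector a_j (columns of A)."""
    Phi = dnr * Gamma_d + np.eye(Gamma_d.shape[0])
    PiA = np.linalg.solve(Phi, A)                # Phi^{-1} A
    return PiA @ np.linalg.solve(A.conj().T @ PiA, g.conj())

# Toy setup: M = 4 element ULA (3 cm spacing), J = 2 far-field plane waves.
c, freq = 343.0, 2000.0
mic_pos = np.stack([np.arange(4) * 0.03, np.zeros(4), np.zeros(4)], axis=1)

def steer(phi_deg):                              # steering vector for DOA phi
    tau = mic_pos[:, 0] * np.cos(np.deg2rad(phi_deg)) / c
    return np.exp(-2j * np.pi * freq * tau)

A = np.stack([steer(30.0), steer(120.0)], axis=1)
g = np.array([1.0, 0.0])                         # keep wave 1, cancel wave 2
h = ilcmv_weights(dnr=10.0, Gamma_d=diffuse_coherence(freq, mic_pos), A=A, g=g)
print(np.abs(h.conj() @ A[:, 0]), np.abs(h.conj() @ A[:, 1]))
```

The constraints are met exactly (unit response towards 30 degrees, a null towards 120 degrees), while the quadratic cost trades diffuse-sound suppression against WNG through the DNR weighting.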

Proposed Solution (2)

Figure: Left: estimated DOA φ_1(k, m) as a function of time and frequency. Right: resulting desired response |G(k, φ_1)|^2 in dB for the DOA φ_1(k, m) as a function of time and frequency.

Results (1)

Evaluation setup: L = 2 plane waves, a ULA with M = 4 microphones (3 cm spacing), and a simulated reverberant shoebox room. The minimum WNG was set to -12 dB to make the filters robust against microphone self-noise.

Figure: Top: mean directivity index (DI) in dB; bottom: mean white noise gain (WNG) in dB, for the compared filters (n, d, nd, and the proposed filter nd*), both while the sources are silent and while they are active.

Figure: Top: true DNR Ψ(k, m) in dB; bottom: estimated DNR in dB. The marked areas indicate a silent and an active part of the signal.

Results (2)

The proposed filter provides a high DI when the sound field is diffuse, and a high WNG when sensor noise is dominant. Interfering sound can be strongly attenuated if desired. The proposed DNR estimator provides a sufficiently high accuracy and temporal resolution to allow signal enhancement under adverse conditions, even in changing acoustic scenes.

Table: Performance of all filters [first value using the true DOAs (of the sources), value in brackets using estimated DOAs (of the plane waves)]:

Filter       | SegSIR [dB] | SegSRR [dB] | SegSNR [dB] | PESQ
unprocessed  | 11 (11)     | 7 (7)       | 26 (26)     | 1.5 (1.5)
n            | 21 (32)     | 2 (3)       | 33 (31)     | 2.0 (1.7)
d            | 26 (35)     | 0 (1)       | 22 (24)     | 2.1 (2.0)
nd           | 25 (35)     | 1 (1)       | 28 (26)     | 2.1 (2.0)

Audio Examples

3.4 Example D: Source Extraction

Scenario: multiple talkers, additive background noise, and distributed sensor arrays.
Applications: teleconferencing systems, automatic speech recognition, and spatial sound reproduction.
A spatial filter provides an estimate of the desired source as observed at a reference microphone.

Signal model: y(k, m) = x^(d)(k, m) + Σ_{i≠d} x^(i)(k, m) + v(k, m).
Aim: obtain an MMSE estimate of X_1^(d)(k, m).

Proposed Solution [Taseska and Habets, 2013]

Hypotheses:
H_v: y(k, m) = v(k, m) (speech absent)
H_x: y(k, m) = x(k, m) + v(k, m) (speech present)
H_x^j: y(k, m) = x^(j)(k, m) + Σ_{i≠j} x^(i)(k, m) + v(k, m), j = 1, 2, ..., J

Recursive estimation of the PSD matrices:

Φ_x^(j)(m) = p[H_x^j | y] (α_x Φ_x^(j)(m-1) + (1 - α_x) y y^H) + (1 - p[H_x^j | y]) Φ_x^(j)(m-1)

Signal-to-diffuse-ratio (Γ) and position (Θ) based posterior probabilities:

p[H_x^j | y] = p[H_x^j | y, H_x] p[H_x | y] ≈ p[H_x^j | Θ, H_x] p[H_x | Γ, y]
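The recursive PSD update above can be sketched directly. The steering vector and frame statistics below are synthetic stand-ins, chosen only to show that frames dominated by source j (posterior near 1) drive the estimate towards that source's rank-1 PSD matrix:

```python
import numpy as np

def update_source_psd(phi_x, y, p_j, alpha_x=0.9):
    """Recursive PSD update from the slide: the instantaneous outer product
    yy^H enters only in proportion to the posterior p[H_x^j | y]."""
    return p_j * (alpha_x * phi_x + (1.0 - alpha_x) * np.outer(y, y.conj())) \
        + (1.0 - p_j) * phi_x

rng = np.random.default_rng(2)
M = 3
phi = np.zeros((M, M), dtype=complex)
d = np.exp(1j * np.pi * np.arange(M) * 0.3)      # hypothetical source steering vector
for _ in range(500):
    s = rng.standard_normal() + 1j * rng.standard_normal()  # source signal, power 2
    y = s * d
    p_j = 1.0                                    # frames dominated by source j
    phi = update_source_psd(phi, y, p_j)
# The estimate converges towards the rank-1 matrix 2 * d d^H.
print(np.linalg.matrix_rank(phi), np.abs(phi[0, 0]))
```

With p_j = 0 the estimate is left untouched, which is exactly how the posterior gates the update between competing source hypotheses.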

Parameter-based PSD Matrix Estimation

Figure: Block diagram: the SDR Γ yields p[H_x | Γ, y] and the noise PSD matrix Φ_v; the position estimate Θ yields p[H_x^j | Θ, H_x]; combining the two gives p̂[H_x^j | y] and the source PSD matrices Φ_x^(j).

The distribution p[Θ | H_x^j] is modelled as a Gaussian mixture (GM). The GM parameters are estimated by the Expectation-Maximization algorithm.
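A from-scratch EM iteration for a 2-D Gaussian mixture, in the spirit of the position model above. This is a generic EM sketch with made-up cluster positions, not the exact training procedure of [Taseska and Habets, 2013]:

```python
import numpy as np

def em_gmm(X, K, n_iter=30):
    """Minimal EM for a K-component Gaussian mixture on 2-D position
    estimates, as a stand-in for modelling p[Theta | H_x^j]."""
    n, d = X.shape
    w = np.full(K, 1.0 / K)
    idx = np.argsort(X[:, 0])[np.linspace(0, n - 1, K).astype(int)]
    mu = X[idx].copy()                           # spread-out deterministic init
    cov = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * K)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] proportional to w_k N(x_i; mu_k, cov_k)
        r = np.empty((n, K))
        for k in range(K):
            diff = X - mu[k]
            quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov[k]), diff)
            norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov[k]))
            r[:, k] = w[k] * np.exp(-0.5 * quad) / norm
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and covariances
        Nk = r.sum(axis=0)
        w = Nk / n
        mu = (r.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            cov[k] = (r[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
    return w, mu, cov

# Two well-separated clusters of position estimates (hypothetical talker positions).
rng = np.random.default_rng(3)
X = np.vstack([rng.normal([0.0, 0.0], 0.1, (200, 2)),
               rng.normal([1.5, 1.0], 0.1, (200, 2))])
w, mu, cov = em_gmm(X, K=2)
print(np.round(w, 2))
print(np.round(mu, 2))
```

For well-separated talkers the fitted means land on the cluster centres and the mixture weights reflect the relative speech activity, which is what makes the position-based posteriors p[H_x^j | Θ, H_x] discriminative.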

Results (1)

Setup: three reverberant sources with approximately equal power, diffuse babble speech (SNR = 22 dB), and uncorrelated sensor noise (SNR = 50 dB). The reverberation time was T60 = 250 ms. Two uniform circular arrays were used, each with three omnidirectional microphones, a diameter of 2.5 cm, and an inter-array spacing of 1.5 m.

Figure: Output of the EM algorithm (3 iterations) on 4.5 s of noisy speech data: (a) training during single-talk, (b) training during triple-talk. The actual source positions are denoted by white squares; the array locations are marked by plus symbols. The interior of each ellipse contains 85% of the probability mass of the respective Gaussian.

Results (2)

Figure: Mixture, reference source signals (1)-(3), and extracted source signals (1)-(3) over time. Left: constant triple-talk scenario; right: mainly single-talk scenario. Audio files available at http://home.tiscali.nl/ehabets/publications/taseska213.html.

More Information

These and other examples are presented at ICASSP 2013 on:
- Friday, 10:30-12:30, Poster Session AASP-P8: An Informed Spatial Filter in the Spherical Harmonic Domain for Joint Noise Reduction and Dereverberation (Braun, Jarrett, Fischer and Habets)
- Friday, 10:30-12:30, Poster Session AASP-P8: An Informed LCMV Filter Based on Multiple Instantaneous Direction-Of-Arrival Estimates (Thiergart and Habets)
- Friday, 10:30-12:30, Poster Session AASP-P8: MMSE-based Source Extraction Using Position-based Posterior Probabilities (Taseska and Habets)
- Friday, 10:30-12:30, Poster Session AASP-P8: Spherical Harmonic Domain Noise Reduction Using an MVDR Beamformer and DOA-based Second-order Statistics Estimation (Jarrett, Habets and Naylor)

Special thanks to Sebastian Braun, Maja Taseska, Oliver Thiergart and Daniel Jarrett for their contributions.

References I

Benesty, J., Chen, J., and Habets, E. A. P. (2011). Speech Enhancement in the STFT Domain. SpringerBriefs in Electrical and Computer Engineering. Springer-Verlag.

Braun, S., Jarrett, D. P., Fischer, J., and Habets, E. A. P. (2013). An informed filter for dereverberation in the spherical harmonic domain. In Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, Canada.

Jarrett, D. P., Habets, E. A. P., Thomas, M. R. P., and Naylor, P. A. (2012). Rigid sphere room impulse response simulation: algorithm and applications. J. Acoust. Soc. Am., 132(3):1462-1472.

Souden, M., Chen, J., Benesty, J., and Affes, S. (2010). Gaussian model-based multichannel speech presence probability. IEEE Trans. Audio, Speech, Lang. Process., 18(5):1072-1077.

Taseska, M. and Habets, E. (2013). MMSE-based source extraction using position-based posterior probabilities. In Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP).

References II

Taseska, M. and Habets, E. A. P. (2012). MMSE-based blind source extraction in diffuse noise fields using a complex coherence-based a priori SAP estimator. In Proc. Intl. Workshop Acoust. Signal Enhancement (IWAENC).

Thiergart, O., Del Galdo, G., and Habets, E. A. P. (2012). On the coherence in mixed sound fields and its application to signal-to-diffuse ratio estimation. J. Acoust. Soc. Am., 132(4):2337-2346.

Thiergart, O. and Habets, E. (2013). Informed optimum filtering using multiple instantaneous direction-of-arrival estimates. In Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP).

Thiergart, O. and Habets, E. A. P. (2012). Sound field model violations in parametric sound processing. In Proc. Intl. Workshop Acoust. Signal Enhancement (IWAENC).