Linear and Parametric Microphone Array Processing


Part 5 - Joint Linear and Parametric Spatial Processing

Emanuël A. P. Habets 1 and Sharon Gannot 2
1 International Audio Laboratories Erlangen, Germany (a joint institution of the University of Erlangen-Nuremberg and Fraunhofer IIS)
2 Faculty of Engineering, Bar-Ilan University, Israel

ICASSP 2013, Vancouver, Canada

Overview: 1. Motivation, 2. Informed Spatial Filtering, 3. Examples

1. Motivation

Classical linear spatial filtering:
+ High amount of noise-plus-interference reduction
+ Controllable tradeoff between speech distortion and noise reduction
+ Controllable tradeoff between different noise types
- Not very robust w.r.t. estimation errors, position changes, etc.
- Relatively slow response time

Parametric spatial filtering:
+ Fast response time
+ Relatively robust w.r.t. estimation errors, position changes, etc.
+ Possibility to manipulate parameters (e.g., virtual source displacement)
- Inherent tradeoff between speech distortion and noise reduction
- Model violations can introduce audible artifacts [Thiergart and Habets, 2012]
- Relatively poor interference reduction due to the tradeoff and the model violations


2. Informed Spatial Filtering

The main idea behind informed spatial filtering is to incorporate relevant information about the specific problem into the design of the filters and into the estimation of the required statistics.

Figure: Informed filtering approach. The microphone signals feed an informed multichannel spatial filter that produces the processed signals; the second-order statistics and the parameters (e.g., diffuseness, DOA) are estimated from the microphone signals and control the filter.

2. Informed Spatial Filtering

A selection of parameters that can be used (see Part 4):

- Signal-to-diffuse ratio (SDR):

  Γ(k, m, p_i) = P_dir(k, m, p_i) / P_diff(k, m),

  where P_dir is the power of the direct component at position p_i and P_diff is the power of the diffuse component (assuming a spatially homogeneous sound field).
- Time- and frequency-dependent direction-of-arrival estimates.
- Time- and frequency-dependent interaural level differences.
- Time- and frequency-dependent interaural phase differences.
- ...

3. Examples: Example A: Extracting Coherent Sound Sources; Example B: Dereverberation in the SH Domain; Example C: Directional Filtering; Example D: Source Extraction.

3.1 Example A: Extracting Coherent Sound Sources

Signal model: y(k, m) = x(k, m) + v(k, m).
Assumption: the desired signals are strongly coherent across the array.
Aim: estimate X_1(k, m) using a parametric multichannel Wiener filter [Benesty et al., 2011]:

h_PMWF(k, m) = [Φ_v^{-1}(k, m) Φ_x(k, m) / (λ(k, m) + tr{Φ_v^{-1}(k, m) Φ_x(k, m)})] u_1

Figure: Mapping from the input signal-to-diffuse ratio (in dB) to the tradeoff parameter λ [Taseska and Habets, 2012].
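The PMWF formula above can be sketched in a few lines of NumPy. The toy statistics below (a rank-1 speech PSD matrix with an all-ones steering vector and white sensor noise) are illustrative assumptions, not values from the slides:

```python
import numpy as np

def pmwf_weights(phi_v, phi_x, lam, ref=0):
    """Parametric multichannel Wiener filter for one time-frequency bin.

    phi_v : (M, M) noise PSD matrix
    phi_x : (M, M) desired-signal PSD matrix
    lam   : tradeoff parameter (larger -> more noise reduction)
    ref   : index of the reference microphone (the u_1 selection vector)
    """
    B = np.linalg.solve(phi_v, phi_x)            # Phi_v^{-1} Phi_x
    u = np.zeros(phi_x.shape[0]); u[ref] = 1.0
    return B @ u / (lam + np.trace(B).real)

# Toy example: M = 3 microphones, rank-1 speech PSD, white noise.
M = 3
d = np.ones(M, dtype=complex)                    # hypothetical steering vector
phi_x = 2.0 * np.outer(d, d.conj())              # speech PSD matrix (power 2 at each mic)
phi_v = 0.5 * np.eye(M)                          # noise PSD matrix
h = pmwf_weights(phi_v, phi_x, lam=1.0)          # lam = 1 gives the classical MWF

# The output SNR should exceed the input SNR (2 / 0.5 = 4) at the reference mic.
out_speech = np.abs(h.conj() @ phi_x @ h)
out_noise = np.abs(h.conj() @ phi_v @ h)
print(out_speech / out_noise)
```

For this rank-1 example the filter reduces to a scaled delay-and-sum combiner, and the output SNR equals M times the input SNR.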

Proposed Solution [Taseska and Habets, 2012]

Figure: Block diagram of the proposed system. The microphone signals y(k, m) are filtered by a parametric multichannel Wiener filter to obtain Z(k, m); the noise PSD matrix Φ_v(k, m), the speech presence probability P[H_1 | y(k, m)], and the signal-to-diffuse ratio Γ(k, m) are estimated to control the filter.

Algorithm Summary

High-level description of the proposed algorithm [Taseska and Habets, 2012]:
1. Compute the signal-to-diffuse ratio (SDR) using [Thiergart et al., 2012].
2. Compute the a priori speech presence probability (SPP) based on the SDR.
3. Compute the multichannel a posteriori SPP [Souden et al., 2010].
4. Update the noise PSD matrix using the a posteriori SPP.
5. Compute the tradeoff parameter for the parametric multichannel Wiener filter (PMWF) based on the SDR:
   - When the SDR is high, we decrease the amount of speech distortion.
   - When the SDR is low, we increase the amount of noise reduction.
6. Compute and apply the parametric multichannel Wiener filter.
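Steps 4 and 5 can be sketched as follows. The mapping `lambda_from_sdr` is a hypothetical monotone placeholder for f(Γ) (the actual mapping used in [Taseska and Habets, 2012] is not reproduced here), and the smoothing constant is an arbitrary choice:

```python
import numpy as np

def lambda_from_sdr(sdr_db, lo=-15.0, hi=15.0, lam_max=5.0):
    """Hypothetical smooth mapping f(Gamma): high SDR -> small lambda (less
    speech distortion), low SDR -> large lambda (more noise reduction).
    Only the qualitative behaviour matches the algorithm summary."""
    t = np.clip((sdr_db - lo) / (hi - lo), 0.0, 1.0)
    return lam_max * (1.0 - t)

def update_noise_psd(phi_v, y, spp, alpha=0.92):
    """Step 4: recursive noise PSD matrix update driven by the a posteriori
    speech presence probability; the update is strong when speech is absent."""
    alpha_eff = alpha + (1.0 - alpha) * spp      # spp = 1 -> no update
    return alpha_eff * phi_v + (1.0 - alpha_eff) * np.outer(y, y.conj())

rng = np.random.default_rng(0)
M = 4
phi_v = np.eye(M, dtype=complex)
for _ in range(200):                             # noise-only frames, spp ~ 0
    y = rng.standard_normal(M) + 1j * rng.standard_normal(M)
    phi_v = update_noise_psd(phi_v, y, spp=0.0)
# The diagonal should approach the true noise power E{|y_i|^2} = 2.
print(np.diag(phi_v).real)
print(lambda_from_sdr(-20.0), lambda_from_sdr(20.0))
```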

Results (1)

Figure: Performance evaluation: PESQ improvement versus input SNR for stationary diffuse noise (left) and diffuse babble speech (right), comparing λ = 0, λ = 1, λ = f(Γ), and λ = f(Γ) + SPP [Taseska and Habets, 2012].

Results (2)

Figure: Performance evaluation: segmental SNR gain versus input SNR for stationary diffuse noise (left) and diffuse babble speech (right), comparing λ = 0, λ = 1, λ = f(Γ), and λ = f(Γ) + SPP [Taseska and Habets, 2012].

Results (3)

Figure: Spectrograms obtained using M = 4 microphone signals corrupted by sensor noise and babble speech (input SNR = 10 dB): (a) first microphone signal, (b) MVDR, (c) parametric MWF, (d) parametric MWF with MC-SPP. Audio examples.

3.2 Example B: Dereverberation in the SH Domain

Assumed signal model with stacked spherical harmonic components:

p(k, m) = x(k, m) + d(k, m) + ṽ(k, m) = γ(k, m) X(k, m) + ũ(k, m),

with γ(k, m) = x(k, m) / X(k, m) = y(Ω_dir) / Y(Ω_dir) = γ_dir, where Y is the zero-order spherical harmonic and Ω_dir is the DOA.

Figure: Spherical harmonics up to order 3.

Proposed Solution [Braun et al., 2013]

Desired signal: the direct signal component X(k, m), which corresponds to the sound pressure measured at the center of the array in the absence of the spherical microphone array.
Assumption: the direct, diffuse and noise components are mutually uncorrelated.
Proposed solution: the (rank-1) MWF provides an MMSE estimate of X(k, m). For practical reasons, we split the MWF into an MVDR filter followed by a single-channel Wiener filter:

h_MWF(k, m) = φ_X(k, m) Φ_ũ^{-1}(k, m) γ_dir / (φ_X(k, m) γ_dir^H Φ_ũ^{-1}(k, m) γ_dir + 1)
            = [Φ_ũ^{-1}(k, m) γ_dir / (γ_dir^H Φ_ũ^{-1}(k, m) γ_dir)] · [φ_X / (φ_X + (γ_dir^H Φ_ũ^{-1}(k, m) γ_dir)^{-1})],

where the first factor is h_MVDR(k, m) and the second factor is the single-channel Wiener gain H_W(k, m).
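The decomposition can be verified numerically. Below, a randomly generated Hermitian positive-definite matrix stands in for Φ_ũ and a random vector for γ_dir; these are synthetic placeholders, not quantities from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 9                                            # (L+1)^2 SH channels for order L = 2
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
phi_u = A @ A.conj().T + N * np.eye(N)           # interference PSD matrix (Hermitian, PD)
gamma = rng.standard_normal(N) + 1j * rng.standard_normal(N)  # direct-sound vector gamma_dir
phi_x = 3.0                                      # direct-sound PSD

# Rank-1 MWF in closed form.
w = np.linalg.solve(phi_u, gamma)                # Phi_u^{-1} gamma
h_mwf = phi_x * w / (phi_x * (gamma.conj() @ w) + 1.0)

# Same filter, split into an MVDR filter followed by a single-channel Wiener gain.
denom = (gamma.conj() @ w).real                  # gamma^H Phi_u^{-1} gamma (real, > 0)
h_mvdr = w / denom
H_w = phi_x / (phi_x + 1.0 / denom)
h_split = H_w * h_mvdr

print(np.max(np.abs(h_mwf - h_split)))           # ~0: the two forms coincide
```

The MVDR part is distortionless towards γ_dir (h_MVDR^H γ_dir = 1), while the Wiener gain controls the residual noise, which is why the split form is convenient in practice.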

Parameter-based PSD Matrix Estimation

Required information:
- Direction of arrival (DOA), which determines γ_dir.
- Interference PSD matrix: Φ_ũ(k, m) = Φ_d(k, m) + Φ_ṽ(k, m).

Assumed model for the diffuse sound component: Φ_d(k, m) = φ_D(k, m) I_{(L+1)^2}.
The diffuse sound PSD is calculated using an estimate of the diffuseness Ψ:

φ_D(k, m) = Ψ(k, m) [φ_P(k, m) - φ_ṽ(k, m)]

Figure: Block diagram of the proposed system: STFT analysis, spherical harmonic transform (SHT), diffuseness estimation, diffuse and residual interference PSD estimation, filtering, and inverse STFT.

Results

Figure: Spectrograms obtained using simulated signals [Jarrett et al., 2012] (source-array distance 2 m, SNR = 20 dB, T60 = 400 ms): (a) reference X(k, m), (b) received P(k, m), (c) processed with the MVDR filter, (d) processed with the MWF. Audio examples.

3.3 Example C: Directional Filtering

Flexible sound acquisition in noisy and reverberant environments with rapidly changing acoustic scenes is a common problem in modern communication systems. A filter is proposed that provides an arbitrary response for J sources being simultaneously active per time and frequency. The proposed filter provides an optimal tradeoff between white noise gain (WNG) and directivity index (DI). The filter exploits instantaneous information about the sound field (narrowband DOAs, diffuse-to-noise ratio), which allows a nearly immediate adaptation to changes in the acoustic scene.

Problem Formulation

Signal model: based on a multi-wave sound field model, the M microphone signals can be expressed as

y(k, m) = Σ_{j=1}^{J} x^(j)(k, m) + d(k, m) + v(k, m),

i.e., as the sum of J plane waves, diffuse sound d, and sensor noise v.

Aim: capture the J plane waves (J < M) with a desired arbitrary gain while attenuating the sensor noise and the reverberation. The desired signal is

Z(k, m) = Σ_{j=1}^{J} G(k, φ_j) X_1^(j)(k, m)

and is estimated using an informed LCMV filter:

Ẑ(k, m) = h_iLCMV^H(k, m) y(k, m).

Figure: Example of two arbitrary directivity functions G_1 and G_2 over the DOA φ, with source positions φ_A and φ_B.

Proposed Solution (1)

The proposed informed LCMV filter is given by:

h_iLCMV = argmin_h h^H [Φ_d(k, m) + Φ_v(k, m)] h   s.t.   h^H(k, m) a(k, φ_j) = G(k, φ_j),   j ∈ {1, 2, ..., J},

where a(k, φ_j) denotes the steering vector for the jth plane wave at time m and frequency k. For the assumed signal model, we can alternatively minimize h^H [Ψ(k, m) Γ_d(k) + I] h, where Ψ(k, m) denotes the instantaneous diffuse-to-noise ratio (DNR) and Γ_d(k) denotes the coherence matrix of the diffuse sound field. The filter is updated for each time and frequency given instantaneous parametric information (DOAs, DNR). The filter requires knowledge of the DNR, which can be estimated using an auxiliary filter (see poster session AASP-P8 on Friday or [Thiergart and Habets, 2013]).
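A minimal sketch of this filter in NumPy, using the standard closed-form LCMV solution h = Φ^{-1} A (A^H Φ^{-1} A)^{-1} g* for the constraint set above. The array geometry, frequency, DNR value, and DOAs are made-up illustration parameters:

```python
import numpy as np

def diffuse_coherence(freq, mic_pos, c=343.0):
    """Spatial coherence matrix Gamma_d of a spherically isotropic (diffuse)
    field for omnidirectional microphones: sinc of the inter-mic distance."""
    d = np.linalg.norm(mic_pos[:, None, :] - mic_pos[None, :, :], axis=-1)
    return np.sinc(2.0 * freq * d / c)           # np.sinc(x) = sin(pi x)/(pi x)

def ilcmv_weights(dnr, Gamma_d, A, g):
    """Informed LCMV: minimize h^H (dnr * Gamma_d + I) h subject to
    h^H a_j = g_j for each plane-wave steering vector a_j (columns of A)."""
    Phi = dnr * Gamma_d + np.eye(Gamma_d.shape[0])
    PiA = np.linalg.solve(Phi, A)                # Phi^{-1} A
    return PiA @ np.linalg.solve(A.conj().T @ PiA, g.conj())

# Toy setup: M = 4 element ULA (3 cm spacing), J = 2 far-field plane waves.
c, freq = 343.0, 2000.0
mic_pos = np.stack([np.arange(4) * 0.03, np.zeros(4), np.zeros(4)], axis=1)

def steer(phi_deg):                              # steering vector for DOA phi
    tau = mic_pos[:, 0] * np.cos(np.deg2rad(phi_deg)) / c
    return np.exp(-2j * np.pi * freq * tau)

A = np.stack([steer(30.0), steer(120.0)], axis=1)
g = np.array([1.0, 0.0])                         # keep wave 1, cancel wave 2
h = ilcmv_weights(dnr=10.0, Gamma_d=diffuse_coherence(freq, mic_pos), A=A, g=g)
print(np.abs(h.conj() @ A[:, 0]), np.abs(h.conj() @ A[:, 1]))
```

The constraints are met exactly (unit response towards 30 degrees, a null towards 120 degrees), while the quadratic cost trades diffuse-sound suppression against WNG through the DNR weighting.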

Proposed Solution (2)

Figure: Left: estimated DOA φ_1(k, m) as a function of time and frequency. Right: resulting desired response |G(k, φ_1)|^2 in dB for the DOA φ_1(k, m) as a function of time and frequency.

Results (1)

Evaluation setup: L = 2 plane waves, a ULA with M = 4 microphones (3 cm spacing), and a simulated reverberant shoebox room. The minimum WNG was set to -12 dB to make the filters robust against microphone self-noise.

Figure: Top: mean directivity index (DI) in dB; bottom: mean white noise gain (WNG) in dB, for the compared filters (n, d, nd, and the proposed filter nd*), both while the sources are silent and while they are active.

Figure: Top: true DNR Ψ(k, m) in dB; bottom: estimated DNR in dB. The marked areas indicate a silent and an active part of the signal.

Results (2)

The proposed filter provides a high DI when the sound field is diffuse, and a high WNG when sensor noise is dominant. Interfering sound can be strongly attenuated if desired. The proposed DNR estimator provides a sufficiently high accuracy and temporal resolution to allow signal enhancement under adverse conditions, even in changing acoustic scenes.

Table: Performance of all filters [first value using the true DOAs (of the sources), value in brackets using estimated DOAs (of the plane waves)]:

Filter       | SegSIR [dB] | SegSRR [dB] | SegSNR [dB] | PESQ
unprocessed  | 11 (11)     | 7 (7)       | 26 (26)     | 1.5 (1.5)
n            | 21 (32)     | 2 (3)       | 33 (31)     | 2.0 (1.7)
d            | 26 (35)     | 0 (1)       | 22 (24)     | 2.1 (2.0)
nd           | 25 (35)     | 1 (1)       | 28 (26)     | 2.1 (2.0)

Audio Examples

3.4 Example D: Source Extraction

Scenario: multiple talkers, additive background noise, and distributed sensor arrays.
Applications: teleconferencing systems, automatic speech recognition, and spatial sound reproduction.
A spatial filter provides an estimate of the desired source as observed at a reference microphone.

Signal model: y(k, m) = x^(d)(k, m) + Σ_{i≠d} x^(i)(k, m) + v(k, m).
Aim: obtain an MMSE estimate of X_1^(d)(k, m).

Proposed Solution [Taseska and Habets, 2013]

Hypotheses:
H_v: y(k, m) = v(k, m) (speech absent)
H_x: y(k, m) = x(k, m) + v(k, m) (speech present)
H_x^j: y(k, m) = x^(j)(k, m) + Σ_{i≠j} x^(i)(k, m) + v(k, m), j = 1, 2, ..., J

Recursive estimation of the PSD matrices:

Φ_x^(j)(m) = p[H_x^j | y] (α_x Φ_x^(j)(m-1) + (1 - α_x) y y^H) + (1 - p[H_x^j | y]) Φ_x^(j)(m-1)

Signal-to-diffuse-ratio (Γ) and position (Θ) based posterior probabilities:

p[H_x^j | y] = p[H_x^j | y, H_x] p[H_x | y] ≈ p[H_x^j | Θ, H_x] p[H_x | Γ, y]
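The recursive PSD update above can be sketched directly. The steering vector and frame statistics below are synthetic stand-ins, chosen only to show that frames dominated by source j (posterior near 1) drive the estimate towards that source's rank-1 PSD matrix:

```python
import numpy as np

def update_source_psd(phi_x, y, p_j, alpha_x=0.9):
    """Recursive PSD update from the slide: the instantaneous outer product
    yy^H enters only in proportion to the posterior p[H_x^j | y]."""
    return p_j * (alpha_x * phi_x + (1.0 - alpha_x) * np.outer(y, y.conj())) \
        + (1.0 - p_j) * phi_x

rng = np.random.default_rng(2)
M = 3
phi = np.zeros((M, M), dtype=complex)
d = np.exp(1j * np.pi * np.arange(M) * 0.3)      # hypothetical source steering vector
for _ in range(500):
    s = rng.standard_normal() + 1j * rng.standard_normal()  # source signal, power 2
    y = s * d
    p_j = 1.0                                    # frames dominated by source j
    phi = update_source_psd(phi, y, p_j)
# The estimate converges towards the rank-1 matrix 2 * d d^H.
print(np.linalg.matrix_rank(phi), np.abs(phi[0, 0]))
```

With p_j = 0 the estimate is left untouched, which is exactly how the posterior gates the update between competing source hypotheses.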

Parameter-based PSD Matrix Estimation

Figure: Block diagram: the SDR Γ yields p[H_x | Γ, y] and the noise PSD matrix Φ_v; the position estimate Θ yields p[H_x^j | Θ, H_x]; combining the two gives p̂[H_x^j | y] and the source PSD matrices Φ_x^(j).

The distribution p[Θ | H_x^j] is modelled as a Gaussian mixture (GM). The GM parameters are estimated by the Expectation-Maximization algorithm.
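A from-scratch EM iteration for a 2-D Gaussian mixture, in the spirit of the position model above. This is a generic EM sketch with made-up cluster positions, not the exact training procedure of [Taseska and Habets, 2013]:

```python
import numpy as np

def em_gmm(X, K, n_iter=30):
    """Minimal EM for a K-component Gaussian mixture on 2-D position
    estimates, as a stand-in for modelling p[Theta | H_x^j]."""
    n, d = X.shape
    w = np.full(K, 1.0 / K)
    idx = np.argsort(X[:, 0])[np.linspace(0, n - 1, K).astype(int)]
    mu = X[idx].copy()                           # spread-out deterministic init
    cov = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * K)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] proportional to w_k N(x_i; mu_k, cov_k)
        r = np.empty((n, K))
        for k in range(K):
            diff = X - mu[k]
            quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov[k]), diff)
            norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov[k]))
            r[:, k] = w[k] * np.exp(-0.5 * quad) / norm
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and covariances
        Nk = r.sum(axis=0)
        w = Nk / n
        mu = (r.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            cov[k] = (r[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
    return w, mu, cov

# Two well-separated clusters of position estimates (hypothetical talker positions).
rng = np.random.default_rng(3)
X = np.vstack([rng.normal([0.0, 0.0], 0.1, (200, 2)),
               rng.normal([1.5, 1.0], 0.1, (200, 2))])
w, mu, cov = em_gmm(X, K=2)
print(np.round(w, 2))
print(np.round(mu, 2))
```

For well-separated talkers the fitted means land on the cluster centres and the mixture weights reflect the relative speech activity, which is what makes the position-based posteriors p[H_x^j | Θ, H_x] discriminative.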

Results (1)

Setup: three reverberant sources with approximately equal power, diffuse babble speech (SNR = 22 dB), and uncorrelated sensor noise (SNR = 50 dB). The reverberation time was T60 = 250 ms. Two uniform circular arrays were used, each with three omnidirectional microphones, a diameter of 2.5 cm, and an inter-array spacing of 1.5 m.

Figure: Output of the EM algorithm (3 iterations) on 4.5 s of noisy speech data: (a) training during single-talk, (b) training during triple-talk. The actual source positions are denoted by white squares; the array locations are marked by plus symbols. The interior of each ellipse contains 85% of the probability mass of the respective Gaussian.

Results (2)

Figure: Mixture, reference source signals (1)-(3), and extracted source signals (1)-(3) over time. Left: constant triple-talk scenario; right: mainly single-talk scenario. Audio files available at http://home.tiscali.nl/ehabets/publications/taseska213.html.

More Information

These and other examples are presented at ICASSP 2013 on:
- Friday, 10:30-12:30, Poster Session AASP-P8: An Informed Spatial Filter in the Spherical Harmonic Domain for Joint Noise Reduction and Dereverberation (Braun, Jarrett, Fischer and Habets)
- Friday, 10:30-12:30, Poster Session AASP-P8: An Informed LCMV Filter Based on Multiple Instantaneous Direction-Of-Arrival Estimates (Thiergart and Habets)
- Friday, 10:30-12:30, Poster Session AASP-P8: MMSE-based Source Extraction Using Position-based Posterior Probabilities (Taseska and Habets)
- Friday, 10:30-12:30, Poster Session AASP-P8: Spherical Harmonic Domain Noise Reduction Using an MVDR Beamformer and DOA-based Second-order Statistics Estimation (Jarrett, Habets and Naylor)

Special thanks to Sebastian Braun, Maja Taseska, Oliver Thiergart and Daniel Jarrett for their contributions.

References I

Benesty, J., Chen, J., and Habets, E. A. P. (2011). Speech Enhancement in the STFT Domain. SpringerBriefs in Electrical and Computer Engineering. Springer-Verlag.

Braun, S., Jarrett, D. P., Fischer, J., and Habets, E. A. P. (2013). An informed filter for dereverberation in the spherical harmonic domain. In Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, Canada.

Jarrett, D. P., Habets, E. A. P., Thomas, M. R. P., and Naylor, P. A. (2012). Rigid sphere room impulse response simulation: algorithm and applications. J. Acoust. Soc. Am., 132(3):1462-1472.

Souden, M., Chen, J., Benesty, J., and Affes, S. (2010). Gaussian model-based multichannel speech presence probability. IEEE Trans. Audio, Speech, Lang. Process., 18(5):1072-1077.

Taseska, M. and Habets, E. (2013). MMSE-based source extraction using position-based posterior probabilities. In Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP).

References II

Taseska, M. and Habets, E. A. P. (2012). MMSE-based blind source extraction in diffuse noise fields using a complex coherence-based a priori SAP estimator. In Proc. Intl. Workshop Acoust. Signal Enhancement (IWAENC).

Thiergart, O., Del Galdo, G., and Habets, E. A. P. (2012). On the coherence in mixed sound fields and its application to signal-to-diffuse ratio estimation. J. Acoust. Soc. Am., 132(4):2337-2346.

Thiergart, O. and Habets, E. (2013). Informed optimum filtering using multiple instantaneous direction-of-arrival estimates. In Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP).

Thiergart, O. and Habets, E. A. P. (2012). Sound field model violations in parametric sound processing. In Proc. Intl. Workshop Acoust. Signal Enhancement (IWAENC).