Dual Transfer Function GSC and Application to Joint Noise Reduction and Acoustic Echo Cancellation

Gal Reuven
Under the supervision of Sharon Gannot (1) and Israel Cohen (2)
(1) School of Engineering, Bar-Ilan University, Ramat-Gan
(2) Department of Electrical Engineering, Technion, Haifa
February, 26

Motivation
- Interference degrades:
  - Intelligibility
  - Speech compression quality
  - Speech recognition rates
- Goal: speech enhancement by a joint interference and noise reduction system
[Figure: a microphone array feeding a speech enhancement system, with a desired speech signal, a competing speech signal, a noise source, and ambient noise]

Outline
- Problem presentation
- The DTF-GSC
- Estimation
- Performance analysis and experimental study
- Application: joint AEC and NR
  - Cascade schemes
  - ETF-GSC scheme

Problem Presentation
- $M \geq 3$ microphones
- One desired speech signal
- One directional interference signal
- One directional/ambient noise signal
- Arbitrary acoustic transfer functions (ATFs)
$$z_m(t) = a_m(t) * s_1(t) + b_m(t) * s_2(t) + n_m(t), \quad m = 1, \ldots, M$$
[Figure: a microphone array feeding a speech enhancement system, with a desired speech signal, a competing speech signal, a noise source, and ambient noise]

Time Domain Presentation
$$z_m(t) = a_m(t) * s_1(t) + b_m(t) * s_2(t) + n_m(t); \quad m = 1, \ldots, M$$
where
- $a_m(t)$: the acoustic impulse response from the desired speech source to the m-th microphone
- $b_m(t)$: the acoustic impulse response from the non-stationary interference source to the m-th microphone
- $s_1(t)$: the desired speech
- $s_2(t)$: the non-stationary interference source
- $n_m(t)$: the (directional or nondirectional) stationary noise signal at the m-th microphone
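The time-domain model on this slide can be illustrated with a minimal sketch in plain Python. It assumes that the products of an impulse response and a source signal denote linear convolution. The helper names (`convolve`, `mic_signal`) and the toy impulse responses and signals are hypothetical and are not taken from the presentation.

```python
def convolve(h, x):
    """Full linear convolution of impulse response h with signal x."""
    y = [0.0] * (len(h) + len(x) - 1)
    for i, hi in enumerate(h):
        for j, xj in enumerate(x):
            y[i + j] += hi * xj
    return y


def mic_signal(a_m, b_m, s1, s2, n_m):
    """z_m = a_m * s1 + b_m * s2 + n_m, truncated to the length of s1.

    s1, s2 and n_m are assumed to have the same length.
    """
    desired = convolve(a_m, s1)
    interference = convolve(b_m, s2)
    return [desired[t] + interference[t] + n_m[t] for t in range(len(s1))]


# Hypothetical toy data (illustration only)
s1 = [1.0, 0.0, 0.0, 0.0]    # desired source: unit impulse
s2 = [0.0, 1.0, 0.0, 0.0]    # interference: delayed unit impulse
a_m = [1.0, 0.5]             # impulse response, desired source -> microphone m
b_m = [0.25]                 # impulse response, interference -> microphone m
n_m = [0.0, 0.0, 0.0, 0.0]   # noise-free case
z_m = mic_signal(a_m, b_m, s1, s2, n_m)
```

In a real recording the impulse responses would be hundreds or thousands of taps long, which is one reason the processing is carried out per frequency bin in the STFT domain.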

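Frequency Domain Presentation
STFT:
$$\mathbf{Z}(t, e^{j\omega}) = \mathbf{A}(e^{j\omega}) S_1(t, e^{j\omega}) + \mathbf{B}(e^{j\omega}) S_2(t, e^{j\omega}) + \mathbf{N}(t, e^{j\omega})$$
where
$$\mathbf{Z}(t, e^{j\omega}) = [\, Z_1(t, e^{j\omega}) \;\; Z_2(t, e^{j\omega}) \;\; \cdots \;\; Z_M(t, e^{j\omega}) \,]^T$$
$$\mathbf{A}(e^{j\omega}) = [\, A_1(e^{j\omega}) \;\; A_2(e^{j\omega}) \;\; \cdots \;\; A_M(e^{j\omega}) \,]^T$$
$$\mathbf{B}(e^{j\omega}) = [\, B_1(e^{j\omega}) \;\; B_2(e^{j\omega}) \;\; \cdots \;\; B_M(e^{j\omega}) \,]^T$$
$$\mathbf{N}(t, e^{j\omega}) = [\, N_1(t, e^{j\omega}) \;\; N_2(t, e^{j\omega}) \;\; \cdots \;\; N_M(t, e^{j\omega}) \,]^T$$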

Goal
- Reconstruct the desired speech signal in an environment that contains:
  - Reverberation
  - A competing speech signal (double talk)
  - Stationary noise
- Applications:
  - Blind source separation (BSS)
  - Acoustic echo cancellation (AEC)
- Methods:
  - Extend the TF-GSC so that it places a null in the interference direction
  - Exploit the nonstationarity of the desired and interference signals

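Dual Source Transfer-Function Generalized Sidelobe Canceller (DTF-GSC)
[Figure: DTF-GSC block diagram. The microphone signals $Z_1(t, e^{j\omega}), \ldots, Z_M(t, e^{j\omega})$ feed the block $W$, which produces $Y_{MBF}(t, e^{j\omega})$, and the block $H$, which produces $U_3(t, e^{j\omega}), \ldots, U_M(t, e^{j\omega})$. The filters $G_3(t, e^{j\omega}), \ldots, G_M(t, e^{j\omega})$ produce $Y_{NC}(t, e^{j\omega})$, which is combined with $Y_{MBF}(t, e^{j\omega})$ to give the output $Y(t, e^{j\omega})$.]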

Method
Extend the TF-GSC to deal with nonstationary interference:
- Matched beamformer (MBF): distortionless in the desired direction while blocking the interference direction
- Blocking matrix (BM): blocks both the desired and the interference directions
- Adaptive noise canceller (ANC): estimates the residual noise at the MBF output using the reference signals produced by the BM

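Matched beamformer
ATF-ratio matched filter:
$$\mathbf{W}(e^{j\omega}) = \frac{\dfrac{\mathbf{A}(e^{j\omega})}{\|\mathbf{A}(e^{j\omega})\|^2} - \rho(e^{j\omega}) \dfrac{\mathbf{B}(e^{j\omega})}{\|\mathbf{A}(e^{j\omega})\| \, \|\mathbf{B}(e^{j\omega})\|}}{1 - |\rho(e^{j\omega})|^2} \, F(e^{j\omega})$$
$$\rho(e^{j\omega}) = \frac{\mathbf{B}^\dagger(e^{j\omega}) \mathbf{A}(e^{j\omega})}{\|\mathbf{A}(e^{j\omega})\| \, \|\mathbf{B}(e^{j\omega})\|}$$
Easily verified:
$$\mathbf{A}^\dagger(e^{j\omega}) \mathbf{W}(e^{j\omega}) = F(e^{j\omega}), \qquad \mathbf{B}^\dagger(e^{j\omega}) \mathbf{W}(e^{j\omega}) = 0$$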
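The two constraints that the slide calls easily verified can be checked numerically for a single frequency bin. The sketch below assumes the matched beamformer is the combination of A/||A||^2 and rho B/(||A|| ||B||), scaled by F/(1 - |rho|^2), with rho the normalized inner product of B and A. The function names and the three-microphone ATF values are hypothetical.

```python
def inner(x, y):
    """Hermitian inner product x^dagger y."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))


def matched_beamformer(A, B, F=1.0):
    """MBF weights for one frequency bin, given ATF vectors A and B."""
    norm_a = abs(inner(A, A)) ** 0.5
    norm_b = abs(inner(B, B)) ** 0.5
    rho = inner(B, A) / (norm_a * norm_b)
    scale = F / (1.0 - abs(rho) ** 2)
    return [scale * (a / norm_a ** 2 - rho * b / (norm_a * norm_b))
            for a, b in zip(A, B)]


# Hypothetical ATFs of a 3-microphone array at one frequency (illustration only)
A = [1 + 0j, 0.5 - 0.2j, -0.3 + 0.4j]   # desired source
B = [1 + 0j, -0.1 + 0.7j, 0.2 + 0.1j]   # interference
W = matched_beamformer(A, B)

resp_desired = inner(A, W)   # expected to equal F (here 1)
resp_interf = inner(B, W)    # expected to vanish
```

The weights are undefined when A and B are parallel (|rho| = 1), which corresponds to a desired source and an interference that the array cannot tell apart at that frequency.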

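Blocking Matrix
$$\mathbf{H}(e^{j\omega}) = \begin{bmatrix} -Q_3^*(e^{j\omega}) & -Q_4^*(e^{j\omega}) & \cdots & -Q_M^*(e^{j\omega}) \\ -L_3^*(e^{j\omega}) & -L_4^*(e^{j\omega}) & \cdots & -L_M^*(e^{j\omega}) \\ 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$
$$Q_m(e^{j\omega}) = \frac{\dfrac{A_2(e^{j\omega})}{A_1(e^{j\omega})} \dfrac{B_m(e^{j\omega})}{B_1(e^{j\omega})} - \dfrac{B_2(e^{j\omega})}{B_1(e^{j\omega})} \dfrac{A_m(e^{j\omega})}{A_1(e^{j\omega})}}{\dfrac{A_2(e^{j\omega})}{A_1(e^{j\omega})} - \dfrac{B_2(e^{j\omega})}{B_1(e^{j\omega})}}; \quad m = 3, \ldots, M$$
$$L_m(e^{j\omega}) = \frac{\dfrac{A_m(e^{j\omega})}{A_1(e^{j\omega})} - \dfrac{B_m(e^{j\omega})}{B_1(e^{j\omega})}}{\dfrac{A_2(e^{j\omega})}{A_1(e^{j\omega})} - \dfrac{B_2(e^{j\omega})}{B_1(e^{j\omega})}}; \quad m = 3, \ldots, M$$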

Blocking Matrix
Easily verified:
$$\mathbf{A}^\dagger(e^{j\omega}) \mathbf{H}(e^{j\omega}) = \mathbf{0}, \qquad \mathbf{B}^\dagger(e^{j\omega}) \mathbf{H}(e^{j\omega}) = \mathbf{0}$$
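The blocking property can also be checked numerically. The sketch below builds the columns of H from the ATF ratios, assuming that Q_m and L_m are the ratio expressions defined for m = 3, ..., M and that the first two entries of each column are the negated complex conjugates of Q_m and L_m. That sign and conjugation convention is an assumption, chosen so that both constraints hold. Function names and ATF values are hypothetical.

```python
def inner(x, y):
    """Hermitian inner product x^dagger y."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))


def blocking_columns(A, B):
    """Return the M-2 columns of the blocking matrix for one frequency bin."""
    M = len(A)
    ra = [a / A[0] for a in A]   # ratios A_m / A_1
    rb = [b / B[0] for b in B]   # ratios B_m / B_1
    den = ra[1] - rb[1]          # A_2/A_1 - B_2/B_1, must be nonzero
    cols = []
    for m in range(2, M):        # zero-based index of microphones 3..M
        L = (ra[m] - rb[m]) / den
        Q = (ra[1] * rb[m] - rb[1] * ra[m]) / den
        col = [0j] * M
        col[0] = -Q.conjugate()
        col[1] = -L.conjugate()
        col[m] = 1 + 0j
        cols.append(col)
    return cols


# Hypothetical ATFs of a 4-microphone array at one frequency (illustration only)
A = [1 + 0j, 0.5 - 0.2j, -0.3 + 0.4j, 0.8 + 0.1j]
B = [1 + 0j, -0.1 + 0.7j, 0.2 + 0.1j, -0.6 - 0.2j]
H_cols = blocking_columns(A, B)

# Largest leakage of either source through any reference channel
leak = max(abs(inner(v, col)) for v in (A, B) for col in H_cols)
```

Each column uses microphones 1 and 2 to cancel both sources at microphone m, which is why only M - 2 reference signals remain.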

Adaptive Noise Canceller
Normalized LMS:
$$\tilde{G}_m(t+1, e^{j\omega}) = G_m(t, e^{j\omega}) + \mu \frac{U_m(t, e^{j\omega}) Y^*(t, e^{j\omega})}{P_{est}(t, e^{j\omega})}$$
$$G_m(t+1, e^{j\omega}) \xleftarrow{\text{FIR}} \tilde{G}_m(t+1, e^{j\omega})$$
for $m = 3, \ldots, M$; where
$$P_{est}(t, e^{j\omega}) = \eta P_{est}(t-1, e^{j\omega}) + (1 - \eta) \|\mathbf{Z}(t, e^{j\omega})\|^2$$
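A minimal single-bin sketch of the normalized LMS recursion is given below. It rests on several assumptions that the slide does not state: the noise-canceller output is formed as the sum of conj(G_m) U_m, the output is Y = Y_MBF - Y_NC, and the update uses the conjugate of Y. The FIR constraint step is omitted because it couples frequency bins. The function name, step size, forgetting factor, and toy signals are hypothetical.

```python
def nlms_step(G, U, y_mbf, Z, p_prev, mu=0.1, eta=0.9):
    """One normalized-LMS update in a single frequency bin.

    G: current filter coefficients, U: reference signals from the BM,
    y_mbf: MBF output, Z: microphone signals, p_prev: previous power estimate.
    """
    y_nc = sum(g.conjugate() * u for g, u in zip(G, U))   # assumed Y_NC = G^dagger U
    y = y_mbf - y_nc                                      # beamformer output
    p_est = eta * p_prev + (1.0 - eta) * sum(abs(z) ** 2 for z in Z)
    G_next = [g + mu * u * y.conjugate() / p_est for g, u in zip(G, U)]
    return G_next, y, p_est


# Toy noise-only scenario (illustration only): one reference channel, and the
# MBF output contains only noise that is a fixed multiple of the reference.
g_true = 0.4 - 0.3j
u = 1.0 + 1.0j
G, p = [0j], 1.0
for _ in range(300):
    y_mbf = g_true.conjugate() * u
    G, y, p = nlms_step(G, [u], y_mbf, [u], p)
```

In this toy case the coefficient converges to `g_true` and the output `y` decays toward zero, since the residual noise at the MBF output is fully predictable from the reference.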

Estimation
MBF components: estimated in a two-step procedure
- Estimate $A_m(e^{j\omega}) / A_1(e^{j\omega})$ and $B_m(e^{j\omega}) / B_1(e^{j\omega})$ by exploiting nonstationarity
- Calculate $\mathbf{W}(e^{j\omega})$

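Estimation
An unbiased estimate of $A_m(e^{j\omega}) / A_1(e^{j\omega})$ and $B_m(e^{j\omega}) / B_1(e^{j\omega})$ is obtained by applying LS to
$$\begin{bmatrix} \hat{\Phi}^{(1)}_{z_m z_1}(e^{j\omega}) \\ \hat{\Phi}^{(2)}_{z_m z_1}(e^{j\omega}) \\ \vdots \\ \hat{\Phi}^{(K)}_{z_m z_1}(e^{j\omega}) \end{bmatrix} = \begin{bmatrix} \hat{\Phi}^{(1)}_{z_1 z_1}(e^{j\omega}) & 1 \\ \hat{\Phi}^{(2)}_{z_1 z_1}(e^{j\omega}) & 1 \\ \vdots & \vdots \\ \hat{\Phi}^{(K)}_{z_1 z_1}(e^{j\omega}) & 1 \end{bmatrix} \begin{bmatrix} H_m(e^{j\omega}) \\ \Phi_{u_m z_1}(e^{j\omega}) \end{bmatrix} + \begin{bmatrix} \varepsilon^{(1)}_m(e^{j\omega}) \\ \varepsilon^{(2)}_m(e^{j\omega}) \\ \vdots \\ \varepsilon^{(K)}_m(e^{j\omega}) \end{bmatrix}$$
(a separate set of equations is used for $m = 2, \ldots, M$).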
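The LS step for one ratio can be sketched as a two-unknown complex regression over K frames. This is an illustration under stated assumptions, not the presenter's implementation: the unknowns are taken to be the transfer-function ratio (written H_m on the slide) and a frame-independent cross-PSD term, and the closed-form solution of the normal equations is used. The function name and the synthetic PSD values are hypothetical.

```python
def ls_ratio(phi_z1z1, phi_zmz1):
    """Least-squares fit of phi_zmz1[k] = ratio * phi_z1z1[k] + offset over K frames."""
    K = len(phi_z1z1)
    x_mean = sum(phi_z1z1) / K
    y_mean = sum(phi_zmz1) / K
    num = sum(complex(x - x_mean).conjugate() * (y - y_mean)
              for x, y in zip(phi_z1z1, phi_zmz1))
    den = sum(abs(x - x_mean) ** 2 for x in phi_z1z1)   # zero if the PSD never changes
    ratio = num / den
    offset = y_mean - ratio * x_mean
    return ratio, offset


# Synthetic, noise-free example (illustration only)
true_ratio = 0.5 - 0.3j
true_offset = 0.1 + 0.2j
phi_z1z1 = [1.0, 2.0, 4.0, 3.0]                      # PSD of z_1 in K = 4 frames
phi_zmz1 = [true_ratio * x + true_offset for x in phi_z1z1]
ratio, offset = ls_ratio(phi_z1z1, phi_zmz1)
```

The denominator vanishes when the PSD of the first microphone is identical in every frame. This is where nonstationarity enters: the ratio is identifiable only because the speech PSD changes from frame to frame while the offset term stays fixed.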

Estimation
BM components: the estimation method depends on the type of frame
- A single speech signal is active: $A_m(e^{j\omega}) / A_1(e^{j\omega})$ or $B_m(e^{j\omega}) / B_1(e^{j\omega})$ is adapted, and $\mathbf{H}(e^{j\omega})$ is calculated
- Double talk: $Q_m(e^{j\omega})$ and $L_m(e^{j\omega})$ are estimated directly by solving a least-squares set of equations

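Estimation
$$\begin{bmatrix} \hat{\Phi}^{(1)}_{z_m z_1}(e^{j\omega}) \\ \hat{\Phi}^{(2)}_{z_m z_1}(e^{j\omega}) \\ \vdots \\ \hat{\Phi}^{(K)}_{z_m z_1}(e^{j\omega}) \end{bmatrix} = \begin{bmatrix} \hat{\Phi}^{(1)}_{z_1 z_1}(e^{j\omega}) & \hat{\Phi}^{(1)}_{z_2 z_1}(e^{j\omega}) & 1 \\ \hat{\Phi}^{(2)}_{z_1 z_1}(e^{j\omega}) & \hat{\Phi}^{(2)}_{z_2 z_1}(e^{j\omega}) & 1 \\ \vdots & \vdots & \vdots \\ \hat{\Phi}^{(K)}_{z_1 z_1}(e^{j\omega}) & \hat{\Phi}^{(K)}_{z_2 z_1}(e^{j\omega}) & 1 \end{bmatrix} \begin{bmatrix} Q_m(e^{j\omega}) \\ L_m(e^{j\omega}) \\ \Phi_{u_m z_1}(e^{j\omega}) \end{bmatrix} + \begin{bmatrix} \varepsilon^{(1)}_m(e^{j\omega}) \\ \varepsilon^{(2)}_m(e^{j\omega}) \\ \vdots \\ \varepsilon^{(K)}_m(e^{j\omega}) \end{bmatrix}$$
(a separate set of equations is used for $m = 3, \ldots, M$)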
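The double-talk set of equations has three unknowns per microphone, so a small linear solver is needed. The sketch below forms the normal equations and solves them by Gaussian elimination. It assumes each frame contributes one row holding the auto-PSD of z_1, the cross-PSD of z_2 and z_1, and a constant 1. The function names and the synthetic PSD values are hypothetical.

```python
def solve(M, v):
    """Solve the square system M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [rhs] for row, rhs in zip(M, v)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def ls_double_talk(phi_z1z1, phi_z2z1, phi_zmz1):
    """LS estimate of (Q_m, L_m, offset) from K frames via the normal equations."""
    rows = [[complex(p11), complex(p21), 1 + 0j]
            for p11, p21 in zip(phi_z1z1, phi_z2z1)]
    gram = [[sum(r[i].conjugate() * r[j] for r in rows) for j in range(3)]
            for i in range(3)]
    rhs = [sum(r[i].conjugate() * y for r, y in zip(rows, phi_zmz1))
           for i in range(3)]
    return tuple(solve(gram, rhs))


# Synthetic, noise-free example with K = 4 frames (illustration only)
q_true, l_true, c_true = 0.3 + 0.2j, -0.5 + 0.1j, 0.05j
phi_z1z1 = [1.0, 2.0, 4.0, 3.0]
phi_z2z1 = [0.5 + 0.1j, 0.2 - 0.3j, 1.0 + 0.5j, -0.4 + 0.2j]
phi_zmz1 = [q_true * a + l_true * b + c_true for a, b in zip(phi_z1z1, phi_z2z1)]
q_m, l_m, offset = ls_double_talk(phi_z1z1, phi_z2z1, phi_zmz1)
```

At least three frames with linearly independent rows are needed. In practice more frames would be used so that the error terms average out.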

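DTF-GSC Performance Analysis
General expression for the output power spectral density (the arguments $(t, e^{j\omega})$ are omitted on the right-hand side for brevity):
$$\begin{aligned} \Phi_{yy}(t, e^{j\omega}) = \Big\{ \; & \hat{\mathbf{W}}^\dagger \Phi_{ZZ} \hat{\mathbf{W}} \\ & - \hat{\mathbf{W}}^\dagger \Phi_{NN} \hat{\mathbf{H}} \big( \hat{\mathbf{H}}^\dagger \Phi_{NN} \hat{\mathbf{H}} \big)^{-1} \hat{\mathbf{H}}^\dagger \Phi_{ZZ} \hat{\mathbf{W}} \\ & - \hat{\mathbf{W}}^\dagger \Phi_{ZZ} \hat{\mathbf{H}} \big( \hat{\mathbf{H}}^\dagger \Phi_{NN} \hat{\mathbf{H}} \big)^{-1} \hat{\mathbf{H}}^\dagger \Phi_{NN} \hat{\mathbf{W}} \\ & + \hat{\mathbf{W}}^\dagger \Phi_{NN} \hat{\mathbf{H}} \big( \hat{\mathbf{H}}^\dagger \Phi_{NN} \hat{\mathbf{H}} \big)^{-1} \hat{\mathbf{H}}^\dagger \Phi_{ZZ} \hat{\mathbf{H}} \big( \hat{\mathbf{H}}^\dagger \Phi_{NN} \hat{\mathbf{H}} \big)^{-1} \hat{\mathbf{H}}^\dagger \Phi_{NN} \hat{\mathbf{W}} \; \Big\} \end{aligned}$$
The PSD depends on:
- Input signal PSD
- Noise signal PSD
- Signal ATF ratios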

DTF-GSC Performance Analysis
Output power density
- 1 microphones linear array
- Delay-only ATFs for speech and noise
- Maintains the desired signal at θ = 90°
- Blocks directional noise from θ = 120°
- Blocks interference from θ = 60°

PSD deviation
$$\mathrm{DEV}(t, e^{j\omega}) = \frac{\Phi^{s_1}_{yy}(t, e^{j\omega})}{|F(e^{j\omega})|^2 \, |A_1(e^{j\omega})|^2 \, \Phi_{s_1 s_1}(t, e^{j\omega})}$$
- 1 microphones linear array
- Delay-only ATFs for speech
- Directional noise field
- Desired signal from θ = 90°
- Up to 4dB distortion in frequencies below 3Hz
[Figure: Φ_yy [dB] versus frequency [Hz] and θ [deg], around the desired direction]

Noise Reduction
$$\mathrm{NR}(t, e^{j\omega}) = \frac{\Phi^{n}_{yy}(t, e^{j\omega})}{|F(e^{j\omega})|^2 \, |D_1(e^{j\omega})|^2 \, \Phi_{nn}(t, e^{j\omega})}$$
- 1 microphones linear array
- Delay-only ATFs for speech
- Directional noise signal from θ = 120°
- dB attenuation in the noise direction
[Figure: Φ_yy [dB] versus frequency [Hz] and θ [deg], around the noise direction]

Interference Reduction
$$\mathrm{NIR}(t, e^{j\omega}) = \frac{\Phi^{s_2}_{yy}(t, e^{j\omega})}{|F(e^{j\omega})|^2 \, |B_1(e^{j\omega})|^2 \, \Phi_{s_1 s_1}(t, e^{j\omega})}$$
- 1 microphones linear array
- Delay-only ATFs for speech
- Directional noise field
- Interference signal from θ = 60°
- dB attenuation in the interference direction
[Figure: Φ_yy [dB] versus frequency [Hz] and θ [deg], around the interference direction]

Experimental study
- Speech signal, simulated ATFs
- Two noise fields:
  - Directional noise
  - Diffused noise
- Sonograms
- Performance evaluation

Sonograms
[Figure: sonograms, panels (a) to (f); axes: Time [sec] and Frequency [Hz]]

Performance evaluation
Noise and interference reduction in a directional noise field (top row) and a diffused noise field (bottom row):

              Input            Output of MBF    Output of BM     Output of DTF-GSC
              S1NR    S1S2R    S1NR    S1S2R    S1NR    S2NR     S1NR    S1S2R
Directional   11.3    2.3      13.8    16.9     -3.9    -4.      34.6    12.7
Diffused      12.7    2.3      17.4    2        -3.8    -3.      2.9     22.6

Application: joint noise reduction and echo cancellation
- $M \geq 3$ microphones
- One desired speech signal
- One competing speech signal (echo)
- One directional/ambient noise signal
- Arbitrary acoustic transfer functions (ATFs)
$$z_m(t) = a_m(t) * s_1(t) + b_m(t) * e(t) + n_m(t), \quad m = 1, \ldots, M$$
[Figure: a microphone array feeding a speech enhancement system, with a desired speech signal, a remote speech signal, a noise source, and ambient noise]

Cascade scheme
- AEC-BF: multichannel AEC followed by a beamformer
  - The beamformer inputs contain less echo
  - The multichannel AEC deteriorates due to noise
- BF-AEC: beamformer followed by a single-channel AEC
  - The AEC input contains less noise
  - The beamformer suppresses echo, although the AEC has better performance
  - The AEC suffers from fast variations in the echo path due to the beamformer

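[Figure: block diagram of a cascade scheme. The echo cancellers $G^E_1(t, e^{j\omega}), \ldots, G^E_M(t, e^{j\omega})$, driven by the echo reference $E(t, e^{j\omega})$, act on the microphone signals $Z_1(t, e^{j\omega}), \ldots, Z_M(t, e^{j\omega})$ to give $Z^{AEC}_1(t, e^{j\omega}), \ldots, Z^{AEC}_M(t, e^{j\omega})$. These feed a TF-GSC with blocks $W$ and $H$, reference signals $U_2, \ldots, U_M$, noise cancellers $G^N_2(t, e^{j\omega}), \ldots, G^N_M(t, e^{j\omega})$, and signals $Y_{MBF}$, $Y_{NC}$, and output $Y(t, e^{j\omega})$.]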

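[Figure: block diagram of a cascade scheme. The microphone signals $Z_1(t, e^{j\omega}), \ldots, Z_M(t, e^{j\omega})$ feed a TF-GSC with blocks $W$ and $H$, reference signals $U_2, \ldots, U_M$, noise cancellers $G^N_2(t, e^{j\omega}), \ldots, G^N_M(t, e^{j\omega})$, and signals $Y_{MBF}$ and $Y_{NC}$, giving the beamformer output $Y_{BF}(t, e^{j\omega})$. A single echo canceller $G^E(t, e^{j\omega})$, driven by the echo reference $E(t, e^{j\omega})$, then produces the output $Y(t, e^{j\omega})$.]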

ETF-GSC scheme
- Matched beamformer (MBF): maintains the desired signal
- Blocking unit (BU): blocks both the desired and the echo signals
- Adaptive noise and echo canceller (ANEC):
  - The noise canceller and the echo canceller work in parallel
  - The echo reference signal is used to create additional interference reference signals for the ANEC

[Figure: ETF-GSC block diagram. Inputs: the microphone signals $Z_1(t, e^{j\omega}), \ldots, Z_M(t, e^{j\omega})$ and the echo reference $E(t, e^{j\omega})$. Blocks: $F$, $H$, $G^N_2(t, e^{j\omega}), \ldots, G^N_M(t, e^{j\omega})$, $G^H_2(t, e^{j\omega}), \ldots, G^H_M(t, e^{j\omega})$, and $G^E_1(t, e^{j\omega}), \ldots, G^E_M(t, e^{j\omega})$. Signals: $Y_{MBF}(t, e^{j\omega})$, $Y_{NC}(t, e^{j\omega})$, $Y_{EC}(t, e^{j\omega})$, and $U_2(t, e^{j\omega}), \ldots, U_M(t, e^{j\omega})$. Output: $Y(t, e^{j\omega})$.]

ETF-GSC scheme: estimation
- MBF estimation is done as in the TF-GSC (during single talk)
- BM estimation is done as in the TF-GSC (during single talk)
- The noise canceller adapts during noise-only frames
- The echo canceller adapts during echo frames

Performance evaluation

Input        Tested       Echo suppression          Noise reduction
SNR  SER     algorithm    AEC     BF      Total     Total
1            AEC-BF       18.     2.1     2.6       23.
             BF-AEC       .4      6.2     11.7      24.7
             ETF-GSC                      37.7      23.1
1            AEC-BF       3.9     .4      4.4       23.3
             BF-AEC       1.7     6.8     8.6       24.4
             ETF-GSC                      18.1      23.
1 1          AEC-BF       12.1    1.      13.1      23.9
             BF-AEC       4.6     .8      1.4       24.
             ETF-GSC                      29.7      23.6

Sonograms
[Figure: sonograms, panels (a) to (d); axes: Time [sec] and Frequency [Hz]]

Conclusions
- DTF-GSC algorithm
  - GSC structure: modified MBF and BM
  - New identification procedure for DT frames
  - Application: BSS problem of convolutive mixtures and additive noise
- DTF-GSC performance analysis
  - General expression for the output power spectral density
  - Expected deviation imposed on the desired signal
  - Noise reduction
  - Interference reduction

Conclusions
- ETF-GSC
  - Joint echo cancellation and noise reduction in a reverberant environment
  - TF-GSC based solution: BU and ANEC blocks (reference signal incorporated)
  - Performance evaluation (during DT) and comparison to cascade schemes

Future Research
- Dual nonstationary speech signals in the presence of echo and stationary noise
[Figure: a microphone array feeding a speech enhancement system, with a desired speech signal, a competing speech signal, an echo signal, a noise source, and ambient noise]

Future Research
- Speech enhancement using the DTF-GSC and postfiltering
  - Less significant noise reduction is obtained in a diffused noise field
  - Postfiltering: known methods, or using the noise reference signals
- DTF-GSC using Relative Transfer Function (RTF) system identification
  - Weighted least squares optimization criterion
  - Smaller error variance and faster convergence
- Joint noise reduction and echo cancellation using the ETF-GSC and residual echo cancellation
  - Residual echo arises from misadjusted AEC filters and finite filter length
  - A linear prediction error filter removes the short-term correlation of the residual echo
  - The whitened residual echo is cancelled by a noise reduction filter