Speech Enhancement Using Multi-channel Post-Filtering with Modified Signal Presence Probability in Reverberant Environment

Chinese Journal of Electronics, Vol.25, No.3, May 2016

WANG Xiaofei, GUO Yanmeng, FU Qiang and YAN Yonghong
(Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China)

Abstract: This paper presents a multi-channel post-filtering approach for reverberant environments based on a detection and estimation scheme. A modified Signal presence probability (SPP) that takes reverberation into account is proposed, together with a Direct-to-reverberant ratio (DRR) estimator, to adapt to distant-talking scenes. The SPP is a key estimator: it controls the updating of the transient noise or residual directional interference estimate and forms the gain function in the time-frequency domain. A new desired signal detection scheme is therefore proposed to improve its accuracy. A spectral enhancement technique that exploits the modified SPP estimator is then applied to the noisy speech signal. The proposed multi-channel post-filter is tested in different nonstationary noisy and reverberant environments. Experimental results show that it preserves the desired speech considerably better and reduces noise more than the comparative algorithms.

Key words: Speech enhancement, Multi-channel post-filtering, Reverberation robust, Signal detection.

I. Introduction

Multi-channel systems have proved effective in improving the performance of speech enhancement algorithms in noisy and reverberant environments [1,2]. The classical Generalized sidelobe canceller (GSC), suggested by Griffiths and Jim [3], suppresses directional interference using an unconstrained LMS-type algorithm that cancels the residual noise at the Fixed beamformer (FBF) output, given reference signals from the Block matrix (BM).
However, the GSC algorithm suffers from signal cancellation caused by channel leakage, especially in reverberant environments. A single direction of arrival cannot guarantee the absence of steering vector error, since the sensors also capture reflections from other directions. To overcome this problem, Gannot proposed the general Transfer function GSC (TF-GSC), which modifies the reference signals by estimating the Relative TFs (RTFs) between pairs of sensors [4]. Based on the structure of the TF-GSC, the Multiplicative TF-GSC (MTF-GSC) [5] and the Convolutive TF-GSC (CTF-GSC) [6] were proposed to obtain more accurate reference signals from the BM in reverberant cases.

However, beamforming alone cannot supply sufficient noise reduction in the presence of nondirectional noise such as diffuse noise [7,8] or of residual directional noise. Post-filtering, first proposed by Zelinski [9], is therefore required. Zelinski derived an adaptive Wiener filter from the auto- and cross-correlation functions of the observed signals. McCowan generalized it using a priori knowledge of the noise field, which removes the underlying assumption that the noise components at the sensors are mutually uncorrelated [10]. Considering that the sound field is not homogeneous due to air attenuation, Hu et al. proposed a post-filter based on a spatial coherence measure, together with a bias-compensated solution [11]. However, to estimate the noise coherence function it relies on a target signal detection procedure that is difficult to carry out accurately.

To deal with highly non-stationary noise as well as residual directional noise, Cohen first proposed a multi-channel post-filtering approach based mainly on the Transient beam-to-reference ratio (TBRR) [12]. The approach is tightly coupled with the GSC beamformer and minimizes the Log-spectral amplitude (LSA) distortion. It exploits both the beamformer output and the reference signals, relying on the fact that the desired signal component is stronger at the beamformer output than in the reference noise signals. Gannot combined it with the TF-GSC beamformer to achieve better performance in highly reverberant environments [13].

Essentially, the multi-channel post-filter suggested by Cohen is based on signal detection: the estimation of highly nonstationary and residual directional noise is controlled by the SPP estimator, and an Optimally modified LSA (OM-LSA) gain function is then obtained for each frequency bin. However, the SPP obtained from the TBRR in each frequency bin relies on empirically chosen parameters, which can distort the target signal. In highly reverberant environments, distortion of the high-frequency band is audible, especially when the target and the interference overlap. In addition, a more accurate SPP estimator helps to alleviate signal cancellation in the adaptive beamforming procedure [14]: the step size of the adaptive filter is reduced when the SPP is large, so as to preserve the target signal.

In this paper, an improved multi-channel post-filtering algorithm that adapts to different reverberant environments is proposed. To estimate the highly non-stationary and residual directional noise accurately, the SPP, which is often seen as a controller, is derived from a principled signal detection scheme. Since the spatial filtering response differs from one frequency bin to another, the threshold that determines the tradeoff between false alarm and detection probability should be set adaptively for each bin rather than with a single empirical preset parameter for the whole frequency band.

Manuscript Received Apr. 15, 2014; Accepted Sept. 9. This work is supported by the National Natural Science Foundation of China, the Strategic Priority Research Program of the Chinese Academy of Sciences, the National High Technology Research and Development Program of China (863 Program) (No.2012AA012503) and the Chinese Academy of Sciences Priority Deployment Project (No.KGZD-EW-103-2). © 2016 Chinese Institute of Electronics.
Furthermore, the DRR is introduced into the signal detection scheme so that the proposed algorithm is more robust to reverberation, since reflections reduce the discrimination between the desired speech and competing interference.

This paper is organized as follows. In Section II, the signal model is introduced and the post-filtering suggested by Cohen is briefly reviewed. The modified SPP combined with a DRR tracking procedure is presented in Section III. In Section IV, experimental results are compared with those of competing algorithms. Conclusions are drawn in Section V.

II. Multi-channel Post-Filtering

1. Signal model

Consider a linear array with $M$ sensors. Let $s(t)$ denote the desired speech signal, $i(t)$ the directional interference and $n_m(t)$ the stationary background noise. The observed signal $x_m(t)$ at sensor $m$ ($m = 1, \ldots, M$) is given by

$$x_m(t) = h_m^{\mathrm{source}}(t) * s(t) + h_m^{\mathrm{interf}}(t) * i(t) + n_m(t) \tag{1}$$

where $h_m^{\mathrm{source}}(t)$ and $h_m^{\mathrm{interf}}(t)$ denote the Room impulse responses (RIRs) from the target source and from the interference to sensor $m$, and $*$ denotes convolution. In general, an RIR can be separated into two parts, a coherent part $d_m(t)$ (direct sound) and a diffuse part $r_m(t)$ (reverberation):

$$h_m^{\mathrm{source}}(t) = d_m^{\mathrm{source}}(t) + r_m^{\mathrm{source}}(t), \quad h_m^{\mathrm{interf}}(t) = d_m^{\mathrm{interf}}(t) + r_m^{\mathrm{interf}}(t), \quad m = 1, \ldots, M \tag{2}$$

Substituting Eq.(2) into Eq.(1), the observed signals are analysed using the Short-time Fourier transform (STFT). Signals in the time-frequency domain are denoted by upper case. Because of their isotropic property, the diffuse parts of the desired signal and of the interference are combined into a single term $R_m$. The observed signal at time frame $l$ and frequency bin $k$ is

$$X_m(l,k) = D_m^{\mathrm{source}}(l,k) S(l,k) + D_m^{\mathrm{interf}}(l,k) I(l,k) + R_m(l,k) + N_m(l,k) \tag{3}$$

In vector form,

$$\mathbf{X} = [X_1(l,k) \;\; X_2(l,k) \;\; \cdots \;\; X_M(l,k)]^T \tag{4}$$

2. Transient beam-to-reference ratio

The TBRR is a useful statistic for detecting the desired speech signal in highly non-stationary noise, especially in the residual directional noise at the beamformer output. It is obtained from the signal streams of the GSC beamformer. Let $\mathbf{F}$ and $\mathbf{B}$ denote the FBF and the BM, and let $\mathbf{W}$ denote the multi-channel adaptive reference canceller. The power spectra of the GSC beamformer output $Y$ ($\lambda_{YY}$) and of the blocking matrix output $U$ ($\lambda_{UU}$) are given by

$$\lambda_{YY} = [\mathbf{F} - \mathbf{B}\mathbf{W}]^H \mathbf{X}\mathbf{X}^H [\mathbf{F} - \mathbf{B}\mathbf{W}] \tag{5}$$

$$\lambda_{UU} = \mathbf{B}^H \mathbf{X}\mathbf{X}^H \mathbf{B} \tag{6}$$

where the superscript $H$ represents the conjugate transpose. The TBRR of each time-frequency unit is defined by

$$\Lambda(l,k) = \frac{\lambda_{YY}(l,k)}{\lambda_{UU}(l,k)} \approx \frac{SY(l,k) - MY(l,k)}{\max_i\{SU_i(l,k) - MU_i(l,k)\}} \tag{7}$$

where $SY$ and $SU_i$ denote the smoothed power spectra of the outputs of the fixed beamformer and the block matrix, and $MY$ and $MU_i$ are the pseudo-stationary noise estimates obtained with the MCRA method [15]. Besides the TBRR, another statistic, the Local nonstationarity (LNS), has proved efficient for removing stationary background noise in most single-channel speech enhancement algorithms. The LNS statistics are defined as

$$\Upsilon_Y(l,k) = \frac{SY(l,k)}{MY(l,k)}, \quad \Upsilon_U(l,k) = \max_i\left[\frac{SU_i(l,k)}{MU_i(l,k)}\right] \tag{8}$$

3. Multi-channel post-filtering using SPP

Multi-channel post-filtering, which uses a multi-channel soft signal detection, operates on the beamformer output by multiplying it with the OM-LSA gain function [12]. Specifically, the clean signal STFT is estimated by

$$\hat{S}(l,k) = G(l,k)\,Y(l,k) \tag{9}$$

where the OM-LSA gain function $G(l,k)$ is given by

$$G(l,k) = \left(G_{EM}(l,k)\right)^{P(l,k)} G_{\min}^{\,1-P(l,k)} \tag{10}$$

$G_{\min}$ is the minimum gain allowed (typically $-20$ dB) and $G_{EM}(l,k)$ is the Ephraim-Malah (EM) gain of the LSA estimator. Meanwhile, the noise power spectrum is recursively averaged using a time-varying smoothing parameter given by

$$\hat{\alpha}_d(l,k) = \alpha_d + (1 - \alpha_d) P(l,k) \tag{11}$$

where $0 < \alpha_d < 1$ is a constant and $P(l,k)$ is the SPP, that is, the probability that desired speech is present in the corresponding time-frequency unit. The SPP is calculated from Bayes' rule under the assumed statistical model:

$$P(l,k) = \left[1 + \frac{Q(l,k)}{1 - Q(l,k)}\,\bigl(1+\epsilon(l,k)\bigr)\exp\left(-\frac{\gamma(l,k)\,\epsilon(l,k)}{1+\epsilon(l,k)}\right)\right]^{-1} \tag{12}$$

where $\gamma(l,k)$ is the a posteriori SNR and $\epsilon(l,k)$ is the a priori SNR calculated using the decision-directed approach [16]. $Q(l,k)$ is the a priori signal absence probability, obtained from a signal detection scheme based on estimates of the LNS and the TBRR.

III. Modified Signal Presence Probability

As mentioned above, the SPP $P(l,k)$ plays a key role in controlling the transient noise estimation and in forming the gain function of each frequency bin. The a priori signal absence probability $Q(l,k)$ is obtained from a signal detection procedure that compares the LNS and TBRR statistics with preset thresholds, which set the tradeoff between detection probability and false alarm [12]. In this section, a novel approach to desired signal detection is proposed, in which the detection thresholds of the TBRR statistic are set adaptively from the perspective of spatial filtering rather than empirically. To account for the effects of reverberation, a detection threshold incorporating the DRR is proposed, leading to a modified SPP estimate.

1. Detection threshold of TBRR in free sound field

In the original GSC beamformer, the BM of a linear array is obtained by subtracting the observed signals of adjacent sensors. To simplify the implementation, we assume that the distance between adjacent sensors is constant and denote it by $d_{mic}$. Considering the limited ability of the GSC structure to suppress transient noise in highly reverberant environments, we simply assume $\mathbf{W} = 0$. To generalize to an n-channel sensor array and to be consistent with Eq.(7), in which only one reference is picked out, the TBRR thresholds are derived for two elements, with $\mathbf{F} = [\tfrac{1}{2} \;\; \tfrac{1}{2}]^T$ and $\mathbf{B} = [1 \;\; -1]^T$. These assumptions are shown to be appropriate by the experiments reported later.

In a free sound field, consider a far-field speech signal arriving from the marginal Direction-of-arrival (DOA) $\theta$, which is the angle between the normal of the broadside microphone array and the marginal DOA. Its value defines the acceptance region (a sector of angle $2\theta$) within which signals are regarded as target signals; a larger $\theta$ gives a larger region. Rewriting $\mathbf{X}$, we have

$$\mathbf{X} = \mathbf{D}S = [D_v^{\mathrm{source}} \;\; D_{v+1}^{\mathrm{source}}]^T S, \quad v = 1, \ldots, M-1 \tag{13}$$

Let $\Lambda_{\mathrm{free}}(k,\theta)$ denote the threshold in a free sound field, $f(k)$ the centre frequency of frequency bin $k$ and $c$ the speed of sound in air. Substituting Eqs.(5), (6) and (13) into Eq.(7) gives

$$\Lambda_{\mathrm{free}}(k,\theta) = \frac{\mathbf{F}^H \mathbf{D}\mathbf{D}^H \mathbf{F}}{\mathbf{B}^H \mathbf{D}\mathbf{D}^H \mathbf{B}} = \frac{\lambda_{DD}\,\bigl|1 + e^{j2\pi f(k) d_{mic}\sin(\theta)/c}\bigr|^2}{\lambda_{DD}\,\bigl|2\bigl(1 - e^{j2\pi f(k) d_{mic}\sin(\theta)/c}\bigr)\bigr|^2} = \frac{1 + \cos(2\pi f(k) d_{mic}\sin(\theta)/c)}{4\bigl(1 - \cos(2\pi f(k) d_{mic}\sin(\theta)/c)\bigr)} \tag{14}$$

where $\lambda_{DD}$ is the power spectrum of the direct part of the target signal. Therefore, $\theta$ is the only parameter introduced to set the threshold for desired signal detection, and its value determines the tradeoff between detection probability and false alarm. Eq.(14) shows that the threshold $\Lambda_{\mathrm{free}}(k,\theta)$ decreases monotonically from the low frequency band to the high frequency band.

2. Detection threshold of TBRR in reverberant field

In a reverberant environment, reverberation must be taken into account and $\mathbf{X}$ is written as

$$\mathbf{X} = \mathbf{D}S + \mathbf{R} = [D_v^{\mathrm{source}} \;\; D_{v+1}^{\mathrm{source}}]^T S + [R_v \;\; R_{v+1}]^T \tag{15}$$

Let $\Lambda_{\mathrm{reverb}}(k,\theta)$ denote the marginal case for the $k$th frequency bin:

$$\Lambda_{\mathrm{reverb}}(k,\theta) = \frac{\mathbf{F}^H \mathbf{D}\mathbf{D}^H \mathbf{F} + \mathbf{F}^H \mathbf{R}\mathbf{R}^H \mathbf{F}}{\mathbf{B}^H \mathbf{D}\mathbf{D}^H \mathbf{B} + \mathbf{B}^H \mathbf{R}\mathbf{R}^H \mathbf{B}} \tag{16}$$

$$\mathbf{R}\mathbf{R}^H = \lambda_{RR}\begin{pmatrix} 1 & \Gamma_{R_vR_{v+1}} \\ \Gamma_{R_vR_{v+1}} & 1 \end{pmatrix} \tag{17}$$

where $\lambda_{RR}$ is the power spectrum of the reverberant part and $\Gamma_{R_vR_{v+1}}$ is the cross-correlation coefficient between sensors. $\Lambda_{\mathrm{reverb}}$ can be expressed in terms of $\Lambda_{\mathrm{free}}$ as

$$\Lambda_{\mathrm{reverb}}(k,\theta) = \frac{\alpha(k)\,\epsilon(l,k) + \beta(k)}{\alpha(k)\,\epsilon(l,k) + 1}\,\Lambda_{\mathrm{free}}(k,\theta) \tag{18}$$

where $\epsilon(l,k)$ now denotes the DRR of each time-frequency unit (the symbol is reused and no longer refers to the a priori SNR of Eq.(12)), given by

$$\epsilon(l,k) = \frac{\lambda_{DD}}{\lambda_{RR}} \tag{19}$$

Reverberation is often regarded as diffuse, so that $\Gamma_{R_vR_{v+1}} = \mathrm{sinc}(2\pi f(k) d_{mic}/c)$. Then we have

$$\alpha(k) = \frac{1 - \cos(2\pi f(k) d_{mic}\sin(\theta)/c)}{1 - \Gamma_{R_vR_{v+1}}} \tag{20}$$

$$\beta(k) = \frac{1 + \Gamma_{R_vR_{v+1}}}{4\bigl(1 - \Gamma_{R_vR_{v+1}}\bigr)\Lambda_{\mathrm{free}}(k,\theta)} \tag{21}$$

From Eq.(18), we know that

$$\lim_{\mathrm{DRR} \to +\infty} \Lambda_{\mathrm{reverb}} = \Lambda_{\mathrm{free}} \tag{22}$$

3. DRR tracking of desired speech

One benefit of a multichannel system is that the coherence between channels can be analysed. Jeub proposed a DRR estimation procedure based on the assumption that the direct part and the reverberant part can be modeled as a coherent signal and a diffuse signal, respectively [17]. Given the relationship between the magnitude squared coherence function $\mathrm{MSC}(l,k)$ and the DRR, $\epsilon(l,k)$ can be expressed as

$$\epsilon(l,k) = \frac{\lambda_{DD}}{\lambda_{RR}} = \frac{\mathrm{sinc}(2\pi f(k) d_{mic}/c)^2 - \mathrm{MSC}(l,k)}{\mathrm{MSC}(l,k) - 1} \tag{23}$$

$$\mathrm{MSC}(l,k) = \frac{\bigl|\Phi_{X_vX_{v+1}}(l,k)\bigr|^2}{\Phi_{X_vX_v}(l,k)\,\Phi_{X_{v+1}X_{v+1}}(l,k)} \tag{24}$$

where $\Phi_{X_vX_{v+1}}$, $\Phi_{X_vX_v}$ and $\Phi_{X_{v+1}X_{v+1}}$ are the smoothed cross- and auto-power spectral densities of the two channels picked out above. The DRR is a combined measure of the reflection conditions and of the position of the desired speech. We assume that the RIR of the desired signal does not change abruptly, and the TBRR thresholds proposed above should not rely excessively on the DRR estimate, which can be regarded as an acoustic perceptual controller. Thus, we propose an adaptive TBRR threshold based on the global DRR of frame $l$, denoted by $d_g(l)$. It is obtained by full-band averaging and recursive smoothing:

$$d_g(l) = \mu\, d_g(l-1) + (1-\mu)\,\mathrm{mean}(\epsilon(l,k)) \tag{25}$$

where the operator mean denotes the arithmetic mean over frequency bins and the smoothing parameter satisfies $0.8 < \mu < 1$.
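The DRR estimate of Eqs.(23) and (25) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation. The speed of sound (343 m/s), the function names, and the handling of MSC values outside the valid range (the estimate is set to zero when the MSC falls to or below the diffuse-field value) are assumptions; the default μ = 0.87 is the value reported in the experimental section.

```python
import math

C_SOUND = 343.0  # assumed speed of sound in m/s (not stated in the paper)


def sinc(x):
    """Unnormalised sinc, sin(x)/x, with the limit value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def drr_from_msc(msc, freq_hz, d_mic, c=C_SOUND):
    """Eq.(23): DRR of one time-frequency unit from the two-channel MSC."""
    diffuse_msc = sinc(2.0 * math.pi * freq_hz * d_mic / c) ** 2
    # Assumed safeguard: keep the MSC below 1 so the ratio stays finite.
    msc = min(msc, 1.0 - 1e-6)
    if msc <= diffuse_msc:
        # No more coherent than a diffuse field: treat as purely reverberant.
        return 0.0
    return (diffuse_msc - msc) / (msc - 1.0)


def update_global_drr(d_prev, drr_per_bin, mu=0.87):
    """Eq.(25): recursive smoothing of the full-band mean DRR."""
    mean_drr = sum(drr_per_bin) / len(drr_per_bin)
    return mu * d_prev + (1.0 - mu) * mean_drr
```

A higher coherence between the two channels yields a larger DRR estimate, and the smoothing keeps the global value from following single-frame fluctuations.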
The tracking strategy for the DRR of the desired signal is given by

$$d(l) = \max\bigl(d_g(l),\; \varrho\, d(l-1)\bigr) \tag{26}$$

where $\varrho$ is an extremely small constant (typically $10^{-6}$) acting as a controller to avoid imbalance of the global DRR when an unexpected impulse emerges. Combining $d(l)$ with Eq.(18), and to guarantee the monotonicity of the threshold from low to high frequency, the signal detection threshold of the TBRR is given by

$$\hat{\Lambda}_{\mathrm{reverb}}(k,\theta) = \frac{\hat{\gamma}\, d(l) + \hat{\beta}}{\hat{\gamma}\, d(l) + 1}\,\Lambda_{\mathrm{free}}(k,\theta) \tag{27}$$

$$\hat{\gamma} = \hat{\alpha}\,\delta \tag{28}$$

where $\hat{\alpha}$ and $\hat{\beta}$ denote the arithmetic means of $\alpha(k)$ and $\beta(k)$, respectively, and $\delta$ is a bias compensator that accounts for errors in the DRR estimate. It is worth mentioning that in Ref.[12] the key detection parameter corresponding to $\hat{\Lambda}_{\mathrm{reverb}}(k,\theta)$ is a constant chosen experimentally, which results in high-frequency distortion.

4. Modified SPP

The modified SPP is obtained from Eq.(12), which requires the a priori signal absence probability $Q(l,k)$. With Eq.(27), we follow the procedure for estimating the a priori signal absence probability in Ref.[18], in which the global likelihood of signal presence given $\Lambda(l,k)$ is used to improve the discrimination between wideband sources and interfering transients. The global likelihood $\hat{\eta}_{\mathrm{global}}$ is based on the full-band average of the local likelihood $\eta(l,k)$, which is given by

$$\eta(l,k) = \begin{cases} 1, & \text{if } \Upsilon_U(l,k) < \Upsilon_0 \\ 0, & \text{if } \Upsilon_Y(l,k) < \Upsilon_0 \\ 1, & \text{if } \Lambda(l,k) > \hat{\Lambda}_{\mathrm{reverb}}(k,\theta) \\ 0, & \text{if } \Lambda(l,k) < \hat{\Lambda}_{\mathrm{reverb}}(k,\theta)/2 \\ \dfrac{\Lambda(l,k) - \hat{\Lambda}_{\mathrm{reverb}}(k,\theta)/2}{\hat{\Lambda}_{\mathrm{reverb}}(k,\theta) - \hat{\Lambda}_{\mathrm{reverb}}(k,\theta)/2}, & \text{elsewhere} \end{cases} \tag{29}$$

where $\Upsilon_0$ denotes the preset LNS threshold. The procedure for obtaining the modified SPP is shown in Fig.1. $Q(l,k)$ is obtained using the soft-decision information of both the TBRR and the LNS through the signal detection procedure [18], and the SPP $P(l,k)$ is then derived using Eq.(12). Since the key detection thresholds for estimating the presence of the desired signal from the look direction are modified by incorporating the blindly estimated DRR, a reverberation-robust multi-channel post-filtering technique is obtained based on the modified SPP.

IV. Evaluations

1. Experimental configuration

The microphone array used in this work is composed of 4 omnidirectional MEMS microphones in broadside orientation. The distance between adjacent microphones is 5 cm. The system is implemented at a sampling rate of 8 kHz.
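As a worked illustration for this array, the detection threshold of Section III can be evaluated numerically. The sketch below computes the free-field threshold of Eq.(14), the reverberation-adapted threshold of Eq.(27) and the TBRR branches of the local likelihood in Eq.(29), using only the Python standard library. Several choices are assumptions rather than values from the paper: the speed of sound (343 m/s), the example DRR value, δ = 1, the function names, and the closed forms of α(k) and β(k), which are written here as they follow from Eqs.(14), (16) and (17) with a diffuse coherence Γ = sinc(2πf(k)d_mic/c). The LNS branches of Eq.(29) are omitted. θ = 5° and the 256-point FFT are the settings reported later in this section.

```python
import math

C_SOUND = 343.0            # assumed speed of sound in m/s
D_MIC = 0.05               # microphone spacing in m (5 cm)
FS = 8000.0                # sampling rate in Hz
N_FFT = 256                # FFT length
THETA = math.radians(5.0)  # marginal DOA


def sinc(x):
    """Unnormalised sinc, sin(x)/x, with the limit value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def free_field_threshold(freq_hz, d_mic=D_MIC, theta=THETA, c=C_SOUND):
    """Eq.(14): TBRR of a far-field source arriving from the marginal DOA."""
    phase = 2.0 * math.pi * freq_hz * d_mic * math.sin(theta) / c
    return (1.0 + math.cos(phase)) / (4.0 * (1.0 - math.cos(phase)))


def alpha_beta(freq_hz, d_mic=D_MIC, theta=THETA, c=C_SOUND):
    """alpha(k) and beta(k) of Eq.(18), derived here for a diffuse field."""
    phase = 2.0 * math.pi * freq_hz * d_mic * math.sin(theta) / c
    coherence = sinc(2.0 * math.pi * freq_hz * d_mic / c)
    alpha = (1.0 - math.cos(phase)) / (1.0 - coherence)
    beta = (1.0 + coherence) / (
        4.0 * (1.0 - coherence) * free_field_threshold(freq_hz, d_mic, theta, c))
    return alpha, beta


def reverb_threshold(lam_free, drr, alpha_hat, beta_hat, delta=1.0):
    """Eqs.(27)-(28); delta = 1 is a placeholder, not a value from the paper."""
    gamma_hat = alpha_hat * delta
    return (gamma_hat * drr + beta_hat) / (gamma_hat * drr + 1.0) * lam_free


def tbrr_likelihood(tbrr, threshold):
    """TBRR branches of Eq.(29): 0 below threshold/2, 1 above, linear between."""
    low = threshold / 2.0
    return min(max((tbrr - low) / (threshold - low), 0.0), 1.0)


# Bin centre frequencies, skipping DC where Eq.(14) is unbounded.
freqs = [k * FS / N_FFT for k in range(1, N_FFT // 2 + 1)]
lam_free = [free_field_threshold(f) for f in freqs]
pairs = [alpha_beta(f) for f in freqs]
alpha_hat = sum(a for a, _ in pairs) / len(pairs)
beta_hat = sum(b for _, b in pairs) / len(pairs)

drr_example = 2.0  # hypothetical tracked DRR d(l), linear scale
lam_reverb = [reverb_threshold(lf, drr_example, alpha_hat, beta_hat)
              for lf in lam_free]
```

Because the reverberant threshold is the free-field threshold scaled by one full-band factor, it keeps the monotonic decrease over frequency, and it approaches the free-field value as the tracked DRR grows, in line with Eq.(22).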

The experiments took place in an office room of 6 m × 5 m × 3 m. Two cases are considered. In the first case, two interferences (a competing speaker and a Gaussian white noise source) are located at 90° and 45° relative to the array, with reverberation times ($T_{60}$) of 300 ms and 600 ms; the reverberation time is controlled with sound absorption material. In the second case, the two interferences are located at 30° and 20°, so that the spatial distinction is reduced by the small angle between the target signal and the interference. The experimental configuration is shown in Fig.2. The speech source consists of four sentences from the TIMIT database, and the multi-channel clean speech is generated by computer simulation using the Image method [19], so that the clean speech signal is available for objective evaluation. The input microphone signals were generated by mixing the speech and noise components at different global SNR levels (−5 dB, 0 dB, 5 dB, 10 dB). All sound sources are 1 m away from the sensor array. Dereverberation ability is not evaluated.

Fig. 1. Derivation of the modified SPP using DRR tracking strategy

Fig. 2. Configuration of experiments in the reverberant rooms

For comparison, the following post-filtering techniques are applied to the output of the beamformer (Griffiths GSC [3]), which is taken as the baseline.
1) McCowan's post-filtering algorithm [10].
2) Cohen's post-filtering algorithm [12].
3) Proposed post-filtering algorithm using modified SPP.

In the proposed post-filtering algorithm, the frame length is 32 ms with 50% overlap, and a 256-point windowed FFT is performed. θ is set to 5°, μ = 0.87 and δ = [value missing].

2. Objective evaluation measures

The results of the above methods are evaluated by the following objective speech quality measures: Segmental SNR, Noise reduction (NR) and Log-spectral distance (LSD).
1) Segmental SNR [20]

The segmental SNR, in decibels, is defined by

$$\mathrm{SegSNR} = \frac{1}{L}\sum_{l=0}^{L-1}\mathrm{SNR}_l = \frac{10}{L}\sum_{l=0}^{L-1}\log_{10}\frac{\sum_{n=0}^{N-1} s^2(n + lN/2)}{\sum_{n=0}^{N-1}\bigl[s(n + lN/2) - \hat{s}(n + lN/2)\bigr]^2} \tag{30}$$

where $L$ is the number of frames in the signal and $N = 256$ is the number of samples per frame (corresponding to 32 ms frames with 50% overlap). The SNR of each frame, $\mathrm{SNR}_l$, is limited to the perceptually meaningful range between 35 dB and −10 dB. This prevents the segmental SNR from being biased in either direction by a few silent frames or frames with unusually high SNR. The measure takes into account both residual noise and speech distortion, and a higher value means better performance.

2) Noise reduction (NR) [12]

This measure compares the noise level recorded by the first microphone with that of the enhanced signal, and is defined (in decibels) by

$$\mathrm{NR} = \frac{10}{|\mathcal{L}|}\sum_{l \in \mathcal{L}}\log_{10}\frac{\sum_{n=0}^{N-1} x_1^2(n + lN/2)}{\sum_{n=0}^{N-1} \hat{s}^2(n + lN/2)} \tag{31}$$

where $\mathcal{L}$ denotes the set of frames that contain only noise and $|\mathcal{L}|$ is its cardinality. A larger NR indicates better performance.

3) Log-spectral distance (LSD) [12]

The LSD is defined as follows, where lower values represent less signal distortion:

$$\mathrm{LSD} = \frac{10}{L}\sum_{l=0}^{L-1}\left\{\frac{1}{N/2}\sum_{k=0}^{N/2-1}\bigl[\log_{10}\mathcal{A}S(l,k) - \log_{10}\mathcal{A}\hat{S}(l,k)\bigr]^2\right\}^{1/2} \tag{32}$$

where $\mathcal{A}S(l,k) = \max\{|S(l,k)|^2, \delta\}$ is the spectral power, clipped such that the log-spectrum dynamic range is confined to about 50 dB (that is, $\delta = 10^{-50/10}\max\{|S(l,k)|^2\}$).

3. Discussion

Fig.3 and Fig.4 show the experimental results for the average segmental SNR, NR and LSD. The advantages of the proposed noise reduction method over the other algorithms are discussed in the following paragraphs.

Fig. 3. Performance comparison under different noise levels among different algorithms in different reverberant conditions, where the two interferences (a competing speaker and a Gaussian white noise source) are located at 90° and 45°. (a) Segmental SNR; (b) NR; (c) LSD ($T_{60}$ = 300 ms); (d) Segmental SNR; (e) NR; (f) LSD ($T_{60}$ = 600 ms)

Fig. 4. Performance comparison under different noise levels among different algorithms in different reverberant conditions, where the two interferences (a competing speaker and a Gaussian white noise source) are located at 30° and 20°. (a) Segmental SNR; (b) NR; (c) LSD ($T_{60}$ = 300 ms); (d) Segmental SNR; (e) NR; (f) LSD ($T_{60}$ = 600 ms)

The improvement in speech quality obtained from the traditional GSC beamformer is limited. As the reverberation time increases from 300 ms to 600 ms, the amount of noise reduction decreases, because the adaptive noise cancellation filter is ineffective against diffuse noise, by which late reverberation is often modeled. In the presence of reverberation, channel leakage in both the FBF and the BM makes the situation worse. This limitation calls for post-filtering techniques that suppress the residual directional interference and the diffuse noise.

Compared with the adaptive beamformer, the post-filter suggested by McCowan achieves better performance, with slightly higher noise reduction and lower signal distortion. Since the reverberant speech signal is used as the clean reference for objective evaluation and its high-frequency part is attenuated after propagation, the strengths of this post-filter are not fully reflected in the measures. Meanwhile, the post-filter proposed by Cohen behaves more aggressively. Using the modified SPP estimator that accounts for reverberation, the proposed speech enhancement achieves better segmental SNR, NR and LSD. The segmental SNR, which combines noise suppression and signal distortion and is shown in Fig.3 and Fig.4 (a) and (d), indicates that the proposed algorithm outperforms the others with a competing interference and background noise in different reverberant environments. However, while Fig.3 and Fig.4 (b) and (e) show more noise reduction, Fig.3 and Fig.4 (c) and (f) show that both the proposed and Cohen's post-filters produce more speech distortion than the beamformer output in high SNR cases. This is again because the reverberant speech signal is used as the clean reference for objective evaluation: some diffuse-like reverberation is removed by post-filtering, which registers as distortion. As mentioned above, dereverberation ability is not evaluated, so the effects of suppressing diffuse late reverberation are reflected in the LSD, especially in high SNR cases.
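The two time-domain measures compared in Fig.3 and Fig.4, the segmental SNR of Eq.(30) and the NR of Eq.(31), can be sketched in plain Python as follows. This is an illustrative sketch rather than the evaluation code used for the figures: the function names, the small constant that guards the logarithm, and the way noise-only frames are passed in as a list of frame indices are all assumptions.

```python
import math

TINY = 1e-12  # assumed guard against log(0) and division by zero


def frame_starts(num_samples, n=256):
    """Start indices of length-n frames with 50% overlap."""
    hop = n // 2
    return list(range(0, num_samples - n + 1, hop))


def energy(x, start, n):
    """Energy of the n samples of x beginning at index start."""
    return sum(v * v for v in x[start:start + n])


def segmental_snr(clean, enhanced, n=256, lo_db=-10.0, hi_db=35.0):
    """Eq.(30): mean per-frame SNR in dB, each frame clipped to [lo_db, hi_db]."""
    starts = frame_starts(min(len(clean), len(enhanced)), n)
    if not starts:
        raise ValueError("signals are shorter than one frame")
    total = 0.0
    for start in starts:
        sig = energy(clean, start, n)
        err = sum((s - e) ** 2 for s, e in
                  zip(clean[start:start + n], enhanced[start:start + n]))
        snr_db = 10.0 * math.log10((sig + TINY) / (err + TINY))
        total += min(max(snr_db, lo_db), hi_db)
    return total / len(starts)


def noise_reduction(mic1, enhanced, noise_frames, n=256):
    """Eq.(31): mean input-to-output energy ratio in dB over noise-only frames."""
    hop = n // 2
    total = 0.0
    for l in noise_frames:
        start = l * hop
        total += 10.0 * math.log10(
            (energy(mic1, start, n) + TINY) / (energy(enhanced, start, n) + TINY))
    return total / len(noise_frames)


tone = [math.sin(0.1 * i) for i in range(1024)]
print(segmental_snr(tone, tone))  # identical signals reach the 35 dB ceiling
print(noise_reduction([2.0] * 512, [1.0] * 512, [0, 1, 2]))  # about 6.02 dB
```

Clipping each frame before averaging is what keeps a handful of silent or nearly perfect frames from dominating the segmental SNR.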
To our knowledge, however, the suppression of reverberation, especially of isotropic late reverberation, benefits both auditory perception and speech recognition. Furthermore, the proposed post-filter still outperforms the one suggested by Cohen, which we generalized. Compared with Cohen's post-filtering, we generalize the situation by introducing a single parameter θ that controls the tradeoff between false alarm and detection probability. θ may be regarded as a steering error factor that designates the region of acceptance, a property that is helpful in realistic scenarios in which the target speaker moves slightly. As θ increases, the region of acceptance is enlarged. From Eq.(27), the detection threshold decreases from the low frequency part to the high frequency part, and this is the main reason the proposed algorithm outperforms the one suggested by Cohen. With the introduction of DRR estimation, the proposed post-filter keeps noise reduction at a high level even though the spatial distinction is reduced when reverberation is strong. As a result, the proposed multi-channel speech enhancement algorithm performs robustly in different reverberant environments with directional interference and stationary diffuse noise.

V. Conclusion

A speech enhancement algorithm using multi-channel post-filtering is presented in this paper. The post-filtering algorithm uses a modified SPP based on an improved signal detection scheme, which aims to control precisely the rate of recursive averaging used to obtain the noise spectrum estimate and the gain function of each time-frequency unit. Experimental results indicate that the proposed post-filtering algorithm improves performance, with more noise reduction and less desired signal distortion, when competing interference or transient noise is present.
Furthermore, the DRR tracking procedure introduced into the desired signal detection scheme allows the algorithm to maintain good performance in different reverberant environments.

References

[1] J. Benesty, J. Chen and Y. Huang, Microphone Array Signal Processing, Springer.
[2] M. Brandstein and D. Ward, Microphone Arrays: Signal Processing Techniques and Applications, Springer.
[3] L.J. Griffiths and C.W. Jim, An alternative approach to linearly constrained adaptive beamforming, IEEE Transactions on Antennas and Propagation, Vol.30, No.1, pp.27-34.
[4] S. Gannot, D. Burshtein and E. Weinstein, Signal enhancement using beamforming and nonstationarity with applications to speech, IEEE Transactions on Signal Processing, Vol.49, No.8.
[5] Y. Avargel and I. Cohen, On multiplicative transfer function approximation in the short-time Fourier transform domain, IEEE Signal Processing Letters, Vol.14, No.5.
[6] R. Talmon, et al., Convolutive transfer function generalized sidelobe canceler, IEEE Transactions on Audio, Speech, and Language Processing, Vol.17, No.7.
[7] J. Bitzer, et al., Theoretical noise reduction limits of the generalized sidelobe canceller (GSC) for speech enhancement, Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol.5.
[8] S. Gannot, et al., Theoretical analysis of the general transfer function GSC, Proc. of Int. Workshop on Acoustic Echo and Noise Control (IWAENC).
[9] R. Zelinski, A microphone array with adaptive post-filtering for noise reduction in reverberant rooms, Proc. of International Conference on Acoustics, Speech, and Signal Processing.
[10] I.A. McCowan and H. Bourlard, Microphone array post-filter based on noise field coherence, IEEE Transactions on Speech and Audio Processing, Vol.11, No.6.
[11] J. Hu and M. Lee, Multi-channel post-filtering based on spatial coherence measure, Signal Processing, Vol.105, No.12, 2014.

[12] I. Cohen, Multichannel post-filtering in nonstationary noise environments, IEEE Transactions on Signal Processing, Vol.52, No.5.
[13] S. Gannot and I. Cohen, Speech enhancement based on the general transfer function GSC and postfiltering, IEEE Transactions on Speech and Audio Processing, Vol.12, No.6, pp.561-571, 2004.
[14] K. Li, Q. Fu, and Y. Yan, Speech enhancement using robust generalized side lobe canceller with multi-channel postfiltering in adverse environments, Chinese Journal of Electronics, Vol.21, No.1, pp.85-90.
[15] I. Cohen and B. Berdugo, Noise estimation by minima controlled recursive averaging for robust speech enhancement, IEEE Signal Processing Letters, Vol.9, No.1, pp.12-15, 2002.
[16] Y. Ephraim and D. Malah, Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator, IEEE Transactions on Acoustics, Speech and Signal Processing, Vol.32, No.6.
[17] M. Jeub, C. Nelke, C. Beaugeant, et al., Blind estimation of the coherent-to-diffuse energy ratio from noisy speech signals, Proc. of 19th European Signal Processing Conference (EUSIPCO 2011).
[18] I. Cohen and B. Berdugo, Two-channel signal detection and speech enhancement based on the transient beam-to-reference ratio, Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol.5.
[19] J.B. Allen and D.A. Berkley, Image method for efficiently simulating small-room acoustics, The Journal of the Acoustical Society of America, Vol.65, No.4.
[20] S.R. Quackenbush, T.P. Barnwell and M.A. Clements, Objective Measures of Speech Quality, Prentice Hall.

WANG Xiaofei received the B.E. degree in electronic and information engineering from Huazhong University of Science and Technology. He is now a Ph.D. candidate at the Institute of Acoustics, Chinese Academy of Sciences. His research interests include speech enhancement and microphone array processing. (wangxiaofei@hccl.ioa.ac.cn)

GUO Yanmeng received the B.E. degree from the Dongnan University in 1999, and the M.S. degree (2002) and the Ph.D. degree in signal and information processing from the Chinese Academy of Sciences. She is now an associate researcher at the Institute of Acoustics, Chinese Academy of Sciences. Her research interests include noise reduction, voice activity detection and speech recognition. (guoyanmeng@hccl.ioa.ac.cn)

FU Qiang (corresponding author) received the Ph.D. degree in electronic engineering from Xidian University, Xian. From 2001 to 2002, he worked as a senior research associate in the Center for Spoken Language Understanding (CSLU), OGI School of Science and Engineering at Oregon Health & Science University, Oregon, USA. From 2002 to 2004, he worked as a senior postdoctoral research fellow in the Department of Electric and Computer Engineering, University of Limerick, Ireland. He is currently a professor at the Institute of Acoustics, Chinese Academy of Sciences, China. His research interests are in speech analysis, microphone array processing, far-distant speech recognition, audio-visual signal processing, machine learning for signal processing, etc. Dr. Qiang Fu is a member of the IEEE Signal Processing Society.

YAN Yonghong received the Ph.D. degree in computer science and engineering from Oregon Graduate Institute of Science Engineering (OGI). Before joining the Key Laboratory of Speech Acoustics and Content Understanding, he was the chief architect of Human Computer Interface at Intel, director of the Intel China Research Center and associate director of CSLU, OGI. He is now the director of the Key Laboratory of Speech Acoustics and Content Understanding.
