Chinese Journal of Electronics
Vol.21, No.1, Jan. 2012

Speech Enhancement Using Robust Generalized Sidelobe Canceller with Multi-Channel Post-Filtering in Adverse Environments

LI Kai, FU Qiang and YAN Yonghong
(Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China)

Abstract In this paper, we propose a speech enhancement algorithm in which adaptive beamforming and multi-channel post-filtering interact. A novel subband feedback controller based on speech presence probability is applied to the Generalized sidelobe canceller algorithm to make the adaptive beamformer more robust in adverse environments and to alleviate the problem of signal cancellation. Multi-channel post-filtering is used not only to further suppress diffuse noises and some transient noises, but also to provide the speech presence probability in each subband. Experimental results show that the proposed algorithm achieves considerable improvement in signal preservation of the desired speech in adverse noise environments over the comparative algorithms.

Key words Speech enhancement, Microphone array, Generalized sidelobe canceller, Adaptive filter, Postfiltering.

I. Introduction

Microphone arrays have been widely used to improve the performance of speech communication and Automatic speech recognition (ASR) systems in adverse noise environments because of their effectiveness in enhancing the quality of the captured speech [1,2]. Compared with single-channel systems, a substantial gain in performance is obtainable due to the spatial filtering capability, which suppresses interfering signals coming from undesired directions. In practical environments, there are both directional noises, which have determinable directions (e.g. a competing speaker's voice or background music), and diffuse noises, which come from all directions due to the diffuse reflections of the room.
To suppress directional noises, many beamformer-based algorithms have been proposed [1,2]. Van Veen and Buckley [3] classified various types of beamformers according to spatial filtering methods and analyzed their beam patterns. The Frost beamformer [4] was one of the first array structures to handle adaptive broad-band array processing. Griffiths and Jim [5] proposed an alternative to Frost's algorithm and introduced the Generalized sidelobe canceller (GSC) solution, which not only effectively reduces the computational complexity but also provides flexibility to implement different beamformers. However, the GSC algorithm suffers from the signal cancellation problem because of steering vector errors, reverberation or imperfect microphones [1,6]. This problem has been noticed by researchers, and many adaptive beamforming algorithms have been proposed to avoid it [7-13]. Most of these methods, however, are not robust in transient non-stationary noise environments. In order to prevent the algorithms from diverging, several trials need to be conducted before a proper step-size is found. These drawbacks obstruct the use of these adaptive beamforming algorithms in practice.

To suppress diffuse noises, post-filtering is normally needed. Zelinski's post-filter [14] employs the auto- and cross-correlation functions of the received multi-channel signals to derive a proper gain for enhancement. However, this method is based on the assumption of an incoherent noise field, which is seldom satisfied in practical environments. A generalized expression for the Zelinski post-filter has been derived based on a priori knowledge of the noise field [15]. J. Li and Masato Akagi [16,17] proposed a hybrid post-filter under the assumption of a diffuse noise field.
A modified Zelinski post-filter is applied to the high frequencies to suppress spatially uncorrelated noise, and a single-channel Wiener post-filter is applied to the low frequencies to cancel spatially correlated noise. However, as the aperture of the array decreases, the correlation of the noise becomes stronger, which weakens the distinction between noise and desired speech, and the post-filters mentioned above become unreliable. Another drawback of these post-filtering techniques is that highly non-stationary noise components cannot be dealt with well in real-world applications [18].

To deal with the problems of the traditional algorithms mentioned above, this paper proposes a robust GSC algorithm in which beamforming and multi-channel post-filtering interact, as shown in Fig.1. The outputs of the Fixed beamforming (FBF) and of a modified Blocking matrix (BM), which uses more spatial information, are analyzed in the Short-time Fourier transform (STFT) domain and regrouped into auditory subbands according to the Bark scale, which mimics the auditory characteristics of human ears. Adaptive interference cancellation is then performed in each subband. A post-filter based on multi-channel signal presence probability estimation [18] is adopted to further enhance the output of the robust GSC, which is particularly advantageous in non-stationary noise environments. Besides, this method does not rely on a difference in correlation between speech and noise, making it more robust on small-aperture arrays.

Fig. 1. The framework of the proposed algorithm

Manuscript Received Dec. 2010; Accepted June 2011. This work is partially supported by the National Natural Science Foundation of China (No.10925419, 90920302, 10874203, 60875014, 61072124, 11074275, 11161140319).

A closed-loop controller, which uses feedback to control the states of a dynamic system, can keep the control error to a minimum and dynamically compensate for disturbances to the system [19]. In speech enhancement, adaptive beamforming can be seen as a dynamic system that adapts to the adverse environment. Besides, the speech signal is sparse in the time-frequency domain, a characteristic that the traditional GSC algorithm does not exploit. Based on these considerations, we propose a novel subband feedback controller based on the speech presence probability, which is derived from the post-filtering, to feedback-control the adaptive interference canceller of the GSC in each subband. We modified Cohen's multi-channel post-filtering so that the signal presence probability in each auditory subband can be derived. The update of the filter coefficients is slowed down when the desired speech is present, so that the proposed algorithm is more robust to array imperfection or reverberation, both of which may cause the desired speech to leak into the reference channel. The interaction between the multi-channel processing and the post-filtering leads to better signal preservation and thus improves the algorithm's overall performance.
The remainder of the paper is organized as follows: the proposed speech enhancement algorithm is detailed in Section II. In Section III, we evaluate our algorithm and compare it with other methods. Conclusions are drawn in Section IV.

II. Proposed Speech Enhancement Algorithm

Consider a four-sensor microphone array in a noisy environment. The observed signal on each microphone is composed of the desired speech signal, directional noises arriving from determinable directions and diffuse noises propagating in all directions. Our aim is to reduce both directional and diffuse noises simultaneously while keeping the desired speech undistorted. To this end, we construct a speech enhancement system, shown in Fig.1, which consists of three main parts: a robust generalized sidelobe canceller for directional noise suppression, multi-channel post-filtering for diffuse noise suppression, and the interaction of these two parts through a signal presence probability-based subband feedback controller. They are detailed in the following three subsections.

1. Robust generalized sidelobe canceller

To suppress directional noises, we propose a robust GSC algorithm which has three main parts: the FBF, the modified BM, and auditory subband adaptive interference cancellation, as shown in Fig.1. In the original GSC beamformer, the BM was implemented by subtracting the observed signals on adjacent sensors, which indicates that only limited spatial information was used. By comparison, the modified BM considers the spatial information not only between adjacent sensors but also between other sensor pairs, and is given by

$$\begin{bmatrix} 1 & -1 & 0 & 0 \\ 1 & 0 & -1 & 0 \\ 1 & 0 & 0 & -1 \end{bmatrix}$$

The experiments in Section III demonstrate the effectiveness of this BM. The signals at the outputs of the FBF and the BM (denoted as $y(n)$ and $u_m(n)$, respectively) are segmented into temporal frames and analyzed by the STFT.
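The difference between the two blocking matrices can be illustrated with a small sketch. It assumes that the target is time-aligned across the four sensors (broadside, no steering error) and that the off-diagonal entries of the modified BM are -1, so that every row sums to zero; the sample values and the function name `apply_blocking_matrix` are hypothetical and not taken from the paper.

```python
# Adjacent-sensor differences, as described for the original GSC.
ADJACENT_BM = [
    [1, -1, 0, 0],
    [0, 1, -1, 0],
    [0, 0, 1, -1],
]

# Modified BM: sensor 1 minus each of the other sensors.
# ASSUMPTION: the non-zero off-diagonal entries are -1 (rows sum to zero).
MODIFIED_BM = [
    [1, -1, 0, 0],
    [1, 0, -1, 0],
    [1, 0, 0, -1],
]


def apply_blocking_matrix(bm, x):
    """Return the M-1 noise references u_m for one snapshot x of M sensor samples."""
    return [sum(c * s for c, s in zip(row, x)) for row in bm]


# A time-aligned target gives the same sample on every sensor, so it is blocked.
target_only = [0.7, 0.7, 0.7, 0.7]
print(apply_blocking_matrix(ADJACENT_BM, target_only))   # [0.0, 0.0, 0.0]
print(apply_blocking_matrix(MODIFIED_BM, target_only))   # [0.0, 0.0, 0.0]

# An off-axis interferer differs across sensors and appears in the references.
interferer = [0.7, 0.2, -0.4, -0.9]
print(apply_blocking_matrix(MODIFIED_BM, interferer))
```

Both matrices block an ideal target; the modified one forms its references from sensor pairs with wider spacing as well as from the adjacent pair.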
The STFT analysis is written as

$$y(n) \xrightarrow{\mathrm{STFT}} Y(k,l), \qquad u_m(n) \xrightarrow{\mathrm{STFT}} U_m(k,l) \qquad (1)$$

in which $l$ and $k$ denote the indices of temporal frames and frequency bins, respectively, $m = 1, 2, \ldots, M-1$, and $M$ is the number of microphones. We regroup the frequency-domain signal of each frame into $B$ groups according to the Bark scale. The vectors of bins within the $b$th group are denoted as $Y^b(l)$ and $U_m^b(l)$, respectively.

Recall that our goal is to minimize the output power under a constraint on the response in the desired direction. Since the constraint is satisfied by the fixed beamformer, this is an unconstrained minimization similar to Widrow's classical Adaptive noise cancellation problem [20]:

$$J_m^b(l) = E\left[\left\| Y^b(l) - W_m^b(l)\,U_m^b(l) \right\|^2\right] \qquad (2)$$

where $J_m^b(l)$ denotes the energy in the $b$th band and $E[\cdot]$ is the expectation operator. Minimizing $J_m^b(l)$, i.e. setting $\mathrm{d}J_m^b(l)/\mathrm{d}W_m^b(l) = 0$, leads to

$$W_m^b(l)_{\mathrm{opt}} = \frac{\Phi^b_{U_m Y}(l)}{\Phi^b_{U_m U_m}(l)} \qquad (3)$$

where

$$\Phi^b_{U_m Y}(l) = E\left[U_m^b(l)\,(Y^b(l))^H\right] \qquad (4)$$

$$\Phi^b_{U_m U_m}(l) = E\left[U_m^b(l)\,(U_m^b(l))^H\right] \qquad (5)$$

and $(\cdot)^H$ is the Hermitian transpose operator.

In order to track changes, we process the signals by segments, using the Unconstrained frequency domain normalized LMS (UFNLMS) algorithm. The adaptive interference canceller filter in each subband is updated by a modified UFNLMS with a different norm constraint:

$$W_m^b(l+1) = W_m^b(l) + \mu\,\frac{U_m^b(l)\,(Y^b(l))^H}{P_{est}^b(l)} \qquad (6)$$

where

$$P_{est}^b(l) = \alpha P_{est}^b(l-1) + (1-\alpha)\sum_{m=1}^{M-1}\left\| X_m^b(l) \right\|^2 \qquad (7)$$

For a standard UFNLMS algorithm, $P_{est}^b(l)$ would be calculated from the power of the noise reference signals. However, we find in experiments that the signal cancellation problem is serious if the weights are updated during speech presence, so we use $X_m^b(l)$, the frequency-domain representation of the input sensor signals, instead. The performance is improved because the adaptation term becomes relatively small during speech presence. This can be seen as an implicit control of the adaptive filter. In order to control the filter adaptation in the Generalized sidelobe canceller precisely, we propose to use the signal presence probability derived from the post-filtering to feedback-control the adaptive interference canceller of the GSC in each subband, which will be detailed in Section II.3.

As far as speech is concerned, the energy of the desired signal is mainly concentrated in the low frequencies, so the signal in this region is richer, while in the higher frequencies the signal energy is much weaker.
So it is reasonable to use non-uniform filter banks instead of uniform ones: the low-frequency bands are made narrower to allow finer analysis, while the high-frequency bands are made broader to contain more signal energy, so that the adaptive interference canceller may converge more smoothly. Running the adaptive interference canceller in a series of subbands can improve the system's SNR gain and enables it to deal with multiple interferences in different bands. This auditory subband method has been proved to be effective in our previous work [21].

2. Multi-channel post-filtering

The residual diffuse noises are further suppressed by a signal presence probability based multi-channel post-filtering [18]. It uses multi-channel soft signal detection, based on the non-stationarity of the signals and on the transient power ratio between the beamformer primary output and its reference noise signals, to estimate the speech presence probability (SPP) and the noise power spectral density. An optimal gain function that minimizes the mean square error of the log-spectral amplitude is then applied. The post-filtering estimates the Ephraim-Malah (EM) gain [23], $G_{EM}(k,l)$, and the SPP, $P(k,l)$. The final gain for enhancement, $G(k,l)$, is obtained by

$$G(k,l) = \left(G_{EM}(k,l)\right)^{P(k,l)} \cdot G_{\min}^{\,1-P(k,l)} \qquad (8)$$

where $G_{\min}$ is the minimum gain allowed. $G_{EM}(k,l)$ is derived mainly from the single-channel approach and is able to reduce the stationary and quasi-stationary noises. $P(k,l)$, which indicates the probability that the desired speech exists in the corresponding time-frequency unit, is calculated by considering the ratio between the transient power of the GSC output $Z(k,l)$ and the transient power of the BM output reference signals $U_m(k,l)$. A low ratio indicates a larger transient power in the reference channel, which means that an interfering source is probably present. In this case, a smaller $P(k,l)$ is assigned.
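The behavior of Eq.(8) at the two extremes of the SPP can be checked with a minimal sketch. The function name `final_gain`, the EM gain value 0.8 and the gain floor 0.1 are illustrative assumptions, not values taken from the paper.

```python
def final_gain(g_em, p, g_min=0.1):
    """Eq.(8): weight the EM gain and the gain floor geometrically by the SPP p.

    g_em  : Ephraim-Malah gain G_EM(k, l) for one time-frequency unit
    p     : speech presence probability P(k, l), in [0, 1]
    g_min : minimum gain allowed (ASSUMPTION: 0.1 is only an example value)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("speech presence probability must lie in [0, 1]")
    return (g_em ** p) * (g_min ** (1.0 - p))


# Speech surely absent: the gain falls to the floor G_min.
print(final_gain(0.8, 0.0))   # 0.1
# Speech surely present: the gain equals the EM gain.
print(final_gain(0.8, 1.0))   # 0.8
# In between, the gain moves monotonically between the two.
print(final_gain(0.8, 0.5))
```

For a fixed EM gain above the floor, the final gain rises monotonically with $P(k,l)$, which is what allows a low SPP to suppress transient interference that the single-channel gain alone would pass.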
Thus the non-stationary noise in $Z(k,l)$ will be further suppressed according to Eq.(8), because a small $P(k,l)$ makes the final gain approach $G_{\min}$. The enhanced spectrum is given by

$$\hat{S}(k,l) = G(k,l)\,Z(k,l) \qquad (9)$$

and each enhanced frame is obtained by taking the inverse Fourier transform of the enhanced spectrum using the phase of the original noisy spectrum. Finally, the standard overlap-and-add method is used to synthesize the enhanced signal.

As detailed in Section II.3, the SPP in each auditory subband is needed for constraining the filter updates. It is obtained by averaging the SPP of the time-frequency units within the corresponding subband.

3. Subband feedback controlled adaptive filters

In practical implementations, the target speaker may not stay precisely at 0°. Moreover, the desired speech will also leak into the reference channel due to the echo and reverberation characteristics of the room. Furthermore, the positions and frequency responses of the microphones may not be as precise as expected, leading to imperfect cancellation of the desired speech in the reference channel. So the minimization of $J_m^b(l)$ in Eq.(2) does not necessarily lead to maximization of the output SNR; instead, a certain proportion of the speech signal will be canceled as a result. The leakage will also cause false fluctuations of the filter coefficients.

To improve the system's robustness against the adversities mentioned above, the updating rate of the adaptive filters should be controlled according to the presence of the desired speech. When the desired speech is present, the update in Eq.(6) should be slowed down. The adaptation speed and steady-state error of the adaptive filter are highly related to the step-size constant [24], but it is very hard to find an optimal step-size that guarantees good performance in a general environment. So $\mu$ in Eq.(6) must vary across frequency bands and temporal frames.
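To make the role of the step size concrete, the recursion of Eqs.(6)-(7) can be sketched for a single subband and a single reference channel, with the step size passed in per call so that it can change from band to band and frame to frame. This is a sketch under stated assumptions, not the authors' implementation: the weights are taken as one complex scalar per bin, and the correction term uses the canceller output error $Y - W^H U$ as in the usual NLMS recursion, whereas Eq.(6) as printed shows $Y^b(l)$. The name `ufnlms_step`, the leakage path `h` and the unit-magnitude test signal are hypothetical.

```python
import cmath


def ufnlms_step(w, u, y, x_power, p_est, mu, alpha=0.9, eps=1e-12):
    """One frame of a normalized LMS update for one subband and one reference.

    w, u, y : lists of complex values, one per frequency bin in the subband
              (filter weights, BM reference, FBF output)
    x_power : sum over microphones of ||X_m^b(l)||^2, the input of Eq.(7)
    p_est   : smoothed power estimate carried over from the previous frame
    mu      : step size for this band and frame
    """
    p_est = alpha * p_est + (1.0 - alpha) * x_power                 # Eq.(7)
    # ASSUMPTION: canceller output (error) used in the correction term.
    e = [yk - wk.conjugate() * uk for wk, uk, yk in zip(w, u, y)]
    w = [wk + mu * uk * ek.conjugate() / (p_est + eps)              # form of Eq.(6)
         for wk, uk, ek in zip(w, u, e)]
    return w, e, p_est


# Toy run: the FBF output contains only a leaked interferer, y = h * u.
h = 0.8 - 0.3j                       # hypothetical leakage path
n_bins = 4
w = [0j] * n_bins
p_est = 1.0
for l in range(200):
    # Unit-magnitude reference with a deterministic phase per bin and frame.
    u = [cmath.rect(1.0, 0.7 * (l * n_bins + k)) for k in range(n_bins)]
    y = [h * uk for uk in u]
    x_power = sum(abs(uk) ** 2 for uk in u)   # stand-in for the sensor power
    w, e, p_est = ufnlms_step(w, u, y, x_power, p_est, mu=0.5)

residual = max(abs(ek) for ek in e)
print(residual)   # shrinks toward zero as conj(w) approaches h
```

In this toy setting each update scales the weight error by $1 - \mu |U|^2 / P_{est}$, so a smaller `mu` or a larger normalizer slows adaptation. Normalizing by the sensor power, which grows when speech is present, therefore reduces the effective step during speech.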
We propose a time-varying step-size controlled by the speech presence probability in each subband, which is derived from the post-filtering described in the last section:

$$p^b(l) = \frac{1}{N_b}\sum_{i = i_1, i_2, \ldots, i_{N_b}} P(i,l) \qquad (10)$$

in which $N_b$ is the number of frequency bins within the $b$th subband and $i_1, i_2, \ldots, i_{N_b}$ are the indices of the frequency bins within the $b$th subband. The step-size is then

$$\mu^b(l) = \left(1 - p^b(l)\right)\mu = \left(1 - \frac{1}{N_b}\sum_{i = i_1, i_2, \ldots, i_{N_b}} P(i,l)\right)\mu \qquad (11)$$

where $p^b(l)$ is the signal presence probability derived from the post-filtering in the last section, $0 < p^b(l) < 1$. A greater $p^b(l)$ indicates a high probability that the desired signal exists in the $b$th subband during the $l$th frame. A smaller $\mu^b(l)$ is then obtained according to Eq.(11), resulting in slow updates of the adaptive filters, which preserves the speech components. A small $p^b(l)$ means that the desired signal is mostly absent, so the updates become fast enough to adapt to the changing nature of the interferences.

III. Evaluations and Discussions

1. Experimental configuration

The microphone array used in this work is composed of 4 omni-directional MEMS (Micro electrical mechanical system) microphones in broadside orientation. The distance between the microphones is 5 cm. The system is implemented at a sampling rate of 8 kHz.

The experiment took place in a 6 m × 5 m × 3 m conference room with a reverberation time of 300 ms, as shown in Fig.2. Two interferences (a competing speaker and a Gaussian white noise source) are located at 90° and 45° relative to the array, respectively. The speech source is ten male and ten female TIMIT sentences. The multi-channel clean speech is generated by computer simulation in a virtual room [25] with the same size and reverberation time as the conference room in which the interferences are recorded, so that the clean speech signal can be obtained for objective evaluations.

Fig. 2. Configuration of experiments in a room environment
The two parts are then mixed at different global SNR levels (-6 dB to 6 dB). All the sound sources are 1 m away from the array. For comparison, the multi-channel noisy speech is processed with the six methods listed below.

(1) GSC algorithm in time domain (GSC-TD) [5].
(2) GSC algorithm in frequency domain (GSC-FD) [26].
(3) GSC-FD with modified Blocking matrix (GSC-FD*).
(4) GSC-FD* with Subband-feedback-controlled adaptive filters (GSC-FD*-SFC).
(5) Cohen's algorithm [22].
(6) Proposed algorithm.

2. Objective evaluation measures and results

To evaluate the studied noise reduction methods for speech enhancement, three objective speech quality measures were used: Noise reduction (NR), Log-spectral distance (LSD) and Perceptual evaluation of speech quality (PESQ).

(1) Noise reduction (NR) [22]. This measure compares the noise level in the enhanced signal to the noise level recorded by the first microphone. It is designed to test the system's noise canceling ability during non-speech segments.

$$\mathrm{NR} = \frac{1}{|L'|}\sum_{l \in L'} 10\log\frac{\mathrm{fore}^2}{\hat{S}^2} \qquad (12)$$

in which $\mathrm{fore}$ denotes the signal received by one of the microphones and $\hat{S}$ is the signal estimate. $L'$ is the set of frames containing only noise, and $|L'|$ is its cardinality.

(2) Log-spectral distance (LSD) [22], which can be expressed as

$$\mathrm{LSD} = \frac{1}{L}\sum_{l=0}^{L-1}\left\{\frac{1}{N/2+1}\sum_{k=0}^{N/2}\left[10\log \mathcal{A}S(k,l) - 10\log \mathcal{A}\hat{S}(k,l)\right]^2\right\}^{1/2} \qquad (13)$$

where $\mathcal{A}S(k,l) = \max\left(|S(k,l)|^2, \delta\right)$ is the spectral power clipped such that the log-spectrum dynamic range is confined to about 50 dB (that is, $\delta = 10^{-50/10}\max_{k,l}\{|X(k,l)|^2\}$), and $N$ is the length of the Fast Fourier transform (FFT).

(3) Perceptual evaluation of speech quality (PESQ). This measure is able to predict subjective quality with good correlation in a very wide range of conditions and is specified by the ITU-T as recommendation P.862 [27]. Note that a higher PESQ score means a higher speech quality of the enhanced signal.

Table 1. PESQ-MOS scores

Input SNR (dB)       -6     -3      0      3      6
Noisy              1.73   1.87   2.02   2.19   2.46
GSC-TD             1.93   2.03   2.13   2.27   2.43
GSC-FD             2.04   2.14   2.26   2.38   2.46
GSC-FD*            2.24   2.42   2.57   2.67   2.72
GSC-FD*-SFC        2.46   2.64   2.76   2.85   2.93
Cohen's method     2.64   2.76   2.85   2.93   3.01
Proposed method    2.66   2.83   2.98   3.11   3.20

The experimental results in the real room acoustic conditions are shown in Fig.3. Compared with algorithms (1)-(4), our proposed algorithm shows considerable improvement in terms of noise reduction and LSD in various SNR conditions. Compared with Cohen's algorithm, the noise reduction performance is similar, but our proposed algorithm shows better signal preservation. The results also show that adaptive beamforming using a frequency-domain adaptive filter exhibits fast convergence and better performance in nulling wideband interferences. We can also notice that the modified fixed BM, which uses more spatial information, brings some improvement. The subband feedback controlled method alleviates the problem of signal cancellation in the adaptive beamformer and preserves the desired signal better. To further demonstrate this point, PESQ-MOS is employed, as shown in Table 1.

Fig. 3. Performance comparison in a real room environment under different noise levels among different algorithms: GSC-TD (+), GSC-FD (*), GSC-FD*, GSC-FD*-SFC, Cohen's algorithm and the proposed algorithm

3. Discussions

Based on the experimental results presented in the last section, the advantages of the proposed noise reduction method over the other traditional methods are discussed in the following paragraphs.

The proposed modified Blocking matrix outperforms the traditional GSC Blocking matrix for the following reason. In the original GSC beamformer, the BM was implemented by subtracting the observed signals on adjacent sensors, which indicates that only limited spatial information was used. By comparison, the modified BM considers the spatial information not only between adjacent sensors but also between other sensor pairs.

The proposed method outperforms the GSC beamformer. The traditional GSC beamformer suffers from the signal cancellation problem because of steering vector errors, reverberation or imperfect microphones. To overcome this problem, we propose a subband feedback controller based on the speech presence probability, which is derived from the post-filtering, to feedback-control the adaptive interference canceller of the GSC in each subband. The update of the filter coefficients is slowed down when the desired speech is present, so that the proposed algorithm is more robust to array imperfection or reverberation, both of which may cause the desired speech to leak into the reference channel.
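The slowdown described here follows Eqs.(10)-(11) and can be sketched in a few lines. The SPP values, the bin-to-band assignment, the base step size and the function names are hypothetical; in the proposed system the per-bin SPP comes from the post-filter and the bands follow the Bark scale.

```python
def subband_spp(spp_bins, band_bins):
    """Eq.(10): average the per-bin SPP P(i, l) over the bins of one subband."""
    return sum(spp_bins[i] for i in band_bins) / len(band_bins)


def controlled_step_size(spp_bins, band_bins, mu=0.5):
    """Eq.(11): mu_b(l) = (1 - p_b(l)) * mu, with mu the base step size."""
    return (1.0 - subband_spp(spp_bins, band_bins)) * mu


# Hypothetical SPP for six bins of one frame: speech dominates the first three.
spp = [0.9, 0.8, 1.0, 0.1, 0.0, 0.1]
bands = [[0, 1, 2], [3, 4, 5]]       # hypothetical bin indices of two subbands

for b, bins in enumerate(bands):
    print(b, subband_spp(spp, bins), controlled_step_size(spp, bins))
```

The speech-dominated band receives a step size close to zero, so its filter is nearly frozen, while the noise-dominated band keeps most of the base step size and continues to track the interference.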
This method leads to better signal preservation and thus improves the algorithm's overall performance. Furthermore, partitioning the signals into subbands effectively converts a wideband signal into a number of narrow-band signals, which makes more effective processing possible. Adaptive beamforming using the frequency-domain NLMS exhibits faster convergence and better performance in nulling wideband interferences than the time-domain NLMS, especially for a larger eigenvalue spread.

Compared with Cohen's method (GSC with multi-channel post-filtering), the noise reduction performance is similar, because we use a similar speech presence probability based multi-channel post-filtering to overcome the diffuse noises and transient noises. The improvement in signal preservation brought by our subband feedback controlled method is nevertheless considerable. As a result, the proposed speech enhancement method provides the highest performance among the studied speech enhancement algorithms under all experimental conditions, as shown in Fig.3 and Table 1.

Since the speech presence probability used by the subband feedback controller is obtained from the post-filtering, it does not add much computational cost. This method can also be applied to other Adaptive noise cancellation or Acoustic echo cancellation applications which need careful control of the adaptive filter.

IV. Conclusion

A multi-channel speech enhancement algorithm is proposed. The algorithm consists of three parts: directional noise suppression, which is based on a robust Generalized sidelobe canceller with subband feedback controlled adaptive filters; diffuse noise suppression, which is implemented by a multi-channel post-filtering based on speech presence probability; and the interaction of adaptive beamforming and post-filtering through a subband feedback controller.
Experimental results indicate that the subband feedback controller makes the filter adaptation more robust and alleviates the problem of signal cancellation in the adaptive beamformer. The proposed algorithm achieves considerable improvement in signal preservation of the desired speech in adverse noise environments over the comparative algorithms.

References

[1] J. Benesty, J. Chen and Y. Huang, Microphone Array Signal Processing, Berlin, Germany: Springer-Verlag, 2008.
[2] M. Brandstein and D. Ward, Microphone Arrays: Signal Processing Techniques and Applications, Berlin: Springer-Verlag, 2001.
[3] B.D. Van Veen and K.M. Buckley, "Beamforming: a versatile approach to spatial filtering", IEEE Signal Processing Magazine, Vol.5, pp.4-24, 1988.
[4] O.L. Frost, "An algorithm for linearly constrained adaptive array processing", Proceedings of the IEEE, Vol.60, No.8, pp.926-935, Aug. 1972.
[5] L.J. Griffiths and C.W. Jim, "An alternative approach to linearly constrained adaptive beamforming", IEEE Transactions on Antennas and Propagation, Vol.30, No.1, pp.27-34, Jan. 1982.
[6] B. Widrow, "Signal cancellation phenomena in adaptive antennas: causes and cures", IEEE Transactions on Antennas and Propagation, Vol.30, No.3, pp.469-478, 1982.
[7] J.E. Greenberg, "Evaluation of an adaptive beamforming method for hearing aids", J. Acoust. Soc. Am., Vol.91, No.3, pp.1662-1675, 1992.
[8] O. Hoshuyama, A. Sugiyama and A. Hirano, "A robust adaptive beamformer for microphone arrays with a blocking matrix using constrained adaptive filters", IEEE Transactions on Signal Processing, Vol.47, No.10, pp.2677-2684, 1999.
[9] S. Gannot, D. Burshtein and E. Weinstein, "Signal enhancement using beamforming and nonstationarity with applications to speech", IEEE Transactions on Signal Processing, Vol.49, No.8, pp.1614-1626, 2001.
[10] W. Herbordt and W. Kellermann, "Analysis of blocking matrix for generalized sidelobe cancellers for non-stationary broadband signals", IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, Florida, USA, Vol.4, pp.IV-4187, May 2002.
[11] W.H. Neo and B. Farhang-Boroujeny, "Robust microphone arrays using subband adaptive filters", IEE Proc.-Vis. Image Signal Process., Vol.149, No.1, pp.17-25, 2002.
[12] E. Warsitz, A. Krueger and R. Haeb-Umbach, "Speech enhancement with a new generalized eigenvector blocking matrix for application in a generalized sidelobe canceller", IEEE International Conference on Acoustics, Speech, and Signal Processing, Las Vegas, USA, pp.73-76, 2008.
[13] A. Krueger, E. Warsitz and R. Haeb-Umbach, "Speech enhancement with a GSC-like structure employing eigenvector-based transfer function ratios estimation", IEEE Trans. on Audio, Speech, and Language Processing, Vol.19, pp.206-219, Jan. 2011.
[14] R. Zelinski, "A microphone array with adaptive post-filtering for noise reduction in reverberant rooms", IEEE International Conference on Acoustics, Speech, and Signal Processing, New York, USA, Vol.5, pp.2578-2581, May 1988.
[15] I.A. McCowan and H. Bourlard, "Microphone array post-filter based on noise field coherence", IEEE Transactions on Speech and Audio Processing, Vol.11, No.6, pp.709-716, 2003.
[16] J. Li and M. Akagi, "A noise reduction system based on hybrid noise estimation technique and post-filtering in arbitrary noise environments", Speech Communication, Vol.48, No.2, pp.111-126, 2006.
[17] J. Li and M. Akagi, "A hybrid microphone array post-filter in a diffuse noise field", Applied Acoustics, Vol.69, No.2, pp.546-557, 2008.
[18] I. Cohen, "Multichannel post-filtering in nonstationary noise environments", IEEE Transactions on Signal Processing, Vol.52, No.5, pp.1149-1160, 2004.
[19] G.F. Franklin, J.D. Powell and A. Emami-Naeini, Feedback Control of Dynamic Systems, Addison-Wesley, Reading, MA, 1994.
[20] B. Widrow, "Adaptive noise cancelling: principles and applications", Proceedings of the IEEE, Vol.63, pp.1692-1716, 1975.
[21] H. Zhang, Q. Fu and Y. Yan, "Speech enhancement using compact microphone array and applications in distant speech acquisition", Chinese Journal of Electronics, Vol.18, No.3, pp.481-486, July 2009.
[22] I. Cohen, "Analysis of two-channel generalized sidelobe canceller (GSC) with post-filtering", IEEE Transactions on Speech and Audio Processing, Vol.11, No.6, pp.684-699, 2003.
[23] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol.33, No.2, pp.443-445, 1985.
[24] A. Mader, H. Puder and G.U. Schmidt, "Step-size control for acoustic echo cancellation filters: an overview", Signal Processing, Vol.80, pp.1697-1719, 2000.
[25] J.B. Allen and D.A. Berkley, "Image method for efficiently simulating small-room acoustics", J. Acoust. Soc. Am., Vol.65, No.4, pp.943-950, Apr. 1979.
[26] Y.H. Chen and H.D. Fang, "Frequency-domain implementation of Griffiths-Jim adaptive beamformer", J. Acoust. Soc. Am., Vol.91, No.6, pp.3354-3366, 1992.
[27] A.W. Rix, J.G. Beerends, M.P. Hollier and A.P. Hekstra, "Perceptual evaluation of speech quality (PESQ): a new method for speech quality assessment of telephone networks and codecs", IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol.2, pp.749-752, 2001.

LI Kai received the B.E. degree from the Electronic Engineering Department of Wuhan University in 2007. Currently he is a Ph.D. candidate at the Institute of Acoustics, Chinese Academy of Sciences. His research interests include single and multi-channel speech enhancement, microphone array signal processing and distant-talking speech recognition. (Email: likai@hccl.ioa.ac.cn)

FU Qiang received the B.E. degree from the Xi'an Technological University, Xi'an, China, in 1994, the M.S. degree in electronic engineering from Chongqing University of Posts and Telecommunications, Chongqing, China, in 1997, and the Ph.D. degree in electronic engineering from Xidian University, Xi'an, in 2000. In 2000, he worked as a Researcher in Motorola China Research Center (MSRC), Shanghai, China. From 2001 to 2002, he worked as a Senior Research Associate in the Center for Spoken Language Understanding (CSLU), OGI School of Science and Engineering at Oregon Health & Science University, Oregon, USA. From 2002 to 2004, he worked as a Senior Postdoctoral Research Fellow in the Department of Electric and Computer Engineering, University of Limerick, Ireland. He is currently an Associate Professor in the Institute of Acoustics, Chinese Academy of Sciences, China. His research interests include speech analysis, microphone array processing and audio-visual signal processing. Dr. Fu is a member of the IEEE Signal Processing Society.

YAN Yonghong received the B.E. degree from the Electronic Engineering Department of Tsinghua University in 1990, and the Ph.D. degree in Computer Science and Engineering from Oregon Graduate Institute of Science and Engineering in 1995. From 1995 to 1998, he worked at OGI as an Assistant Professor, Associate Director and Associate Professor of the Center for Spoken Language Understanding. From 1998 to 2001 he worked as the Principal Engineer of Intel Microprocessors Research Lab, and as Director and Chief Scientist of Intel China Research Center. In 2002 he returned to China to work for the Chinese Academy of Sciences. He is a professor and director of the Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences. His research interests include large vocabulary speech recognition, speaker/language recognition and audio signal processing. He has published more than 100 papers and holds 40 patents.