Speech Enhancement Using Robust Generalized Sidelobe Canceller with Multi-Channel Post-Filtering in Adverse Environments


Chinese Journal of Electronics, Vol.21, No.1, Jan. 2012

LI Kai, FU Qiang and YAN Yonghong (Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China)

Abstract: In this paper, we propose a speech enhancement algorithm in which adaptive beamforming and multi-channel post-filtering interact. A novel subband feedback controller based on speech presence probability is applied to the Generalized sidelobe canceller (GSC) algorithm to make adaptive beamforming more robust in adverse environments and to alleviate the problem of signal cancellation. Multi-channel post-filtering is used not only to further suppress diffuse noises and some transient noises, but also to provide the speech presence probability in each subband. Experimental results show that the proposed algorithm preserves the desired speech considerably better than the comparative algorithms in adverse noise environments.

Key words: Speech enhancement, Microphone array, Generalized sidelobe canceller, Adaptive filter, Post-filtering.

I. Introduction

Microphone arrays have been widely used to improve the performance of speech communication and Automatic speech recognition (ASR) systems in adverse noise environments because they are effective in enhancing the quality of the captured speech [1,2]. Compared with single-channel systems, a substantial gain in performance is obtainable from spatial filtering, which suppresses interfering signals coming from undesired directions. Practical environments contain both directional noises, which have determinable directions (e.g. a competing speaker's voice or background music), and diffuse noises, which come from all directions because of the diffuse reflections of the room.
To suppress directional noises, many beamformer-based algorithms have been proposed [1,2]. Van Veen and Buckley [3] classified various types of beamformers according to their spatial filtering methods and analyzed their beam patterns. The Frost beamformer [4] was one of the first array structures to handle adaptive broadband array processing. Griffiths and Jim [5] proposed an alternative to Frost's algorithm and introduced the Generalized sidelobe canceller (GSC), which not only reduces the computational complexity but also provides the flexibility to implement different beamformers. However, the GSC algorithm suffers from signal cancellation caused by steering vector errors, reverberation or imperfect microphones [1,6]. Many adaptive beamforming algorithms have been proposed to avoid this problem [7-13]. Most of these methods, however, are not robust in transient, non-stationary noise environments, and several trials are needed before a step size is found that keeps the algorithm from diverging. These drawbacks obstruct the use of these adaptive beamforming algorithms in practice.

To suppress diffuse noises, post-filtering is normally needed. Zelinski's post-filter [14] employs the auto- and cross-correlation functions of the received multi-channel signals to derive a proper gain for enhancement. However, this method assumes an incoherent noise field, which is seldom found in practical environments. A generalized expression for the Zelinski post-filter has been derived based on a priori knowledge of the noise field [15]. J. Li and M. Akagi [16,17] proposed a hybrid post-filter under the assumption of a diffuse noise field.
A modified Zelinski post-filter is applied at high frequencies to suppress spatially uncorrelated noise, and a single-channel Wiener post-filter is applied at low frequencies to cancel spatially correlated noise. However, as the aperture of the array decreases, the noise correlation becomes stronger, which weakens the distinction between noise and desired speech, and the post-filters mentioned above become unreliable. Another drawback of these post-filtering techniques is that highly non-stationary noise components cannot be dealt with well in real-world applications [18].

To deal with the problems of the traditional algorithms mentioned above, this paper proposes a robust GSC algorithm in which beamforming and multi-channel post-filtering interact, as shown in Fig.1. The outputs of the Fixed beamformer (FBF) and of a modified Blocking matrix (BM), which uses more spatial information, are analyzed in the Short-time Fourier transform (STFT) domain and regrouped into auditory subbands according to the Bark scale, which mimics the auditory characteristics of the human ear. Adaptive interference cancellation is performed in each subband. A post-filter based on multi-channel signal presence probability estimation [18] is adopted to further enhance the output of the robust GSC, which is particularly advantageous in non-stationary noise environments. In addition, this method does not rely on a difference in correlation between speech and noise, which makes it more robust on small-aperture arrays.

Fig. 1. The framework of the proposed algorithm

A closed-loop controller, which uses feedback to control the states of a dynamic system, can keep the control error to a minimum and dynamically compensate for disturbances to the system [19]. In speech enhancement, adaptive beamforming can be seen as a dynamic system that adapts to the adverse environment. Moreover, speech is sparse in the time-frequency domain, a property that the traditional GSC algorithm does not exploit. Based on these considerations, we propose a novel subband feedback controller: the speech presence probability derived from the post-filtering is fed back to control the adaptive interference canceller of the GSC in each subband. We modified Cohen's multi-channel post-filtering so that the signal presence probability in each auditory subband can be derived. The update of the filter coefficients is slowed down when the desired speech is present, so that the proposed algorithm is more robust to array imperfections and reverberation, both of which let the desired speech leak into the reference channel. The interaction between the multi-channel processing and the post-filtering leads to better signal preservation and thus improves the algorithm's overall performance.

Manuscript Received Dec. 2010; Accepted June 2011. This work is partially supported by the National Natural Science Foundation of China (No.10925419, 90920302, 10874203, 60875014, 61072124, 11074275, 11161140319).
The remainder of the paper is organized as follows. The proposed speech enhancement algorithm is detailed in Section II. In Section III, we evaluate our algorithm and compare it with other methods. Conclusions are drawn in Section IV.

II. Proposed Speech Enhancement Algorithm

Consider a four-sensor microphone array in a noisy environment. The signal observed at each microphone is composed of the desired speech signal, directional noises arriving from determinable directions, and diffuse noises propagating in all directions. Our aim is to reduce both directional and diffuse noises simultaneously while keeping the desired speech undistorted. To this end, we construct a speech enhancement system, shown in Fig.1, which consists of three main parts: a robust generalized sidelobe canceller for directional noise suppression, multi-channel post-filtering for diffuse noise suppression, and the interaction of these two parts through a signal presence probability-based subband feedback controller. These parts are detailed in the following three subsections.

1. Robust generalized sidelobe canceller

To suppress directional noises, we propose a robust GSC algorithm with three main parts: the FBF, the modified BM, and auditory subband adaptive interference cancellation, as shown in Fig.1. In the original GSC beamformer, the BM was implemented by subtracting the observed signals on adjacent sensors, so only limited spatial information was used. In comparison, the modified BM considers the spatial information not only between adjacent sensors but also between other sensor pairs, and is given by

$$ \begin{bmatrix} 1 & -1 & 0 & 0 \\ 1 & 0 & -1 & 0 \\ 1 & 0 & 0 & -1 \end{bmatrix} $$

The experiments in Section III demonstrate the effectiveness of this BM. The output signals of the FBF and the BM (denoted $y(n)$ and $u_m(n)$, respectively) are segmented into temporal frames and analyzed by the STFT.
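As a quick numerical check of the blocking-matrix idea in this subsection, the sketch below compares an adjacent-difference BM with the modified BM. It assumes that every row of each matrix sums to zero (the signs are an assumption of this sketch, chosen because a blocking matrix has to cancel a broadside target that is identical on all sensors); the names `B_ADJACENT`, `B_MODIFIED` and `block` are illustrative only.

```python
import numpy as np

# Conventional BM: differences of adjacent sensors only (assumed form).
B_ADJACENT = np.array([[1, -1, 0, 0],
                       [0, 1, -1, 0],
                       [0, 0, 1, -1]], dtype=float)

# Modified BM: sensor 1 paired with every other sensor (signs assumed).
B_MODIFIED = np.array([[1, -1, 0, 0],
                       [1, 0, -1, 0],
                       [1, 0, 0, -1]], dtype=float)


def block(bm, x):
    """Apply a blocking matrix to sensor samples x of shape (4, n_samples)."""
    return bm @ x


rng = np.random.default_rng(0)

# Idealized broadside target: the same waveform on all four sensors.
s = rng.standard_normal(256)
target = np.tile(s, (4, 1))
leak_adjacent = np.max(np.abs(block(B_ADJACENT, target)))
leak_modified = np.max(np.abs(block(B_MODIFIED, target)))

# A signal that differs from sensor to sensor is passed to the references.
interference = rng.standard_normal((4, 256))
passed_modified = np.max(np.abs(block(B_MODIFIED, interference)))
```

Both matrices remove the idealized target completely and produce three linearly independent noise references; they differ only in which sensor pairs form those references.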
$$ y(n) \xrightarrow{\mathrm{STFT}} Y(k,l), \qquad u_m(n) \xrightarrow{\mathrm{STFT}} U_m(k,l) \tag{1} $$

in which $l$ and $k$ denote the indices of temporal frames and frequency bins, respectively, $m = 1, 2, \ldots, M-1$, and $M$ is the number of microphones. We regroup the frequency-domain signal of each frame into $B$ groups according to the Bark scale. The vectors of bins within the $b$th group are denoted $Y^b(l)$ and $U_m^b(l)$, respectively.

Recall that our goal is to minimize the output power under a constraint on the response in the desired direction. Since the constraint is satisfied by the fixed beamformer, this is an unconstrained minimization similar to Widrow's classical adaptive noise cancellation problem [20]:

$$ J_m^b(l) = E\left[ \left\| Y^b(l) - W_m^b(l) U_m^b(l) \right\|^2 \right] \tag{2} $$

where $J_m^b(l)$ denotes the energy in the $b$th band and $E[\cdot]$ is the expectation operator. Minimizing $J_m^b(l)$ leads to

$$ W_m^b(l)_{\mathrm{opt}} = \frac{\Phi^b_{U_m Y}(l)}{\Phi^b_{U_m U_m}(l)}, \quad \text{if } \frac{dJ_m^b(l)}{dW_m^b(l)} = 0 \tag{3} $$

where

$$ \Phi^b_{U_m Y}(l) = E\left[ U_m^b(l) \left( Y^b(l) \right)^H \right] \tag{4} $$

$$ \Phi^b_{U_m U_m}(l) = E\left[ U_m^b(l) \left( U_m^b(l) \right)^H \right] \tag{5} $$

and $(\cdot)^H$ is the Hermitian transpose operator. In order to track changes, we process the signals segment by segment with the Unconstrained frequency-domain normalized LMS (UFNLMS) algorithm. The adaptive interference canceller filter in each subband is updated by a modified UFNLMS with a different norm constraint:

$$ W_m^b(l+1) = W_m^b(l) + \mu \frac{U_m^b(l) \left( Y^b(l) \right)^H}{P^b_{\mathrm{est}}(l)} \tag{6} $$

where

$$ P^b_{\mathrm{est}}(l) = \alpha P^b_{\mathrm{est}}(l-1) + (1-\alpha) \sum_{m=1}^{M-1} \left\| X_m^b(l) \right\|^2 \tag{7} $$

A standard UFNLMS algorithm would calculate $P^b_{\mathrm{est}}(l)$ from the power of the noise reference signals. However, we found in experiments that signal cancellation is serious if the weights are updated during speech presence, so we use $X_m^b(l)$, the frequency-domain representation of the input sensor signals, instead. The performance improves because the adaptation term becomes relatively small during speech presence. This can be seen as an implicit control of the adaptive filter. To control the filter adaptation in the Generalized sidelobe canceller more precisely, we propose to use the signal presence probability derived from the post-filtering to feedback-control the adaptive interference canceller of the GSC in each subband, as detailed in Section II.3.

For speech, the energy of the desired signal is concentrated mainly at low frequencies, so the signal in this region is more strongly colored, while at higher frequencies the signal energy is much weaker.
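The normalized update of Eqs. (6) and (7) can be illustrated with a toy simulation. This is a minimal sketch under several assumptions that are not taken from the paper: one complex tap per reference channel and frequency bin, white synthetic data, and the values of `mu`, `alpha` and the band size. It also uses the canceller output Z = Y - W^H U in the correction term, which is the usual NLMS form, whereas Eq. (6) as printed shows Y. The function names are hypothetical.

```python
import numpy as np


def nlms_band_step(W, Y, U, X, P_prev, mu=0.5, alpha=0.9, eps=1e-10):
    """One frame of a normalized LMS interference canceller in one subband.

    W      : (M-1, Nb) complex weights, one tap per reference channel and bin
    Y      : (Nb,)     fixed-beamformer output bins of this band
    U      : (M-1, Nb) blocking-matrix (noise reference) bins
    X      : (M, Nb)   raw sensor bins, used only for the power normalization
    P_prev : float     smoothed power estimate of the previous frame
    """
    # Canceller output (a priori error) for every bin of the band.
    Z = Y - np.sum(np.conj(W) * U, axis=0)
    # Recursive power estimate built from the sensor signals, as in Eq. (7).
    P_est = alpha * P_prev + (1.0 - alpha) * np.sum(np.abs(X) ** 2)
    # Normalized gradient step (error-driven variant of Eq. (6)).
    W_next = W + mu * U * np.conj(Z) / (P_est + eps)
    return W_next, Z, P_est


def run_demo(n_frames=1000, M=4, Nb=8, seed=1):
    """Cancel an interference that leaks from reference 0 into the FBF output."""
    rng = np.random.default_rng(seed)
    h = 0.5 - 0.3j                       # hypothetical leakage coefficient
    W = np.zeros((M - 1, Nb), dtype=complex)
    P = float(2 * M * Nb)                # expected sensor power of the toy data
    power = np.empty(n_frames)
    for l in range(n_frames):
        U = rng.standard_normal((M - 1, Nb)) + 1j * rng.standard_normal((M - 1, Nb))
        X = rng.standard_normal((M, Nb)) + 1j * rng.standard_normal((M, Nb))
        Y = h * U[0]                     # no target speech in this toy example
        W, Z, P = nlms_band_step(W, Y, U, X, P)
        power[l] = np.mean(np.abs(Z) ** 2)
    return W, power, h


W_final, residual_power, h_true = run_demo()
```

In this toy setting the residual power decays toward zero and the first weight row converges to the conjugate of the leakage coefficient. Because the normalization uses sensor power rather than reference power, a frame with strong speech on the sensors would shrink the step, which is the implicit control described above.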
Given this energy distribution of speech, it is reasonable to use non-uniform filter banks instead of uniform ones: the low-frequency bands are made narrower to allow finer analysis, while the high-frequency bands are made broader to contain more signal energy, so that the adaptive interference canceller converges more smoothly. Running the adaptive interference canceller in a series of subbands improves the system's SNR gain and enables it to deal with multiple interferences in different bands. This auditory subband method proved effective in our previous work [21].

2. Multi-channel post-filtering

The residual diffuse noises are further suppressed by a signal presence probability-based multi-channel post-filter [18]. It uses multi-channel soft signal detection, based on the non-stationarity of the signals and on the transient power ratio between the beamformer primary output and its reference noise signals, to estimate the speech presence probability (SPP) and the noise power spectral density. An optimal gain function that minimizes the mean square error of the log-spectral amplitude is then applied. The post-filter estimates the Ephraim-Malah (EM) gain [23], $G_{EM}(k,l)$, and the SPP, $P(k,l)$. The final gain for enhancement, $G(k,l)$, is obtained by

$$ G(k,l) = \left( G_{EM}(k,l) \right)^{P(k,l)} G_{\min}^{\,1-P(k,l)} \tag{8} $$

where $G_{\min}$ is the minimum gain allowed. $G_{EM}(k,l)$ is derived mainly from a single-channel approach and is able to reduce stationary and quasi-stationary noises. $P(k,l)$, the probability that the desired speech exists in the corresponding time-frequency unit, is calculated from the ratio between the transient power of the GSC output $Z(k,l)$ and the transient power of the BM output reference signals $U_m(k,l)$. A low ratio indicates a larger transient power in the reference channel, which means that an interfering source is probably present. In this case, a smaller $P(k,l)$ is assigned.
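The behaviour of the gain in Eq. (8) at the two extremes of the SPP can be checked with a few lines of NumPy. This is only an illustrative sketch: the function name `final_gain` and the value `g_min=0.1` (a floor of about -20 dB) are assumptions, not values taken from the paper.

```python
import numpy as np


def final_gain(g_em, spp, g_min=0.1):
    """Combine the EM gain and the SPP as in Eq. (8): G = G_EM^P * G_min^(1-P)."""
    g_em = np.asarray(g_em, dtype=float)
    spp = np.asarray(spp, dtype=float)
    return g_em ** spp * g_min ** (1.0 - spp)


# Same EM gain, three different speech presence probabilities.
gains = final_gain(g_em=0.8, spp=np.array([0.0, 0.5, 1.0]))
```

With an SPP of 0 the gain falls to the floor `g_min`, with an SPP of 1 it equals the EM gain, and values in between interpolate geometrically between the two.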
A small $P(k,l)$ makes the final gain in Eq.(8) approach $G_{\min}$, so the non-stationary noise in $Z(k,l)$ is further suppressed. The enhanced spectrum is given by

$$ \hat{S}(k,l) = G(k,l) Z(k,l) \tag{9} $$

and the enhanced signal is obtained by taking the inverse Fourier transform of the enhanced spectrum using the phase of the original noisy spectrum. Finally, the standard overlap-and-add method is used to obtain the enhanced signal. As detailed in Section II.3, the SPP in each auditory subband is needed for constraining the filter updates. It is obtained by averaging the SPP of the time-frequency units within the corresponding subband.

3. Subband feedback controlled adaptive filters

In practical implementations, the target speaker may not stay precisely at 0°. Moreover, the desired speech also leaks into the reference channel because of the echo and reverberation characteristics of the room. Furthermore, the positions and frequency responses of the microphones may not be as precise as expected, leading to imperfect cancellation of the desired speech in the reference channel. Minimizing $J_m^b(l)$ in Eq.(2) therefore does not necessarily maximize the output SNR; instead, a certain proportion of the speech signal is cancelled. The leakage also causes false fluctuations of the filter coefficients. To improve the system's robustness against these adversities, the updating rate of the adaptive filters should be controlled according to the presence of the desired speech. When the desired speech is present, the update in Eq.(6) should be slowed down. The adaptation speed and steady-state error of the adaptive filter are highly related to the step-size constant [24], but it is very hard to find an optimal step size that guarantees good performance in a general environment. So $\mu$ in Eq.(6) must vary across frequency bands and temporal frames.
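The paper defines the controlled step size formally in Eqs. (10) and (11). As a rough sketch of the idea (average the per-bin SPP inside each subband, then scale a base step size by one minus that average), the following could be used. The band edges, the base step size `mu` and the function name are hypothetical and do not correspond to the Bark bands of the paper.

```python
import numpy as np


def subband_step_sizes(spp, band_edges, mu=0.5):
    """Return the subband SPP p_b and the step size mu_b = (1 - p_b) * mu.

    spp        : (K,) speech presence probability per frequency bin, one frame
    band_edges : bin indices delimiting the subbands, e.g. [0, 2, 6]
    mu         : base step size (assumed value)
    """
    spp = np.asarray(spp, dtype=float)
    p_b = np.array([spp[lo:hi].mean()
                    for lo, hi in zip(band_edges[:-1], band_edges[1:])])
    return p_b, (1.0 - p_b) * mu


# Two toy subbands: speech dominates bins 0-1, interference dominates bins 2-5.
spp_frame = np.array([0.9, 0.9, 0.1, 0.1, 0.1, 0.1])
p_b, mu_b = subband_step_sizes(spp_frame, band_edges=[0, 2, 6])
```

In this example the speech-dominated band receives a step size of 0.05 and the interference-dominated band 0.45, so adaptation nearly freezes where the desired speech is likely present and stays fast elsewhere.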

We propose a time-varying step size controlled by the speech presence probability in each subband, which is derived from the post-filtering described in the previous section:

$$ p^b(l) = \frac{1}{N_b} \sum_{i = i_1, i_2, \ldots, i_{N_b}} P(i,l) \tag{10} $$

in which $N_b$ is the number of frequency bins within the $b$th subband and $i_1, i_2, \ldots, i_{N_b}$ are the indices of the frequency bins within the $b$th subband. The step size is then

$$ \mu^b(l) = \left( 1 - p^b(l) \right) \mu = \left( 1 - \frac{1}{N_b} \sum_{i = i_1, i_2, \ldots, i_{N_b}} P(i,l) \right) \mu \tag{11} $$

where $p^b(l)$ is the signal presence probability derived from the post-filtering, with $0 < p^b(l) < 1$. A greater $p^b(l)$ indicates a high probability that the desired signal exists in the $b$th subband during the $l$th frame. According to Eq.(11), this gives a smaller $\mu^b(l)$, which slows the updates of the adaptive filters and preserves the speech components. A small $p^b(l)$ means that the desired signal is mostly absent, so the updates become fast enough to adapt to the changing nature of the interferences.

III. Evaluations and Discussions

1. Experimental configuration

The microphone array used in this work is composed of four omnidirectional MEMS (Micro-electro-mechanical system) microphones in broadside orientation. The distance between the microphones is 5 cm. The system is implemented at a sampling rate of 8 kHz.

Fig. 2. Configuration of experiments in a room environment

The experiment took place in a 6 m × 5 m × 3 m conference room with a reverberation time of 300 ms, as shown in Fig.2. Two interferences (a competing speaker and a Gaussian white noise source) are located at 90° and 45° relative to the array, respectively. The speech source consists of ten male and ten female TIMIT sentences. The multi-channel clean speech is generated by computer simulation in a virtual room [25] with the same size and reverberation time as the conference room in which the interferences are recorded, so that the clean speech signal can be obtained for objective evaluations.
We then mix the two parts at different global SNR levels (-6 to 6 dB). All the sound sources are 1 m away from the array. For comparison, the multi-channel noisy speech is processed with the six methods listed below.

(1) GSC algorithm in time domain (GSC-TD) [5].
(2) GSC algorithm in frequency domain (GSC-FD) [26].
(3) GSC-FD with modified Blocking matrix (GSC-FD*).
(4) GSC-FD* with Subband-feedback-controlled adaptive filters (GSC-FD*-SFC).
(5) Cohen's algorithm [22].
(6) Proposed algorithm.

2. Objective evaluation measures and results

To evaluate the studied noise reduction methods for speech enhancement, three objective speech quality measures were used: Noise reduction (NR), Log-spectral distance (LSD) and Perceptual evaluation of speech quality (PESQ).

(1) Noise reduction (NR) [22]. This measure compares the noise level in the enhanced signal to the noise level recorded by the first microphone. It is designed to test the system's noise-cancelling ability during non-speech segments.

$$ NR = \frac{1}{|L|} \sum_{l \in L} 10 \log \frac{\sum_{l} fore^2}{\sum_{l} \hat{S}^2} \tag{12} $$

in which $fore$ denotes the signal received by one of the microphones and $\hat{S}$ is the signal estimate. $L$ is the set of frames containing only noise, $|L|$ is its cardinality, and the inner sums run over frame $l$.

(2) Log-spectral distance (LSD) [22], which can be expressed as

$$ LSD = \frac{1}{L} \sum_{l=0}^{L-1} \left\{ \frac{1}{N/2+1} \sum_{k=0}^{N/2} \left[ 10 \log AS(k,l) - 10 \log A\hat{S}(k,l) \right]^2 \right\}^{1/2} \tag{13} $$

where $AS(k,l) = \max(|S(k,l)|^2, \delta)$ is the spectral power, clipped such that the log-spectrum dynamic range is confined to about 50 dB (that is, $\delta = 10^{-50/10} \max_{k,l} \{ |X(k,l)|^2 \}$), and $N$ is the order of the Fast Fourier transform (FFT).

(3) Perceptual evaluation of speech quality (PESQ). This measure is able to predict subjective quality with good correlation in a very wide range of conditions and is specified by the ITU-T as Recommendation P.862 [27]. A higher PESQ score means a higher speech quality of the enhanced signal.

Table 1. PESQ-MOS scores

Input SNR (dB)      -6      -3       0       3       6
Noisy              1.73    1.87    2.02    2.19    2.46
GSC-TD             1.93    2.03    2.13    2.27    2.43
GSC-FD             2.04    2.14    2.26    2.38    2.46
GSC-FD*            2.24    2.42    2.57    2.67    2.72
GSC-FD*-SFC        2.46    2.64    2.76    2.85    2.93
Cohen's method     2.64    2.76    2.85    2.93    3.01
Proposed method    2.66    2.83    2.98    3.11    3.20

The experimental results in real room acoustic conditions are shown in Fig.3. Compared with algorithms (1)-(4), the proposed algorithm shows considerable improvement in terms of noise reduction and LSD under various SNR conditions. Compared with Cohen's algorithm, the noise reduction performance is similar, but the proposed algorithm shows better signal preservation. The results also show that adaptive beamforming with a frequency-domain adaptive filter converges quickly and is better at nulling wideband interferences. We can also see that the modified fixed BM, which uses more spatial information, gains some improvement. The subband feedback controlled method alleviates the problem of signal cancellation in the adaptive beamformer and preserves the desired signal better. To further demonstrate this point, PESQ-MOS is employed, as shown in Table 1.

Fig. 3. Performance comparison in a real room environment under different noise levels among different algorithms: GSC-TD (+), GSC-FD (*), GSC-FD*, GSC-FD*-SFC, Cohen's algorithm and the proposed algorithm

3. Discussions

Based on the experimental results presented in the previous section, the advantages of the proposed noise reduction method over the other traditional methods are discussed in the following paragraphs.

The proposed modified blocking matrix outperforms the traditional GSC blocking matrix for the following reason. In the original GSC beamformer, the BM was implemented by subtracting the observed signals on adjacent sensors, so only limited spatial information was used. In comparison, the modified BM considers the spatial information not only between adjacent sensors but also between other sensor pairs.

The proposed method outperforms the GSC beamformer. The traditional GSC beamformer suffers from signal cancellation because of steering vector errors, reverberation or imperfect microphones. To overcome this problem, we propose a subband feedback controller in which the speech presence probability derived from the post-filtering is fed back to control the adaptive interference canceller of the GSC in each subband. The update of the filter coefficients is slowed down when the desired speech is present, so that the proposed algorithm is more robust to array imperfections and reverberation, both of which let the desired speech leak into the reference channel.
This method leads to better signal preservation and thus improves the algorithm's overall performance. Furthermore, partitioning the signals into subbands effectively converts a wideband signal into a number of narrow-band signals, which makes more effective processing possible. Adaptive beamforming using the frequency-domain NLMS converges faster and nulls wideband interferences better than the time-domain NLMS, especially for larger eigenvalue spreads.

Compared with Cohen's method (GSC with multi-channel post-filtering), the noise reduction performance is similar, because we use a similar speech presence probability-based multi-channel post-filter to suppress the diffuse and transient noises, but our subband feedback controlled method improves signal preservation considerably. As a result, the proposed speech enhancement method provides the highest performance among the studied speech enhancement algorithms under all experimental conditions, as shown in Fig.3 and Table 1. Because the speech presence probability used by the subband feedback controller is obtained from the post-filtering, it adds little computational cost. This method can also be applied to other adaptive noise cancellation or acoustic echo cancellation applications that need careful control of the adaptive filter.

IV. Conclusion

A multi-channel speech enhancement algorithm is proposed. The algorithm consists of three parts: directional noise suppression, which is based on a robust Generalized sidelobe canceller with subband feedback controlled adaptive filters; diffuse noise suppression, which is implemented by multi-channel post-filtering based on speech presence probability; and the interaction of adaptive beamforming and post-filtering through a subband feedback controller.
Experimental results indicate that the subband feedback controller makes the filter adaptation more robust and alleviates the problem of signal cancellation in the adaptive beamformer. The proposed algorithm achieves considerable improvement in signal preservation of the desired speech in adverse noise environments over the comparative algorithms.

References

[1] J. Benesty, J. Chen and Y. Huang, Microphone Array Signal Processing, Berlin, Germany: Springer-Verlag, 2008.
[2] M. Brandstein and D. Ward, Microphone Arrays: Signal Processing Techniques and Applications, Berlin: Springer-Verlag, 2001.
[3] B.D. Van Veen and K.M. Buckley, Beamforming: a versatile approach to spatial filtering, IEEE ASSP Magazine, Vol.5, pp.4-24, 1988.
[4] O.L. Frost, An algorithm for linearly constrained adaptive array processing, Proceedings of the IEEE, Vol.60, No.8, pp.926-935, Aug. 1972.
[5] L.J. Griffiths and C.W. Jim, An alternative approach to linearly constrained adaptive beamforming, IEEE Transactions on Antennas and Propagation, Vol.30, No.1, pp.27-34, Jan. 1982.
[6] B. Widrow, Signal cancellation phenomena in adaptive antennas: causes and cures, IEEE Transactions on Antennas and Propagation, Vol.30, No.3, pp.469-478, 1982.
[7] J.E. Greenberg, Evaluation of an adaptive beamforming method for hearing aids, J. Acoust. Soc. Am., Vol.91, No.3, pp.1662-1675, 1992.
[8] O. Hoshuyama, A. Sugiyama and A. Hirano, A robust adaptive beamformer for microphone arrays with a blocking matrix using constrained adaptive filters, IEEE Transactions on Signal Processing, Vol.47, No.10, pp.2677-2684, 1999.
[9] S. Gannot, D. Burshtein and E. Weinstein, Signal enhancement using beamforming and nonstationarity with applications to speech, IEEE Transactions on Signal Processing, Vol.49, No.8, pp.1614-1626, 2001.
[10] W. Herbordt and W. Kellermann, Analysis of blocking matrix for generalized sidelobe cancellers for non-stationary broadband signals, IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, Florida, USA, Vol.4, pp.IV-4187, May 2002.
[11] W.H. Neo and B. Farhang-Boroujeny, Robust microphone arrays using subband adaptive filters, IEE Proc.-Vis. Image Signal Process., Vol.149, No.1, pp.17-25, 2002.
[12] E. Warsitz, A. Krueger and R. Haeb-Umbach, Speech enhancement with a new generalized eigenvector blocking matrix for application in a generalized sidelobe canceller, IEEE International Conference on Acoustics, Speech, and Signal Processing, Las Vegas, USA, pp.73-76, 2008.
[13] A. Krueger, E. Warsitz and R.
Haeb-Umbach, Speech enhancement with a GSC-like structure employing eigenvector-based transfer function ratios estimation, IEEE Trans. on Audio, Speech, and Language Processing, Vol.19, pp.206-219, Jan. 2011.
[14] R. Zelinski, A microphone array with adaptive post-filtering for noise reduction in reverberant rooms, IEEE International Conference on Acoustics, Speech, and Signal Processing, New York, USA, Vol.5, pp.2578-2581, May 1988.
[15] I.A. McCowan and H. Bourlard, Microphone array post-filter based on noise field coherence, IEEE Transactions on Speech and Audio Processing, Vol.11, No.6, pp.709-716, 2003.
[16] J. Li and M. Akagi, A noise reduction system based on hybrid noise estimation technique and post-filtering in arbitrary noise environments, Speech Communication, Vol.48, No.2, pp.111-126, 2006.
[17] J. Li and M. Akagi, A hybrid microphone array post-filter in a diffuse noise field, Applied Acoustics, Vol.69, No.2, pp.546-557, 2008.
[18] I. Cohen, Multichannel post-filtering in nonstationary noise environments, IEEE Transactions on Signal Processing, Vol.52, No.5, pp.1149-1160, 2004.
[19] G.F. Franklin, J.D. Powell and A. Emami-Naeini, Feedback Control of Dynamic Systems, Addison-Wesley, Reading, MA, 1994.
[20] B. Widrow, Adaptive noise cancelling: principles and applications, Proceedings of the IEEE, Vol.63, pp.1692-1716, 1975.
[21] H. Zhang, Q. Fu and Y. Yan, Speech enhancement using compact microphone array and applications in distant speech acquisition, Chinese Journal of Electronics, Vol.18, No.3, pp.481-486, July 2009.
[22] I. Cohen, Analysis of two-channel generalized sidelobe canceller (GSC) with post-filtering, IEEE Transactions on Speech and Audio Processing, Vol.11, No.6, pp.684-699, 2003.
[23] Y. Ephraim and D. Malah, Speech enhancement using a minimum mean-square error log-spectral amplitude estimator, IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol.33, No.2, pp.443-445, 1985.
[24] A. Mader, H. Puder and G.U.
Schmidt, Step-size control for acoustic echo cancellation filters - an overview, Signal Processing, Vol.80, pp.1697-1719, 2000.
[25] J.B. Allen and D.A. Berkley, Image method for efficiently simulating small-room acoustics, J. Acoust. Soc. Am., Vol.65, No.4, pp.943-950, Apr. 1979.
[26] Y.H. Chen and H.D. Fang, Frequency-domain implementation of Griffiths-Jim adaptive beamformer, J. Acoust. Soc. Am., Vol.91, No.6, pp.3354-3366, 1992.
[27] A.W. Rix, J.G. Beerends, M.P. Hollier and A.P. Hekstra, Perceptual evaluation of speech quality (PESQ) - a new method for speech quality assessment of telephone networks and codecs, IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol.2, pp.749-752, 2001.

LI Kai received the B.E. degree from the Electronic Engineering Department of Wuhan University in 2007. He is currently a Ph.D. candidate at the Institute of Acoustics, Chinese Academy of Sciences. His research interests include single- and multi-channel speech enhancement, microphone array signal processing and distant-talking speech recognition. (Email: likai@hccl.ioa.ac.cn)

FU Qiang received the B.E. degree from the Xi'an Technological University, Xi'an, China, in 1994, the M.S. degree in electronic engineering from Chongqing University of Posts and Telecommunications, Chongqing, China, in 1997, and the Ph.D. degree in electronic engineering from Xidian University, Xi'an, in 2000. In 2000, he worked as a Researcher at the Motorola China Research Center (MSRC), Shanghai, China. From 2001 to 2002, he worked as a Senior Research Associate at the Center for Spoken Language Understanding (CSLU), OGI School of Science and Engineering at Oregon Health & Science University, Oregon, USA. From 2002 to 2004, he worked as a Senior Postdoctoral Research Fellow in the Department of Electric and Computer Engineering, University of Limerick, Ireland. He is currently an Associate Professor at the Institute of Acoustics, Chinese Academy of Sciences, China.
His research interests include speech analysis, microphone array processing and audio-visual signal processing. Dr. Fu is a member of the IEEE Signal Processing Society.

YAN Yonghong received the B.E. degree from the Electronic Engineering Department of Tsinghua University in 1990, and the Ph.D. degree in Computer Science and Engineering from the Oregon Graduate Institute of Science and Engineering in 1995. From 1995 to 1998, he worked at OGI as an Assistant Professor, Associate Director and Associate Professor of the Center for Spoken Language Understanding. From 1998 to 2001, he worked as the Principal Engineer of the Intel Microprocessors Research Lab, and as Director and Chief Scientist of the Intel China Research Center. In 2002, he returned to China to work for the Chinese Academy of Sciences. He is a professor and the director of the Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences. His research interests include large vocabulary speech recognition, speaker/language recognition and audio signal processing. He has published more than 100 papers and holds 40 patents.