An Integrated Real-Time Beamforming and Postfiltering System for Nonstationary Noise Environments

EURASIP Journal on Applied Signal Processing. © Hindawi Publishing Corporation

Israel Cohen
Department of Electrical Engineering, Technion, Israel Institute of Technology, Haifa, Israel
Email: icohen@ee.technion.ac.il

Sharon Gannot
School of Engineering, Bar-Ilan University, Ramat-Gan 59, Israel
Email: gannot@siglab.technion.ac.il

Baruch Berdugo
Lamar Signal Processing Ltd., Andrea Electronics Corp., PO Box 57, Yokneam Ilit 69, Israel
Email: bberdugo@lamar.co.il

Received September and in revised form 6 March

We present a novel approach for real-time multichannel speech enhancement in environments of nonstationary noise and time-varying acoustical transfer functions (ATFs). The proposed system integrates adaptive beamforming, ATF identification, soft signal detection, and multichannel postfiltering. The noise canceller branch of the beamformer and the ATF identification are adaptively updated online, based on hypothesis test results. The noise canceller is updated only during stationary noise frames, and the ATF identification is carried out only when desired source components have been detected. The hypothesis testing is based on the nonstationarity of the signals and the transient power ratio between the beamformer primary output and its reference noise signals. Following the beamforming and the hypothesis testing, estimates for the signal presence probability and for the noise power spectral density are derived. Subsequently, an optimal spectral gain function that minimizes the mean square error of the log-spectral amplitude (LSA) is applied. Experimental results demonstrate the usefulness of the proposed system in nonstationary noise environments.

Keywords and phrases: array signal processing, signal detection, acoustic noise measurement, speech enhancement, spectral analysis, adaptive signal processing.

1. INTRODUCTION

Postfiltering methods for multimicrophone speech enhancement algorithms have recently attracted increased interest. It is well known that beamforming methods yield a significant improvement in speech quality []. However, when the noise field is spatially incoherent or diffuse, the noise reduction is insufficient and additional postfiltering is normally required []. Most multimicrophone speech enhancement methods comprise a multichannel part (either a delay-sum beamformer or a generalized sidelobe canceller (GSC) []) followed by a postfilter, which is based on Wiener filtering (sometimes in conjunction with spectral subtraction). Numerous articles have been published on that subject, for example, [ 5 6 7 8 9 ], to mention just a few.

A major drawback of these multichannel postfiltering techniques is that highly nonstationary noise components are not dealt with. The time variation of the interfering signals is assumed to be sufficiently slow that the postfilter can track and adapt to the changes in the noise statistics. Unfortunately, transient interferences are often much too brief and abrupt for the conventional tracking methods.

Recently, a multichannel postfilter was incorporated into the GSC beamformer [ ]. The use of both the beamformer primary output and the reference noise signals (resulting from the blocking branch of the GSC) for distinguishing between desired speech transients and interfering transients enables the algorithm to work in nonstationary noise environments. In [5], the multichannel postfilter is combined with the transfer function GSC (TF-GSC) [6] and compared with single-microphone postfilters, namely, the mixture-maximum (MIXMAX) [7] and the optimally modified log-spectral amplitude (OM-LSA) estimator [8]. The multichannel postfilter combined with the TF-GSC proved the best for handling abrupt noise spectral variations. However, in all past contributions, the beamformer stage feeds the postfilter, but the converse is not true. The decisions made by the postfilter, distinguishing between speech, stationary noise, and transient noise, might be fed back to the beamformer to enable the use of the method in real-time applications. Exploiting this information will also enable the tracking of changes in the acoustical transfer functions (ATFs) caused by talker movements.

In this paper, we present a real-time multichannel speech enhancement system which integrates adaptive beamforming and multichannel postfiltering. The beamformer is based on the TF-GSC. However, the requirement for the stationarity of the noise is relaxed. Furthermore, we allow the ATFs to vary in time, which entails an online system identification procedure. We define hypotheses that indicate either the absence of transients, the presence of an interfering transient, or the presence of desired source components (the stationary noise persists in all cases). The noise canceller branch of the beamformer is updated only during the absence of transients, and the ATF identification is carried out only when desired source components are present. Following the beamforming and the hypothesis testing, estimates for the signal presence probability and for the noise power spectral density (PSD) are derived. Subsequently, an optimal spectral gain function that minimizes the mean square error of the log-spectral amplitude (LSA) is applied.

The performance of the proposed system is evaluated under nonstationary noise conditions and compared to that obtained with a single-channel postfiltering approach. We show that single-channel postfiltering is inefficient at attenuating highly nonstationary noise components, since it lacks the ability to differentiate such components from the desired source components. By contrast, the proposed system achieves a significantly reduced level of background noise, whether stationary or not, without further distorting the signal components.

The paper is organized as follows. In Section 2, we introduce a novel approach for real-time beamforming in nonstationary noise environments under the circumstances of time-varying ATFs. The noise canceller branch of the beamformer and the ATF identification are adaptively updated online, based on hypothesis test results. In Section 3, the problem of hypothesis testing in the time-frequency plane is addressed. Signal components are detected and discriminated from the transient noise components based on the transient power ratio between the beamformer primary output and its reference noise signals. In Section 4, we introduce the multichannel postfilter and outline the implementation steps of the integrated TF-GSC and multichannel postfiltering algorithm. Finally, in Section 5, we evaluate the proposed system and present experimental results which validate its usefulness.

2. TRANSFER FUNCTION GENERALIZED SIDELOBE CANCELLING

Let $x(t)$ denote a desired speech source signal that, subject to some acoustic propagation, is received by $M$ microphones along with additive uncorrelated interfering signals. The interference at the $i$th sensor comprises a pseudostationary noise signal $d_{is}(t)$ and a transient noise component $d_{it}(t)$. The observed signals are given by

$$z_i(t) = a_i(t) * x(t) + d_{is}(t) + d_{it}(t), \quad i = 1, \ldots, M, \tag{1}$$

where $a_i(t)$ is the impulse response of the $i$th sensor to the desired source, and $*$ denotes convolution. Using the short-time Fourier transform (STFT), we have

$$\mathbf{Z}(k,\ell) = \mathbf{A}(k,\ell)\,X(k,\ell) + \mathbf{D}_s(k,\ell) + \mathbf{D}_t(k,\ell) \tag{2}$$

in the time-frequency domain, where $k$ represents the frequency bin index, $\ell$ the frame index, and

$$\begin{aligned}
\mathbf{Z}(k,\ell) &\triangleq \left[ Z_1(k,\ell) \;\; Z_2(k,\ell) \;\; \cdots \;\; Z_M(k,\ell) \right]^T, \\
\mathbf{A}(k,\ell) &\triangleq \left[ A_1(k,\ell) \;\; A_2(k,\ell) \;\; \cdots \;\; A_M(k,\ell) \right]^T, \\
\mathbf{D}_s(k,\ell) &\triangleq \left[ D_{1s}(k,\ell) \;\; D_{2s}(k,\ell) \;\; \cdots \;\; D_{Ms}(k,\ell) \right]^T, \\
\mathbf{D}_t(k,\ell) &\triangleq \left[ D_{1t}(k,\ell) \;\; D_{2t}(k,\ell) \;\; \cdots \;\; D_{Mt}(k,\ell) \right]^T.
\end{aligned} \tag{3}$$

The observed noisy signals are processed by the system shown in Figure 1. This structure is a modification to the recently proposed TF-GSC [6], which is an extension of the linearly constrained adaptive beamformer [ 9] for arbitrary ATFs $\mathbf{A}(k,\ell)$. In [6], transient interferences are not dealt with, since signal enhancement is based on the nonstationarity of the desired source signal contrasted with the stationarity of the noise signal. As such, the ATF estimation was conducted in an offline manner. Here, the requirement for the stationarity of the noise is relaxed, so a mechanism for discriminating interfering transients from desired signal components must be included. Furthermore, in contrast to the assumption of time-invariant ATFs in [6], we allow time-varying ATFs, provided that their change rate is slow in comparison to that of the speech statistics. This entails online adaptive estimates for the ATFs.

The beamformer comprises three parts: a fixed beamformer $\mathbf{W}$, which aligns the desired signal components; a blocking matrix $\mathbf{B}$, which blocks the desired components, thus yielding the reference noise signals $\{U_i : 2 \le i \le M\}$; and a multichannel adaptive noise canceller $\{H_i : 2 \le i \le M\}$, which eliminates the stationary noise that leaks through the sidelobes of the fixed beamformer. The reference noise signals $\mathbf{U}(k,\ell) = [U_2(k,\ell) \;\; U_3(k,\ell) \;\; \cdots \;\; U_M(k,\ell)]^T$ are generated by applying the blocking matrix to the observed signal vector:

$$\mathbf{U}(k,\ell) = \mathbf{B}^H(k,\ell)\,\mathbf{Z}(k,\ell) = \mathbf{B}^H(k,\ell)\left[ \mathbf{A}(k,\ell)X(k,\ell) + \mathbf{D}_s(k,\ell) + \mathbf{D}_t(k,\ell) \right]. \tag{4}$$

The reference noise signals are emphasized by the adaptive noise canceller and subtracted from the output of the fixed beamformer, yielding

$$Y(k,\ell) = \left[ \mathbf{W}^H(k,\ell) - \mathbf{H}^H(k,\ell)\,\mathbf{B}^H(k,\ell) \right] \mathbf{Z}(k,\ell). \tag{5}$$
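As an illustration of the two relations above for the reference signals $\mathbf{U}$ and the beamformer output $Y$, the following sketch evaluates them for a single time-frequency bin using only the Python standard library. The helper names (`hermitian_apply`, `gsc_output`), the three-microphone numbers, and the particular blocking matrix (built from ATF ratios so that $\mathbf{B}^H\mathbf{A} = 0$) are assumptions chosen for illustration, not values from the paper.

```python
def hermitian_apply(matrix, vector):
    """Return matrix^H @ vector; `matrix` is a list of rows of complex numbers."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return [
        sum(matrix[r][c].conjugate() * vector[r] for r in range(n_rows))
        for c in range(n_cols)
    ]


def gsc_output(z, w, b, h):
    """Single-bin GSC output Y = W^H Z - H^H U, with U = B^H Z."""
    u = hermitian_apply(b, z)                                   # reference noise signals
    fixed = sum(wi.conjugate() * zi for wi, zi in zip(w, z))    # fixed beamformer branch
    leakage = sum(hi.conjugate() * ui for hi, ui in zip(h, u))  # noise canceller branch
    return fixed - leakage, u


# Hypothetical 3-microphone bin; a[0] = 1 makes microphone 1 the reference.
a = [1 + 0j, 0.5 + 0.5j, -0.25j]
norm_sq = sum(abs(ai) ** 2 for ai in a)
w = [ai / norm_sq for ai in a]                  # fixed beamformer with W^H A = 1
b = [[-a[1].conjugate(), -a[2].conjugate()],    # assumed blocking matrix, B^H A = 0
     [1 + 0j, 0j],
     [0j, 1 + 0j]]
h = [0.3 - 0.1j, -0.2j]                         # arbitrary noise canceller state

x = 2 - 1j                                      # desired source coefficient
d = [0.1 + 0j, -0.2j, 0.05 + 0j]                # additive noise at the microphones
z_clean = [ai * x for ai in a]
z_noisy = [zi + di for zi, di in zip(z_clean, d)]

y_clean, u_clean = gsc_output(z_clean, w, b, h)
y_noisy, u_noisy = gsc_output(z_noisy, w, b, h)
```

With a perfect blocking matrix the noise-free input leaves the references empty, so `y_clean` equals `x` whatever the state of the noise canceller; with noise present, the references contain noise only and the canceller alone decides how much of it is subtracted.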

[Figure 1: Block diagram of the TF-GSC.]

In (5), $\mathbf{H}(k,\ell) = [H_2(k,\ell) \;\; H_3(k,\ell) \;\; \cdots \;\; H_M(k,\ell)]^T$ collects the noise canceller filters. It is worth mentioning that a perfect blocking matrix implies $\mathbf{B}^H(k,\ell)\mathbf{A}(k,\ell) = 0$. In that case, $\mathbf{U}(k,\ell)$ indeed contains only noise components:

$$\mathbf{U}(k,\ell) = \mathbf{B}^H(k,\ell)\left[ \mathbf{D}_s(k,\ell) + \mathbf{D}_t(k,\ell) \right]. \tag{6}$$

In general, however, $\mathbf{B}^H(k,\ell)\mathbf{A}(k,\ell) \neq 0$; thus desired signal components may leak into the noise reference signals.

Let three hypotheses $H_{0s}$, $H_{0t}$, and $H_1$ indicate, respectively, the absence of transients, the presence of an interfering transient, and the presence of a desired source transient at the beamformer output. The optimal solution for the filters $\mathbf{H}(k,\ell)$ is obtained by minimizing the power of the beamformer output during the stationary noise frames (i.e., when $H_{0s}$ is true) []. Let $\Phi_{D_sD_s}(k,\ell) = E\{\mathbf{D}_s(k,\ell)\mathbf{D}_s^H(k,\ell)\}$ denote the PSD matrix of the input stationary noise. Then, the power of the stationary noise at the beamformer output is minimized by solving the unconstrained optimization problem

$$\min_{\mathbf{H}(k,\ell)} \left\{ \left[ \mathbf{W}(k,\ell) - \mathbf{B}(k,\ell)\mathbf{H}(k,\ell) \right]^H \Phi_{D_sD_s}(k,\ell) \left[ \mathbf{W}(k,\ell) - \mathbf{B}(k,\ell)\mathbf{H}(k,\ell) \right] \right\}. \tag{7}$$

A multichannel Wiener solution is given by []

$$\mathbf{H}(k,\ell) = \left[ \mathbf{B}^H(k,\ell)\,\Phi_{D_sD_s}(k,\ell)\,\mathbf{B}(k,\ell) \right]^{-1} \mathbf{B}^H(k,\ell)\,\Phi_{D_sD_s}(k,\ell)\,\mathbf{W}(k,\ell). \tag{8}$$

In practice, this optimization problem is solved by using the normalized least mean squares (LMS) algorithm []:

$$\mathbf{H}(k,\ell+1) = \begin{cases} \mathbf{H}(k,\ell) + \dfrac{\mu_h}{P_{\mathrm{est}}(k,\ell)}\,\mathbf{U}(k,\ell)\,Y^*(k,\ell), & \text{if } H_{0s} \text{ is true}, \\[2mm] \mathbf{H}(k,\ell), & \text{otherwise}, \end{cases} \tag{9}$$

where

$$P_{\mathrm{est}}(k,\ell) = \begin{cases} \alpha_p P_{\mathrm{est}}(k,\ell-1) + (1-\alpha_p)\,\|\mathbf{U}(k,\ell)\|^2, & \text{if } H_{0s} \text{ is true}, \\ P_{\mathrm{est}}(k,\ell-1), & \text{otherwise}, \end{cases} \tag{10}$$

represents the power of the noise reference signals, $\mu_h$ is a step factor that regulates the convergence rate, and $\alpha_p$ is a smoothing parameter.

The fixed beamformer implements the alignment of the desired signal by applying a matched filter to the ATF ratios [6]:

$$\mathbf{W}(k,\ell) \triangleq \frac{\tilde{\mathbf{A}}(k,\ell)}{\|\tilde{\mathbf{A}}(k,\ell)\|^2}, \tag{11}$$

where

$$\tilde{\mathbf{A}}(k,\ell) \triangleq \frac{\mathbf{A}(k,\ell)}{A_1(k,\ell)} = \left[ 1 \;\; \frac{A_2(k,\ell)}{A_1(k,\ell)} \;\; \cdots \;\; \frac{A_M(k,\ell)}{A_1(k,\ell)} \right]^T \triangleq \left[ 1 \;\; \tilde{A}_2(k,\ell) \;\; \cdots \;\; \tilde{A}_M(k,\ell) \right]^T \tag{12}$$

denotes ATF ratios, with $A_1(k,\ell)$ chosen arbitrarily as the reference ATF.

The blocking matrix $\mathbf{B}$ is aimed at eliminating the desired signal and constructing reference noise signals. A proper (but not unique) choice of the blocking matrix is given by [6]

$$\mathbf{B}(k,\ell) = \begin{bmatrix} -\tilde{A}_2^*(k,\ell) & -\tilde{A}_3^*(k,\ell) & \cdots & -\tilde{A}_M^*(k,\ell) \\ 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}. \tag{13}$$

Hence, for implementing both the fixed beamformer and the blocking matrix, we need to estimate the ATF ratios. In contrast to previous works [ 5 6], the system identification should be incorporated into the adaptive procedure, since the ATFs are time varying. In [6], the system identification procedure is based on the nonstationarity of the desired signal. Here, a modified version is introduced, employing the already available time-frequency analysis of the beamformer and the decisions made by hypothesis testing.

From (4) and (13), we have the following input-output relation between $Z_1(k,\ell)$ and $Z_i(k,\ell)$:

$$Z_i(k,\ell) = \tilde{A}_i(k,\ell)\,Z_1(k,\ell) + U_i(k,\ell), \quad i = 2, \ldots, M. \tag{14}$$

Accordingly,

$$\phi_{Z_iZ_1}(k,\ell) = \tilde{A}_i(k,\ell)\,\phi_{Z_1Z_1}(k,\ell) + \phi_{U_iZ_1}(k,\ell), \quad i = 2, \ldots, M, \tag{15}$$

where $\phi_{Z_iZ_1}(k,\ell) = E\{Z_i(k,\ell)Z_1^*(k,\ell)\}$ is the cross PSD between $z_i(t)$ and $z_1(t)$, and $\phi_{U_iZ_1}(k,\ell)$ is the cross PSD between $u_i(t)$ and $z_1(t)$. The use of standard system identification methods is inapplicable, since the interference signal $u_i(t)$ is strongly correlated to the system input $z_1(t)$. However, when hypothesis $H_1$ is true, that is, when transient noise is absent, the cross PSD $\phi_{U_iZ_1}(k,\ell)$ becomes stationary. Therefore, $\phi_{U_iZ_1}(k,\ell)$ may be replaced with $\phi_{U_iZ_1}(k)$.

For estimating the ATF ratios $\tilde{\mathbf{A}}(k,\ell)$, we need to collect several estimates of the PSD $\phi_{\mathbf{Z}Z_1}(k,\ell)$, each of which is based on averaging several frames. Let a segment define a concatenation of $N$ frames for which the hypothesis $H_1$ is true, and let an interval contain $R$ such segments. Then, the PSD estimation in each segment $r$ ($r = 1, \ldots, R$) is obtained by averaging the periodograms over $N$ frames:

$$\hat{\phi}^{(r)}_{\mathbf{Z}Z_1}(k,\ell) = \frac{1}{N} \sum_{\ell \in \mathcal{L}_r} \mathbf{Z}(k,\ell)\,Z_1^*(k,\ell), \tag{16}$$

where $\mathcal{L}_r$ represents the set of frames that belong to the $r$th segment. Denoting by $\varepsilon^{(r)}_i(k,\ell) = \hat{\phi}^{(r)}_{U_iZ_1}(k,\ell) - \phi_{U_iZ_1}(k)$ the estimation error of the cross PSD between $u_i(t)$ and $z_1(t)$ in the $r$th segment, (15) implies that

$$\hat{\phi}^{(r)}_{Z_iZ_1}(k,\ell) = \tilde{A}_i(k,\ell)\,\hat{\phi}^{(r)}_{Z_1Z_1}(k,\ell) + \phi_{U_iZ_1}(k) + \varepsilon^{(r)}_i(k,\ell), \quad i = 2, \ldots, M, \; r = 1, \ldots, R. \tag{17}$$

The least squares (LS) solution to this overdetermined set of equations is given by [6]

$$\tilde{\mathbf{A}}(k,\ell) = \frac{\left\langle \hat{\phi}_{Z_1Z_1}(k,\ell)\,\hat{\phi}_{\mathbf{Z}Z_1}(k,\ell) \right\rangle - \left\langle \hat{\phi}_{Z_1Z_1}(k,\ell) \right\rangle \left\langle \hat{\phi}_{\mathbf{Z}Z_1}(k,\ell) \right\rangle}{\left\langle \hat{\phi}^2_{Z_1Z_1}(k,\ell) \right\rangle - \left\langle \hat{\phi}_{Z_1Z_1}(k,\ell) \right\rangle^2}, \tag{18}$$

where the average operation on $\beta(k,\ell)$ is defined by

$$\left\langle \beta(k,\ell) \right\rangle \triangleq \frac{1}{R} \sum_{r=1}^{R} \beta^{(r)}(k,\ell). \tag{19}$$

Practically, the estimates for $\hat{\phi}^{(r)}_{\mathbf{Z}Z_1}(k,\ell)$ ($r = 1, \ldots, R$) are recursively obtained as follows. In each time-frequency bin $(k,\ell)$, we assume that $R$ PSD estimates are already available (excluding initial conditions). Values of $\tilde{\mathbf{A}}(k,\ell)$ are thus ready for use in the next frame $(k,\ell+1)$. Frames for which hypothesis $H_1$ is true are collected for obtaining a new PSD estimate $\hat{\phi}^{(R+1)}_{\mathbf{Z}Z_1}(k,\ell)$:

$$\hat{\phi}^{(R+1)}_{\mathbf{Z}Z_1}(k,\ell+1) = \hat{\phi}^{(R+1)}_{\mathbf{Z}Z_1}(k,\ell) + \frac{1}{N}\,\mathbf{Z}(k,\ell)\,Z_1^*(k,\ell). \tag{20}$$

A counter $n_k$ is employed for counting the number of times (20) is processed (counting the number of $H_1$ frames in frequency bin $k$). Whenever $n_k$ reaches $N$, the estimate in segment $R+1$ is stacked into the previous estimates, the oldest estimate ($r = 1$) is discarded, and $n_k$ is initialized. The new $R$ estimates are then used for obtaining a new estimate for the ATF ratios $\tilde{\mathbf{A}}(k,\ell+1)$ for the next bin $(k,\ell+1)$. This procedure is active for all frames $\ell$, enabling a real-time tracking of the beamformer.

Altogether, an interval containing $N \times R$ frames for which $H_1$ is true is used for obtaining an estimate for $\tilde{\mathbf{A}}(k,\ell)$. Special attention should be given to choosing this quantity. On the one hand, it should be long enough for stabilizing the solution. On the other hand, it should be short enough for the ATF quasistationarity assumption to hold during the interval. We note that for frequency bins with low speech content, the interval (observation time) required for obtaining an estimate for $\tilde{\mathbf{A}}(k,\ell)$ might be very long, since only frames for which $H_1$ is true are collected.

3. HYPOTHESIS TESTING

Generally, the TF-GSC output comprises three components: a nonstationary desired source component, a pseudostationary noise component, and a transient interference. Our objective is to determine which category a given time-frequency bin belongs to, based on the beamformer output and the reference signals. Clearly, if transients have not been detected at the beamformer output and the reference signals, we can accept hypothesis $H_{0s}$. In case a transient is detected at the beamformer output but not at the reference signals, the transient is likely a source component, and therefore we determine that $H_1$ is true. On the contrary, a transient that is detected at one of the reference signals but not at the beamformer output is likely an interfering component, which implies that $H_{0t}$ is true. In case a transient is simultaneously detected at the beamformer output and at one of the reference signals, a further test is required, which involves the ratio between the transient power at the beamformer output and the transient power at the reference signals.

Let $\mathcal{S}$ be a smoothing operator in the PSD,

$$\mathcal{S}Y(k,\ell) = \alpha_s \cdot \mathcal{S}Y(k,\ell-1) + (1-\alpha_s) \sum_{i=-w}^{w} b_i\,|Y(k-i,\ell)|^2, \tag{21}$$

where $\alpha_s$ ($0 \le \alpha_s \le 1$) is a forgetting factor for the smoothing in time, and $b$ is a normalized window function ($\sum_{i=-w}^{w} b_i = 1$) that determines the order of smoothing in frequency. Let $\mathcal{M}$ denote an estimator for the PSD of the background pseudostationary noise, derived using the minima controlled recursive averaging approach [8 ]. The decision rules for detecting transients at the TF-GSC output and reference signals are

$$\Lambda_Y(k,\ell) \triangleq \frac{\mathcal{S}Y(k,\ell)}{\mathcal{M}Y(k,\ell)} > \Lambda_0, \tag{22}$$

$$\Lambda_U(k,\ell) \triangleq \max_{2 \le i \le M} \left\{ \frac{\mathcal{S}U_i(k,\ell)}{\mathcal{M}U_i(k,\ell)} \right\} > \Lambda_1, \tag{23}$$

respectively, where $\Lambda_Y$ and $\Lambda_U$ denote measures of the local nonstationarities (LNS), and $\Lambda_0$ and $\Lambda_1$ are the corresponding threshold values for detecting transients []. The transient beam-to-reference ratio (TBRR) is defined by the ratio between the transient power of the beamformer output and the transient power of the strongest reference signal:

$$\Omega(k,\ell) = \frac{\mathcal{S}Y(k,\ell) - \mathcal{M}Y(k,\ell)}{\max_{2 \le i \le M} \left\{ \mathcal{S}U_i(k,\ell) - \mathcal{M}U_i(k,\ell) \right\}}. \tag{24}$$

Transient signal components are relatively strong at the beamformer output, whereas transient noise components are relatively strong at one of the reference signals. Hence, we expect $\Omega(k,\ell)$ to be large for signal transients and small for noise transients. Assuming that there exist thresholds $\Omega_{\mathrm{high}}(k)$ and $\Omega_{\mathrm{low}}(k)$ such that

$$\Omega(k,\ell)\big|_{H_{0t}} \le \Omega_{\mathrm{low}}(k) \le \Omega_{\mathrm{high}}(k) \le \Omega(k,\ell)\big|_{H_1}, \tag{25}$$

the decision rule for differentiating desired signal components from the transient interference components is

$$\begin{aligned}
H_{0t} &: \; \gamma_s(k,\ell) \le 1 \;\text{ or }\; \Omega(k,\ell) \le \Omega_{\mathrm{low}}(k), \\
H_1 &: \; \gamma_s(k,\ell) \ge \gamma_0 \;\text{ and }\; \Omega(k,\ell) \ge \Omega_{\mathrm{high}}(k), \\
H_r &: \; \text{otherwise},
\end{aligned} \tag{26}$$

where

$$\gamma_s(k,\ell) \triangleq \frac{|Y(k,\ell)|^2}{\mathcal{M}Y(k,\ell)} \tag{27}$$

represents the a posteriori SNR at the beamformer output with respect to the pseudostationary noise, $\gamma_0$ denotes a constant satisfying $\mathcal{P}(\gamma_s(k,\ell) \ge \gamma_0 \mid H_{0s}) < \epsilon$ for a certain significance level $\epsilon$, and $H_r$ designates a reject option where the conditional error of making a decision between $H_{0t}$ and $H_1$ is high.

[Figure 2: Block diagram for the hypothesis testing.]

Figure 2 summarizes a block diagram for the hypothesis testing. The hypothesis testing is carried out in the time-frequency plane for each frame and frequency bin. Hypothesis $H_{0s}$ is accepted when transients have been detected neither at the beamformer output nor at the reference signals. In case a transient is detected at the beamformer output but not at the reference signals, we accept $H_1$. On the other hand, if a transient is detected at one of the reference signals but not at the beamformer output, we accept $H_{0t}$. In case a transient is detected simultaneously at the beamformer output and at one of the reference signals, we compute the TBRR $\Omega(k,\ell)$ and the a posteriori SNR at the beamformer output with respect to the pseudostationary noise $\gamma_s(k,\ell)$, and decide on the hypothesis according to (26).

4. MULTICHANNEL POSTFILTERING

In this section, we address the problem of estimating the time-varying PSD of the TF-GSC output noise and present the multichannel postfiltering technique. Figure 3 describes a block diagram of the multichannel postfiltering. Following the hypothesis testing, an estimate $\hat{q}(k,\ell)$ for the a priori signal absence probability is produced. Subsequently, we derive an estimate $p(k,\ell) \triangleq \mathcal{P}(H_1 \mid Y, \mathbf{U})$ for the signal presence probability and an estimate $\hat{\lambda}_d(k,\ell)$ for the noise PSD.
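The hypothesis test whose outcome feeds this postfilter can be sketched as a small per-bin decision function. This is a minimal sketch under stated assumptions: the function and argument names are hypothetical, the labels are plain strings, the unit threshold on the a posteriori SNR in the interfering-transient branch is assumed, and the threshold values in the usage line are placeholders rather than the paper's settings.

```python
def classify_bin(lns_y, lns_u, tbrr, gamma_s, *,
                 lns_y_thr, lns_u_thr, tbrr_low, tbrr_high, gamma_0):
    """Return 'H0s', 'H0t', 'H1' or 'Hr' for one time-frequency bin.

    lns_y, lns_u : local nonstationarity of the beamformer output and of the
                   strongest reference signal
    tbrr         : transient beam-to-reference ratio
    gamma_s      : a posteriori SNR w.r.t. the pseudostationary noise
    """
    transient_at_output = lns_y > lns_y_thr
    transient_at_refs = lns_u > lns_u_thr

    if not transient_at_output and not transient_at_refs:
        return "H0s"   # stationary noise only
    if transient_at_output and not transient_at_refs:
        return "H1"    # transient seen only at the output: desired source
    if transient_at_refs and not transient_at_output:
        return "H0t"   # transient seen only at the references: interference

    # Transient at both: decide from the TBRR and the a posteriori SNR.
    if gamma_s <= 1.0 or tbrr <= tbrr_low:    # unit SNR threshold is assumed
        return "H0t"
    if gamma_s >= gamma_0 and tbrr >= tbrr_high:
        return "H1"
    return "Hr"        # reject option: soft decision is made downstream


# Placeholder thresholds for illustration only.
thresholds = dict(lns_y_thr=1.5, lns_u_thr=1.5,
                  tbrr_low=1.0, tbrr_high=3.0, gamma_0=4.0)
label = classify_bin(2.0, 2.0, tbrr=2.0, gamma_s=10.0, **thresholds)
```

Returning a label rather than acting on it keeps the test separate from its three consumers: the noise canceller update, the ATF identification, and the a priori signal absence probability.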

[Figure 3: Block diagram of the multichannel postfiltering.]

Finally, spectral enhancement of the beamformer output is achieved by applying the OM-LSA gain function [8], which minimizes the mean square error of the LSA under signal presence uncertainty.

Based on a Gaussian statistical model [], the signal presence probability is given by

$$p(k,\ell) = \left\{ 1 + \frac{q(k,\ell)}{1 - q(k,\ell)} \left( 1 + \xi(k,\ell) \right) \exp\left( -\upsilon(k,\ell) \right) \right\}^{-1}, \tag{28}$$

where $\xi(k,\ell) \triangleq \lambda_x(k,\ell)/\lambda_d(k,\ell)$ is the a priori SNR, $\lambda_d(k,\ell)$ is the noise PSD at the beamformer output, $\upsilon(k,\ell) \triangleq \gamma(k,\ell)\xi(k,\ell)/(1 + \xi(k,\ell))$, and $\gamma(k,\ell) \triangleq |Y(k,\ell)|^2/\lambda_d(k,\ell)$ is the a posteriori SNR. The a priori signal absence probability $\hat{q}(k,\ell)$ is set to 1 if signal absence hypotheses ($H_{0s}$ or $H_{0t}$) are accepted, and is set to 0 if the signal presence hypothesis ($H_1$) is accepted. In case of the reject hypothesis $H_r$, a soft signal detection is accomplished by letting $\hat{q}(k,\ell)$ be inversely proportional to $\Omega(k,\ell)$ and $\gamma_s(k,\ell)$:

$$\hat{q}(k,\ell) = \max\left\{ \frac{\gamma_0 - \gamma_s(k,\ell)}{\gamma_0 - 1}, \; \frac{\Omega_{\mathrm{high}} - \Omega(k,\ell)}{\Omega_{\mathrm{high}} - \Omega_{\mathrm{low}}} \right\}. \tag{29}$$

The a priori SNR is estimated by [8]

$$\hat{\xi}(k,\ell) = \alpha\,G^2_{H_1}(k,\ell-1)\,\gamma(k,\ell-1) + (1-\alpha)\max\left\{ \gamma(k,\ell) - 1, \; 0 \right\}, \tag{30}$$

where $\alpha$ is a weighting factor that controls the trade-off between noise reduction and signal distortion, and

$$G_{H_1}(k,\ell) \triangleq \frac{\xi(k,\ell)}{1 + \xi(k,\ell)} \exp\left( \frac{1}{2} \int_{\upsilon(k,\ell)}^{\infty} \frac{e^{-t}}{t}\,dt \right) \tag{31}$$

is the spectral gain function of the LSA estimator when the signal is surely present [].

An estimate for the noise PSD is obtained by recursively averaging past spectral power values of the noisy measurement, using a time-varying frequency-dependent smoothing parameter. The recursive averaging is given by

$$\hat{\lambda}_d(k,\ell+1) = \tilde{\alpha}_d(k,\ell)\,\hat{\lambda}_d(k,\ell) + \beta \left[ 1 - \tilde{\alpha}_d(k,\ell) \right] |Y(k,\ell)|^2, \tag{32}$$

where the smoothing parameter $\tilde{\alpha}_d(k,\ell)$ is determined by the signal presence probability $p(k,\ell)$,

$$\tilde{\alpha}_d(k,\ell) \triangleq \alpha_d + (1 - \alpha_d)\,p(k,\ell), \tag{33}$$

and $\beta$ is a factor that compensates the bias when the signal is absent. The constant $\alpha_d$ ($0 < \alpha_d < 1$) represents the minimal smoothing parameter value. The smoothing parameter is close to 1 when the signal is present, to prevent an increase in the noise estimate as a result of signal components. It decreases when the probability of signal presence decreases, to allow a fast update of the noise estimate.

The estimate of the clean signal STFT is finally given by

$$\hat{X}(k,\ell) = G(k,\ell)\,Y(k,\ell), \tag{34}$$

where

$$G(k,\ell) = \left\{ G_{H_1}(k,\ell) \right\}^{p(k,\ell)} G_{\min}^{1 - p(k,\ell)} \tag{35}$$

is the OM-LSA gain function and $G_{\min}$ denotes a lower bound constraint for the gain when the signal is absent.

The implementation of the integrated TF-GSC and multichannel postfiltering algorithm is summarized in Algorithm 1. Typical values of the respective parameters, for a sampling rate of 8 kHz, are given in Table 1. The STFT and its inverse are implemented with biorthogonal Hamming windows of 256 samples length (32 milliseconds) and 64 samples frame update step (75% overlap between successive windows).

Algorithm 1: The integrated TF-GSC and multichannel postfiltering algorithm.

Initialize variables at the first frame for all frequency bins $k$:
    $G_{H_1}(k) = \gamma(k) = $ ; $P_{\mathrm{est}}(k) = \|\mathbf{U}(k)\|^2$; $\mathcal{S}Y(k) = \mathcal{M}Y(k) = \hat{\lambda}_d(k) = |Y(k)|^2$;
    Let $n_k = 0$. % $n_k$ is a counter for $H_1$ frames in frequency bin $k$
    For $i = 2, \ldots, M$: $\mathcal{S}U_i(k) = \mathcal{M}U_i(k) = |U_i(k)|^2$; $H_i(k) = $ ; $\tilde{A}_i(k) = $ .
For all time frames $\ell$
    For all frequency bins $k$
        Compute the reference noise signals $\mathbf{U}(k,\ell)$ using (4) and the TF-GSC output $Y(k,\ell)$ using (5).
        Compute the recursively averaged spectrum of the TF-GSC output and reference signals, $\mathcal{S}Y(k,\ell)$ and $\mathcal{S}U_i(k,\ell)$, using (21), and update the MCRA estimates of the background pseudostationary noise, $\mathcal{M}Y(k,\ell)$ and $\mathcal{M}U_i(k,\ell)$ ($i = 2, \ldots, M$), using [].
        Compute the local nonstationarities of the TF-GSC output and reference signals, $\Lambda_Y(k,\ell)$ and $\Lambda_U(k,\ell)$, using (22) and (23).
        Using the block diagram for the hypothesis testing (Figure 2), determine the relevant hypothesis; it possibly requires computation of the transient beam-to-reference ratio $\Omega(k,\ell)$ using (24) and the a posteriori SNR at the beamformer output with respect to the pseudostationary noise $\gamma_s(k,\ell)$ using (27).
        Update the estimate for the power of the reference signals $P_{\mathrm{est}}(k,\ell)$ using (10).
        In case of absence of transients ($H_{0s}$), update the multichannel adaptive noise canceller $\mathbf{H}(k,\ell+1)$ using (9).
        In case of desired signal presence ($H_1$), update the estimate $\hat{\phi}^{(R+1)}_{\mathbf{Z}Z_1}(k,\ell+1)$ using (20) and increment $n_k$ by 1.
        If $n_k \ge N$, then store $\hat{\phi}^{(r+1)}_{\mathbf{Z}Z_1}(k,\ell+1)$ as $\hat{\phi}^{(r)}_{\mathbf{Z}Z_1}(k,\ell+1)$ for $r = 1, \ldots, R$, update the ATF ratios $\tilde{\mathbf{A}}(k,\ell)$ using (18), and reset $\hat{\phi}^{(R+1)}_{\mathbf{Z}Z_1}(k,\ell+1)$ and $n_k$ to zero.
        In case of $H_{0s}$ or $H_{0t}$, set the a priori signal absence probability $\hat{q}(k,\ell)$ to 1. In case of $H_1$, set $\hat{q}(k,\ell)$ to 0. In case of $H_r$, compute $\hat{q}(k,\ell)$ according to (29).
        Compute the a priori SNR $\hat{\xi}(k,\ell)$ using (30), the conditional gain $G_{H_1}(k,\ell)$ using (31), and the signal presence probability $p(k,\ell)$ using (28).
        Compute the time-varying smoothing parameter $\tilde{\alpha}_d(k,\ell)$ using (33) and update the noise spectrum estimate $\hat{\lambda}_d(k,\ell+1)$ using (32).
        Compute the OM-LSA estimate of the clean signal $\hat{X}(k,\ell)$ using (34) and (35).

Table 1: Values of parameters used in the implementation of the proposed algorithm, for a sampling rate of 8 kHz.

    Normalized LMS:        $\alpha_p$ = 9, $\mu_h$ = 5
    ATF identification:    $N$ = , $R$ =
    Hypothesis testing:    $\alpha_s$ = 9, $\gamma_0$ = 6, $\Lambda_0$ = 67, $\Lambda_1$ = 8, $\Omega_{\mathrm{low}}$ = , $\Omega_{\mathrm{high}}$ = , $b$ = [5 5 5]
    Noise PSD estimation:  $\alpha_d$ = 85, $\beta$ = 7
    Spectral enhancement:  $\alpha$ = 9, $G_{\min}$ = dB

5. EXPERIMENTAL RESULTS

In this section, we compare, under nonstationary noise conditions, the performance of the proposed real-time system to an offline system consisting of a TF-GSC and a single-channel postfilter. The performance evaluation includes objective quality measures, a subjective study of speech spectrograms, and informal listening tests.

A linear array consisting of four microphones with 5 cm spacing is mounted in a car on the visor. Clean speech signals are recorded at a sampling rate of 8 kHz in the absence of background noise (standing car, silent environment). An interfering speaker and car noise signals are recorded while the car speed is about 6 km/h and the window next to the driver is slightly open (about 5 cm; the other windows are closed). The input microphone signals are generated by mixing the speech and noise signals at various SNR levels in the range [ 5 ] dB.

Offline TF-GSC beamforming [6] is applied to the noisy multichannel signals, and its output is enhanced using the OM-LSA estimator [8]. The result is referred to as single-channel postfiltering output. Alternatively, the proposed real-time integrated TF-GSC and multichannel postfiltering is applied to the noisy signals. Its output is referred to as multichannel postfiltering output.

Two objective quality measures are used. The first is segmental SNR, in dB, defined by [5]

$$\mathrm{SegSNR} = \frac{1}{L} \sum_{\ell=0}^{L-1} 10 \log_{10} \frac{\sum_{n=0}^{K-1} x^2(n + \ell K/2)}{\sum_{n=0}^{K-1} \left[ x(n + \ell K/2) - \hat{x}(n + \ell K/2) \right]^2}, \tag{36}$$

where $L$ represents the number of frames in the signal and $K = 256$ is the number of samples per frame (corresponding to 32 milliseconds frames and 50% overlap). The SNR at each frame is limited to a perceptually meaningful range between 5 dB and dB [6 7]. The second quality measure is log-spectral distance (LSD), in dB, which is defined by

$$\mathrm{LSD} = \frac{1}{L} \sum_{\ell=0}^{L-1} \left\{ \frac{1}{K/2 + 1} \sum_{k=0}^{K/2} \left[ 10 \log_{10} \mathcal{A}X(k,\ell) - 10 \log_{10} \mathcal{A}\hat{X}(k,\ell) \right]^2 \right\}^{1/2}. \tag{37}$$

where $\mathcal{A}X(k,l) \triangleq \max\{ |X(k,l)|^2, \delta \}$ is the spectral power, clipped such that the log-spectral dynamic range is confined to about 50 dB (i.e., $\delta = 10^{-50/10} \max_{k,l} \{ |X(k,l)|^2 \}$).

Figure 4 shows experimental results obtained for various noise levels. The quality measures are evaluated at the first microphone, the offline TF-GSC output, and the postfiltering outputs. A theoretical limit postfiltering, achievable by calculating the noise PSD from the noise itself, is also considered. It can be readily seen that TF-GSC alone does not provide sufficient noise reduction in a car environment, owing to its limited ability to reduce diffuse noise [16]. Furthermore, multichannel postfiltering performs considerably better than single-channel postfiltering on both measures.

Figure 4: (a) Average segmental SNR [dB] and (b) average LSD [dB], plotted against input SNR [dB], at the microphone, the TF-GSC output, the single-channel postfiltering output, the multichannel postfiltering output (solid line), and the theoretical limit postfiltering output.

A subjective comparison between multichannel and single-channel postfiltering was conducted using speech spectrograms and validated by informal listening tests. Typical examples of speech spectrograms are presented in Figure 5. The noise PSD at the beamformer output varies substantially due to the residual interfering components of speech, wind blows, and passing cars. The TF-GSC output is characterized by a high level of noise. Single-channel postfiltering suppresses pseudostationary noise components, but is inefficient at attenuating the transient noise components. By contrast, the proposed system achieves superior noise attenuation while preserving the desired source components. This is verified by subjective informal listening tests.

6 CONCLUSION

We have described an integrated real-time beamforming and postfiltering system that is particularly advantageous in nonstationary noise environments. The system is based on the TF-GSC beamformer and an OM-LSA-based multichannel postfilter. The TF-GSC beamformer primary output and the reference noise signals are exploited for deciding between speech, stationary noise, and transient noise hypotheses. The decisions are used for deriving estimators for the signal presence probability and for the noise PSD. The signal presence probability modifies the spectral gain function for estimating the clean signal spectral amplitude. It is worth mentioning that the postfilter is designed for suppressing the stationary noise as well as transient noise components that do not overlap with desired signal components in the time-frequency domain. The overlapping part between desired and undesired transients is not eliminated by the postfilter, to avoid signal distortion, particularly since such noise components are perceptually masked by the desired speech [28].

The proposed system was tested under nonstationary car noise conditions, and its performance was compared to that of a system based on single-channel postfiltering. While transient noise components are indistinguishable from desired source components when using a single-channel postfiltering approach, the enhancement of the beamformer output by multichannel postfiltering produces a significantly reduced level of residual transient noise without further distorting the desired signal components.

We note that the computational complexity and practical simplifications of the proposed system were not addressed. Here, the main contribution is the incorporation of the hypothesis test results into the beamformer stage. The hypotheses control the noise canceller branch of the beamformer as well as the ATF identification, thus enabling real-time tracking of moving talkers.

The novel method has applications in realistic environments where a desired speech signal is received by several microphones. In a typical office environment scenario, the speech signal is subject to propagation through time-varying ATFs (due to talker movements), stationary noise (e.g., air conditioner), and nonstationary interferences (e.g., radio or another talker). By adaptively updating the ATF ratio estimates, the TF-GSC beamformer is consistently directed toward the desired speaker. An interfering source that is spatially separated from the desired source is therefore associated with a lower TBRR than the desired source. Accordingly, transient noise components at the beamformer output can be differentiated from the desired speech components and further suppressed by the postfilter.
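The way the signal presence probability steers both the spectral gain and the noise PSD update can be illustrated with a minimal Python sketch. The gain follows the geometric weighting of the conditional gain and the gain floor given for the OM-LSA gain function. The smoothing rule `alpha_d + (1 - alpha_d) * p` and the first-order recursive noise update with a bias factor `beta` are assumptions of the form commonly used with MCRA-style estimators; they are not quoted from this section. All function names and numeric values are hypothetical and chosen only for illustration.

```python
def omlsa_gain(g_h1, p, g_min):
    """Geometric weighting of the conditional gain g_h1 and the floor g_min.

    p is the signal presence probability in [0, 1].
    """
    return (g_h1 ** p) * (g_min ** (1.0 - p))


def smoothing_parameter(alpha_d, p):
    """Assumed form: equals alpha_d when p = 0 and approaches 1 when p = 1."""
    return alpha_d + (1.0 - alpha_d) * p


def update_noise_psd(lambda_d, y_power, alpha_d, p, beta=1.0):
    """One recursive update of the noise PSD estimate (assumed form).

    lambda_d : current noise PSD estimate for one time-frequency bin
    y_power  : |Y(k,l)|^2 at the beamformer output
    beta     : bias compensation factor (illustrative default)
    """
    a = smoothing_parameter(alpha_d, p)
    return a * lambda_d + beta * (1.0 - a) * y_power


# Illustrative values only; none of them are taken from Table 1.
g_floor = 10.0 ** (-20.0 / 20.0)
for prob in (0.0, 0.5, 1.0):
    gain = omlsa_gain(0.8, prob, g_floor)
    noise = update_noise_psd(1.0, 4.0, 0.9, prob)
    print(prob, gain, noise)
```

With `p = 0` the gain collapses to the floor and the noise estimate tracks the observed power quickly; with `p = 1` the conditional gain is applied unchanged and the noise estimate is held, which matches the behaviour described for the smoothing parameter.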

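The two objective measures used in the experimental evaluation, segmental SNR and log-spectral distance, can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' evaluation code: the function names, the handling of silent or error-free frames, and the choice to pass precomputed power spectra to the LSD routine are assumptions. The clamping limits, the frame length, and the dynamic range are left as caller-supplied parameters, and the values in the usage lines are illustrative.

```python
import math


def segmental_snr(x, x_hat, frame_len, lo_db, hi_db):
    """Average per-frame SNR in dB over frames with 50% overlap.

    Each frame SNR is clamped to [lo_db, hi_db] before averaging.
    """
    hop = frame_len // 2
    snrs = []
    for start in range(0, len(x) - frame_len + 1, hop):
        ref = x[start:start + frame_len]
        est = x_hat[start:start + frame_len]
        sig = sum(v * v for v in ref)
        err = sum((a - b) ** 2 for a, b in zip(ref, est))
        if sig == 0.0:
            snr = lo_db      # assumption: silent frame counts as the lower limit
        elif err == 0.0:
            snr = hi_db      # assumption: perfect frame counts as the upper limit
        else:
            snr = 10.0 * math.log10(sig / err)
        snrs.append(min(max(snr, lo_db), hi_db))
    if not snrs:
        raise ValueError("signal shorter than one frame")
    return sum(snrs) / len(snrs)


def log_spectral_distance(power, power_hat, dyn_range_db):
    """LSD in dB between two sets of power spectra.

    power[l][k] holds |X(k,l)|^2 for frame l and bin k (one-sided spectrum).
    Spectra are clipped from below at delta, derived from the reference peak,
    so the reference is assumed to have nonzero energy.
    """
    peak = max(max(frame) for frame in power)
    delta = 10.0 ** (-dyn_range_db / 10.0) * peak
    total = 0.0
    for frame, frame_hat in zip(power, power_hat):
        sq = [(10.0 * math.log10(max(a, delta))
               - 10.0 * math.log10(max(b, delta))) ** 2
              for a, b in zip(frame, frame_hat)]
        total += math.sqrt(sum(sq) / len(sq))
    return total / len(power)


# Illustrative usage: an estimate that is a scaled copy of the reference.
clean = [math.sin(0.3 * n) for n in range(64)]
scaled = [0.9 * v for v in clean]
print(segmental_snr(clean, scaled, 8, -10.0, 35.0))
print(log_spectral_distance([[1.0, 0.1]], [[10.0, 1.0]], 50.0))
```

Passing power spectra directly keeps the sketch independent of any particular STFT implementation; in practice the spectra would come from the same analysis window and frame step used for the enhancement itself.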
Figure 5: Speech spectrograms. (a) Original clean speech signal at microphone (transcribed text: "five, six, seven, eight, nine"). (b) Noisy signal at microphone (SNR = 9 dB, SegSNR = 6 dB, and LSD = 5 dB). (c) TF-GSC output (SegSNR = 5 dB, LSD = dB). (d) Single-channel postfiltering output (SegSNR = 8 dB, LSD = 7 dB). (e) Multichannel postfiltering output (SegSNR = dB, LSD = 6 dB). (f) Theoretical limit (SegSNR = dB, LSD = dB).

ACKNOWLEDGMENT

The authors thank the anonymous reviewers for their helpful comments.

REFERENCES

[1] M. S. Brandstein and D. B. Ward, Eds., Microphone Arrays: Signal Processing Techniques and Applications, Springer-Verlag, Berlin, Germany.
[2] K. U. Simmer, J. Bitzer, and C. Marro, "Post-filtering techniques," in Microphone Arrays: Signal Processing Techniques and Applications, chapter , pp. 9-6, Springer-Verlag, Berlin, Germany.
[3] L. J. Griffiths and C. W. Jim, "An alternative approach to linearly constrained adaptive beamforming," IEEE Transactions on Antennas and Propagation, vol. , no. , pp. 7, 98.
[4] R. Zelinski, "A microphone array with adaptive post-filtering for noise reduction in reverberant rooms," in Proc. th IEEE Int. Conf. Acoustics, Speech, Signal Processing, pp. 578-58, New York, NY, USA, April 1988.
[5] R. Zelinski, "Noise reduction based on microphone array with LMS adaptive post-filtering," Electronics Letters, vol. 6, no. , pp. 6-7, 99.
[6] S. Fischer and K. U. Simmer, "An adaptive microphone array for hands-free communication," in Proc. th International Workshop on Acoustic Echo and Noise Control, pp. 7, Røros, Norway, June 1995.

[7] S. Fischer and K. U. Simmer, "Beamforming microphone arrays for speech acquisition in noisy environments," Speech Communication, vol. , no. -, pp. 5-7, 1996.
[8] S. Fischer and K.-D. Kammeyer, "Broadband beamforming with adaptive post-filtering for speech acquisition in noisy environments," in Proc. nd IEEE Int. Conf. Acoustics, Speech, Signal Processing, pp. 59-6, Munich, Germany, April 1997.
[9] J. Meyer and K. U. Simmer, "Multi-channel speech enhancement in a car environment using Wiener filtering and spectral subtraction," in Proc. nd IEEE Int. Conf. Acoustics, Speech, Signal Processing, pp. 67-7, Munich, Germany, April 1997.
[10] K. U. Simmer, S. Fischer, and A. Wasiljeff, "Suppression of coherent and incoherent noise using a microphone array," Annales des Télécommunications, vol. 9, no. 7-8, pp. 9-6, 99.
[11] J. Bitzer, K. U. Simmer, and K.-D. Kammeyer, "Multimicrophone noise reduction by post-filter and superdirective beamformer," in Proc. 6th International Workshop on Acoustic Echo and Noise Control, pp. , Pocono Manor, Pa, USA, September 1999.
[12] J. Bitzer, K. U. Simmer, and K.-D. Kammeyer, "Multimicrophone noise reduction techniques as front-end devices for speech recognition," Speech Communication, vol. , no. -, pp. .
[13] I. Cohen and B. Berdugo, "Microphone array post-filtering for non-stationary noise suppression," in Proc. 7th IEEE Int. Conf. Acoustics, Speech, Signal Processing, pp. 9-9, Orlando, Fla, USA, May .
[14] I. Cohen, "Multi-channel post-filtering in non-stationary noise environments," to appear in IEEE Trans. Signal Processing.
[15] S. Gannot and I. Cohen, "Speech enhancement based on the general transfer function GSC and post-filtering," submitted to IEEE Trans. Speech and Audio Processing.
[16] S. Gannot, D. Burshtein, and E. Weinstein, "Signal enhancement using beamforming and non-stationarity with applications to speech," IEEE Trans. Signal Processing, vol. 9, no. 8, pp. 6-66.
[17] D. Burshtein and S. Gannot, "Speech enhancement using a mixture-maximum model," IEEE Trans. Speech and Audio Processing, vol. , no. 6, pp. 5.
[18] I. Cohen and B. Berdugo, "Speech enhancement for nonstationary noise environments," Signal Processing, vol. 8, no. , pp. 8.
[19] C. W. Jim, "A comparison of two LMS constrained optimal array structures," Proceedings of the IEEE, vol. 65, no. , pp. 7-7, 1977.
[20] B. Widrow and S. D. Stearns, Adaptive Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, USA, 1985.
[21] S. Nordholm, I. Claesson, and P. Eriksson, "The broadband Wiener solution for Griffiths-Jim beamformers," IEEE Trans. Signal Processing, vol. , no. , pp. 7-78, 99.
[22] I. Cohen, "Noise spectrum estimation in adverse environments: Improved minima controlled recursive averaging," IEEE Trans. Speech and Audio Processing, vol. , no. 5, pp. 66-75.
[23] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. , no. 6, pp. 9, 98.
[24] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. , no. , pp. 5, 1985.
[25] S. R. Quackenbush, T. P. Barnwell, and M. A. Clements, Objective Measures of Speech Quality, Prentice-Hall, Englewood Cliffs, NJ, USA, 1988.
[26] J. R. Deller, J. H. L. Hansen, and J. G. Proakis, Discrete-Time Processing of Speech Signals, IEEE Press, New York, NY, USA, 2nd edition.
[27] P. E. Papamichalis, Practical Approaches to Speech Coding, Prentice-Hall, Englewood Cliffs, NJ, USA, 1987.
[28] T. F. Quatieri and R. Dunn, "Speech enhancement based on auditory spectral change," in Proc. 7th IEEE Int. Conf. Acoustics, Speech, Signal Processing, pp. 57-6, Orlando, Fla, USA, May .

Israel Cohen received the BS (summa cum laude), MS, and PhD degrees in electrical engineering in 99, 99, and 1998, respectively, all from the Technion, Israel Institute of Technology. From 99 to 1998, he was a Research Scientist at RAFAEL research laboratories, Israel Ministry of Defense. From 1998 to he was a Postdoctoral Research Associate at the Computer Science Department of Yale University, New Haven, Conn, USA. Since he has been a Senior Lecturer with the Electrical Engineering Department, Technion, Israel. His research interests are multichannel speech enhancement, image and multidimensional data processing, anomaly detection, and wavelet theory and applications.

Sharon Gannot received his BS degree (summa cum laude) from the Technion, Israel Institute of Technology, Israel, in 1986, and the MS (cum laude) and PhD degrees from Tel Aviv University, Tel Aviv, Israel, in 1995 and respectively, all in electrical engineering. Between 1986 and 99, he was the Head of a research and development section in the R&D center of the Israel Defense Forces. In he held a postdoctoral position at the Department of Electrical Engineering (SISTA) at Katholieke Universiteit Leuven, Belgium. From to he held a research and teaching position at the Signal and Image Processing Lab (SIPL), Faculty of Electrical Engineering, The Technion, Israel Institute of Technology, Israel. Currently, he is affiliated with the School of Engineering, Bar-Ilan University, Israel.

Baruch Berdugo received the BS (cum laude) and MS degrees in electrical engineering in 1978 and 1986, respectively, and the PhD degree in biomedical engineering in , all from the Technion, Israel Institute of Technology. From 1978 to 98, he served in the Israeli Navy as an Engineer. From 98 to 1997, he was a Research Scientist at RAFAEL research laboratories, Israel Ministry of Defense. From 1987 to 1997, he was Head of RAFAEL's R&D group of the acoustic product line. In 1998, he joined Lamar Signal Processing Ltd. as a Vice President R&D, and since he has been the Chief Executive Officer. His research interests include multichannel speech enhancement and direction finding.