Ngo, K., Spriet, A., Moonen, M., Wouters, J., & Jensen, S. H. (2008). Variable Speech Distortion Weighted Multichannel Wiener Filter based on Soft Output Voice Activity Detection for Noise Reduction in Hearing Aids. In Proceedings of the International Workshop on Acoustic Echo and Noise Control, University of Washington campus in Seattle. Document version: publisher's PDF (version of record), made available through Aalborg Universitet.
VARIABLE SPEECH DISTORTION WEIGHTED MULTICHANNEL WIENER FILTER BASED ON SOFT OUTPUT VOICE ACTIVITY DETECTION FOR NOISE REDUCTION IN HEARING AIDS

Kim Ngo¹, Ann Spriet¹,², Marc Moonen¹, Jan Wouters² and Søren Holdt Jensen³

¹ Katholieke Universiteit Leuven, ESAT-SCD, Kasteelpark Arenberg 0, B-300 Leuven, Belgium
² Katholieke Universiteit Leuven, ExpORL, O.&N2, Herestraat 9/72, B-3000 Leuven, Belgium
³ Aalborg University, Dept. Electronic Systems, Niels Jernes Vej 2, DK-9220 Aalborg, Denmark

ABSTRACT

This paper presents a variable Speech Distortion Weighted Multichannel Wiener Filter (SDW-MWF) based on soft output Voice Activity Detection (VAD), which is used for noise reduction in hearing aids. A traditional SDW-MWF uses a fixed parameter to trade off between noise reduction and speech distortion. Consequently, the improvement in noise reduction comes at the cost of a higher speech distortion. With a variable SDW-MWF the goal is to improve the noise reduction without increasing the speech distortion. A soft output VAD is used to distinguish between speech and noise and to incorporate a variable trade-off. In speech dominant segments it is desirable to have less noise reduction to avoid speech distortion. In noise dominant segments it is desirable to have as much noise reduction as possible. Experimental results with a variable SDW-MWF show an SNR improvement with a lower speech distortion compared to an SDW-MWF with a fixed trade-off parameter.

Index Terms: Multichannel Wiener Filter, Noise reduction, Speech distortion, Soft output VAD.

1. INTRODUCTION

Background noise (multiple speakers, traffic etc.) is a significant problem for hearing aid users and is especially damaging to speech intelligibility. To overcome this problem both single-channel and multichannel noise reduction schemes have been proposed. The limitation of single-channel noise reduction is that only temporal and spectral signal characteristics are used. Multichannel noise reduction in addition exploits the spatial diversity of the speech and the noise signals.
The objective of noise reduction algorithms is to maximally reduce the noise while minimizing speech distortion. A known multichannel noise reduction technique is the Speech Distortion Weighted Multichannel Wiener Filter (SDW-MWF) [1][2], which allows for a trade-off between noise reduction and speech distortion. However, the improvement in noise reduction comes at the cost of a higher speech distortion. Recently, soft output Voice Activity Detection (VAD) has been used in speech enhancement for gain modification and noise spectrum estimation [3][4][5]. The concept is to increase the gain when there is a high probability that speech is present and to apply a lower gain in the presence of noise, i.e. when there is a lower probability that speech is present. Soft output VAD has also been used for controlling the compression gain when noise reduction and dynamic range compression are integrated [6]. Here, the soft output VAD was used to distinguish between the speech and the noise dominant segments in order not to amplify the residual noise after the noise reduction. This paper presents a variable SDW-MWF based on soft output VAD which allows for a variable trade-off between noise reduction and speech distortion in the SDW-MWF procedure.

The paper is organised as follows. In Section 2 the signal model and the SDW-MWF are described. In Section 3 the SDW-MWF is extended with a soft output VAD. In Section 4 experimental results are presented. The work is summarized in Section 5.

Acknowledgment: This research work was carried out at the ESAT laboratory of the Katholieke Universiteit Leuven, in the frame of the Marie-Curie Fellowship EST-SIGNAL program (http://est-signal.i3s.unice.fr) under contract No. MEST-CT-200-027, and the Concerted Research Action GOA-AMBioRICS. Ann Spriet is a postdoctoral researcher funded by F.W.O.-Vlaanderen. The scientific responsibility is assumed by its authors.

2. MULTICHANNEL WIENER FILTER

2.1. Signal model

Let $X_i(f)$, $i = 1, \ldots, M$, denote the frequency-domain microphone signals

$$X_i(f) = X_i^s(f) + X_i^n(f) \tag{1}$$

where $f$ is the frequency-domain variable and the superscripts $s$ and $n$ are used to refer to the speech and the noise contribution of a signal, respectively. Let $\mathbf{X}(f) \in \mathbb{C}^{M}$ be defined as the stacked vector

$$\mathbf{X}(f) = [X_1(f) \;\; X_2(f) \;\; \ldots \;\; X_M(f)]^T \tag{2}$$

$$\phantom{\mathbf{X}(f)} = \mathbf{X}^s(f) + \mathbf{X}^n(f) \tag{3}$$
where the superscript $T$ denotes the transpose. Defining $H_i^s(f)$ as the acoustic transfer function from the speech source $S(f)$ to the $i$-th microphone, $\mathbf{X}(f)$ and its speech component $\mathbf{X}^s(f)$ can be written as

$$\mathbf{X}(f) = \mathbf{H}^s(f) S(f) + \mathbf{X}^n(f) \tag{4}$$

$$\mathbf{X}^s(f) = \mathbf{H}^s(f) S(f) = \tilde{\mathbf{H}}^s(f) X_1^s(f) \tag{5}$$

with $\tilde{\mathbf{H}}^s(f)$ the vector with transfer function ratios relative to the first microphone. In addition, we define the noise and the speech correlation matrix as

$$\mathbf{R}_n(f) = \varepsilon\{\mathbf{X}^n(f)\, \mathbf{X}^{n,H}(f)\} \tag{6}$$

$$\mathbf{R}_s(f) = \varepsilon\{\mathbf{X}^s(f)\, \mathbf{X}^{s,H}(f)\} = P^s_{X_1}(f)\, \tilde{\mathbf{H}}^s(f)\, \tilde{\mathbf{H}}^{s,H}(f) \tag{7}$$

where $\varepsilon\{\cdot\}$ denotes the expectation operator, $H$ denotes the Hermitian transpose and $P^s_{X_1}(f)$ is the power spectral density (PSD) of the speech in the first microphone signal.

The MWF optimally estimates a desired signal, based on a Minimum Mean Square Error (MMSE) criterion. Here, the desired signal is the speech component $X_1^s(f)$ in the first microphone. The MWF has been extended to the SDW-MWF that allows for a trade-off between noise reduction and speech distortion using a trade-off parameter $\mu$ [1][2]. Assuming that the speech and the noise signals are statistically independent, the optimal SDW-MWF that provides an estimate of the speech component in the first microphone is given by

$$\mathbf{W}(f) = \left(\mathbf{R}_s(f) + \mu\, \mathbf{R}_n(f)\right)^{-1} \mathbf{R}_s(f)\, \mathbf{e}_1 \tag{8}$$

where the $M \times 1$ vector $\mathbf{e}_1$ equals the first canonical vector, defined as $\mathbf{e}_1 = [1 \;\; 0 \;\; \ldots \;\; 0]^T$. The second-order statistics of the noise are assumed to be stationary, which means that $\mathbf{R}_s(f)$ can be estimated as $\mathbf{R}_s(f) = \mathbf{R}_x(f) - \mathbf{R}_n(f)$, where $\mathbf{R}_x(f)$ and $\mathbf{R}_n(f)$ are estimated during periods of speech+noise and periods of noise-only, respectively. For $\mu = 1$ the SDW-MWF solution reduces to the MWF solution, while for $\mu > 1$ the residual noise level will be reduced at the cost of a higher speech distortion. The output $Z(f)$ of the SDW-MWF can then be written as

$$Z(f) = \mathbf{W}^H(f)\, \mathbf{X}(f). \tag{9}$$

3. MULTICHANNEL WIENER FILTER WITH SOFT OUTPUT VAD

Traditionally, the trade-off parameter $\mu$ is set to a fixed value and the improvement in noise reduction comes at the cost of a higher speech distortion.
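The effect of a fixed trade-off parameter can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes the standard SDW-MWF form W = (R_s + mu R_n)^{-1} R_s e_1 for a single frequency bin, and it uses synthetic, randomly generated statistics for M = 2 microphones. The function names (`sdw_mwf`, `residual_noise_power`, `speech_distortion_power`) and all numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np


def sdw_mwf(R_s, R_n, mu):
    """SDW-MWF for one frequency bin: W = (R_s + mu * R_n)^{-1} R_s e_1."""
    e1 = np.zeros(R_s.shape[0], dtype=complex)
    e1[0] = 1.0  # first canonical vector: the first microphone is the reference
    return np.linalg.solve(R_s + mu * R_n, R_s @ e1)


def residual_noise_power(W, R_n):
    """Noise power at the filter output, W^H R_n W."""
    return float(np.real(W.conj() @ R_n @ W))


def speech_distortion_power(W, R_s):
    """Power of the speech error X_1^s - W^H X^s, i.e. (e_1 - W)^H R_s (e_1 - W)."""
    d = -W.astype(complex)
    d[0] += 1.0  # d = e_1 - W
    return float(np.real(d.conj() @ R_s @ d))


# Synthetic (hypothetical) statistics for M = 2 microphones and one frequency bin.
rng = np.random.default_rng(0)
M = 2
# Transfer function ratios relative to the first microphone (first entry is 1).
h = np.concatenate(([1.0], rng.standard_normal(M - 1) + 1j * rng.standard_normal(M - 1)))
P_s = 1.0                             # assumed speech PSD in the first microphone
R_s = P_s * np.outer(h, h.conj())     # rank-1 speech correlation matrix
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R_n = A @ A.conj().T + 0.1 * np.eye(M)  # Hermitian positive definite noise matrix

mus = [1.0, 2.0, 5.0]
filters = [sdw_mwf(R_s, R_n, mu) for mu in mus]
noise = [residual_noise_power(W, R_n) for W in filters]
distortion = [speech_distortion_power(W, R_s) for W in filters]

for mu, n_pow, d_pow in zip(mus, noise, distortion):
    print(f"mu = {mu:3.1f}: residual noise = {n_pow:.4f}, speech distortion = {d_pow:.4f}")
```

In this sketch the residual noise power decreases and the speech distortion power increases as mu grows, which is the fixed trade-off described above. The linear system is solved with `np.linalg.solve` rather than by forming an explicit matrix inverse, which is generally the numerically safer choice.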
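Furthermore, the speech+noise segments and the noise-only segments are weighted equally, whereas it is desirable to have more noise reduction in the noise-only segments compared to the speech+noise segments. With a variable SDW-MWF it is possible to distinguish between the speech+noise segments and noise-only segments using a soft output VAD. The soft output VAD can be implemented according to [3][4][5]. The variable SDW-MWF is derived from the MSE criterion (the frequency parameter $f$ is omitted in the sequel for the sake of conciseness)

$$\arg\min_{\mathbf{W}} \; \varepsilon\{ |X_1^s - \mathbf{W}^H \mathbf{X}|^2 \} \tag{10}$$

which, weighted with the soft output of the VAD, becomes

$$\arg\min_{\mathbf{W}} \; \varepsilon\{ p\, |X_1^s - \mathbf{W}^H \mathbf{X}|^2 + (1 - p)\, |\mathbf{W}^H \mathbf{X}^n|^2 \} \tag{11}$$

where $p$ is the probability that speech is present in a given signal segment. The solution is then given by

$$\mathbf{W} = \left( p\, \varepsilon\{\mathbf{X}^s \mathbf{X}^{s,H}\} + p\, \varepsilon\{\mathbf{X}^n \mathbf{X}^{n,H}\} + (1 - p)\, \varepsilon\{\mathbf{X}^n \mathbf{X}^{n,H}\} \right)^{-1} p\, \varepsilon\{\mathbf{X}^s \mathbf{X}^{s,H}\}\, \mathbf{e}_1 \tag{12}$$

$$\phantom{\mathbf{W}} = \left( p\, \varepsilon\{\mathbf{X}^s \mathbf{X}^{s,H}\} + \varepsilon\{\mathbf{X}^n \mathbf{X}^{n,H}\} \right)^{-1} p\, \varepsilon\{\mathbf{X}^s \mathbf{X}^{s,H}\}\, \mathbf{e}_1 \tag{13}$$

The variable SDW-MWF can then be written as

$$\mathbf{W} = \left( \mathbf{R}_s + \frac{1}{p}\, \mathbf{R}_n \right)^{-1} \mathbf{R}_s\, \mathbf{e}_1. \tag{14}$$

Compared to Eq. (8) with the fixed $\mu$, the term $1/p$ is now changing based on the soft output VAD. The concept goes as follows:

• If $p = 0$, i.e. the probability that speech is present is zero, the variable SDW-MWF will attenuate the noise by applying $\mathbf{W} \to \mathbf{0}$.
• If $p = 1$, the variable SDW-MWF solution corresponds to the MWF solution.
• If $0 < p < 1$, there is a trade-off between noise reduction and speech distortion.

3.1. Spatial and Spectral Filtering

For further analysis the SDW-MWF can be decomposed into a spatial filter and a spectral filter [7][8]. Assuming that $\mathbf{R}_s$ is rank 1 and using the definitions in Eq. (7), we can write the optimal filter as

$$\mathbf{W} = \left( P^s_{X_1}\, \tilde{\mathbf{H}}^s \tilde{\mathbf{H}}^{s,H} + \frac{1}{p}\, \mathbf{R}_n \right)^{-1} P^s_{X_1}\, \tilde{\mathbf{H}}^s. \tag{15}$$

Applying the matrix inversion lemma the optimal filter can then be decomposed into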
$$\mathbf{W} = \underbrace{\frac{\mathbf{R}_n^{-1}\, \tilde{\mathbf{H}}^s}{\tilde{\mathbf{H}}^{s,H}\, \mathbf{R}_n^{-1}\, \tilde{\mathbf{H}}^s}}_{\text{TF-GSC}} \cdot \underbrace{\frac{P^s_{X_1}}{P^s_{X_1} + P^n_{X_1}}}_{\text{Postfilter}} \tag{16}$$

where

$$P^n_{X_1} = \left( \tilde{\mathbf{H}}^{s,H} \left( \frac{1}{p}\, \mathbf{R}_n \right)^{-1} \tilde{\mathbf{H}}^s \right)^{-1} \tag{17}$$

is the output noise power from the Transfer Function Generalized Sidelobe Canceller (TF-GSC) beamformer. This shows that the residual noise after the beamformer (spatial filter) can be further suppressed by the postfilter (spectral filter). The beamformer reduces the noise while keeping the speech component in the first microphone signal undistorted. The soft output VAD $p$ only affects the spectral postfiltering. The postfilter can be viewed as a single-channel Wiener filter where each frequency component is attenuated based on the signal-to-noise ratio.

4. EXPERIMENTAL RESULTS

In this Section, experimental results for the variable SDW-MWF based on soft output VAD are presented and compared to the SDW-MWF with fixed values for $\mu$.

4.1. Set-up and performance measures

We have performed simulations with a 2-microphone behind-the-ear hearing aid. The speech is located at 0° and the two multi-talker babble noise sources are located at 20° and 80°. To assess the noise reduction performance the intelligibility-weighted signal-to-noise ratio (SNR) [9] is used, which is defined as

$$\Delta \mathrm{SNR}_{\mathrm{intellig}} = \sum_i I_i \left( \mathrm{SNR}_{i,\mathrm{out}} - \mathrm{SNR}_{i,\mathrm{in}} \right) \tag{18}$$

where $I_i$ is the band importance function defined in [10] and $\mathrm{SNR}_{i,\mathrm{out}}$ and $\mathrm{SNR}_{i,\mathrm{in}}$ represent the output SNR and the input SNR (in dB) of the $i$-th band, respectively. For the speech distortion an intelligibility-weighted spectral distortion measure is used, defined as

$$\mathrm{SD}_{\mathrm{intellig}} = \sum_i I_i\, \mathrm{SD}_i \tag{19}$$

with $\mathrm{SD}_i$ the average spectral distortion (dB) in the $i$-th one-third octave band,

$$\mathrm{SD}_i = \frac{1}{\left( 2^{1/6} - 2^{-1/6} \right) f_i^c} \int_{2^{-1/6} f_i^c}^{2^{1/6} f_i^c} \left| 10 \log_{10} G^s(f) \right| df \tag{20}$$

(a) SNR improvement for variable SDW-MWF and different settings of $\mu$ (vertical axis: SNR [dB]; horizontal axis: $1/p$ and fixed values of $\mu$; input SNR 0 dB).
(b) Speech distortion for variable SDW-MWF and different settings of $\mu$ (vertical axis: SD [dB]; horizontal axis: $1/p$ and fixed values of $\mu$; input SNR 0 dB).
Fig. 1.
A comparison of the variable SDW-MWF with the SDW-MWF with fixed settings of $\mu$.

In Eq. (20), $f_i^c$ denotes the center frequencies and $G^s(f)$ the power spectral transfer function for the speech component from the input to the output of the noise reduction algorithm.

4.2. Variable vs. fixed SDW-MWF

In the first experiment the variable SDW-MWF is compared to the SDW-MWF with different values of $\mu$ at an input SNR of 0 dB. The SNR improvement is shown in Figure 1(a). The SNR improvement for the SDW-MWF with different values of $\mu$ is shown with the solid line, and here the SNR improvement is, as expected, increasing with $\mu > 1$. On the other hand, the speech distortion is also increased, which is shown in Figure 1(b). The variable SDW-MWF shows that the SNR improvement is achieved at lower speech distortion. The reason for this is that the noise dominant segments are suppressed more compared to the speech dominant segments, resulting in an improved SNR at lower speech distortion.

In the second experiment the variable SDW-MWF is compared to the SDW-MWF with $\mu = 1$ at different input SNRs. The SNR improvement for different input SNRs is shown in Figure 2(a). The solid line shows the SNR improvement for $\mu = 1$, which is lower than the SNR improvement of the variable SDW-MWF. As expected, the speech distortion for $\mu = 1$ is still lower compared to the variable SDW-MWF at different input SNRs. It is worth noting that at low input SNR the SNR improvement comes at the cost of a much higher speech distortion, whereas at high input SNR the SNR improvement is achieved with a speech distortion close to the case with $\mu = 1$.

(a) SNR improvement for variable SDW-MWF at different input SNR (vertical axis: SNR [dB]; curves: $\mu = 1$ and $1/p$).
(b) Speech distortion for variable SDW-MWF at different input SNR (vertical axis: SD [dB]; curves: $\mu = 1$ and $1/p$).
Fig. 2. A comparison of the variable SDW-MWF with the SDW-MWF with $\mu = 1$ at different input SNR.

5. CONCLUSION

In this paper, we have presented a variable SDW-MWF that makes a trade-off between noise reduction and speech distortion based on the soft output VAD, i.e. the probability that speech is present in a given signal segment. Through simulations we have shown that with a variable SDW-MWF the noise reduction performance can be improved without increasing the speech distortion compared to the SDW-MWF with a fixed trade-off parameter.

6. REFERENCES

[1] S. Doclo, A. Spriet, J. Wouters, and M. Moonen, "Frequency-domain criterion for the speech distortion weighted multichannel Wiener filter for robust noise reduction," Speech Communication, vol. 7-8, pp. 3, July 2007.

[2] A. Spriet, M. Moonen, and J. Wouters, "Stochastic gradient based implementation of spatially pre-processed speech distortion weighted multi-channel Wiener filtering for noise reduction in hearing aids," IEEE Transactions on Signal Processing, vol. 3, no. 3, pp. 9 2, Mar. 200.

[3] R. J. McAulay and M. Malpass, "Speech enhancement using a soft-decision noise suppression filter," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-28, no. 2, pp. 37, Apr. 1980.

[4] I. Cohen, "Optimal speech enhancement under signal presence uncertainty using log-spectral amplitude estimator," IEEE Signal Processing Letters, vol. 9, no., pp. 3, Apr. 2002.

[5] S. Gazor and W. Zhang, "A soft voice activity detector based on a Laplacian-Gaussian model," IEEE Transactions on Speech and Audio Processing, vol., no., pp. 98 0, Sept. 2003.

[6] K. Ngo, S. Doclo, A. Spriet, M. Moonen, J. Wouters, and S. H. Jensen, "An integrated approach for noise reduction and dynamic range compression in hearing aids," accepted for publication in Proc. European Signal Processing Conference (EUSIPCO), Lausanne, Switzerland, Aug. 2008.

[7] L. Griffiths and C. Jim, "An alternative approach to linearly constrained adaptive beamforming," IEEE Transactions on Antennas and Propagation, vol. 30, no., pp. 27 3, Jan. 1982.

[8] S. Gannot and I. Cohen, "Speech enhancement based on the general transfer function GSC and postfiltering," IEEE Trans. on Speech and Audio Processing, vol. 2, no., pp. 7, Nov. 200.

[9] J. E. Greenberg, P. M. Peterson, and P. M. Zurek, "Intelligibility-weighted measures of speech-to-interference ratio and speech system performance," J. Acoust. Soc. Am., vol. 9, no., pp. 3009 300, Nov. 1993.

[10] Acoustical Society of America, "ANSI S3.-1997 American National Standard Methods for calculation of the speech intelligibility index," June 1997.