AN INTEGRATED APPROACH FOR NOISE REDUCTION AND DYNAMIC RANGE COMPRESSION IN HEARING AIDS


Kim Ngo (1), Simon Doclo (1,2), Ann Spriet (1,3), Marc Moonen (1), Jan Wouters (3) and Søren Holdt Jensen (4)

(1) Katholieke Universiteit Leuven, Department of Electrical Engineering, ESAT-SCD, Kasteelpark Arenberg, B-31 Leuven, Belgium, kim.ngo@esat.kuleuven.be
(2) NXP Semiconductors, Corporate I&T-Research, Sound and Systems Group, Interleuvenlaan, 31 Leuven, Belgium, simon.doclo@nxp.com
(3) Katholieke Universiteit Leuven, Department of Neurosciences, ExpORL, O.& N2, Herestraat 49/721, 3 Leuven, Belgium, Jan.Wouters@med.kuleuven.be
(4) Aalborg University, Department of Electronic Systems, MISP, Niels Jernes Vej A6-3, 92 Aalborg, Denmark, shj@es.aau.dk

ABSTRACT

Hearing aids typically use a serial concatenation of Noise Reduction (NR) and Dynamic Range Compression (DRC). However, the DRC in such a concatenation negatively affects the performance of the NR stage: the residual noise after NR is amplified by the DRC, resulting in a signal-to-noise ratio (SNR) degradation. In this paper, we present an integrated solution for NR and DRC. The solution is based on an estimate of the amount of speech and noise in each time segment. In case the speech is dominant, the NR is less active and it is desirable to have as much DRC as possible, whereas in a noise dominant segment the NR is more active and the idea is not to compromise this operation by applying DRC. Experimental results confirmed that a serial concatenation of NR and DRC degrades the SNR improvement, and that the proposed solution offers a better SNR improvement compared to a serial concatenation.

1. INTRODUCTION

Reduced audibility and reduced dynamic range between threshold and discomfort level are some of the problems that people with a sensorineural hearing loss are dealing with. Furthermore, background noise (multiple speakers, traffic, etc.) is a great problem and is especially damaging to speech intelligibility. It is known that hearing impaired people need a higher signal-to-noise ratio (SNR) to communicate effectively [1].
Therefore, Noise Reduction (NR) and Dynamic Range Compression (DRC) are basic components in hearing aids nowadays [2], but generally these components are developed and evaluated independently of each other. Although sophisticated algorithms for NR and DRC exist, there is still an open question as to how these algorithms should be combined into an integrated approach, which has not received a lot of attention so far. The interesting issue now is to analyse undesired effects when these algorithms operate together in an integrated scheme. The integration of hearing aid algorithms is a challenging task since each algorithm can counteract and limit the functionality of other algorithms. When NR and DRC are serially concatenated, undesired interaction effects typically occur, since each algorithm serves a different purpose. For instance, DRC can counteract NR by amplifying the residual noise after NR, which consequently degrades the SNR and defeats the purpose of using NR. An integration of single-channel NR and DRC was proposed in [3], where a minimum mean square error and a maximum a posteriori optimal estimator are proposed that incorporate DRC in the derivation of the NR algorithm.

This research work was carried out at the ESAT laboratory of the Katholieke Universiteit Leuven, in the frame of the Marie-Curie Fellowship EST-SIGNAL program under contract No. MEST-CT, and the Concerted Research Action GOA-AMBioRICS. Ann Spriet is a postdoctoral researcher funded by F.W.O.-Vlaanderen. The scientific responsibility is assumed by its authors.

Another issue is the evaluation of such integrated schemes, where the lack of an overall evaluation criterion makes the integration more difficult. In the evaluation, the crucial question will be which effects are most damaging to speech intelligibility, e.g. the amount of background noise or the audibility. In this work, objective quality measures, such as SNR and signal distortion measures, are used to evaluate the integrated scheme. Subjective evaluation using hearing aid users is not included in this work.
In this paper we will focus on integrating NR and DRC and investigate if any undesired effect occurs when combining NR and DRC. The integrated scheme applies to both single-channel and multi-channel NR. The paper is organised as follows. In Section 2 the standard DRC scheme is introduced. Section 3 discusses the integration of NR and DRC. In Section 4 experimental results are presented. The work is summarized in Section 5.

2. DYNAMIC RANGE COMPRESSION

In this Section, we briefly introduce the basic concept behind DRC. The role of DRC is to map the wide dynamic range of a speech signal into the reduced dynamic range of a hearing impaired listener. The basic concept of DRC is to automatically adjust the gain based on the intensity level of the input signal. Segments with a high intensity level are attenuated while segments with a low intensity are amplified. This makes weak sounds audible while loud sounds do not become uncomfortably loud. DRC is typically defined by the following parameters:

- Compression threshold (CT).
- Compression ratio (CR).
- Attack and release time.
- Hearing aid gain $G_{dB}$.

CT is defined in dB and is the point where DRC becomes active. Below CT the gain is linear and above CT DRC is active, i.e. the gain is reduced. CR determines the degree of compression. A CR of 2 (i.e. 2:1) means that for every 2 dB increase in the input signal, the output signal increases by 1 dB. The attack and release times are defined in milliseconds and specify how fast the gain is changed according to changes in the input signal. The attack time is defined as the time it takes for the compressor to react to an increase in input signal level. The release time is the time taken for the compressor to react to a decrease in input level. The hearing aid gain $G_{dB}$ is the maximum amount of amplification in dB, which is specified by the linear part of the DRC curve, i.e. below CT. A DRC curve with CR = 2, CT = 3 dB and $G_{dB}$ =  dB is shown in figure 1.

[Figure 1: DRC curve (CR defines how the slope is changed and CT is the point at which the slope changes).]

We define $P_{in,dB}$ and $P_{out,dB}$ as the input and output power in dB of the DRC, respectively, as

$$P_{in,dB} = 10 \log_{10} |P_{in}(\omega,k)|^2 \tag{1}$$

and

$$P_{out,dB} = 10 \log_{10} |P_{out}(\omega,k)|^2 \tag{2}$$

at time instant $k$ and frequency $\omega = 2\pi f$. The DRC curve is defined based on a linear curve and a compression curve in Eq. 3 and Eq. 4, respectively:

$$P_{linear,dB} = P_{in,dB} + G_{dB} \tag{3}$$

$$P_{compression,dB} = CT + \frac{1}{CR}\,(P_{in,dB} - CT) + G_{dB} \tag{4}$$

where $P_{in,dB}$ is the input power in dB. The DRC curve is given by Eq. 5:

$$P_{out,dB} = \begin{cases} P_{linear,dB} & \text{if } P_{in,dB} < CT \\ P_{compression,dB} & \text{if } P_{in,dB} \geq CT \end{cases} \tag{5}$$

The DRC gain in dB is calculated as the output level minus the input level, i.e.

$$G_{DRC,dB} = P_{out,dB} - P_{in,dB} \tag{6}$$

The attack and release times are then applied to the gain $G_{DRC,dB}$ using a first-order recursive averaging filter, before the gain is applied to the input $P_{in,dB}$.

The power estimation in the DRC scheme used in this paper is based on individual FFT bins. If it is desired to have the DRC working on a specific number of frequency bands, e.g. critical bands, this can be achieved by combining the FFT bins (e.g. by using individual FFT bins at low frequencies and combining FFT bins at higher frequencies), as is typically done in hearing aid applications [4].

3. INTEGRATION OF NOISE REDUCTION AND DYNAMIC RANGE COMPRESSION

The goal of a NR scheme is to improve speech intelligibility by reducing the effects of any noise source (e.g. multiple speakers, traffic, etc.). The NR can be a single-channel or a multi-channel algorithm. The DRC, on the other hand, amplifies signals based on the intensity level and makes no distinction between speech and noise. This means that noise already attenuated by the NR algorithm can be amplified by the DRC. This is an undesired effect leading to a degradation of the SNR, since the residual noise is amplified and the speech is attenuated. This is one of the crucial problems at hand when cascading NR and DRC. An existing method for cascading NR and DRC is a simple serial concatenation, depicted in figure 2(a).

[Figure 2(a): Existing method for combining NR and DRC (serial concatenation).]
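The static DRC curve of Eq. 3 to Eq. 6, and the reason a serial concatenation reduces the level gap between speech and residual noise, can be illustrated with a minimal Python sketch. All numeric values (CT = 30 dB, CR = 2, G = 30 dB, the two input levels and the smoothing coefficients) are hypothetical and chosen only for illustration; they are not the settings of the paper. The way the attack and release times map onto the two smoothing coefficients is likewise an assumption, since the text only states that a first-order recursive averaging filter is used.

```python
def drc_output_db(p_in_db, ct_db, cr, gain_db):
    """Static DRC curve (Eq. 3-5): linear below CT, compressed at or above CT."""
    if p_in_db < ct_db:
        return p_in_db + gain_db                        # Eq. 3
    return ct_db + (p_in_db - ct_db) / cr + gain_db     # Eq. 4


def drc_gain_db(p_in_db, ct_db, cr, gain_db):
    """DRC gain in dB (Eq. 6): output level minus input level."""
    return drc_output_db(p_in_db, ct_db, cr, gain_db) - p_in_db


def smooth_gain(gains_db, attack_coeff, release_coeff):
    """First-order recursive averaging of a gain track.

    Assumption: a falling gain (rising input level) is smoothed with the
    attack coefficient, a rising gain with the release coefficient.
    """
    smoothed = []
    prev = gains_db[0]
    for g in gains_db:
        coeff = attack_coeff if g < prev else release_coeff
        prev = coeff * prev + (1.0 - coeff) * g
        smoothed.append(prev)
    return smoothed


# Hypothetical settings and levels, for illustration only.
CT_DB, CR, G_DB = 30.0, 2.0, 30.0
speech_in_db, noise_in_db = 65.0, 45.0   # speech and residual noise after NR

speech_out = drc_output_db(speech_in_db, CT_DB, CR, G_DB)
noise_out = drc_output_db(noise_in_db, CT_DB, CR, G_DB)

level_gap_in = speech_in_db - noise_in_db    # gap before DRC
level_gap_out = speech_out - noise_out       # gap after DRC (halved for CR = 2)
print(level_gap_in, level_gap_out)
```

Because the weaker residual noise receives a larger gain than the stronger speech, the level difference between the two segments shrinks by the factor CR above the threshold, which is the effect the serial concatenation suffers from.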
[Figure 2(b): Novel method for combining NR and DRC (dual-DRC). Figure 2: An existing and a novel approach for combining NR and DRC.]

3.1 Integration Concept

We introduce a dual-DRC concept to integrate NR and DRC. The goal is to control the DRC without counteracting the NR performance. The integrated approach is depicted in figure 2(b). The basic idea behind this is to identify the amount of speech and noise present in a signal segment. The amount of speech and noise is estimated based on the input and the output of the NR (see also Section 3.3). The applied gain in the dual-DRC depends on the amount of speech and noise. This distinction between speech and noise makes it possible to apply DRC to make the speech audible without amplifying the residual noise. Note that a standard DRC scheme is based only on the input intensity level and does not make any distinction between speech and noise.

The basic concept is to apply a different DRC to the speech and the noise segments. We therefore introduce two DRC curves, which are defined similarly as in Eq. 3-5:

- $P^{s}_{DRC,dB}$: speech DRC (speech dominant case).
- $P^{n}_{DRC,dB}$: noise DRC (noise dominant case).

The superscripts s and n are used to refer to speech and noise, respectively. In the case where speech is dominant we apply the speech DRC based on $P^{s}_{DRC,dB}$. If noise is dominant it is undesirable to amplify the noise and therefore a lower gain is applied, i.e. we apply the noise DRC based on $P^{n}_{DRC,dB}$. In the case where speech and noise are present at the same time we define a weighted sum of the two DRC curves, given in Eq. 7:

$$P_{dual,dB} = (1 - \alpha_{DRC})\, P^{n}_{DRC,dB} + \alpha_{DRC}\, P^{s}_{DRC,dB} \tag{7}$$

Here $\alpha_{DRC}$ is an estimate of the probability that speech or noise is present. If $\alpha_{DRC} = 1$ there is no dual-DRC and the DRC is based on $P^{s}_{DRC,dB}$. For $\alpha_{DRC} < 1$ the dual-DRC is active with a tradeoff between $P^{s}_{DRC,dB}$ and $P^{n}_{DRC,dB}$. The dual-DRC gain is

$$G_{dual,dB} = P_{dual,dB} - P_{in,dB} \tag{8}$$

which is applied to the output of the NR algorithm.

3.2 Speech and Noise DRC curves

The basic extension from a standard DRC scheme to the dual-DRC is that in the dual-DRC we use two DRC curves.
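As a rough illustration of the weighted sum of Eq. 7 and the gain of Eq. 8, the following Python sketch evaluates two DRC curves at one input level and blends them. The curve parameters in `SPEECH_CURVE` and `NOISE_CURVE` are hypothetical (a noise curve with the same CT and CR but a 20 dB lower gain); they are assumptions for the example and not values taken from the paper.

```python
def drc_curve_db(p_in_db, ct_db, cr, gain_db):
    """Static DRC curve: linear below CT, compressed at or above CT."""
    if p_in_db < ct_db:
        return p_in_db + gain_db
    return ct_db + (p_in_db - ct_db) / cr + gain_db


def dual_drc_gain_db(p_in_db, alpha_drc, speech_curve, noise_curve):
    """Dual-DRC gain in dB for one input level.

    speech_curve and noise_curve are (CT, CR, G) tuples; alpha_drc in [0, 1]
    weights the speech curve against the noise curve.
    """
    p_s = drc_curve_db(p_in_db, *speech_curve)
    p_n = drc_curve_db(p_in_db, *noise_curve)
    p_dual = (1.0 - alpha_drc) * p_n + alpha_drc * p_s   # Eq. 7
    return p_dual - p_in_db                               # Eq. 8


# Hypothetical (CT, CR, G) settings, for illustration only.
SPEECH_CURVE = (30.0, 2.0, 30.0)
NOISE_CURVE = (30.0, 2.0, 10.0)

for alpha in (0.0, 0.5, 1.0):
    print(alpha, dual_drc_gain_db(50.0, alpha, SPEECH_CURVE, NOISE_CURVE))
```

With `alpha_drc = 1` the result equals the gain of the speech curve alone, and with `alpha_drc = 0` it equals the gain of the noise curve alone; intermediate values interpolate linearly in dB.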
The reason that we need two DRC curves is that we wish to trade off between speech and noise segments. $P^{s}_{DRC,dB}$ and $P^{n}_{DRC,dB}$ are basically defined by choosing different values for the CT, CR and $G_{dB}$ defined in Eq. 3 and Eq. 4. The dual-DRC curves are shown in figure 3. The tradeoff parameter $\alpha_{DRC}$ between the two DRC curves is defined in Eq. 7. The rationale behind the noise DRC curve is that it has a lower gain compared to the speech DRC curve, as we indeed wish to apply a lower gain to the noise segments. As mentioned, we want to apply DRC without compromising the NR, but by setting the noise DRC curve too low we might compromise the operation of the DRC. The goal of dual-DRC is thus to find a proper tradeoff between NR and DRC.

[Figure 3: Two different approaches for dual-DRC. (a) Dual-DRC1. (b) Dual-DRC2. Each panel shows the speech curve (s) and the noise curve (n).]

For dual-DRC, we introduce two different approaches which differ in the way $P^{n}_{DRC,dB}$ is defined. In the first approach, dual-DRC1, the noise DRC curve is defined by a linear curve with a linear gain $G^{n}_{dB}$ of  dB. This is depicted in figure 3(a). The dashed line represents the noise DRC curve $P^{n}_{DRC,dB}$ and the solid line represents the speech DRC curve $P^{s}_{DRC,dB}$. As mentioned, the tradeoff between these two curves is defined with the parameter $\alpha_{DRC}$. With dual-DRC1 the impact of $\alpha_{DRC}$ is reduced when the CR is increased, since beyond the intersection between $P^{s}_{DRC,dB}$ and $P^{n}_{DRC,dB}$ the dual-DRC concept is not active. Dual-DRC1 can have advantages when the noise is dominant or has a low intensity level, which is where dual-DRC1 has the largest tradeoff, or if a low CR is desired.

The second approach is dual-DRC2. Here $P^{n}_{DRC,dB}$ has the same CR and CT as $P^{s}_{DRC,dB}$ but is shifted towards lower gains, i.e. $G^{s}_{dB} > G^{n}_{dB}$, see figure 3(b). In dual-DRC2 the range for $\alpha_{DRC}$ is kept constant when the CR is increased. If the gain $G^{n}_{dB}$ is set closer to $G^{s}_{dB}$, the integration approaches a serial concatenation of NR and DRC.

3.3 Speech and Noise Detection

The parameter $\alpha_{DRC}$ that is used to trade off between speech and noise is based on the power ratio between the output and the input of the NR algorithm, defined in Eq. 9,

$$\alpha_{NR}(\omega,k) = \frac{P^{s}_{out,NR}(\omega,k) + P^{n}_{out,NR}(\omega,k)}{P^{s}_{in,NR}(\omega,k) + P^{n}_{in,NR}(\omega,k)} \tag{9}$$

where $P^{s}_{out,NR}(\omega,k)$ and $P^{n}_{out,NR}(\omega,k)$ are the speech and the noise components at the output of the NR and $P^{s}_{in,NR}(\omega,k)$ and $P^{n}_{in,NR}(\omega,k)$ are the speech and the noise components at the input of the NR.

[Figure 4: Tradeoff parameters for dual-DRC. The tradeoff parameter is plotted against $\alpha_{NR}$, with $\alpha_{min}$ and $\alpha_{max}$ separating the regions where $P^{n}_{DRC,dB}$, $P_{dual,dB}$ and $P^{s}_{DRC,dB}$ apply.]

$\alpha_{NR}$ is used to determine whether speech or noise is dominant. The NR algorithm preserves speech while attenuating the noise.
Hence, if speech is dominant $\alpha_{NR}$ will approach one. If we assume that the noise is dominant, the following statements can be written,

$$P^{s}_{out,NR}(\omega,k) \approx P^{s}_{in,NR}(\omega,k) \ll P^{n}_{in,NR}(\omega,k),\; P^{n}_{out,NR}(\omega,k) \tag{10}$$

so that

$$\alpha_{NR}(\omega,k) \approx \frac{P^{n}_{out,NR}(\omega,k)}{P^{n}_{in,NR}(\omega,k)} \tag{11}$$

which is considerably smaller than one. This means that we can evaluate the NR performance by observing the noise power before and after NR. If there is noise at lower input SNR the NR algorithm is more active, which means that $\alpha_{NR}$ is small. In this case DRC should not counteract the NR and amplify the residual noise. On the other hand, if there is more speech present at higher SNR the NR is less active and $\alpha_{NR}$ will be closer to one. In this case, it is desirable to apply DRC. Basically, we want to trade off between NR and DRC based on the speech and the noise contribution defined by $\alpha_{NR}$.

The next step in the dual-DRC approach is to map $\alpha_{NR}$ into $\alpha_{DRC}$ based on a threshold function. The threshold function can also be considered as a soft Voice Activity Detector, which is illustrated in figure 4. The threshold function is controlled by parameters $\alpha_{min}$ and $\alpha_{max}$, and is given in Eq. 12:

$$\alpha_{DRC} = \begin{cases} 1 & \text{if } \alpha_{NR} \geq \alpha_{max} \\ 0 & \text{if } \alpha_{NR} \leq \alpha_{min} \\ \dfrac{\alpha_{NR} - \alpha_{min}}{\alpha_{max} - \alpha_{min}} & \text{otherwise} \end{cases} \tag{12}$$

If $\alpha_{NR}$ is larger than $\alpha_{max}$, DRC according to $P^{s}_{DRC,dB}$ is applied, and if it is below $\alpha_{min}$ the DRC is based on $P^{n}_{DRC,dB}$. In between there is a tradeoff according to the power ratio between the output and input of the NR. When $\alpha_{min}$ and $\alpha_{max}$ are both set to zero the integration corresponds to a serial concatenation, where NR is performed before DRC.

To summarize, we first estimate $\alpha_{NR}$, which reflects the amount of speech and noise in the input. $\alpha_{NR}$ is then used to estimate the tradeoff parameter $\alpha_{DRC}$. Finally, a dual-DRC gain is computed based on a speech and a noise DRC curve. The objective here is that the speech is amplified/compressed while the noise is attenuated or at least amplified less than the speech.

4. EXPERIMENTAL RESULTS

In this Section, experimental results for the integrated approach for NR and DRC are presented.
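The speech and noise detection step of Section 3.3 (the power ratio of Eq. 9 followed by the threshold function of Eq. 12) can be sketched in Python as follows. This is a minimal illustration under two assumptions: the ratio is computed from the total power at the NR output and input of one time-frequency bin, and the threshold values used in the example calls are hypothetical. The function names and the `eps` guard against division by zero are not from the paper.

```python
def alpha_nr(p_out_nr, p_in_nr, eps=1e-12):
    """Power ratio between the NR output and the NR input for one bin (Eq. 9).

    p_out_nr and p_in_nr are total (speech plus noise) powers on a linear
    scale; eps is an assumed guard against a zero input power.
    """
    return p_out_nr / max(p_in_nr, eps)


def alpha_drc(a_nr, a_min, a_max):
    """Threshold function mapping alpha_NR to the tradeoff parameter (Eq. 12)."""
    if a_nr >= a_max:
        return 1.0          # speech dominant: use the speech DRC curve
    if a_nr <= a_min:
        return 0.0          # noise dominant: use the noise DRC curve
    return (a_nr - a_min) / (a_max - a_min)


# Hypothetical thresholds, for illustration only.
A_MIN, A_MAX = 0.2, 0.7

for ratio in (0.1, 0.45, 0.9):
    print(ratio, alpha_drc(ratio, A_MIN, A_MAX))

# With both thresholds at zero the speech curve is always selected,
# which corresponds to the serial concatenation described in the text.
print(alpha_drc(0.3, 0.0, 0.0))
```

Checking the upper threshold first makes the case where both thresholds are zero well defined, so the same function covers the serial concatenation as a special case.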

4.1 Set-up and performance measures

The multi-microphone NR scheme used in this paper is the well-known Generalized Sidelobe Canceler (GSC) [5], consisting of a fixed spatial pre-processor and a multichannel adaptive noise canceler (ANC). The NR algorithm implemented here is based on a Frequency Domain Adaptive Filter (FDAF) [6] using a Weighted Overlap-Add (WOLA) analysis/synthesis structure. We have performed simulations with a 2-microphone behind-the-ear hearing aid. The speech and the noise sources are located at and 1, respectively. The speech signals consist of sentences from the HINT database [7]. The noise signal consists of multi-talker babble from Auditec [8], at dB input SNR. The input level was set to 65 dB SPL at the hearing aid microphones. The FFT frame size was set to 256 (i.e., 16 ms), at a 16 kHz sampling rate, with 7.5% overlap between successive frames. Each frame is weighted with a Hanning window.

To assess the NR performance, the intelligibility-weighted signal-to-noise ratio (SNR) [9] is used, which is defined as

$$\Delta SNR_{intellig} = \sum_{i} I_i \,(SNR_{i,out} - SNR_{i,in}) \tag{13}$$

where $I_i$ is the band importance function defined in [10] and $SNR_{i,out}$ and $SNR_{i,in}$ represent the output SNR and the input SNR (in dB) of the ith band, respectively. For measuring the signal distortion, a frequency-weighted log-spectral signal distortion (SD) is used, defined as

$$SD = \frac{1}{K} \sum_{k=1}^{K} \sqrt{\int_{f_l}^{f_u} w_{ERB}(f) \left( 10 \log_{10} \frac{P^{s}_{out,k}(f)}{P^{s}_{in,k}(f)} \right)^{2} df} \tag{14}$$

where $K$ is the number of frames, $P^{s}_{out,k}(f)$ is the output power spectrum of the kth frame, $P^{s}_{in,k}(f)$ is the input power spectrum of the kth frame and $f$ is the frequency index. The SD measure is calculated with a frequency-weighting factor $w_{ERB}(f)$ giving equal weight to each auditory critical band, as defined by the equivalent rectangular bandwidth (ERB) of the auditory filter [11]. The SD is measured between the compressed version of the clean speech reference signal (after the fixed spatial pre-processor) and the speech component of the output signal of the total scheme containing NR and DRC.
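The two performance measures of Eq. 13 and Eq. 14 can be sketched in discrete form in Python. This is an illustration only: the integral in Eq. 14 is replaced by a weighted sum over frequency bins, and the band importance values and frequency weights passed in the example are hypothetical placeholders rather than the values of [10] or the ERB-based weighting of [11].

```python
import math


def delta_snr_intellig(band_importance, snr_out_db, snr_in_db):
    """Eq. 13: importance-weighted sum of the per-band SNR improvements (dB)."""
    return sum(
        imp * (s_out - s_in)
        for imp, s_out, s_in in zip(band_importance, snr_out_db, snr_in_db)
    )


def log_spectral_distortion(p_out_frames, p_in_frames, weights):
    """Discrete form of Eq. 14.

    p_out_frames and p_in_frames are lists of frames, each frame a list of
    power values per frequency bin; weights holds one weight per bin and
    stands in for w_ERB(f) df.
    """
    total = 0.0
    for p_out, p_in in zip(p_out_frames, p_in_frames):
        acc = sum(
            w * (10.0 * math.log10(po / p_i)) ** 2
            for w, po, p_i in zip(weights, p_out, p_in)
        )
        total += math.sqrt(acc)
    return total / len(p_out_frames)


# Hypothetical three-band example for Eq. 13.
importance = [0.5, 0.3, 0.2]
print(delta_snr_intellig(importance, [10.0, 8.0, 6.0], [4.0, 4.0, 4.0]))

# Hypothetical single frame with two bins and uniform weights for Eq. 14.
print(log_spectral_distortion([[10.0, 10.0]], [[1.0, 1.0]], [0.5, 0.5]))
```

In a real evaluation the weights would be derived from the band importance function and the ERB scale cited in the text, and the spectra would come from the clean speech reference and the speech component of the processed output.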
The clean speech reference signal is compressed without the use of dual-DRC, where the DRC gain is computed based on the clean speech itself. It is assumed that noise dominant frames will have a larger effect on the SD measure in the dual-DRC scheme.

Simulations are performed for different CR and $\alpha_{max}$ for both dual-DRC1 and dual-DRC2 and compared to a serial concatenation of NR and DRC. The first simulation is to verify that a higher CR deteriorates the SNR for a serial concatenation compared to dual-DRC. In the second simulation, the CR is fixed and the effect of $\alpha_{max}$ on dual-DRC1 compared to dual-DRC2 is evaluated. Experiments with dual-DRC1 and dual-DRC2 are basically to show that the performance of the integrated scheme is affected by the way the DRC curves are defined. For all simulations the attack and release times are fixed to at = 5 ms and rt = 7 ms. The hearing aid gain $G^{s}_{dB}$ is set to 3 dB and $CT^{s}$ = 3 dB.

4.2 Effect of CR on SNR improvement and SD

In the first simulation the following settings are used:

- Serial concatenation of NR and DRC ($\alpha_{min}$ = 0, $\alpha_{max}$ = 0)
- Dual-DRC1 ($\alpha_{min}$ = ., $\alpha_{max}$ = .7, $G^{n}_{dB}$ =  dB)
- Dual-DRC2 ($\alpha_{min}$ = ., $\alpha_{max}$ = .7, $G^{n}_{dB}$ =  dB)

[Figure 5: Results for dual-DRC compared to a serial concatenation as a function of CR. (a) SNR improvement (dB) versus compression ratio (CR). (b) SD (dB) versus compression ratio (CR). Curves: serial concatenation, dual-DRC1, dual-DRC2.]

CR is varied from one to five. Figure 5(a) shows the SNR improvement for the case where the CR is increased. The solid line represents the SNR improvement when the NR and DRC are serially concatenated, where CR = 1 corresponds to the SNR improvement for the NR without DRC. The dashed line and the ( ) marker line represent the SNR improvement for dual-DRC1 and dual-DRC2, respectively. Notice that with dual-DRC there is an SNR improvement, since DRC is applied to the speech segments but the residual noise is not amplified. At low CR dual-DRC1 has a better SNR improvement, but at higher CR dual-DRC2 is better.
This happens for two reasons: first, the impact of $\alpha_{DRC}$ is reduced in dual-DRC1 when CR is increased. Secondly, $G^{n}_{dB}$ for dual-DRC1 is  dB and for dual-DRC2 $G^{n}_{dB}$ is set to  dB, which means that $\alpha_{DRC}$ for dual-DRC1 has a greater impact at low CR. Setting $G^{n}_{dB}$ to  dB in dual-DRC2 will result in a better SNR improvement for all values of CR, but this comes at the cost of less DRC. For dual-DRC2 the SNR improvement can be controlled by changing $G^{n}_{dB}$; if this value is closer to $G^{s}_{dB}$, the integration approaches a serial concatenation.

Figure 5(b) shows the SD for the three cases, and here the SD is lowest when no dual-DRC is applied. For dual-DRC1 the SD is initially higher, which corresponds to the better SNR improvement, and then the SD decreases as CR is increased, which corresponds to a smaller SNR improvement. For dual-DRC2 the SD is higher when CR is increased, which is also the case for the serial concatenation.

4.3 Effect of $\alpha_{max}$ on SNR improvement and SD

In the second simulation the following settings are used:

- Dual-DRC1 (CR = 1.5, $G^{n}_{dB}$ =  dB)
- Dual-DRC1 (CR = 3, $G^{n}_{dB}$ =  dB)
- Dual-DRC2 (CR = 1.5, $G^{n}_{dB}$ =  dB)
- Dual-DRC2 (CR = 3, $G^{n}_{dB}$ =  dB)

$\alpha_{min}$ = . and $\alpha_{max}$ is varied from $\alpha_{min}$ to one. Figure 6(a) shows the SNR improvement for the case where $\alpha_{max}$ is increased. The solid line represents dual-DRC1 for CR = 1.5, and this approach outperforms dual-DRC2 for CR = 1.5, represented by the ( ) marker line. When CR = 3, dual-DRC2, represented by the ( ) marker line, outperforms dual-DRC1, represented by the dashed line, and it is clear that for CR = 3 dual-DRC1 cannot improve the SNR much, which is again due to the fact that the impact of $\alpha_{DRC}$ is reduced with higher CR.

[Figure 6: Results for dual-DRC1 and dual-DRC2 as a function of $\alpha_{max}$. (a) SNR improvement (dB) for dual-DRC for CR = 1.5 and 3. (b) SD (dB) for dual-DRC for CR = 1.5 and 3. Curves: dual-DRC1 CR = 1.5, dual-DRC1 CR = 3, dual-DRC2 CR = 1.5, dual-DRC2 CR = 3.]

Figure 6(b) shows the SD, where the dashed and ( ) marker lines show almost similar distortion, but here it is worth noting that dual-DRC2 still has a significant SNR improvement compared to dual-DRC1 with CR = 3. For the case with CR = 1.5, dual-DRC1 shows a significant SNR improvement, but this comes at the cost of higher SD (shown with the solid line, which should be compared to the ( ) marker line).

Overall, the SNR improvement comes at the cost of greater distortion. The SD basically represents how far away the dual-DRC is from the original DRC curve $P^{s}_{DRC,dB}$, which means that the SD is higher when the impact of $\alpha_{DRC}$ is larger, resulting in a more active dual-DRC. In other words, there is a tradeoff between SNR improvement and how close the dual-DRC is to the original DRC curve. The dual-DRC plays an important role when the noise is dominant, i.e. when the DRC approaches $P^{n}_{DRC,dB}$. This is the case where the SD is highest, which means that especially the noise dominant segments contribute to the SD. We therefore assume that the speech dominant part is less distorted.

5. CONCLUSIONS

In this paper, we have presented a novel approach for integrating NR and DRC based on a dual-DRC concept. The dual-DRC uses a measure of the amount of speech and noise in the input. This measure, defined by $\alpha_{NR}$, is estimated based on the power ratio of the output and the input of the NR. $\alpha_{NR}$ is used to trade off the amount of DRC that is applied without counteracting the NR performance. We introduce tradeoff parameters $\alpha_{min}$ and $\alpha_{max}$ to control the integration. When these parameters are set to zero, the integration corresponds to a serial concatenation of NR and DRC. Two dual-DRC approaches have been proposed: dual-DRC1 has a better performance at lower CR, whereas dual-DRC2 shows more flexibility and works over a wide range of CR settings.

We have shown that increasing the CR leads to a reduced SNR improvement when NR and DRC are serially concatenated. Dual-DRC resulted in an improvement in SNR compared to a serial concatenation when CR is increased. With the CR fixed, we have shown that by increasing $\alpha_{max}$ it is possible to improve the SNR as a result of the dual-DRC becoming more active.

REFERENCES

[1] H. Dillon, Hearing Aids. Boomerang Press, Thieme, 1.
[2] V. Hamacher, J. Chalupper, J. Eggers, E. Fischer, U. Kornagel, H. Puder, and U. Rass, "Signal processing in high-end hearing aids: State of the art, challenges, and future trends," EURASIP Journal on Applied Signal Processing, vol. 1, pp. , 5.
[3] D. Mauler, A. M. Nagathil, and R. Martin, "On optimal estimation of compressed speech for hearing aids," Proc. Interspeech, pp. , August 27-31, 7.
[4] J. M. Kates, "Principles of digital dynamic-range compression," Trends in Amplification, vol. 9, no. 2, pp. , 5.
[5] L. Griffiths and C. Jim, "An alternative approach to linearly constrained adaptive beamforming," IEEE Transactions on Antennas and Propagation, vol. 3, no. 1, pp. , Jan 192.
[6] J. Shynk, "Frequency-domain and multirate adaptive filtering," IEEE Signal Processing Magazine, vol. 9, no. 1, pp. 37, Jan
[7] M. Nilsson, S. D. Soli, and A. Sullivan, "Development of the Hearing in Noise Test for the measurement of speech reception thresholds in quiet and in noise," JASA, vol. 95, no. 2, pp. 5 99, Feb
[8] Auditec, "Auditory Tests (Revised)," Compact Disc, Auditec, St. Louis,
[9] J. E. Greenberg, P. M. Peterson, and P. M. Zurek, "Intelligibility-weighted measures of speech-to-interference ratio and speech system performance," J. Acoust. Soc. Am., vol. 94, no. 5, pp. 39 3, Nov
[10] Acoustical Society of America, "ANSI S American National Standard Methods for calculation of the speech intelligibility index," June
[11] B. Moore, An Introduction to the Psychology of Hearing, 5th ed. Academic Press,
