International Journal of Emerging Technology in Computer Science & Electronics (IJETCSE) ISSN: 0976-1353 Volume 23 Issue 2 JUNE 2016 (SPECIAL ISSUE)

NEUROMORPHIC NOISE ATTENUATION BASED ON PITCH IN HEARING AIDS

Radhika.V
Electronics and Communication Engineering
Easwari Engineering College
Chennai, India
radh.muchel@gmail.com

Dr.Sudha.S
Electronics and Communication Engineering
Easwari Engineering College, Chennai, India
hod.ece@srmeaswari.ac.in

Abstract: This paper presents a pitch-based neuromorphic noise attenuation method for hearing aids with low computational and hardware complexity. The proposed noise reduction algorithm consists of a noise attenuator, which attenuates the background noise and enhances the speech, and a pitch scaled harmonic filter (PSHF), which detects speech by calculating the pitch of the signal. The neuromorphic noise attenuator reduces the noise according to the performance of the PSHF. Simulation results show that the pitch-based noise reduction algorithm achieves a higher SNR than non-pitch-based noise reduction algorithms such as mean square error and Wiener filter methods, in both stationary and non-stationary noise environments.

Keywords: Neuromorphic noise attenuator; Hearing aid; Pitch Scaled Harmonic Filter; computational complexity; Noise reduction; Stationary and non-stationary noise.

I. INTRODUCTION

Speech is the most significant mode of human communication. The essential speech communication process includes a talker, who utters a speech sound, and a receiver, who listens to the sound and then decodes the meaning. This process, however, is subject to interference in realistic acoustic environments; the acoustic waveform reaching the listener's ears is usually composed of sound energy from multiple environmental sources. The interfering sound can be a stationary noise, such as ambient noise from an air conditioner, or a non-stationary interference, such as door slams, music, and other speech utterances. An interesting example occurs at a crowded party, where many people talk simultaneously with a variety of interfering noises in the background.
In hearing aid (HA) systems, signals are amplified to compensate for the hearing loss of patients. However, the amplified background noise may degrade speech quality and intelligibility, or even damage the residual hearing capacity of patients. Thus, noise reduction is a key block in hearing aid system applications. Noise reduction algorithms based on one microphone are divided into three types: spectral subtraction algorithms [1], [2], statistical-model-based algorithms [3], [4], and subspace algorithms [5]. Algorithms that depend on a statistical model can effectively reduce background noise, but their computational complexity is too high for HA applications and they introduce an artificial noise problem. Thus, the spectral subtraction algorithm is frequently used in low-power HA hardware implementations.

Generally, an HA system uses a voice activity detector (VAD) to differentiate between speech-dominated and noise-dominated durations. A traditional VAD usually detects voice based on energy [6], zero crossing rate [7], or entropy [8]. The computational complexity of these methods is low enough for HA applications and their accuracy is quite high in stationary noise environments, but in non-stationary noise environments the accuracy is quite low because the background noise is estimated inaccurately. LTSV-VAD [9] uses a long-term signal variability measure to separate noise from noisy speech, and this characteristic is used as the VAD decision. Because the measure is computed over many frames, the computational complexity is very high and the algorithm exceeds the latency tolerance of HAs (about 10 ms to 15 ms) [10]. The Hidden-Markov-model-based (HMM-based) VAD [11] gives very high accuracy even in non-stationary noise environments, but its computational complexity is very high because a Mel-frequency spectral representation is required, which is very hard to apply in hearing aid applications. Another method developed for hearing aids is a uniformly modulated filter bank system for signal analysis and synthesis, in which two-stage filter banks are introduced [12]; its frequency resolution is good, but the SNR at low frequencies is very low.
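As an illustration of the low-complexity detectors mentioned above, a frame-wise VAD based on short-time energy and zero crossing rate can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the method of [6] or [7]: the function names, the frame sizes, the thresholds `energy_ratio` and `zcr_max`, and the use of the first `noise_frames` frames as a noise-only reference are all hypothetical choices made for this example.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (the ragged tail is dropped)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])


def energy_zcr_vad(x, frame_len=256, hop=128, energy_ratio=4.0,
                   zcr_max=0.25, noise_frames=5):
    """Return one boolean per frame: True where the frame looks like voiced speech.

    Assumption (hypothetical): the first `noise_frames` frames contain noise only
    and are used to fix the energy threshold once.
    """
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)

    # Short-time energy of each frame.
    energy = np.mean(frames ** 2, axis=1)

    # Zero crossing rate: fraction of adjacent sample pairs whose sign differs.
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

    # Fixed noise floor taken from the leading frames.
    noise_floor = np.mean(energy[:noise_frames]) + 1e-12

    # Voiced speech: clearly above the noise floor and with few zero crossings.
    return (energy > energy_ratio * noise_floor) & (zcr < zcr_max)
```

Because the noise floor in this sketch is measured only once from the leading frames, it cannot follow a changing background, which is consistent with the low accuracy described above for non-stationary noise.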
Therefore, a neuromorphic pitch-based noise reduction algorithm has been proposed [13]. An insertion gain (IG) block amplifies the enhanced signal of each sub-band individually to compensate for the hearing loss of patients in that sub-band, and the complexity of this design has been evaluated [14]. To obtain low-complexity hardware and low computational complexity, a pitch-based filter is introduced [15]. Filtering is then performed in the frequency domain using the short-time Fourier transform in order to separate the harmonic and non-harmonic parts of the processed signal [16]. The PSHF has been designed to separate the periodic and aperiodic components of speech signals [17]. Statistical learning has been applied to feature extraction such as pitch determination, using Nonnegative Matrix Factorization (NMF) and RAPT algorithms [18]. The Keele pitch reference database has been used to evaluate experimental results [19]. Considering all these results, the neuromorphic attenuator is cascaded with the Pitch Scaled Harmonic Filter (PSHF) to obtain better performance.

The paper is organized as follows. Section II describes the neuromorphic attenuator with the Pitch Scaled Harmonic Filter (PSHF) and its working process. Section III describes the evaluation results and the objective performance measures of the pitch-based neuromorphic attenuator, including its SNR. Finally, Section IV concludes the paper.

II. METHODOLOGY USED

A. Neuromorphic Pitch

Voiced speech signals can be considered quasi-periodic with a neuromorphic pitch. The basic period is called the neuromorphic pitch period. The average pitch frequency (in short, the pitch), time pattern, gain, and fluctuation change from one individual speaker to another. For speech signal analysis, and especially for synthesis, being able to identify the neuromorphic pitch is extremely important. A well-known method for pitch detection is the pitch-based voice activity detector (pitch-based VAD). It is based on the fact that two consecutive pitch cycles have a high cross-correlation value, as opposed to two consecutive speech segments whose length differs from the pitch cycle time. This process is followed by pitch-scaled harmonic filtering (PSHF).

B. Neuromorphic Noise Attenuator

Fig. 1. Neuromorphic Noise Attenuation with PSHF (blocks: noisy signal; PSHF providing the pitch; neuromorphic noise attenuator with decoder, speech enhancement and noise attenuation; enhanced signal)

Noise reduction is a key block in hearing aid system applications. In hearing aid (HA) systems, signals are amplified to compensate for the hearing loss of patients. However, the amplified background noise may corrupt the speech quality and transparency. To avoid background noise amplification, the system has to suppress the background noise and enhance the speech signal. A neuromorphic noise attenuator is used to protect speech and suppress background noise. The neuromorphic noise attenuator also employs multiplication for time-domain gain smoothing to reduce the artificial noise problem of the traditional spectral subtraction algorithm. The pitch scaled harmonic filter (PSHF) is used to compute the pitch of the speech signal. The calculated pitch is passed to the neuromorphic noise attenuator, which is cascaded with the PSHF, to enhance the speech.

C. Pitch Scaled Harmonic Filter

The pitch-scaled harmonic filter (PSHF) is a technique for decomposing speech signals into their periodic and aperiodic constituents during periods of phonation.

Fig. 2. Pitch Scaled Harmonic Filter

The pitch scaled harmonic filter has the following steps:

1. Input noisy signal

2. Applying window function

3. Taking Fourier Transform

Let $s(k)$ and $n(k)$ denote the speech signal and the noise signal, respectively. Their sum is denoted by $x(k)$:

$$x(k) = s(k) + n(k). \tag{1}$$

Taking the Fourier transform of both sides gives

$$X(e^{j\omega}) = S(e^{j\omega}) + N(e^{j\omega}), \tag{2}$$

where $x(k) \leftrightarrow X(e^{j\omega})$ and

$$X(e^{j\omega}) = \sum_{k=0}^{L-1} x(k)\, e^{-j\omega k}, \qquad x(k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\omega})\, e^{j\omega k}\, d\omega. \tag{3}$$

4. Compute Noise Spectrum Magnitude

Through manipulation and substitution of equation (2) we obtain the spectral subtraction estimator $\hat{S}(e^{j\omega})$:

$$\hat{S}(e^{j\omega}) = \left[\, |X(e^{j\omega})| - \mu(e^{j\omega}) \,\right] e^{j\theta_x(e^{j\omega})}, \tag{4}$$

where $\mu(e^{j\omega})$ is the mean noise magnitude and $\theta_x(e^{j\omega})$ is the phase of $X(e^{j\omega})$. The error that results from this estimator is given by

$$\varepsilon(e^{j\omega}) = \hat{S}(e^{j\omega}) - S(e^{j\omega}) = N(e^{j\omega}) - \mu(e^{j\omega})\, e^{j\theta_x(e^{j\omega})}. \tag{5}$$

5. Frame Averaging

To reduce this error, local averaging is used, because $\varepsilon(e^{j\omega})$ is simply the difference between $N(e^{j\omega})$ and its mean $\mu$. Therefore $|X(e^{j\omega})|$ is replaced with $\overline{|X(e^{j\omega})|}$, where

$$\overline{|X(e^{j\omega})|} = \frac{1}{M} \sum_{i=0}^{M-1} |X_i(e^{j\omega})|$$

and $X_i(e^{j\omega})$ is the $i$th time-windowed transform of $x(k)$. By substitution in equation (4) we have

$$\hat{S}_A(e^{j\omega}) = \left[\, \overline{|X(e^{j\omega})|} - \mu(e^{j\omega}) \,\right] e^{j\theta_x(e^{j\omega})}. \tag{6}$$

The spectral error is now approximately

$$\varepsilon(e^{j\omega}) = \hat{S}_A(e^{j\omega}) - S(e^{j\omega}) \cong \overline{|N(e^{j\omega})|} - \mu(e^{j\omega}), \tag{7}$$

where

$$\overline{|N(e^{j\omega})|} = \frac{1}{M} \sum_{i=0}^{M-1} |N_i(e^{j\omega})|.$$

Thus, the sample mean of $|N(e^{j\omega})|$ will converge to $\mu(e^{j\omega})$ as a longer average is taken. It has also been noted that averaging over more than three half-overlapped frames weakens intelligibility.

6. Noise Reduction

Noise reduction is implemented as:

$$|\hat{S}_i(e^{j\omega})| = \begin{cases} |\hat{S}_i(e^{j\omega})|, & \text{for } |\hat{S}_i(e^{j\omega})| \ge \max |N_R(e^{j\omega})| \\ \min\left\{ |\hat{S}_j(e^{j\omega})|,\ j = i-1,\, i,\, i+1 \right\}, & \text{for } |\hat{S}_i(e^{j\omega})| < \max |N_R(e^{j\omega})| \end{cases} \tag{8}$$

where $N_R(e^{j\omega}) = N(e^{j\omega}) - \mu(e^{j\omega})\, e^{j\theta_n}$ and $\max |N_R(e^{j\omega})|$ is the maximum value of the noise residual measured during noise activity.

7. Attenuate Signal during Non-Speech Activity

The output of the spectral estimate of the neuromorphic pitch, including signal attenuation, is given by:

$$\hat{S}(e^{j\omega}) = \begin{cases} \hat{S}(e^{j\omega}), & T \ge -12\ \text{dB} \\ c\, X(e^{j\omega}), & T \le -12\ \text{dB} \end{cases}$$

where $T$ is the level of the spectral estimate relative to the average noise spectrum and $c$ is a fixed attenuation factor.

8. Signal Reconstruction

First the pitch is calculated using the PSHF, and it is then passed to the neuromorphic noise attenuator to enhance the speech. The PSHF divides the speech signal into its periodic and aperiodic components: the periodic component can be used as an estimate of the voiced part of speech, and the aperiodic component can act as an estimate of the noise. The PSHF is based on a calculation of the harmonics-to-noise ratio. The neuromorphic noise attenuator decodes the speech and the noise. The noise is then suppressed and the speech is enhanced according to the performance of the PSHF. Finally the improved speech is obtained.

III. RESULTS AND DISCUSSION

A. Non-stationary Noise

To evaluate the performance of the neuromorphic pitch-based noise reduction for non-stationary noise, a speech database is used and resampled. For the objective evaluations, the results are averaged over the speech signals. White Gaussian noise or speech-shaped stationary noise is added to these speech signals at 0 dB segmental input SNR to generate noisy speech files.

Fig. 3. Spectrogram and Waveform of non-stationary Noisy Speech: (a) input signal (amplitude); (b) spectrogram of the measured input signal (frequency in Hz versus time)

Figure 3 shows the waveform and spectrogram of the noisy input speech signal. Figure 3(a) shows the noisy input speech signal at 0 dB SNR. Figure 3(b) shows the spectrogram of the noisy speech signal. The neuromorphic pitch-based PSHF is then applied to this non-stationary noisy speech signal. The proposed neuromorphic attenuator depends entirely on the quality of the pitch from the PSHF.
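To make the dependence on the pitch concrete, the windowing, Fourier transform, noise magnitude subtraction and reconstruction steps listed for the pitch scaled harmonic filter can be sketched in Python as follows. This is a simplified illustration under stated assumptions and not the implementation evaluated in this paper: a normalised autocorrelation peak stands in for the PSHF as the voicing cue, frame averaging (step 5) and the attenuation rules of steps 6 and 7 are omitted, and the names `voicing_strength`, `pitch_gated_spectral_subtraction` and `voiced_threshold` are hypothetical.

```python
import numpy as np


def voicing_strength(frame, fs, fmin=60.0, fmax=400.0):
    """Peak of the normalised autocorrelation over the pitch-lag range.

    Values near 1 indicate a strongly periodic (voiced) frame.
    """
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0.0:
        return 0.0
    lo, hi = int(fs / fmax), int(fs / fmin)
    return float(np.max(ac[lo:hi + 1]) / ac[0])


def pitch_gated_spectral_subtraction(x, fs, frame_len=256, voiced_threshold=0.5):
    """Magnitude spectral subtraction whose noise estimate uses unvoiced frames only."""
    x = np.asarray(x, dtype=float)
    hop = frame_len // 2
    # Periodic Hann window: half-overlapped copies sum to one (overlap-add).
    window = np.hanning(frame_len + 1)[:-1]
    n_frames = 1 + (len(x) - frame_len) // hop
    raw = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])

    # Voicing decision per frame (stand-in for the pitch supplied by the PSHF).
    voiced = np.array([voicing_strength(f, fs) > voiced_threshold for f in raw])

    # Steps 2 and 3: window each frame and take its Fourier transform.
    spec = np.fft.rfft(raw * window, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)

    # Step 4: mean noise magnitude mu, estimated from the unvoiced frames.
    if (~voiced).any():
        mu = mag[~voiced].mean(axis=0)
    else:
        mu = np.zeros(mag.shape[1])

    # Equation (4) with negative magnitudes clipped to zero.
    clean_mag = np.maximum(mag - mu, 0.0)

    # Step 8: inverse transform with the noisy phase, then overlap-add.
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    out = np.zeros(len(x))
    for i in range(n_frames):
        out[i * hop:i * hop + frame_len] += clean[i]
    return out, voiced
```

In this sketch a wrong voicing decision leaks speech into the noise estimate `mu`, so speech energy is subtracted along with the noise; this is one way to see why the attenuator output is tied to the quality of the pitch estimate.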
Fig. 4. Neuromorphic Attenuation Based PSHF Signal (Non-stationary noise): (a) waveform of the neuromorphic attenuation based PSHF signal; (b) spectrogram of the neuromorphic attenuator PSHF signal

Figure 4 shows the waveform and spectrogram of the neuromorphic attenuated signal for non-stationary noise. The figure shows that the noise is reduced, the signal is enhanced, and the SNR is better than that of non-pitch-based filters.

B. Stationary Noise

To evaluate the performance of the neuromorphic pitch-based noise reduction for stationary noise, a speech database is used. For the objective evaluations, the results are averaged over the speech signals. Airport noise added to the original speech signal is considered for the evaluation.

Fig. 5. Spectrogram and Waveform of Stationary Noisy Speech: (a) input signal; (b) spectrogram of the measured input signal

Figure 5 shows the waveform and the spectrogram of the noisy input signal with stationary (airport) noise. Figure 5(a) shows the noisy speech signal and Figure 5(b) shows its spectrogram.

Fig. 6. Neuromorphic Attenuation Based PSHF Signal (Stationary noise): (a) waveform of the neuromorphic attenuation based PSHF signal; (b) spectrogram of the neuromorphic attenuator PSHF signal

Figure 6 shows the waveform and spectrogram of the output from the neuromorphic attenuator for stationary (airport) noise. Figure 6(a) shows that the signal is almost free from noise and is enhanced. From Figure 6(b) it is inferred that the frequency resolution is high and the SNR is increased. The output from the neuromorphic attenuator based on PSHF shows that even for stationary noise the SNR is high, the speech is enhanced, and the background noise is attenuated. For both stationary and non-stationary noise environments, the SNR does not vary much and is high compared to non-pitch-based filters.

SNR Estimation

The filters are compared using the performance metrics SNR and MSE, with

$$\mathrm{SNR} = \frac{P_s(\omega)}{P_v(\omega)},$$

where $P_s(\omega)$ and $P_v(\omega)$ denote the power of the signal and of the noise, respectively. The SNR of different filters is compared with that of the neuromorphic pitch-based noise attenuator.

Fig. 7. Comparison Graph for SNR of different filters (SNR comparison between WF, AWF, SBAWF and NPSHF; bars: WF, AWF, SBAWF, NPSHF-Stationary, NPSHF-Non stationary)

It is clear from the graph that the SNR of the non-pitch-based filters is lower than that of the pitch-based neuromorphic attenuator (PSHF). At the same time, for the pitch-based filter the SNR does not vary much between stationary and non-stationary noise environments.
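The two comparison metrics can be computed from a clean reference signal and a processed signal as sketched below. This is a minimal Python sketch under stated assumptions: the noise is taken to be the difference between the processed and the clean signal, the SNR is expressed in dB (the text gives only the power ratio), and the function names `snr_db` and `mse` are hypothetical.

```python
import numpy as np


def snr_db(clean, processed):
    """Ratio of signal power to residual noise power, in dB (assumed unit)."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(processed, dtype=float) - clean
    noise_power = np.sum(noise ** 2)
    if noise_power == 0.0:
        return np.inf  # identical signals: no residual noise
    return 10.0 * np.log10(np.sum(clean ** 2) / noise_power)


def mse(clean, processed):
    """Mean square error between the clean and the processed signal."""
    clean = np.asarray(clean, dtype=float)
    processed = np.asarray(processed, dtype=float)
    return float(np.mean((clean - processed) ** 2))
```

For example, a unit-amplitude signal with a constant error of 0.1 per sample has a power ratio of 100, which corresponds to 20 dB, and an MSE of 0.01.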
FILTERS                               PITCH    SNR        MSE
Non-Stationary (Neuromorphic PSHF)    144      27.4731    4.366e-6
Stationary (Neuromorphic PSHF)        1172     25.998     4.366e-6
Wiener Filter                         -        1.8238     8.6686e-6
Adaptive Wiener Filter                -        7.222      8.2964e-6
Spectral Band Wiener Filter           -        14.349     4.2329e-6

C. Subjective Evaluation

Generally, noise reduction algorithms that are not pitch based do not achieve a speech intelligibility gain. To evaluate the improvements in signal quality, the pitch-based filters are used and tested. The pitch-based neuromorphic attenuator, implemented with the Pitch Scaled Harmonic Filter, gives better objective measures than non-pitch-based methods and is easier to implement on hearing aids.

D. Objective Evaluation

For the objective evaluation of the neuromorphic attenuator with PSHF, speech signals with stationary (airport) noise and non-stationary (music) noise are used. It is observed that the background noise is attenuated and the original speech signal is enhanced. The objective measures show that, for both stationary (airport) and non-stationary (music) noise, the neuromorphic attenuator with PSHF achieves a better SNR than non-pitch-based filters, and the signal is enhanced by suppressing the background noise.

IV. CONCLUSION

A hardware-oriented neuromorphic pitch-based noise reduction algorithm with low computational complexity has been designed and implemented using the Pitch Scaled Harmonic Filter to improve the frequency resolution and the SNR in stationary and non-stationary noise environments. Different approaches were evaluated for calculating the pitch with less hardware complexity. The background noise is reduced and the speech signal is improved by computing the pitch. The SNR is then compared for different filters. Noise reduction and SNR can be improved further by introducing different approaches to calculate the pitch and by improving the quality of the pitch estimate for the speech signal.

REFERENCES

[1] M. Berouti, R. Schwartz, and J. Makhoul, "Enhancement of speech corrupted by acoustic noise," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 1979, vol. 4, pp. 208-211.
[2] S. Kamath and P. C. Loizou, "A multi-band spectral subtraction method for enhancing speech corrupted by colored noise," in Proc. IEEE Int. Conf. Acoustic, Speech, Signal Processing, 2002, vol. 4, pp. 4164-4167.
[3] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator," IEEE Trans. Acoustics, Speech, Signal Process., vol. 33, no. 2, pp. 443-445, 1985.
[4] P. C. Loizou, "Speech enhancement based on perceptually motivated Bayesian estimators of the magnitude spectrum," IEEE Trans. Speech Audio Process., vol. 13, no. 5, pp. 857-869, 2005.
[5] H. Yi and P. C. Loizou, "A generalized subspace approach for enhancing speech corrupted by colored noise," IEEE Trans. Speech Audio Process., vol. 11, no. 4, pp. 334-341, 2003.
[6] M. Berouti, R. Schwartz, and J. Makhoul, "Enhancement of speech corrupted by acoustic noise," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 1979, vol. 4, pp. 208-211.
[7] J. C. Junqua, B. Reaves, and B. Mak, "A study of endpoint detection algorithms in adverse conditions: Incidence on a DTW and HMM recognizer," in Proc. Conf. Eurospeech, 1991, pp. 1371-1374.
[8] C. W. Wei, C. C. Tsai, T. S. Chang, and S. J. Jou, "Perceptual multiband spectral subtraction for noise reduction in hearing aids," in Proc. IEEE Int. Conf. APCCAS, May 2010, pp. 692-695.
[9] P. K. Ghosh, A. Tsiartas, and S. Narayanan, "Robust voice activity detection using long-term signal variability," IEEE Trans. Audio, Speech, Language Processing, vol. 19, no. 3, pp. 600-613, 2011.
[10] M. A. Stone and B. C. J. Moore, "Tolerable hearing aid delays. II. Estimation of limits imposed during speech production," J. Ear Hearing, vol. 23, no. 4, pp. 325-338, 2002.
[11] H. Veisi and H. Sameti, "Hidden-Markov-model-based voice activity detector with high speech detection rate for speech enhancement," Signal Processing, IET, vol. 6, no. 1, pp. 54-63, 2012.
[12] A. Schasse, T. Gerkmann, R. Martin, W. Sörgel, T. Pilgrim, and H. Puder, "Two-stage filter-bank system for improved single-channel noise reduction in hearing aids."
[13] Yu-Jui Chen, Cheng-Wen Wei, Yi FanChiang, Yi-Le Meng, Yi-Cheng Huang, and Shyh-Jye Jou, "Neuromorphic pitch based noise reduction for monosyllable hearing aid system application," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 61, no. 2, February 2014.
[14] Y. T. Kuo, T. J. Lin, W. H. Chang, Y. T. Liu, and C. W. Liu, "Complexity-effective auditory compensation for digital hearing aids," in Proc. IEEE Int. Symp. Circuits Syst., May 2008, pp. 1472-1475.
[15] Sergio Roa, Maren Bennewitz, and Sven Behnke, "Fundamental frequency estimation based on pitch-scaled harmonic filtering," in Proc. IEEE ICASSP, 2007.
[16] Daryush Mehta, "Aspiration noise during phonation: Synthesis, analysis, and pitch-scale modification," B.S., Electrical Engineering, University of Florida; Massachusetts Institute of Technology, 2006.
[17] P. J. B. Jackson and C. H. Shadle, "Pitch-scaled estimation of simultaneous voiced and turbulence-noise components in speech," IEEE Transactions on Speech and Audio Processing, vol. 9, no. 7, pp. 713-726, October 2001.
[18] F. Sha and L. Saul, "Real-time pitch determination of one or more voices by nonnegative matrix factorization," in Advances in Neural Information Processing Systems 17, L. K. Saul, Y. Weiss, and L. Bottou, Eds., pp. 1233-1240. MIT Press, Cambridge, MA, 2005.
[19] F. Plante, G. F. Meyer, and W. A. Ainsworth, "A pitch extraction reference database," in Proc. Eurospeech 95, J. M. Pardo et al., Eds., 1995, vol. 1, pp. 837-840, UP Madrid.
[20] P. Vary, "An adaptive filter bank equalizer for speech enhancement," Signal Process., vol. 86, no. 6, pp. 1206-1214, Jun. 2006.
[21] H. W. Löllmann and P. Vary, "Generalized filter-bank equalizer for noise reduction with reduced signal delay," in Proc. Eur. Conf. Speech Commun. Technol. (Interspeech), Lisbon, Portugal, Sep. 2005, pp. 2105-2108.
[22] H. W. Löllmann and P. Vary, "Efficient non-uniform filter-bank equalizer," in Proc. Eur. Signal Process. Conf. (EUSIPCO), Antalya, Turkey, Sep. 2005.