A NEW SPEECH ENHANCEMENT TECHNIQUE USING PERCEPTUAL CONSTRAINED SPECTRAL WEIGHTING FACTORS

T. Muni Kumar, M.B. Rama Murthy, Ch.V. Rama Rao, K. Srinivasa Rao

Gudlavalleru Engineering College, Gudlavalleru-51356, AP, India.
CMR College of Engineering & Technology, Kandlakoya, Hyderabad-509001, AP, India.
Gudlavalleru Engineering College, Gudlavalleru-51356, AP, India.
TRR College of Engineering, Pathancheru-50319, AP, India.

E-mail: muniumar.thottempui@gmail.com, mbrmurthy@gmail.com, chvramaraogec@gmail.com, principaltrr@gmail.com

Abstract- This paper deals with the musical noise that results from perceptual speech enhancement algorithms, especially Wiener filtering. Although perceptual speech enhancement methods perform better than non-perceptual methods, most of them still return annoying residual musical noise. This is due to the fact that if only noise above the noise masking threshold is filtered, then noise below the noise masking threshold can become audible if its maskers are filtered. This can affect the performance of perceptual speech enhancement methods that process audible noise only. In order to overcome this drawback, we propose a new speech enhancement technique. It aims to improve the quality of the enhanced speech signal provided by perceptual Wiener filtering by controlling the latter via a second filter regarded as a psychoacoustically motivated weighting factor. The simulation results show that the performance is improved compared to other perceptual speech enhancement methods.

1. INTRODUCTION

The objective of speech enhancement is to improve the quality and intelligibility of speech in noisy environments. The problem has been widely discussed over the years. Many approaches have been proposed, such as subtractive-type [1-4] and perceptual Wiener filtering algorithms. Among them, spectral subtraction and Wiener filtering algorithms are widely used because of their low computational complexity and impressive performance. However, such methods return residual noise known as musical noise.
This type of noise is quite annoying. In order to reduce the effect of musical noise, several solutions have been proposed. Some involve adjusting the parameters of spectral subtraction so as to offer more flexibility, as in [2] and [3]. Others, such as the one proposed in [4], are based on signal subspace approaches. Despite the effectiveness of these techniques in improving the signal-to-noise ratio (SNR), eliminating or reducing musical noise is still a challenge for many researchers.

In the last few decades, the introduction of psychoacoustic models has attracted a great deal of interest. The objective is to improve the perceptual quality of the enhanced signal. In [3], a psychoacoustic model is used to control the parameters of spectral subtraction in order to find the best trade-off between noise reduction and speech distortion. To make musical noise inaudible, the linear estimator proposed in [5] incorporates the masking properties of the human auditory system. In [6], the masking threshold and an intermediate signal, which is slightly denoised and free of musical noise, are used to detect musical tones generated by spectral subtraction methods. This detection can be used by a post-processing stage aimed at reducing the detected tones.

These perceptual speech enhancement systems reduce the musical noise but introduce some undesired distortion into the enhanced speech signal. When this distorted estimate of the speech signal is applied to recognition systems, their performance degrades drastically. The basic idea of the proposed method is to remove perceptually significant noise components from the noisy signal, so that the clean speech components are not affected by the processing. In addition, the technique requires very little a priori information about the features of the noise. In the present paper, we propose to control perceptual Wiener filtering by a psychoacoustically motivated filter that can be regarded as a weighting factor.
The purpose is to minimize the perception of musical noise without degrading the clarity of the enhanced speech.

International Journal of Electronics Signals and Systems (IJESS), ISSN No. 31-5969, Volume-1, Issue-, 01 30

2. STANDARD SPEECH ENHANCEMENT TECHNIQUE

Let the noisy signal be expressed as

$$y(n) = s(n) + d(n), \tag{1}$$

where $s(n)$ is the original clean speech signal and $d(n)$ is the additive random noise signal, uncorrelated with the original signal. Taking the DFT of the observed signal gives

$$Y(m,k) = S(m,k) + D(m,k), \tag{2}$$

where $m = 1, 2, \ldots, M$ is the frame index, $k = 1, 2, \ldots, K$ is the frequency bin index, $M$ is the total number of frames and $K$ is the frame length. $Y(m,k)$, $S(m,k)$ and $D(m,k)$ represent the short-time spectral components of $y(n)$, $s(n)$ and $d(n)$, respectively. The clean speech spectrum estimate $\hat{S}(m,k)$ is obtained by multiplying the noisy speech spectrum by a filter gain function, as given in equation (3):

$$\hat{S}(m,k) = H(m,k)\,Y(m,k), \tag{3}$$

where $H(m,k)$ is the noise suppression filter gain function (the conventional Wiener filter (WF)), which is derived according to the MMSE estimator and is given by

$$H(m,k) = \frac{\xi(m,k)}{1 + \xi(m,k)}, \tag{4}$$

where $\xi(m,k)$ is the a priori SNR, which is defined as

$$\xi(m,k) = \frac{\Gamma_s(m,k)}{\Gamma_d(m,k)}. \tag{5}$$

Here $\Gamma_d(m,k) = E\{|D(m,k)|^2\}$ and $\Gamma_s(m,k) = E\{|S(m,k)|^2\}$ represent the estimated noise power spectrum and the clean speech power spectrum, respectively. The a posteriori SNR is given by

$$\gamma(m,k) = \frac{|Y(m,k)|^2}{\Gamma_d(m,k)}. \tag{6}$$

An estimate $\hat{\xi}(m,k)$ of $\xi(m,k)$ is given by the well-known decision-directed approach [9] and is expressed as

$$\hat{\xi}(m,k) = \alpha\,\frac{|H(m-1,k)\,Y(m-1,k)|^2}{\Gamma_d(m,k)} + (1-\alpha)\,P[V'(m,k)], \tag{7}$$

where $V'(m,k) = \gamma(m,k) - 1$, $P[x] = x$ if $x \geq 0$ and $P[x] = 0$ otherwise. The noise suppression gain function is chosen as the Wiener filter, similar to [13].

3. PERCEPTUAL SPEECH ENHANCEMENT

Although Wiener filtering reduces the level of musical noise, it does not eliminate it [15]. Musical noise remains and is perceptually annoying. In an effort to make the residual noise perceptually inaudible, many perceptual speech enhancement methods have been proposed which incorporate the auditory masking properties [2-9]. In these methods, the residual noise is shaped according to an estimate of the signal masking threshold [9, 13]. Figure 1 depicts the complete block diagram of the proposed speech enhancement method.
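The decision-directed estimate of (7) and the Wiener gain of (4) in Section 2 can be illustrated with a minimal sketch. The function name `decision_directed_wiener`, the smoothing value `alpha = 0.98`, the floor `xi_min` and the zero initialisation of the previous-frame term are assumptions made for illustration; the paper does not specify them. The noise power estimate is assumed to be strictly positive.

```python
import numpy as np

def decision_directed_wiener(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Wiener gains H(m,k) from a decision-directed a priori SNR estimate.

    noisy_power : |Y(m,k)|^2, array of shape (M, K)
    noise_power : estimate of Gamma_d(m,k), array of shape (M, K), strictly positive
    alpha, xi_min : illustrative values (assumptions, not taken from the paper)
    """
    num_frames, num_bins = noisy_power.shape
    gains = np.empty((num_frames, num_bins))
    # |H(m-1,k) Y(m-1,k)|^2; assumed to be zero before the first frame
    prev_clean_power = np.zeros(num_bins)
    for m in range(num_frames):
        gamma = noisy_power[m] / noise_power[m]      # a posteriori SNR, eq. (6)
        rectified = np.maximum(gamma - 1.0, 0.0)     # P[V'(m,k)] with V' = gamma - 1
        xi = alpha * prev_clean_power / noise_power[m] + (1.0 - alpha) * rectified  # eq. (7)
        xi = np.maximum(xi, xi_min)                  # floor added for numerical safety (assumption)
        gains[m] = xi / (1.0 + xi)                   # Wiener gain, eq. (4)
        prev_clean_power = gains[m] ** 2 * noisy_power[m]
    return gains
```

The enhanced spectrum of (3) would then be obtained by multiplying each noisy spectral component by the corresponding gain.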

Figure 1. Block diagram of the proposed speech enhancement method (blocks: noisy signal, windowing + FFT, amplitude, phase, noise estimation, estimation of NMT, ATH, WF, PWF, WPWF, IFFT + overlap-add, enhanced signal).

3.1 Gain of Perceptual Wiener filter (PWF)

The perceptual Wiener filter (PWF) gain function $H_1(m,k)$ is calculated based on the cost function $J$, which is defined as

$$J = E\{|\hat{S}(m,k) - S(m,k)|^2\}. \tag{8}$$

Substituting (2) and (3) in (8) results in

$$J = E\{|(H_1(m,k) - 1)\,S(m,k) + H_1(m,k)\,D(m,k)|^2\} = d_i + r_i, \tag{9}$$

where $d_i = (H_1(m,k) - 1)^2\,E[|S(m,k)|^2]$ and $r_i = H_1^2(m,k)\,E[|D(m,k)|^2]$ represent the speech distortion energy and the residual noise energy, respectively. To make this residual noise inaudible, the residual noise should be less than the auditory masking threshold $T(m,k)$. This constraint is given by

$$r_i \leq T(m,k). \tag{10}$$

By including the above constraint and substituting $\Gamma_d(m,k) = E\{|D(m,k)|^2\}$ and $\Gamma_s(m,k) = E\{|S(m,k)|^2\}$ in (9), the cost function becomes

$$J = (H_1(m,k) - 1)^2\,\Gamma_s(m,k) + H_1^2(m,k)\,\{\max[(\Gamma_d(m,k) - T(m,k)),\,0]\}. \tag{11}$$

The desired perceptual modification of the Wiener filter is obtained by differentiating $J$ with respect to $H_1(m,k)$ and equating the result to zero. The resulting perceptually defined Wiener filter gain function is given by

$$H_1(m,k) = \frac{\Gamma_s(m,k)}{\Gamma_s(m,k) + \max(\Gamma_d(m,k) - T(m,k),\,0)}. \tag{12}$$

Dividing the numerator and the denominator of (12) by $\Gamma_d(m,k)$, $H_1(m,k)$ becomes

$$H_1(m,k) = \frac{\hat{\xi}(m,k)}{\hat{\xi}(m,k) + \dfrac{\max(\Gamma_d(m,k) - T(m,k),\,0)}{\Gamma_d(m,k)}}. \tag{13}$$

$T(m,k)$ is the noise masking threshold, which is estimated from the noisy speech spectrum based on [16]. The a priori SNR and the noise power spectrum were estimated using the two-step a priori SNR estimator proposed in [15] and the weighted noise estimation method proposed in [17], respectively.

3.2 WEIGHTED PWF

Although perceptual speech enhancement methods perform better than non-perceptual methods, most of them still return annoying residual musical noise. The enhanced speech signal obtained using the above-mentioned perceptual Wiener filter still contains some residual noise, due to the fact that only noise above the noise masking threshold is filtered and noise below the noise masking threshold remains. This can affect the performance of perceptual speech enhancement methods that process audible noise only.
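As an illustration of the perceptual Wiener gain of Section 3.1, the following minimal sketch evaluates (12) and its a priori SNR form (13) and can be used to check that the two agree. The function names are hypothetical, and the inputs are assumed to be strictly positive power values expressed in the same units as the masking threshold.

```python
import numpy as np

def perceptual_wiener_gain(speech_power, noise_power, masking_threshold):
    """Eq. (12): H1 = Gamma_s / (Gamma_s + max(Gamma_d - T, 0))."""
    # Only the part of the noise power that exceeds the masking threshold is penalised
    audible_noise = np.maximum(noise_power - masking_threshold, 0.0)
    return speech_power / (speech_power + audible_noise)

def perceptual_wiener_gain_from_snr(xi, noise_power, masking_threshold):
    """Eq. (13): the same gain expressed with the a priori SNR xi = Gamma_s / Gamma_d."""
    audible_fraction = np.maximum(noise_power - masking_threshold, 0.0) / noise_power
    return xi / (xi + audible_fraction)

# Three bins with the same speech and noise power but increasing masking threshold
speech = np.array([4.0, 4.0, 4.0])
noise = np.array([1.0, 1.0, 1.0])
threshold = np.array([0.0, 0.5, 2.0])
h1 = perceptual_wiener_gain(speech, noise, threshold)
h1_snr = perceptual_wiener_gain_from_snr(speech / noise, noise, threshold)
```

With a zero threshold the gain reduces to the conventional Wiener gain of (4), and once the noise is fully masked the gain becomes 1, so the bin is left untouched.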
In order to overcome this drawback, we propose to weight the perceptual Wiener filter using a psychoacoustically motivated weighting filter. The psychoacoustically motivated weighting filter is given by

$$W(m,k) = \begin{cases} H(m,k), & \text{if } ATH(m,k) < \Gamma_d(m,k) \leq T(m,k) \\ 1, & \text{otherwise} \end{cases} \tag{15}$$

where $ATH(m,k)$ is the absolute threshold of hearing. This weighting factor is used to weight the perceptual Wiener filter. The gain function $H_2(m,k)$ of the proposed weighted perceptual Wiener filter is given by

$$H_2(m,k) = H_1(m,k)\,W(m,k). \tag{16}$$

5. SIMULATION RESULTS

To evaluate and compare the performance of the proposed speech enhancement scheme, simulations are carried out with NOIZEUS, a noisy speech corpus for the evaluation of speech enhancement algorithms [18]. The noisy database contains 30 IEEE sentences (produced by three male and three female speakers) corrupted by eight different real-world noises at different SNRs. Speech signals were degraded with different types of noise at global SNR levels of 0 dB, 5 dB, 10 dB and 15 dB. In this evaluation, only five noises are considered: babble, car, train, airport and street noise.

The objective quality measures used for the evaluation of the proposed speech enhancement method are the segmental SNR and PESQ measures [19]. It is well known that the segmental SNR is more accurate in indicating speech distortion than the overall SNR. A higher segmental SNR value indicates weaker speech distortion. A higher PESQ score indicates better perceived quality of the enhanced signal [19]. The performance of the proposed method is compared with the Wiener filter and the perceptual Wiener filter.

The simulation results are summarized in Table 1 and Table 2. The proposed method leads to better denoising quality in the time domain, and the better improvements are obtained at high noise levels. The time-frequency distribution of speech signals provides more accurate information about the residual noise and speech distortion than the corresponding time-domain waveforms. We compared the spectrograms for each of the methods and confirmed a reduction of the residual noise and speech distortion. Figure 2 shows the spectrograms of the clean speech signal, the noisy signal and the enhanced speech signals.
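The weighting rule of (15) and the combined gain of (16) from Section 3.2 can be sketched as follows. This sketch reads the condition in (15) as ATH(m,k) < Γ_d(m,k) ≤ T(m,k), which is an assumption about the inequality, and it also assumes that the noise power, the masking threshold and the absolute threshold of hearing are given in the same power units. The function names are hypothetical.

```python
import numpy as np

def weighting_factor(wiener_gain, noise_power, masking_threshold, ath):
    """Eq. (15): W = H where ATH < Gamma_d <= T, and W = 1 elsewhere."""
    # Noise that is masked (below T) but would be audible without its maskers (above ATH)
    masked_but_above_ath = (ath < noise_power) & (noise_power <= masking_threshold)
    return np.where(masked_but_above_ath, wiener_gain, 1.0)

def weighted_pwf_gain(pwf_gain, wiener_gain, noise_power, masking_threshold, ath):
    """Eq. (16): H2 = H1 * W."""
    return pwf_gain * weighting_factor(wiener_gain, noise_power, masking_threshold, ath)

# Three illustrative bins: audible noise, masked noise above ATH, noise below ATH
wiener = np.array([0.8, 0.8, 0.8])        # H(m,k)
pwf = np.array([0.9, 1.0, 1.0])           # H1(m,k)
noise = np.array([1.0, 1.0, 0.01])        # Gamma_d(m,k)
threshold = np.array([0.5, 2.0, 2.0])     # T(m,k)
ath = np.array([0.1, 0.1, 0.1])           # ATH(m,k)
h2 = weighted_pwf_gain(pwf, wiener, noise, threshold, ath)
```

Only the second bin is attenuated by the weighting factor: its noise is masked, so the perceptual filter alone would leave it untouched, yet it lies above the absolute threshold of hearing and could become audible if its maskers were filtered.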
Table 1. Segmental SNR values of enhanced signals

Noise type   Input SNR (dB)   WF       PWF      Proposed method
Babble       0                -4.59    -0.61    0.
             5                -1.39    0.01     0.3
             10               0.0      0.65     .14
             15               0.75     .71      3.97
Car          0                -3.93    -0.4     0.85
             5                -1.65    0.5      1.0
             10               0.69     0.70     .37
             15               0.7      .31      3.81
Train        0                -3.45    -0.49    0.15
             5                -0.86    0.38     0.43
             10               -0.39    0.77     .0
             15               0.75     .6       3.5
Airport      0                -4.37    -0.4     0.19
             5                -.57     0.15     0.43
             10               -0.06    0.14     1.09
             15               0.75     1.88     3.65
Street       0                -.88     -0.15    0.08
             5                -.13     0.61     0.73
             10               0.69     1.0      .70
             15               0.77     .5       3.4

Table 2. PESQ values of the enhanced signals

Noise type   Input SNR (dB)   WF       PWF      Proposed method
Babble       0                1.1      0.95     1.47
             5                1.78     1.750    1.836
             10               .034     .76      .40
             15               .17      .609     .718
Car          0                1.165    1.439    1.734
             5                1.694    1.697    .107
             10               1.91     .168     .318
             15               .65      .645     3.17
Train        0                1.450    1.48     1.731
             5                1.680    1.715    .133
             10               .009     .096     .479
             15               .040     .03      .714
Airport      0                1.47     1.561    1.759
             5                1.49     1.769    .4
             10               .05      .413     .538
             15               .49      .579     .715
Street       0                1.636    1.78     1.817
             5                1.679    1.857    1.968
             10               .119     .60      .39
             15               .380     .573     .683

6. CONCLUSION

In this paper, an effective approach for suppressing the musical noise present after Wiener filtering has been introduced. Based on the perceptual properties of the human auditory system, a weighting factor accentuates the denoising process when noise is perceptually insignificant and prevents residual noise components from becoming audible in the absence of adjacent maskers. When the speech signal is additively corrupted by babble noise and car noise, objective measure results showed the improvement brought by the proposed method in comparison to some recent filtering techniques of the same type.

Figure 2. Speech spectrograms: (a) original clean signal, (b) noisy signal (babble noise, SNR = 5 dB), (c) enhanced signal using Wiener filter, (d) enhanced signal using PWF, (e) enhanced signal using weighted PWF.

7. REFERENCES

[1] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean square error short-time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, pp. 1109-1121, Dec. 1984.
[2] R. Schwartz, M. Berouti and J. Makhoul, "Enhancement of speech corrupted by acoustic noise," Proc. of ICASSP, 1979, vol. I, pp. 208-211.
[3] N. Virag, "Single channel speech enhancement based on masking properties of the human auditory system," IEEE Trans. Speech and Audio Processing, vol. 7, pp. 126-137, 1999.
[4] Y. Ephraim and H.L. Van Trees, "A signal subspace approach for speech enhancement," IEEE Trans. Speech and Audio Processing, vol. 3, pp. 251-266, 1995.
[5] Y. Hu and P. Loizou, "Incorporating a psychoacoustic model in frequency domain speech enhancement," IEEE Signal Processing Letters, vol. 11(2), pp. 270-273, 2004.
[6] F. Jabloun and B. Champagne, "Incorporating the human hearing properties in the signal subspace approach for speech enhancement," IEEE Trans. Speech and Audio Processing, vol. 11, pp. 700-708, 2003.
[7] Y.M. Cheng and D. O'Shaughnessy, "Speech enhancement based conceptually on auditory evidence," IEEE Trans. Signal Processing, vol. 39, no. 9, pp. 1943-1954, 1991.
[8] D. Tsoukalas, M. Paraskevas, and J. Mourjopoulos, "Speech enhancement using psychoacoustic criteria," IEEE ICASSP, pp. 359-362, Minneapolis, MN, 1993.
[9] Y. Hu and P.C. Loizou, "A perceptually motivated approach for speech enhancement," IEEE Trans. Speech Audio Processing, pp. 457-465, Sept. 2003.
[10] L. Lin, W. H. Holmes and E. Ambikairajah, "Speech denoising using perceptual modification of Wiener filtering," IEE Electronic Letters, vol. 38, pp. 1486-1487, Nov. 2002.
[11] P. Scalart, C. Beaugeant, V. Turbin and A. Gilloire, "New optimal filtering approaches for hands-free telecommunication terminals," Signal Processing, vol. 64 (15), pp. 33-47, Jan. 1998.
[12] T. Lee and Kaisheng Yao, "Speech enhancement by perceptual filter with sequential noise parameter estimation," Proc. of ICASSP, vol. I, pp. 693-696, 2004.
[13] M. Jahangir Alam, Sid-Ahmed Selouani, Douglas O'Shaughnessy and S. Ben Jebara, "Speech enhancement using a Wiener denoising technique and musical noise reduction," in the Proceedings of INTERSPEECH 08, Brisbane, Australia, pp. 407-410, September 2008.
[14] Amehraye, D. Pastor, and A. Tamtaoui, "Perceptual improvement of Wiener filtering," Proc. of ICASSP, pp. 2081-2084, 2008.
[15] M. Jahangir Alam, Douglas O'Shaughnessy and Sid-Ahmed Selouani, "Speech enhancement based on novel two-step a priori SNR estimators," in the Proceedings of INTERSPEECH 08, Brisbane, Australia, pp. 565-568, September 2008.
[16] J. D. Johnston, "Transform coding of audio signals using perceptual noise criteria," IEEE J. on Selected Areas in Comm., vol. 6, pp. 314-323, February 1988.
[17] M. Kato, A. Sugiyama and M. Serizawa, "Noise suppression with high speech quality based on weighted noise estimation and MMSE STSA," IEICE Trans. Fundamentals, vol. E85-A, no. 7, pp. 1710-1718, July 2002.

[18] http://www.utdallas.edu/~loizou/speech/noizeus/
[19] Yi Hu and Philipos C. Loizou, "Evaluation of Objective Quality Measures for Speech Enhancement," IEEE Trans. on Audio, Speech and Language Processing, vol. 16, no. 1, pp. 229-238, January 2008.