[Rao* et al., 5(8): August, 6] ISSN: 77-9655 IC Value: 3. Impact Factor: 4.6

IJESRT
INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY

SPEECH ENHANCEMENT BASED ON SELF ADAPTIVE LAGRANGE MULTIPLIER WITH WEIGHTED PERCEPTUAL WIENER DE-NOISING TECHNIQUE

S. Arjuna Rao*, K. Murali Krishna
* Dept. of ECE, Gudlavalleru Engg. College (GEC), Gudlavalleru, Krishna-5356, AP, India
Dept. of ECE, Gudlavalleru Engg. College (GEC), Gudlavalleru, Krishna-5356, AP, India

DOI:.58/zenodo.59536

ABSTRACT
Most voice-based communication systems suffer from problems such as lack of perceptual clarity, musical (residual) noise, speech distortion and noise distortion. The main objective of speech enhancement is to improve speech quality and intelligibility. A Wiener filter with a multiplier provides a tradeoff between speech distortion and residual noise, provided that the multiplier is greater than or equal to zero; otherwise it causes both speech distortion and residual noise. The perceptual Wiener filter also leaves some residual noise, and the nonlinear relationship between the multiplier and the threshold value causes noise distortion. In this paper, a psychoacoustically motivated method is used to choose a better multiplier value and to avoid this nonlinear relationship. The objective evaluation shows that the proposed method performs better than the existing methods it is compared with.

KEYWORDS: intelligibility, Wiener filter, multipliers, threshold value, psychoacoustics.

INTRODUCTION
Speech is the most important medium of human communication. Most speech-based application systems suffer from degraded speech quality and intelligibility [3] due to additive noise. Suppressing this noise without introducing speech distortion or noise distortion remains a challenge for many researchers.
The spectral subtraction method [] subtracts the estimated power (or magnitude) spectrum of the noise from the power (or magnitude) spectrum of the noisy speech signal. The main problem of this method is musical noise, which arises from tonal components that appear and disappear rapidly over consecutive frames. The Wiener filter reduces the estimation error, but its drawback is a fixed frequency response at all frequencies, which leads to musical noise []. The Wiener filter with a multiplier [,] provides a tradeoff between speech distortion and residual noise only when the multiplier is greater than or equal to zero: a large value produces more speech distortion and less residual noise, while a small value produces less speech distortion and more residual noise. Perceptual speech enhancement [6,7] performs better than non-perceptual enhancement. Combining the multiplier with the perceptual Wiener filter, so as to minimize the speech distortion while constraining the noise distortion to fall below a constant threshold, leads to a nonlinear relationship between the multiplier and the threshold value, which causes noise distortion []. The proposed method uses the multiplier with a weighted perceptual Wiener de-noising technique to choose a better multiplier value and to maintain a linear relationship between the multiplier and the threshold value. This gives better perceptual quality: speech distortion and noise distortion are reduced without degrading the clarity of the enhanced speech signal.
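The tradeoff described above can be illustrated numerically. The following sketch is not taken from the paper's implementation: it assumes the common parametric Wiener gain H = ξ/(ξ + μ), where ξ is the a priori SNR and μ is the multiplier, and evaluates the speech distortion energy (H − 1)²Γc and the residual noise energy H²Γa for several multiplier values. The function names and the example value of ξ are hypothetical.

```python
import numpy as np


def parametric_wiener_gain(xi, mu):
    """Gain H = xi / (xi + mu); mu = 1 gives the conventional Wiener gain."""
    xi = np.asarray(xi, dtype=float)
    return xi / (xi + mu)


def distortion_and_residual(xi, mu, noise_power=1.0):
    """Return (speech distortion energy, residual noise energy) for one bin.

    Uses e_d = (H - 1)^2 * clean_power and e_r = H^2 * noise_power,
    where clean_power = xi * noise_power by the definition of xi.
    """
    h = parametric_wiener_gain(xi, mu)
    clean_power = xi * noise_power
    e_d = (h - 1.0) ** 2 * clean_power
    e_r = h ** 2 * noise_power
    return float(e_d), float(e_r)


if __name__ == "__main__":
    xi = 2.0  # hypothetical a priori SNR on a linear scale
    for mu in (0.5, 1.0, 2.0, 4.0):
        e_d, e_r = distortion_and_residual(xi, mu)
        print(f"mu={mu:3.1f}  distortion={e_d:.3f}  residual={e_r:.3f}")
```

With ξ fixed, increasing μ lowers the gain, so the distortion term grows while the residual noise term shrinks, which is the behaviour the text attributes to large and small multiplier values.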
BASIC WIENER FILTER FOR NOISE REDUCTION IN SPEECH ENHANCEMENT
Let the noisy input speech signal [] be expressed as

$$y(n) = c(n) + a(n) \tag{1}$$

where c(n) is the original clean speech signal and a(n) is the additive random noise signal, assumed uncorrelated with the clean signal. Applying the DFT to the observed signal gives

$$Y(i,k) = C(i,k) + A(i,k) \tag{2}$$

where i = 1, 2, ..., I is the frame index, k = 1, 2, ..., K is the frequency bin index, I is the total number of frames and K is the frame length. Y(i,k), C(i,k) and A(i,k) are the short-time spectral components of y(n), c(n) and a(n), respectively. An estimate of the clean speech spectrum is obtained by multiplying the noisy speech spectrum by the filter gain function:

$$\hat{C}(i,k) = H(i,k)\,Y(i,k) \tag{3}$$

where H(i,k) is the noise suppression gain function of the conventional Wiener filter,

$$H(i,k) = \frac{\xi(i,k)}{1+\xi(i,k)} \tag{4}$$

and ξ(i,k) is the a priori SNR, defined as

$$\xi(i,k) = \frac{\Gamma_c(i,k)}{\Gamma_a(i,k)} \tag{5}$$

$$\Gamma_a(i,k) = E\{|A(i,k)|^2\} \tag{6}$$

$$\Gamma_c(i,k) = E\{|C(i,k)|^2\} \tag{7}$$

Equations (6) and (7) are the estimated noise power spectrum and the clean speech power spectrum, respectively. The a posteriori SNR is estimated as

$$\gamma(i,k) = \frac{|Y(i,k)|^2}{\Gamma_a(i,k)} \tag{8}$$

In the decision-directed (DD) approach, the current a priori SNR is estimated using the speech spectrum estimated in the previous frame, so the a priori SNR follows the a posteriori SNR with a delay of one frame. This delay causes undesired gain distortion and thus audible distortion during abrupt transient periods. To avoid this, a modified a priori SNR is used in which the smoothing factor α is changed dynamically. If Δγ(k) > Thrld, α_M(i,k) is set to a fixed value; otherwise it is adapted according to (11). In both cases the modified a priori SNR has the form

$$\xi_M(i,k) = \alpha_M(i,k)\,\frac{|H(i-1,k)\,Y(i-1,k)|^2}{\Gamma_a(i,k)} + \bigl(1-\alpha_M(i,k)\bigr)\,P\bigl(\gamma(i,k)-1\bigr) \tag{9, 10}$$

where P(·) denotes half-wave rectification and 0 < α_M(i,k) < 1 is the modified smoothing factor, which depends on the previous a posteriori SNR and varies according to

$$\alpha_M(i,k) = \frac{1}{1+\left[\dfrac{\Delta\gamma(k)}{\max\bigl(\gamma(i,k),\,\gamma(i-1,k)\bigr)+1}\right]^2} \tag{11}$$

where Δγ(k) = γ(i,k) − γ(i−1,k) and the threshold is Thrld = E{γ(i,k)}. The noise suppression gain function is then

$$H(i,k) = \frac{\xi_M(i,k)}{1+\xi_M(i,k)} \tag{12}$$

GAIN OF MODIFIED PERCEPTUAL WIENER FILTER
The gain function of the modified perceptual Wiener filter, H_M(i,k), is obtained by minimizing the cost function J, which is expressed as

$$J = E\bigl[\,|\hat{C}(i,k) - C(i,k)|^2\,\bigr] \tag{13}$$
Substituting equations (2) and (3) into (13), with H_M(i,k) as the gain, gives

$$J = E\bigl\{\,|(H_M(i,k)-1)\,C(i,k) + H_M(i,k)\,A(i,k)|^2\,\bigr\}$$

$$J = e_d + e_r \tag{14}$$

where $e_d = (H_M(i,k)-1)^2\,E[|C(i,k)|^2]$ and $e_r = H_M^2(i,k)\,E[|A(i,k)|^2]$ are the speech distortion energy and the residual noise energy, respectively; the cross term vanishes because speech and noise are uncorrelated. The residual noise is inaudible only if it lies below the auditory masking threshold; otherwise it is audible. To make it inaudible, the following constraint is imposed:

$$e_r \le T_M(i,k) \tag{15}$$

Using this constraint and substituting $\Gamma_a(i,k) = E\{|A(i,k)|^2\}$ and $\Gamma_c(i,k) = E\{|C(i,k)|^2\}$, the cost function becomes

$$J = (H_M(i,k)-1)^2\,\Gamma_c(i,k) + H_M^2(i,k)\,\max\bigl[\Gamma_a(i,k) - T_M(i,k),\,0\bigr] \tag{16}$$

The modified perceptual Wiener filter gain is obtained by differentiating J with respect to H_M(i,k) and setting the result to zero:

$$H_M(i,k) = \frac{\Gamma_c(i,k)}{\Gamma_c(i,k) + \max\bigl(\Gamma_a(i,k) - T_M(i,k),\,0\bigr)} \tag{17}$$

Dividing the numerator and denominator of (17) by Γ_a(i,k) and introducing the multiplier μ(i,k) gives

$$H_M(i,k) = \frac{\xi_M(i,k)}{\xi_M(i,k) + \dfrac{\max\bigl(\Gamma_a(i,k) - T_M(i,k),\,0\bigr)}{\Gamma_a(i,k)} + \mu(i,k)} \tag{18}$$

$$\xi_N(i,k) = \xi_M(i,k) + \frac{\max\bigl(\Gamma_a(i,k) - T_M(i,k),\,0\bigr)}{\Gamma_a(i,k)} \tag{19}$$

Substituting (19) into (18) gives

$$H_M(i,k) = \frac{\xi_M(i,k)}{\xi_N(i,k) + \mu(i,k)} \tag{20}$$

where T_M(i,k) is the noise masking threshold, which is estimated from the noisy speech spectrum.

THE LAGRANGE MULTIPLIER
To minimize the speech distortion energy in the frequency domain while keeping the residual noise energy below a preset threshold, the multiplier is used in []. The multiplier [,] creates the tradeoff between speech distortion and residual noise. A large value of μ produces more speech distortion and less residual noise; a small value of μ produces less speech distortion and more residual noise. The value of μ is chosen based on the estimated a priori SNR ξ_M(i,k) as

$$\mu(i,k) = 1 + U\left(\frac{1}{1+e^{\,\xi_{dB}(i,k)}}\right) \tag{21}$$

where $\xi_{dB}(i,k) = 10\log_{10}\xi_M(i,k)$ and U is a constant chosen experimentally.
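A minimal sketch of how the multiplier and the perceptual gain could be evaluated per frequency bin is given below. It assumes the sigmoid form μ = 1 + U/(1 + exp(ξ_dB)) with ξ_dB = 10 log10 ξ_M, and the gain H_M = ξ_M/(ξ_N + μ) with ξ_N = ξ_M + max(Γa − T_M, 0)/Γa. The value U = 5, the clipping range, the requirement that the noise power be strictly positive, and all function names are illustrative assumptions, not values from the paper.

```python
import numpy as np


def lagrange_multiplier(xi_m, u=5.0):
    """Multiplier mu = 1 + u / (1 + exp(xi_dB)), with xi_dB = 10*log10(xi_m).

    u = 5.0 is a placeholder; the text only says U is chosen experimentally.
    """
    xi_m = np.asarray(xi_m, dtype=float)
    xi_db = 10.0 * np.log10(np.maximum(xi_m, 1e-12))
    # Clip the exponent so np.exp cannot overflow for very high SNR bins.
    return 1.0 + u / (1.0 + np.exp(np.clip(xi_db, -60.0, 60.0)))


def perceptual_gain(xi_m, noise_psd, mask_thr, u=5.0):
    """Gain H_M = xi_m / (xi_n + mu) for each bin (noise_psd must be > 0)."""
    xi_m = np.asarray(xi_m, dtype=float)
    noise_psd = np.asarray(noise_psd, dtype=float)
    mask_thr = np.asarray(mask_thr, dtype=float)
    # Fraction of the noise power that exceeds the masking threshold.
    audible = np.maximum(noise_psd - mask_thr, 0.0) / noise_psd
    xi_n = xi_m + audible
    return xi_m / (xi_n + lagrange_multiplier(xi_m, u))


if __name__ == "__main__":
    xi = np.array([0.1, 1.0, 10.0])     # hypothetical a priori SNRs
    noise = np.array([1.0, 1.0, 1.0])   # hypothetical noise power per bin
    mask = np.array([0.5, 0.5, 0.5])    # hypothetical masking thresholds
    print("mu  :", lagrange_multiplier(xi))
    print("gain:", perceptual_gain(xi, noise, mask))
```

Under these assumptions μ approaches 1 + U in low-SNR bins and 1 in high-SNR bins, so noise-dominated bins are attenuated more strongly than speech-dominated ones.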
WEIGHTED PERCEPTUAL WIENER FILTER
The perceptual Wiener filter leaves some residual noise, because only the noise above the noise masking threshold is filtered, while the noise below the threshold remains and there is no guarantee that it is inaudible. The multiplier value should also be chosen so as to reduce speech distortion and residual noise and to avoid the nonlinear relationship between μ and the threshold value. To overcome these drawbacks, we propose to weight the perceptual Wiener filter with multiplier using a psychoacoustically motivated weighting filter [4], given as

$$W(i,k) = \begin{cases} H(i,k), & \text{if } ATH(i,k) < \Gamma_a(i,k) \le T_M(i,k) \\ 1, & \text{otherwise} \end{cases} \tag{22}$$

where ATH(i,k) is the absolute threshold of hearing. The gain function of the proposed weighted filter is given as

$$\tilde{H}_M(i,k) = H_M(i,k)\,W(i,k) \tag{23}$$

PERFORMANCE EVALUATION AND COMPARISON
To compare the performance of the proposed speech enhancement scheme with existing schemes, simulations were carried out with NOIZEUS, a noisy speech corpus for the evaluation of speech enhancement algorithms [11]. The database consists of 30 IEEE sentences spoken by three male and three female speakers, corrupted by eight different real-world noises at different SNRs. The speech signals are degraded by several types of noise at global SNR levels of 0 dB, 5 dB, 10 dB and 15 dB. In this evaluation only seven noises are considered: babble, airport, car, exhibition, station, train and street. The objective quality measures used for the proposed method are the segmental SNR, the perceptual evaluation of speech quality (PESQ) and the weighted spectral slope (WSS). These measures indicate speech distortion more accurately than the overall SNR. A higher segmental SNR indicates weaker speech distortion, a higher PESQ score indicates better perceptual quality, and a lower WSS value indicates weaker speech distortion. The performance of the proposed method is compared with spectral subtraction, the Wiener filter and the Wiener filter with multiplier. The simulation results are compared in Table 1, Table 2 and Table 3.
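As an illustration of the first of these measures, the sketch below computes a frame-averaged segmental SNR between a clean reference and an enhanced signal. It is a simplified version under stated assumptions: non-overlapping frames of 256 samples and per-frame values limited to the range −10 dB to 35 dB, a convention often used in the speech enhancement literature. The paper does not state its exact settings, and the function name is hypothetical.

```python
import numpy as np


def segmental_snr(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    """Mean of per-frame SNRs in dB, each clipped to [lo, hi].

    Assumes both signals are time-aligned; frames do not overlap and any
    trailing samples that do not fill a frame are ignored.
    """
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    n = min(len(clean), len(enhanced))
    n_frames = n // frame_len
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")

    eps = np.finfo(float).eps  # guards against log of zero and division by zero
    frame_snrs = []
    for m in range(n_frames):
        start, stop = m * frame_len, (m + 1) * frame_len
        s = clean[start:stop]
        err = s - enhanced[start:stop]
        snr_db = 10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum(err ** 2) + eps))
        frame_snrs.append(np.clip(snr_db, lo, hi))
    return float(np.mean(frame_snrs))


if __name__ == "__main__":
    t = np.arange(8000) / 8000.0
    clean = np.sin(2 * np.pi * 440.0 * t)
    rng = np.random.default_rng(0)
    noisy = clean + 0.1 * rng.standard_normal(clean.shape)
    print(f"segmental SNR: {segmental_snr(clean, noisy):.2f} dB")
```

Clipping each frame keeps silent or perfectly reconstructed frames from dominating the average, which is why the measure tracks distortion more closely than a single global SNR.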
The simulation results in the tables show that the proposed method gives better values than the existing methods.

Table 1. Output average segmental SNR values of the enhanced signals
Columns: Noise type, Input SNR (dB), Spectral Subtraction (dB), Wiener Filter (dB), Wiener with Multiplier (dB), Weighted PWF with Multiplier (dB)
Babble -3.7366 -.845 -.966 -.9 5 -.737 -.3774.45.93 -.75.446.7345.8 5 -.7395.856.76.96
Airport -3.795 -.58 -.397 -.386 5 -.99 -.356.4456.4995 -.334.559.4877.559 5.7845.8.77.864
Car -3.639 -.6348 -.475.5 5 -.566.3487.3589.369 -.888.547.78.66 5.9473.7556.7535.7556
Exhibition -3.73 -.8684 -.76 -.55 5 -.9.3.66.66 -.983.894.837.93 5.864.39.937.9569
Station -3.8588 -.8558 -.668 -.35 5 -.9.3735.493.43 -.983.3478.346.457 5.864..799.588
Train -3.87 -.485 -.754 -.445 5 -.47.446.3496.6896 -.373.595.595.757 5.7646..35.35
Street -3.63 -..753.838 5 -.6857 -.8.395.475 -.4977.3536.354.49 5.4.8658.93.837

Table 2. Output PESQ values of the enhanced signals
Columns: Noise type, Input SNR (dB), Wiener Filter, Wiener with Multiplier, Weighted PWF with Multiplier
Babble .6.467.846 5.775.636.863.344.897.544 5.69.4.673
Airport .473.56.573 5.495.66.7964.59.3.559 5.498.39.56
Car .658.3978.44 5.6946.75.784.9.8935.993 5.653.63.656
Exhibition .998.8.85 5.4547.456.4769.9846.8863.9956 5.37.486.3
Station .976.34.93 5.6663.667.677.88.663.98 5.9949.9957.
Train .459.493.533 5.688.6685.76.87.88.79 5.4.4.64
Street .6364.654.75 5.6797.79.757.97.448.577 5.389.367.3994

Table 3. Output WSS values of the enhanced speech signals
Columns: Noise type, Input SNR (dB), Wiener Filter, Wiener with Multiplier, Weighted PWF with Multiplier
Babble 9.89 3.3895 9.569 5.4 5.6359.574 93.4644 9.5866 87.8359 5 83.553 76.9853 73.5549
Airport 9.787.8996 5.9685 5 5.56 7.77.34 83.846 77.96 7.93 5 75.74 73.63 69.57
Car 7.6957 9.68 4.779 5 97.86 9.766 87.6798 89.7359 84.676 79.3 5 79.5697 7.776 67.663
Exhibition 9.785 9.467 4.694 5 7.545 7.87 98.4 99.94 9.56 83.3784 5 8.666 73.4 7.93
Station .65.466 9.839 5 3.5956 96.4847 9.933 86.69 79.95 74.9699 5 96.937 9.865 83.768
Train 9.38 6.3934.337 5 98.9948 95.8433 89.3867 87.933 85.574 8.536 5 73.794 7.85 67.35
Street 4.976 99.895 94.36 5.884 97.639 9.88 8.644 79.5334 75.53 5 75.798 74.856 7.977

The measured values in Tables 1, 2 and 3 are plotted in Figures 1, 2 and 3, respectively. Each figure compares the respective measure across the input SNR values: Figure 1 shows the average segmental SNR, Figure 2 the PESQ values and Figure 3 the WSS values of the enhanced signals.
Fig. 1. Output average segmental SNR (dB) versus input SNR (dB) for Spectral Subtraction, Wiener Filter, Wiener with Multiplier and Weighted PWF with Multiplier: (a) babble noise, (b) airport noise, (c) car noise, (d) exhibition noise, (e) station noise, (f) train noise, (g) street noise.
Fig. 2. Output PESQ values versus input SNR (dB) for the Wiener filter, Wiener with Multiplier and Weighted PWF with Multiplier: (a) babble noise, (b) airport noise, (c) car noise, (d) exhibition noise, (e) station noise, (f) train noise, (g) street noise.
Fig. 3. Output WSS values versus input SNR (dB) for the Wiener filter, Wiener with Multiplier and Weighted PWF with Multiplier: (a) babble noise, (b) airport noise, (c) car noise, (d) exhibition noise, (e) station noise, (f) train noise, (g) street noise.

The spectrograms of the clean speech signal, the noisy speech signal and the enhanced speech signals, shown in Figure 4, indicate that the perceptual quality is improved and the noise is reduced compared with the existing methods.
Fig. 4. Speech spectrograms (frequency versus time), babble noise with input SNR=dB: (a) original clean speech signal, (b) noisy signal (babble noise, SNR=dB), (c) enhanced signal using spectral subtraction, (d) enhanced signal using Wiener filter, (e) enhanced signal using Wiener filter with Lagrange multiplier, (f) enhanced signal using Weighted PWF with Multiplier.

CONCLUSION
In this speech enhancement process, the multiplier and the psychoacoustically motivated weighting factor filter the noise below the noise masking threshold and avoid the noise caused by the nonlinearity between the multiplier and the threshold value. Speech distortion and residual noise are reduced, and better perceptual quality is achieved.

REFERENCES
[1] M. Priyanka, CH. V. Rama Rao, "Speech Enhancement using Self Adaptive Wiener Filter based on Hybrid a priori SNR," Proceedings of ICNAE & Advanced Computing (ICNEAC-), pp.-6,.
[2] Tsai-Tsung Han, Pei-Yun Liu, "A Speech Enhancement System using Binary Mask Approach and Spectral Subtraction Method," IEEE International Symposium on Computer, Consumer and Control (ISCCC), pp.65-68, 4.
[3] Craig A. Anderson, Paul D. Teal, Mark A. Poletti, "Multi channel Wiener Filter Estimation Using Source Location Knowledge for Speech Enhancement," .4 IEEE Workshop on Statistical Signal Processing (SSP), pp.57-6, 4.
[4] Laksmikanth. S, Natraj. K. R, Rekha. K. R, "Noise cancellation in Speech Signal Processing A Review," International Journal of Advanced Research in Computer and Communication Engineering, Vol.3, Issue, pp.575-586, January 4.
[5] Vyankatesh Chapke, Prof. Harjeet Kaur, "Review of Speech Enhancement Techniques using Statistical Approach," International Journal of Electronics Communication and Computer Engineering, Volume 5, pp.37-39, Issue (4) July, Technovision-4, ISSN 49-7X.
[6] S. Alaya, N. Zoghlami, and Z. Lachiri, "Speech Enhancement based on perceptual filter bank improvement," International Journal of Speech Technology, pp.-6, 4.
[7] Astik Biswas, P. K. Sahu, Anirban Bhowmick and Mahesh Chandra, "Acoustic Feature Extraction using ERB like Wavelet Sub-band Perceptual Wiener Filtering for Noisy Speech Recognition," 4 Annual IEEE India Conference (INDICON).
[8] V. Sunnydayal, T. Kishore Kumar, "Speech Enhancement using Sub-Band Wiener Filter with Pitch Synchronous Analysis," ICACCI, pp.--5, IEEE-3.
[9] R. Yu, "A low-complexity noise estimation algorithm based on smoothing of noise power estimation and estimation bias correction," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Taipei, pp. 4421-4424, April 2009.
[10] T. Gerkmann and R. C. Hendriks, "Unbiased MMSE-based noise power estimation with low complexity and low tracking delay," IEEE Trans. on Audio, Speech, and Language Process., vol. 20, no. 4, pp. 1383-1393, May 2012.
[11] http://www.utdallas.edu/~loizou/speech/noizeus
[12] P. C. Loizou, Speech Enhancement: Theory and Practice. Boca Raton, FL: CRC Press, 2007.
[13] J. Ma, Y. Hu, and P. Loizou, "Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions," J. Acoust. Soc. Am., vol. 125, no. 5, pp. 3387-3405, May 2009.
[14] Y. Hu and P. Loizou, "Incorporating a psychoacoustic model in frequency domain speech enhancement," IEEE Signal Processing Letters, vol. 11(2), pp. 270-273, 2004.
[15] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean square error short-time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, pp. 1109-1121, Dec. 1984.
[16] P. K. Daniel Lun, Tai-Chiu Hsung, "Improved Wavelet Based A-Priori SNR," Proc. IEEE International Symposium on Circuits and Systems (ISCAS), Paris, France, pp. 2382-2385, May 2010.
[17] Y. Hu and P. Loizou, "Evaluation of objective quality measures for speech enhancement," IEEE Trans. on Audio, Speech, and Language Process., vol. 16, no. 1, pp. 229-238, 2008.
[18] M. Berouti, R. Schwartz and J. Makhoul, "Enhancement of speech corrupted by acoustic noise," Proc. of ICASSP, pp. 208-211, 1979.
[19] Sana Alaya, Novelene Zoghlami, Zied Lachiri, "Speech enhancement using perceptual multi-band wiener filter," 1st International Conference on ATSIP, Sousse, Tunisia, pp.468-47, IEEE 4.
[20] J. D. Johnston, "Transform coding of audio signals using perceptual noise criteria," IEEE Jour. Selected Areas Commun., vol. 6, pp. 314-323, February 1988.
[21] Philipos C. Loizou, Gibak Kim, "Reasons why current speech enhancement algorithms do not improve speech intelligibility and suggested solutions," IEEE Trans. on Audio, Speech, and Language Process., vol. 19, no. 1, pp. 47-56, January 2011.