Advances in Applied and Pure Mathematics


Enhancement of speech signal based on application of the Maximum a Posteriori Estimator of Magnitude-Squared Spectrum in Stationary Bionic Wavelet Domain

MOURAD TALBI 1, ANIS BEN AICHA 2
1 mouradtalbi196@yahoo.fr, 2 ben.aicha.anis@gmail.com
1 High Institute of Applied Mathematics and Informatics of Kairouan, University of Kairouan, Tunisia
2 COSIM Laboratory (SUP'COM), Higher School of Communications, Tunis, Tunisia

Abstract

In this paper we propose a new speech enhancement technique based on the application of the Maximum a Posteriori estimator of the Magnitude-Squared Spectrum (MSS-MAP) in the stationary bionic wavelet domain. The technique first applies the Stationary Bionic Wavelet Transform (SBWT) to the noisy speech signal, and then applies the MSS-MAP estimator to each stationary bionic wavelet subband in order to enhance it. The enhanced speech signal is obtained by applying the inverse transform, SBWT^-1, to the enhanced stationary wavelet coefficients. To evaluate the proposed technique, we compared it with previous techniques, such as denoising based on MSS-MAP alone. The evaluation was performed on a set of Arabic speech sentences corrupted by different types of noise: Gaussian white, car, tank, F16 and pink noise. The simulation results show that the proposed technique outperforms the other techniques used in our evaluation.

Keywords: Stationary Bionic Wavelet Transform, Maximum a Posteriori estimator of the Magnitude-Squared Spectrum, Speech enhancement.

1. Introduction

Speech enhancement in the presence of uncorrelated additive noise is an important problem that has received much attention over the last two decades. This is a result of the growing use of speech processing systems in diverse real environments. The presence of noise degrades the performance of such systems, which include speech recognizers, mobile phones, hearing aids and voice coders.
The aim of speech enhancement is to improve the intelligibility and perceptual quality of speech by minimizing the effect of noise. Existing techniques for this task include Wiener filtering [1-5], spectral subtraction [6, 7], the wavelet transform (WT) [8-14, 35], and others. An emerging tendency in the speech enhancement domain is to employ a filter bank based on a psychoacoustic model of the human auditory system (critical bands). The principle is that embedding such a psychoacoustic model in the filter bank can improve both the intelligibility and the perceptual quality of speech. Furthermore, it is well known that the human auditory system can be approximately described as a non-uniform bandpass filter bank, and that humans are able to detect desired speech in noisy environments without prior knowledge of the noise [15]. Different frequency transformations (scales) have been proposed to account for the perceptual aspects of hearing (ERB, Bark, Mel and so on). It deserves mentioning that the majority of perceptual speech enhancement techniques are based on the wavelet packet transform [10, 11, 13, 15-18]. Furthermore, the wavelet packet transform has been effectively combined with other denoising techniques in order to improve speech enhancement performance. These include

ISBN: 978-960-474-380-3

Wiener filtering [19], adaptive filtering [20], spectral subtraction [21-23], the Ephraim and Malah approach [15] and the coherence function [24].

The rest of the paper is organized as follows. Section 2 describes the proposed speech enhancement technique, giving a detailed overview of the bionic wavelet transform (BWT) and the Stationary Bionic Wavelet Transform (SBWT). Section 3 presents the objective quality measures. Experimental results are presented and discussed in Section 4. Finally, the conclusion is given in Section 5.

2. The proposed technique

In this paper we propose a new speech enhancement technique based on the application of the Maximum a Posteriori estimator of the Magnitude-Squared Spectrum (MSS-MAP) [25] in the stationary bionic wavelet domain. The block diagram of the proposed technique is given in Figure 1.

Figure 1. The block diagram of the proposed technique.

As shown in the figure, the technique first applies the SBWT to the noisy speech signal in order to obtain eight noisy stationary bionic wavelet subbands. The MSS-MAP estimator is then applied to each subband, yielding eight enhanced stationary bionic wavelet subbands. Finally, the enhanced speech signal is obtained by applying the inverse transform, SBWT^-1, to the enhanced subbands.

2.1. The Bionic Wavelet Transform

Referring to a perceptual model, Yao and Zhang [14] proposed the Bionic Wavelet Transform (BWT) as a new time-frequency method. The term "bionic" means that the BWT is guided by an active biological mechanism [18]. Furthermore, the BWT decomposition is both perceptually scaled and adaptive [16]. The initial perceptual aspect of the transform comes from the logarithmic spacing of the baseline scale variables, which is designed to match basilar membrane spacing [16]. Two adaptation factors then control the time support employed at each scale, based on a non-linear perceptual model of the auditory system [16].
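In outline, the pipeline of Figure 1 is an analysis transform, a per-subband estimator, and a synthesis transform. The sketch below shows only that shape: soft thresholding is a runnable placeholder for the MSS-MAP estimator of [25] (whose gain rule is not reproduced here), and the one-band "transform" in the usage example exists only to exercise the plumbing.

```python
import numpy as np

def soft_threshold(w, lam=0.05):
    # Placeholder estimator: shrink coefficient magnitudes by lam.
    # This is NOT the MSS-MAP rule of [25]; it only makes the sketch runnable.
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def enhance(noisy, analysis, synthesis, estimator=soft_threshold):
    subbands = analysis(noisy)                  # SBWT of the noisy signal
    cleaned = [estimator(w) for w in subbands]  # estimator applied per subband
    return synthesis(cleaned)                   # SBWT^-1 of the enhanced subbands

# Trivial one-band "transform" pair, just to exercise the pipeline shape.
out = enhance(np.array([0.2, -0.01, 0.3]),
              analysis=lambda x: [x],
              synthesis=lambda bands: bands[0])
```

Any concrete SBWT analysis/synthesis pair (eight subbands in the paper) can be plugged in through the `analysis` and `synthesis` arguments.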
The basis of this transform is the Giguère and Woodland non-linear transmission-line model of the auditory system [19, 20], an active-feedback electro-acoustic model incorporating the auditory canal, the middle ear and the cochlea [16]. The model yields estimates of the time-varying acoustic compliance and resistance along the displaced basilar membrane, as functions of a physiological acoustic mass, the cochlear frequency-position mapping, and feedback factors representing the active mechanisms of the outer hair cells. The net result can be seen as a method for estimating the time-varying quality factor of the cochlear filter banks as a function of the input sound waveform [16]. Giguère and Woodland [20] and Yao and Zhang [14] give complete details of this model. The adaptive nature of the BWT is ensured by a time-varying linear factor representing the

scaling of the cochlear filter bank quality factor at each scale over time [16]. For each scale and time, the BWT adaptation factor T(a_m, t) is computed with the update equation [16]:

T(a_m, t + Δt) = [1 - G1 · Cs / (Cs + |X_BWT(a_m, t)|)]^(-1) · [1 + G2 · |∂X_BWT(a_m, t)/∂t|]^(-1)    (1)

where Cs is a constant that represents non-linear saturation effects in the cochlear model [14, 16]. The quantities G1 and G2 are, respectively, the active gain factor representing the outer hair cell active resistance function, and the active gain factor representing the time-varying compliance of the basilar membrane [16]. Practically speaking, the partial derivative in equation (1) can be approximated by the first difference of the previous points of the BWT at that scale [16]. X_BWT(a, τ) denotes the BWT of the signal x(t); it is given by:

X_BWT(a, τ) = (1 / sqrt(T · a)) ∫ x(t) φ̃*((t - τ) / (T · a)) e^(-j·ω0·(t - τ)/a) dt    (2)

where a denotes the scale parameter, τ the shifting parameter in time, T the adaptation factor of equation (1), and φ̃ the mother wavelet envelope. The adapted mother wavelet is given by [18]:

φ_T(t) = (1 / sqrt(T)) · φ̃(t / T) · e^(-j·2π·f0·t)    (3)

where f0 is the base fundamental frequency of the unscaled mother wavelet; in practice, f0 = 15165.4 Hz for the human auditory system [14]. The discretization of the scale is achieved by a predetermined logarithmic spacing across the desired frequency range, so that at each scale m the center frequency is expressed by [16]:

f_m = f0 / (1.1623)^m    (4)

For the implementation performed in [16], based on the original work on cochlear implant coding (Yao and Zhang, 2002), coefficients at 22 scales are computed using numerical integration of the Continuous Wavelet Transform (CWT) [16]. These 22 scales correspond to center frequencies logarithmically spaced from 225 Hz to 5300 Hz [16]. In formula (3), the role of the first factor, 1/sqrt(T), is to ensure that the energy remains unchanged for each mother wavelet; the role of the second factor, the envelope term φ̃(t/T), is to adjust the envelope without adjusting the center frequency of the wavelet [18].
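As a concrete check of the numbers above, the sketch below computes the logarithmically spaced center frequencies of eq. (4) and one step of the adaptation update of eq. (1). Only f0 = 15165.4 Hz is stated in the text; the scale base 1.1623, the index range m = 7, ..., 28, and the constants Cs = 0.8, G1 = 0.87 and G2 = 45 are assumptions taken from the bionic wavelet literature (Yao and Zhang), not from this paper.

```python
F0 = 15165.4                   # base frequency of the unscaled mother wavelet, Hz (from the text)
BASE = 1.1623                  # assumed logarithmic scale spacing
CS, G1, G2 = 0.8, 0.87, 45.0   # assumed saturation and active-gain constants

def bwt_center_frequencies(m_start=7, n_scales=22):
    """Center frequencies f_m = F0 / BASE**m, in Hz, for 22 scales."""
    return [F0 / BASE ** m for m in range(m_start, m_start + n_scales)]

def update_T(w_abs, dw_abs):
    """One adaptation step: new T from |X_BWT| and |dX_BWT/dt| at a scale."""
    resistance = 1.0 - G1 * CS / (CS + w_abs)  # outer-hair-cell saturation term
    compliance = 1.0 + G2 * dw_abs             # envelope-variation (derivative) term
    return 1.0 / (resistance * compliance)

freqs = bwt_center_frequencies()
```

With these assumed values, the 22 center frequencies run from about 5292 Hz down to about 225 Hz, matching the 225-5300 Hz range quoted from [16]; louder or faster-varying coefficients yield a smaller T, i.e. a shorter time support.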
Consequently, the major difference between the CWT and the BWT is that the time-frequency resolution achieved by the BWT can be adjusted in an adaptive manner, driven not only by the frequency variation of the signal but also by its instantaneous amplitude. Whereas the CWT uses a fixed mother wavelet, the adaptive character of the BWT comes from the active control mechanism in the human auditory model, which adjusts the mother wavelet of the BWT according to the analyzed signal. Basically, the idea of the BWT is inspired by the need to make the mother wavelet envelope vary in time according to the signal characteristics. The mother wavelet employed in [18] is the Morlet wavelet, whose envelope is given by [16]:

φ̃(t) = e^(-(t / T0)^2)    (5)

where T0 denotes the initial time support. It can be shown [15, 18] that the BWT coefficients can be derived from the wavelet transform coefficients using the following formula [16]:

X_BWT(a_m, τ) = K(a_m, τ) · X_WT(a_m, τ)    (6)

The factor K(a_m, τ) in equation (6) is given by expression (7) of [16]; it depends only on the adaptation factor T and on a normalizing constant C calculated from the integral of the squared mother wavelet. This representation yields an efficient computational technique for obtaining the BWT coefficients directly from those of the wavelet transform, without using the BWT definition of equation (2). There are some key differences between the discretized CWT with the Morlet wavelet used for the BWT and a filter-bank-based WPT using an orthonormal wavelet. One of them is that the WPT provides perfect reconstruction, whereas the discretized CWT is an approximation whose accuracy depends on the number and placement of the selected frequency bands [16].

2.2. Stationary Bionic Wavelet Transform (SBWT)

As previously mentioned, in this paper we apply a new wavelet transform which we call the Stationary Bionic Wavelet Transform (SBWT). This new transform is obtained by replacing the Continuous Wavelet Transform (CWT) used in the computation of the Bionic Wavelet Transform with the Stationary Wavelet Transform (SWT). Figure 2 shows the difference between the SBWT and the BWT. Part (a) of Figure 2 shows the steps of the application of the BWT and of its inverse, BWT^-1. The bionic wavelet coefficients are obtained by multiplying by the factor K the continuous wavelet coefficients produced by applying the CWT to the signal. To reconstruct the signal, the bionic wavelet coefficients are first multiplied by the inverse factor, 1/K, and the inverse transform is then applied to the resulting coefficients. Part (b) of Figure 2 shows the steps of the application of the SBWT and of its inverse, SBWT^-1. The stationary bionic wavelet coefficients are obtained by multiplying the stationary wavelet coefficients by the factor K.
Those stationary wavelet coefficients are obtained by applying the Stationary Wavelet Transform (SWT) to the signal. The signal is reconstructed by first multiplying the stationary bionic wavelet coefficients by 1/K and then applying the inverse transform, SWT^-1.
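A minimal numeric sketch of the Figure 2(b) round trip, under two simplifying assumptions: a one-level undecimated Haar transform stands in for the SWT used in the paper (which has ten scales), and a constant K stands in for the time-varying adaptation factor. It illustrates why the SBWT path reconstructs the signal essentially exactly, as Tables 1 and 2 report.

```python
import numpy as np

def swt_haar(x):
    """One-level undecimated Haar analysis with circular boundary handling."""
    xs = np.roll(x, 1)
    return (x + xs) / 2.0, (x - xs) / 2.0   # approximation band, detail band

def iswt_haar(lo, hi):
    """Inverse of swt_haar: the two undecimated branches sum back to the input."""
    return lo + hi

def sbwt_roundtrip(x, K=1.7):
    lo, hi = swt_haar(x)                  # SWT
    blo, bhi = K * lo, K * hi             # multiply by K: stationary bionic coefficients
    return iswt_haar(blo / K, bhi / K)    # multiply by 1/K, then apply SWT^-1

x = np.sin(np.linspace(0.0, 20.0, 256))
mse = float(np.mean((sbwt_roundtrip(x) - x) ** 2))
```

The round-trip MSE is at the level of floating-point rounding, consistent with the near-zero SBWT columns of Tables 1 and 2.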

Figure 2. (a) The Bionic Wavelet Transform (BWT) and its inverse (BWT^-1); (b) the Stationary Bionic Wavelet Transform (SBWT) and its inverse (SBWT^-1).

Tables 1 and 2 report the Mean Squared Error (MSE) between the reconstructed and the original speech signals, obtained by applying the BWT followed by its inverse and the SBWT followed by its inverse. They show clearly that the best results are obtained with the SBWT using ten scales. Consequently, the stationary bionic wavelet transform permits an almost perfect reconstruction of speech signals.

Table 1. Reconstruction MSE, case of a female voice.

Signal     SBWT, 10 scales   BWT, 22 scales   BWT, 30 scales [26]
Signal1    2.9859e-007       1.3109e-004      8.6425e-005
Signal2    4.8138e-007       0.0012           0.0011
Signal3    6.5933e-007       4.6588e-004      2.3876e-005
Signal4    8.1900e-007       4.5500e-004      2.3380e-004
Signal5    8.4110e-007       0.0019           0.0016
Signal6    4.5179e-007       2.6486e-004      1.1854e-004
Signal7    1.3790e-006       3.7538e-004      2.5310e-004
Signal8    5.6236e-007       3.2596e-004      1.4946e-004
Signal9    4.5083e-007       3.2943e-004      1.7669e-004
Signal10   1.1775e-006       0.0030           0.0021

Table 2. Reconstruction MSE, case of a male voice.

Signal     SBWT, 10 scales   BWT, 22 scales   BWT, 30 scales [26]
Signal1    2.9859e-007       0.0063           7.5171e-004
Signal2    4.8138e-007       0.0041           4.0659e-004
Signal3    6.5933e-007       0.0032           2.4023e-004
Signal4    8.1900e-007       0.0049           3.5989e-004
Signal5    8.4110e-007       0.0033           1.7969e-004
Signal6    4.5179e-007       0.0064           7.5298e-004
Signal7    1.3790e-006       0.0076           9.0613e-004
Signal8    5.6236e-007       0.0043           6.1179e-004
Signal9    4.5083e-007       0.0050           0.0013
Signal10   1.1775e-006       0.0028           2.0509e-004

3. Performance evaluation

In this part of the paper, the objective measures used to evaluate speech enhancement techniques are presented.

3.1. Signal-to-noise ratio

The signal-to-noise ratio (SNR) of the enhanced speech signal is defined by:

SNR = 10 · log10( Σ_n s(n)^2 / Σ_n (s(n) - ŝ(n))^2 )    (8)

where s(n) and ŝ(n) represent, respectively, the original and the enhanced speech signals, and the sums run over the N samples of the signal.

3.2. Segmental signal-to-noise ratio

The segmental signal-to-noise ratio (segSNR) is calculated by averaging frame-based SNRs over the signal:

segSNR = (10 / M) · Σ_{m=0}^{M-1} log10( Σ_{n=N_m}^{N_m+N-1} s(n)^2 / Σ_{n=N_m}^{N_m+N-1} (s(n) - ŝ(n))^2 )    (9)

where M is the number of frames, N is the frame size, and N_m is the beginning of the m-th frame. As the frame SNR can become negative and very small during silence periods, the segSNR values are limited to the range [-10 dB, 35 dB].

3.3. Itakura-Saito distance

The Itakura-Saito distance (ISd) measures the change in spectrum and can be computed from the linear prediction coefficients (LPC) according to the following equation:

ISd = (a_ŝ^T · R_s · a_ŝ) / (a_s^T · R_s · a_s)    (10)

where a_s is the LPC vector of the original speech signal, R_s is its autocorrelation matrix, and a_ŝ is the LPC vector of the enhanced speech signal. In this paper, a 10th-order LPC-based measure is employed.

3.4. Perceptual evaluation of speech quality

The perceptual evaluation of speech quality (PESQ) algorithm is an objective quality measure approved as ITU-T Recommendation P.862. It is an objective measurement tool designed to predict the result of a subjective Mean Opinion Score (MOS) test.
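The SNR and segSNR measures of Sections 3.1 and 3.2 follow directly from equations (8) and (9). The sketch below implements them in plain Python; the [-10, 35] dB clamp follows the text, while the default frame size of 256 samples is an assumption for illustration.

```python
import math

def snr_db(clean, enhanced):
    """Global SNR of eq. (8), in dB."""
    num = sum(s * s for s in clean)
    den = sum((s - e) ** 2 for s, e in zip(clean, enhanced))
    return 10.0 * math.log10(num / den)

def seg_snr_db(clean, enhanced, frame=256):
    """Segmental SNR of eq. (9): mean of per-frame SNRs clamped to [-10, 35] dB."""
    scores = []
    for start in range(0, len(clean) - frame + 1, frame):
        s = clean[start:start + frame]
        e = enhanced[start:start + frame]
        num = sum(v * v for v in s)
        den = sum((a - b) ** 2 for a, b in zip(s, e))
        if num == 0.0 or den == 0.0:
            continue  # skip frames where the ratio is undefined
        scores.append(min(max(10.0 * math.log10(num / den), -10.0), 35.0))
    return sum(scores) / len(scores)
```

For a stationary error (every frame alike) the two measures coincide; they differ as soon as the error is concentrated in a few frames, which is why segSNR is the more perceptually relevant of the two.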
It has been shown that PESQ is more reliable and correlates better with MOS than traditional objective speech measures.

4. Results and evaluation

Tables 3, 4, 5 and 6 report the results obtained from the SNR, SSNR, ISd and

PESQ computation. These results were obtained by applying the proposed speech enhancement technique, the technique of Loizou [27] based on the Maximum a Posteriori estimator of the Magnitude-Squared Spectrum (MSS-MAP), and Wiener filtering to a number of noisy speech signals. These noisy speech signals are sampled at 16 kHz and recorded from two voices, one male and one female. They are obtained by corrupting the original signals with different types of noise (car, white, tank, pink and F16) at different SNR values (-5 to 15 dB).

Table 3. SNR (dB) of the enhanced speech signals; the noisy inputs have SNRs of 15, 10, 5, 0 and -5 dB (columns, left to right).

Noise   Enhancement technique   15 dB     10 dB    5 dB      0 dB      -5 dB
Car     The proposed            20.161    18.72    16.8434   13.9896   10.2279
        MSS-MAP [25]            19.997    18.598   16.3226   13.3676   10.0970
        Wiener filter [26]      23.005    18.49    14.19     10.01     5.49
White   The proposed            20.579    17.721   14.1780   10.8897   7.43
        MSS-MAP [25]            20.369    17.231   13.7811   10.7113   7.43
        Wiener filter [26]      21.1877   17.121   12.7166   8.0649    3.4383
Tank    The proposed            20.304    16.914   13.2895   10.1871   6.4979
        MSS-MAP [25]            19.468    16.142   12.4587   9.4415    6.0316
        Wiener filter [26]      20.8854   16.585   12.1807   7.6567    3.3172
Pink    The proposed            19.830    15.900   12.3911   9.0504    6.0741
        MSS-MAP [25]            19.591    15.763   12.1604   8.8551    6.0041
        Wiener filter [26]      20.6119   16.556   12.4432   8.1910    3.7558
F16     The proposed            18.622    15.342   11.4522   7.4723    3.6651
        MSS-MAP [25]            18.3656   15.025   10.8765   7.4011    3.2976
        Wiener filter [26]      19.9325   15.675   11.4406   7.0666    2.4219

Table 4. SSNR (dB) measures obtained for the noisy and enhanced speech signals (input SNRs of 15, 10, 5, 0 and -5 dB, left to right).

Noise   Signal / technique      15 dB    10 dB     5 dB      0 dB      -5 dB
Car     Noisy                   2.1559   -0.5162   -2.8454   -5.1594   -5.31
        The proposed            8.5204   6.4097    3.8178    1.2115    3.65
        MSS-MAP [25]            8.1725   6.4703    3.5663    0.8383    -1.0961
        Wiener filter [26]      16.86    13.19     9.91      6.75      3.32
White   Noisy                   1.9110   -0.6606   -3.0845   -5.3875   -5.94
        The proposed            8.2984   5.5680    2.2030    -0.3770   3.49
        MSS-MAP [25]            8.1725   5.4148    1.9166    -0.3494   -2.2092
        Wiener filter [26]      4.9745   2.5164    0.2528    -2.0195   -4.1193
Tank    Noisy                   1.8853   -0.6841   -3.1256   -5.4297   -7.3542
        The proposed            7.3838   4.4624    1.2797    -0.8800   -2.6187
        MSS-MAP [25]            6.9422   3.9915    1.0079    -1.1889   -2.8211
        Wiener filter [26]      4.7898   2.2460    0.0137    -2.1744   -4.1477
Pink    Noisy                   2.0172   -0.5460   -2.9847   -5.2914   -7.3408
        The proposed            7.2670   4.2152    0.9147    -1.7745   -3.1823
        MSS-MAP [25]            7.2374   4.1414    0.7276    -1.8083   -3.0894
        Wiener filter [26]      4.6302   1.9805    -0.1367   -2.1615   -4.1084
F16     Noisy                   2.0810   -0.5095   -2.9326   -5.2430   -7.2997
        The proposed            5.7616   2.5033    -0.1986   -2.1676   -4.0260
        MSS-MAP [25]            5.7587   2.5631    -0.2729   -2.1863   -4.1981
        Wiener filter [26]      3.9806   1.6272    -0.5136   -2.5409   -4.4837

Table 5. ISd measures obtained for the noisy and enhanced speech signals (input SNRs of 15, 10, 5, 0 and -5 dB, left to right).

Noise   Signal / technique      15 dB         10 dB         5 dB          0 dB      -5 dB
Car     Noisy                   4.9967e-006   6.0156e-005   6.1621e-004   0.0050    0.0275
        The proposed            1.3030e-004   1.6658e-004   4.1258e-004   0.0021    0.0086
        MSS-MAP [25]            1.0561e-004   2.0344e-004   6.0310e-004   0.0028    0.0035
        Wiener filter [26]      6.3473e-006   1.2432e-005   3.9328e-004   0.0045    0.0245
White   Noisy                   1.1292        1.6417        2.3413        3.2621    4.76
        The proposed            0.1155        0.1719        0.4559        0.7993    1.83
        MSS-MAP [25]            0.1479        0.2073        1.2135        2.0319    7.2008
        Wiener filter [26]      0.9988        2.0820        3.8875        7.2401    12.7610
Tank    Noisy                   0.1587        0.4058        0.9289        1.7374    2.8022
        The proposed            0.0512        0.1784        0.3487        0.7552    1.9226
        MSS-MAP [25]            0.0520        0.1806        0.3536        0.7802    1.9426
        Wiener filter [26]      0.0569        0.1756        0.3799        0.8604    1.6731
Pink    Noisy                   1.4426        2.7376        4.7947        7.4170    10.1448
        The proposed            0.3464        0.0702        0.0371        0.3972    1.2100
        MSS-MAP [25]            0.2544        0.0640        0.0543        0.4323    1.2157
        Wiener filter [26]      0.1259        0.4992        1.2910        2.6714    5.2027
F16     Noisy                   2.1363        3.3382        4.9733        7.0885    8.9794
        The proposed            0.0951        0.4834        1.2051        2.2309    4.5913
        MSS-MAP [25]            0.0974        0.5361        1.3204        2.3639    5.1410
        Wiener filter [26]      0.6339        1.3789        2.4218        3.7805    5.9566

Table 6. PESQ measures obtained for the noisy and enhanced speech signals (input SNRs of 15, 10, 5, 0 and -5 dB, left to right).

Noise   Signal / technique      15 dB    10 dB    5 dB     0 dB     -5 dB
Car     Noisy                   3.7450   3.2882   2.9420   2.5746   2.1255
        The proposed            3.9584   3.3897   3.2274   2.9797   2.5735
        MSS-MAP [25]            3.3321   3.3732   3.0236   2.8512   2.5210

        Wiener filter [26]      4.004    3.70     3.23     2.84     2.41
White   Noisy                   2.2374   1.8546   1.4516   1.0718   0.80
        The proposed            2.9181   2.6264   2.3317   2.0637   1.4718
        MSS-MAP [25]            2.8543   2.6076   2.2954   2.0224   1.4626
        Wiener filter [26]      2.8057   2.5104   2.1553   1.7385   1.3165
Tank    Noisy                   2.5212   2.1697   1.7733   1.3195   1.0627
        The proposed            3.0913   2.7666   2.4502   2.0789   1.7005
        MSS-MAP [25]            2.9792   2.7020   2.3319   2.0428   1.6253
        Wiener filter [26]      2.8820   2.5881   2.2521   1.8824   1.5094
Pink    Noisy                   2.3123   1.9230   1.4617   1.0150   0.8336
        The proposed            3.0010   2.6412   2.2504   1.8479   1.5393
        MSS-MAP [25]            2.9198   2.5412   2.2560   1.8428   1.3667
        Wiener filter [26]      2.8583   2.5228   2.1350   1.7177   1.2906
F16     Noisy                   2.2921   1.9029   1.3823   0.9936   0.8547
        The proposed            2.8310   2.5817   2.1507   1.8484   1.5413
        MSS-MAP [25]            2.7609   2.5255   2.1500   1.7562   1.4342
        Wiener filter [26]      2.8252   2.4934   2.0996   1.7020   1.3232

The obtained results show that the proposed technique outperforms the other techniques used in our evaluation. Figure 3 illustrates an example of speech enhancement using the proposed technique.

Figure 3. An example of denoising a speech signal corrupted by car noise: (a) clean speech, (b) noisy speech (SNR = 10 dB), (c) speech denoised using the proposed technique.

This figure shows clearly that the proposed technique efficiently reduces the noise while

preserving the quality of the original speech signal.

Figure 4. (a) The spectrogram of the clean speech signal; (b) the spectrogram of the noisy speech signal (corrupted by car noise); (c) the spectrogram of the enhanced speech signal.

5. Conclusion

In this paper, we have proposed a new speech enhancement technique based on the application of the Maximum a Posteriori estimator of the Magnitude-Squared Spectrum (MSS-MAP) in the stationary bionic wavelet domain. The proposed technique was evaluated by comparing it to the speech enhancement technique based on MSS-MAP alone and to a technique based on Wiener filtering. The evaluation used a number of objective criteria: SNR, SSNR, ISd and PESQ. It also used a number of speech signals (ten sentences pronounced in Arabic by a male voice and ten others pronounced by a female voice) and different types of noise: car, white, F16, tank and pink. The results obtained by applying the proposed technique, the MSS-MAP-based technique and the Wiener-filtering-based technique to the noisy speech signals show that the proposed technique outperforms the other two.

References

[1] J. S. Lim and A. V. Oppenheim. Enhancement and bandwidth compression of noisy speech. Proceedings of the IEEE, 67(12):1586-1604, 1979.
[2] Y. Ephraim and D. Malah. Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Trans. Acoust., Speech, Signal Processing, 32:1109-1121, 1984.
[3] Y. Ephraim and D. Malah. Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Trans. Acoust., Speech, Signal Processing, 33:443-445, 1985.
[4] D. Malah, R. V. Cox, and A. J. Accardi. Tracking speech-presence uncertainty to improve speech enhancement in non-stationary noise environments. In ICASSP, volume 2, pages 789-792, 1999.
[5] P. Scalart and J. Filho. Speech enhancement based on a priori signal to noise estimation. In Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pages 629-632, 1996.
[6] M. Berouti, R. Schwartz, and J. Makhoul. Enhancement of speech corrupted by acoustic noise. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, volume 4, pages 208-211, 1979.
[7] S. Boll. Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoust., Speech, Signal Processing, 27(2):113-120, 1979.
[8] J. W. Seok and K. S. Bae. Speech enhancement with reduction of noise components in the wavelet domain. In ICASSP 97, pages 1223-1326, Munich, Germany, April 1997.

[9] M. Bahoura and J. Rouat. Wavelet speech enhancement using the Teager energy operator. IEEE Signal Processing Letters, 8:10-12, 2001.
[10] M. Bahoura and J. Rouat. Wavelet speech enhancement based on time-scale adaptation. Speech Communication, 48(12):1620-1637, 2006.
[11] I. Cohen. Enhancement of speech using Bark-scaled wavelet packet decomposition. In Eurospeech 2001, pages 1933-1936, Aalborg, Denmark, 2001.
[12] C. T. Lu and H. C. Wang. Enhancement of single channel speech based on masking property and wavelet transform. Speech Communication, 41(2-3):409-427, 2003.
[13] S. H. Chen and J. F. Wang. Speech enhancement using perceptual wavelet packet decomposition and Teager energy operator. J. VLSI Signal Process. Syst., 36(2-3):125-139, 2004.
[14] Y. Hu and P. C. Loizou. Speech enhancement based on wavelet thresholding the multitaper spectrum. IEEE Transactions on Speech and Audio Processing, 12(1):59-67, 2004.
[15] H. Taşmaz and E. Erçelebi. Speech enhancement based on undecimated wavelet packet-perceptual filterbanks and MMSE-STSA estimation in various noise environments. Digital Signal Processing, 18(5):797-812, 2008.
[16] I. Pintér. Perceptual wavelet-representation of speech signals and its application to speech enhancement. Computer Speech and Language, 10(1):1-22, 1996.
[17] M. T. Johnson, X. Yuan, and Y. Ren. Speech signal enhancement through adaptive wavelet thresholding. Speech Communication, 49(2):123-133, 2007.
[18] C. T. Lu and H. C. Wang. Speech enhancement using hybrid gain factor in critical-band-wavelet-packet transform. Digital Signal Processing, 17(1):172-188, 2007.
[19] D. Mahmoudi. A microphone array for speech enhancement using multiresolution wavelet transform. In Proc. of Eurospeech'97, pages 339-342, Rhodes, Greece, September 1997.
[20] C. H. Yang, J. C. Wang, J. F. Wang, H. P. Lee, C. H. Wu, and K. H. Chang. Multiband subspace tracking speech enhancement for in-car human computer speech interaction. Journal of Information Science and Engineering, 22(5):1093-1107, 2006.
[21] T. Gülzow, A. Engelsberg, and U. Heute. Comparison of a discrete wavelet transformation and a nonuniform polyphase filterbank applied to spectral-subtraction speech enhancement. Signal Processing, 64:5-19, 1998.
[22] R. Nishimura, F. Asano, Y. Suzuki, and T. Sone. Speech enhancement using spectral subtraction with wavelet transform. Electronics and Communications in Japan, Part III: Fundamental Electronic Science, 81(1):24-31, 1998.
[23] Y. Shao and C. H. Chang. A generalized time-frequency subtraction method for robust speech enhancement based on wavelet filter banks modeling of human auditory system. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 37(4):877-889, 2007.
[24] J. Sika and V. Davidek. Multi-channel noise reduction using wavelet filter bank. In EuroSpeech'97, pages 2595-2598, Rhodes, Greece, September 1997.
[25] Y. Lu and P. C. Loizou. Estimators of the magnitude-squared spectrum and methods for incorporating SNR uncertainty. IEEE Transactions on Audio, Speech, and Language Processing, 19(5), July 2011.
[26] M. Talbi, L. Salhi, S. Abid, and A. Cherif. Recurrent neural network and bionic wavelet transform for speech enhancement. Int. J. Signal and Imaging Systems Engineering, 3(2):136-144, 2010.
[27] P. C. Loizou. Speech Enhancement: Theory and Practice. Taylor & Francis, 2007.