Reduction of Musical Residual Noise Using Harmonic-Adapted-Median Filter


Ching-Ta Lu 1, Kun-Fu Tseng 2, Chih-Tsung Chen 2
1 Department of Information Communication, Asia University, Taichung, Taiwan, ROC
2 Department of Multimedia and Game Science, Asia-Pacific Institute of Creativity, Miaoli, Taiwan, ROC
Lucas@ms26.hinet.net

Proceedings, The 2nd International Conference on Information Science and Technology
IST 2013, ASTL Vol. 23, pp. 227-234, 2013. © SERSC 2013
This research was supported by the National Science Council, Taiwan, under contract number NSC -222-E-468-.

Abstract. This study proposes a post-processor that reduces musical residual noise, which is annoying to the human ear. First, a speech enhancement algorithm is employed to reduce the background noise in noisy speech. The enhanced speech is then post-processed by a harmonic-adapted-median filter to reduce the musical effect of the residual noise. For a vowel-like spectrum, directional median filtering is performed to slightly reduce the musical effect of residual noise while preserving the harmonic spectrum. Conversely, for noise-dominant spectra, block median filtering is performed to heavily reduce the spectral variation, so that musical tones are significantly smoothed. Finally, the pre-processed and post-processed spectra are fused according to the speech-presence probability. Experimental results show that the proposed post-processor efficiently improves the performance of a speech enhancement system by reducing the musical effect of residual noise.

Keywords: speech enhancement, spectral subtraction, musical residual noise, post-processing, harmonic.

1 Introduction

Many speech enhancement algorithms have been proposed to reduce the background noise in noisy speech [1]-[5]. These algorithms efficiently remove the corrupting noise, but a musical residual noise is apparent in the enhanced speech. This musical noise is perceived as twittering and severely degrades the perceptual quality. If it is too prominent, it may be more disturbing than the original interference before speech enhancement. Recently, many studies have attempted to suppress musical residual noise. Esch and Vary [6] proposed smoothing the weighting gains under speech-pause and low-SNR conditions, which reduces the musical effect of residual noise. Jo and Yoo [3] considered a psychoacoustically constrained and distortion-minimized enhancement algorithm, which minimizes speech distortion while keeping the sum of speech distortion and residual noise below the masking threshold. These findings show that an efficient method for removing the musical effect of residual noise is important for speech enhancement.

In this paper, a speech enhancement system is employed as the first stage to remove background noise while keeping speech distortion low. Its output is further processed by the harmonic-adapted-median (HAM) filter, which efficiently reduces the musical effect of residual noise. An algorithm for estimating the speech-presence probability [7] is employed and modified to classify the pre-processed spectrum as speech-dominant or noise-dominant. For a speech-dominant spectrum, directional median filtering is performed to slightly reduce the musical effect of residual noise without seriously destroying the harmonic spectrum. When the speech-presence probability exceeds a high threshold, the spectrum is classified as a vowel and kept unchanged to maintain speech quality. Conversely, for noise-dominant spectra, block median filtering is performed to heavily reduce the spectral variation. Musical tones are then significantly smoothed, so the filtered speech sounds much less annoying than the pre-processed speech. Finally, the pre-processed and median-filtered spectra are fused according to the speech-presence probability. If the speech-presence probability is high, the pre-processed spectrum receives a high weight, so it is preserved and the post-processed speech suffers less distortion. Conversely, when the probability is low, the (block or directional) median-filtered spectrum receives a high weight, so the musical effect of residual noise is efficiently removed.
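The per-bin decision described above (detailed in Sections 2.2 and 2.3) can be sketched as follows. This is a minimal sketch under stated assumptions: the constants α_p = 0.2, δ_max = 8, δ_min = 3 and the 0.2/0.8 probability bounds are quoted from Section 2, but treating 0.8 as the "vowel" threshold, sending probabilities below 0.2 to the block median, and the linear blend p·S + (1 − p)·median are assumptions, because the paper only states that the pre-processed spectrum is weighted more heavily as the speech-presence probability grows. The function names are hypothetical.

```python
import numpy as np

# Constants quoted in Section 2.2; everything else below is an assumption.
ALPHA_P = 0.2                    # smoothing parameter of eq. (8)
DELTA_MAX, DELTA_MIN = 8.0, 3.0  # threshold bounds of eq. (10)


def presence_threshold(num_robust_harmonics: int) -> float:
    """Eq. (10) plus the lower bound delta(m) = max{delta(m), delta_min}."""
    delta = DELTA_MAX - (DELTA_MAX - DELTA_MIN) / 2.0 * num_robust_harmonics
    return max(delta, DELTA_MIN)


def update_presence(p_prev, power_ratio, num_robust_harmonics):
    """Eqs. (8)-(9): recursively smooth a 0/1 speech indicator, per bin."""
    threshold = presence_threshold(num_robust_harmonics)
    indicator = (np.asarray(power_ratio, dtype=float) > threshold).astype(float)
    return ALPHA_P * np.asarray(p_prev, dtype=float) + (1.0 - ALPHA_P) * indicator


def fuse(pre, directional, block, p, p_low=0.2, p_high=0.8):
    """Hypothetical per-bin selection and probability-weighted fusion.

    Assumed rules (not given as equations in the paper):
      p > p_high            -> keep the pre-processed bin (vowel)
      p_low <= p <= p_high  -> directional median (vowel-like)
      p < p_low             -> block median (noise-like)
      fused = p * pre + (1 - p) * median
    """
    pre, directional, block, p = (np.asarray(a, dtype=float)
                                  for a in (pre, directional, block, p))
    median = np.where(p >= p_low, directional, block)
    fused = p * pre + (1.0 - p) * median
    return np.where(p > p_high, pre, fused)
```

For example, with one robust harmonic the threshold is 8 − 2.5 = 5.5, so `update_presence([0.0, 0.0], [6.0, 4.0], 1)` yields probabilities of 0.8 and 0.0: the first bin would go to the directional median and the second to the block median. More robust harmonics lower the threshold, which is how a weak vowel can still be marked as speech.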
Experimental results show that the proposed post-processor improves the performance of a speech enhancement system by efficiently removing the musical effect of residual noise, while the speech distortion in the post-processed signal is not perceptible to the human ear.

2 Proposed Speech Enhancement System

Initially, the noisy speech is framed by a Hanning window and transformed into the frequency domain by the fast Fourier transform (FFT). A minimum statistics algorithm [8] is employed to estimate the noise magnitude in each subband. This noise estimate is then used to adapt a speech enhancement system, so that the background noise is efficiently removed. Because the musical effect of residual noise is apparent in the pre-processed speech, a harmonic-adapted-median (HAM) filter is proposed to remove it. The noisy speech is used to estimate the pitch period, from which the robust harmonic spectra are searched in each frame. The number of robust harmonics is used to adapt the speech-presence probability, which in turn controls the fusion weighting between the pre-processed and post-processed signals.

Each spectrum of the pre-processed speech is analyzed to determine whether it is vowel-like. If the center spectrum of a local window is a vowel, the corresponding speech-presence probability is large, and the center spectrum is kept unchanged to maintain speech quality. If the speech-presence probability is below a given threshold, the center spectrum is classified as vowel-like, and a directional median filter adjusts its magnitude, slightly reducing the musical effect of residual noise. Conversely, the center spectrum is classified as noise-like when the speech-presence probability equals zero. Block median filtering is then performed to heavily smooth the center spectrum, significantly reducing the musical effect of residual noise. Finally, the pre-processed, the directional-median-filtered, and the block-median-filtered spectra are fused according to the speech-presence probability, and the inverse FFT yields the post-processed speech.

2.1 Robust Harmonic Estimation

A harmonic spectrum is distributed over the frequency range from 5 to 5 Hz. Low-pass filtering the noisy speech with a cut-off frequency of 5 Hz yields a low-pass signal $\phi(n)$, which allows the pitch period to be estimated accurately by reducing the interference of high-frequency components. The autocorrelation function of the low-pass filtered signal is computed as

$$R_\phi(\tau) = \frac{1}{N}\sum_{n=0}^{N-1}\phi(n)\,\phi(n+\tau) \qquad (1)$$

where $N$ denotes the frame size. To improve the accuracy of the pitch-period estimate, an average magnitude difference function (AMDF) [9] is computed on the low-pass filtered signal $\phi(n)$:

$$\mathrm{AMDF}(\tau) = \frac{1}{N}\sum_{n=0}^{N-1}\left|\phi(n) - \phi(n+\tau)\right| \qquad (2)$$

At the pitch period, the AMDF is small while $R_\phi(\tau)$ in (1) is large. Taking their ratio therefore enlarges the peak at the pitch position, making it easier to discriminate and improving the accuracy of the pitch-period estimate. A weighted autocorrelation function (WAC) is defined as

$$\mathrm{WAC}(\tau) = \frac{R_\phi(\tau)}{\mathrm{AMDF}(\tau) + \varepsilon} \qquad (3)$$

where $\varepsilon$ is a very small value that prevents the denominator from being zero.
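Eqs. (1)-(3), followed by the lag-to-bin conversion $F_0 = N/T_0$ given next as (4), can be sketched in NumPy as below. It is a minimal sketch, not the authors' code: it assumes the frame is already low-pass filtered, restricts both sums to the N − τ samples that stay inside the frame (the equations leave the frame edge implicit), and searches lags for a hypothetical 60-400 Hz pitch range. The function names and default bounds are illustrative assumptions.

```python
import numpy as np


def weighted_autocorrelation(phi, eps=1e-8):
    """WAC(tau) = R_phi(tau) / (AMDF(tau) + eps) for tau = 0..N-1.

    Assumption: both sums run over the N - tau samples inside the frame.
    """
    phi = np.asarray(phi, dtype=float)
    n = phi.size
    wac = np.empty(n)
    for tau in range(n):
        head, tail = phi[: n - tau], phi[tau:]
        r = np.dot(head, tail) / n               # eq. (1): autocorrelation
        amdf = np.sum(np.abs(head - tail)) / n   # eq. (2): AMDF
        wac[tau] = r / (amdf + eps)              # eq. (3): weighted ratio
    return wac


def estimate_pitch(phi, fs, f_min=60.0, f_max=400.0):
    """Return (T0 in samples, F0 as an FFT-bin index N / T0 as in eq. (4)).

    f_min and f_max bound the lag search; they are hypothetical values,
    not taken from the paper. Lag 0 is excluded because AMDF(0) = 0.
    """
    n = len(phi)
    wac = weighted_autocorrelation(phi)
    lag_lo = max(1, int(np.floor(fs / f_max)))
    lag_hi = min(int(np.ceil(fs / f_min)), n - 1)
    t0 = lag_lo + int(np.argmax(wac[lag_lo : lag_hi + 1]))
    return t0, n / t0
```

On a 256-sample frame sampled at 8 kHz containing the first three harmonics of 200 Hz, `estimate_pitch(phi, 8000)` returns a pitch period of 40 samples and $F_0 = 256/40 = 6.4$ bins. Multiples of the period (80, 120 samples) also give near-zero AMDF, but their autocorrelation is smaller because fewer samples overlap, so the true period wins.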
Harmonic estimation relies on the fundamental frequency $F_0$, which is obtained from the pitch period $T_0$ as

$$F_0 = \frac{N}{T_0} \qquad (4)$$

Because $T_0$ is measured in samples and $N$ is the frame size, (4) expresses $F_0$ as an FFT bin index. In the experiments, the fundamental frequency obtained by (4) tends to be underestimated. The location of $F_0$ is therefore shifted to that of the spectral peak in each segment. The shifted frequency $F_0^{*}$ can be expressed as

$$F_0^{*} = F_0 - F_0^{\mathrm{Bias}} \qquad (5)$$

where $F_0^{\mathrm{Bias}}$ denotes the offset from the fundamental frequency $F_0$ obtained by (4). For the $l$-th segment it is computed as

$$F_0^{\mathrm{Bias}}(l) = \frac{1}{l_e - l_i + 1}\sum_{m=l_i}^{l_e}\left[F_0(m) - F_0'(m)\right] \qquad (6)$$

where $l_i$ and $l_e$ are the starting and ending frames of the $l$-th segment, and $F_0'(m)$ denotes the fundamental frequency located at the spectral peak in frame $m$.

Robust harmonics occur at multiples of the fundamental frequency, i.e., at $kF_0^{*}$. The number of robust harmonics $K$ is decided by

$$K = \max\left\{k \;\middle|\; F^{k} < F^{k-1} + F_0^{*} + \delta_F \ \text{and}\ F^{k} > F^{k-1} + F_0^{*} - \delta_F\right\} \qquad (7)$$

where $F^{k}$ denotes the frequency of the $k$-th harmonic and $\delta_F$ is the frequency threshold between adjacent harmonics for deciding a robust harmonic. As (7) shows, if the frequency offset between two adjacent harmonics varies heavily, the harmonic structure becomes weak, which marks the boundary of the robust harmonics. The more robust harmonics there are, the higher the speech-presence probability. Accordingly, the number of robust harmonics is used to adapt an algorithm for estimating the speech-presence probability.

2.2 Speech-Presence Probability

Speech presence can be determined by the ratio between the local energy of the noisy speech and its minimum within a specified time window. The speech-presence probability $p(m,\omega)$ is computed recursively as [7]

$$p(m,\omega) = \alpha_p\,p(m-1,\omega) + (1-\alpha_p)\,I(m,\omega) \qquad (8)$$

where $\alpha_p$ ($\alpha_p = 0.2$) is a smoothing parameter and $I(m,\omega)$ is an indicator function of speech activity:

$$I(m,\omega) = \begin{cases} 1, & \text{if } S_r(m,\omega) > \delta(m) \\ 0, & \text{otherwise} \end{cases} \qquad (9)$$

where $\delta(m)$ is the speech-presence threshold for the power ratio $S_r(m,\omega)$, i.e., the ratio between the smoothed local power and the minimum power in a local segment. In [7], this threshold is set to a constant 5. Here the threshold is adapted with the number of robust harmonics $K$ given in (7): if a frame is vowel-like, the speech indicator $I(m,\omega)$ should approach unity, so that even a weak vowel is classified as a speech-presence frame. The threshold $\delta(m)$ is expressed as

$$\delta(m) = \delta_{\max} - \frac{\delta_{\max} - \delta_{\min}}{2}\,K \qquad (10)$$

where $\delta_{\max}$ and $\delta_{\min}$ are empirically chosen as 8 and 3, respectively. To prevent the threshold $\delta(m)$ from becoming too small or negative, it is lower-bounded as $\delta(m) = \max\{\delta(m), \delta_{\min}\}$. As (8) shows, the speech-presence probability lies between 0 and 1, so it can be used to control the fusion weighting of the pre-processed and post-processed spectra.

2.3 Directional-and-Block Median Filtering

Directional median filtering is performed when a frame has a strong harmonic structure. The candidate directions are shown in Fig. 1, where the center spectrum is denoted by a filled circle. A center spectrum is classified as vowel-like when the number of robust harmonics is large enough. Whether the center spectrum is a vowel is then further checked by means of the speech-presence probability. If the speech-presence probability exceeds a given threshold, the center spectrum is classified as a vowel and kept unchanged to maintain speech quality. If instead the speech-presence probability lies between 0.2 and 0.8, the center spectrum is classified as vowel-like and filtered by a directional median filter:

$$M(m,\omega) = \operatorname{median}\left\{\tilde{S}\big(m+\Delta m,\ \omega+\Delta\omega(\Delta m, i^{*})\big)\right\} \qquad (11)$$

where $i^{*}$ denotes the optimum direction, $\tilde{S}(m,\omega)$ represents the pre-processed spectrum, and $\Delta\omega(\Delta m, i)$ is the frequency offset of direction $i$ at frame offset $\Delta m$.

Fig. 1. Motion directions of the center spectrum.

As shown in Fig. 1, the optimum motion direction of the center spectrum is selected among three candidate directions (1-3) by finding the minimum spectral distance among them. The spectral-distance measure $d^{(i)}(m,\omega)$ is expressed as

$$d^{(i)}(m,\omega) = \sum_{\Delta m}\frac{\left[\tilde{S}\big(m+\Delta m,\ \omega+\Delta\omega(\Delta m, i)\big) - \tilde{S}(m,\omega)\right]^{2}}{\tilde{S}^{2}(m,\omega)} \qquad (12)$$

where $i$ denotes the direction index of the center spectrum, i.e., $1 \le i \le 3$. The direction that minimizes the spectral-distance measure in (12) is declared the optimum motion direction for the center spectrum.
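The direction search of (11)-(12), together with the 3×3 block median the paper applies to noise-like bins, can be sketched as follows. Because Fig. 1 is not recoverable here, the three candidate directions are an assumption: each is taken to be a 3-point line through the center bin whose frequency offset changes by −1, 0 or +1 bins per frame. The names `SLOPES`, `directional_median` and `block_median` are hypothetical, and only interior bins are handled.

```python
import numpy as np

# Hypothetical direction set (Fig. 1 is not recoverable): direction i is a
# 3-point line through the center bin with
# delta_omega(delta_m, i) = SLOPES[i] * delta_m.
SLOPES = {1: -1, 2: 0, 3: +1}
FRAME_OFFSETS = (-1, 0, 1)  # assumed delta_m values along each direction


def _line(spec, m, w, slope):
    """Pre-processed magnitudes along one candidate direction."""
    return np.array([spec[m + dm, w + slope * dm] for dm in FRAME_OFFSETS])


def directional_median(spec, m, w, eps=1e-12):
    """Eqs. (11)-(12): median along the direction of minimum distance.

    spec is a (frames x bins) magnitude array and (m, w) an interior bin.
    Returns the filtered magnitude and the chosen direction index i*.
    """
    center = spec[m, w]
    best_i, best_d = None, np.inf
    for i, slope in SLOPES.items():
        vals = _line(spec, m, w, slope)
        # eq. (12); eps guards against a zero-valued center bin
        d = np.sum((vals - center) ** 2) / (center ** 2 + eps)
        if d < best_d:  # keep the minimum-distance direction
            best_i, best_d = i, d
    return float(np.median(_line(spec, m, w, SLOPES[best_i]))), best_i


def block_median(spec, m, w):
    """3x3 block median over neighbouring frames and frequency bins."""
    return float(np.median(spec[m - 1 : m + 2, w - 1 : w + 2]))
```

On a 5×5 array of ones with a rising diagonal ridge of 10s through bin (2, 2), `directional_median(spec, 2, 2)` follows the ridge and returns `(10.0, 3)`, whereas `block_median(spec, 2, 2)` returns `1.0`. The contrast illustrates the design choice: the directional filter keeps a harmonic-like track that the block filter would smooth away, so the block filter is reserved for noise-like bins.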
The optimum distance measure is given as

$$d^{(i^{*})}(m,\omega) = \min_{1 \le i \le 3} d^{(i)}(m,\omega) \qquad (13)$$

The directional median filter mitigates the fluctuation of random spectral peaks and thereby reduces the musical effect of residual noise. To further reduce musical tones, a block median filter is employed to significantly smooth their variation when a center spectrum is classified as noise-like. The larger the window, the greater the reduction of spectral variation; however, a larger window also causes more speech distortion. Therefore, a 3×3 window is adopted to analyze and filter the pre-processed spectra.

3 Experimental Results

In the experiments, the speech signals are Mandarin Chinese utterances spoken by five female and five male speakers. Noisy speech is obtained by corrupting clean speech with white, F16-cockpit, factory, and helicopter-cockpit noise signals extracted from the NOISEX-92 database. Three SNR levels, 0, 5, and 10 dB, are used to evaluate the performance of a speech enhancement system. The Virag [1] and the two-step-decision-directed (TSDD) [5] speech enhancement algorithms are used as the first stage for comparison.

Table 1. Comparisons of average segmental SNR improvement (dB) for enhanced speech in various noise corruptions.

Noise type   SNR (dB)   TSDD   TSDD+Post   Virag   Virag+Post
White        0          6.82   7.          6.38    7.86
White        5          4.79   4.96        4.9     5.9
White        10         3.4    3.25        3.48    4.5
F16          0          4.99   5.4         5.9     6.25
F16          5          3.52   3.8         3.66    4.75
F16          10         2.32   2.57        2.39    3.42
Factory      0          4.7    4.85        4.64    5.48
Factory      5          3.37   3.58        3.2     4.26
Factory      10         2.23   2.53        .97     3.
Helicopter   0          6.75   7.22        6.44    7.6
Helicopter   5          4.87   5.47        4.7     5.9
Helicopter   10         3.24   3.92        3.9     4.33

Table 1 presents the performance comparisons in terms of the average segmental SNR improvement. Cascading the proposed post-processor after the TSDD (TSDD+Post) and Virag (Virag+Post) methods performs better than the methods without post-processing (TSDD and Virag). The major reason is that the proposed method removes a much larger quantity of musical residual noise without seriously deteriorating the speech components. Table 2 presents the performance comparisons in terms of the perceptual evaluation of speech quality (PESQ), where a higher PESQ score indicates better speech quality.
The speech enhancement methods with post-processing obtain higher PESQ scores than those without post-processing. This shows that the proposed post-processing method efficiently suppresses the musical effect of residual noise without seriously deteriorating the speech components. These results are consistent with the average segmental SNR improvements shown in Table 1.

Table 2. Comparisons of perceptual evaluation of speech quality (PESQ) for enhanced speech in various noise corruptions.

Noise type   SNR (dB)   TSDD   TSDD+Post   Virag   Virag+Post
White        0          2.5    2.2         2.7     2.33
White        5          2.36   2.42        2.45    2.69
White        10         2.65   2.7         2.8     2.98
F16          0          2.8    2.24        2.29    2.44
F16          5          2.5    2.56        2.63    2.79
F16          10         2.8    2.85        2.97    3.
Factory      0          .97    2.6         2.2     2.2
Factory      5          2.37   2.43        2.58    2.6
Factory      10         2.7    2.77        2.93    2.96
Helicopter   0          2.43   2.52        2.55    2.7
Helicopter   5          2.75   2.83        2.88    3.2
Helicopter   10         3.5    3.          3.6     3.29

Fig. 2. Spectrograms of speech spoken by a female speaker: (a) clean speech; (b) noisy speech (corrupted by F16-cockpit noise with an average segmental SNR of 5 dB); (c) enhanced speech using the TSDD method; (d) enhanced speech using the TSDD method with post-processing; (e) enhanced speech using the Virag method; (f) enhanced speech using the Virag method with post-processing.

Figure 2 shows the spectrograms of a speech signal corrupted by F16-cockpit noise with an average segmental SNR of 5 dB. The post-processing does not seriously deteriorate the speech spectra (Figs. 2(d) and (f)): the harmonic structures of the post-processed speech are very similar to those without post-processing (Figs. 2(c) and (e)). In Fig. 2(c), many isolated spectral peaks with strong energy remain in the speech-pause regions for the TSDD method. After post-processing by the proposed method, these isolated patches are whitened (Fig. 2(d)), reducing the musical effect of residual noise. Comparing Figs. 2(e) and (f), the enhanced speech of the Virag method contains a considerable amount of residual noise that is annoying to the human ear; this noise is significantly removed by the proposed post-processor (Fig. 2(f)). The major reason is that the block median filter efficiently smooths the residual noise, so the isolated random spectral peaks vary smoothly over successive frames and neighboring subbands. Accordingly, the musical effect of residual noise is efficiently reduced, and the post-processed speech sounds less annoying than the speech without post-processing.

4 Conclusions

This study proposed employing a harmonic-adapted-median (HAM) filter to post-process enhanced speech. The major contribution is to significantly reduce the spectral variation of residual noise by block median filtering in noise-dominant regions, and to slightly smooth residual noise by directional median filtering in speech-dominant regions. The pre-processed and the (block or directional) median-filtered spectra are then fused according to the speech-presence probability, which ensures that the spectra in speech-dominant regions are not severely deteriorated by the proposed post-processor. Experimental results show that the proposed post-processor efficiently reduces the musical effect of residual noise for a speech enhancement system, so the post-processed speech sounds more comfortable than that without post-processing. In addition, the proposed post-processor can be cascaded after various kinds of speech enhancement systems.

References

1. Virag, N.: Single Channel Speech Enhancement Based on Masking Properties of the Human Auditory System. IEEE Trans. Speech Audio Process. 7(2), 126--137 (1999)
2. Lu, C.-T.: Enhancement of Single Channel Speech Using Perceptual-Decision-Directed Approach. Speech Commun. 53(4), 495--507 (2011)
3. Jo, S., Yoo, C.D.: Psychoacoustically Constrained and Distortion Minimized Speech Enhancement. IEEE Trans. Audio Speech Language Process. 18(8), 2099--2110 (2010)
4. Ding, J., Soon, I.Y., Yeo, C.K.: Over-Attenuated Components Regeneration for Speech Enhancement. IEEE Trans. Audio Speech Language Process. 18(8), 2004--2014 (2010)
5. Plapous, C., Marro, C., Scalart, P.: Improved Signal-to-Noise Ratio Estimation for Speech Enhancement. IEEE Trans. Audio Speech Language Process. 14(6), 2098--2108 (2006)
6. Esch, T., Vary, P.: Efficient Musical Noise Suppression for Speech Enhancement Systems. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 4409--4412. IEEE Press, New York (2009)
7. Cohen, I., Berdugo, B.: Noise Estimation by Minima Controlled Recursive Averaging for Robust Speech Enhancement. IEEE Signal Process. Lett. 9(1), 12--15 (2002)
8. Martin, R.: Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics. IEEE Trans. Speech Audio Process. 9(5), 504--512 (2001)
9. Shimamura, T., Kobayashi, H.: Weighted Auto-Correlation for Pitch Extraction of Noisy Speech. IEEE Trans. Speech Audio Process. 9(7), 727--730 (2001)