Signal Processing 91 (2011) Contents lists available at ScienceDirect. Signal Processing. journal homepage:


Signal Processing 91 (2011). Journal homepage: www.elsevier.com/locate/sigpro

Fast communication

Minima-controlled speech presence uncertainty tracking method for speech enhancement

Woojung Lee, Ji-Hyun Song, Joon-Hyuk Chang
School of Electronic Engineering, Inha University, Incheon 402-751, Republic of Korea
Corresponding author: J.-H. Chang. Tel.: +82 32 86 7423; fax: +82 32 868 3654. E-mail address: changjh@inha.ac.kr

Article history: Received 3 February 2; received in revised form April 2; accepted 8 June 2; available online 25 June 2
65-684/$ - see front matter © 2 Elsevier B.V. All rights reserved. doi:.6/j.sigpro.2.6.9

Keywords: Soft decision; Speech absence probability; Minima-controlled recursive averaging

Abstract

In speech enhancement, soft decision, in which the speech absence probability (SAP) is introduced to modify the spectral gain or update the noise power, is known to be efficient. In many previous works, a fixed a priori probability of speech absence (q) is assumed in estimating the SAP, which is not realistic since speech is quasi-stationary and may not be present in each frequency bin. To address this problem, Malah et al. devised a method to obtain distinct values of q for each frequency bin in each frame by comparing the a posteriori SNR to a threshold value [9]. In this paper, a novel algorithm is obtained by taking advantage of the minima-controlled recursive averaging (MCRA) technique, which allows robust tracking of speech absence in time. This leads to improved tracking of speech absence in speech enhancement and to better results in objective and subjective evaluation tests.

1. Introduction

In general, listening to speech becomes more difficult as the ambient noise level increases. To avoid this problem, speech enhancement techniques attempt to remove the effect of the additive noise [1-7]. Among them, the soft-decision strategy has been considered effective because the probability of speech absence (or speech presence) is incorporated as a key parameter for modifying the spectral gain and updating the noise power [8]. In much of the literature, a fixed value of q, the a priori probability of speech absence, is assumed for all frequency components in the analyzed input frames [8,9]. In [2], q was set to 0.5 to address the worst-case scenario in which speech and noise are equally likely to occur, while q was set to 0.2 based on the listening test in [1]. Several algorithms have been proposed for estimating and updating q [9,10]. In particular, Malah et al. proposed an algorithm to obtain distinct values of q for each frequency in each frame based on a simple hypothesis test that compares the a posteriori SNR with a given threshold [9]. However, the a posteriori SNR is sensitive to outliers, especially for time-varying noise.

On the other hand, Cohen proposed a technique for estimating noise by averaging past spectral power values with a smoothing parameter that is adjusted by the speech presence probability in subbands [11]. In particular, the presence of speech in subbands is determined by the ratio between the local energy of the noisy speech and its minimum within a given time window. Cohen's method is known to be insensitive to the type and intensity of ambient noise. It is also computationally efficient and adapts quickly to sudden changes in the noise spectrum.

In this paper, we develop a novel method to track the a priori probability of speech absence, which is a dominant parameter in computing the speech absence probability from the observation. To do this, we track the a priori probability of speech absence by comparing the local energy of the noisy speech with its corresponding minimum value in each frequency bin. We find that this enables a more robust estimate of q, analogous to the advantage of Cohen's method [11]. Based on this, we performed objective and subjective quality tests by incorporating the proposed approach into speech enhancement, and obtained better results.

2. Review of tracking speech presence uncertainty

In this section, we first review the notion of tracking speech presence uncertainty introduced in [9]. Let y(n) denote a noisy speech signal, which is the sum of a clean speech signal x(n) and an uncorrelated additive noise signal d(n), i.e., y(n) = x(n) + d(n). Applying a short-time Fourier transform (STFT), we have in the time-frequency domain

\[ Y(k,l) = X(k,l) + D(k,l), \tag{1} \]

where k is the frequency bin index and l is the frame index. Given two hypotheses, $H_0(k,l)$ and $H_1(k,l)$, which indicate speech absence and presence, respectively, it is assumed that

\[ H_0(k,l):\; Y(k,l) = D(k,l), \qquad H_1(k,l):\; Y(k,l) = X(k,l) + D(k,l). \tag{2} \]

Like a number of other speech enhancement algorithms [8], we also assume that X(k,l) and D(k,l) are characterized by separate zero-mean complex Gaussian distributions, so that

\[ p(Y(k,l) \mid H_0) = \frac{1}{\pi \lambda_d(k,l)} \exp\left\{ -\frac{|Y(k,l)|^2}{\lambda_d(k,l)} \right\}, \qquad p(Y(k,l) \mid H_1) = \frac{1}{\pi [\lambda_d(k,l) + \lambda_x(k,l)]} \exp\left\{ -\frac{|Y(k,l)|^2}{\lambda_d(k,l) + \lambda_x(k,l)} \right\}, \tag{3} \]

in which $\lambda_x(k,l)$ and $\lambda_d(k,l)$ are the variances of the clean speech and the noise in the kth frequency bin and lth frame, respectively.
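As a worked step, dividing the two Gaussian densities in (3) gives the likelihood ratio on which soft-decision schemes of this kind rely. The $(k,l)$ arguments are dropped for compactness, and the shorthand $\gamma = |Y|^2/\lambda_d$ and $\xi = \lambda_x/\lambda_d$ (the a posteriori and a priori SNRs) is used:

```latex
\begin{aligned}
\frac{p(Y \mid H_1)}{p(Y \mid H_0)}
  &= \frac{\lambda_d}{\lambda_d + \lambda_x}
     \exp\left\{ \frac{|Y|^2}{\lambda_d} - \frac{|Y|^2}{\lambda_d + \lambda_x} \right\} \\
  &= \frac{\lambda_d}{\lambda_d + \lambda_x}
     \exp\left\{ \frac{|Y|^2}{\lambda_d} \cdot \frac{\lambda_x}{\lambda_d + \lambda_x} \right\} \\
  &= \frac{1}{1 + \xi} \exp\left\{ \frac{\gamma \xi}{1 + \xi} \right\}.
\end{aligned}
```

The last line uses $\lambda_d/(\lambda_d+\lambda_x) = 1/(1+\xi)$ and $\lambda_x/(\lambda_d+\lambda_x) = \xi/(1+\xi)$.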
Conditioned on the current observation Y(k,l), the speech absence probability (SAP), $p(H_0 \mid Y(k,l))$, is given by [8]

\[ p(H_0 \mid Y(k,l)) = \frac{p(Y(k,l) \mid H_0)\, p(H_0)}{p(Y(k,l))} = \frac{p(Y(k,l) \mid H_0)\, p(H_0)}{p(Y(k,l) \mid H_0)\, p(H_0) + p(Y(k,l) \mid H_1)\, p(H_1)} = \frac{1}{1 + q \Lambda(Y(k,l))}, \tag{4} \]

in which $\Lambda(Y(k,l))$ is the likelihood ratio computed in the kth frequency bin and lth frame as follows:

\[ \Lambda(Y(k,l)) = \frac{p(Y(k,l) \mid H_1)}{p(Y(k,l) \mid H_0)} = \frac{1}{1 + \xi(k,l)} \exp\left\{ \frac{\gamma(k,l)\, \xi(k,l)}{1 + \xi(k,l)} \right\}, \tag{5} \]

where $\gamma(k,l)$ and $\xi(k,l)$ are the a posteriori SNR and the a priori SNR [8], respectively:

\[ \gamma(k,l) \triangleq \frac{|Y(k,l)|^2}{\lambda_d(k,l)}, \tag{6} \]

\[ \xi(k,l) \triangleq \frac{\lambda_x(k,l)}{\lambda_d(k,l)}, \tag{7} \]

and $q$ ($= p(H_1)/p(H_0)$) is the ratio of the a priori probabilities of speech presence and speech absence []. Indeed, q is a rough estimate of the ratio of silence time intervals between speech activities and the time duration of speech. This ratio q is assumed to be fixed in many previous works [1,5,8]. However, Malah et al. proposed a method that allows different values of q in different frequency bins for each frame, since q varies in time due to the non-stationarity of speech. Specifically, in the method of Malah et al., (4) becomes

\[ p(H_0 \mid Y(k,l)) = \frac{1}{1 + q(k,l)\, \Lambda(Y(k,l))}, \tag{8} \]

where

\[ q(k,l) = \alpha_q\, q(k,l-1) + (1 - \alpha_q)\, I(k,l), \tag{9} \]

and $\alpha_q$ ($0 < \alpha_q < 1$) is a smoothing parameter. Here I(k,l) is a binary index function recording the outcome of the following hypothesis test on the a posteriori SNR:

\[ \gamma(k,l) \underset{H_0}{\overset{H_1}{\gtrless}} \gamma_{TH}, \tag{10} \]

where $\gamma_{TH}$ is a given threshold (i.e., I(k,l) takes one of its two values if $H_0$ is accepted and the other if $H_1$ is accepted). Note that, in the method of Malah et al. [9], the availability of a separate estimate of q in each bin for each frame adaptively controls the update of the noise power in the case of speech presence.

3. Proposed minima-controlled speech presence uncertainty tracking method

In the previous section, the estimation of $p(H_0 \mid Y(k,l))$ given by (4) is controlled by distinct values of q obtained from the a posteriori SNR-based hypothesis test, as in the previous approach [9].
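The soft-decision quantities reviewed in Section 2, namely the likelihood ratio (5), the SAP (4)/(8), and the recursive update (9) driven by the test (10), can be sketched in Python with NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the default parameter values are illustrative, and the indicator polarity (1 when speech presence $H_1$ is accepted) is an assumption chosen to match q defined as $p(H_1)/p(H_0)$.

```python
import numpy as np

def likelihood_ratio(gamma, xi):
    """Eq. (5): Lambda = exp(gamma * xi / (1 + xi)) / (1 + xi)."""
    gamma = np.asarray(gamma, dtype=float)
    xi = np.asarray(xi, dtype=float)
    return np.exp(gamma * xi / (1.0 + xi)) / (1.0 + xi)

def speech_absence_probability(gamma, xi, q):
    """Eqs. (4)/(8): p(H0 | Y) = 1 / (1 + q * Lambda)."""
    return 1.0 / (1.0 + q * likelihood_ratio(gamma, xi))

def track_q_posteriori(q_prev, gamma, alpha_q=0.95, gamma_th=0.8):
    """Eqs. (9)-(10): smooth a hard decision on the a posteriori SNR.

    Assumption: the indicator is 1 where gamma exceeds the threshold
    (speech presence accepted) and 0 otherwise.
    """
    indicator = np.where(np.asarray(gamma, dtype=float) > gamma_th, 1.0, 0.0)
    return alpha_q * np.asarray(q_prev, dtype=float) + (1.0 - alpha_q) * indicator

# Usage: two frequency bins in one frame.
gamma = np.array([0.5, 2.0])     # a posteriori SNR per bin
xi = np.array([0.1, 1.0])        # a priori SNR per bin
q = track_q_posteriori(np.array([0.5, 0.5]), gamma)
sap = speech_absence_probability(gamma, xi, q)
```

Because the exponent in (5) grows with $\gamma$, a practical implementation would likely bound it to avoid overflow; that safeguard is omitted here to keep the correspondence with the equations direct.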
However, we note that the a posteriori SNR is not a reliable statistic for this test because of its high variation over successive short-time frames [2]. For this reason, we consider a hypothesis test based on the ratio between the local energy of the noisy speech and its derived minimum, as in the MCRA method proposed by Cohen [11]. This method is insensitive to the type and strength of the noise, which is a very desirable characteristic [11]. To illustrate these characteristics, we first introduce the smoothed local energy of the noisy speech, obtained by first-order recursive averaging:

\[ S(k,l) = \alpha_s\, S(k,l-1) + (1 - \alpha_s)\, S_f(k,l), \tag{11} \]

where $S_f(k,l)$ is the local energy of the current frame and $\alpha_s$ ($0 < \alpha_s < 1$) is a smoothing parameter. The minimum of the local energy, $S_{\min}(k,l)$, is searched for in a samplewise comparison manner such that

\[ S_{\min}(k,l) = \min\{ S_{\min}(k,l-1),\, S(k,l) \}, \qquad S_{tmp}(k,l) = \min\{ S_{tmp}(k,l-1),\, S(k,l) \}, \tag{12} \]

where the minimum value for the current frame is obtained by comparing the local energy of the noisy speech with the minimum value of the previous frame. Whenever L frames have been read, i.e., l is divisible by L, the temporary value is employed and re-initialized by

\[ S_{\min}(k,l) = \min\{ S_{tmp}(k,l-1),\, S(k,l) \}, \qquad S_{tmp}(k,l) = S(k,l), \tag{13} \]

and (12) continues to search for the minimum values. The implementation of the minima tracking is summarized as follows:

  Initialize variables at the first frame (l = 0) for all frequency bins k:
    S(k,0) = S_f(k,0)
    S_min(k,0) = S_f(k,0)
  For all time frames l (l >= 1):
    For all frequency bins k:
      compute S_min(k,l) = min{S_min(k,l-1), S(k,l)} using (11) and (12)
      save S_tmp(k,l) = min{S_tmp(k,l-1), S(k,l)} using (12)
    When l % L == 0:
      compute S_min(k,l) = min{S_tmp(k,l), S(k,l)} using (13)
      update S_tmp(k,l) = S(k,l) using (13)

Using the obtained $S_{\min}(k,l)$, we now consider $S_r(k,l) \triangleq S(k,l)/S_{\min}(k,l)$, which denotes the ratio between the local energy of the noisy speech and its derived minimum [11]. From this, we derive the following decision rule:

\[ S_r(k,l) \underset{H_0}{\overset{H_1}{\gtrless}} \delta, \tag{14} \]

where $\delta$ is a simple threshold. As an example, Fig. 1 compares two statistics (the a posteriori SNR vs. $S_r(k,l)$) when the speech enhancement algorithm operates on noisy speech corrupted by car noise. From the figure, it can be seen that the a posteriori SNR tends to fluctuate strongly during noise intervals. In contrast, $S_r(k,l)$ does not exhibit large variation over successive frames during noise-only periods, while it follows the speech energy adequately during speech activity. Using the decision rule of (14) in the MCRA scheme, we propose the estimate $\hat{q}(k,l)$, which takes the place of q(k,l) in the conventional speech presence uncertainty tracking scheme and is given by

\[ \hat{q}(k,l) = \alpha_p\, \hat{q}(k,l-1) + (1 - \alpha_p)\, I(k,l), \tag{15} \]

in which $\alpha_p$ ($0 < \alpha_p < 1$) is a smoothing parameter and I(k,l) is a binary indicator function for the result of the decision rule of (14), taking one of its two values if $S_r(k,l) > \delta$ and the other if $S_r(k,l) < \delta$.
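The minima tracking of (11)-(13) and the update of (15) can be sketched in Python with NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function name is hypothetical, the defaults for `alpha_s`, `L`, and the initial estimate are placeholders, `S_tmp` is initialized like `S_min`, the indicator is taken to be 1 when $S_r > \delta$, and a small floor `eps` is added to avoid division by zero.

```python
import numpy as np

def track_q_minima_controlled(S_f, alpha_s=0.8, alpha_p=0.2, delta=5.0,
                              L=125, q_init=0.0, eps=1e-12):
    """Track q_hat(k, l) from local energies S_f of shape (num_bins, num_frames)."""
    S_f = np.asarray(S_f, dtype=float)
    num_bins, num_frames = S_f.shape

    # Initialization at the first frame (l = 0).
    S = S_f[:, 0].copy()
    S_min = S.copy()
    S_tmp = S.copy()              # assumption: same initial value as S_min
    q_hat = np.full(num_bins, float(q_init))

    out = np.empty((num_bins, num_frames))
    out[:, 0] = q_hat
    for l in range(1, num_frames):
        # Eq. (11): first-order recursive smoothing of the local energy.
        S = alpha_s * S + (1.0 - alpha_s) * S_f[:, l]
        if l % L == 0:
            # Eq. (13): use the temporary minimum, then restart its search.
            S_min = np.minimum(S_tmp, S)
            S_tmp = S.copy()
        else:
            # Eq. (12): running minima.
            S_min = np.minimum(S_min, S)
            S_tmp = np.minimum(S_tmp, S)
        # Eq. (14): ratio of local energy to its tracked minimum.
        S_r = S / np.maximum(S_min, eps)
        indicator = np.where(S_r > delta, 1.0, 0.0)   # assumed polarity
        # Eq. (15): recursive update of the tracked estimate.
        q_hat = alpha_p * q_hat + (1.0 - alpha_p) * indicator
        out[:, l] = q_hat
    return out

# Usage: one bin whose energy jumps from a noise floor to a speech-like burst.
energies = np.array([[1.0] * 5 + [100.0] * 5])
q_track = track_q_minima_controlled(energies)
```

In this toy input the estimate stays at its initial value while the energy sits at the floor and rises once the smoothed energy exceeds $\delta$ times the tracked minimum.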
With $\hat{q}(k,l)$ in place of q(k,l), (8) becomes

\[ p(H_0 \mid Y(k,l)) = \frac{1}{1 + \hat{q}(k,l)\, \Lambda(Y(k,l))}. \tag{16} \]

Fig. 2 suggests that the SAP obtained by the proposed method is more accurate than that of the conventional (a posteriori SNR-based) method.

Fig. 1. Comparison of two statistics (k=2, around 3 Hz) under street noise (SNR = 5 dB). (a) Clean speech waveform, (b) noisy speech waveform, (c) $\gamma(k,l)$ (dashed line) vs. $S_r(k,l)$ (solid line).

Fig. 2. Comparison of probability (k=2, around 3 Hz) under car noise (SNR = 5 dB). (a) Clean speech waveform, (b) noisy speech waveform, (c) speech presence probability in short-time frames: probability using the a posteriori SNR (dashed line), probability of the proposed algorithm (bold line).

4. Experiments and results

The proposed minima-controlled speech presence uncertainty tracking method was adopted for soft-decision-based speech enhancement, as in [8], and was evaluated with extensive objective and subjective tests. For these tests, phrases spoken by four male and four female speakers were employed as the experimental data. Each phrase consists of two different meaningful sentences, and its duration was 8 s. For real-time processing, the proposed method was applied to each frame of 10 ms with a sampling frequency of 8 kHz. Four types of noise, namely white noise, car noise, street noise, and office noise, were digitally added to the clean speech waveform at SNRs of 5, 10, and 15 dB. In all cases, speech enhancement was conducted with the experimentally optimized parameter values $\alpha_q = 0.95$, $\gamma_{TH} = 0.8$, $\alpha_p = 0.2$, and $\delta = 5$.

First, we carried out the perceptual evaluation of speech quality (PESQ) based on the ITU-T P.862 tests [13]. From Table 1, which shows the PESQ results, we can see that the proposed minima-controlled speech presence uncertainty tracking method outperformed the three conventional methods proposed by McAulay [2], Ephraim [1], and Malah [9], as well as the ideal q-based method, under the given noise conditions. Specifically, the ideal q-based method uses fixed values of q that are determined from the ratio of speech and noise in each speech segment. Note that the performance gain becomes larger for non-stationary noise such as car and street noise.

We also carried out a set of informal tests under the same noise conditions to evaluate the subjective quality of the proposed method. Subjective opinions were given by a group of 2 listeners; each listener gave a score for each test sentence: 5 (Excellent), 4 (Good), 3 (Fair), 2 (Poor), and 1 (Bad).
All listener scores were then averaged to yield a mean opinion score (MOS). The MOS test results, with 95% confidence intervals, are summarized in Table 2, in which a higher value indicates preference. Performance was found to improve for most of the noises at all SNRs. Indeed, the performance differences in the MOS are more pronounced than those in the PESQ in many cases. This can be attributed to the fact that all parameters have been optimized for subjective quality enhancement. These results indicate that the proposed algorithm is consistently better than the conventional methods.

Table 1. PESQ scores of the conventional methods and the proposed method.

  Noise    Method            SNR 5 dB   10 dB   15 dB
  White    McAulay (q=0.5)   .68        .95     2.33
           Ephraim (q=0.2)   .96        2.34    2.67
           Ideal             2.8        2.4     2.73
           Malah             2.8        2.4     2.72
           Proposed          2.9        2.42    2.75
  Street   McAulay (q=0.5)   2.49       2.8     3.6
           Ephraim (q=0.2)   2.83       3.2     3.37
           Ideal             2.85       3.3     3.38
           Malah             2.83       3.2     3.39
           Proposed          2.89       3.6     3.4
  Car      McAulay (q=0.5)   2.97       3.2     3.4
           Ephraim (q=0.2)   3.26       3.54    3.83
           Ideal             3.35       3.63    3.88
           Malah             3.34       3.63    3.88
           Proposed          3.39       3.67    3.9
  Office   McAulay (q=0.5)   .96        2.34    2.68
           Ephraim (q=0.2)   2.2        2.62    2.95
           Ideal             2.32       2.65    2.94
           Malah             2.3        2.63    2.93
           Proposed          2.34       2.67    2.96

Table 2. MOS of the conventional methods and the proposed method (with 95% confidence interval).

  Noise    Method     SNR 5 dB   10 dB      15 dB
  White    McAulay    .6±.9      .89±.9     2.36±.2
           Ephraim    .79±.9     2.4±.26    2.43±.2
           Ideal      .68±.26    2.39±.9    2.7±.8
           Malah      .84±.7     2.45±.5    2.84±.9
           Proposed   .8±.6      2.52±.7    2.87±.9
  Car      McAulay    3.5±.22    3.68±.26   3.82±.3
           Ephraim    3.7±.23    4.±.27     4.25±.2
           Ideal      3.7±.7     4.±.22     4.36±.22
           Malah      3.72±.26   4.7±.23    4.42±.3
           Proposed   3.75±.23   4.7±.23    4.42±.6
  Street   McAulay    2.69±.7    3.53±.24   3.78±.27
           Ephraim    3.3±.25    3.78±.2    3.88±.27
           Ideal      3.3±.9     3.7±.23    3.9±.2
           Malah      3.3±.6     3.68±.8    3.84±.2
           Proposed   3.42±.2    3.9±.23    3.95±.2
  Office   McAulay    .88±.2     2.53±.8    3.9±.7
           Ephraim    .8±.2      2.59±.22   3.6±.24
           Ideal      .84±.24    2.47±.8    3.9±.29
           Malah      .94±.2     2.48±.8    3.22±.2
           Proposed   2.6±.2     2.63±.8    3.37±.8

Table 3. CCR test of the conventional method (Malah-based) and the proposed method (with 95% confidence interval).

  Noise    SNR (dB)   Overall    Speech     Noise
  White    5          .32±.8     .24±.3     .8±.9
           10         .23±.3     .2±.3      .2±.7
           15         .3±.5      .2±.3      .32±.7
  Car      5          .8±.       .±.6       .3±.9
           10         .2±.6      .±.3       .7±.6
           15         .3±.5      .±.3       .±.9
  Street   5          .7±.6      .56±.2     .78±.7
           10         .72±.3     .39±.2     .4±.3
           15         .72±.3     .23±.      .25±.2
  Office   5          .52±.3     .6±.9      .3±.9
           10         .38±.4     .9±.       .2±.8
           15         .42±.3     .8±.6      .8±.7

We also conducted additional subjective tests via the ITU-T comparison category rating (CCR) to assess the performance difference [14]. Ten listeners with normal hearing (six male and four female) participated in the experiment. The CCR test rates the perceived quality of the signal of method A (proposed) relative to that of method B (Malah).
The grades of the seven-point scale are as follows: 3 (much better), 2 (better), 1 (slightly better), 0 (about the same), -1 (slightly worse), -2 (worse), -3 (much worse). The results of the CCR test between the proposed method and the conventional method based on Malah [9] are given in Table 3. From the table, we confirm that the proposed method improves the quality of speech, background noise, and overall speech. Finally, the speech spectrograms obtained with the conventional and proposed approaches are presented in Fig. 3. From the figure, we can see that the proposed method suppresses the background noise more effectively than the conventional method.

Fig. 3. Speech spectrograms (car noise, SNR = 5 dB); horizontal axis: time (s). (a) Spectrogram of the clean speech (Original), (b) spectrogram of the noisy speech (Noisy Speech), (c) spectrogram of the output signal obtained by Malah [9] (Malah), (d) spectrogram of the output signal obtained by the proposed method (Proposed).

5. Conclusions

In this paper, we have proposed a novel method that incorporates the minima-controlled technique into the tracking of speech presence uncertainty for speech enhancement. The ratio between the local energy and its minimum, which is taken from the MCRA, controls the values of q for different bins, since it provides robust tracking of speech presence. Compared to the conventional speech presence uncertainty tracking, the proposed technique performed better under various noise environments in both subjective and objective tests.

Acknowledgements

This research was supported by the MKE, Korea, under the ITRC support program supervised by the NIPA (NIPA- 2-C9-2-7), and by the IT R&D program of MKE/KEIT [29-S-36-, Development of New Virtual Machine Specification and Technology].

References

[1] Y. Ephraim, D. Malah, Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator, IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-32 (6) (1984) 1109–1121.
[2] R.J. McAulay, M.L. Malpass, Speech enhancement using a soft-decision noise suppression filter, IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-28 (2) (1980) 137–145.
[3] J.-H. Chang, Q.-H. Jo, D.K. Kim, N.S. Kim, Global soft decision employing support vector machine for speech enhancement, IEEE Signal Processing Letters 16 (1) (2009) 57–60.
[4] R. Martin, Spectral subtraction based on minimum statistics, in: Proceedings of the EUSIPCO, Edinburgh, UK, September 1994, pp. 1182–1185.
[5] I. Cohen, B. Berdugo, Speech enhancement for non-stationary noise environments, Signal Processing 81 (11) (2001) 2403–2418.
[6] G. Doblinger, Computationally efficient speech enhancement by spectral minima tracking in subbands, in: Proceedings of the Eurospeech, Madrid, Spain, September 1995, pp. 1513–1516.
[7] J. Meyer, K.U. Simmer, K.D. Kammeyer, Comparison of one- and two-channel noise-estimation techniques, in: Proceedings of the IWAENC, London, UK, September 1997, pp. 137–145.
[8] N.S. Kim, J.-H. Chang, Spectral enhancement based on global soft decision, IEEE Signal Processing Letters 7 (5) (2000) 108–110.
[9] D. Malah, R. Cox, A.J. Accardi, Tracking speech-presence uncertainty to improve speech enhancement in nonstationary noise environments, in: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Phoenix, AZ, March 1999, pp. 789–792.
[10] I. Soon, S. Koh, C. Yeo, Improved noise suppression filter using self-adaptive estimator of probability of speech absence, Signal Processing 75 (2) (1999) 151–159.
[11] I. Cohen, B. Berdugo, Noise estimation by minima controlled recursive averaging for robust speech enhancement, IEEE Signal Processing Letters 9 (1) (2002) 12–15.
[12] O. Cappé, Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor, IEEE Transactions on Speech and Audio Processing 2 (April) (1994) 345–349.
[13] ITU-T P.862, Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs, February 2001.
[14] ITU-T P.800, Methods for subjective determination of transmission quality, August 1996.