Speech Enhancement Using the Minimum-Probability-of-Error Criterion


Interspeech 2018, 2-6 September 2018, Hyderabad

Jishnu Sadasivan, Subhadip Mukherjee, and Chandra Sekhar Seelamantula
Department of Electrical Communication Engineering, Department of Electrical Engineering, Indian Institute of Science, Bangalore 560012, India
{sadasivan, subhadipm, chandrasekhar}@iisc.ac.in

Abstract

We propose a novel speech denoising framework by minimizing the probability of error (PE), which measures the deviation probability of the estimate from its true value. To develop the minimum PE (MPE) criterion, one requires the knowledge of the noise probability density function (p.d.f.), which may not be available in a parametric form in speech denoising applications. Therefore, we adopt two approaches for modeling the noise p.d.f.: (i) Gaussian modeling based on adaptive variance estimation; and (ii) a Gaussian mixture model (GMM), in view of its approximation capabilities. We consider discrete cosine transform (DCT) domain shrinkage, where the optimum shrinkage parameter is obtained by minimizing an estimate of the PE. A performance assessment for real-world noise types shows that for input signal-to-noise ratios (SNR) greater than 5 dB, the proposed MPE-based point-wise shrinkage estimators outperform three benchmark techniques in terms of segmental SNR and short-time objective intelligibility (STOI) scores.

Index Terms: Minimum probability of error, speech denoising, Gaussian mixture model, point-wise shrinkage estimator.

1. Introduction

Ambient acoustic noise introduces unwanted disturbances in speech signals, leading to degradation in speech quality, thereby affecting the downstream processing in speech communication systems and limiting the ability of listeners to understand and concentrate. Therefore, it is imperative to suppress noise and enhance the quality and intelligibility of speech. A typical approach to speech denoising is to minimize an appropriate distortion measure, also referred to as the risk function in the statistics literature, to obtain an estimate of the clean signal.
However, direct minimization of the risk requires the knowledge of the underlying clean signal or its statistics, which is difficult to obtain in practice. Hence, one needs to rely on an estimate of the clean signal statistics. Generative processes of speech signals exhibit wide variabilities based on speaker, phonemes and their durations, language, etc., which render the speech signal a non-stationary stochastic process. Therefore, estimating the clean speech prior is difficult, since it necessitates intricate stochastic modeling and requires a rigorous training phase.

Speech enhancement techniques can be broadly categorized into (i) spectral subtraction techniques [1-3], which involve the subtraction of the noise spectrum from the spectrum of noisy speech; (ii) Wiener filtering [4-6], which relies on the estimates of the power spectra of clean speech and noise; (iii) subspace techniques [7], wherein one utilizes the properties of the signal and noise subspaces; and (iv) statistical model-based approaches, which are set up in a Bayesian framework and rely on an estimate of the clean signal prior [8-16]. Recently, Xu et al. demonstrated the use of deep neural networks for learning the nonlinear map from noisy speech to clean speech [17, 18].

In this paper, emphasis is placed on developing a non-Bayesian technique for speech denoising. Our formulation relies only on the noise statistics in its entirety, unlike the mean-squared error (MSE) formulations, in which the first- and second-order statistics suffice [9]. A statistical model is not assumed on the clean speech signal. The key deviation with respect to the state of the art lies in the choice of the distortion measure. We do not employ the standard MSE metric or a perceptual distortion metric. Instead, we consider a novel criterion for denoising, namely the probability of error (PE), which measures the probability of deviation between the ground-truth signal and its estimate. This criterion requires one to know or at least estimate the noise p.d.f., and places no statistical assumptions on the clean signal.
The PE criterion is measured in the short-time discrete cosine transform (DCT) domain. We rely on the parsimony of representation and energy compaction of clean speech in the DCT basis. Soon et al. showed that the DCT is superior to the discrete Fourier transform (DFT) for speech denoising [20]. The noise, however, is non-sparse in the DCT basis. This representation therefore justifies the use of a point-wise shrinkage estimator for denoising. Since the oracle PE requires one to know the ground truth, we approximate it by a surrogate function that depends solely on the noisy observations, leading to a practically realizable estimate. Since denoising entails reduction in noise variance in each spectral band, the PE is minimized with respect to the shrinkage parameter over the interval [0, 1], which is a great convenience as far as optimization is concerned. We develop two different variants of the PE risk, both point-wise, but one is an instantaneous estimator whereas the other incorporates temporal smoothing. Since the key objective is to combat real-world noise types (street, train, and F16 noise), which are nonstationary and whose distributions are not available a priori, we adopt two p.d.f. models, one based on the Gaussian and the other employing a GMM. Experimental results are presented on the real-world noise types for various input signal-to-noise ratios (SNRs) and compared with the state of the art.

2. MPE for Speech Denoising

Consider the additive observation model

$$x_n = s_n + w_n, \quad n = 1, 2, \ldots, N, \tag{1}$$

where $s_n$ denotes the clean signal and $x_n$ is the observation corrupted by noise samples $w_n$, which are independent and identically distributed (i.i.d.) with zero mean and variance $\sigma^2$. Short-time discrete cosine transform (DCT) domain processing is considered for denoising within the proposed MPE formalism. The short-time DCT representation of (1) takes the form

$$X_{k,i} = S_{k,i} + W_{k,i}, \quad k = 1, \ldots, K, \ \text{and} \ i = 1, \ldots, M, \tag{2}$$

where $k$ and $i$ denote the DCT coefficient and the speech frame indices, respectively. For estimating $S_{k,i}$, we develop a point-wise estimator $\hat{S}_{k,i} = a_{k,i} X_{k,i}$, where the shrinkage factor $a_{k,i} \in [0, 1]$ is selected optimally based on the MPE criterion.

DOI: 10.21437/Interspeech.2018-…

2.1. MPE criteria for point-wise shrinkage

Since the estimator is point-wise, we drop the indices $k$ and $i$ for brevity of notation, and define the PE as

$$\mathcal{R} = P\left(|\hat{S} - S| > \epsilon\right), \tag{3}$$

where $\epsilon > 0$ is a predefined tolerance parameter. Substituting $\hat{S} = aX = a(S + W)$, the risk in (3) evaluates to

$$\mathcal{R}(a, S) = P\left(|a(S + W) - S| > \epsilon\right) = 1 - F\left(\frac{\epsilon - (a-1)S}{a}\right) + F\left(-\frac{\epsilon + (a-1)S}{a}\right), \tag{4}$$

where $F(\cdot)$ is the cumulative distribution function (c.d.f.) of the noise in the DCT domain. Since $\mathcal{R}$ depends on the ground truth $S$, it is impractical to optimize it directly over $a$, as the estimator would be unrealizable. Therefore, we minimize an estimate of $\mathcal{R}$, which is obtained by replacing $S$ in (4) with its noisy counterpart $X$. Such an estimate $\hat{\mathcal{R}}$ takes the form

$$\hat{\mathcal{R}}(a, X) = 1 - F\left(\frac{\epsilon - (a-1)X}{a}\right) + F\left(-\frac{\epsilon + (a-1)X}{a}\right),$$

and correspondingly, the optimal shrinkage is obtained as $a_{\text{opt}} = \arg\min_{a \in [0,1]} \hat{\mathcal{R}}$, by performing a grid search over [0, 1] with a grid spacing of ….

We consider two types of shrinkage estimators. The first one, referred to as MPE-1, applies different shrinkage factors to each spectral coefficient $\{X_{k,i}\}$ in the $i$-th frame. The optimal shrinkage is selected coefficient-wise as $a^{\text{opt}}_{k,i} = \arg\min_{a \in [0,1]} \hat{\mathcal{R}}(a, X_{k,i})$. In the second variant, which we refer to as MPE-2, a single shrinkage factor is applied to a group of coefficients bunched along $i$, resulting in an estimator of the form $\hat{S}_{k,i} = a^{\text{opt}}_{k,i} X_{k,i}$, where

$$a^{\text{opt}}_{k,i} = \arg\min_{a \in [0,1]} \sum_{t=-\tau}^{+\tau} \hat{\mathcal{R}}(a, X_{k,i-t}). \tag{5}$$

The parameter $\tau$ determines the extent of temporal averaging.

2.2. Approximating unknown noise distributions

In real-world speech denoising scenarios, the noise distribution is often not known a priori in a parametric form. In such scenarios, one has to model the noise p.d.f. appropriately. We consider two approaches for noise modeling: in the first one, we use a Gaussian, whose variance is estimated adaptively from the noisy speech signal, whereas in the second approach, we employ a GMM-based model. The effectiveness of the models will be validated experimentally.
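The shrinkage selection of Section 2.1 can be illustrated with a short Python sketch. It evaluates the estimated risk $1 - F((\epsilon - (a-1)X)/a) + F(-(\epsilon + (a-1)X)/a)$ for a supplied noise c.d.f. and picks the minimizer on a grid, both coefficient-wise and with temporal averaging. This is a minimal sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, the grid step of 0.01 and `tau=1` are placeholder values, the window is simply truncated at the first and last frames, and the case $a = 0$ is handled through its limit (the estimate is zero, so the error is $|X|$).

```python
import math


def gaussian_cdf(sigma, mean=0.0):
    """Return the c.d.f. of a Gaussian noise model as a callable."""
    def cdf(w):
        return 0.5 * (1.0 + math.erf((w - mean) / (sigma * math.sqrt(2.0))))
    return cdf


def gmm_cdf(weights, means, sigmas):
    """Return the c.d.f. of a Gaussian mixture (weights assumed to sum to 1)."""
    parts = [gaussian_cdf(s, m) for m, s in zip(means, sigmas)]

    def cdf(w):
        return sum(alpha * part(w) for alpha, part in zip(weights, parts))
    return cdf


def pe_risk_estimate(a, x, eps, noise_cdf):
    """Estimated PE risk for shrinkage a, with the noisy x standing in for S."""
    if a == 0.0:
        # Limit a -> 0: the estimate is 0, so the error magnitude is |x|.
        return 1.0 if abs(x) > eps else 0.0
    upper = (eps - (a - 1.0) * x) / a
    lower = -(eps + (a - 1.0) * x) / a
    return 1.0 - noise_cdf(upper) + noise_cdf(lower)


def shrinkage_grid(step=0.01):
    """Uniform grid on [0, 1]; the step value is an illustrative assumption."""
    n = int(round(1.0 / step))
    return [j / n for j in range(n + 1)]


def mpe_instantaneous(x, eps, noise_cdf, grid):
    """Coefficient-wise variant: best shrinkage for a single coefficient."""
    return min(grid, key=lambda a: pe_risk_estimate(a, x, eps, noise_cdf))


def mpe_smoothed(track, i, eps, noise_cdf, grid, tau=1):
    """Smoothed variant: one shrinkage minimizing the risk summed over
    frames i - tau .. i + tau of a single DCT-coefficient track."""
    lo, hi = max(0, i - tau), min(len(track) - 1, i + tau)
    window = track[lo:hi + 1]
    return min(
        grid,
        key=lambda a: sum(pe_risk_estimate(a, x, eps, noise_cdf) for x in window),
    )
```

With a unit-variance Gaussian model and `eps = 1.0`, a coefficient smaller than the tolerance is shrunk to zero, while a coefficient well above the noise level keeps a shrinkage factor close to one. Passing the callable returned by `gmm_cdf` instead of `gaussian_cdf` switches the noise model without changing the search.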
2.2.1. Gaussian model and adaptive variance estimation

This approach relies on the assumption that the time-domain noise samples within a frame are i.i.d. random variables. Since the DCT coefficients are linear combinations of i.i.d. random variables, considering the frame length to be sufficiently large, we invoke the central limit theorem, which assures that each DCT coefficient $W_{k,i}$ is approximately Gaussian distributed. A stochastic model based voice-activity detector (VAD) [26] is employed to estimate the variance of $W_{k,i}$. Going by the recommendation in [], we use the following recursion to estimate the noise variance adaptively:

$$\hat{\sigma}^2_{k,i} = \begin{cases} \eta\,\hat{\sigma}^2_{k,i-1} + (1-\eta)\,X^2_{k,i}, & \text{if the } i\text{-th frame is noise-only}, \\ \hat{\sigma}^2_{k,i-1}, & \text{otherwise}, \end{cases}$$

where $\eta = 0.98$. Essentially, the noise variance is updated if the VAD identifies that the frame under consideration corresponds to noise alone. In the sequel, the point-wise shrinkage estimators MPE-1 and MPE-2 for the Gaussian noise model are referred to as MPE-1-G and MPE-2-G, respectively.

2.2.2. Noise modeling using GMM

The motivation for using a GMM stems from the fact that it can approximate any p.d.f. with a finite number of discontinuities sufficiently accurately [21]. The $L$-component GMM p.d.f. with parameters $\{\alpha_m, \theta_m, \sigma_m\}_{m=1}^{L}$ is given by

$$f(W) = \sum_{m=1}^{L} \frac{\alpha_m}{\sigma_m \sqrt{2\pi}} \exp\left(-\frac{(W - \theta_m)^2}{2\sigma_m^2}\right), \tag{6}$$

and the corresponding PE risk turns out to be

$$\hat{\mathcal{R}} = \sum_{m=1}^{L} \alpha_m \left[ Q\left(\frac{\epsilon - (a-1)X - a\theta_m}{a\sigma_m}\right) + Q\left(\frac{\epsilon + (a-1)X + a\theta_m}{a\sigma_m}\right) \right], \tag{7}$$

where $Q(u) = \frac{1}{\sqrt{2\pi}} \int_u^{\infty} \exp\left(-\frac{t^2}{2}\right) \mathrm{d}t$. The number of GMM components $L$ is selected following the Bayesian information criterion (BIC) [23]. For each subband, the parameters of the GMM are estimated using the expectation-maximization (EM) algorithm [22] based on training data corresponding exclusively to noise. The GMM-based p.d.f. modeling, when used in conjunction with the MPE-1 and MPE-2 estimators, is referred to as MPE-1-GMM and MPE-2-GMM, respectively. The noise samples during training and testing are taken to be different.

3. Simulation Results

Clean speech recordings from the Noizeus database (8 kHz sampling frequency) [24] are used in our experiments.
The noise samples are taken from both the Noizeus (train and street noises) and Noisex-92 (for F16 noise; downsampled to 8 kHz) databases [25]. We consider frame-by-frame processing, with a Hamming window, a frame length of … ms, and an overlap of 75% between consecutive frames. The value of $\tau$ in MPE-2 in (5) is set to …, and we choose $\epsilon = \sigma$, where $\sigma$ is the noise standard deviation. Example speech files are available at https://spectrumee.wixsite.com/mpe-se.

We perform a comparative assessment of the MPE-based techniques with three benchmarking algorithms under different noise conditions. The algorithms chosen for the comparison are: (i) the Wiener filter technique, which uses a decision-directed approach for a priori SNR estimation (WFIL) [6]; (ii) the log-spectral amplitude estimator (LSA), which minimizes the mean-squared error (MSE) of the logarithm of the clean speech spectral amplitude [9]; and (iii) the Bayesian non-negative matrix factorization method (BNMF), wherein one optimizes the MSE of the clean speech spectral amplitude with the help of a dictionary trained offline on clean speech [15]. Matlab implementations of WFIL and LSA are available in [27]. The implementations use the VAD proposed in [26]. For MPE-1-G/MPE-2-G, we use the same VAD. For the GMM approach, a VAD is not needed, since it is a pre-trained model. For the BNMF implementation, we use the Matlab code provided online by the authors [15]. The choice of WFIL and LSA for performance benchmarking is motivated by the extensive comparison reported in [27], which established conclusively that these result in higher speech quality and intelligibility than the competing techniques. The BNMF technique has been shown to be the best among NMF-based speech denoising approaches.

Three objective scores are computed for performance evaluation: (i) segmental signal-to-noise ratio (SSNR), calculated by averaging the SNRs over short speech segments; (ii) perceptual evaluation of speech quality (PESQ) [28], which is widely used to measure the perceptual speech quality in narrowband telephone networks, speech codecs, and denoised speech; and (iii) short-time objective intelligibility score (STOI), which has been shown to be highly correlated with the intelligibility of the denoised speech [29]. The scores are averaged over different speech files corresponding to independent and randomly selected noise realizations for each input SNR.

Figure 1: Performance comparison of various algorithms (BNMF, LSA, WFIL, MPE-1-G, MPE-2-G, MPE-1-GMM, MPE-2-GMM) for different noise types in terms of SSNR, PESQ, and STOI scores. Panels: (a) SSNR (F16 noise); (b) PESQ (F16 noise); (c) STOI (F16 noise); (d) SSNR (train noise); (e) PESQ (train noise); (f) STOI (train noise); (g) SSNR (street noise); (h) PESQ (street noise); (i) STOI (street noise).

Figure 1 shows the performance comparison of the techniques for F16, train, and street noise.
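As an illustration of the first objective score, the following Python sketch computes a segmental SNR by averaging per-segment SNRs in dB. It is a hedged sketch rather than the evaluation code used in the paper: the function name, the segment length of 256 samples, the non-overlapping segmentation, and the clamping of each segment's SNR to the range [-10, 35] dB are all assumptions made here for concreteness.

```python
import math


def segmental_snr(clean, estimate, frame_len=256, floor_db=-10.0, ceil_db=35.0):
    """Average of per-segment SNRs (in dB) between a clean signal and an estimate.

    frame_len, floor_db, and ceil_db are illustrative assumptions.
    """
    tiny = 1e-12  # avoids division by zero and log of zero
    snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = estimate[start:start + frame_len]
        signal_energy = sum(v * v for v in s)
        error_energy = sum((u - v) ** 2 for u, v in zip(s, e))
        snr_db = 10.0 * math.log10((signal_energy + tiny) / (error_energy + tiny))
        # Clamp so that silent or perfectly reconstructed segments do not dominate.
        snrs.append(min(max(snr_db, floor_db), ceil_db))
    if not snrs:
        raise ValueError("signal is shorter than one segment")
    return sum(snrs) / len(snrs)
```

Under these assumptions, an SSNR gain would be obtained as `segmental_snr(clean, denoised) - segmental_snr(clean, noisy)`.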
We observe that for all the noise types under consideration, MPE-1-GMM and MPE-2-GMM exhibit higher SSNR gain compared with the competing algorithms (cf. Figures 1(a), 1(d), and 1(g)). Further, in the case of F16 noise, and for other noise types with input SNR greater than 5 dB, MPE-…-G exhibits better denoising performance in terms of SSNR. Among the proposed MPE estimators, the SSNR gain obtained using MPE-…-G turns out to be the least. In terms of PESQ scores (cf. Figures 1(b), 1(e), and 1(h)), we observe that, for all the noise types considered, MPE-…-GMM leads to the best performance. For F16 noise, MPE-1-G and MPE-2-G also result in fairly high PESQ scores.

Figure 2: Spectrograms of the denoised speech obtained using different algorithms. Panels: (a) clean speech; (b) noisy speech (input SNR = …, train noise); (c) BNMF; (d) WFIL; (e) MPE-1-G; (f) MPE-2-G; (g) MPE-1-GMM; (h) MPE-2-GMM.

For input SNR exceeding 5 dB, MPE-…-G, MPE-1-GMM, and MPE-2-GMM exhibit denoising performance superior to their competitors in terms of STOI (cf. Figures 1(c), 1(f), and 1(i)). To summarize, MPE-…-GMM exhibits better performance compared with all the other techniques. In the case of street and train noise, GMM-based MPE estimators show denoising performance superior to that of their Gaussian counterparts. In the case of F16 noise, the Gaussian model led to better denoising. This is probably because the F16 noise is relatively stationary compared with street and train noise, and the adaptive variance estimation using VAD is reasonably accurate.

To demonstrate the time-frequency structure, distribution of residual noise, and speech distortion, we show the spectrograms of the denoised, noisy, and clean speech signals in Figure 2, corresponding to the train noise. We observe that WFIL has higher residual noise than all the other algorithms. BNMF suppresses noise, especially in the silence regions, but it introduces speech distortions in some regions (cf. Figure 2(c), high-frequency region (….5 to ….5 kHz) just after … s, highlighted using a red rectangle). MPE-1-GMM and MPE-2-GMM yield superior noise suppression and less speech distortion. In the case of MPE-1-G/MPE-1-GMM, a small amount of musical noise is present, which is suppressed to some extent in MPE-2-G/MPE-2-GMM, since by construction, MPE-2 incorporates temporal smoothing while computing the point-wise shrinkage estimator.

4. Conclusions

We proposed a novel criterion for speech denoising based on the probability of error. Our formalism does not place any statistical assumptions on the clean speech signal. Notwithstanding its simplicity, the performance of the proposed denoiser turned out to be competitive with the state-of-the-art techniques under real-world noise conditions.
Further, an implicit assumption of the proposed framework is that the clean signal admits a parsimonious representation in a chosen basis, which is true of the speech signal in the DCT domain, and that the noise does not, which makes point-wise shrinkage a natural choice for denoising. The proposed framework relies on modeling the noise p.d.f., for which we develop Gaussian and GMM-based approximations. The standard deviation of the Gaussian model for noise is updated recursively using a VAD, whereas the parameters for the GMM are pre-trained. Updating the GMM parameters adaptively might lead to an improvement in the denoising performance under real-world noise conditions. Two versions of point-wise shrinkage were considered, one instantaneous and the other involving a certain degree of temporal smoothing, with the latter leading to superior performance. All the same, excessive smoothing might deteriorate the performance, and the optimal degree of smoothing to be incorporated in the MPE framework must be ascertained.

5. References

[1] P. Lockwood and J. Boudy, Experiments with a non-linear spectral subtractor (NSS), hidden Markov models and the projections, for robust recognition in cars, Speech Commun., vol., issue, pp. 5 8, Jun. 99
[2] S. Kamath and P. Loizou, A multi-band spectral subtraction method for enhancing speech corrupted by colored noise, Proc. IEEE Intl. Conf. Acoust., Speech, Signal Process., vol., pp. 7, May.
[3] Y. Hu and P. Loizou, A perceptually motivated approach for speech enhancement, IEEE Trans. Speech, Audio Process., vol., no. 5, pp. 57 5, Sep..
[4] J. Chen, J. Benesty, Y. Huang, and S. Doclo, New insight into noise reduction Wiener filter, IEEE Trans. Speech, Audio Process., vol., no., pp. 8, Jul..
[5] S. Srinivasan, J. Samuelsson, and W. Kleijn, Codebook driven short-term predictor parameter estimation for speech enhancement, IEEE Trans. Speech, Audio Process., vol., no., pp. 7, Jan..
[6] P. Scalart and J. V. Filho, Speech enhancement based on a priori signal to noise estimation, in Proc. IEEE Intl. Conf. Acoust., Speech, Signal Process., vol., pp. 9, May. 99.
[7] F. Jabloun and B. Champagne, Incorporating the human hearing properties in the signal subspace approach for speech enhancement, IEEE Trans. Speech, Audio Process., vol., no., Nov..
[8] Y. Ephraim and D. Malah, Speech enhancement using a minimum mean-squared error short-time spectral amplitude estimator, IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-, no., pp. 9, Dec. 98.
[9] Y. Ephraim and D. Malah, Speech enhancement using a minimum mean-squared error log-spectral amplitude estimator, IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-, no., pp. 5, Apr. 985.
[10] P. C. Loizou, Speech enhancement based on perceptually motivated Bayesian estimators of the magnitude spectrum, IEEE Trans. Speech, Audio Process., vol., no. 5, pp. 857 89, Sep. 5.
[11] J. S. Erkelens, R. C. Hendriks, R. Heusdens, and J. Jensen, Minimum mean-square error estimation of discrete Fourier coefficients with generalized gamma priors, IEEE Trans. Audio, Speech, and Lang. Process., vol. 5, no., pp. 7 75, Aug. 7.
[12] Y. Ephraim, A Bayesian estimation approach for speech enhancement using hidden Markov models, IEEE Trans. Signal Process., vol., no., pp. 75 75, Apr. 99.
[13] T. Lotter and P. Vary, Speech enhancement by maximum a posteriori estimation using super-Gaussian speech model, EURASIP J. Appl. Signal Process., vol. 7, pp., 5.
[14] P. Mowlaee and R. Saeidi, Iterative closed-loop phase-aware single-channel speech enhancement, IEEE Signal Process. Lett., vol., no., pp. 5 9, Dec..
[15] N. Mohammadiha, P. Smaragdis, and A. Leijon, Supervised and unsupervised speech enhancement using nonnegative matrix factorization, IEEE Trans. Audio, Speech, Lang. Process., vol., no., pp. 5, Oct..
[16] R. Martin, Speech enhancement based on minimum mean-square error estimation and supergaussian priors, IEEE Trans. Speech, Audio Process., vol., no. 5, pp. 85 85, Sep. 5.
[17] Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, An experimental study on speech enhancement based on deep neural networks, IEEE Signal Process. Lett., vol., no., pp. 5 8, Jan..
[18] Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, A regression approach to speech enhancement based on deep neural networks, IEEE Trans. Audio, Speech, Lang. Process., vol., no., pp. 7 9, 5.
[19] J. Sadasivan, S. Mukherjee, and C. S. Seelamantula, An optimum shrinkage estimator based on minimum-probability-of-error criterion and application to signal denoising, in Proc. IEEE Intl. Conf. on Acoust. Speech and Signal Process., pp. 9 5,.
[20] I. Y. Soon, S. N. Koh, and C. K. Yeo, Noisy speech enhancement using discrete cosine transform, Speech Commun., vol., pp. 9 57, Jun. 998.
[21] H. W. Sorenson and D. L. Alspach, Recursive Bayesian estimation using Gaussian sums, Automatica, vol. 7, pp. 5 79, 97.
[22] R. Redner and H. Walker, Mixture densities, maximum likelihood and the EM algorithm, SIAM Review, vol., no., pp. 95 9, Apr. 98.
[23] C. Fraley and A. Raftery, How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis, Technical Report 9, Dept. Statistics, Univ. Washington, Seattle, WA, 998.
[24] Y. Hu and P. Loizou, Subjective comparison and evaluation of speech enhancement algorithms, Speech Commun., vol. 9, pp. 588, Jul. 7.
[25] A. Varga and H. J. M. Steeneken, Assessment for automatic speech recognition II: Noisex-92: a database and an experiment to study the effect of additive noise on speech recognition systems, Speech Commun., vol., no., pp. 7 5, 99.
[26] J. Sohn and N. S. Kim, A statistical model-based voice activity detection, IEEE Signal Process. Lett., vol., no., pp., Jan. 999.
[27] P. Loizou, Speech Enhancement: Theory and Practice. CRC Press, 7.
[28] ITU-T Rec. P.862, Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs, International Telecommunication Union, Feb..
[29] C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, An algorithm for intelligibility prediction of time-frequency weighted noisy speech, IEEE Trans. Audio, Speech, Lang. Process., vol. 9, pp. 5, Sep..