21st International Congress on Sound and Vibration (ICSV21), Beijing, China, 13-17 July 2014

TRANSIENT NOISE REDUCTION BASED ON SPEECH RECONSTRUCTION

Jian Li 1,2, Shiwei Wang 1,2, Renhua Peng 1,2, Chengshi Zheng 1,2, Xiaodong Li 1,2

1. Communication Acoustics Laboratory, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China
2. Acoustics and Information Technology Laboratory, Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai, China

e-mail: cszheng@mail.ioa.ac.cn

This paper proposes a novel transient noise reduction (TNR) algorithm based on speech reconstruction. The proposed algorithm has two stages. First, the transient noise is detected using the linear prediction residual; this detector will be referred to as the Linear Prediction Residual (LPR)-based method. Second, the frames that contain transient noise are replaced with speech reconstructed by packet loss concealment techniques, which reduces speech distortion and suppresses the transient noise in a robust way. Compared with traditional TNR algorithms, the proposed algorithm is computationally efficient. Moreover, it can eliminate transient noise completely, even when voiced speech and transient noise are present simultaneously. Experimental results show that the proposed algorithm using speech reconstruction techniques can reduce transient noise effectively, by up to 3 dB, without introducing audible speech distortion.

1. Introduction

Transient noise, a type of non-stationary signal with a short duration of less than 50 ms, often appears as an interference in speech communication systems such as mobile phones, hearing aids and teleconference devices [1]. Since transient noise may seriously degrade speech quality in practice, it is necessary to suppress it in an efficient way.

In recent years, transient noise reduction (TNR) has become an attractive research topic, and several efforts have already been made to suppress this kind of noise. In [1] and [2], Talmon et al. proposed an algorithm that efficiently suppresses transient noise with diffusion maps; however, this algorithm is computationally complex and non-causal. In [3]-[5], transient noise is suppressed in the time domain, the wavelet domain and the frequency domain, respectively. These algorithms can suppress transient noise with low delay, but they may cause serious speech distortion when speech is erroneously detected as transient noise. Moreover, experimental studies show that the existing algorithms cannot completely eliminate transient noise in the sense of hearing.

In this paper, we propose a novel TNR algorithm based on speech reconstruction. The proposed method is composed of two steps. In the first step, the Linear Prediction Residual (LPR)-based method is proposed to detect the transient noise as reliably as possible.

In the second step, the frames corrupted by transient noise are removed and packet loss concealment techniques are used to reconstruct the speech for continuity.

The remainder of this paper is organized as follows. In Section 2, we formulate the problem. The LPR-based method is presented in Section 3. Then, a packet loss concealment technique is applied to reconstruct the speech in Section 4. Section 5 gives some experimental results to show the validity of the proposed algorithm. Conclusions are presented in Section 6.

2. Problem formulation

Let $s(n)$ denote a clean speech signal and let $d_{st}(n)$ and $d_{tr}(n)$ be the additive stationary and transient noise signals, respectively. The signal received by a microphone is composed of these three components, written as:

$x(n) = s(n) + d_{st}(n) + d_{tr}(n)$    (1)

Since the additive stationary noise can be removed by traditional single-channel speech enhancement algorithms [6], [7], we ignore the impact of $d_{st}(n)$ in this paper. Therefore, (1) can be rewritten as:

$x(n) = s(n) + d_{tr}(n)$    (2)

The microphone signal $x(n)$ can be divided into short-time frames, and the transient noise detection problem in the $l$th frame can be regarded as a binary hypothesis test, given by:

$H_1(l): \; x_l(n) = s(Ml + n) + d_{tr}(Ml + n) \;$ or $\; x_l(n) = d_{tr}(Ml + n)$
$H_0(l): \; x_l(n) = s(Ml + n)$    (3)

where $M$ is the frame shift and $n = 0, 1, \ldots, N-1$, with $N$ the frame length. In this paper, we choose $M = 256$ and $N = 512$ with a sampling frequency of 16 kHz. To eliminate the transients completely, it is better to detect as much of the transient noise as possible, even when both the speech and the transient noise are present in the $l$th frame, since the detected frames can be repaired by speech reconstruction techniques.

3. Transient noise detection with the LPR-based method

In the following three parts, we introduce the LPR-based method. In the first part, we analyze the properties of the LPR for different types of signals. In the second part, the specific steps used to distinguish the transient noise from the speech are given. In the final part, some experimental results are presented to show the validity of this method.

3.1 The properties of the LPR

In this paper, we assume that the energy of the transient noise is mostly concentrated within a small range on the time axis and that its temporal energy is significantly larger than that of the speech components. Following traditional methods, spectral coherence can be applied to distinguish unvoiced speech from transient noise [8], while the harmonic property of voiced speech is useful to differentiate voiced speech from transient noise [5]. However, when voiced speech and transient noise occur at the same time, the voiced speech strongly affects the observed characteristics of the transient noise and makes it difficult to detect. To solve this problem, we propose the LPR-based method.

To enhance the characteristic difference between the transient noise and the speech, we whiten the noisy signal $x(n)$ in each frame. Let $\tilde{x}_l(n)$ be the LPR in the $l$th frame, which can be written as:

$\tilde{x}_l(n) = x_l(n) - \sum_{p=1}^{P} a_p^l \, x_l(n-p)$    (4)

where $\{a_p^l\}_{p=1}^{P}$ are the AR coefficients in the $l$th frame. In practice, the common Levinson-Durbin algorithm can be applied to estimate the AR coefficients.
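
As an illustration, the framing of Eq. (3) and the whitening of Eq. (4) could be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation; in particular, the prediction order P = 16 is assumed, since the paper does not state P.

```python
# Minimal sketch (assumptions, not the authors' code) of the framing in Eq. (3)
# and the whitening in Eq. (4). The prediction order P = 16 is assumed.
import numpy as np
from scipy.signal import lfilter

FS, N, M = 16000, 512, 256               # sampling rate, frame length, frame shift

def frames(x, n=N, m=M):
    """x_l(n) = x(M*l + n), n = 0..N-1: overlapping analysis frames."""
    return [x[l * m:l * m + n] for l in range((len(x) - n) // m + 1)]

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations; return A(z) = [1, a_1, ..., a_P]
    such that filtering a frame by A(z) yields its prediction residual."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-12
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a[1:i] += k * a[i - 1:0:-1]
        a[i] = k
        err *= (1.0 - k * k)
    return a

def lp_residual(frame, order=16):
    """Eq. (4): the LPR of one frame."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = levinson_durbin(r, order)
    return lfilter(a, [1.0], frame)      # applying A(z) to the frame gives the residual
```

As described next, the residual of voiced speech is close to an impulse train, while a transient keeps its energy concentrated in time, which is the property the detector exploits.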

Different types of signals are shown in Figure 1 within a short time frame, where (a) is voiced speech, (b) is transient noise, (c) is voiced speech corrupted by transients and (d) is unvoiced speech. Each type of signal is whitened using linear prediction and the corresponding results are shown in Figure 2. It can be observed that the LPR of voiced speech is reduced to an impulse train, in which the impulses appear periodically, as shown in Figure 2(a). In contrast, Figure 2(b) indicates that the transient noise concentrates its energy within a small window of time both before and after linear prediction, owing to its short duration and flat spectrum. Comparing Figure 1(c) and Figure 2(c), it can be seen that the transient noise becomes more prominent after linear prediction, since the voiced speech is suppressed by the linear prediction while the whitened transient component retains most of its energy. The energy of unvoiced speech is approximately uniformly distributed over time, as can be seen in Figure 1(d) and Figure 2(d).

Figure 1: Waveforms of the original signals.

Figure 2: Waveforms of the whitened signals.

3.2 A signal centroid-based method to detect transient noise

Based on the different energy distributions of the transient noise and the other signals in the residual domain, we propose a signal centroid-based method to detect transient noise. The centroid of the LPR in the $l$th frame can be written as:

$C(l) = \sum_{n=0}^{N-1} n\,|\tilde{x}_l(n)| \Big/ \sum_{n=0}^{N-1} |\tilde{x}_l(n)|$    (5)

Centered on the centroid $C(l)$, the minimum time length that contains $E\%$ of the total energy is given by:

$B(l) = \min_{v} \left\{ v : \; \sum_{n=C(l)-v}^{C(l)+v} |\tilde{x}_l(n)| \Big/ \sum_{n=0}^{N-1} |\tilde{x}_l(n)| \;\geq\; E\% \right\}$    (6)

where $E$ is recommended to lie between 75 and 95; $E = 90$ is chosen in this paper. Our studies indicate that $B(l)$ is small under $H_1(l)$, which follows from the fact that the energy of the transient noise is concentrated within a small range.
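
A compact sketch of Eqs. (5)-(6) is given below, computed on the magnitude of the LPR produced by the earlier sketch. Using $|\tilde{x}_l(n)|$ as the energy term and $E = 90$ follows the description above; the authors' exact definition may differ, so this is an assumption-laden illustration rather than their code.

```python
# Sketch of Eqs. (5)-(6): centroid of |LPR| and the smallest half-width v = B(l)
# around it that captures E% of the frame's total (magnitude) energy.
import numpy as np

def concentration(res, E=90.0):
    mag = np.abs(res)
    total = mag.sum() + 1e-12
    C = int(round(np.dot(np.arange(len(mag)), mag) / total))      # Eq. (5)
    for v in range(len(mag)):                                      # Eq. (6)
        lo, hi = max(C - v, 0), min(C + v + 1, len(mag))
        if mag[lo:hi].sum() / total >= E / 100.0:
            return C, v
    return C, len(mag)
```

For a transient the residual energy piles up near C(l), so the returned B(l) is small, whereas for voiced or unvoiced speech it is much larger.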

Aiming at improving the detection probability, we introduce a weighting window $w(n)$, and (5) can be rewritten as:

$C(l) = \sum_{n=0}^{N-1} n\,w(n)\,|\tilde{x}_l(n)| \Big/ \sum_{n=0}^{N-1} w(n)\,|\tilde{x}_l(n)|$    (7)

where $w(n)$ is a Hann window in practice. Our studies indicate that an appropriate choice of $w(n)$ and of the overlap length makes the energy more concentrated, which helps to detect the transient noise. However, we find that speech phoneme onsets, which are characterized by sudden bursts, also concentrate their energy within a small range. To solve this problem, we propose to add some stationary noise to the original signal, which masks the phoneme onset components but masks neither the voiced speech nor the transient noise. Note that even if speech is erroneously detected as transient noise, the packet loss concealment technique introduced in the following section can be used to reconstruct it. With these protective measures, the detection criterion is given by:

$B(l) \geq C_{th}$: accept $H_0(l)$
$B(l) < C_{th}$: accept $H_1(l)$    (8)

where $C_{th}$ is a threshold that depends on the frame length and on the type of transient noise. A large number of experiments show that $C_{th} = 150$ is a good choice when the frame length is 512.

3.3 LPR-based method simulation

In this part, we show the validity of the LPR-based method. A speech signal corrupted by transient noise is used for simulation and the results are shown in Figure 3, where the dashed line represents the threshold $C_{th}$. The results indicate that the LPR-based method can detect the transient noise effectively even when the speech and the transients exist simultaneously.

Figure 3: Simulation for the transient noise detection.

4. TNR based on speech reconstruction

Traditional TNR algorithms cannot eliminate transient noise completely in practice, and unfortunately the human auditory system is sensitive to the residual transient noise. Vaseghi and Rayner proposed a method for removing impulsive noise and reconstructing speech with an interpolation algorithm [3]. In this paper, we replace the frames that contain transient noise with speech reconstructed by packet loss concealment techniques. Since the duration of transient noise is usually less than 50 ms, once a frame is detected to contain transient noise, this frame and its two successive frames are discarded to ensure that the transient noise is totally eliminated. Various packet loss concealment techniques can be used to generate approximations of the discarded frames, such as the waveform substitution algorithm and the Waveform Similarity Overlap-Add (WSOLA) algorithm [9], [10]. In this paper, we apply the two-side pitch waveform replication (PWR) technique [11] to reconstruct the speech of the discarded frames.
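
Putting the pieces together, the detection-and-replacement chain described above could look roughly like the sketch below. It reuses frames(), lp_residual() and concentration() from the earlier sketches; reconstruct_gap is a hypothetical placeholder standing in for the two-side PWR of Section 4.2, and the way overlapping gaps are merged here is one possible simplification, not the authors' stated procedure.

```python
# Rough end-to-end sketch: Eq. (8) detection, then discard each detected frame
# plus its two successors and refill the gap with reconstructed speech.
# reconstruct_gap() is a placeholder for the two-side PWR described below.
import numpy as np

C_TH = 150                                             # Eq. (8) threshold for N = 512

def detect_transient_frames(x):
    """Indices l accepted under H_1(l)."""
    hits = []
    for l, frame in enumerate(frames(x)):              # frames() from the Section 2 sketch
        _, B = concentration(lp_residual(frame))       # Eqs. (4)-(6); w(n) of Eq. (7) omitted
        if B < C_TH:
            hits.append(l)
    return hits

def suppress_transients(x, reconstruct_gap):
    gap = np.zeros(len(x), dtype=bool)
    for l in detect_transient_frames(x):
        gap[l * M:(l + 2) * M + N] = True              # frame l and its two successors
    y = x.copy()
    n = 0
    while n < len(y):                                  # refill each contiguous gap
        if gap[n]:
            m = n
            while m < len(y) and gap[m]:
                m += 1
            y[n:m] = reconstruct_gap(y, n, m)          # e.g. two-side PWR
            n = m
        else:
            n += 1
    return y
```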

4.1 Pitch detection

The pitch period of each frame can be estimated by computing the normalized autocorrelation of the signal and searching for the lag that maximizes it [12], i.e.,

$C_{nac}(\tau) = \sum_{n=0}^{L-1} x(n)\,x(n+\tau) \Big/ \sqrt{\sum_{n=0}^{L-1} x(n)^2 \sum_{n=0}^{L-1} x(n+\tau)^2}, \quad \tau = \tau_{min}, \ldots, \tau_{max}$    (9)

$L = \begin{cases} \tau, & \tau_{min} < \tau \leq N/2 \\ N - \tau, & N/2 < \tau \leq \tau_{max} \end{cases}$    (10)

where $L$ is the correlation size and $\tau_{min}$ and $\tau_{max}$ are the minimum and maximum values of the pitch period, respectively. The estimated pitch period is then used to reconstruct the speech. More accurate pitch estimation methods are beyond the scope of this paper.

4.2 Speech reconstruction

Based on whether the forward frame and the backward frame are voiced, we consider four different conditions [11]: both are voiced (BV), only the previous frame is voiced (PV), only the next frame is voiced (NV) and both are unvoiced (BU). The reconstruction methods for the four conditions are given in detail below.

4.2.1 Both voiced condition

For the BV condition, an algorithm based on phase synchronization and pitch adjustment is used to reconstruct the discarded frames [13]. Assume that the pitch period of the forward frame is $P_f$ and the pitch period of the backward frame is $P_b$. In the forward frame, we take the $P_f$ samples nearest to the discarded frames as the previous pitch waveform, referred to as PPW. In the backward frame, we take the $P_b$ samples nearest to the discarded frames as the next pitch waveform, referred to as NPW. Assuming that there are $r$ samples to be reconstructed, the number of reconstructed pitch waveforms (referred to as RPWs) is $N_p$, given by:

$N_p = \mathrm{round}\!\left( \dfrac{\mathrm{round}(r/P_f) + \mathrm{round}(r/P_b)}{2} \right)$    (11)

In general, $P_f$ is not equal to $P_b$, so the length of each RPW is different. For instance, if $P_f < P_b$, the length of the $i$th RPW, $\mathrm{RPW}_i$, is given by:

$P_i = P_f + \mathrm{round}\!\left( \dfrac{P_b - P_f}{N_p + 1}\, i \right), \quad i = 1, 2, \ldots, N_p$    (12)

If $r \neq P_1 + P_2 + \ldots + P_{N_p}$, the $P_i$ should be slightly modified to satisfy $r = P_1 + P_2 + \ldots + P_{N_p}$. To obtain the $i$th RPW with length $P_i$, we apply an interpolation method to modify PPW into a modified PPW with $P_i$ samples, referred to as $\mathrm{PPW}_m^i$. Likewise, the same method is used to modify NPW into a modified NPW with $P_i$ samples, referred to as $\mathrm{NPW}_m^i$. The $i$th RPW can then be written as:

$\mathrm{RPW}_i(k) = w_f^i(k)\,\mathrm{PPW}_m^i(k) + w_b^i(k)\,\mathrm{NPW}_m^i(k), \quad k = 1, 2, \ldots, P_i$    (13)

$w_f^i(k) = \dfrac{r - g}{r}, \quad w_b^i(k) = \dfrac{g}{r}, \quad g = P_1 + P_2 + \ldots + P_{i-1} + k$    (14)

where $w_f^i(k)$ and $w_b^i(k)$ are the gain patterns used to adjust the contributions of the forward and backward components. All the RPWs are concatenated to obtain the reconstructed speech.
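
A direct sketch of the pitch estimator of Eqs. (9)-(10) is given below. The search range of 80 to 320 samples (50-200 Hz at 16 kHz) is an assumed choice, since the paper does not specify $\tau_{min}$ and $\tau_{max}$.

```python
# Sketch of the pitch estimator of Eqs. (9)-(10): the lag that maximizes the
# normalized autocorrelation. The search range 80..320 samples is assumed.
import numpy as np

def estimate_pitch(x, tau_min=80, tau_max=320, n=512):
    best_tau, best_c = tau_min, -np.inf
    for tau in range(tau_min, tau_max + 1):
        L = tau if tau <= n // 2 else n - tau                    # Eq. (10)
        a, b = x[:L], x[tau:tau + L]
        c = np.dot(a, b) / (np.sqrt(np.dot(a, a) * np.dot(b, b)) + 1e-12)   # Eq. (9)
        if c > best_c:
            best_tau, best_c = tau, c
    return best_tau, best_c          # pitch period in samples and its C_nac value
```

The peak value can also serve as a simple voiced/unvoiced indicator for the frame classification used in Section 4.2 (e.g. voiced if it exceeds a fixed threshold), although the paper does not describe its voicing decision.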

4.2.2 Other conditions

For the PV, NV and BU conditions, a simple recovery approach [11] is used to reconstruct the discarded frames. For the PV condition, the last pitch segment of the forward frame is repeated to fill the region of the discarded frames, and the gain patterns are used to adjust the amplitude. A similar method is used for the NV condition. For the BU condition, the rear half of the forward frame and the first half of the backward frame are respectively extended throughout the discarded frames, again with an amplitude adjustment.

4.3 Simulation

In this part, different types of speech signals are used for simulation. The waveforms of the original signals are shown in Figure 4(a)-(c) and the waveforms of the reconstructed signals are shown in Figure 4(d)-(f). Note that in Figure 4(d)-(f) the dashed lines represent the forward and backward frames while the solid lines represent the reconstructed frames. The results show that the two-side PWR algorithm can reconstruct speech effectively without significant distortion.

Figure 4: Waveforms of the original and the reconstructed signals.
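
For concreteness, the BV reconstruction of Section 4.2.1 (Eqs. (11)-(14)) could be sketched as follows. The use of linear interpolation to resample PPW and NPW, and the way the lengths are adjusted so that they sum to r, are assumptions, since the paper does not specify these details.

```python
# Sketch of the both-voiced (BV) reconstruction, Eqs. (11)-(14). ppw/npw are the
# pitch waveforms taken from the forward/backward frames; r is the gap length.
# Linear interpolation is an assumed choice for the PPW/NPW length modification.
import numpy as np

def reconstruct_bv(ppw, npw, r):
    p_f, p_b = len(ppw), len(npw)
    n_p = max(int(round((round(r / p_f) + round(r / p_b)) / 2)), 1)        # Eq. (11)
    lengths = [p_f + int(round((p_b - p_f) / (n_p + 1) * i))
               for i in range(1, n_p + 1)]                                 # Eq. (12)
    lengths[-1] += r - sum(lengths)              # force P_1 + ... + P_Np = r
    out, g = [], 0
    for p_i in lengths:
        # resample PPW and NPW to P_i samples (assumed: linear interpolation)
        ppw_m = np.interp(np.linspace(0, p_f - 1, p_i), np.arange(p_f), ppw)
        npw_m = np.interp(np.linspace(0, p_b - 1, p_i), np.arange(p_b), npw)
        k = np.arange(1, p_i + 1)
        w_f, w_b = (r - (g + k)) / r, (g + k) / r                          # Eq. (14)
        out.append(w_f * ppw_m + w_b * npw_m)                              # Eq. (13)
        g += p_i
    return np.concatenate(out)
```

The crossfade of Eq. (14) makes the forward contribution decay and the backward contribution grow across the gap, so the reconstructed segment joins both neighbouring frames smoothly.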

5. Experiments

In this section, some experimental results are given to show the validity of the proposed algorithm. In the first part, speech corrupted by mouse clicks is used for simulation to illustrate the validity of the proposed algorithm. In the second part, two objective measures are applied to compare the proposed algorithm with the traditional ENV-TNR algorithm of [14].

5.1 Validity of the proposed algorithm

In this part, a transient-noise-corrupted speech signal sampled at 16 kHz is used to show the validity of the proposed algorithm, and the results are shown in Figure 5. Our experiments show that the proposed algorithm can detect the transient noise accurately and suppress it effectively without introducing audible speech distortion.

Figure 5: Waveforms of (a) clean speech, (b) noisy speech, (c) enhanced speech, and speech spectrograms of (d) clean speech, (e) noisy speech, (f) enhanced speech.

5.2 Quantitative Results

In this part, quantitative results in terms of the perceptual evaluation of speech quality (PESQ) and the amount of noise reduction are given to show the validity of the proposed algorithm. Both keyboard typing noise and mouse clicking noise are used to compare the proposed algorithm with the traditional ENV-TNR algorithm. The comparison results are presented in Table 1. The table demonstrates that the proposed algorithm reduces the transient noise and improves the PESQ simultaneously. This is because the proposed algorithm can eliminate the transient noise completely and reconstruct the speech effectively without significant speech distortion.

Table 1: Comparison results of the amount of noise reduction and the PESQ.

Noise Type       | Noise Reduction [dB]   | PESQ
                 | ENV-TNR  | Proposed    | Noisy | ENV-TNR | Proposed
Keyboard Typing  | 18.1     | 3.64        | 1.21  | 1.6     | 2.19
Mouse Clicking   | 6.32     | 4.12        | 1.76  | 1.59    | 2.65

6. Conclusion

This paper proposes a new LPR-based transient noise detection method and a new transient noise reduction algorithm based on speech reconstruction. Compared with traditional TNR algorithms, the proposed algorithm can completely eliminate transient noise without introducing audible speech distortion, even when voiced speech and transient noise exist simultaneously. Experimental results verify the validity of the proposed algorithm in reducing transient noise and improving speech quality. Future work should concentrate on improving the transient noise detection method and on reconstructing the speech more accurately to further avoid audible speech distortion.

Acknowledgement

This work was supported by the NSFC (National Natural Science Foundation of China) under Grants No. 612143 and No. 6132126. This work was also supported in part by the tri-networks integration project under No. KGZD-EW-13-5(3).

REFERENCES

[1] Talmon, R., Cohen, I. and Gannot, S. Single-Channel Transient Interference Suppression With Diffusion Maps, IEEE Transactions on Audio, Speech and Language Processing, 21(1), 132-144, January, (2013).

[2] Talmon, R., Cohen, I. and Gannot, S. Transient Noise Reduction Using Nonlocal Diffusion Filters, IEEE Transactions on Audio, Speech and Language Processing, 19(6), 1584-1599, August, (2011).

[3] Vaseghi, S. V. and Rayner, P. J. W. Detection and Suppression of Impulsive Noise in Speech Communication Systems, IEE Proceedings I (Communications, Speech and Vision), 137(1), 38-46, February, (1990).

[4] Nongpiur, R. C. Impulse Noise Removal in Speech Using Wavelets, Proceedings of the 2008 IEEE International Conference on Acoustics, Speech, and Signal Processing, Las Vegas, USA, 1593-1596, April, (2008).

[5] Zheng, C. S., Chen, X. L., Wang, S. W., Peng, R. H. and Li, X. D. Delayless Method to Suppress Transient Noise Using Speech Properties and Spectral Coherence, Proceedings of the 125th Audio Engineering Society Convention, New York, USA, 17-20 October, (2013).

[6] Hu, X. H., Wang, S. W., Zheng, C. S. and Li, X. D. A Cepstrum-Based Preprocessing and Postprocessing for Speech Enhancement in Adverse Environments, Applied Acoustics, 74(12), 1458-1462, December, (2013).

[7] Wang, J., Liu, H., Zheng, C. S. and Li, X. D. Spectral Subtraction Based on Two-Stage Spectral Estimation and Modified Cepstrum Thresholding, Applied Acoustics, 74(3), 450-458, March, (2013).

[8] Zheng, C. S., Yang, H. F. and Li, X. D. On Generalized Auto-Spectral Coherence Function and Its Applications to Signal Detection, IEEE Signal Processing Letters, 21(5), 559-563, May, (2014).

[9] Goodman, D. J., Lockhart, G. B., Wasem, O. J. and Wong, W. C. Waveform Substitution Techniques for Recovering Missing Speech Segments in Packet Voice Communications, IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-34, 1440-1448, December, (1986).

[10] Verhelst, W. and Roelands, M. An Overlap-Add Technique Based on Waveform Similarity (WSOLA) for High Quality Time-Scale Modification of Speech, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Minneapolis, MN, USA, 2, 554-557, April, (1993).

[11] Liao, W. T., Chen, J. C. and Chen, M. S. Adaptive Recovery Techniques for Real-Time Audio Streams, Proceedings of the 20th Annual Joint Conference of the IEEE Computer and Communications Societies, Anchorage, AK, 2, 815-823, April, (2001).

[12] Medan, Y., Yair, E. and Chazan, D. Super Resolution Pitch Determination of Speech Signals, IEEE Transactions on Signal Processing, 39(1), 40-48, January, (1991).

[13] Li, Z. B., Zhao, S. H., Wang, J. and Kuang, J. M. A Side Information Based Packet Loss Recovery Algorithm in VoIP, Congress on Image and Signal Processing 2008, Sanya, China, 5, 139-144, May, (2008).

[14] Manohar, K. and Rao, P. Speech Enhancement in Nonstationary Noise Environments Using Noise Properties, Speech Communication, 48(1), 96-109, January, (2006).