On using acoustic environment classification for statistical model-based speech enhancement


Speech Communication 54 (2012)

Jae-Hun Choi, Joon-Hyuk Chang*
School of Electrical Engineering, Hanyang University, Seoul 33-79, Republic of Korea

Received 4 April 2; received in revised form 7 October 2; accepted 3 October 2; available online November 2

Abstract
In this paper, we present a statistical model-based speech enhancement technique that uses acoustic environment classification based on a Gaussian mixture model (GMM). In the training stage, the principal parameters of the statistical model-based speech enhancement algorithm, namely the weighting parameter of the decision-directed (DD) method, the long-term smoothing parameter of the noise estimation, and the control parameter of the minimum gain value, are set to optimal operating points for each given noise to ensure the best performance for that noise. These noise-specific optimal operating points are estimated using composite measures, i.e., objective quality measures that show the highest correlation with the actual quality of speech processed by noise suppression algorithms. In the on-line environment-aware speech enhancement step, noise classification is performed frame by frame using a maximum likelihood (ML)-based GMM. The speech absence probability (SAP) is used to detect speech absence periods and to update the likelihood of the GMM. According to the noise class identified for each frame, we assign the optimal values to the three aforementioned parameters. We evaluated the performance of the proposed method using objective speech quality measures and subjective listening tests under various noise environments.
Our experimental results showed that the proposed method yields better performance than a conventional algorithm with fixed parameters.
© Elsevier B.V. All rights reserved.

* Corresponding author. E-mail address: jchang@hanyang.ac.kr (J.-H. Chang).
doi:.6/j.specom.2..9

Keywords: Speech enhancement; Noise classification; Gaussian mixture model; DFT

1. Introduction
Speech enhancement is a fundamental part of speech processing because environmental background noise drastically degrades the performance of speech processing systems (Boll, 1979; Sim et al., 1998; McAulay and Malpass, 1980; Ephraim and Malah, 1984, 1985). Among the many approaches developed to enhance speech, spectral subtraction has been shown to be effective in suppressing stationary noise (Boll, 1979). However, this technique is limited in handling nonstationary noise and introduces musical noise, an artifact characterized by tones at random frequencies. To avoid such typical artifacts in a practical speech enhancement system, two major components should be considered: noise power estimation and clean speech estimation (Sim et al., 1998; McAulay and Malpass, 1980; Ephraim and Malah, 1984, 1985; Cappé, 1994; Park and Chang, 2007; Martin, 2001; Sohn and Sung, 1998; Cohen and Berdugo, 2002). For speech estimation, Ephraim and Malah derived the minimum mean-square error (MMSE) estimator, which is very effective at reducing musical noise (Ephraim and Malah, 1984, 1985; Cappé, 1994). Other spectral weighting rules, such as Wiener filtering and the maximum a posteriori and MMSE log-spectral amplitude criteria, have also been considered (Sim et al., 1998; McAulay and Malpass, 1980; Ephraim and Malah, 1984, 1985). These algorithms are further improved by a soft decision scheme in which the speech absence probability (SAP) is derived from the likelihood ratio test (LRT) and used for gain modification. Specifically, the spectral gain is modified by the SAP, which is estimated for each frequency bin and each frame. The SAP based on the statistical model of speech is generally computed with the help of the a priori SNR, which is estimated using a non-linear recursive procedure called the decision-directed (DD) approach (Ephraim and Malah, 1984). The a priori SNR given by the DD rule combines the current short-time frame, with a fixed weight $(1-\alpha)$, and the processing output of the previous frame, with weight $\alpha$. The parameter $\alpha$ should be set carefully, since it largely controls the trade-off between the degree of smoothing of the a priori SNR in noisy regions and the acceptable level of transient distortion in the signal. In contrast to the conventional DD estimator with its fixed weight, an adaptive weight determined by the deviation of the a posteriori SNR was proposed in (Park and Chang, 2007). Unfortunately, this estimator interacts with the estimated SNR and does not consider a wide variety of noise conditions.

For noise power estimation, the minimum statistics (MS) method obtains the noise estimate as the minimum of a smoothed power estimate of the noisy signal (Martin, 2001). The MS method is motivated by the observation that the power of a noisy speech signal frequently decays to the power level of the noise. This method is known to be sensitive to outliers and generally biased, and its variance is about twice that of a conventional noise estimator (Cohen and Berdugo, 2002). Alternatively, the soft decision has been applied to noise power estimation by incorporating the SAP into a long-term smoothed power spectrum of the background noise (Sohn and Sung, 1998; Kim and Chang, 2000; Chang and Kim, 2001).
In (Cohen and Berdugo, 2002), the noise power estimate is updated during both speech absence and speech presence periods. In both noise power estimation and speech estimation, most speech enhancement algorithms come with tunable parameters that substantially affect their performance. For example, the weight parameter of the DD approach could be tuned using off-line knowledge of the acoustic background noise. Indeed, the environmental sniffing framework proposed by Akbacak and Hansen improves speech recognition in car environments (Akbacak and Hansen, 2007). Krishnamurthy and Hansen further improved the performance of speech enhancement by providing a more accurate estimate of the noise update rate for a given environment (Krishnamurthy and Hansen, 2006). The environment-aware voice activity detector of Sangwan et al. (2007) relies on an accurate noise model built with a support vector machine (SVM). Numerous studies have also addressed the classification of acoustic environments for context-aware applications (Ma et al., 2006; Kraft et al., 2005).

In this paper, we propose a novel speech enhancement approach using acoustic noise classification. As the target platform, we consider statistical model-based speech enhancement, in which the SAP is derived from the LRT, with the a priori SNR estimated by the DD method, and is used both to modify the spectral gain and to update the noise power. First, we identify the optimal points of the principal parameters, namely the weight parameter of the DD approach, the long-term smoothing parameter, and the control parameter of the MMSE gain function, for a wide variety of noise environments. This is achieved with the help of the composite measure, which is known to correlate well with actual speech quality (Hu and Loizou, 2006, 2008).
Second, we perform frame-by-frame noise classification to recognize the noise type of the current frame. A Gaussian mixture model (GMM)-based maximum likelihood (ML) classifier is applied only during speech absence, as indicated by the SAP. The feature vectors applied to the GMM are selected from the relevant parameters of the 3GPP2 selectable mode vocoder (SMV), as in (Song et al., 2008; 3GPP2-C.R3-, v3., 2005). We then use this per-frame noise knowledge to assign to the three parameters the optimal values that give the best performance for the underlying type of additive noise. Because a running average is used to track the evolving noise, our approach also responds quickly to noise variations. A number of experiments show that the proposed speech enhancement technique performs better than the conventional approach with fixed parameters.

The rest of the paper is organized as follows. Section 2 briefly reviews the soft decision-based speech enhancement technique, and Section 3 presents the proposed algorithm. Section 4 describes the experimental setup and results in detail, and conclusions are presented in Section 5.

2. Review of soft decision-based speech enhancement
Let $x(n)$ and $d(n)$ denote the clean speech and the uncorrelated additive noise signals, respectively, where $n$ is the discrete-time index. The observed noisy speech signal $y(n)$ is the sum of $x(n)$ and $d(n)$. Taking the discrete Fourier transform (DFT), we have

$$Y_k(t) = X_k(t) + D_k(t) \tag{1}$$

where $k = 1, 2, \ldots, K$ is the frequency bin index and $t$ is the frame index.
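The signal model (1) can be illustrated with a short NumPy sketch. It assumes Hann-windowed frames of 256 samples with 50% overlap at 8 kHz and synthetic stand-ins for $x(n)$ and $d(n)$; these framing choices and the function name `frame_spectra` are hypothetical, not taken from the text. Because the DFT is linear, the noisy spectrum equals the sum of the clean and noise spectra bin by bin.

```python
import numpy as np

def frame_spectra(signal: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Return Y_k(t) for Hann-windowed frames: shape (num_frames, frame_len // 2 + 1).

    The window, frame length and hop are illustrative assumptions.
    """
    window = np.hanning(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[t * hop : t * hop + frame_len] * window
                       for t in range(num_frames)])
    return np.fft.rfft(frames, axis=1)       # row t = frame index, column k = frequency bin

rng = np.random.default_rng(0)
fs = 8000                                    # 8 kHz sampling
n = np.arange(fs)                            # one second of samples
x = 0.5 * np.sin(2 * np.pi * 440 * n / fs)   # stand-in for clean speech x(n)
d = 0.1 * rng.standard_normal(fs)            # stand-in for additive noise d(n)
y = x + d                                    # observed noisy signal y(n)

Y = frame_spectra(y, frame_len=256, hop=128)
X = frame_spectra(x, frame_len=256, hop=128)
D = frame_spectra(d, frame_len=256, hop=128)
```

With these settings `Y` holds 61 frames of 129 bins, and `np.allclose(Y, X + D)` holds, which is exactly the additive model (1).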
Given two hypotheses, $H_0$ and $H_1$, which indicate speech absence and presence, respectively, we assume that

$$\begin{aligned} H_0 &: \text{speech absent:} \quad Y_k(t) = D_k(t) \\ H_1 &: \text{speech present:} \quad Y_k(t) = X_k(t) + D_k(t) \end{aligned} \tag{2}$$

Assuming that the clean speech $X_k(t)$ and the additive noise $D_k(t)$ are statistically independent and that the noisy spectral components are characterized by zero-mean complex Gaussian distributions, the probability density functions (PDFs) conditioned on $H_0$ and $H_1$ are given by

$$p(Y_k(t) \mid H_0) = \frac{1}{\pi \lambda_{d,k}(t)} \exp\left\{ -\frac{|Y_k(t)|^2}{\lambda_{d,k}(t)} \right\} \tag{3}$$

$$p(Y_k(t) \mid H_1) = \frac{1}{\pi \left(\lambda_{x,k}(t) + \lambda_{d,k}(t)\right)} \exp\left\{ -\frac{|Y_k(t)|^2}{\lambda_{x,k}(t) + \lambda_{d,k}(t)} \right\} \tag{4}$$

where $\lambda_{x,k}(t)$ and $\lambda_{d,k}(t)$ denote the variances of the clean speech and the noise for the $k$th spectral component in the $t$th frame, respectively (Kim and Chang, 2000). For the soft decision, the global SAP (GSAP) $p(H_0 \mid \mathbf{Y}(t))$, conditioned on the current observation vector $\mathbf{Y}(t)$, is derived as

$$p(H_0 \mid \mathbf{Y}(t)) = \frac{p(\mathbf{Y}(t) \mid H_0) P(H_0)}{p(\mathbf{Y}(t) \mid H_0) P(H_0) + p(\mathbf{Y}(t) \mid H_1) P(H_1)} = \frac{1}{1 + \frac{P(H_1)}{P(H_0)} \prod_{k=1}^{K} \Lambda(Y_k(t))} \tag{5}$$

where $P(H_0)$ $(= 1 - P(H_1))$ is the a priori probability of speech absence. Substituting (3) and (4) into (5), the likelihood ratio $\Lambda(Y_k(t))$ at the $k$th frequency bin is expressed as follows (Kim and Chang, 2000):

$$\Lambda(Y_k(t)) = \frac{p(Y_k(t) \mid H_1)}{p(Y_k(t) \mid H_0)} = \frac{1}{1 + \xi_k(t)} \exp\left\{ \frac{\gamma_k(t)\, \xi_k(t)}{1 + \xi_k(t)} \right\} \tag{6}$$

where the a posteriori signal-to-noise ratio (SNR) $\gamma_k(t)$ and the a priori SNR $\xi_k(t)$ are defined by

$$\gamma_k(t) \triangleq \frac{|Y_k(t)|^2}{\lambda_{d,k}(t)} \tag{7}$$

$$\xi_k(t) \triangleq \frac{\lambda_{x,k}(t)}{\lambda_{d,k}(t)} \tag{8}$$

If $\hat{\xi}_k(t)$ and $\hat{\gamma}_k(t)$ are the estimates of $\xi_k(t)$ and $\gamma_k(t)$, respectively, $\hat{\xi}_k(t)$ can be estimated using the well-known decision-directed (DD) approach (Ephraim and Malah, 1984):

$$\hat{\xi}_k(t) = \alpha_\xi \frac{|\hat{X}_k(t-1)|^2}{\hat{\lambda}_{d,k}(t-1)} + (1 - \alpha_\xi)\, C[\hat{\gamma}_k(t) - 1] \tag{9}$$

where $\hat{X}_k(t-1)$ is the estimated clean speech spectrum in the previous frame, and $C[x] = x$ if $x \geq 0$ and $C[x] = 0$ otherwise. Here, $\alpha_\xi$ $(0 \leq \alpha_\xi \leq 1)$ is a weighting factor that controls the trade-off between noise reduction and transient signal distortion; it is chosen empirically close to 1 (i.e., $\alpha_\xi = 0.99$). The estimate $\hat{\gamma}_k(t)$ is obtained directly as the ratio of the input power $|Y_k(t)|^2$ to the estimate of $\lambda_{d,k}(t)$.

The estimation of the noise power spectrum is another major component of speech enhancement. In particular, the soft decision method adopts a long-term smoothed power spectrum of the background noise as the estimate of $\lambda_{d,k}(t)$ (Kim and Chang, 2000):

$$\hat{\lambda}_{d,k}(t+1) = \zeta_d\, \hat{\lambda}_{d,k}(t) + (1 - \zeta_d)\, E\left[|D_k(t)|^2 \mid Y_k(t)\right] \tag{10}$$

where $\hat{\lambda}_{d,k}(t)$ is the estimate of $\lambda_{d,k}(t)$ and $\zeta_d$ $(= 0.99)$ is a smoothing parameter chosen under a general stationarity assumption on $D_k(t)$ (Kim and Chang, 2000). To take into account the uncertainty of speech absence or presence, the GSAP is applied to the expectation of the noise power spectrum as shown below:

$$E\left[|D_k(t)|^2 \mid Y_k(t)\right] = E\left[|D_k(t)|^2 \mid Y_k(t), H_0\right] p(H_0 \mid \mathbf{Y}(t)) + E\left[|D_k(t)|^2 \mid Y_k(t), H_1\right] p(H_1 \mid \mathbf{Y}(t)) \tag{11}$$

Let $\hat{X}_k(t)$ denote the estimated clean speech spectrum at the $k$th frequency bin in the $t$th frame. In general, speech enhancement techniques estimate $\hat{X}_k(t)$ by applying a spectral gain to each spectral component of the noisy input spectrum. To reduce the musical noise phenomenon effectively, we adopt the MMSE-based noise suppression rule proposed by Ephraim and Malah (1984):

$$\hat{X}_k(t) = \max\left\{ G(\xi_k(t), \gamma_k(t)),\, G_{\min} \right\} Y_k(t) \tag{12}$$

where $G_{\min}$ is the minimum gain used to control the perceived noise and $G(\cdot,\cdot)$ denotes the actual noise suppression gain, given by

$$G(\xi, \gamma) = \frac{\sqrt{\pi}}{2} \sqrt{\frac{\xi}{\gamma (1 + \xi)}}\; F\!\left[ \frac{\gamma \xi}{1 + \xi} \right] \tag{13}$$

with

$$F[m] = \exp\left(-\frac{m}{2}\right) \left[ (1 + m)\, I_0\!\left(\frac{m}{2}\right) + m\, I_1\!\left(\frac{m}{2}\right) \right] \tag{14}$$

in which $I_0$ and $I_1$ are the modified Bessel functions of zeroth and first order, respectively. Notice that $G_{\min}$ should be set carefully, since it controls the trade-off between the residual noise and the musical effect.
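The review above can be condensed into a per-frame sketch in plain Python and NumPy. It is a minimal illustration, not the authors' implementation, and it rests on stated assumptions: the prior $P(H_0) = 0.5$, the conditional expectations $E[|D_k|^2 \mid Y_k, H_0] = |Y_k|^2$ and $E[|D_k|^2 \mid Y_k, H_1] = \hat{\lambda}_{d,k}$, and the placeholder floor `g_min = 0.1` are illustrative choices that the text does not give, and all function names are hypothetical. The Bessel functions in (14) are evaluated from their power series, and (13) is replaced by its large-argument limit (the Wiener gain $\xi/(1+\xi)$) once $\gamma\xi/(1+\xi)$ exceeds 1000, to avoid floating-point overflow.

```python
import math
import numpy as np

def bessel_i(order: int, x: float) -> float:
    """Modified Bessel function I_order(x) from its power series (adequate for 0 <= x <= 500)."""
    half = x / 2.0
    term = half ** order / math.factorial(order)      # k = 0 term of the series
    total = term
    k = 1
    while True:
        term *= half * half / (k * (k + order))       # ratio of consecutive series terms
        total += term
        if term <= 1e-17 * total or k > 5000:
            return total
        k += 1

def mmse_stsa_gain(xi: float, gamma: float) -> float:
    """Ephraim-Malah MMSE gain G(xi, gamma) of eqs. (13)-(14)."""
    gamma = max(gamma, 1e-12)                         # guard against division by zero
    v = xi * gamma / (1.0 + xi)
    if v > 1000.0:                                    # large-v limit of (13): the Wiener gain
        return xi / (1.0 + xi)
    f = math.exp(-v / 2) * ((1 + v) * bessel_i(0, v / 2) + v * bessel_i(1, v / 2))  # eq. (14)
    return math.sqrt(math.pi) / 2 * math.sqrt(xi / (gamma * (1 + xi))) * f

def global_sap(xi, gamma, p_h0=0.5):
    """GSAP p(H0 | Y(t)) of eq. (5); the log likelihood ratios of eq. (6) are summed."""
    log_lr = gamma * xi / (1.0 + xi) - np.log1p(xi)   # log Lambda(Y_k(t)), eq. (6)
    log_odds = math.log((1.0 - p_h0) / p_h0) + float(np.sum(log_lr))
    if log_odds > 0:                                  # numerically stable 1 / (1 + e^log_odds)
        return math.exp(-log_odds) / (1.0 + math.exp(-log_odds))
    return 1.0 / (1.0 + math.exp(log_odds))

def enhance_frame(Y, noise_psd, prev_clean_power, alpha_xi=0.99, zeta_d=0.99, g_min=0.1):
    """One soft-decision frame; returns (X_hat, next noise PSD, |X_hat|^2, GSAP)."""
    noisy_power = np.abs(Y) ** 2
    gamma = noisy_power / noise_psd                   # estimate of eq. (7)
    # DD rule, eq. (9); (9) normalizes by the previous-frame noise estimate,
    # this sketch reuses noise_psd for both terms for simplicity.
    xi = alpha_xi * prev_clean_power / noise_psd + (1 - alpha_xi) * np.maximum(gamma - 1.0, 0.0)
    p_h0 = global_sap(xi, gamma)                      # eq. (5)
    gain = np.array([max(mmse_stsa_gain(a, b), g_min) for a, b in zip(xi, gamma)])
    X_hat = gain * Y                                  # eq. (12)
    # ASSUMPTION: E[|D|^2 | Y, H0] = |Y|^2 and E[|D|^2 | Y, H1] = current noise estimate.
    expected_noise = p_h0 * noisy_power + (1 - p_h0) * noise_psd          # eq. (11)
    next_noise_psd = zeta_d * noise_psd + (1 - zeta_d) * expected_noise   # eq. (10)
    return X_hat, next_noise_psd, np.abs(X_hat) ** 2, p_h0

# Hypothetical single-frame usage with a flat initial noise estimate.
Y = np.array([1 + 1j, 0.1 + 0j, 3 - 2j])
X_hat, noise_next, clean_power, p_h0 = enhance_frame(Y, np.ones(3), np.zeros(3))
```

The per-bin Python loop for the gain keeps the sketch dependency-light; a vectorized version could use SciPy's exponentially scaled Bessel functions instead of the series.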
In (Kim and Chang, 2000), $G_{\min}$ is fixed at .248, a value similar to that used in TIA/EIA/IS-127 (1996).

3. Proposed environment-aware speech enhancement
As described in the previous section, the principal parameters of the soft decision-based speech enhancement technique (Kim and Chang, 2000), namely the weight $\alpha_\xi$ in the DD approach, the long-term smoothing parameter $\zeta_d$ in the noise power estimation, and the minimum gain $G_{\min}$, are fixed values. However, since these parameters should vary with the noise type to ensure the best performance, we exploit environmental knowledge about the noise to select them adaptively. The overall environment-aware speech enhancement based on GMM noise classification is shown in Fig. 1. The following subsections describe each part of the proposed algorithm in more detail.

Fig. 1. Overall block diagram of the proposed environment-aware speech enhancement.

3.1. Finding optimal operating points for given noises
The operating points of $\alpha_\xi$, $\zeta_d$, and $G_{\min}$ for specific noises should be determined using a relevant speech quality criterion. The most accurate way to evaluate speech quality is an exhaustive subjective listening test. However, since such tests are costly and time-consuming, we adopt the well-known composite measure of Hu and Loizou (2006, 2008) to measure overall speech quality as the parameters vary. The composite measure for overall quality, $C_{\mathrm{ovl}}$, combines basic objective measures as follows:

$$C_{\mathrm{ovl}} = 1.594 + 0.805\, S_{\mathrm{PESQ}} - 0.512\, S_{\mathrm{LLR}} - 0.007\, S_{\mathrm{WSS}} \tag{15}$$

where $S_{\mathrm{PESQ}}$, $S_{\mathrm{LLR}}$, and $S_{\mathrm{WSS}}$ denote the scores of the perceptual evaluation of speech quality (PESQ), the log-likelihood ratio (LLR), and the weighted-slope spectral distance (WSS), respectively (ITU-T Rec. P.862, 2001; Quackenbush et al., 1988). Hu and Loizou (2008) showed that the composite measure correlates significantly with overall perceptual speech quality as measured by the mean opinion score (MOS).

We then prepared 6 speech samples from the NTT database, consisting of speech from four male and four female speakers, each 8 s in duration. To create noisy environments, we added twelve different noise types (babble, car, car2, destroyer-engine, destroyer-operation, factory, factory2, HF-channel, office, street, white, and wind) to the clean speech at SNRs of 5, 10, and 15 dB. The speech enhancement technique of Kim and Chang (2000) was applied to these noisy sentences while its parameters were varied. Using the enhanced speech signals, we first examined $C_{\mathrm{ovl}}$ as a function of $\alpha_\xi$ and $\zeta_d$, plotted as a 3D mesh curve for each noise type, as shown in Fig. 2. The four points indicated by the arrows in Fig. 2 represent the optimal points in terms of $C_{\mathrm{ovl}}$ for babble, factory, HF-channel, and office noise, respectively. By repeating this procedure and including $G_{\min}$ as an additional parameter to be optimized, we obtained a unique operating point $(\alpha_\xi^*, \zeta_d^*, G_{\min}^*)$ for each noise type, as listed in Table 1. As the table shows, different parameter values are optimal for different noises. Note that the variation of these points with the input SNR is small, which indicates that they can be applied without additional SNR estimation.

Fig. 2. 3D mesh curves of $C_{\mathrm{ovl}}$ versus $\alpha_\xi$ and $\zeta_d$ used to estimate the optimal operating point: (a) babble noise (SNR = 5 dB); (b) factory noise (SNR = 10 dB); (c) HF-channel noise (SNR = 5 dB); (d) office noise (SNR = 10 dB).

Table 1. Optimal operating points of $\alpha_\xi$, $\zeta_d$, and $G_{\min}$ for various noise types. Columns: noise type, $\alpha_\xi$, $\zeta_d$, $G_{\min}$; rows: babble, car, car2, destroyer-engine, destroyer-operation, factory, factory2, HF-channel, office, street, white, wind.

3.2. On-line acoustic noise classification employing Gaussian mixture model
As described in the previous subsection, the optimal operating points of the principal parameters for the various noise types are obtained off-line. To apply the appropriate operating point in real time under time-varying noise conditions, we classify the noise signal frame by frame during speech pauses. Successful classification requires a feature vector that discriminates effectively among the various noise environments. Following (3GPP2-C.R3-, v.3., 2005), we select a 14-dimensional feature vector consisting of ten linear predictive coding (LPC) coefficients, the energy, the partial residual energy, the running mean of the energy, and the running mean of the partial residual energy, owing to their superior classification performance. Fig. 3 presents the normalized distributions of the selected features for each noise, demonstrating that their multi-modal characteristics can be modeled well by a GMM. For a GMM with the feature vector $\vec{x} = \{x_1, x_2, \ldots, x_N\}$, the Gaussian mixture density, a weighted sum of $M$ mixture components, is written as

$$p(\vec{x} \mid \lambda) = \sum_{i=1}^{M} a_i\, p_i(\vec{x}) \tag{16}$$

where $p_i(\vec{x})$ and $a_i$ denote, respectively, the Gaussian density and the weight of the $i$th mixture component, defined by

$$p_i(\vec{x}) = \frac{1}{(2\pi)^{N/2} |\Sigma_i|^{1/2}} \exp\left\{ -\frac{1}{2} (\vec{x} - \mu_i)^{T} \Sigma_i^{-1} (\vec{x} - \mu_i) \right\} \tag{17}$$

$$\sum_{i=1}^{M} a_i = 1 \tag{18}$$

Each noise is thus modeled by a GMM parameter set $\lambda$, which comprises the mixture weights $a_i$, the mean vectors $\mu_i$, and the covariance matrices $\Sigma_i$. For classification, each noise is characterized by its GMM $\lambda_s$, where $s$ = 1 (babble), 2 (car), 3 (car2), 4 (destroyer-engine), 5 (destroyer-operation), 6 (factory), 7 (factory2), 8 (HF-channel), 9 (office), 10 (street), 11 (white), 12 (wind), or 13 (universal background model, UBM). We used 6 mixtures as a trade-off between performance and additional computational load; the dependence on the mixture order was marginal for $M \geq 6$. Given these models, the objective is to identify the noise model with the maximum a posteriori probability for the input feature vector $\vec{x}(t)$. Assuming equally likely noises, we determine the noise model for the current frame as

$$\hat{s}(t) = \arg\max_{s = 1, 2, \ldots, 13} \log \hat{p}(\lambda_s \mid \vec{x}(t)) \tag{19}$$

As shown in the flow diagram of the GMM-based noise classification in Fig. 4, the GMM likelihoods for the individual noises are constructed during the initial ten frames.
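Equations (16)–(19) translate directly into a log-domain GMM scorer. The NumPy sketch below assumes full covariance matrices as in (17), uses log-sum-exp over the $M$ components for numerical stability, and indexes classes from 0 rather than 1; the function names and the two-dimensional toy models are hypothetical (the features described above are 14-dimensional).

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, covs):
    """log p(x | lambda) of eqs. (16)-(17) for one GMM, via log-sum-exp over its M components."""
    n = x.shape[0]
    log_terms = np.empty(len(weights))
    for i, (a_i, mu_i, sigma_i) in enumerate(zip(weights, means, covs)):
        diff = x - mu_i
        _, logdet = np.linalg.slogdet(sigma_i)          # log |Sigma_i|
        maha = diff @ np.linalg.solve(sigma_i, diff)    # (x - mu_i)^T Sigma_i^{-1} (x - mu_i)
        log_terms[i] = np.log(a_i) - 0.5 * (n * np.log(2 * np.pi) + logdet + maha)
    peak = log_terms.max()
    return peak + np.log(np.sum(np.exp(log_terms - peak)))

def classify_frame(x, models):
    """Eq. (19) with equal priors: 0-based index of the most likely model, plus all scores."""
    scores = [gmm_log_likelihood(x, *model) for model in models]
    return int(np.argmax(scores)), scores

# Toy two-component models in two dimensions (hypothetical values).
eye = np.eye(2)
model_a = (np.array([0.5, 0.5]), np.array([[0.0, 0.0], [1.0, 1.0]]), np.array([eye, eye]))
model_b = (np.array([0.5, 0.5]), np.array([[5.0, 5.0], [6.0, 6.0]]), np.array([eye, eye]))
s_hat, scores = classify_frame(np.array([5.5, 5.4]), [model_a, model_b])
```

Here `s_hat` is 1 because the test vector lies near the means of `model_b`. Under equal priors, maximizing the posterior in (19) is equivalent to maximizing these likelihoods.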
Once the GMM likelihood for each noise has been obtained, the likelihoods are updated frame by frame during noise-only periods, which is a major contribution of this work. For this purpose, we use a long-term smoothed likelihood that incorporates the SAP, which prevents the likelihoods from being updated during speech periods:

$$\log \hat{p}(\vec{x}(t) \mid \lambda_s) = p(H_0 \mid \mathbf{Y}(t)) \left\{ \beta \log \hat{p}(\vec{x}(t-1) \mid \lambda_s) + (1 - \beta) \log p(\vec{x}(t) \mid \lambda_s) \right\} + \left(1 - p(H_0 \mid \mathbf{Y}(t))\right) \log \hat{p}(\vec{x}(t-1) \mid \lambda_s) \tag{20}$$

where $\beta$ $(= 0.985)$ is the smoothing parameter. Misadaptation of the likelihoods during speech presence may cause the noise classification to fail. To address this problem, we employ an SAP counter, which counts the number of successive noise-only frames; the likelihoods are updated according to (20) only when the SAP counter exceeds a given threshold (i.e., 3). As displayed in Fig. 5, the background noise is successfully classified by the GMM. As the classification result after $t$ = 4 s shows, the classification is slightly delayed owing to the SAP counter and the long-term smoothing of the likelihood.

Fig. 3. Normalized distributions of the adopted feature vectors for noise classification, shown for each of the twelve noise types (babble, car, car2, destroyer-engine, destroyer-operation, factory, factory2, HF-channel, office, street, white, wind): (a) frame energy; (b) partial residual energy; (c) running mean energy; (d) running mean of the partial residual energy.

3.3. Acoustic noise classification-based speech enhancement
Using the classified noise information $\hat{s}(t)$ for the current frame, the three key parameters $\alpha_\xi$, $\zeta_d$, and $G_{\min}$ are replaced in every frame by the optimal values $\alpha_\xi^*$, $\zeta_d^*$, and $G_{\min}^*$ listed in Table 1 for the noise type $\hat{s}(t)$.
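The bookkeeping of (19) and (20), the SAP counter, and the recursive parameter smoothing given next in (22) and (24) can be sketched as a small stateful class. This is a minimal sketch under stated assumptions: a frame counts as noise-only when the GSAP exceeds 0.5 (the text does not state this rule), the likelihoods are initialized by averaging the first ten frames, $\kappa_\alpha = \kappa_\zeta = 0.9$, and the per-class table entries are placeholders because the values of Table 1 are not reproduced here; the class name `NoiseAwareParameters` is hypothetical.

```python
class NoiseAwareParameters:
    """Hypothetical tracker for eqs. (19)-(20), the SAP counter, and eqs. (22)/(24)."""

    def __init__(self, table, beta=0.985, counter_threshold=3, kappa=0.9,
                 init_frames=10, noise_only_sap=0.5):
        self.table = table                    # class s -> (alpha_xi*, zeta_d*, G_min*), placeholders
        self.beta = beta                      # likelihood smoothing of eq. (20)
        self.counter_threshold = counter_threshold
        self.kappa = kappa                    # kappa_alpha = kappa_zeta
        self.init_frames = init_frames        # likelihoods are built over the first frames
        self.noise_only_sap = noise_only_sap  # ASSUMED GSAP level for a noise-only frame
        self.loglik = None                    # smoothed log p(x(t) | lambda_s) per class
        self.frames_seen = 0
        self.counter = 0                      # SAP counter: successive noise-only frames
        self.alpha_xi = None
        self.zeta_d = None

    def update(self, frame_loglik, p_h0):
        """frame_loglik[s] = log p(x(t) | lambda_s); p_h0 = GSAP of the current frame."""
        if self.frames_seen < self.init_frames:
            n = self.frames_seen              # running average over the initial frames
            self.loglik = list(frame_loglik) if n == 0 else [
                (n * old + new) / (n + 1) for old, new in zip(self.loglik, frame_loglik)]
        else:
            self.counter = self.counter + 1 if p_h0 > self.noise_only_sap else 0
            if self.counter > self.counter_threshold:   # update only in sustained noise
                self.loglik = [p_h0 * (self.beta * old + (1 - self.beta) * new)
                               + (1 - p_h0) * old
                               for old, new in zip(self.loglik, frame_loglik)]  # eq. (20)
        self.frames_seen += 1
        s_hat = max(range(len(self.loglik)), key=lambda s: self.loglik[s])     # eq. (19)
        alpha_star, zeta_star, g_min_star = self.table[s_hat]
        if self.alpha_xi is None:             # first frame: start from the table values
            self.alpha_xi, self.zeta_d = alpha_star, zeta_star
        else:
            self.alpha_xi = self.kappa * self.alpha_xi + (1 - self.kappa) * alpha_star  # (22)
            self.zeta_d = self.kappa * self.zeta_d + (1 - self.kappa) * zeta_star       # (24)
        return s_hat, self.alpha_xi, self.zeta_d, g_min_star

table = {0: (0.98, 0.98, 0.2), 1: (0.95, 0.90, 0.1)}   # placeholder values, not Table 1
tracker = NoiseAwareParameters(table)
for _ in range(10):                       # initial frames: class 0 is the more likely noise
    tracker.update([-1.0, -5.0], p_h0=0.9)
during_speech = [tracker.update([-5.0, -1.0], p_h0=0.1)[0] for _ in range(150)]
after_change = [tracker.update([-5.0, -1.0], p_h0=0.9) for _ in range(150)]
```

In this toy run, frames with a low GSAP never touch the likelihoods, so the decision stays at class 0 during speech; once noise-only frames persist, the smoothed likelihoods cross over and the parameters glide toward the class-1 entries instead of jumping.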
Accordingly, the a priori SNR estimate of the proposed method becomes

$$\hat{\xi}_k(t) = \hat{\alpha}_\xi(t) \frac{|\hat{X}_k(t-1)|^2}{\hat{\lambda}_{d,k}(t-1)} + \left(1 - \hat{\alpha}_\xi(t)\right) C[\hat{\gamma}_k(t) - 1] \tag{21}$$

Here, $\hat{\alpha}_\xi(t)$ is obtained by long-term smoothing, which prevents abrupt changes in $\alpha_\xi$ and ensures robust performance:

$$\hat{\alpha}_\xi(t) = \kappa_\alpha\, \hat{\alpha}_\xi(t-1) + (1 - \kappa_\alpha)\, \alpha_\xi^*(t) \tag{22}$$

where $\alpha_\xi^*(t)$ is the Table 1 value of $\alpha_\xi$ for the noise type $\hat{s}(t)$ classified in frame $t$, and $\kappa_\alpha$ $(= 0.9)$ is a smoothing parameter. The noise power estimate is likewise changed to use $\hat{\zeta}_d(t)$:

$$\hat{\lambda}_{d,k}(t) = \hat{\zeta}_d(t)\, \hat{\lambda}_{d,k}(t-1) + \left(1 - \hat{\zeta}_d(t)\right) E\left[|D_k(t)|^2 \mid Y_k(t)\right] \tag{23}$$

in which

$$\hat{\zeta}_d(t) = \kappa_\zeta\, \hat{\zeta}_d(t-1) + (1 - \kappa_\zeta)\, \zeta_d^*(t) \tag{24}$$

where $\zeta_d^*(t)$ is the Table 1 value of $\zeta_d$ for $\hat{s}(t)$ and $\kappa_\zeta$ $(= 0.9)$ is a smoothing parameter. The soft decision-based speech enhancement is thus carried out using (21) and (23). Based on the newly derived $\hat{\xi}_k(t)$ and $\hat{\gamma}_k(t)$, the clean speech spectrum $\hat{X}_k(t)$ is obtained with the aforementioned MMSE-based spectral gain:

$$\hat{X}_k(t) = G(\hat{\xi}_k(t), \hat{\gamma}_k(t))\, Y_k(t) \tag{25}$$

In the soft decision framework, the noise suppression rule $G(\cdot,\cdot)$ is modified to $\tilde{G}(\cdot,\cdot)$, which incorporates the SAP:

$$\tilde{G}(\hat{\xi}_k(t), \hat{\gamma}_k(t)) = \left(1 - p(H_0 \mid \mathbf{Y}(t))\right) G(\hat{\xi}_k(t), \hat{\gamma}_k(t)) \tag{26}$$

When deriving the suppression gain, the lower limit $G_{\min}$ of the spectral gain should be chosen to minimize both the disturbing residual noise and the speech signal distortion (TIA/EIA/IS-127, 1996):

$$\tilde{G}_k(t) = \max\left\{ \tilde{G}(\hat{\xi}_k(t), \hat{\gamma}_k(t)),\, G_{\min} \right\} \tag{27}$$

where $\max\{\cdot\}$ is the maximum operator. As (27) shows, a higher minimum gain results in more residual noise. Conversely, as the minimum gain approaches zero, the residual noise is minimized at the cost of speech distortion. Thus, $G_{\min}$ must be chosen carefully. However, IS-127 (TIA/EIA/IS-127, 1996) uses a fixed value (= .248), which is not appropriate across various noise types. Therefore, we adopt the value $G_{\min}^*(t)$ given by Table 1 for the classified noise type and obtain the final suppression gain as

$$\tilde{G}_k(t) = \max\left\{ \tilde{G}(\hat{\xi}_k(t), \hat{\gamma}_k(t)),\, G_{\min}^*(t) \right\} \tag{28}$$

Accordingly, the residual noise is adjusted based on the noise information, which clearly differs from the previous method (Kim and Chang, 2000).

Fig. 4. Block diagram of the GMM-based noise classification.

4. Experiments and results
The proposed environment-aware speech enhancement technique using noise classification was evaluated with objective speech quality measures and subjective listening tests. The experiments consist of evaluating the noise classification performance and comparing the noise suppression performance of the proposed algorithm with that of the conventional method, i.e., speech enhancement based on the global soft decision (denoted SEGSD) (Kim and Chang, 2000). First, to evaluate the noise classification performance, different data sets were used for training and testing. For GMM-based noise classification, the test speech comprised 456 s of speech from four male and four female speakers, sampled at 8 kHz. Reference decisions were made on the clean speech files by manually labeling every 10-ms frame.
Using the hand-marked speech files, we divided the test material into noise-only frames and active speech frames. The hand-marked test material included about 57% active speech frames, consisting of about 44% voiced and about 13% unvoiced frames. To simulate various noise environments, the aforementioned 12 noise types (using noise samples different from those in the training data set) were added to the clean speech data at SNRs of 5, 10, and 15 dB. The test data included phrases from the NTT database (Chang, 2005) spoken by four male and four female speakers. The NTT database consists of 96 phrases; each phrase includes two different meaningful sentences, and each file lasts 8 s.

We first compared the noise classification technique used in the enhancement method with the conventional method of Ma et al. (2006). Since the noise-only periods are known from the hand-labeled information, we measured the detection probability ($P_d$) for each frame of the noise periods. For the given test files, the performance of the proposed algorithm is shown in Tables 2–4 in the form of confusion matrices, and the confusion matrices of the conventional method are given in Tables 5–7 for comparison. These results show that the proposed noise classification achieves high accuracy (>96%) for the given noises at all SNRs, demonstrating that the noise classification technique in our approach is suitable for environmental discrimination in speech enhancement.

Fig. 5. Result of noise classification based on GMM: (a) the GSAP; (b) clean speech waveform; (c) noisy speech corrupted by babble and car noise (SNR = 5 dB); (d) result of noise classification.
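The per-class detection probability $P_d$ and the average accuracy reported in Tables 2–7 can be computed from frame-level labels as sketched below. This sketch assumes that rows hold the hand-labeled noise type and columns the classifier decision (including an extra UBM column), and that the average accuracy is the unweighted mean of the per-class $P_d$; the text does not state its averaging rule, and the function names are hypothetical.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, num_true, num_pred):
    """Counts of noise-only frames: rows = hand-labeled noise type, columns = classifier output."""
    cm = np.zeros((num_true, num_pred), dtype=int)
    np.add.at(cm, (np.asarray(true_labels), np.asarray(predicted_labels)), 1)
    return cm

def detection_probabilities(cm):
    """Per-class P_d: fraction of each true class's frames assigned to that same class."""
    hits = np.array([cm[i, i] for i in range(cm.shape[0])])
    return hits / cm.sum(axis=1)

# Toy example: two noise types plus an extra UBM output column (index 2).
true_labels = [0, 0, 0, 0, 1, 1, 1, 1]
predicted = [0, 0, 0, 2, 1, 1, 1, 1]
cm = confusion_matrix(true_labels, predicted, num_true=2, num_pred=3)
p_d = detection_probabilities(cm)
average_accuracy = float(p_d.mean())     # ASSUMED: unweighted mean over noise types
```

In the toy data one frame of class 0 falls into the UBM column, giving $P_d$ values of 0.75 and 1.0 and an average accuracy of 0.875.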
Note that the performance differences across SNRs were negligible, implying that the proposed noise classification is robust to SNR variation. In contrast, the conventional method employing 4th-order mel-frequency cepstral coefficients (MFCCs) achieved average accuracies of 95.37% (SNR = 5 dB), 95.77% (SNR = 10 dB), and 95.75% (SNR = 15 dB), which are lower than those of our approach. This indicates that the proposed noise classification technique is well suited to the speech enhancement framework.

Table 2. Result of noise classification through a confusion matrix (SNR = 5 dB). Columns (classification result): bab, car, car2, des-eng., des-ops., fac, fac2, Hf, office, street, white, wind, UBM; rows (noise type): bab through wind. Average accuracy of noise classification: 96.73%.
Table 3. Result of noise classification through a confusion matrix (SNR = 10 dB); same layout as Table 2. Average accuracy of noise classification: 97.75%.
Table 4. Result of noise classification through a confusion matrix (SNR = 15 dB); same layout as Table 2. Average accuracy of noise classification: 96.83%.
Table 5. Result of the conventional noise classification using 4 MFCC through a confusion matrix (SNR = 5 dB); same layout as Table 2. Average accuracy of noise classification: 95.37%.
Table 6. Result of the conventional noise classification using 4 MFCC through a confusion matrix (SNR = 10 dB); same layout as Table 2. Average accuracy of noise classification: 95.77%.
Table 7. Result of the conventional noise classification using 4 MFCC through a confusion matrix (SNR = 15 dB); same layout as Table 2. Average accuracy of noise classification: 95.75%.

We also compared the computational complexity of the SEGSD and the proposed method to quantify the additional computational burden. Table 8 summarizes the complexity of each algorithm in MIPS; for clarity, the MIPS of the proposed method is split into a noise classification part and a speech enhancement part. The complexity figures are based on the TMS320C55x (TMS320C55x, 22). The results show that classifying the 13 noise classes imposes an additional computational load on the proposed method. For commercial applications, however, this load is expected to be reducible with minimal performance degradation by merging similar noise types into a single class (e.g., grouping car and car2 into a vehicle class).

Table 8. Comparison of the computational complexity (MIPS) of the SEGSD and the proposed method. Modules: acoustic noise classification (proposed: 7.84, of which feature extraction 1.14 and GMM likelihood 6.7), noise suppression routine, and total MIPS.

Next, to compare the proposed speech enhancement method with the conventional soft-decision algorithm, which uses fixed parameters, we adopted the composite measures, which evaluate speech quality objectively by combining several representative objective quality measures. Specifically, the composite measures consist of signal distortion ($C_{\mathrm{sig}}$), background noise distortion ($C_{\mathrm{bak}}$), and overall quality ($C_{\mathrm{ovl}}$), as defined in (Hu and Loizou, 2006, 2008):

$$C_{\mathrm{sig}} = 3.093 - 1.029\, S_{\mathrm{LLR}} + 0.603\, S_{\mathrm{PESQ}} - 0.009\, S_{\mathrm{WSS}} \tag{29}$$

$$C_{\mathrm{bak}} = 1.634 + 0.478\, S_{\mathrm{PESQ}} - 0.007\, S_{\mathrm{WSS}} + 0.063\, S_{\mathrm{segSNR}} \tag{30}$$

$$C_{\mathrm{ovl}} = 1.594 + 0.805\, S_{\mathrm{PESQ}} - 0.512\, S_{\mathrm{LLR}} - 0.007\, S_{\mathrm{WSS}} \tag{31}$$

where $C_{\mathrm{sig}}$, $C_{\mathrm{bak}}$, and $C_{\mathrm{ovl}}$ denote a five-point scale of signal distortion, a five-point scale of background intrusiveness, and the overall quality on the mean opinion score (MOS) scale, respectively, and $S_{\mathrm{segSNR}}$ denotes the segmental SNR score. Tables 9 and 10 present the results for signal distortion and background noise distortion, respectively. We also report results for open noise types, namely ambulance and truck noise, which were not part of the training set, and for the ideal case in which the noises are perfectly classified, which marks the performance limit attainable with acoustic noise classification. As seen in the tables, the proposed algorithm performed better than the conventional SEGSD method under every condition, consistently reducing both the residual noise and the signal distortion. The proposed algorithm also works well for the open noise set, which suggests that it does not depend on the particular training noises. Moreover, the proposed algorithm performs almost as well as with perfect classification, which indicates that our noise classification is accurate and robust. Table 11 shows the overall speech quality measured by the composite measure. The proposed method yields better performance than the previous method for all SNRs and noise types, including the open noises. This finding is consistent with the previous results and shows that our approach improves the quality of both the speech signal and the background noise.

Table 9. Signal distortion ($C_{\mathrm{sig}}$) results obtained from the proposed algorithm and the SEGSD method. Columns: SEGSD, proposed, and perfect classification at each SNR (dB); noises: babble, destroyer-engine, HF-channel, office, ambulance, and truck.
Table 10. Background noise distortion ($C_{\mathrm{bak}}$) results obtained from the proposed algorithm and the SEGSD method; same layout as Table 9.
Table 11. Overall quality ($C_{\mathrm{ovl}}$) results obtained from the proposed algorithm and the SEGSD method; same layout as Table 9, with destroyer-operation noise added.
Table 12. PESQ results obtained from the proposed algorithm and the SEGSD method; same layout as Table 9.
Table 13. MOS test results obtained from the proposed algorithm and the SEGSD method (with 95% confidence intervals). Columns: noise, SNR (dB), SEGSD, proposed, and hypothesis-test result (B or NW) with SEGSD as the reference; noises: babble, destroyer-engine, destroyer-operation, HF-channel, office, ambulance, and truck.

Fig. 6. Speech spectrograms (frequency in kHz versus time in s) for destroyer-operation noise (SNR = 5 dB): (a) clean speech; (b) noisy speech; (c) speech enhanced by the SEGSD; (d) speech enhanced by the proposed method.

In addition, we evaluated the performance in terms of the well-known objective quality measure PESQ, which is recommended by the ITU-T for speech quality assessment of narrow-band telephony (ITU-T Rec. P.862, 2001), even though the PESQ measure

13 J.-H. Choi, J.-H. Chang / Speech Communication 54 (22) is included as a basic element in the composite measure as a basic element. As shown in Table 2, the superiority of the proposed approach compared to that of the SEGSD method was illustrated. On the other hand, subjective listening tests have been performed in order to validate the objective performance evaluation. The listening tests were performed with ten listeners. Each listener scored a test file between one and five. The scale used for these tests corresponds to the MOS scale. The results are presented in Table 3, where a higher value is preferred. In addition, results of the corresponding hypothesis test against reference (SEGSD) are classified into three categorized: () better than (B), (2) not worse than, and (3) worse than (W) were given for checking the statistical significance (Chang and Kim, 2). Table 3 shows that the proposed method outperformed the conventional SEGSD method under the given noise environments and all SNRs. Also, subjective listening test supported by the statistical hypothesis test confirms that the proposed enhancement method leads to better or comparable results compared to those of the previous method even though the parameters are optimized based on the objective composite measure. Thus, it can be concluded that the proposed method improves the audible speech quality performance with the help of the acoustic noise classification. Finally, the speech spectrograms are presented in Fig. 6. Fig. 6(c) and (d) shows the spectrograms obtained with the SEGSD and the proposed algorithm, respectively. In the proposed method, the residual noise spectra are successfully reduced while preserving the speech spectra well. 5. Conclusion In this paper, we proposed a novel speech enhancement technique using environment-awareness provided by noise classification. 
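The on-line part of this environment-aware scheme, as outlined in the abstract (frame-by-frame maximum-likelihood GMM noise classification, with the GMM likelihood updated only in speech-absent frames detected by the SAP, and the three enhancement parameters looked up from the classified noise), can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the noise models are assumed to be diagonal-covariance GMMs, a frame is treated as speech-absent when its SAP exceeds a hypothetical threshold `sap_threshold`, and per-class log-likelihoods are accumulated with a hypothetical forgetting factor `forget`. The names `DiagGMM`, `EnhancementParams`, and `EnvironmentClassifier` and all numeric values are illustrative; feature extraction and the trained optimal operating points are not reproduced.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class DiagGMM:
    """Gaussian mixture with diagonal covariances (an assumed model form)."""
    weights: List[float]
    means: List[List[float]]
    variances: List[List[float]]

    def log_likelihood(self, x: Sequence[float]) -> float:
        """log p(x) = log sum_m w_m N(x; mu_m, diag(var_m)), computed with log-sum-exp."""
        terms = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
            terms.append(ll)
        peak = max(terms)
        return peak + math.log(sum(math.exp(t - peak) for t in terms))


@dataclass
class EnhancementParams:
    """The three per-noise operating points named in the paper (values here are placeholders)."""
    alpha_dd: float         # weighting parameter of the decision-directed (DD) method
    noise_smoothing: float  # long-term smoothing parameter of the noise estimation
    gain_floor: float       # control parameter of the minimum gain value


class EnvironmentClassifier:
    """Hypothetical frame-by-frame ML noise classifier, updated only in speech-absent frames."""

    def __init__(self, models: Dict[str, DiagGMM], params: Dict[str, EnhancementParams],
                 sap_threshold: float = 0.9, forget: float = 0.98):
        self.models = models
        self.params = params
        self.sap_threshold = sap_threshold  # assumed SAP decision threshold
        self.forget = forget                # assumed forgetting factor for the running scores
        self.scores = {name: 0.0 for name in models}
        self.current = next(iter(models))   # class used until the first noise-only frame

    def update(self, feature: Sequence[float], sap: float) -> EnhancementParams:
        """Update class scores if the frame is judged speech-absent; return the parameters to use."""
        if sap >= self.sap_threshold:
            for name, gmm in self.models.items():
                self.scores[name] = self.forget * self.scores[name] + gmm.log_likelihood(feature)
            # Maximum-likelihood decision over the exponentially weighted scores.
            self.current = max(self.scores, key=self.scores.get)
        return self.params[self.current]


if __name__ == "__main__":
    models = {"car": DiagGMM([1.0], [[0.0]], [[1.0]]), "white": DiagGMM([1.0], [[5.0]], [[1.0]])}
    params = {"car": EnhancementParams(0.98, 0.95, 0.10), "white": EnhancementParams(0.95, 0.90, 0.05)}
    clf = EnvironmentClassifier(models, params)
    print(clf.update([4.8], sap=0.95))  # noise-only frame: the scores favour "white"
    print(clf.update([0.1], sap=0.05))  # speech frame: no update, the decision stays "white"
```

Gating the update on the SAP keeps speech frames from pulling the decision toward whichever noise model happens to fit speech best, while the forgetting factor lets the decision follow a change of environment; both are design choices of this sketch.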
The principal contribution of this work is the determination of noise-specific optimal operating points for the principal parameters of statistical model-based speech enhancement, which improves performance. To classify the noise on a frame-by-frame basis, a GMM-based likelihood is used. Note that the GMM-based likelihood is updated only during noise frames, which are identified by the SAP of each frame within a unified framework. Extensive objective and subjective quality tests showed that the proposed approach outperforms the conventional technique.

Acknowledgements

This work was supported by the IT R&D program of MKE/KEIT [29-S-36-, Development of New Virtual Machine Specification and Technology]. This work was also supported by a National Research Foundation of Korea (NRF) grant funded by the Korean Government (MEST) (NRF-2-982), and by the research fund of Hanyang University (HY-2-22).

References

Akbacak, M., Hansen, J., 2007. Environmental sniffing: noise knowledge estimation for robust speech systems. IEEE Trans. Audio Speech Lang. Process. 15 (2).
Boll, S.F., 1979. Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoust. Speech Signal Process. 27 (2).
Cappé, O., 1994. Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor. IEEE Trans. Speech Audio Process. 2 (2).
Chang, J.-H., Kim, N.S., 2001. Speech enhancement: new approaches to soft decision. IEICE Trans. Inform. Systems E84-D (9).
Chang, J.-H., 2005. Warped discrete cosine transform-based noisy speech enhancement. IEEE Trans. Circuits Systems II 52 (9).
Cohen, I., Berdugo, B., 2002. Noise estimation by minima controlled recursive averaging for robust speech enhancement. IEEE Signal Process. Lett. 9 (1).
Ephraim, Y., Malah, D., 1984. Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. ASSP-32 (6).
Ephraim, Y., Malah, D., 1985. Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. ASSP-33 (2).
Hu, Y., Loizou, P., 2006. Evaluation of objective measures for speech enhancement. In: Proc. Interspeech.
Hu, Y., Loizou, P., 2008. Evaluation of objective quality measures for speech enhancement. IEEE Trans. Audio Speech Lang. Process. 16 (1).
ITU-T Rec. P.862, 2001. Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs.
Kim, N.S., Chang, J.-H., 2000. Spectral enhancement based on global soft decision. IEEE Signal Process. Lett. 7 (5).
Kraft, F., Malkin, R., Schaaf, T., Waibel, A., 2005. Temporal ICA for classification of acoustic events in a kitchen environment. In: Proc. Interspeech.
Krishnamurthy, N., Hansen, J., 2006. Noise update modeling for speech enhancement: when do we do enough? In: Proc. Interspeech.
Ma, L., Milner, B.P., Smith, D., 2006. Acoustic environment classification. ACM Trans. Speech Lang. Process. 3 (2).
Martin, R., 2001. Noise power spectral density estimation based on optimal smoothing and minimum statistics. IEEE Trans. Speech Audio Process. 9 (5).
McAulay, R.J., Malpass, M.L., 1980. Speech enhancement using a soft-decision noise suppression filter. IEEE Trans. Acoust. Speech Signal Process. ASSP-28 (2).
Park, Y.S., Chang, J.-H., 2007. A novel approach to a robust a priori SNR estimator in speech enhancement. IEICE Trans. Comm. E90-B (8).
Quackenbush, S., Barnwell, T., Clements, M., 1988. Objective Measures of Speech Quality. Prentice-Hall, Englewood Cliffs, NJ.
Sangwan, A., Krishnamurthy, N., Hansen, J., 2007. Environmentally aware voice activity detector. In: Proc. Interspeech.
Sim, B.L., Tong, Y.C., Chang, J.S., Tan, C.T., 1998. A parametric formulation of the generalized spectral subtraction method. IEEE Trans. Speech Audio Process. 6 (4).
Sohn, J., Sung, W., 1998. A voice activity detector employing soft decision based noise spectrum adaptation. In: Proc. IEEE Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 98), Vol. 1.
Song, J.-H., Lee, K.-H., Chang, J.-H., Kim, J.K., Kim, N.S., 2008. Analysis and improvement of speech/music classification for 3GPP2 SMV based on GMM. IEEE Signal Process. Lett. 15.
TIA/EIA/IS-127, 1996. Enhanced variable rate codec, speech service option 3 for wideband spread spectrum digital systems.
TMS320C55x, 2002. TMS320C55x DSP library programmer's reference. TI Inc., Dallas, TX, USA.
3GPP2 Spec., 2005. Software distribution for selectable mode vocoder (SMV), service option 56, specification. 3GPP2-C.R3-, v3..


Speech Enhancement Based on Non-stationary Noise-driven Geometric Spectral Subtraction and Phase Spectrum Compensation Speech Enhancement Based on Non-stationary Noise-driven Geometric Spectral Subtraction and Phase Spectrum Compensation Md Tauhidul Islam a, Udoy Saha b, K.T. Shahid b, Ahmed Bin Hussain b, Celia Shahnaz

More information

24 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 1, JANUARY /$ IEEE

24 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 1, JANUARY /$ IEEE 24 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 1, JANUARY 2009 Speech Enhancement, Gain, and Noise Spectrum Adaptation Using Approximate Bayesian Estimation Jiucang Hao, Hagai

More information

Reliable A posteriori Signal-to-Noise Ratio features selection

Reliable A posteriori Signal-to-Noise Ratio features selection Reliable A eriori Signal-to-Noise Ratio features selection Cyril Plapous, Claude Marro, Pascal Scalart To cite this version: Cyril Plapous, Claude Marro, Pascal Scalart. Reliable A eriori Signal-to-Noise

More information

Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics

Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics 504 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 9, NO. 5, JULY 2001 Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics Rainer Martin, Senior Member, IEEE

More information

Long Range Acoustic Classification

Long Range Acoustic Classification Approved for public release; distribution is unlimited. Long Range Acoustic Classification Authors: Ned B. Thammakhoune, Stephen W. Lang Sanders a Lockheed Martin Company P. O. Box 868 Nashua, New Hampshire

More information

Speech Enhancement Based On Noise Reduction

Speech Enhancement Based On Noise Reduction Speech Enhancement Based On Noise Reduction Kundan Kumar Singh Electrical Engineering Department University Of Rochester ksingh11@z.rochester.edu ABSTRACT This paper addresses the problem of signal distortion

More information

Available online at ScienceDirect. Procedia Computer Science 54 (2015 )

Available online at   ScienceDirect. Procedia Computer Science 54 (2015 ) Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 54 (2015 ) 574 584 Eleventh International Multi-Conference on Information Processing-2015 (IMCIP-2015) Speech Enhancement

More information

Recent Advances in Acoustic Signal Extraction and Dereverberation

Recent Advances in Acoustic Signal Extraction and Dereverberation Recent Advances in Acoustic Signal Extraction and Dereverberation Emanuël Habets Erlangen Colloquium 2016 Scenario Spatial Filtering Estimated Desired Signal Undesired sound components: Sensor noise Competing

More information

COM 12 C 288 E October 2011 English only Original: English

COM 12 C 288 E October 2011 English only Original: English Question(s): 9/12 Source: Title: INTERNATIONAL TELECOMMUNICATION UNION TELECOMMUNICATION STANDARDIZATION SECTOR STUDY PERIOD 2009-2012 Audience STUDY GROUP 12 CONTRIBUTION 288 P.ONRA Contribution Additional

More information

Robust Low-Resource Sound Localization in Correlated Noise

Robust Low-Resource Sound Localization in Correlated Noise INTERSPEECH 2014 Robust Low-Resource Sound Localization in Correlated Noise Lorin Netsch, Jacek Stachurski Texas Instruments, Inc. netsch@ti.com, jacek@ti.com Abstract In this paper we address the problem

More information

Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise

Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise Noha KORANY 1 Alexandria University, Egypt ABSTRACT The paper applies spectral analysis to

More information

ScienceDirect. Unsupervised Speech Segregation Using Pitch Information and Time Frequency Masking

ScienceDirect. Unsupervised Speech Segregation Using Pitch Information and Time Frequency Masking Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 46 (2015 ) 122 126 International Conference on Information and Communication Technologies (ICICT 2014) Unsupervised Speech

More information

Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech

Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech INTERSPEECH 5 Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech M. A. Tuğtekin Turan and Engin Erzin Multimedia, Vision and Graphics Laboratory,

More information

Evaluation of clipping-noise suppression of stationary-noisy speech based on spectral compensation

Evaluation of clipping-noise suppression of stationary-noisy speech based on spectral compensation Evaluation of clipping-noise suppression of stationary-noisy speech based on spectral compensation Takahiro FUKUMORI ; Makoto HAYAKAWA ; Masato NAKAYAMA 2 ; Takanobu NISHIURA 2 ; Yoichi YAMASHITA 2 Graduate

More information

THE TELECOMMUNICATIONS industry is going

THE TELECOMMUNICATIONS industry is going IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 6, NOVEMBER 2006 1935 Single-Ended Speech Quality Measurement Using Machine Learning Methods Tiago H. Falk, Student Member, IEEE,

More information

OPTIMAL SPECTRAL SMOOTHING IN SHORT-TIME SPECTRAL ATTENUATION (STSA) ALGORITHMS: RESULTS OF OBJECTIVE MEASURES AND LISTENING TESTS

OPTIMAL SPECTRAL SMOOTHING IN SHORT-TIME SPECTRAL ATTENUATION (STSA) ALGORITHMS: RESULTS OF OBJECTIVE MEASURES AND LISTENING TESTS 17th European Signal Processing Conference (EUSIPCO 9) Glasgow, Scotland, August -, 9 OPTIMAL SPECTRAL SMOOTHING IN SHORT-TIME SPECTRAL ATTENUATION (STSA) ALGORITHMS: RESULTS OF OBJECTIVE MEASURES AND

More information

Blind Dereverberation of Single-Channel Speech Signals Using an ICA-Based Generative Model

Blind Dereverberation of Single-Channel Speech Signals Using an ICA-Based Generative Model Blind Dereverberation of Single-Channel Speech Signals Using an ICA-Based Generative Model Jong-Hwan Lee 1, Sang-Hoon Oh 2, and Soo-Young Lee 3 1 Brain Science Research Center and Department of Electrial

More information

Speech Enhancement in Noisy Environment using Kalman Filter

Speech Enhancement in Noisy Environment using Kalman Filter Speech Enhancement in Noisy Environment using Kalman Filter Erukonda Sravya 1, Rakesh Ranjan 2, Nitish J. Wadne 3 1, 2 Assistant professor, Dept. of ECE, CMR Engineering College, Hyderabad (India) 3 PG

More information

Speech Synthesis using Mel-Cepstral Coefficient Feature

Speech Synthesis using Mel-Cepstral Coefficient Feature Speech Synthesis using Mel-Cepstral Coefficient Feature By Lu Wang Senior Thesis in Electrical Engineering University of Illinois at Urbana-Champaign Advisor: Professor Mark Hasegawa-Johnson May 2018 Abstract

More information

Available online at ScienceDirect. Procedia Computer Science 89 (2016 )

Available online at   ScienceDirect. Procedia Computer Science 89 (2016 ) Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 89 (2016 ) 666 676 Twelfth International Multi-Conference on Information Processing-2016 (IMCIP-2016) Comparison of Speech

More information

Comparison of Spectral Analysis Methods for Automatic Speech Recognition

Comparison of Spectral Analysis Methods for Automatic Speech Recognition INTERSPEECH 2013 Comparison of Spectral Analysis Methods for Automatic Speech Recognition Venkata Neelima Parinam, Chandra Vootkuri, Stephen A. Zahorian Department of Electrical and Computer Engineering

More information

SOUND SOURCE RECOGNITION AND MODELING

SOUND SOURCE RECOGNITION AND MODELING SOUND SOURCE RECOGNITION AND MODELING CASA seminar, summer 2000 Antti Eronen antti.eronen@tut.fi Contents: Basics of human sound source recognition Timbre Voice recognition Recognition of environmental

More information

ROBUST echo cancellation requires a method for adjusting

ROBUST echo cancellation requires a method for adjusting 1030 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 On Adjusting the Learning Rate in Frequency Domain Echo Cancellation With Double-Talk Jean-Marc Valin, Member,

More information

NOISE POWER SPECTRAL DENSITY MATRIX ESTIMATION BASED ON MODIFIED IMCRA. Qipeng Gong, Benoit Champagne and Peter Kabal

NOISE POWER SPECTRAL DENSITY MATRIX ESTIMATION BASED ON MODIFIED IMCRA. Qipeng Gong, Benoit Champagne and Peter Kabal NOISE POWER SPECTRAL DENSITY MATRIX ESTIMATION BASED ON MODIFIED IMCRA Qipeng Gong, Benoit Champagne and Peter Kabal Department of Electrical & Computer Engineering, McGill University 3480 University St.,

More information

Non-intrusive intelligibility prediction for Mandarin speech in noise. Creative Commons: Attribution 3.0 Hong Kong License

Non-intrusive intelligibility prediction for Mandarin speech in noise. Creative Commons: Attribution 3.0 Hong Kong License Title Non-intrusive intelligibility prediction for Mandarin speech in noise Author(s) Chen, F; Guan, T Citation The 213 IEEE Region 1 Conference (TENCON 213), Xi'an, China, 22-25 October 213. In Conference

More information

Voice Activity Detection

Voice Activity Detection Voice Activity Detection Speech Processing Tom Bäckström Aalto University October 2015 Introduction Voice activity detection (VAD) (or speech activity detection, or speech detection) refers to a class

More information

MULTICHANNEL systems are often used for

MULTICHANNEL systems are often used for IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 52, NO. 5, MAY 2004 1149 Multichannel Post-Filtering in Nonstationary Noise Environments Israel Cohen, Senior Member, IEEE Abstract In this paper, we present

More information

Adaptive Noise Reduction of Speech. Signals. Wenqing Jiang and Henrique Malvar. July Technical Report MSR-TR Microsoft Research

Adaptive Noise Reduction of Speech. Signals. Wenqing Jiang and Henrique Malvar. July Technical Report MSR-TR Microsoft Research Adaptive Noise Reduction of Speech Signals Wenqing Jiang and Henrique Malvar July 2000 Technical Report MSR-TR-2000-86 Microsoft Research Microsoft Corporation One Microsoft Way Redmond, WA 98052 http://www.research.microsoft.com

More information

Modulation Domain Spectral Subtraction for Speech Enhancement

Modulation Domain Spectral Subtraction for Speech Enhancement Modulation Domain Spectral Subtraction for Speech Enhancement Author Paliwal, Kuldip, Schwerin, Belinda, Wojcicki, Kamil Published 9 Conference Title Proceedings of Interspeech 9 Copyright Statement 9

More information

Communications Theory and Engineering

Communications Theory and Engineering Communications Theory and Engineering Master's Degree in Electronic Engineering Sapienza University of Rome A.A. 2018-2019 Speech and telephone speech Based on a voice production model Parametric representation

More information

PERFORMANCE ANALYSIS OF SPEECH SIGNAL ENHANCEMENT TECHNIQUES FOR NOISY TAMIL SPEECH RECOGNITION

PERFORMANCE ANALYSIS OF SPEECH SIGNAL ENHANCEMENT TECHNIQUES FOR NOISY TAMIL SPEECH RECOGNITION Journal of Engineering Science and Technology Vol. 12, No. 4 (2017) 972-986 School of Engineering, Taylor s University PERFORMANCE ANALYSIS OF SPEECH SIGNAL ENHANCEMENT TECHNIQUES FOR NOISY TAMIL SPEECH

More information

Enhancement of Speech Signal by Adaptation of Scales and Thresholds of Bionic Wavelet Transform Coefficients

Enhancement of Speech Signal by Adaptation of Scales and Thresholds of Bionic Wavelet Transform Coefficients ISSN (Print) : 232 3765 An ISO 3297: 27 Certified Organization Vol. 3, Special Issue 3, April 214 Paiyanoor-63 14, Tamil Nadu, India Enhancement of Speech Signal by Adaptation of Scales and Thresholds

More information

Narrow-Band Interference Rejection in DS/CDMA Systems Using Adaptive (QRD-LSL)-Based Nonlinear ACM Interpolators

Narrow-Band Interference Rejection in DS/CDMA Systems Using Adaptive (QRD-LSL)-Based Nonlinear ACM Interpolators 374 IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 52, NO. 2, MARCH 2003 Narrow-Band Interference Rejection in DS/CDMA Systems Using Adaptive (QRD-LSL)-Based Nonlinear ACM Interpolators Jenq-Tay Yuan

More information

KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM

KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM Shruthi S Prabhu 1, Nayana C G 2, Ashwini B N 3, Dr. Parameshachari B D 4 Assistant Professor, Department of Telecommunication Engineering, GSSSIETW,

More information

Different Approaches of Spectral Subtraction method for Enhancing the Speech Signal in Noisy Environments

Different Approaches of Spectral Subtraction method for Enhancing the Speech Signal in Noisy Environments International Journal of Scientific & Engineering Research, Volume 2, Issue 5, May-2011 1 Different Approaches of Spectral Subtraction method for Enhancing the Speech Signal in Noisy Environments Anuradha

More information

Performance analysis of voice activity detection algorithm for robust speech recognition system under different noisy environment

Performance analysis of voice activity detection algorithm for robust speech recognition system under different noisy environment BABU et al: VOICE ACTIVITY DETECTION ALGORITHM FOR ROBUST SPEECH RECOGNITION SYSTEM Journal of Scientific & Industrial Research Vol. 69, July 2010, pp. 515-522 515 Performance analysis of voice activity

More information