On using acoustic environment classification for statistical model-based speech enhancement
Speech Communication 54 (2012)

Jae-Hun Choi, Joon-Hyuk Chang
School of Electrical Engineering, Hanyang University, Seoul 133-791, Republic of Korea

Received 4 April 2011; received in revised form 7 October 2011; accepted 3 October 2011; available online November 2011

Abstract

In this paper, we present a statistical model-based speech enhancement technique using acoustic environment classification supported by a Gaussian mixture model (GMM). In the data training stage, the principal parameters of the statistical model-based speech enhancement algorithm, namely the weighting parameter of the decision-directed (DD) method, the long-term smoothing parameter of the noise estimation, and the control parameter of the minimum gain value, are set to optimal operating points according to the given noise information to ensure the best performance for each noise. These optimal operating points, which are specific to the different background noises, are estimated using composite measures, objective quality measures that show the highest correlation with the actual quality of speech processed by noise suppression algorithms. In the on-line environment-aware speech enhancement step, noise classification is performed on a frame-by-frame basis using maximum likelihood (ML) classification with the GMM. The speech absence probability (SAP) is used to detect speech absence periods and to update the GMM likelihoods. According to the noise type classified for each frame, we assign the optimal values to the three aforementioned parameters for speech enhancement. We evaluated the performance of the proposed method using objective speech quality measures and subjective listening tests under various noise environments.
Our experimental results showed that the proposed method yields better performance than a conventional algorithm with fixed parameters. © 2011 Elsevier B.V. All rights reserved.

Keywords: Speech enhancement; Noise classification; Gaussian mixture model; DFT

Corresponding author. E-mail address: jchang@hanyang.ac.kr (J.-H. Chang).

1. Introduction

Speech enhancement is a fundamental part of speech processing because environmental background noise drastically degrades the performance of processing systems (Boll, 1979; Sim et al., 1998; McAulay and Malpass, 1980; Ephraim and Malah, 1984, 1985). Among the many approaches developed to enhance speech, spectral subtraction has been shown to be effective in suppressing stationary noise (Boll, 1979). However, this technique has limited ability to deal with nonstationary noises and tends to produce musical noise, an artifact characterized by randomly occurring tonal components. To avoid such artifacts in a practical speech enhancement system, two major components must be considered: noise power estimation and estimation of the uncorrupted speech (Sim et al., 1998; McAulay and Malpass, 1980; Ephraim and Malah, 1984, 1985; Cappé, 1994; Park and Chang, 2007; Martin, 2001; Sohn and Sung, 1998; Cohen and Berdugo, 2002). For the estimation of speech, Ephraim and Malah derived the minimum mean-square error (MMSE) estimator, which is very efficient at reducing the musical noise phenomenon (Ephraim and Malah, 1984, 1985; Cappé, 1994). Other spectral weighting rules such as Wiener filtering, maximum a posteriori, and MMSE log-spectral amplitude criteria have also been considered (Sim et al., 1998; McAulay and Malpass, 1980; Ephraim and Malah, 1984, 1985). These algorithms are further enhanced through the use of a soft decision scheme in which the speech absence probability (SAP) is
derived based on the likelihood ratio test (LRT) and used for gain modification. The spectral gain is modified by the SAP, which is estimated for each frequency bin in each frame. The SAP based on the statistical model of speech is generally computed with the help of an a priori SNR, which is estimated using a non-linear recursive procedure called the decision-directed (DD) approach (Ephraim and Malah, 1984). The a priori SNR determined by the DD rule combines the current short-time frame, with a fixed weight (1 − α), and the processing output of the previous frame, with weight α. Note that the parameter α should be set carefully, since it largely controls the trade-off between the degree of smoothing of the a priori SNR in noisy regions and the acceptable level of transient distortion in the signal. In contrast to the conventional DD estimator with its fixed weight factor, an adaptive weight factor determined by the deviation of the a posteriori SNR was proposed in (Park and Chang, 2007). Unfortunately, this estimator interacts with the estimated SNR and does not consider a wide variety of noise conditions. Regarding noise power estimation, minimum statistics (MS) obtains the noise estimate from the minima of a smoothed power estimate of the noisy signal (Martin, 2001). The MS method is motivated by the observation that the power of a noisy speech signal frequently decays to the power level of the noise. It is known to be sensitive to outliers, is generally biased, and has a variance about twice as large as that of a conventional noise estimator (Cohen and Berdugo, 2002). On the other hand, the aforementioned soft decision has been applied to the noise power estimation module by incorporating the SAP into the long-term smoothed power spectrum of the background noise (Sohn and Sung, 1998; Kim and Chang, 2000; Chang and Kim, 2001).
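As a rough illustration of the minimum-statistics idea just described, the following sketch tracks the minimum of a recursively smoothed periodogram over a short history. This is a deliberate simplification: Martin's method additionally uses time-varying optimal smoothing and bias compensation, which are omitted here, and `win` and `alpha` are assumed values, not from the paper.

```python
import numpy as np

def ms_noise_floor(noisy_power, win=8, alpha=0.9):
    """Toy minimum-statistics noise floor.
    noisy_power: array (frames, bins) of noisy-signal power spectra."""
    smoothed = np.empty_like(noisy_power)
    acc = noisy_power[0].copy()
    for t in range(len(noisy_power)):
        acc = alpha * acc + (1.0 - alpha) * noisy_power[t]  # recursive smoothing
        smoothed[t] = acc
    # noise estimate: minimum of the smoothed power over the last `win` frames
    return np.stack([smoothed[max(0, t - win + 1): t + 1].min(axis=0)
                     for t in range(len(smoothed))])
```

Because the minimum is taken over a window, a short burst of speech power raises the smoothed trajectory but not the tracked floor, which is exactly the observation motivating the MS method.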
In (Cohen and Berdugo, 2002), the noise power estimate is updated during periods of speech absence as well as speech presence. Considering both noise power estimation and speech estimation, most speech enhancement algorithms come packaged with tunable parameters that substantially affect their performance. For example, the weight parameter of the DD approach could be tuned using off-line knowledge of the acoustic background noise. Indeed, the environmental sniffing framework proposed by Akbacak and Hansen improves speech recognition in a car environment (Akbacak and Hansen, 2007). Krishnamurthy and Hansen further improved speech enhancement performance by providing a more accurate estimate of the noise update rate for a given environment (Krishnamurthy and Hansen, 2006). The environmentally-aware voice activity detector used in the method of Sangwan et al. (2007) builds an accurate noise model by employing a support vector machine (SVM). Regarding the classification of acoustic environments, numerous studies have been conducted for context-aware applications (Ma et al., 2006; Kraft et al., 2005). In this paper, we propose a novel speech enhancement approach using acoustic noise classification. The target platform is statistical model-based speech enhancement in which the SAP is derived from the LRT, with the DD method used to estimate the a priori SNR, and is used to modify the spectral gain and to update the noise power. First, we identify the optimal operating points of the principal parameters, namely the weight parameter of the DD approach, the long-term smoothing parameter, and the control parameter of the MMSE gain function, for a wide variety of noise environments. This is achieved with the help of the composite measure, which is known to be relevant for estimating actual speech quality (Hu and Loizou, 2006, 2008).
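The composite measure mentioned here is a fixed linear combination of basic objective scores; the coefficients below are those reported by Hu and Loizou (2008). A minimal sketch (the PESQ, LLR, WSS, and segmental-SNR scores themselves must come from their respective estimators):

```python
def composite_measures(s_pesq, s_llr, s_wss, s_segsnr):
    """Composite objective measures with the coefficients reported by
    Hu and Loizou (2008); inputs are the raw PESQ/LLR/WSS/segSNR scores."""
    c_sig = 3.093 - 1.029 * s_llr + 0.603 * s_pesq - 0.009 * s_wss
    c_bak = 1.634 + 0.478 * s_pesq - 0.007 * s_wss + 0.063 * s_segsnr
    c_ovl = 1.594 + 0.805 * s_pesq - 0.512 * s_llr - 0.007 * s_wss
    return c_sig, c_bak, c_ovl
```

Each output lies on a five-point MOS-like scale, so the same three scores yield the signal-distortion, background-intrusiveness, and overall-quality ratings used later in the paper.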
Secondly, we perform noise classification on a frame-by-frame basis to recognize the noise type of the current frame. A Gaussian mixture model (GMM)-based maximum likelihood (ML) classifier is used, applied only during speech absence as detected by the SAP. The feature vectors fed to the GMM are carefully selected from the relevant parameters of the 3GPP2 selectable mode vocoder (SMV), as in (Song et al., 2008; 3GPP2-C.R3-, v3.0, 2005). Subsequently, we use this per-frame noise knowledge to assign to the three parameters the optimal values that give the best performance for the specific type of underlying additive noise. Our approach responds quickly to noise variation since a running average is used to track evolving noise. In a number of experiments, the proposed speech enhancement technique is found to yield better performance than the conventional approach with fixed parameters. The rest of the paper is organized as follows. Section 2 briefly reviews the soft decision-based speech enhancement technique and Section 3 presents the proposed algorithm. Section 4 describes the experimental setup and results in detail, and conclusions are presented in Section 5.

2. Review of soft decision-based speech enhancement

Let x(n) and d(n) denote the clean speech and uncorrelated additive noise signals, respectively. The observed noisy speech signal y(n) is the sum of the clean speech signal x(n) and the noise d(n), where n is a discrete-time index. Taking a discrete Fourier transform (DFT), we then have

Y_k(t) = X_k(t) + D_k(t),   (1)

where k (= 1, 2, ..., K) is the frequency bin and t is the frame index.
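Because the DFT is linear, the additive model of (1) holds per frame and per bin whenever the speech and the noise are framed and windowed identically. A small analysis sketch (the frame length, hop, and window are illustrative choices, not values from the paper):

```python
import numpy as np

def stft_frames(y, frame_len=256, hop=128):
    """Windowed DFT analysis of a signal: returns Y[t, k] for frame t, bin k.
    frame_len and hop are assumed values for illustration."""
    win = np.hanning(frame_len)
    n_frames = 1 + (len(y) - frame_len) // hop
    return np.stack([np.fft.rfft(win * y[t * hop: t * hop + frame_len])
                     for t in range(n_frames)])
```

Linearity can be checked numerically: the frames of x + d equal the frames of x plus the frames of d, bin by bin.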
Given the two hypotheses H_0 and H_1, which indicate speech absence and presence, respectively, we assume that

H_0 (speech absent): Y_k(t) = D_k(t),
H_1 (speech present): Y_k(t) = X_k(t) + D_k(t).   (2)

Assuming that the clean speech X_k(t) and the additive noise D_k(t) are statistically independent and that the noisy spectral components follow zero-mean complex Gaussian distributions, the probability density functions (PDFs) conditioned on the two hypotheses H_0 and H_1 are given by
p(Y_k(t) \mid H_0) = \frac{1}{\pi \lambda_{d,k}(t)} \exp\left\{ -\frac{|Y_k(t)|^2}{\lambda_{d,k}(t)} \right\},   (3)

p(Y_k(t) \mid H_1) = \frac{1}{\pi (\lambda_{x,k}(t) + \lambda_{d,k}(t))} \exp\left\{ -\frac{|Y_k(t)|^2}{\lambda_{x,k}(t) + \lambda_{d,k}(t)} \right\},   (4)

where λ_{x,k}(t) and λ_{d,k}(t) denote the variances of the clean speech and the noise for the kth spectral component of the tth frame, respectively (Kim and Chang, 2000). For the soft decision, the global SAP (GSAP) p(H_0 | Y(t)) conditioned on the current observations is derived as

p(H_0 \mid Y(t)) = \frac{p(Y(t) \mid H_0) P(H_0)}{p(Y(t) \mid H_0) P(H_0) + p(Y(t) \mid H_1) P(H_1)} = \frac{1}{1 + \frac{P(H_1)}{P(H_0)} \prod_{k=1}^{K} \Lambda(Y_k(t))},   (5)

where P(H_0) (= 1 − P(H_1)) is the a priori probability of speech absence. Substituting (3) and (4) into (5), the likelihood ratio Λ(Y_k(t)) at the kth frequency is expressed as follows (Kim and Chang, 2000):

\Lambda(Y_k(t)) = \frac{p(Y_k(t) \mid H_1)}{p(Y_k(t) \mid H_0)} = \frac{1}{1 + \xi_k(t)} \exp\left\{ \frac{\gamma_k(t) \xi_k(t)}{1 + \xi_k(t)} \right\},   (6)

where the a posteriori SNR γ_k(t) and the a priori SNR ξ_k(t) are defined in (7) and (8) below. For the noise statistics, λ̂_{d,k}(t) denotes the estimate of λ_{d,k}(t), obtained by the long-term smoothing of (10) below, in which ζ_d (= 0.99) is a smoothing parameter under a general stationarity assumption on D_k(t) (Kim and Chang, 2000). Taking into account the uncertainty of speech absence or presence, the GSAP is applied to the expectation of the noise power spectrum as shown below:

E[|D_k(t)|^2 \mid Y_k(t)] = E[|D_k(t)|^2 \mid Y_k(t), H_0] \, p(H_0 \mid Y(t)) + E[|D_k(t)|^2 \mid Y_k(t), H_1] \, p(H_1 \mid Y(t)).   (11)

Let X̂_k(t) represent the estimated clean speech spectrum at the kth frequency bin of the tth frame. In general, speech enhancement techniques estimate X̂_k(t) by applying a spectral gain to each spectral component of the input noisy spectrum.
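The DD a priori SNR estimate of (9) and the smoothed noise-power update of (10), both used below, are simple recursions; a minimal sketch (variable names are mine, not the paper's):

```python
import numpy as np

def dd_a_priori_snr(prev_clean_amp2, prev_noise_var, gamma, alpha_n=0.99):
    """Decision-directed a priori SNR, eq. (9): weighted mix of the previous
    frame's SNR estimate and the half-wave rectified (gamma - 1)."""
    return (alpha_n * prev_clean_amp2 / prev_noise_var
            + (1.0 - alpha_n) * np.maximum(gamma - 1.0, 0.0))

def smooth_noise_power(noise_var, expected_noise_power, zeta_d=0.99):
    """Long-term smoothed noise power, eq. (10)."""
    return zeta_d * noise_var + (1.0 - zeta_d) * expected_noise_power
```

With alpha_n close to 1, the estimate leans heavily on the previous frame, which is precisely the smoothing-versus-transient-distortion trade-off discussed in the introduction.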
For effective reduction of the musical noise phenomenon, we adopt the MMSE-based noise suppression rule proposed by Ephraim and Malah (1984):

\hat{X}_k(t) = \max\{ G(\xi_k(t), \gamma_k(t)), \, G_{min} \} \, Y_k(t),   (12)

where G_min is the minimum gain that controls the perceived noise and G(·,·) denotes the actual noise suppression gain, given by

G(\xi, \gamma) = \frac{\sqrt{\pi}}{2} \sqrt{\frac{\xi}{\gamma (1 + \xi)}} \, F\!\left[ \frac{\gamma \xi}{1 + \xi} \right],   (13)

with

F[m] = \exp\left(-\frac{m}{2}\right) \left[ (1 + m) \, I_0\!\left(\frac{m}{2}\right) + m \, I_1\!\left(\frac{m}{2}\right) \right],   (14)

in which I_0 and I_1 are the modified Bessel functions of order zero and one, respectively. The a posteriori signal-to-noise ratio (SNR) γ_k(t) and the a priori SNR ξ_k(t) are defined by

\gamma_k(t) \triangleq \frac{|Y_k(t)|^2}{\lambda_{d,k}(t)},   (7)

\xi_k(t) \triangleq \frac{\lambda_{x,k}(t)}{\lambda_{d,k}(t)}.   (8)

If ξ̂_k(t) and γ̂_k(t) are the estimates of ξ_k(t) and γ_k(t), respectively, ξ̂_k(t) can be obtained using the well-known decision-directed (DD) approach (Ephraim and Malah, 1984):

\hat{\xi}_k(t) = \alpha_n \frac{|\hat{X}_k(t-1)|^2}{\hat{\lambda}_{d,k}(t-1)} + (1 - \alpha_n) \, C[\hat{\gamma}_k(t) - 1],   (9)

where X̂_k(t − 1) is the estimated clean speech spectrum of the previous frame and C[x] = x if x ≥ 0, C[x] = 0 otherwise. Here, α_n (0 ≤ α_n ≤ 1) is a weighting factor that controls the trade-off between noise reduction and transient signal distortion, and is empirically chosen close to 1 (i.e., α_n = 0.99). Also, γ̂_k(t) is obtained directly as the ratio of the input power |Y_k(t)|^2 to the estimate of λ_{d,k}(t). On the other hand, the estimation of the noise power spectrum is a major component of speech enhancement. In particular, the soft decision method adopts a long-term smoothed noise power spectrum of the background noise as the estimate of λ_{d,k}(t) (Kim and Chang, 2000):

\hat{\lambda}_{d,k}(t+1) = \zeta_d \hat{\lambda}_{d,k}(t) + (1 - \zeta_d) \, E[|D_k(t)|^2 \mid Y_k(t)].   (10)

Notice that G_min should be set carefully, since it controls the trade-off between the residual noise level and the musical-noise effect.
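The gain of (13)-(14) can be evaluated numerically. A sketch that stays NumPy-only by computing exponentially scaled Bessel values through their integral representation (the quadrature helper is my implementation choice, not part of the paper); the exp(−ν/2) factor of F[ν] is absorbed into the scaled Bessel values, which avoids overflow at high SNR:

```python
import numpy as np

def ive(n, x):
    """exp(-x) * I_n(x): exponentially scaled modified Bessel function,
    computed by trapezoidal quadrature of its integral representation."""
    th, dth = np.linspace(0.0, np.pi, 4001, retstep=True)
    f = np.exp(x * (np.cos(th) - 1.0)) * np.cos(n * th)
    return (f.sum() - 0.5 * (f[0] + f[-1])) * dth / np.pi

def mmse_stsa_gain(xi, gamma):
    """Ephraim-Malah MMSE-STSA gain, eqs. (13)-(14)."""
    nu = gamma * xi / (1.0 + xi)
    f = (1.0 + nu) * ive(0, nu / 2.0) + nu * ive(1, nu / 2.0)
    return (np.sqrt(np.pi) / 2.0) * (np.sqrt(nu) / gamma) * f
```

At high SNR the gain approaches the Wiener gain ξ/(1 + ξ), a standard sanity check for implementations of this rule.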
Similarly to the value used in (TIA/EIA/IS-127, 1996), 0.1248 is used as a fixed value in (Kim and Chang, 2000).

3. Proposed environment-aware speech enhancement

In the previous section, we noted that the principal parameters of the soft decision-based speech enhancement technique of (Kim and Chang, 2000), namely the weight α_n of the DD approach, the long-term smoothing parameter ζ_d of the noise power estimation, and the minimum gain parameter G_min, are fixed values. However, since these parameters should vary with the noise type to ensure the best performance, we organize environmental knowledge about the noise so as to select the parameters adaptively during speech enhancement. The overall environment-aware speech enhancement scheme based on GMM noise classification is shown in Fig. 1. The following subsections describe each part of the proposed algorithm in more detail.

3.1. Finding optimal operating points for given noises

The operating points of α_n, ζ_d, and G_min for specific noises should be determined based on a relevant criterion
Fig. 1. Overall block diagram of the proposed environment-aware speech enhancement.

in terms of speech quality. The most accurate evaluation of speech quality is an exhaustive subjective listening test. However, since such tests are costly and time-consuming, we adopt the well-known composite measure of (Hu and Loizou, 2006, 2008) to quantify overall speech quality as the parameters are varied. The composite measure for overall quality, C_ovl, combines basic objective measures into a new measure as follows:

C_{ovl} = 1.594 + 0.805 \, S_{PESQ} - 0.512 \, S_{LLR} - 0.007 \, S_{WSS},   (15)

where S_PESQ, S_LLR, and S_WSS denote the scores of the perceptual evaluation of speech quality (PESQ), the log-likelihood ratio (LLR), and the weighted-slope spectral distance (WSS), respectively (ITU-T Rec. P.862, 2001; Quackenbush et al., 1988). It is known from (Hu and Loizou, 2008) that the composite measure correlates significantly with overall perceptual speech quality ratings such as the mean opinion score (MOS). We then prepared 6 speech samples taken from the NTT database, consisting of speech material from four male and four female speakers, each 8 s in duration. To create noisy environments, we applied twelve different noise types (babble, car1, car2, destroyer-engine, destroyer-operation, factory1, factory2, HF-channel, office, street, white, and wind noise) to the clean speech data at SNR levels of 5, 10, and 15 dB. The speech enhancement technique of (Kim and Chang, 2000) was applied to these noisy speech sentences while the parameters were varied. Based on the enhanced speech signal, we first investigated the behavior of C_ovl as α_n and ζ_d were varied, plotted graphically for clear understanding. For each noise type, we obtained a 3D mesh curve over the various values of α_n and ζ_d, as shown in Fig. 2. Based on the data in Fig.
2, we discovered that the four points indicated by the arrows represent the optimal points in terms of C_ovl for babble, factory1, HF-channel, and office noise, respectively. By repeating this procedure and incorporating G_min as an additional parameter to be optimized, we obtained the unique points (α_n*, ζ_d*, G_min*) for the given noise types, as shown in Table 1. As the table shows, different parameters are chosen for different noises at the optimal operating points. Note that the variation of these points with the input SNR is small. This observation tells us that the points can be applied without an additional SNR estimation step.

3.2. On-line acoustic noise classification employing a Gaussian mixture model

As described in the previous subsection, the optimal operating points of the principal parameters for the various noise types are obtained off-line. For real-time determination of the optimal point under time-varying noise conditions, we must classify the noise signal on a frame-by-frame basis during speech pauses. To achieve successful classification, a feature vector that effectively discriminates among the various noise environments must be chosen. As in (3GPP2-C.R3-, v3.0, 2005), we select a 14-dimensional feature vector comprising ten linear predictive coding (LPC) coefficients, the frame energy, the partial residual energy, the running mean of the energy, and the running mean of the partial residual energy, chosen for their superior classification performance. Fig. 3 presents the normalized distributions of the selected features for each noise, demonstrating that their multi-modal characteristics can be successfully modeled by a GMM. For the GMM with feature vectors \vec{x} = \{x_1, x_2, \ldots, x_N\}, the Gaussian mixture density, a weighted sum of M mixture components, is written as follows:
p(\vec{x} \mid \lambda) = \sum_{i=1}^{M} a_i \, p_i(\vec{x}),   (16)

where p_i(\vec{x}) and a_i denote, respectively, the Gaussian distribution and the weight of the ith mixture component, defined by

p_i(\vec{x}) = \frac{1}{(2\pi)^{N/2} |\Sigma_i|^{1/2}} \exp\left\{ -\frac{1}{2} (\vec{x} - \mu_i)^T \Sigma_i^{-1} (\vec{x} - \mu_i) \right\},   (17)

\sum_{i=1}^{M} a_i = 1.   (18)

Each noise is thus modeled by a GMM parameter set λ comprising the mixture weights a_i, the mean vectors μ_i, and the covariance matrices Σ_i. In the noise classification, each noise is characterized by its GMM, i.e., λ_s where s = 1 (babble), 2 (car1), 3 (car2), 4 (destroyer-engine), 5 (destroyer-operation), 6 (factory1), 7 (factory2), 8 (HF-channel), 9 (office), 10 (street), 11 (white), 12 (wind), or 13 (universal background model). We used 6 mixture components, based on the trade-off between performance and the additional computational load; the dependence on the mixture order was marginal for M ≥ 6. Based on the established models, the objective is to identify the noise model with the maximum a posteriori probability for the input feature vector \vec{x}(t). Specifically, assuming equally likely noises, we determine the noise model ŝ(t) with the maximum a posteriori probability for the current frame:

\hat{s}(t) = \arg\max_{s = 1, 2, \ldots, 13} \log \hat{p}(\lambda_s \mid \vec{x}(t)).   (19)

As shown in the flow diagram of the GMM-based noise classification in Fig. 4, the likelihoods of the GMM for the individual noises are constructed during the initial ten frames.

Fig. 2. 3D mesh curves for the estimated optimal operating points: (a) babble noise (SNR = 5 dB); (b) factory1 noise (SNR = 10 dB); (c) HF-channel noise (SNR = 5 dB); (d) office noise (SNR = 10 dB).

Table 1 gives the optimal operating points α_n*, ζ_d*, and G_min* for the various noise types.
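The GMM-based classification step above reduces to an argmax over per-model log-likelihoods; a minimal sketch with diagonal covariances (a common simplification, assumed here rather than stated in the paper):

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """log p(x | lambda) for a diagonal-covariance GMM, eqs. (16)-(18).
    weights: (M,); means, variances: (M, D); x: (D,)."""
    log_comp = (np.log(weights)
                - 0.5 * np.sum(np.log(2.0 * np.pi * variances)
                               + (x - means) ** 2 / variances, axis=1))
    return np.logaddexp.reduce(log_comp)  # log sum_i a_i p_i(x)

def classify_noise(x, models):
    """Eq. (19): pick the noise model with the highest likelihood
    (with equal priors, ML equals maximum a posteriori)."""
    return int(np.argmax([gmm_log_likelihood(x, *m) for m in models]))
```

Working in the log domain with `logaddexp` keeps the mixture sum numerically stable even when individual component densities underflow.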
Once the GMM likelihood for each noise is available, the likelihoods are updated frame-by-frame during noise-only periods, which is a major contribution of this work. For this, we use a long-term smoothed likelihood incorporating the SAP to prevent likelihood updates during speech periods:

\log \hat{p}(\vec{x}(t) \mid \lambda_s) = p(H_0 \mid Y(t)) \left\{ \beta \log \hat{p}(\vec{x}(t-1) \mid \lambda_s) + (1 - \beta) \log p(\vec{x}(t) \mid \lambda_s) \right\} + (1 - p(H_0 \mid Y(t))) \log \hat{p}(\vec{x}(t-1) \mid \lambda_s),   (20)
Fig. 3. Normalized distributions of the adopted features for noise classification: (a) frame energy; (b) partial residual energy; (c) running mean energy; (d) running mean of the partial residual energy.

where β (= 0.985) is the smoothing parameter. Misadaptation of the likelihoods during speech presence could cause the noise classification to fail. To address this problem, we employ a SAP counter that counts the number of successive noise-only frames; the likelihoods are updated according to (20) only when the SAP counter exceeds a given threshold (i.e., 3). The background noise is then successfully classified by the GMM, as displayed in Fig. 5. As can be seen from the classification result after t = 4 s, the classification is slightly delayed by the SAP counter and the long-term smoothing of the likelihood.

3.3. Acoustic noise classification-based speech enhancement

Using the classified noise information ŝ(t) for the current frame, the three key parameters α_n, ζ_d, and G_min are replaced in every frame by α_n*, ζ_d*, and G_min*, respectively, based on Table 1.
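The per-frame substitution is not applied abruptly: the parameters are recursively smoothed toward the classified noise's optimal operating point, as formalized in (22) and (24) below. A sketch, where the lookup-table entries are made-up values for illustration only (the paper's Table 1 values are not reproduced in this transcription):

```python
# Hypothetical per-noise optimal operating points (alpha_n*, zeta_d*, G_min*)
# in the spirit of Table 1; these numbers are invented for illustration.
TABLE1 = {"babble": (0.98, 0.98, 0.10), "car1": (0.99, 0.99, 0.15)}

def smooth_parameter(prev_value, optimal_value, kappa=0.9):
    """Long-term smoothing toward the classified noise's optimal point,
    as in eqs. (22) and (24); kappa = 0.9 as in the paper."""
    return kappa * prev_value + (1.0 - kappa) * optimal_value
```

Repeated application converges geometrically to the table value, so a misclassified single frame perturbs the parameters only slightly.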
Accordingly, the proposed a priori SNR estimate becomes

\hat{\xi}_k(t) = \hat{\alpha}_n(t) \frac{|\hat{X}_k(t-1)|^2}{\hat{\lambda}_{d,k}(t-1)} + (1 - \hat{\alpha}_n(t)) \, C[\hat{\gamma}_k(t) - 1].   (21)

Here, α̂_n(t) is obtained by long-term smoothing, to prevent abrupt changes in α_n and to ensure robust performance:

\hat{\alpha}_n(t) = \kappa_\alpha \hat{\alpha}_n(t-1) + (1 - \kappa_\alpha) \, \alpha_n^*(t),   (22)

with κ_α (= 0.9) a smoothing parameter. Similarly, the noise power estimate is changed using ζ_d* such that

\hat{\lambda}_{d,k}(t) = \hat{\zeta}_d(t) \hat{\lambda}_{d,k}(t-1) + (1 - \hat{\zeta}_d(t)) \, E[|D_k(t)|^2 \mid Y_k(t)],   (23)

in which

\hat{\zeta}_d(t) = \kappa_\zeta \hat{\zeta}_d(t-1) + (1 - \kappa_\zeta) \, \zeta_d^*(t),   (24)

with smoothing parameter κ_ζ (= 0.9). As a result, the soft decision-based speech enhancement is finally performed using (21) and (23). From the newly derived ξ̂_k(t) and γ̂_k(t), the clean speech spectrum X̂_k(t) is obtained using the aforementioned MMSE-based spectral gain:

\hat{X}_k(t) = G(\hat{\xi}_k(t), \hat{\gamma}_k(t)) \, Y_k(t).   (25)

With the soft decision, the noise suppression rule G(·,·) is modified to \tilde{G}(·,·), which incorporates the SAP:

\tilde{G}(\hat{\xi}_k(t), \hat{\gamma}_k(t)) = (1 - p(H_0 \mid Y_k(t))) \, G(\hat{\xi}_k(t), \hat{\gamma}_k(t)).   (26)

When deriving the suppression gain, the lower limit G_min of the spectral gain should be chosen to minimize the
Fig. 4. Block diagram of the GMM-based noise classification.

disturbing residual noise and the speech signal distortion (TIA/EIA/IS-127, 1996), as follows:

\tilde{G}_k(t) = \max\{ \tilde{G}(\hat{\xi}_k(t), \hat{\gamma}_k(t)), \, G_{min} \},   (27)

where max{·} is the maximum operator. As (27) shows, a higher minimum gain value results in more residual noise. Conversely, as the minimum gain approaches zero, the residual noise is minimized at the cost of speech distortion. Thus, the minimum gain G_min must clearly be chosen with care. However, in (TIA/EIA/IS-127, 1996) a fixed value (= 0.1248) is used, which is not reasonable given the variety of noise types. We therefore adopt the G_min* given in Table 1 according to the classified noise type, and obtain the final suppression gain as

\tilde{G}_k(t) = \max\{ \tilde{G}(\hat{\xi}_k(t), \hat{\gamma}_k(t)), \, G_{min}^*(t) \}.   (28)

Accordingly, the residual noise is adjusted based on the noise information, in clear contrast to the previous method (Kim and Chang, 2000).

4. Experiments and results

The proposed environment-aware speech enhancement technique using noise classification was evaluated with objective speech quality measures and subjective listening tests. The experiments are divided into a performance evaluation of the noise classification and a comparison of the noise suppression performance of the proposed algorithm with the conventional method (speech enhancement based on global soft decision, denoted SEGSD) (Kim and Chang, 2000). First, to evaluate the noise classification, different data sets were used for training and testing. The test speech files for GMM noise classification comprised 456 s of speech data from four male and four female speakers, sampled at 8 kHz. Reference decisions were obtained on the clean speech files by manually labeling every 10-ms frame.
Using the hand-marked speech files, we divided the test material into
Fig. 5. Result of the GMM-based noise classification: (a) the GSAP; (b) clean speech waveform; (c) noisy speech corrupted by babble and car noise (SNR = 5 dB); (d) classification result.

noise-only frames and active speech frames. The hand-marked test material included 57% active speech frames, consisting of 44% voiced and 13% unvoiced sounds. To simulate various noise environments, the aforementioned 12 noise sources, which were different from those in the training data set, were added to the clean speech data at SNRs of 5, 10, and 15 dB. The test data comprised phrases from the NTT database (Chang, 2005), spoken by four male and four female speakers; in this database of 96 phrases, each phrase includes two different meaningful sentences and each file lasts 8 s. We first compared the noise classification technique used in our enhancement method with the conventional method of (Ma et al., 2006). Since the noise-only periods are known from the hand-labeled information, we measured the detection probability (P_d) for each frame of the noise periods. For the given test files, the performance of the proposed algorithm is shown in Tables 2-4 in the form of confusion matrices. The confusion matrices of the conventional method are given in Tables 5-7 for comparison. These results show that the noise classification achieves high accuracy (>96%) for the given noises at all SNRs. This demonstrates that the noise classification technique in our approach is suitable for environmental discrimination in speech enhancement.
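The frame-level evaluation just described amounts to accumulating a confusion matrix and reading the per-noise detection probability off its diagonal; a small sketch (function and variable names are mine):

```python
import numpy as np

def confusion_matrix(true_labels, pred_labels, n_classes):
    """Frame-level confusion matrix; row = true noise, column = decision.
    The diagonal divided by the row sums gives the per-noise P_d."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, pred_labels):
        cm[t, p] += 1
    p_d = np.diag(cm) / cm.sum(axis=1)
    return cm, p_d
```

Averaging `p_d` over the noise classes gives the single average-accuracy figures reported in the tables.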
Note that the performance differences across SNRs were negligible, implying that the proposed noise classification is robust to SNR variation. On the other hand, the conventional method employing MFCC features gave average accuracies of 95.37% (SNR = 5 dB), 95.77% (SNR = 10 dB), and 95.75% (SNR = 15 dB), which are lower than those of our approach. This indicates that the proposed noise classification technique is well suited to the speech enhancement framework. We also investigated the computational complexity of the SEGSD and the proposed method to assess the additional computational burden. Table 8 summarizes the computational complexity in terms of the MIPS claimed by each algorithm; for a clear comparison, the MIPS of the proposed method is divided into two parts, the noise classification and the speech enhancement. The computation was measured on the TMS320C55x (TMS320C55x, 2002). The results show that classifying the 13 different noises in the proposed method requires additional computation for the noise classification. However, the additional load could be reduced with minimal performance degradation by merging similar noise types into a single class (e.g., grouping car1 and car2 into a vehicle class) for a commercial application. Next, to compare the proposed speech enhancement method with the conventional soft-decision algorithm with its fixed smoothing parameters, we
Table 2. Confusion matrix of the proposed noise classification (SNR = 5 dB); average accuracy: 96.73%.

Table 3. Confusion matrix of the proposed noise classification (SNR = 10 dB); average accuracy: 97.75%.

Table 4. Confusion matrix of the proposed noise classification (SNR = 15 dB); average accuracy: 96.83%.

adopted the composite measures to evaluate speech quality objectively as a combination of representative objective quality measures. Specifically, the composite measures consist of signal distortion (C_sig), background noise distortion (C_bak), and overall quality (C_ovl), as defined in (Hu and Loizou, 2006, 2008):

C_{sig} = 3.093 - 1.029 \, S_{LLR} + 0.603 \, S_{PESQ} - 0.009 \, S_{WSS},   (29)

C_{bak} = 1.634 + 0.478 \, S_{PESQ} - 0.007 \, S_{WSS} + 0.063 \, S_{segSNR},   (30)

C_{ovl} = 1.594 + 0.805 \, S_{PESQ} - 0.512 \, S_{LLR} - 0.007 \, S_{WSS},   (31)

where C_sig, C_bak, and C_ovl denote a five-point scale of signal distortion, a five-point scale of background intrusiveness,
Table 5. Confusion matrix of the conventional MFCC-based noise classification (SNR = 5 dB); average accuracy: 95.37%.

Table 6. Confusion matrix of the conventional MFCC-based noise classification (SNR = 10 dB); average accuracy: 95.77%.

Table 7. Confusion matrix of the conventional MFCC-based noise classification (SNR = 15 dB); average accuracy: 95.75%.

and the overall quality on the mean opinion score (MOS) scale, respectively. S_segSNR denotes the segmental SNR score. Tables 9 and 10 present the results for signal distortion and background noise distortion. In particular, we added experimental results for a test set of open noise types, ambulance and truck noise, which were not part of the training set. We also include the results for the ideal case in which the noises are perfectly classified; this indicates the performance limit of acoustic noise classification. As seen in the tables, the proposed algorithm yielded better performance
Table 8
Comparison of the computational complexity (in MIPS) of the SEGSD method and the proposed method. For the proposed method, the acoustic noise classification module adds 7.84 MIPS (feature extraction: 1.14; GMM-likelihood: 6.70) on top of the noise suppression routine.

Table 9
Signal distortion (C_sig) results obtained from the proposed algorithm, the SEGSD method, and the perfect-classification case, for babble, destroyer-engine, HF-channel, office, ambulance, and truck noises at several SNRs.

than the conventional SEGSD method under all conditions. These results show that the proposed method consistently outperforms the SEGSD method in terms of both residual noise and signal distortion. It can also be seen that the proposed algorithm works well on the open noise sets, which implies that the proposed algorithm is not dependent on the training set. In particular, the performance difference between the proposed algorithm and the perfect-classification case is negligible, which means that our noise classification algorithm provides accurate and robust performance. On the other hand, Table 11 shows the results for the overall speech quality through the composite measure. Based on these results, we can see that the proposed method yields better performance than the previous method for all SNRs and all given noise types, including the open noises. This finding is consistent with the previous results, showing that our approach

Table 10
Background noise distortion (C_bak) results obtained from the proposed algorithm, the SEGSD method, and the perfect-classification case (babble, destroyer-engine, HF-channel, office, ambulance, and truck noises).

Table 11
Overall quality (C_ovl) results obtained from the proposed algorithm, the SEGSD method, and the perfect-classification case (babble, destroyer-engine, destroyer-operation, HF-channel, office, ambulance, and truck noises).
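The C_sig, C_bak, and C_ovl values reported in Tables 9-11 follow directly from the linear combinations in Eqs. (29)-(31). A minimal sketch, assuming the component scores S_LLR, S_PESQ, S_WSS, and S_segSNR have already been obtained from the respective objective measures (the clamping to the five-point scale is an implementation choice here, not something specified above):

```python
# Sketch of the composite quality measures of Eqs. (29)-(31)
# (Hu and Loizou, 2008). Only the linear mapping is shown; the
# component scores are assumed to be computed beforehand.

def composite_measures(s_llr, s_pesq, s_wss, s_segsnr):
    """Map the four component objective scores to (C_sig, C_bak, C_ovl)."""
    c_sig = 3.093 - 1.029 * s_llr + 0.603 * s_pesq - 0.009 * s_wss
    c_bak = 1.634 + 0.478 * s_pesq - 0.007 * s_wss + 0.063 * s_segsnr
    c_ovl = 1.594 + 0.805 * s_pesq - 0.512 * s_llr - 0.007 * s_wss

    def clamp(x):
        # The composites are read on a five-point MOS-like scale;
        # limiting the output to [1, 5] is our implementation choice.
        return min(max(x, 1.0), 5.0)

    return clamp(c_sig), clamp(c_bak), clamp(c_ovl)
```

Note how C_bak is the only composite that rewards the segmental SNR term, consistent with its role as a measure of background intrusiveness.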
Table 12
PESQ results obtained from the proposed algorithm, the SEGSD method, and the perfect-classification case.

Table 13
MOS test results obtained from the proposed algorithm and the SEGSD method (with 95% confidence intervals), together with the outcome of the hypothesis test against the SEGSD reference for each noise environment and SNR: better than (B), not worse than (NW), or worse than (W).

improves the qualities of both the speech signal and the background noise. In addition, we evaluated the performance in terms of the well-known objective quality measure PESQ, which is recommended by the ITU-T for speech quality assessment of narrow-band telephony (ITU-T Rec. P.862, 2001), even though the PESQ measure

Fig. 6. Speech spectrograms (destroyer-operation noise, SNR = 5 dB): (a) clean speech; (b) noisy speech; (c) enhanced speech by the SEGSD; (d) enhanced speech by the proposed method.
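Each entry of Table 13 pairs a mean opinion score with a 95% confidence half-width over the ten listeners' ratings. A minimal sketch of one way such an interval can be formed, assuming a normal approximation with z = 1.96 (the paper does not state the exact interval construction used):

```python
import math

def mos_confidence_interval(scores, z=1.96):
    """Return (mean, half_width): the mean opinion score over the
    listener ratings and the 95% confidence half-width under a
    normal approximation (z = 1.96)."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)  # sample variance
    half_width = z * math.sqrt(var / n)
    return mean, half_width
```

For a sample as small as ten listeners, a t-based interval (t with 9 degrees of freedom, about 2.262) would be somewhat wider than the normal approximation shown here.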
is included as a basic element in the composite measure. As shown in Table 12, the proposed approach is superior to the SEGSD method. In addition, subjective listening tests were performed in order to validate the objective performance evaluation. The listening tests were performed with ten listeners, each of whom scored every test file between one and five; this scale corresponds to the MOS scale. The results are presented in Table 13, where a higher value is preferred. The outcomes of the corresponding hypothesis test against the reference (SEGSD) are classified into three categories, (1) better than (B), (2) not worse than (NW), and (3) worse than (W), to check the statistical significance (Chang and Kim, 2001). Table 13 shows that the proposed method outperformed the conventional SEGSD method under the given noise environments at all SNRs. The subjective listening tests, supported by the statistical hypothesis test, confirm that the proposed enhancement method leads to better or comparable results relative to the previous method, even though its parameters were optimized on the objective composite measure. Thus, it can be concluded that the proposed method improves the audible speech quality with the help of the acoustic noise classification. Finally, speech spectrograms are presented in Fig. 6. Fig. 6(c) and (d) show the spectrograms obtained with the SEGSD and the proposed algorithm, respectively. With the proposed method, the residual noise spectra are successfully reduced while the speech spectra are well preserved.

5. Conclusion

In this paper, we proposed a novel speech enhancement technique using environment awareness provided by noise classification.
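The frame-by-frame, SAP-gated classification at the core of this technique can be sketched as follows. The diagonal-covariance GMM parameters, the feature dimensionality, and the SAP gating threshold below are all illustrative assumptions rather than the paper's actual models or settings:

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of a feature vector x (shape (d,)) under a
    diagonal-covariance GMM with K components (weights: (K,),
    means/variances: (K, d))."""
    log_probs = (np.log(weights)
                 - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
                 - 0.5 * np.sum((x - means) ** 2 / variances, axis=1))
    m = np.max(log_probs)
    return m + np.log(np.sum(np.exp(log_probs - m)))  # stable log-sum-exp

def classify_noise(frames, sap, gmms, sap_threshold=0.9):
    """Accumulate per-noise-type log-likelihoods over frames whose speech
    absence probability (SAP) is high, and return the maximum-likelihood
    noise label. The 0.9 threshold is illustrative, not the paper's value."""
    scores = {label: 0.0 for label in gmms}
    for x, p in zip(frames, sap):
        if p < sap_threshold:  # likely speech-present frame: skip update
            continue
        for label, (w, mu, var) in gmms.items():
            scores[label] += gmm_log_likelihood(x, w, mu, var)
    return max(scores, key=scores.get)
```

In practice, one GMM per trained noise type (plus the UBM) would be estimated offline, and the accumulated likelihoods would be reset or recursively smoothed so the decision can track a changing acoustic environment.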
The principal contribution of this work is the identification of optimal operating points for the principal parameters of a statistical model-based speech enhancement algorithm, which enables the performance improvement. To perform the noise classification on a frame-by-frame basis, the GMM-based likelihood is used. It should be noted that the GMM-based likelihood is updated only during the noise frames, which are identified by the SAP of each frame within the unified framework. The performance of the proposed approach was found to be superior to that of the conventional technique in extensive objective and subjective quality tests.

Acknowledgements

This work was supported by the IT R&D program of MKE/KEIT [29-S-36-, Development of New Virtual Machine Specification and Technology], by a National Research Foundation of Korea (NRF) grant funded by the Korean Government (MEST) (NRF-2-982), and by the research fund of Hanyang University (HY-2-22).

References

Akbacak, M., Hansen, J., 2007. Environmental sniffing: noise knowledge estimation for robust speech systems. IEEE Trans. Audio Speech Lang. Process. 15 (2).
Boll, S.F., 1979. Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoust. Speech Signal Process. 27 (2), 113-120.
Cappé, O., 1994. Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor. IEEE Trans. Speech Audio Process. 2 (2).
Chang, J.-H., Kim, N.S., 2001. Speech enhancement: new approaches to soft decision. IEICE Trans. Inform. Systems E84-D (9).
Chang, J.-H., 2005. Warped discrete cosine transform-based noisy speech enhancement. IEEE Trans. Circuits Syst. II: Express Briefs 52 (9).
Cohen, I., Berdugo, B., 2002. Noise estimation by minima controlled recursive averaging for robust speech enhancement. IEEE Signal Process. Lett. 9 (1), 12-15.
Ephraim, Y., Malah, D., 1984. Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. ASSP-32 (6), 1109-1121.
Ephraim, Y., Malah, D., 1985. Speech enhancement using a minimum mean-square error log spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. ASSP-33 (2), 443-445.
Hu, Y., Loizou, P., 2006. Evaluation of objective measures for speech enhancement. In: Proc. Interspeech.
Hu, Y., Loizou, P., 2008. Evaluation of objective quality measures for speech enhancement. IEEE Trans. Audio Speech Lang. Process. 16 (1), 229-238.
ITU-T Rec. P.862, 2001. Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs.
Kim, N.S., Chang, J.-H., 2000. Spectral enhancement based on global soft decision. IEEE Signal Process. Lett. 7 (5), 108-110.
Kraft, F., Malkin, R., Schaaf, T., Waibel, A., 2005. Temporal ICA for classification of acoustic events in a kitchen environment. In: Proc. Interspeech.
Krishnamurthy, N., Hansen, J., 2006. Noise update modeling for speech enhancement: when do we do enough? In: Proc. Interspeech.
Ma, L., Milner, B.P., Smith, D., 2006. Acoustic environment classification. ACM Trans. Speech Lang. Process. 3 (2).
Martin, R., 2001. Noise power spectral density estimation based on optimal smoothing and minimum statistics. IEEE Trans. Speech Audio Process. 9 (5), 504-512.
McAulay, R.J., Malpass, M.L., 1980. Speech enhancement using a soft-decision noise suppression filter. IEEE Trans. Acoust. Speech Signal Process. ASSP-28 (2), 137-145.
Park, Y.S., Chang, J.-H., 2007. A novel approach to a robust a priori SNR estimator in speech enhancement. IEICE Trans. Comm. E90-B (8).
Quackenbush, S., Barnwell, T., Clements, M., 1988. Objective Measures of Speech Quality. Prentice-Hall, Englewood Cliffs, NJ.
Sangwan, A., Krishnamurthy, N., Hansen, J., 2007. Environmentally aware voice activity detector. In: Proc. Interspeech.
Sim, B.L., Tong, Y.C., Chang, J.S., Tan, C.T., 1998. A parametric formulation of the generalized spectral subtraction method. IEEE Trans. Speech Audio Process. 6 (4), 328-337.
Sohn, J., Sung, W., 1998. A voice activity detector employing soft decision based noise spectrum adaptation. In: Proc. IEEE Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP'98), Vol. 1.
Song, J.-H., Lee, K.-H., Chang, J.-H., Kim, J.K., Kim, N.S., 2008. Analysis and improvement of speech/music classification for 3GPP2 SMV based on GMM. IEEE Signal Process. Lett. 15, 103-106.
TIA/EIA/IS-127, 1996. Enhanced variable rate codec, speech service option 3 for wideband spread spectrum digital systems.
TMS320C55x, 2002. TMS320C55x DSP library programmer's reference. TI Inc., Dallas, TX, USA.
3GPP2 Spec., 2005. Software distribution for selectable mode vocoder (SMV), service option 56, specification. 3GPP2-C.R3-, v3.
More informationSpeech Enhancement in Noisy Environment using Kalman Filter
Speech Enhancement in Noisy Environment using Kalman Filter Erukonda Sravya 1, Rakesh Ranjan 2, Nitish J. Wadne 3 1, 2 Assistant professor, Dept. of ECE, CMR Engineering College, Hyderabad (India) 3 PG
More informationSpeech Synthesis using Mel-Cepstral Coefficient Feature
Speech Synthesis using Mel-Cepstral Coefficient Feature By Lu Wang Senior Thesis in Electrical Engineering University of Illinois at Urbana-Champaign Advisor: Professor Mark Hasegawa-Johnson May 2018 Abstract
More informationAvailable online at ScienceDirect. Procedia Computer Science 89 (2016 )
Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 89 (2016 ) 666 676 Twelfth International Multi-Conference on Information Processing-2016 (IMCIP-2016) Comparison of Speech
More informationComparison of Spectral Analysis Methods for Automatic Speech Recognition
INTERSPEECH 2013 Comparison of Spectral Analysis Methods for Automatic Speech Recognition Venkata Neelima Parinam, Chandra Vootkuri, Stephen A. Zahorian Department of Electrical and Computer Engineering
More informationSOUND SOURCE RECOGNITION AND MODELING
SOUND SOURCE RECOGNITION AND MODELING CASA seminar, summer 2000 Antti Eronen antti.eronen@tut.fi Contents: Basics of human sound source recognition Timbre Voice recognition Recognition of environmental
More informationROBUST echo cancellation requires a method for adjusting
1030 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 On Adjusting the Learning Rate in Frequency Domain Echo Cancellation With Double-Talk Jean-Marc Valin, Member,
More informationNOISE POWER SPECTRAL DENSITY MATRIX ESTIMATION BASED ON MODIFIED IMCRA. Qipeng Gong, Benoit Champagne and Peter Kabal
NOISE POWER SPECTRAL DENSITY MATRIX ESTIMATION BASED ON MODIFIED IMCRA Qipeng Gong, Benoit Champagne and Peter Kabal Department of Electrical & Computer Engineering, McGill University 3480 University St.,
More informationNon-intrusive intelligibility prediction for Mandarin speech in noise. Creative Commons: Attribution 3.0 Hong Kong License
Title Non-intrusive intelligibility prediction for Mandarin speech in noise Author(s) Chen, F; Guan, T Citation The 213 IEEE Region 1 Conference (TENCON 213), Xi'an, China, 22-25 October 213. In Conference
More informationVoice Activity Detection
Voice Activity Detection Speech Processing Tom Bäckström Aalto University October 2015 Introduction Voice activity detection (VAD) (or speech activity detection, or speech detection) refers to a class
More informationMULTICHANNEL systems are often used for
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 52, NO. 5, MAY 2004 1149 Multichannel Post-Filtering in Nonstationary Noise Environments Israel Cohen, Senior Member, IEEE Abstract In this paper, we present
More informationAdaptive Noise Reduction of Speech. Signals. Wenqing Jiang and Henrique Malvar. July Technical Report MSR-TR Microsoft Research
Adaptive Noise Reduction of Speech Signals Wenqing Jiang and Henrique Malvar July 2000 Technical Report MSR-TR-2000-86 Microsoft Research Microsoft Corporation One Microsoft Way Redmond, WA 98052 http://www.research.microsoft.com
More informationModulation Domain Spectral Subtraction for Speech Enhancement
Modulation Domain Spectral Subtraction for Speech Enhancement Author Paliwal, Kuldip, Schwerin, Belinda, Wojcicki, Kamil Published 9 Conference Title Proceedings of Interspeech 9 Copyright Statement 9
More informationCommunications Theory and Engineering
Communications Theory and Engineering Master's Degree in Electronic Engineering Sapienza University of Rome A.A. 2018-2019 Speech and telephone speech Based on a voice production model Parametric representation
More informationPERFORMANCE ANALYSIS OF SPEECH SIGNAL ENHANCEMENT TECHNIQUES FOR NOISY TAMIL SPEECH RECOGNITION
Journal of Engineering Science and Technology Vol. 12, No. 4 (2017) 972-986 School of Engineering, Taylor s University PERFORMANCE ANALYSIS OF SPEECH SIGNAL ENHANCEMENT TECHNIQUES FOR NOISY TAMIL SPEECH
More informationEnhancement of Speech Signal by Adaptation of Scales and Thresholds of Bionic Wavelet Transform Coefficients
ISSN (Print) : 232 3765 An ISO 3297: 27 Certified Organization Vol. 3, Special Issue 3, April 214 Paiyanoor-63 14, Tamil Nadu, India Enhancement of Speech Signal by Adaptation of Scales and Thresholds
More informationNarrow-Band Interference Rejection in DS/CDMA Systems Using Adaptive (QRD-LSL)-Based Nonlinear ACM Interpolators
374 IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 52, NO. 2, MARCH 2003 Narrow-Band Interference Rejection in DS/CDMA Systems Using Adaptive (QRD-LSL)-Based Nonlinear ACM Interpolators Jenq-Tay Yuan
More informationKONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM
KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM Shruthi S Prabhu 1, Nayana C G 2, Ashwini B N 3, Dr. Parameshachari B D 4 Assistant Professor, Department of Telecommunication Engineering, GSSSIETW,
More informationDifferent Approaches of Spectral Subtraction method for Enhancing the Speech Signal in Noisy Environments
International Journal of Scientific & Engineering Research, Volume 2, Issue 5, May-2011 1 Different Approaches of Spectral Subtraction method for Enhancing the Speech Signal in Noisy Environments Anuradha
More informationPerformance analysis of voice activity detection algorithm for robust speech recognition system under different noisy environment
BABU et al: VOICE ACTIVITY DETECTION ALGORITHM FOR ROBUST SPEECH RECOGNITION SYSTEM Journal of Scientific & Industrial Research Vol. 69, July 2010, pp. 515-522 515 Performance analysis of voice activity
More information