Robust telephone speech recognition based on channel compensation


Pattern Recognition 32 (1999) 1061-1067

Robust telephone speech recognition based on channel compensation

Jiqing Han*, Wen Gao

Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, People's Republic of China

Received 16 October 1997; received in revised form 16 July 1998

* Corresponding author.

Abstract

Channel compensation has been proved to be an effective approach for robust speech recognition. In this paper, we compare the performance of our proposed method, RMFCC, with that of earlier channel compensation methods (CMS, two-level CMS and RASTA) for robust telephone speech recognition. For all experiments, a Korean isolated 84-word database consisting of 80 speakers, collected over local telephone lines, is adopted. Using RMFCC, a 39.8% reduction in word error rate is obtained relative to the conventional HMM system. The experiments show that RMFCC, compared with RASTA, reduces the computational complexity without losing accuracy, and that it also outperforms CMS and two-level CMS. After discussion, we verify that suppressing very low modulation frequencies by filtering is an effective approach for robust telephone speech recognition. 1999 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Channel compensation; Speech recognition; Robustness; Modulation frequencies; Signal-to-noise ratio

1. Introduction

The speech signal carries not only the linguistic message but is also influenced by other sources of information. One of the more harmful sources of non-linguistic variability is the communication environment, which typically includes the recording room, the microphone and, most importantly, the communication channel such as a telephone line. The performance of an automatic speech recognizer (ASR) can degrade dramatically when the recognizer is applied in an environment different from the one in which it was trained. Although the degradation may often be attributed to non-linear effects of the environment and to additive noise in the signal, the frequency characteristic of the communication channel alone can strongly influence the short-time spectrum of the speech. Since most similarity measures applied in ASRs are directly or indirectly based on the short-time spectrum of the speech, ASR performance can also be significantly influenced by the frequency characteristic of the communication channel. It has been reported [1] that the error rate of a speech recognizer can increase from 1.3 to 44.6% when the testing data are filtered by a pole/zero filter modeling a long-distance telephone line. Finding robust channel compensation methods is therefore crucial for the practical application of telephone speech recognition.

The robustness of speech recognition has been widely discussed, and a variety of channel compensation approaches have been proposed [2-5]. Cepstral mean subtraction (CMS) [2], which first calculates the cepstral mean of an utterance and then subtracts it from the cepstral coefficients of each frame, is one of the more effective algorithms considering its simplicity. However, the effectiveness of CMS is severely limited when the channel cannot be adequately modeled by a linear one.

In order to process a non-linear channel, two-level CMS has been proposed: it first classifies the input speech signal into two parts and calculates a cepstral mean for each part, and the two cepstral means are then subtracted from the cepstral coefficients of the corresponding parts. This method has been used effectively for channel compensation in connected-digit recognition for mobile applications [3] and in speaker recognition [4], and it outperforms CMS. However, it requires signal classification, and its performance depends on the classification result. RelAtive SpecTrAl (RASTA) processing [5], which uses a band-pass filter with a very low cut-off frequency, is an effective channel compensation method: it can suppress slowly varying channel distortions and achieves good performance. Conventional RASTA processing is applied to the perceptual linear predictive (PLP) [6] log spectrum. However, PLP requires complex computation. In our previous work [7], we used mel spectral analysis [8] instead of the PLP approach to reduce the computation. Based on the linear relationship between the mel-frequency log spectrum and the mel-frequency cepstral coefficients (MFCCs), we extend RASTA processing from the mel-frequency log spectrum to the MFCCs, and a RASTA-like band-pass filter is proposed for robust speech recognition. We then select the pole parameter of the filter by experiments and discuss the initial-value selection of the integrator.

In this paper, we compare the performance of our proposed method, RMFCC, with that of the earlier channel compensation methods CMS, two-level CMS and RASTA for robust telephone speech recognition. For all experiments, a Korean isolated 84-word database consisting of 80 speakers, collected over local telephone lines, is adopted. Using RMFCC, a 39.8% reduction in word error rate is obtained relative to the conventional HMM system. The experiments show that the proposed method, compared with RASTA, reduces the computational complexity without losing accuracy, and that it also outperforms CMS and two-level CMS. After discussion, we verify that suppressing very low modulation frequencies by filtering is an effective approach for robust telephone speech recognition.

This paper is organized as follows. In Section 2, we introduce the earlier channel compensation methods and our proposed method RMFCC. In Section 3, the telephone speech database and the signal processing used in our experiments are described. The results of our recognition experiments are discussed in Section 4. Finally, in Section 5 we summarize the main conclusions.

2. Channel compensation for robust telephone speech recognition

2.1. Cepstral mean subtraction (CMS)

Cepstral mean subtraction relies on the assumption that the ensemble average of the input speech feature stream is zero, and it modifies the cepstral coefficients to minimize the mismatch between training and testing data caused by channel distortions. CMS is often regarded as a standard channel compensation method, in which the mean of the cepstral vectors of an utterance is subtracted from the cepstral coefficients of every frame:

    \bar{C}_t = c_t - \frac{1}{T} \sum_{\tau=0}^{T-1} c_\tau, \qquad t = 0, 1, \ldots, T-1,    (1)

where c_t and C̄_t are the cepstral vectors at frame t before and after CMS processing, respectively, and T is the total number of frames in the utterance.
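For illustration, Eq. (1) amounts to a few lines of NumPy; the (T x K) array layout and the function name are assumptions made for this sketch and are not taken from the paper.

```python
import numpy as np

def cms(cepstra):
    """Cepstral mean subtraction, Eq. (1): subtract the per-utterance
    mean vector from the cepstral vector of every frame.

    cepstra : array of shape (T, K), one K-dimensional cepstral vector per frame.
    Returns an array of the same shape.
    """
    mean = cepstra.mean(axis=0)   # (1/T) * sum over all frames
    return cepstra - mean         # C_t = c_t - mean, for t = 0 .. T-1
```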
2.2. Two-level CMS

Generally speaking, the speech signal is corrupted not only by channel distortions but also by additive noise before entering the ASR (as shown in Fig. 1); in the power-spectral domain the noisy speech Y(ω) is

    Y(\omega) = [X(\omega) + N_1(\omega)]\, H(\omega) + N_2(\omega),    (2)

where X(ω) is the input speech component, N_1(ω) and N_2(ω) are the environmental noises, and H(ω) represents the channel distortions.

Fig. 1. The diagram of speech signal distortions.

For clean training and testing conditions (N_1(ω) and N_2(ω) negligible), the distortions become additive in the cepstral domain, and CMS therefore removes the time-invariant part of the channel distortions. When a high level of additive noise is present (N_1(ω) and N_2(ω) non-negligible), the noise component can be ignored only in speech segments with a high signal-to-noise ratio (SNR), and CMS can be applied in those segments. In the same manner, the speech component can be ignored in segments of very low SNR, i.e. where very low-level speech or no speech is present. Thus, recognition performance can be improved further by using a two-level CMS, in which separate channel compensation is performed for the segments classified as speech and for the segments classified as background.

Given a frame sequence of cepstral observations C = c_0, c_1, ..., c_{T-1}, each frame is classified as either a background or a speech frame. Let E = E_0, E_1, ..., E_{T-1} be the log energy sequence of the observation; then the background function is

    \mathrm{bck}(t) = \begin{cases} 1, & E_t < \alpha E_{\max} \\ 0, & \text{otherwise} \end{cases} \qquad t = 0, 1, \ldots, T-1,    (3)

where E_max is the maximum log energy of the observation and the parameter α is an empirically selected constant. For two-level CMS, the compensated cepstral vectors C̄_t are computed according to

    \bar{C}_t = \begin{cases} c_t - \bar{c}_b, & \mathrm{bck}(t) = 1 \\ c_t - \bar{c}_s, & \text{otherwise} \end{cases} \qquad t = 0, 1, \ldots, T-1,    (4)

where c̄_b and c̄_s are the cepstral mean vectors of the background and speech frames, respectively.
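A minimal sketch of Eqs. (3) and (4), assuming the cepstra are stored as a (T x K) array and the log energies as a length-T vector; the threshold form E_t < αE_max follows the reconstruction of Eq. (3) above.

```python
import numpy as np

def two_level_cms(cepstra, log_energy, alpha=0.1):
    """Two-level CMS, Eqs. (3)-(4): classify each frame as background or
    speech from its log energy, then subtract the class-specific cepstral mean.

    cepstra    : (T, K) cepstral vectors.
    log_energy : (T,) log energy per frame.
    alpha      : classification constant (0.1 was found optimal in Section 4.1).
    Assumes the utterance contains at least one frame of each class.
    """
    bck = log_energy < alpha * log_energy.max()   # Eq. (3): True for background frames
    comp = np.empty_like(cepstra)
    # Eq. (4): subtract the background mean from background frames,
    # and the speech mean from speech frames.
    comp[bck] = cepstra[bck] - cepstra[bck].mean(axis=0)
    comp[~bck] = cepstra[~bck] - cepstra[~bck].mean(axis=0)
    return comp
```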

2.3. RASTA-PLP technique

Perceptual experiments suggest that human speech perception may be able to suppress a stationary non-linguistic background and enhance the variable linguistic message [9]. Thus, it is useful to adopt features based on human hearing for robust speech recognition. In the RASTA-PLP technique, several well-known properties of hearing are simulated by practical engineering approximations, and the following band-pass filter is then applied to a log spectral representation of the speech:

    H(z) = 0.1\, z^{4}\, \frac{2 + z^{-1} - z^{-3} - 2 z^{-4}}{1 - 0.94\, z^{-1}}.    (5)

The numerator of this filter represents a linear regression estimate of the temporal derivative, while the denominator represents a simple leaky integrator. RASTA processing can effectively suppress slowly varying channel distortions.

2.4. RMFCC method

In the RASTA-PLP technique, only the RASTA processing is used to suppress slowly varying channel distortions, while PLP is used to simulate the properties of human hearing. Mel spectral analysis is another approach that simulates the properties of human hearing, and it is simpler than PLP analysis. Specifically, mel spectral analysis does not need to calculate the complex equal-loudness pre-emphasis and the intensity-loudness power law, nor to conduct a second spectral analysis [6]. If H(z) denotes the RASTA band-pass filter and Y_i and Ȳ_i denote the ith mel-frequency log spectrum in the z-transform domain before and after RASTA processing, then

    \bar{Y}_i = H(z)\, Y_i.    (6)

The MFCCs, which are used as the features in most current speech recognizers, are calculated by applying a discrete cosine transform (DCT) to the mel-frequency log spectrum as follows [8]:

    \bar{C}(k) = \sum_{i=1}^{B} \cos\!\left(\frac{\pi k (i - 0.5)}{B}\right) \bar{Y}_i
               = H(z) \sum_{i=1}^{B} \cos\!\left(\frac{\pi k (i - 0.5)}{B}\right) Y_i
               = H(z)\, c(k), \qquad k = 1, 2, \ldots, K,    (7)

where C̄(k) and c(k) are the kth MFCCs in the z-transform domain with and without RASTA processing, respectively, B is the number of mel-frequency bands, and K is the dimension of the MFCCs. From Eq. (7), it is reasonable to extend RASTA processing from the log spectrum to the MFCCs (i.e. to calculate the MFCCs first and then process them with a band-pass filter). Generally, B is larger than K (e.g. we used B = 40 and K = 12), so this kind of relative MFCC (RMFCC) processing reduces the computational complexity. The main part of RASTA processing is the following IIR filter:

    H(z) = G\, z^{4}\, \frac{2 + z^{-1} - z^{-3} - 2 z^{-4}}{1 - \rho\, z^{-1}}.    (8)

We also use this kind of filter, and its parameters must be selected for our RMFCC processing. When an input signal X[t] passes through the H(z) of Eq. (8), the output Y[t] is

    Y[t] = G \sum_{n=0}^{4} (n - 2)\, X[t + n] + \rho\, Y[t - 1],    (9)

where t = 0, 1, ..., T-1 are the frame indices and the initial value Y[-1] must be selected.
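The recursion of Eq. (9) is easy to apply to MFCC trajectories. The sketch below assumes a (T x K) MFCC array and repeats the final frame to supply the four future frames the regression needs at the end of an utterance; the paper does not state how that boundary is handled, so the padding is an assumption.

```python
import numpy as np

def rmfcc(mfcc, G=0.1, rho=0.92, y_init=0.0):
    """Apply the RMFCC band-pass filter of Eq. (9) to every MFCC trajectory.

    mfcc   : (T, K) array of MFCCs, one K-dimensional vector per frame.
    y_init : the integrator's initial value Y[-1]; zero gave the best
             results in Table 3.
    The regression part uses the four future frames X[t+1..t+4]; here the
    last frame is repeated at the end of the utterance.
    """
    T, K = mfcc.shape
    x = np.vstack([mfcc, np.repeat(mfcc[-1:], 4, axis=0)])   # pad 4 future frames
    y = np.empty((T, K))
    prev = np.full(K, y_init)                                # Y[-1]
    for t in range(T):
        # G * sum_{n=0}^{4} (n-2)*X[t+n]  ==  G*(-2x[t] - x[t+1] + x[t+3] + 2x[t+4])
        reg = G * sum((n - 2) * x[t + n] for n in range(5))
        prev = reg + rho * prev                              # leaky integration
        y[t] = prev
    return y
```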
3. Database and baseline system

The database was collected over the local telephone network in Seoul and Taejon, South Korea, using many kinds of different handsets. Since the system is speaker-independent, several speakers were selected for the experiments. The training database contains utterances from 40 speakers (22 male and 18 female), and the testing database contains utterances from 40 different speakers (22 male and 18 female). The male-to-female ratio in the database reflects that of the general South Korean population. Every speaker read 93 sentences several times, and 84 Korean isolated words were then manually segmented and labeled. Some utterances were discarded due to bad recordings; in total, the testing database contains 8036 utterances.

We use the SNR as an objective measure for evaluating the database. Many SNR measures have been proposed in the literature (see, e.g., Ref. [10]). Since we have no a priori knowledge about telephone speech, several different SNR measures were computed for the training and testing databases; the results are listed in Table 1. Table 1 shows that the measurements are very similar for the training and testing databases, which may be because the database includes a relatively sufficient variety of environmental conditions (speakers, channels, noises) and therefore behaves as statistical theory predicts.

Table 1. Different SNR measurements for the database

Measurement    Training (dB)    Testing (dB)
SNR                             13.95
SEGSNR                          13.85
MAXSNR                          19.00

In the experiments, the speech signal is first digitized at a sampling rate of 8 kHz, a pre-emphasis filter H(z) = 1 - 0.95 z^{-1} is applied to the speech samples, and a Hamming window of 240 samples (30 ms) is applied every 15 ms. Next, the power spectrum of the windowed signal in each frame is computed using a 256-point DFT, and 40 mel-frequency spectral coefficients are derived using mel-frequency band-pass filters. Then, 12 MFCCs are computed using the DCT. Finally, an isolated-word, conventional continuous-density HMM recognizer is implemented as the baseline system for the performance comparison.
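A sketch of this front end under the stated settings (8 kHz sampling, pre-emphasis 1 - 0.95 z^{-1}, a 30 ms Hamming window every 15 ms, a 256-point power spectrum, 40 mel bands, and 12 cepstra from the DCT of Eq. (7)). The triangular shape of the mel filters and their frequency range are assumptions; the paper only gives the number of bands. The input is assumed to be a 1-D NumPy array of samples at least one frame long.

```python
import numpy as np

def mel(f):      # Hz -> mel
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_inv(m):  # mel -> Hz
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=40, n_fft=256, fs=8000):
    """Triangular mel filters over the positive-frequency DFT bins (assumed design)."""
    edges = mel_inv(np.linspace(mel(0.0), mel(fs / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def mfcc_frontend(signal, fs=8000, frame_len=240, hop=120, n_fft=256,
                  n_filters=40, n_ceps=12):
    """MFCC front end as described in Section 3."""
    x = np.append(signal[0], signal[1:] - 0.95 * signal[:-1])   # H(z) = 1 - 0.95 z^-1
    win = np.hamming(frame_len)
    fb = mel_filterbank(n_filters, n_fft, fs)
    n_frames = 1 + (len(x) - frame_len) // hop
    ceps = np.zeros((n_frames, n_ceps))
    k = np.arange(1, n_ceps + 1)[:, None]
    i = np.arange(1, n_filters + 1)[None, :]
    dct = np.cos(np.pi * k * (i - 0.5) / n_filters)             # DCT matrix of Eq. (7)
    for t in range(n_frames):
        frame = x[t * hop: t * hop + frame_len] * win
        power = np.abs(np.fft.rfft(frame, n_fft)) ** 2          # power spectrum
        logmel = np.log(fb @ power + 1e-10)                     # 40 mel log energies
        ceps[t] = dct @ logmel                                  # 12 MFCCs
    return ceps
```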

4. Experiments and discussion

A series of experiments was designed, using the training and testing databases, to evaluate the channel compensation methods for robust telephone speech recognition. We implemented the two-level CMS method and selected its classification parameter α by experiment, and we also used experiments to determine the parameters of RMFCC. The performances of all the channel compensation methods were then compared.

4.1. Experiments of two-level CMS

In two-level CMS, as shown in Eq. (3), the parameter α must be determined. We ran comparative experiments with different values of α; the results are illustrated in Fig. 2. We find that α = 0.1 gives an optimum, and this value is adopted as the classification parameter in our two-level CMS experiments.

Fig. 2. Performances of two-level CMS with different classification parameter α.

We implemented a system using CMS as the channel compensation method and compared its performance with that of two-level CMS. Table 2 shows the results of the baseline system and of the systems using CMS and two-level CMS, respectively, together with the reductions in word error rate of the two channel compensation methods relative to the baseline system. Although the linear time-invariant channel assumption is almost never satisfied in practice, CMS still achieves a significant improvement over the baseline system, with a 33.9% reduction in word error rate on the testing database. Two-level CMS performs better than CMS, giving a further 7.7% reduction in word error rate relative to CMS on the testing database.

Table 2. Word error rates and reductions in word error rate using CMS and two-level CMS, compared with the baseline system

Method            Baseline    CMS      Two-level CMS
Training           6.5%       2.7%     2.5%
Error reduction     -         58.5%    61.5%
Testing           11.8%       7.8%     7.2%
Error reduction     -         33.9%    38.9%

From the experimental results, we find that two-level CMS can further suppress the channel distortions in telephone speech recognition.

4.2. Parameter selection in RMFCC

In the RMFCC method there are several adjustable parameters, namely the gain G and the pole ρ in Eq. (8) and the initial value Y[-1] in Eq. (9); all of them affect the recognition accuracy. In previous RASTA processing, a constant 0.94 was used as the pole ρ. For RMFCC processing, we select ρ by comparative experiments. Using the gain G = 0.1, consistent with RASTA, we compare the system performance for different values of ρ; the results are shown in Fig. 3. We observe that ρ = 0.92 gives an optimum, and it is chosen as the pole of the RMFCC filter in the following experiments.

Fig. 3. Performances of different RMFCC filter pole positions.

We also use comparative experiments to select the initial value Y[-1] in Eq. (9) so as to obtain better recognition accuracy. Previous RASTA work did not report how to select the initial value Y[-1]. Those methods all keep the silent part before the speech, so the results depend on how the silence is determined. In a noisy environment it is not easy to determine the silence parts, and unfortunately the silence is often mixed with serious noise. When the integrator is started from the silence, the noise may be introduced again; moreover, extra silence processing is needed. We attempt to find a special Y[-1] that gives good performance without this extra silence processing. Using G = 0.1 and ρ = 0.92, three kinds of initial values (zero, the cepstral mean and the silent part) are compared, and the results are listed in Table 3. The zero initial value gives the best performance; it seems that the zero value normalizes the cepstral coefficients for all utterances. The zero initial value also keeps the computation very simple.

Table 3. Performance comparison for different initial values of the RMFCC integrator

Initial value   Zero     Mean     Silence
Training        97.7%    97.6%    97.7%
Testing         92.9%    91.9%    92.6%

4.3. Comparing experiments

The proposed RMFCC method and the earlier channel compensation methods were evaluated for robust telephone speech recognition; the experimental results are summarized in Table 4. As discussed below, delta-MFCC is related to RMFCC. From this point of view, we implemented another conventional HMM system using MFCC and delta-MFCC as features, where the delta-MFCC features are obtained by a linear regression estimate of the MFCC trajectories (a sketch of this computation follows at the end of this subsection). We also implemented a system using RASTA as the channel compensation method. To try to obtain better performance, we attempted to combine RMFCC with CMS and with two-level CMS, respectively, but the results, as shown in Table 4, do not improve on using RMFCC alone.

Table 4. Word error rates using various types of channel compensation methods

Method                  Training    Testing
Baseline                 6.5%       11.8%
Delta-MFCC               3.4%        9.9%
CMS                      2.7%        7.8%
Two-level CMS            2.5%        7.2%
RASTA                    2.1%        7.1%
RMFCC                    2.3%        7.1%
CMS + RMFCC              2.3%        7.1%
Two-level CMS + RMFCC    2.3%        7.1%

The experiments show that the performance of RMFCC is significantly superior to that of the baseline system, with a 39.8% reduction in word error rate on the testing database. Compared with delta-MFCC, RMFCC produces a 28.3% reduction in word error rate with only a slight increase in computational complexity. It is also better than CMS and two-level CMS. In comparison with two-level CMS, although RMFCC does not obviously improve the performance, it is a straightforward method, whereas both CMS and two-level CMS are post-processing methods that need to calculate the cepstral mean of an utterance to estimate the long-term characteristics of the channel and then subtract that mean from the cepstral coefficients of every frame. Compared with RASTA, RMFCC achieves nearly the same performance but with lower computational complexity. With respect to both performance and computational complexity, RMFCC is the best method.
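As a point of reference for the delta-MFCC system mentioned above, a two-sided linear-regression delta computation might look as follows. The +/-2-frame window is an assumption chosen to match the five-point regression numerator of the RMFCC filter; the paper does not state the window actually used.

```python
import numpy as np

def delta_mfcc(mfcc, N=2):
    """Delta-MFCC by linear regression over +/-N neighbouring frames.

    mfcc : (T, K) float array of MFCCs; returns an array of the same shape.
    Edge frames are handled by repeating the first and last frames.
    """
    padded = np.vstack([np.repeat(mfcc[:1], N, axis=0),
                        mfcc,
                        np.repeat(mfcc[-1:], N, axis=0)])
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    delta = np.zeros_like(mfcc)
    for n in range(1, N + 1):
        # n * (c_{t+n} - c_{t-n}) accumulated over the regression window
        delta += n * (padded[N + n: N + n + len(mfcc)] -
                      padded[N - n: N - n + len(mfcc)])
    return delta / denom
```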

4.4. Discussion

RMFCC has been shown to yield good performance in Section 4.3, and the main part of RMFCC processing is a band-pass filter. Analyzing the RMFCC filter gives the frequency response curve shown as the solid line in Fig. 4. It can be seen that the filter attenuates very low modulation frequencies.

Using delta-MFCC, the performance is better than that of the baseline system, which uses only MFCCs as features; this is consistent with earlier work [2]. We note that there is a relationship between delta-MFCC and RMFCC: when the denominator of the RMFCC filter in Eq. (8) is ignored, RMFCC processing is equivalent to the calculation of delta-MFCC. Therefore, delta-MFCC can also be regarded as a kind of RMFCC processing. The frequency response curve of the filter used in delta-MFCC is the dotted line in Fig. 4; it too suppresses low modulation frequencies, which is why delta-MFCC performs better than MFCC alone. CMS processing can be regarded as a kind of high-pass filtering, which also suppresses low modulation frequencies. Two-level CMS, in which different modulation components are considered and removed by different high-pass filtering, is better than CMS. From this discussion, we see that many channel compensation methods are based on filtering, and we verify that suppressing low modulation frequencies by filtering is an effective approach for robust telephone speech recognition.

Fig. 4. Frequency responses of the RMFCC and delta-MFCC filters.
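The comparison in this discussion can be reproduced numerically. Ignoring the pure-advance factor z^4 (which only affects phase), the magnitude responses of the RMFCC filter of Eq. (8) and of the numerator-only, delta-like filter can be evaluated with SciPy; the frame rate of about 66.7 Hz is inferred from the 15 ms frame shift of Section 3, so the modulation-frequency axis here is an assumption.

```python
import numpy as np
from scipy.signal import freqz

# RMFCC filter of Eq. (8): numerator G*(2 + z^-1 - z^-3 - 2 z^-4),
# denominator 1 - rho*z^-1. Dropping the denominator leaves the
# delta-style regression filter discussed in Section 4.4.
G, rho = 0.1, 0.92
b = G * np.array([2.0, 1.0, 0.0, -1.0, -2.0])

frame_rate = 1.0 / 0.015                     # frames are taken every 15 ms
w, h_rmfcc = freqz(b, [1.0, -rho], worN=512, fs=frame_rate)
_, h_delta = freqz(b, [1.0], worN=512, fs=frame_rate)

# Both responses vanish at 0 Hz, i.e. both filters suppress the very low
# modulation frequencies associated with a time-invariant channel.
print("gain at DC: RMFCC %.4f, delta %.4f" % (abs(h_rmfcc[0]), abs(h_delta[0])))
peak = w[np.argmax(np.abs(h_rmfcc))]
print("RMFCC response peaks near %.1f Hz modulation frequency" % peak)
```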
5. Conclusion

We have extended RASTA processing from the log spectrum to the MFCCs and proposed the RMFCC processing method, and we have compared the performance of RMFCC with that of earlier channel compensation methods. For all experiments, a Korean isolated 84-word database consisting of 80 speakers, collected over local telephone lines, was adopted. Using RMFCC, a 39.8% reduction in word error rate is obtained relative to the conventional HMM system. The experiments show that the proposed method reduces the computational complexity without compromising performance in comparison with RASTA, and that, unlike CMS and two-level CMS, it does not have to estimate the long-term characteristics of the communication channel. From the discussion, we find that many channel compensation methods are based on filtering, and we verify that suppressing very low modulation frequencies by filtering is an effective approach for robust telephone speech recognition.

Acknowledgements

The authors would like to thank Mr Munsung Han, Mr Gyu-Bong Park and Mr Jeongue Park for their support.

References

[1] S. Lerner, B. Mazor, Telephone channel normalization for automatic speech recognition, Proc. IEEE Int. Conf. Acoust. Speech Signal Processing, ICASSP-92 (1992) I261-I264.
[2] S. Furui, Cepstral analysis technique for automatic speaker verification, IEEE Trans. Acoust. Speech Signal Process. 29 (1981) 254-272.
[3] S. Gupta, F. Soong, R. Haimi-Cohen, High accuracy connected digit recognition for mobile applications, Proc. IEEE Int. Conf. Acoust. Speech Signal Processing, ICASSP-96 (1996) 57-60.
[4] D. Reynolds, The effects of handset variability on speaker recognition performance: experiments on the switchboard corpus, Proc. IEEE Int. Conf. Acoust. Speech Signal Processing, ICASSP-96 (1996) 113-116.
[5] H. Hermansky, N. Morgan, RASTA processing of speech, IEEE Trans. Speech Audio Process. 2 (1994) 578-589.
[6] H. Hermansky, Perceptual linear predictive (PLP) analysis of speech, J. Acoust. Soc. Am. 87 (1990) 1738-1752.
[7] J. Han, M. Han, G. Park, J. Park, W. Gao, Relative mel-frequency cepstral coefficients compensation for robust telephone speech recognition, Proc. Europ. Conf. Speech Commun. Technol., Eurospeech-97 (1997) 1531-1534.
[8] J.W. Picone, Signal modeling techniques in speech recognition, Proc. IEEE 81 (1993) 1215-1247.
[9] Q. Summerfield, A. Sidwell, T. Nelson, Auditory enhancement of changes in spectral amplitude, J. Acoust. Soc. Am. 81 (1987) 700-706.
[10] N. Jayant, P. Noll, Digital Coding of Waveforms, Prentice-Hall, Englewood Cliffs, NJ, 1984.

About the Author - JIQING HAN received the B.S. and M.S. degrees in Electrical Engineering from Harbin Institute of Technology (HIT), Harbin, P.R. China, in 1987 and 1990, respectively. From 1990 to 1996, he was an assistant lecturer in the Department of Computer Science and Engineering, HIT. He is currently working toward his Ph.D. degree in Computer Science and Engineering at HIT. Since June 1996, he has been a Visiting Scientist at the Systems Engineering Research Institute, Korea Institute of Science and Technology, South Korea. His research interests include robust speech recognition, signal processing and pattern recognition.

About the Author - WEN GAO received his first Ph.D. degree in Computer Science and Engineering from Harbin Institute of Technology (HIT), China, in 1988, and his second Ph.D. degree in Electrical Engineering from the University of Tokyo, Japan. In 1993 he worked at Carnegie Mellon University (CMU), and in 1995 he was a Visiting Professor at the Artificial Intelligence Laboratory at MIT. He is a professor and the director of the Motorola-NCIN Joint R and D Laboratory, China. Prof. Gao has published many papers, and his current research interests include image processing, computer vision, pattern recognition and multimodal human interfaces.
