DWT and LPC based feature extraction methods for isolated word recognition


Navnath S Nehe 1* and Raghunath S Holambe 2 (*Correspondence: nsnehe@yahoo.com; full author details are listed at the end of the article)

Abstract
In this article, new feature extraction methods, which utilize wavelet decomposition and reduced-order linear predictive coding (LPC) coefficients, are proposed for speech recognition. The coefficients are derived from speech frames decomposed using the discrete wavelet transform. LPC coefficients derived from the subband decomposition of a speech frame (abbreviated as WLPC) provide a better representation than modeling the frame directly. The WLPC coefficients are further normalized in the cepstrum domain to obtain a new set of features denoted wavelet subband cepstral mean normalized features. The proposed approaches provide effective (better recognition rate), efficient (reduced feature vector dimension), and noise-robust features. The performance of these techniques has been evaluated on the TI-46 isolated word database and our own Marathi digits database in a white noise environment using the continuous density hidden Markov model. The experimental results also show the superiority of the proposed techniques over conventional methods such as linear predictive cepstral coefficients, Mel-frequency cepstral coefficients, spectral subtraction, and cepstral mean normalization in the presence of additive white Gaussian noise.

Keywords: feature extraction, linear predictive coding, discrete wavelet transform, cepstral mean normalization, hidden Markov model

1. Introduction
A speech recognition system has two major components, namely feature extraction and classification, and the feature extraction method plays a vital role in the speech recognition task. There are two dominant approaches to acoustic measurement. The first is a temporal-domain or parametric approach such as linear prediction [1], developed to closely match the resonant structure of the human vocal tract that produces the corresponding sound. The linear prediction coefficients (LPC) technique is not ideal for representing speech because it assumes the signal to be stationary within a given frame and hence cannot analyze localized events accurately; it is also unable to capture unvoiced and nasalized sounds properly [2]. The second approach is a nonparametric frequency-domain approach based on the human auditory perception system, known as Mel-frequency cepstral coefficients (MFCC) [3]. The widespread use of MFCCs is due to their low computational complexity and good performance for ASR under clean, matched conditions. However, the performance of MFCC degrades rapidly in the presence of noise, and the degradation worsens as the signal-to-noise ratio (SNR) decreases. The poor performance in noisy conditions of LPC and its different forms, such as reflection coefficients and linear prediction cepstral coefficients (LPCC), as well as of MFCC and its various forms [4], has led many researchers to investigate alternative robust feature extraction algorithms. In the literature, various techniques have been proposed to improve the performance of ASR systems in the presence of noise.
Speech enhancement techniques such as spectral subtraction (SS) [5] or cepstra from the difference of power spectra [6] reduce the effect of noise either by using statistical information about the noise or by filtering the noise from the noisy speech before feature extraction. Techniques like perceptual linear prediction [7] and relative spectra [8] incorporate some features of the human auditory mechanism and yield noise-robust ASR. Feature enhancement techniques like cepstral mean subtraction [9] and parallel model combination [10] improve ASR performance by compensating for mismatch effects in cepstral-domain features. In another approach [11-16], the wavelet transform and wavelet packet tree have been used for speech feature extraction, with the energies of wavelet-decomposed subbands used in place of Mel-filtered subband energies. Because of its better energy compaction property [17], wavelet transform-based features give better recognition accuracy than LPC and MFCC. A Mel filter-like admissible wavelet packet structure [14] performs better than MFCC in unvoiced phoneme recognition. The wavelet subband features proposed in [15] use normalized subband energies as features, which show good performance in the presence of additive white noise. However, in these wavelet-based approaches, the time information is lost because wavelet subband energies are used. We used the actual wavelet coefficients proposed in [18], which preserve the time information; these features also performed better than LPCC and MFCC due to the combined advantages of LPC and the WT. LPC can better distinguish words having distinct vowel sounds [19], and the WT can model the details of the unvoiced portions of the speech signal. However, the performance of these features degrades for noisy speech recognition.

We propose a modification of the features proposed in [18] to derive effective, efficient, and noise-robust features from the frequency subbands of the frame. Each frame of the speech signal is decomposed (uniformly or dyadically) into different frequency subbands using the discrete wavelet transform (DWT), and each subband is further modeled using linear predictive coding (LPC). The WT has a better capability to model the details of unvoiced sound portions; hence, the subband decomposition is performed by means of the DWT. The DWT is popular in digital signal processing due to its multiresolution capability and its constant-Q property, which many signal processing applications demand, especially the processing of speech signals (the human hearing system has constant-Q perception) [20]. Wavelet decomposition results in a logarithmic set of bandwidths, which is very similar to the (logarithmic) frequency response of the human ear. The LPC coefficients derived from the speech subbands obtained after DWT decomposition provide the WLPC features [18]. These features were further normalized in the cepstrum domain using the well-known cepstral mean normalization (CMN) technique to obtain noise-robust features. These new features are denoted wavelet subband-based cepstral mean normalized features (WSCMN); they perform better in an additive white noise environment. The performance of the proposed features is tested on the TI-46 and Marathi digits databases using the continuous density hidden Markov model (CDHMM) as a classifier.

The rest of the article is organized as follows. In Section 2, we briefly describe the DWT. The proposed WLPC feature extraction and its normalization are described in Section 3. The various experiments and recognition results are given in Section 4. Section 5 gives concluding remarks based on the experimentation.

2. Discrete wavelet transform
Speech is a nonstationary signal. The Fourier transform (FT) is not suitable for the analysis of such nonstationary signals because it provides only the frequency information of the signal and not the information about which frequency is present at what time.
The windowed short-time FT (STFT) provides temporal information about the frequency content of the signal. A drawback of the STFT is its fixed time resolution due to the fixed window length. The WT, with its flexible time-frequency window, is an appropriate tool for the analysis of nonstationary signals like speech, which contain both short high-frequency bursts and long quasi-stationary components. The WT decomposes signals over translated and dilated mother wavelets. A mother wavelet is a time function with finite energy and fast decay, and the different versions of the single wavelet are orthogonal to each other. The continuous wavelet transform (CWT) is given by

W_x(a, b) = \frac{1}{\sqrt{a}} \int x(t)\, \psi\!\left(\frac{t - b}{a}\right) dt,   (1)

where the function ψ(t), a, and b are called the (mother) wavelet, scaling factor, and translation parameter, respectively. As the CWT is a function of two parameters, it contains high redundancy when analyzing signals. Instead, analyzing the signal using a small number of scales with a varying number of translations at each scale, i.e., discretizing the scale and translation parameters as a = 2^j and b = 2^j k, gives the DWT. DWT theory [20,21] requires two sets of related functions, called the scaling function and the wavelet function, given by

\varphi(t) = \sum_{n=0}^{N-1} h[n]\, \sqrt{2}\, \varphi(2t - n)   (2)

and

\psi(t) = \sum_{n=0}^{N-1} g[n]\, \sqrt{2}\, \varphi(2t - n),   (3)

where φ(t) is the scaling function, h[n] is the impulse response of a low-pass filter, and g[n] is the impulse response of a high-pass filter.
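As a concrete illustration of this dyadic discretization, the following minimal Python sketch decomposes a frame at the scales a = 2^j. It assumes the PyWavelets package (pywt); the 'db4' wavelet and the frame contents are placeholders, not choices taken from the paper.

import numpy as np
import pywt

# Placeholder "speech frame"; a real frame would be windowed speech samples.
frame = np.random.randn(320)

# Three-level dyadic DWT (scales a = 2^j): returns [A3, D3, D2, D1].
coeffs = pywt.wavedec(frame, 'db4', level=3)
for name, c in zip(['A3', 'D3', 'D2', 'D1'], coeffs):
    print(name, c.shape)   # coefficient counts roughly halve per level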

The scaling and wavelet functions can be implemented effectively using a pair of filters, h[n] and g[n], called quadrature mirror filters, which satisfy the property g[n] = (-1)^{1-n} h[1-n] [17]. The input signal is low-pass filtered to give the approximate components and high-pass filtered to give the detail components of the input speech signal. The approximate signal at each stage is further decomposed using the same low-pass and high-pass filters to get the approximate and detail components for the next stage. This type of decomposition is called dyadic decomposition, whereas decomposing the detail signal along with the approximate signal at each stage is called uniform decomposition. Dyadic decomposition divides the input signal bandwidth into a logarithmic set of bandwidths, whereas uniform decomposition divides it into a uniform set of bandwidths. In a speech signal, high frequencies are present very briefly at the onset of a sound, while lower frequencies are present later for a long period [21]. The DWT resolves all these frequencies well, and the DWT parameters contain the information of the different frequency scales, which helps in extracting the speech information of the corresponding frequency band. In order to parameterize the speech signal, the signal is decomposed into four frequency bands, uniformly or in dyadic fashion.

3. Proposed WLPC feature extraction
Among speech recognition approaches, the family based on LPC coefficients and their cepstrum (LPCC) is well known for its performance and relative simplicity. The LPC are the coefficients of an auto-regressive model [2] of a speech frame. The all-pole representation of the vocal tract transfer function is

H(z) = \frac{G}{1 - \sum_{i=1}^{p} a_i z^{-i}},   (4)

where the a_i are the prediction coefficients and G is the gain.
These LPC can be derived by minimizing the mean square error between the actual samples of the speech frame and the estimated samples via the autocorrelation method. The LPCC are obtained directly from the LPC using Equation (5) [2]:

\mathrm{LPCC}_i = a_i + \sum_{k=1}^{i-1} \frac{i-k}{i}\, \mathrm{LPCC}_{i-k}\, a_k, \qquad i = 1, 2, \ldots, p.   (5)

The LPC and LPCC features so obtained cannot capture the high-frequency peaks present in the speech signal and also cannot analyze localized events accurately, which the wavelet transform can. However, LPC can better distinguish between words that have distinct vowel sounds than between words that share common vowel sounds [19], while the WT models the details of the unvoiced portions of speech better than LPC [19]. Moreover, the subband signals (wavelet coefficients) obtained from the wavelet decomposition preserve the time information [12], and LPC can easily be estimated from such time-domain signals. We can therefore apply the LPC technique to each subband signal after the wavelet decomposition, which yields the combined benefits of LPC and the WT. Hence, the combination of LPC with the WT is proposed in this article: the LPCC features are estimated from the subband signals obtained from the DWT. Figure 1 shows the block diagrams of the proposed feature extraction systems. Three-level DWT decomposition of the preprocessed and windowed speech frames is performed using Daubechies wavelet filters. The actual wavelet coefficients retain the time information; hence, the LPC features are estimated from the DWT coefficients in the time domain. LPC features of order p are extracted from each subband of the wavelet-decomposed speech signal. The schematic of this technique is shown in Figure 1a.

Figure 1 WLPC feature extraction methods: (a) DWLPC; (b) UWLPC.
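For illustration, here is a hedged NumPy sketch of Equations (4) and (5): LPC estimation by the Levinson-Durbin (autocorrelation) recursion followed by the LPC-to-cepstrum recursion. The function names are ours and this is a minimal sketch, not the authors' implementation.

import numpy as np

def lpc_autocorr(x, p):
    """Order-p LPC of Eq. (4) via the Levinson-Durbin recursion."""
    # Autocorrelation r[0..p] of the (windowed) signal x.
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(p + 1)])
    a = np.zeros(p)          # prediction coefficients a_1 .. a_p
    err = r[0]               # prediction error energy
    for i in range(p):
        k = (r[i + 1] - np.dot(a[:i], r[1:i + 1][::-1])) / err  # reflection coeff.
        a[:i] = a[:i] - k * a[:i][::-1]
        a[i] = k
        err *= (1.0 - k * k)
    gain = np.sqrt(err)      # G of Eq. (4)
    return a, gain

def lpc_to_cepstrum(a):
    """LPC-to-cepstrum recursion of Eq. (5)."""
    p = len(a)
    c = np.zeros(p)
    for i in range(1, p + 1):
        c[i - 1] = a[i - 1]
        for k in range(1, i):
            c[i - 1] += ((i - k) / i) * c[i - k - 1] * a[k - 1]
    return c

# Example: order-5 LPC and cepstra from one subband signal (placeholder data).
sb = np.random.randn(160)
a, G = lpc_autocorr(sb, p=5)
c = lpc_to_cepstrum(a)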

The LPC coefficients obtained from each subband are concatenated to form a final feature vector, denoted dyadic wavelet decomposed LPC (DWLPC). Thus, the feature vector f_i derived from frame i can be expressed as

f_i = [\,a_{A3}\; a_{D3}\; a_{D2}\; a_{D1}\,]^T,   (6)

where a_{A3} is a row vector formed from the prediction coefficients obtained from the approximation components A_3 at the third level, a_{Dj} is a row vector formed from the prediction coefficients obtained from the detail components D_j (j = 1, 2, 3) at the jth level, and T indicates the vector transpose. Figure 1b shows the schematic of uniform wavelet decomposed LPC (UWLPC) feature extraction from subbands of uniform bandwidth. The subbands are obtained by a two-level wavelet packet decomposition [21]. The UWLPC feature vector is then formed, similarly to DWLPC, by concatenating the LPC coefficients estimated from the uniformly decomposed subband signals.

3.1 WSCMN features
CMN [9] is the simplest feature normalization technique to implement, and it provides many of the benefits of more advanced normalization algorithms. The LPCC cepstra are derived using Equation (5) from the WLPC features estimated from the subband signals of each frame. Thus, a sequence of cepstral vectors {x_1, x_2, ..., x_T} is obtained from a speech sample. These cepstral vectors are then normalized using CMN. In its basic form, CMN consists of subtracting the mean feature vector μ_x from each vector x_t and normalizing by the standard deviation σ_x to obtain the normalized vector x̂_t:

\hat{x}_t = \frac{x_t - \mu_x}{\sigma_x},   (7)

where

\mu_x = \frac{1}{T} \sum_{t=1}^{T} x_t \quad \text{and} \quad \sigma_x^2 = \frac{1}{T} \sum_{t=1}^{T} \left( x_t^2 - \mu_x^2 \right).   (8)

This gives the proposed WSCMN feature vectors. Figure 2 shows the WSCMN feature extraction steps, where U-WSCMN denotes the uniformly decomposed WSCMN feature vectors and D-WSCMN the dyadically decomposed WSCMN feature vectors. After normalization, the cepstral sequence has zero mean and unit variance; this normalization is therefore also called cepstral mean and variance normalization. CMN makes the features robust to some linear filtering of the acoustic signal, which might be caused by microphones with different transfer functions, varying distance from user to microphone, the room acoustics, or transmission channels [9].

Figure 2 WSCMN feature extraction methods.
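Equations (7) and (8) amount to per-utterance cepstral mean and variance normalization; below is a minimal NumPy sketch (the function name and array layout are assumptions, not from the paper).

import numpy as np

def wscmn(cepstra):
    # cepstra: (T, d) array, one d-dimensional WLPC cepstral vector per frame.
    mu = cepstra.mean(axis=0)             # Eq. (8): per-dimension mean
    sigma = cepstra.std(axis=0) + 1e-12   # Eq. (8): per-dimension std; epsilon guards /0
    return (cepstra - mu) / sigma         # Eq. (7): zero mean, unit variance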
4. Experimental results
This section evaluates the performance of the proposed techniques on isolated words in the presence of stationary white noise using the TI-46 and our own Marathi databases.

4.1 Databases
The speech recognition experiments were conducted under clean and noisy conditions using the TI-46 corpus and our own Marathi digit database. The TI-46 Speaker-Dependent Isolated Word Corpus [22] has two subsets, TI-20 and TI-ALPHA. The TI-20 vocabulary consists of the ten English digits zero through nine and ten control words (yes, no, erase, rubout, repeat, go, enter, help, stop, and start). The TI-ALPHA subset consists of the English alphabets a through z. In both subsets, data were collected from eight male and eight female speakers. There are 26 utterances of each word from each speaker, of which 10 were used as training tokens and the remaining 16 as testing tokens. The TI-20 subset therefore has 3200 training samples and 5120 test samples in total, whereas TI-ALPHA has 4160 training samples and 6656 test samples. All data samples were digitized at a sampling frequency of 12.5 kHz. For the Marathi database, data were collected from 56 male and 44 female speakers in a quiet room and discretized at a sampling frequency of 10 kHz. There are 20 utterances of each word from each speaker, recorded in two sessions at an interval of one week; in each session, ten utterances of each word were recorded from each speaker. For the experiments, the samples recorded in the first session were used for training and those recorded in the second session for testing. Thus, this database has 10,000 training samples and 10,000 test samples in total. Table 1 shows the English digits and their equivalent Marathi digit pronunciations.

Table 1 English and equivalent Marathi digit pronunciation
Zero: Shunya; One: Ek; Two: Don; Three: Teen; Four: Char; Five: Paach; Six: Saha; Seven: Sat; Eight: Aath; Nine: Nau

4.2 Experimental setup
The input speech samples are pre-emphasized by a first-order filter with transfer function H(z) = 1 - αz^{-1}. The pre-emphasized speech data are divided into blocks of 25.6 ms duration with 50% overlap between adjacent frames, and smooth frequency transitions are ensured by applying a Hamming window to each frame.
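A sketch of this front end in NumPy follows; the pre-emphasis coefficient α = 0.97 is an assumed typical value (the paper's exact coefficient is not given above), and the frame arithmetic follows the 25.6 ms / 50% overlap setup just described at the TI-46 rate.

import numpy as np

fs = 12500                      # TI-46 sampling rate (12.5 kHz)
frame_len = int(0.0256 * fs)    # 25.6 ms -> 320 samples
hop = frame_len // 2            # 50% overlap

def preprocess(x, alpha=0.97):
    # Pre-emphasis y[n] = x[n] - alpha*x[n-1], i.e. H(z) = 1 - alpha*z^-1
    # (alpha assumed, see text).
    y = np.append(x[0], x[1:] - alpha * x[:-1])
    # Split into 50%-overlapping frames, each multiplied by a Hamming window.
    win = np.hamming(frame_len)
    n_frames = 1 + (len(y) - frame_len) // hop
    return np.stack([y[i * hop:i * hop + frame_len] * win
                     for i in range(n_frames)])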

Noisy test samples for each dataset (TI-20, TI-ALPHA, and Marathi digits) were obtained by artificially adding stationary white Gaussian noise over a wide range of SNRs (0, 5, 10, 15, 20, and 30 dB) to the test samples of each dataset. Tests were carried out on clean as well as noisy test samples. For training and testing, a diagonal-covariance left-right CDHMM [2] with 4 mixtures and 5 states (as this combination yields the best performance) was used as the classifier.

4.3 Baseline experiment
The baseline experiments were performed using LPCC and MFCC features on each database. First, in the LPCC feature extraction, the prediction coefficients were extracted from each speech frame using 13th-order LPC. From the obtained prediction coefficients, cepstral coefficients and their temporal derivatives (first and second derivatives) were extracted and concatenated to form the final LPCC feature vector (giving a feature dimension of 39). In the MFCC feature extraction process, the magnitude spectrum of each windowed speech frame was filtered using a triangular Mel filter bank consisting of 20 Mel filters. From the set of 20 Mel-scaled log filter bank outputs, an MFCC feature vector consisting of 13 MFCCs and the corresponding delta and acceleration coefficients (39 coefficients in total) was extracted from each frame. The performance of the LPCC and MFCC features was tested on each dataset under the clean test condition and is presented in Table 2. The recognition results obtained using MFCC features (under the clean test condition) are comparable to the state-of-the-art recognition results presented in [23]. These results are used as a baseline for comparison. We tested the performance of the LPCC and MFCC features for different LPC orders and different numbers of Mel filters in the triangular filter bank, respectively. It was observed that 13th-order LPC (p = 13), 20 Mel filters in the filter bank, and a feature vector of length 39 (13 LPC/MFCC coefficients and their first and second derivatives) yield the best performance on the databases; hence, the results were obtained for these parameter values.

Table 2 Percentage recognition rate of LPCC and MFCC features on the TI-20, TI-ALPHA, and Marathi Digits datasets.

4.4 WLPC features
In this section, features were extracted using the proposed techniques. In the first type, each speech frame was decomposed into subbands of logarithmic bandwidth by a three-level DWT using a 32nd-order Daubechies wavelet (the algorithms were tested for various orders, and the 32nd order gives the best performance). Prediction coefficients with different LPC orders (varying from 3 to 7) were derived from the subbands and concatenated to form the DWLPC feature vector. In the second type, each speech frame was decomposed into subbands of uniform bandwidth by a two-level wavelet packet transform; the prediction coefficients were then estimated from the uniformly decomposed subbands, as in the first type, and concatenated to form the UWLPC feature vector. In both feature extraction types, we select an LPC order of 5 (as it gives the best performance). Five prediction coefficients from each subband give a feature vector of dimension 20. The performance of these features was tested using a CDHMM with 4 mixtures and 5 states. For a comparison of performance based on feature dimension, we also considered 21-coefficient LPCC and MFCC feature vectors (7 LPC/MFCC coefficients and their first and second derivatives).
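Putting the pieces together, here is a hedged sketch of the DWLPC feature vector of Equation (6), reusing lpc_autocorr from the sketch after Equation (5). The 'db8' wavelet is a placeholder so the sketch runs cleanly on a 320-sample frame; the paper reports best results with 32nd-order Daubechies filters.

import numpy as np
import pywt

def dwlpc(frame, p=5, wavelet='db8', level=3):
    """DWLPC sketch: dyadic DWT, order-p LPC per subband, concatenate (Eq. 6)."""
    # wavedec returns [A3, D3, D2, D1] for level=3.
    subbands = pywt.wavedec(frame, wavelet, level=level)
    feats = [lpc_autocorr(np.asarray(sb), p)[0] for sb in subbands]
    return np.concatenate(feats)   # 4 subbands x p coefficients -> 20-dim for p=5

# Example on a placeholder frame; a real input would come from preprocess().
f_i = dwlpc(np.random.randn(320))
print(f_i.shape)   # (20,)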
The performance of the LPCC, MFCC, and WLPC (UWLPC/DWLPC) features was tested on the TI-20 database and is presented in Table 3. The percentage recognition rates using the LPCC and WLPC (UWLPC/DWLPC) features for different LPC orders were also estimated and are presented in Figure 3. These results show that WLPC (UWLPC/DWLPC) performs better than the LPCC and MFCC features with half their feature vector length, because the proposed features combine the vowel identification capability of LPC with the wavelet's better modeling of unvoiced sound portions and high-frequency peaks of speech. Among the WLPC features, DWLPC is superior to UWLPC because the dyadic decomposition in DWLPC better mimics the human auditory perception system. The performance of the MFCC and WLPC (UWLPC and DWLPC) features on the TI-ALPHA database is presented in Table 4. Further, the robustness of the proposed features was tested by normalizing the features using CMN.

Table 3 Percentage recognition rates and feature vector lengths of the LPCC, MFCC, UWLPC, and DWLPC features on the TI-20 database.

Figure 3 Percentage recognition rate for different LPC orders using (a) LPCC features, (b) WLPC (UWLPC/DWLPC) features.

Figure 4 D-WSCMN performance for different LPC orders p on the clean TI-20 database.

The CMN is applied to the WLPC features to obtain the noise-robust WSCMN (D-WSCMN and U-WSCMN) features for isolated word recognition. The performance of D-WSCMN for different prediction orders (p) was tested on the clean TI-20 database and is presented in Figure 4. From these results it is clear that D-WSCMN yields the best results for p = 5. The robustness of the WSCMN features was tested on noisy samples generated by adding white Gaussian noise (at SNRs of 0, 5, 10, and 20 dB) to the test samples of the TI-20 dataset. The results for the WSCMN features are compared with the LPCC, MFCC, SS [5], and CMN [9] features in Figure 5. WSCMN performance was also tested on the clean as well as the noisy Marathi digits database. The recognition performance of WSCMN using uniform and dyadic decomposition on this database is shown in Figure 6. It is observed that, compared to the MFCC performance on clean data (84.50%), the performance of the WSCMN features is significantly higher (100%) on this database. This is because the WSCMN technique is able to capture the differences between the Marathi phonemes more clearly than MFCC and CMN. It also gives better performance at various noise levels because of the cepstrum normalization.

Figure 5 Percentage recognition rate of different features on the TI-20 database in a white noise environment.

Figure 6 Performance of WSCMN features on the Marathi digit database in a white noise environment.

Table 4 Performance (feature vector length and % recognition rate) of the MFCC, UWLPC, and DWLPC features on the TI-ALPHA database.

5. Conclusions
In this article, DWT and LPC-based techniques (UWLPC and DWLPC) for isolated word recognition have been presented. Experimental results show that the proposed WLPC (UWLPC and DWLPC) features are effective and efficient compared to LPCC and MFCC because they take the combined advantages of LPC and the DWT while estimating the features. The feature vector dimension of WLPC is almost half that of LPCC and MFCC, which reduces the memory requirement and the computational time. It is also observed that the performance of DWLPC is better than that of UWLPC. This is because the dyadic (logarithmic) frequency

decomposition mimics the human auditory perception system better than uniform frequency decomposition. The WSCMN features are noise robust because of the normalization in the cepstrum domain. It is observed that the proposed WSCMN features yield better performance than the popular existing methods in the presence of white noise because this technique is able to capture the differences between the phonemes (especially in the Marathi database) more clearly than MFCC and CMN. It has also been shown experimentally that the proposed approaches provide effective (better recognition rate), efficient (reduced feature vector dimension), and robust features.

Author details
1 Department of Instrumentation Engineering, Pravara Rural Engineering College, Loni, Maharashtra, India. 2 S.G.G.S. Institute of Engineering & Technology, Vishnupuri, Nanded, Maharashtra, India.

Competing interests
The authors declare that they have no competing interests.

Received: 21 January 2011  Accepted: 30 January 2012  Published: 30 January 2012

References
1. F Itakura, Minimum prediction residual principle applied to speech recognition. IEEE Trans Acoust Speech Signal Process. ASSP-23 (1975)
2. L Rabiner, BH Juang, Fundamentals of Speech Recognition (Prentice-Hall, Englewood Cliffs, NJ, 1993)
3. SB Davis, P Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans Acoust Speech Signal Process. ASSP-28(4) (1980)
4. K Wang, CH Lee, BH Juang, Selective feature extraction via signal decomposition. IEEE Signal Process Lett. 4, 8-11 (1997)
5. SF Boll, Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans Acoust Speech Signal Process. 27 (1979)
6. J Xu, G Wei, Noise-robust speech recognition based on difference of power spectrum. Electron Lett. 36(14) (2000)
7. H Hermansky, Perceptual linear predictive (PLP) analysis of speech. J Acoust Soc Am. 87(4) (1990)
8. H Hermansky, N Morgan, RASTA processing of speech. IEEE Trans Speech Audio Process. 2 (1994)
9. AE Rosenberg, CH Lee, FK Soong, Cepstral channel normalization techniques for HMM-based speaker verification, in Proc ICSLP, Yokohama, Japan (1994)
10. MJF Gales, SJ Young, Robust speech recognition using parallel model combination. IEEE Trans Speech Audio Process. 4 (1996)
11. Z Tufekci, JN Gowdy, Feature extraction using discrete wavelet transform for speech recognition, in IEEE International Conference Southeastcon 2000, Nashville, TN, USA (April 2000)
12. M Gupta, A Gilbert, Robust speech recognition using wavelet coefficient features, in Proc IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 01), Madonna di Campiglio, Trento, Italy (December 2001)
13. JN Gowdy, Z Tufekci, Mel-scaled discrete wavelet coefficients for speech recognition, in Proc IEEE Inter Conf Acoustics, Speech, and Signal Processing (ICASSP 00), vol. 3, Istanbul, Turkey (June 2000)
14. O Farooq, S Datta, Mel filter-like admissible wavelet packet structure for speech recognition. IEEE Signal Process Lett. 8(7) (2001)
15. O Farooq, S Datta, Wavelet based robust sub-band features for phoneme recognition. IEE Vis Image Signal Process. 151(4) (2004)
16. B Kotnik, Z Kačič, A comprehensive noise robust speech parameterization algorithm using wavelet packet decomposition-based denoising and speech feature representation techniques. EURASIP J Adv Signal Process. 1, 1-20 (2007)
17. S Mallat, A Wavelet Tour of Signal Processing (Academic, New York, 1998)
18. NS Nehe, RS Holambe, New feature extraction methods using DWT and LPC for isolated word recognition, in Proc of IEEE TENCON 2008, Hyderabad, India, 1-6 (2008)
19. M Krishnan, CP Neophytou, G Prescott, Wavelet transform speech recognition using vector quantization, dynamic time warping and artificial neural networks, in International Conference on Spoken Language Processing, Yokohama, Japan (1994)
20. Y Hao, X Zhu, A new feature in speech recognition based on wavelet transform, in Proc IEEE 5th Inter Conf on Signal Processing (WCCC-ICSP 2000), vol. 3, Beijing, China (21-25 August 2000)
21. KP Soman, KI Ramchandran, Insight into Wavelets from Theory to Practice, 2nd edn. (Prentice-Hall of India, New Delhi, 2005)
22. TI 46-Word Speaker-Dependent Isolated Word Corpus, NIST Speech Disc (1991)
23. DS Pallett, A benchmark for speaker-dependent recognition using the Texas Instruments 20 Word and Alpha-set speech database, in Proc of Speech Recognition Workshop, Bristol, UK (1986)

Cite this article as: Nehe and Holambe: DWT and LPC based feature extraction methods for isolated word recognition. EURASIP Journal on Audio, Speech, and Music Processing 2012:7.


Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech INTERSPEECH 5 Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech M. A. Tuğtekin Turan and Engin Erzin Multimedia, Vision and Graphics Laboratory,

More information

IDENTIFICATION OF TRANSIENT SPEECH USING WAVELET TRANSFORMS. Daniel Motlotle Rasetshwane. BS, University of Pittsburgh, 2002

IDENTIFICATION OF TRANSIENT SPEECH USING WAVELET TRANSFORMS. Daniel Motlotle Rasetshwane. BS, University of Pittsburgh, 2002 IDENTIFICATION OF TRANSIENT SPEECH USING WAVELET TRANSFORMS by Daniel Motlotle Rasetshwane BS, University of Pittsburgh, 2002 Submitted to the Graduate Faculty of School of Engineering in partial fulfillment

More information

Mikko Myllymäki and Tuomas Virtanen

Mikko Myllymäki and Tuomas Virtanen NON-STATIONARY NOISE MODEL COMPENSATION IN VOICE ACTIVITY DETECTION Mikko Myllymäki and Tuomas Virtanen Department of Signal Processing, Tampere University of Technology Korkeakoulunkatu 1, 3370, Tampere,

More information

Image Denoising Using Complex Framelets

Image Denoising Using Complex Framelets Image Denoising Using Complex Framelets 1 N. Gayathri, 2 A. Hazarathaiah. 1 PG Student, Dept. of ECE, S V Engineering College for Women, AP, India. 2 Professor & Head, Dept. of ECE, S V Engineering College

More information

Power Function-Based Power Distribution Normalization Algorithm for Robust Speech Recognition

Power Function-Based Power Distribution Normalization Algorithm for Robust Speech Recognition Power Function-Based Power Distribution Normalization Algorithm for Robust Speech Recognition Chanwoo Kim 1 and Richard M. Stern Department of Electrical and Computer Engineering and Language Technologies

More information

A Wavelet Based Approach for Speaker Identification from Degraded Speech

A Wavelet Based Approach for Speaker Identification from Degraded Speech International Journal of Communication Networks and Information Security (IJCNIS) Vol., No. 3, December A Wavelet Based Approach for Speaker Identification from Degraded Speech A. Shafik, S. M. Elhalafawy,

More information

NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or

NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or other reproductions of copyrighted material. Any copying

More information

NOISE ESTIMATION IN A SINGLE CHANNEL

NOISE ESTIMATION IN A SINGLE CHANNEL SPEECH ENHANCEMENT FOR CROSS-TALK INTERFERENCE by Levent M. Arslan and John H.L. Hansen Robust Speech Processing Laboratory Department of Electrical Engineering Box 99 Duke University Durham, North Carolina

More information

Wavelet-based Voice Morphing

Wavelet-based Voice Morphing Wavelet-based Voice orphing ORPHANIDOU C., Oxford Centre for Industrial and Applied athematics athematical Institute, University of Oxford Oxford OX1 3LB, UK orphanid@maths.ox.ac.u OROZ I.. Oxford Centre

More information

SIGNAL PROCESSING FOR ROBUST SPEECH RECOGNITION MOTIVATED BY AUDITORY PROCESSING CHANWOO KIM

SIGNAL PROCESSING FOR ROBUST SPEECH RECOGNITION MOTIVATED BY AUDITORY PROCESSING CHANWOO KIM SIGNAL PROCESSING FOR ROBUST SPEECH RECOGNITION MOTIVATED BY AUDITORY PROCESSING CHANWOO KIM MAY 21 ABSTRACT Although automatic speech recognition systems have dramatically improved in recent decades,

More information

SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS

SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS AKSHAY CHANDRASHEKARAN ANOOP RAMAKRISHNA akshayc@cmu.edu anoopr@andrew.cmu.edu ABHISHEK JAIN GE YANG ajain2@andrew.cmu.edu younger@cmu.edu NIDHI KOHLI R

More information

LEVERAGING JOINTLY SPATIAL, TEMPORAL AND MODULATION ENHANCEMENT IN CREATING NOISE-ROBUST FEATURES FOR SPEECH RECOGNITION

LEVERAGING JOINTLY SPATIAL, TEMPORAL AND MODULATION ENHANCEMENT IN CREATING NOISE-ROBUST FEATURES FOR SPEECH RECOGNITION LEVERAGING JOINTLY SPATIAL, TEMPORAL AND MODULATION ENHANCEMENT IN CREATING NOISE-ROBUST FEATURES FOR SPEECH RECOGNITION 1 HSIN-JU HSIEH, 2 HAO-TENG FAN, 3 JEIH-WEIH HUNG 1,2,3 Dept of Electrical Engineering,

More information

WAVELET SIGNAL AND IMAGE DENOISING

WAVELET SIGNAL AND IMAGE DENOISING WAVELET SIGNAL AND IMAGE DENOISING E. Hošťálková, A. Procházka Institute of Chemical Technology Department of Computing and Control Engineering Abstract The paper deals with the use of wavelet transform

More information

Fourier and Wavelets

Fourier and Wavelets Fourier and Wavelets Why do we need a Transform? Fourier Transform and the short term Fourier (STFT) Heisenberg Uncertainty Principle The continues Wavelet Transform Discrete Wavelet Transform Wavelets

More information

Evaluation of Audio Compression Artifacts M. Herrera Martinez

Evaluation of Audio Compression Artifacts M. Herrera Martinez Evaluation of Audio Compression Artifacts M. Herrera Martinez This paper deals with subjective evaluation of audio-coding systems. From this evaluation, it is found that, depending on the type of signal

More information

Wavelet Transform. From C. Valens article, A Really Friendly Guide to Wavelets, 1999

Wavelet Transform. From C. Valens article, A Really Friendly Guide to Wavelets, 1999 Wavelet Transform From C. Valens article, A Really Friendly Guide to Wavelets, 1999 Fourier theory: a signal can be expressed as the sum of a, possibly infinite, series of sines and cosines. This sum is

More information

Calibration of Microphone Arrays for Improved Speech Recognition

Calibration of Microphone Arrays for Improved Speech Recognition MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Calibration of Microphone Arrays for Improved Speech Recognition Michael L. Seltzer, Bhiksha Raj TR-2001-43 December 2001 Abstract We present

More information

Advanced audio analysis. Martin Gasser

Advanced audio analysis. Martin Gasser Advanced audio analysis Martin Gasser Motivation Which methods are common in MIR research? How can we parameterize audio signals? Interesting dimensions of audio: Spectral/ time/melody structure, high

More information