Robust Speech Feature Extraction using RSF/DRA and Burst Noise Skipping


ECTI TRANSACTIONS ON ELECTRICAL ENG., ELECTRONICS, AND COMMUNICATIONS VOL.3, NO.2 AUGUST 2005

Naoya Wada, Shingo Yoshizawa, Noboru Hayasaka, and Yoshikazu Miyanaga, Non-members

ABSTRACT

This paper explores speech feature extraction aimed at noise robustness in speech recognition. Actual noise may be continuous noise, burst noise, or a mixture of the two. We present noise-robust techniques against both continuous noise and burst noise. The algorithm that compensates for continuous noise is based on RSF (Running Spectrum Filtering) and DRA (Dynamic Range Adjustment). RSF focuses on the modulation spectrum and extracts speech components with FIR filtering. DRA normalizes the maximum amplitudes of the cepstrum. Burst noise is compensated by a burst noise skipping algorithm, which skips burst noise periods and links the speech before and after them. Burst noise periods are identified by speech estimation with an Auto-Regressive Moving-Average (ARMA) model. Experimental results on isolated word speech recognition show the effectiveness and high noise robustness of the proposed methods.

Keywords: Noise Robustness, Speech Recognition, Burst Noise, Modulation Spectrum.

1. INTRODUCTION

Speech recognition systems have been widely explored as an important human interface. They are now implemented in various applications such as car navigation systems, mobile terminal units, and robots. Because these systems are used in a variety of real environments, noise robustness is strongly required. Noise can be classified into two types. One is continuous noise, which is added to the whole utterance and does not change radically. The other is burst noise, such as a door being shut or a channel impulse, which is characterized by a large occasional burst of energy.
Continuous noise is easier to estimate than burst noise, and this has led to various noise-robust methods such as noise-robust LPC analysis [1],[2], Hidden Markov Model (HMM) decomposition and composition [3],[4], and the extraction of dynamic cepstrum [5]. Besides these research activities, spectral subtraction (SS) [6] is the best-known method and has been widely used to improve noise robustness. In real environments, however, burst noise can be combined with continuous noise and added to the input speech. Although burst noise seriously degrades recognition accuracy, robust analysis against it remains insufficient because it is difficult to predict.

In this paper, we explore robust speech feature extraction for the recognition of speech that contains both continuous noise and burst noise, and we propose new speech recognition techniques. The noise-robust techniques for continuous noise are based on our previously proposed speech feature extraction using RSF and DRA [7],[8]. RSF focuses on the modulation spectrum obtained from the time trajectory of the spectrum and extracts speech components by band-pass filtering. Following Kanedera et al. [9], we employ FIR filtering for stability and accuracy. Furthermore, RSF applies filtering twice, before and after the log process, and eliminates both multiplicative noise and additive noise. DRA normalizes the maximum amplitudes of the feature parameters and corrects the difference in dynamic range between the training data and the observed speech data.

04PSI09: Manuscript received on December 31, 2004 ; revised on August 26, The authors are with the Department of Graduate School of Engineering, Hokkaido University, Chuo-ku Kita 14 Jyo Nishi 9 Chome, Sapporo-shi, Hokkaido , Japan. [wada, yosizawa, hayasaka]@csm.ist.hokudai.ac.jp, miya@ist.hokudai.ac.jp
It is reported [10] that normalizing the cepstral dynamic range combines better with RSF than other normalization methods such as Cepstral Mean Normalization and Cepstral Variance Normalization.

The noise-robust technique for burst noise skips the burst noise and links the input speech before and after it. Admittedly, some speech components are lost when burst noise occurs within an utterance. However, burst noise is so difficult to predict that any attempt to extract speech components from burst noise periods leaves a considerable amount of noise when the burst is large, and this deteriorates recognition accuracy. Skipping the burst noise periods therefore does less harm, because burst noise periods are much shorter than speech periods. We use speech estimation with ARMA models to identify burst noise periods.

The first part of this paper presents robust speech feature extraction using RSF/DRA and burst

noise skipping. The latter part evaluates the noise robustness of each method in isolated word speech recognition experiments using HMMs.

2. ROBUST ANALYSIS FOR STATIONARY NOISES

2.1 Running Spectrum Filtering (RSF)

RSF focuses on the modulation spectrum, which characterizes the time trajectory of the spectrum across frames. The modulation spectrum is obtained as follows, and Fig. 1 illustrates the process. Short-time speech characteristics in the frequency domain are obtained by applying windowing and the Fourier transform to the speech waveform in the time domain. The time trajectory at a specific frequency is then obtained by tracing its value from frame to frame. This time trajectory of frequency-domain values is the running spectrum, and its frequency analysis gives the modulation spectrum. It has been reported [11] that speech components in the modulation frequency domain are dominant around 4 Hz, and that components outside the range from 1 Hz to 12 Hz can be regarded as noise or unnecessary components.

RASTA (RelAtive SpecTrA) is a well-known method that focuses on the modulation spectrum, but the original RASTA employs IIR filtering, which may cause problems such as phase distortion and filter instability. RSF applies FIR band-pass filtering to the modulation spectrum to avoid these difficulties and remove noise components. However, RSF needs high-order FIR filters (240-tap filters in this paper) to realize a sharp modulation frequency cut-off. Such a high FIR order requires many delay elements and a long delay before an output can be calculated. To apply FIR filtering to the whole utterance, non-speech frames long enough to cover the filter order would therefore have to be included before and after the speech frames. However, their length amounts to about 2800 ms when the sampling frequency is 11025 Hz, which is not practical.
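The construction of the running spectrum and the modulation spectrum described above can be illustrated with a short sketch. This is only a minimal illustration under stated assumptions: the function names `running_spectrum` and `modulation_spectrum` are hypothetical, NumPy is assumed to be available, and the input is a toy amplitude-modulated tone rather than speech. The frame length and frame period (256 and 128 points at 11025 Hz) follow the analysis conditions used later in the experiments.

```python
import numpy as np

def running_spectrum(signal, frame_len=256, hop=128):
    """Short-time power spectrum, shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def modulation_spectrum(run_spec, frame_rate):
    """Magnitude FFT of each frequency bin's time trajectory (frame axis)."""
    traj = run_spec - run_spec.mean(axis=0, keepdims=True)  # remove the mean
    mod = np.abs(np.fft.rfft(traj, axis=0))
    mod_freqs = np.fft.rfftfreq(run_spec.shape[0], d=1.0 / frame_rate)
    return mod_freqs, mod

fs = 11025
hop = 128
t = np.arange(2 * fs) / fs
# Toy input (an assumption): a 1 kHz tone whose amplitude varies at 4 Hz.
x = (1.0 + 0.5 * np.sin(2 * np.pi * 4.0 * t)) * np.sin(2 * np.pi * 1000.0 * t)

spec = running_spectrum(x, hop=hop)
mod_freqs, mod = modulation_spectrum(spec, frame_rate=fs / hop)

carrier_bin = int(round(1000.0 * 256 / fs))        # FFT bin nearest to 1 kHz
peak_hz = mod_freqs[np.argmax(mod[:, carrier_bin])]
print("dominant modulation frequency near the carrier: %.2f Hz" % peak_hz)
```

For this toy signal the dominant modulation frequency at the carrier bin lies close to the 4 Hz amplitude modulation, which is the region the text identifies as dominant for speech.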
In RSF, only a limited number of non-speech frames are added before and after the speech frames, enough to obtain a sufficient filter order. RSF thus realizes effective feature extraction and can be applied in a practical speech recognition system.

The process of RSF is as follows. The noisy speech signal $y(t)$ is converted to the frequency domain by FFT as

$$y(t) = h(t) * \bigl(x(t) + a(t)\bigr) \tag{1}$$

$$Y(f) = H(f)X(f) + H(f)A(f) \tag{2}$$

where $x(t)$ denotes the signal component, $h(t)$ denotes the system noise, and $a(t)$ denotes the environmental noise. In (2), $H(f)A(f)$ is the additive noise component, and the time trajectory of its spectrum is slower than that of the speech component. Therefore, it can be removed with low-pass filtering in the time-spectrum domain. The logarithmic power spectrum without the additive noise component is then written as

$$\log |Y(f)| = \log |H(f)X(f)| = \log |X(f)| + \log |H(f)|, \tag{3}$$

and the system noise component $H(f)$ can be removed by applying band-pass filtering to the time trajectory of the logarithmic power spectrum.

[Fig. 1: The process for obtaining the modulation spectrum. Stages: FFT, running spectrum, FFT on each frequency, modulation spectrum. Axes: frequency, modulation frequency.]

2.2 Dynamic Range Adjustment (DRA) on Cepstrum

Another cause of noise corruption is the difference in the dynamic range of the cepstrum. The dynamic range of the cepstrum is the difference between the maximum and the minimum of the cepstral values in each order. The cepstral peaks, both maxima and minima, carry important characteristics of speech. However, as shown in Fig. 2(a), the cepstral peak amplitudes of noisy speech are reduced compared with those of noise-free speech, and the characteristics are degraded. Since speech recognition is a kind of pattern matching, these differences can be compensated by normalizing the amplitudes of both clean speech and noisy speech. DRA adjusts these differing dynamic ranges by normalizing the amplitude of the speech features.
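As an illustration of the band-pass stage of Section 2.1 and of (3), the sketch below filters the time trajectory of each log-spectral band with a linear-phase FIR band-pass filter and checks that a constant $\log |H(f)|$ offset is removed. Several details are assumptions rather than the authors' design: the windowed-sinc construction with a Hamming window, 241 taps (odd, so that the filter is symmetric about its centre, instead of the 240 taps quoted above), the 1 Hz to 12 Hz passband taken from the range cited in the text, and padding by repeating the first and last frames. The first-stage filtering of the linear spectrum is not shown, and all function names are hypothetical.

```python
import numpy as np

def fir_lowpass(num_taps, cutoff_hz, rate_hz):
    """Windowed-sinc linear-phase low-pass FIR with unity DC gain."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    h = np.sinc(2.0 * cutoff_hz / rate_hz * n) * np.hamming(num_taps)
    return h / h.sum()

def fir_bandpass(num_taps, low_hz, high_hz, rate_hz):
    """Band-pass FIR as the difference of two low-pass filters (zero DC gain)."""
    return (fir_lowpass(num_taps, high_hz, rate_hz)
            - fir_lowpass(num_taps, low_hz, rate_hz))

def filter_trajectories(features, taps):
    """Filter every column (one band) along the frame axis.

    The first and last frames are repeated on each side so that the output
    has as many frames as the input and no group delay.
    """
    half = (len(taps) - 1) // 2
    padded = np.pad(features, ((half, half), (0, 0)), mode="edge")
    return np.stack([np.convolve(padded[:, k], taps, mode="valid")
                     for k in range(features.shape[1])], axis=1)

frame_rate = 11025 / 128                      # frames per second
taps = fir_bandpass(241, 1.0, 12.0, frame_rate)

rng = np.random.default_rng(0)
log_spec = rng.normal(size=(300, 20))         # stand-in for log filterbank outputs
channel = rng.normal(size=(1, 20))            # constant log|H(f)| in each band

clean_out = filter_trajectories(log_spec, taps)
shifted_out = filter_trajectories(log_spec + channel, taps)
print("max difference:", np.max(np.abs(clean_out - shifted_out)))

# Padding a full filter length of frames: 240 frames at 128 samples each.
full_padding_ms = 240 * 128 / 11025 * 1000    # roughly 2.8 s
```

Because the band-pass filter has zero gain at 0 Hz, the constant channel term contributes nothing to the output, which is the behaviour (3) relies on. The last line reproduces the order of magnitude of the 2800 ms figure quoted in Section 2.1, assuming that figure corresponds to 240 frames at the 128-point frame period.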
In DRA, each coefficient of a speech feature vector is adjusted in proportion to its maximum amplitude as

$$\tilde{f}_i(t) = \frac{f_i(t)}{\max_{j=1,\dots,m} |f_j(t)|} \quad (i = 1, \dots, m), \tag{4}$$

where $f_i(t)$ denotes an element of the feature vector, $m$ denotes the dimension, and $t$ denotes the frame number. Using (4), all coefficients are adjusted into the range from -1 to 1.

[Fig. 2: A comparison of trajectories of the 1st order cepstra among baseline MFCC and MFCC after RSF. The solid lines show the cepstrum of clean speech and the dashed lines show that of noisy speech (running-car noise, 0 dB SNR). The sample speech is Kitami. Analysis methods: (a) Baseline MFCC, (b) MFCC after DRA, (c) MFCC after RSF, (d) MFCC after RSF and DRA. Vertical axis: cepstral value.]

With RSF, the influence of differences in the spectral fine structure is eliminated, as shown in Fig. 2(c). This process removes the parts of speech that are unnecessary for recognition, such as speaker characteristics and noise influences. Then, with DRA, the difference in cepstral dynamic range is adjusted, as shown in Fig. 2(d), and the cepstrum of noisy speech is brought close to that of clean speech.

3. BURST-NOISE SKIPPING ALGORITHM

One way to remove burst noise is to skip the speech frames that contain it and link the frames before and after it. This removes some speech components. However, the influence of burst noise is greater than that of the missing speech frames. Furthermore, HMMs are flexible with respect to time variation in speech, so the loss of some speech frames can be compensated.

Determining a criterion is one of the most important factors in identifying burst noise. One idea is to use parameters of the observed speech, such as variances over short periods, as a criterion. However, such a criterion depends on the level of the input speech, and the estimation becomes difficult when the levels of the input speech and the burst noise do not differ much.
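The DRA normalization of (4) can be sketched in a few lines. The printed form of (4) takes the maximum over the index $j = 1, \dots, m$, while the prose of Section 2.2 speaks of the dynamic range "in each order"; since the two readings differ, the hypothetical function below exposes the choice as a parameter instead of asserting which one the authors used. The feature array layout (frames by coefficients) and the toy trajectories are also assumptions.

```python
import numpy as np

def dynamic_range_adjustment(features, per_coefficient=True):
    """Scale features into [-1, 1] by a maximum absolute amplitude.

    features: array of shape (n_frames, m).
    per_coefficient=True  divides each coefficient by its own maximum
                          absolute value over all frames.
    per_coefficient=False divides each frame by the maximum absolute value
                          over its m coefficients.
    """
    axis = 0 if per_coefficient else 1
    peak = np.max(np.abs(features), axis=axis, keepdims=True)
    peak = np.where(peak == 0.0, 1.0, peak)   # leave all-zero slices unchanged
    return features / peak

frames = np.arange(50)
clean = np.stack([np.sin(0.2 * frames), 3.0 * np.cos(0.1 * frames)], axis=1)
noisy = 0.4 * clean                           # same shape, reduced dynamic range

adj_clean = dynamic_range_adjustment(clean)
adj_noisy = dynamic_range_adjustment(noisy)
print("trajectories match after DRA:", np.allclose(adj_clean, adj_noisy))
```

In this toy case the noisy trajectory is an exactly scaled copy of the clean one, so the normalization makes them coincide. Real noisy cepstra are not exact scalings, so the match shown in Fig. 2(d) is approximate.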
We employ adaptive estimation with an ARMA model and use the short-time variance of the estimation error as the criterion for identifying burst noise. Because the ARMA process assumes that speech is stationary over a certain short period, ARMA estimation becomes difficult when non-stationary (burst) noise is added. In other words, speech estimation works well and the estimation error is small during periods without burst noise, whereas the estimation error of the ARMA process increases when non-stationary noise is included. This makes it possible to identify burst noise periods.

The algorithm for estimating the input speech is as follows. We assume that the observed speech signal $y_k$ can be written with a speech generation process following the ARMA model as

$$y_k = \sum_{i=1}^{n} a_i(k)\, y_{k-i} + \sum_{j=1}^{m} b_j(k)\, u_{k-j} + u_k + n_k, \qquad u_k = u_k^{p} + u_k^{w}, \tag{5}$$

where $k$ is a time index, $a_i$ $(i = 1, \dots, n)$ are the AR parameters, and $b_j$ $(j = 1, \dots, m)$ are the MA parameters. The input signal $u_k$ is a mixture of a periodic pulse signal $u_k^{p}$, which represents voiced sound, and zero-mean white noise $u_k^{w}$ with variance $\sigma_{u,k}^{2}$, which represents unvoiced sound. Using (5), the estimated signal $\hat{y}_k$ and the predicted signal $\hat{y}_{k|k-1}$ are defined as

$$\hat{y}_k = \sum_{i=1}^{n} \hat{a}_i(k)\, y_{k-i} + \sum_{j=1}^{m} \hat{b}_j(k)\, \hat{u}_{k-j} + \hat{u}_k, \qquad \hat{y}_{k|k-1} = \sum_{i=1}^{n} \hat{a}_i(k-1)\, y_{k-i} + \sum_{j=1}^{m} \hat{b}_j(k-1)\, \hat{u}_{k-j}. \tag{6}$$

Since it is assumed that $\hat{u}_k$ cannot be observed at time $k-1$, $\hat{u}_k$ cannot be used in the prediction of the signal. We define the estimated parameter vector $\hat{p}(k)$ and the vector $\hat{h}(k)$ as follows:

$$\hat{p}(k) = \bigl[\hat{a}_1(k), \dots, \hat{a}_n(k), \hat{b}_1(k), \dots, \hat{b}_m(k)\bigr]^{\top} \tag{7}$$

$$\hat{h}(k) = \bigl[y_{k-1}, \dots, y_{k-n}, \hat{u}_{k-1}, \dots, \hat{u}_{k-m}\bigr]^{\top} \tag{8}$$
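To make the generation model in (5) concrete, the sketch below synthesizes a signal from a pulse train plus white noise. It is an illustration only: the coefficients are held constant (the model in the text allows them to vary with $k$), the term $n_k$ is omitted, and the coefficient values, pulse period, and noise level are arbitrary choices, not values from the paper. The function name `synthesize_arma` is hypothetical.

```python
import numpy as np

def synthesize_arma(a, b, excitation):
    """y_k = sum_i a_i y_{k-i} + sum_j b_j u_{k-j} + u_k, constant coefficients."""
    n, m = len(a), len(b)
    y = np.zeros(len(excitation))
    for k in range(len(excitation)):
        ar = sum(a[i] * y[k - 1 - i] for i in range(n) if k - 1 - i >= 0)
        ma = sum(b[j] * excitation[k - 1 - j] for j in range(m) if k - 1 - j >= 0)
        y[k] = ar + ma + excitation[k]
    return y

rng = np.random.default_rng(0)
num = 2000
pulses = np.zeros(num)
pulses[::100] = 1.0                        # periodic pulse train (voiced part)
white = 0.01 * rng.standard_normal(num)    # zero-mean white noise (unvoiced part)
u = pulses + white

a = [1.3, -0.6]   # AR coefficients with poles inside the unit circle
b = [0.5]         # MA coefficient
y = synthesize_arma(a, b, u)
print("first samples:", y[:3])
```

The AR pair chosen here has poles of magnitude about 0.77, so the output stays bounded, and each pulse excites a decaying resonance.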

[Fig. 3: Sample speech waveforms before and after the noise skipping, and the criteria. Panels: (a) Original speech waveform, (b) Waveform with burst noise, (c) Variance of input speech, (d) Criterion using ARMA model, (e) Waveform after noise skipping with (d). The sample speech is Hachinohe. Masked areas denote periods of burst noise added to the original speech. The added burst noise is temporal white noise at 0 dB SNR (calculated from the total energy of the temporal noise).]

[Fig. 4: Analysis method with DRA/RSF and noise skipping algorithm. Flow: Speech Signal, Noise Skipping Algorithm, Fourier Transform, ABS, Low-pass Filtering (First RSF Process), Mel Filterbank Analysis, Log, Band-pass Filtering (Second RSF Process), Inverse Fourier Transform, Delta Cepstrum, Dynamic Range Adjustment (DRA), Speech Feature Vector.]

Using (7) and (8), $\hat{y}_k$ and $\hat{y}_{k|k-1}$ are rewritten as

$$\hat{y}_k = \hat{h}^{\top}(k)\, \hat{p}(k) + \hat{u}_k, \qquad \hat{y}_{k|k-1} = \hat{h}^{\top}(k)\, \hat{p}(k-1), \tag{9}$$

where $\top$ represents the transpose. We now introduce the least squares criterion in order to estimate the ARMA parameters and the estimation error:

$$V_k = \sum_{i=1}^{k} \rho(i,k)\, (y_i - \hat{y}_i)^2 + \rho(1,k)\, \hat{p}^{\top}(k)\, F_1^{-1}\, \hat{p}(k), \tag{10}$$

where $F_1$ is an arbitrary real symmetric positive definite matrix and $\rho(i,k)$ are weighting coefficients given by

$$\rho(i,k) = \begin{cases} \prod_{j=i}^{k-1} \lambda_j & (i = 1, 2, \dots, k-1) \\ 1 & (i = k, k+1, \dots). \end{cases} \tag{11}$$

The second term in (10) is used to initialize $V_k$. Generally, $F_1^{-1}$ is a symmetric matrix whose components are quite small. As $k$ increases, the first term becomes larger and $\rho(1,k)$ becomes smaller, so that the second term becomes negligible. When $\lambda_j \neq 1$ and $0 < \lambda_j < 1$, this criterion progressively decreases the weight of previous estimation errors.
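The weighting in (11) can be checked numerically with a small sketch. A constant forgetting factor of 0.9 is assumed purely for illustration (the text allows $\lambda_j$ to change with $j$), and the helper name `rho` simply mirrors the symbol in (11).

```python
import math

def rho(i, k, lam):
    """rho(i, k) = product of lam[j] for j = i..k-1 when i < k, else 1.

    lam is indexed from 1 to match the text; lam[0] is unused.
    """
    if i >= k:
        return 1.0
    return math.prod(lam[j] for j in range(i, k))

k = 6
lam = [None] + [0.9] * k                    # illustrative constant factor
weights = [rho(i, k, lam) for i in range(1, k + 1)]
for i, w in enumerate(weights, start=1):
    print("rho(%d, %d) = %.4f" % (i, k, w))
```

The weights grow monotonically towards 1 as $i$ approaches $k$, so older squared errors count less in $V_k$, and $\rho(1,k)$ shrinks as $k$ grows, which is why the initialization term in (10) becomes negligible.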
Using (5)-(9), the estimated parameter vector that minimizes the criterion can be obtained with the following equations:

$$\hat{p}(k) = \hat{p}(k-1) + F_k\, \hat{h}(k)\, \bigl\{\lambda_{k-1} + \hat{h}^{\top}(k)\, F_k\, \hat{h}(k)\bigr\}^{-1} \nu(k) \tag{12}$$

$$F_{k+1} = \lambda_{k-1}^{-1} \bigl\{ F_k - F_k\, \hat{h}(k)\, \bigl[\lambda_{k-1} + \hat{h}^{\top}(k)\, F_k\, \hat{h}(k)\bigr]^{-1} \hat{h}^{\top}(k)\, F_k \bigr\} \tag{13}$$

$$\nu(k) = y_k - \hat{y}_{k|k-1} = y_k - \hat{h}^{\top}(k)\, \hat{p}(k-1), \tag{14}$$

where $\nu(k)$ is the estimation error. The noise skipping algorithm uses the short-time variance of $\nu(k)$ as its criterion. To suppress the influence of continuous noise on the estimation error, the mean of the first few variances is subtracted from the variance at each time index, in the same manner as spectral subtraction.

Fig. 3 shows sample speech waveforms before and after the noise skipping, together with the criteria. When the peak of the power spectrum of the burst noise is not large, the short-time variances during burst noise periods are almost the same as those of speech periods

and it is difficult to identify burst noise periods using them. However, the values of the proposed criterion during speech periods are suppressed by the ARMA estimation, which makes the identification of burst noise periods easier.

With the proposed methods, speech analysis is refined as shown in Fig. 4. First, speech without burst noise is obtained with the noise skipping algorithm. Then FIR filtering is applied twice to the obtained speech as the RSF processes, and jitter influences are removed. Finally, the obtained cepstrum is normalized by DRA, and the robust speech feature vector is obtained.

4. EVALUATION EXPERIMENTS

Table 1: The conditions of the speech recognition experiments.
Recognition Task: Isolated 100-word vocabulary
Speech Data: 100 Japanese place names from JEIDA
Sampling: kHz, 16-bit
Window Length: 23.2 ms (256 points)
Frame Period: 11.6 ms (128 points)
Window Function: Hanning window
Pre-emphasis: z^{-1}
Baseline Speech Feature Vector: 38th order, based on MFCC (12-dimensional MFCC, 12-dimensional delta MFCC, 12-dimensional delta-delta MFCC, delta log-energy, delta-delta log-energy)
Acoustic Model: 32-state continuous word HMMs
Training Set: 40 male speakers, 3 utterances each
Tested Set: Speaker-independent, 5 male speakers, 2 utterances each

In order to evaluate the noise robustness of the proposed techniques, isolated word speech recognition experiments were carried out. First, in speech recognition experiments with continuous noise, we compare the performance of three speech feature extraction methods: ordinary MFCC, MFCC after spectral subtraction, and MFCC after RSF and DRA. The task is the recognition of 100 Japanese words with additive noise. White noise, speech babble noise, or high-frequency radio channel noise is applied to the test speech at an SNR of 10 dB. The database of continuous noises is obtained from NoiseX ( noise.html).
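Applying noise at a prescribed SNR, as in the 10 dB condition above, amounts to scaling the noise before adding it. The sketch below assumes that the SNR is defined from the average power of the whole utterance and the whole noise segment; the paper does not spell out its definition for continuous noise, so this is an assumption, and `add_noise_at_snr` is a hypothetical helper. The sine wave stands in for speech.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Scale noise so that 10*log10(P_speech / P_noise) equals snr_db, then add."""
    noise = noise[:len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise, gain

rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 200.0 * np.arange(11025) / 11025)   # stand-in signal
noise = rng.standard_normal(20000)

noisy, gain = add_noise_at_snr(speech, noise, snr_db=10.0)
scaled = gain * noise[:len(speech)]
measured_snr = 10.0 * np.log10(np.mean(speech ** 2) / np.mean(scaled ** 2))
print("measured SNR: %.2f dB" % measured_snr)
```

For burst noise the text states that the SNR is calculated from the total energy of the burst noise, so the power terms would have to be replaced accordingly.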
The conventional recognition system consists of ordinary MFCC feature extraction and HMMs. The training database consists of three utterances of 100 isolated words spoken by 40 male speakers, and the test data consist of two utterances of the same 100 words spoken by five other male speakers. The recognition part is implemented in MATLAB. The whole database is the Japanese common voice data Chimei (meaning place names) provided by the Japan Electric Industry Development Association. Other conditions are described in Table 1. Recognition results are shown in Table 2.

The results for continuous noise confirm that the combination of RSF and DRA is more robust across the various noises. Although spectral subtraction performs better for white noise, it degrades rather than improves recognition accuracy in speech babble noise. This result can be attributed to fluctuation in the continuous noise. Noise in real environments fluctuates more than artificial white noise, which causes musical noise or distortion of the spectrum.

Secondly, we evaluate the performance of the proposed criterion. We prepare two criteria, the short-time variance of the speech waveform and the proposed short-time variance of the estimation error in speech estimation with the ARMA model, and estimate the burst noise periods of the speech with each criterion. The estimation accuracies of the burst noise periods are then compared. An obtained period is regarded as correct when the differences between the obtained and the known start points and between the obtained and the known end points are both below 10 msec. The added noise is either burst noise only or burst noise combined with continuous noise (any one of the three noises above at 10 dB SNR). We model the applied burst noise as an occasional large burst of white noise that occupies 20% of the frames of the original speech.
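The two criteria compared here can be sketched as follows. This is a simplified illustration, not the authors' implementation: the recursion follows the form of (12)-(14) with a constant forgetting factor, the input estimate $\hat{u}_k$ is taken as the residual after the parameter update (a common extended least-squares choice that the text does not confirm), and the "speech" is a synthetic resonant AR(2) process. The model orders, the forgetting factor 0.99, the initial matrix, the window length, and the burst position are all assumptions, as are the function names.

```python
import numpy as np

def short_time_variance(x, win):
    """Variance of x over consecutive non-overlapping windows of win samples."""
    n = len(x) // win
    return x[:n * win].reshape(n, win).var(axis=1)

def arma_prediction_error(y, n=2, m=1, lam=0.99, delta=100.0):
    """Recursive least-squares ARMA fit; returns the one-step error nu(k)."""
    p = np.zeros(n + m)                    # [a_1..a_n, b_1..b_m]
    F = delta * np.eye(n + m)
    u_hat = np.zeros(len(y))
    nu = np.zeros(len(y))
    for k in range(max(n, m), len(y)):
        h = np.concatenate([y[k - n:k][::-1], u_hat[k - m:k][::-1]])
        nu[k] = y[k] - h @ p                       # prediction error, as in (14)
        Fh = F @ h
        denom = lam + h @ Fh
        p = p + Fh * (nu[k] / denom)               # parameter update, as in (12)
        F = (F - np.outer(Fh, Fh) / denom) / lam   # matrix update, as in (13)
        u_hat[k] = y[k] - h @ p                    # residual used as input estimate
    return nu

rng = np.random.default_rng(1)
num, win = 8000, 100
e = 0.1 * rng.standard_normal(num)
clean = np.zeros(num)
for k in range(2, num):                    # resonant AR(2) stand-in for speech
    clean[k] = 1.8 * clean[k - 1] - 0.9 * clean[k - 2] + e[k]

noisy = clean.copy()
noisy[3000:3800] += clean.std() * rng.standard_normal(800)   # white burst

wave_crit = short_time_variance(noisy, win)
arma_crit = short_time_variance(arma_prediction_error(noisy), win)

in_burst = np.zeros(num // win, dtype=bool)
in_burst[30:38] = True
settled = np.arange(num // win) >= 5       # skip the initial convergence

def contrast(crit):
    """Mean criterion inside the burst divided by the mean outside it."""
    return crit[in_burst].mean() / crit[settled & ~in_burst].mean()

print("waveform variance contrast:", contrast(wave_crit))
print("ARMA error variance contrast:", contrast(arma_crit))
```

When the burst has about the same power as the signal, the waveform variance changes only modestly inside the burst, whereas the prediction error, which is small while the signal is predictable, rises sharply. The subtraction of the mean of the first few variances described in Section 3 is left out of this sketch.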
The length of each burst is drawn from a Gaussian distribution whose mean is 70 msec. The SNR with respect to burst noise is 0 dB or -10 dB (calculated from the total energy of the burst noise), and the SNR with respect to continuous noise is 10 dB. For a given utterance, the 0 dB and -10 dB burst noises are added at the same periods. The other analysis conditions are the same as in the previous experiment. The results are shown in Tables 3 and 4.

Even with the short-time variance of the speech waveform as the criterion, correct periods are obtained almost perfectly when the level of the burst noise is higher. However, no correct periods are obtained when the level is lower. With the proposed criterion, on the other hand, the estimation accuracy improves, and an accuracy rate of 89.8% is obtained when the SNR with respect to burst noise is 0 dB. When continuous noise is combined, the accuracy rate degrades, but the amount of degradation depends on the type of noise. The accuracy rate is only 56.6% when the continuous noise is white noise and 88.9% when it is speech babble. A possible explanation is that babble noise is regarded as a speech component in the ARMA estimation process and therefore does not affect the estimation of burst noise periods. This result shows correct periods can

be obtained even when the peak energy level of the burst noise is almost the same as that of the speech, as shown in Fig. 3, even though the criterion is not fully practical for use on its own in that case.

Table 2: Recognition rates versus several types of continuous noise. SNR is at 10 dB.
Columns (noise varieties): White Noise, Speech Babble, HF-channel, Noise Free
Rows (speech feature): Conventional, SS, RSF/DRA

Table 3: Accuracy rates of estimation of burst noise periods versus several types of continuous noise. SNR versus burst noise is at 0 dB.
Columns (noise varieties): Burst Noise Only, Burst + White Noise, Burst + Speech Babble, Burst + HF-channel
Rows (criterion): Short-time variance, ARMA Estimation Error

Table 4: Accuracy rates of estimation of burst noise periods versus several types of continuous noise. SNR versus burst noise is at -10 dB.
Columns (noise varieties): Burst Noise Only, Burst + White Noise, Burst + Speech Babble, Burst + HF-channel
Rows (criterion): Short-time variance, ARMA Estimation Error

Next, the performance of the noise skipping algorithm is evaluated in speech recognition experiments with burst noise. We prepared the following seven methods against burst noise and evaluate the performance of the proposed method, which uses noise skipping and RSF/DRA:
(a) No skipping, no specific processing.
(b) Masking burst noise periods with blanks of speech. Burst noise periods are estimated using short-time variances of the observed speech.
(c) Masking burst noise periods with blanks of speech. Burst noise periods are estimated using short-time variances of estimation errors in ARMA estimation.
(d) Masking known burst noise periods with blanks of speech (periods are given in advance manually).
(e) Skipping of burst noise periods which are estimated using short-time variances of the observed speech.
(f) Skipping of burst noise periods which are estimated using short-time variances of estimation errors in speech estimation with the ARMA model.
(g) Skipping of known burst noise periods (periods are given in advance manually).

Only burst noise, or burst noise combined with continuous noise (any one of the three noises above at 10 dB SNR), is applied to the test speech. Recognition results are shown in Tables 5 and 6. The results for burst noise show the following.

First, burst noise seriously degrades recognition accuracy. When only burst noise is applied, the recognition rates obtained from ordinary MFCC are only 8.4% (-10 dB) and 38.0% (0 dB), while about 99% accuracy is obtained in a noise-free environment. Even when RSF/DRA is applied, the recognition rates are still 36.3% (-10 dB) and 57.4% (0 dB), to say nothing of the case where burst noise is combined with continuous noise.

Secondly, the presented noise skipping improves recognition accuracy considerably, while the improvement from masking the noise with blanks is not sufficient. Furthermore, especially when the SNR with respect to burst noise is 0 dB, better performance is obtained by adopting the variance of the estimation error in the ARMA process as the criterion, rather than the variance of the observed speech.

Thirdly, when burst noise skipping is applied, the recognition accuracy at -10 dB becomes better than that at 0 dB. This is because burst noise periods are difficult to estimate when the energy level of the applied burst noise is low, as shown in the previous experiment. However, even in that case, the proposed burst noise skipping technique benefits speech recognition accuracy.

Lastly, comparing noise skipping with the proposed method against noise skipping with ideal periods, the difference in recognition accuracy is small when only burst noise is applied. However, the difference increases when continuous noise is also applied, even though the accuracy rate in the estimation of burst noise periods is almost 100% in the previous experiment.
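The difference between masking (methods (b)-(d)) and skipping (methods (e)-(g)) can be made concrete with a small sketch. It assumes that burst periods are given as half-open (start, end) sample ranges and that "blanks" are zeros; both are assumptions, and the function names are hypothetical.

```python
def mask_periods(samples, periods):
    """Replace each (start, end) range with zeros; the length is unchanged."""
    out = list(samples)
    for start, end in periods:
        out[start:end] = [0.0] * (end - start)
    return out

def skip_periods(samples, periods):
    """Drop each (start, end) range and join what remains."""
    out, cursor = [], 0
    for start, end in sorted(periods):
        out.extend(samples[cursor:start])
        cursor = max(cursor, end)
    out.extend(samples[cursor:])
    return out

samples = [float(v) for v in range(10)]
periods = [(2, 4), (7, 9)]          # two illustrative burst periods

masked = mask_periods(samples, periods)
skipped = skip_periods(samples, periods)
print(masked)    # [0.0, 1.0, 0.0, 0.0, 4.0, 5.0, 6.0, 0.0, 0.0, 9.0]
print(skipped)   # [0.0, 1.0, 4.0, 5.0, 6.0, 9.0]
```

Masking keeps the time axis intact but inserts silent frames into the feature sequence, whereas skipping shortens the utterance and relies on the time flexibility of the HMM noted in Section 3.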
This suggests that the remaining estimation errors, including small but critical errors of less than 10 msec, may be caused by the continuous noise and may degrade recognition accuracy. Some improvements may be required in order to suppress the influence of continuous noise entirely.

Table 5: Recognition rates versus mixtures of several types of continuous noise and burst noise. SNR versus burst noise is at 0 dB.
Columns (noise varieties): Burst Noise Only, White Noise + Burst Noise, Speech Babble + Burst Noise, HF Channel + Burst Noise
Rows (noise skipping method, each with speech features Ordinary MFCC, SS, RSF/DRA): (a) No Skipping; (b) Masking with Blanks Using Short-time Variances; (c) Masking with Blanks Using ARMA Estimation; (d) Masking with Blanks Using Ideal Periods; (e) Burst Noise Skipping Using Short-time Variances; (f) Burst Noise Skipping Using ARMA Estimation; (g) Burst Noise Skipping Using Ideal Periods

Table 6: Recognition rates versus mixtures of several types of continuous noise and burst noise. SNR versus burst noise is at -10 dB.
Columns (noise varieties): Burst Noise Only, White Noise + Burst Noise, Speech Babble + Burst Noise, HF Channel + Burst Noise
Rows (noise skipping method, each with speech features Ordinary MFCC, SS, RSF/DRA): (a) No Skipping; (b) Masking with Blanks Using Short-time Variances; (c) Masking with Blanks Using ARMA Estimation; (d) Masking with Blanks Using Ideal Periods; (e) Burst Noise Skipping Using Short-time Variances; (f) Burst Noise Skipping Using ARMA Estimation; (g) Burst Noise Skipping Using Ideal Periods

5. CONCLUSION

In this paper, the suppression of combined continuous noise and burst noise is explored, and new speech feature extraction techniques are proposed. To suppress the influence of continuous noise, a method combining RSF and DRA is presented. RSF emphasizes the modulation frequency bands of speech by applying FIR filtering. DRA normalizes the maximum amplitudes of the cepstrum. To suppress the influence of burst noise, a noise skipping algorithm using ARMA analysis is presented. The effectiveness is evaluated in speech recognition experiments, and applying both techniques together shows the best performance. This result indicates that the combined method has the best performance and flexibility for various environments, even in a mixture of continuous noise and burst noise.

References

[1] Tierney J., "A study of LPC analysis of speech in additive noise," IEEE Trans. on Acoust., Speech, and Signal Process., vol. ASSP-28, no.4, p.p , Aug
[2] Kay S.M., "Noise compensation for autoregressive spectral estimation," IEEE Trans. on Acoust., Speech, and Signal Process., vol. ASSP-28, no.3, p.p , March
[3] Varga A. and Moore R., "Hidden Markov Model Decomposition of Speech and Noise," Proc. IEEE ICASSP, p.p ,
[4] Gales M.J.F. and Young S.J., "Cepstral parameter compensation for HMM recognition in noise," Speech Communication, vol.12, no.3, p.p ,
[5] Aikawa K. and Saito T., "Noise robustness evaluation on speech recognition using a dynamic cepstrum," IEICE Technical Report, SP94-14, p.p. 1-8, June
[6] Boll S., "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. ASSP, vol. ASSP-27, no.2, p.p ,
[7] Hayasaka N., Miyanaga Y. and Wada N., "Running spectrum filtering in speech recognition," SCIS Signal Processing and Communications with Soft Computing, Oct
[8] Yoshizawa S., Wada N., Hayasaka N. and Miyanaga Y., "Noise Robust Speech Recognition Focusing on Time Variation and Dynamic Range of Speech Feature Parameters," Proc. IEEE ISPACS, p.p ,
[9] Kanedera N., Arai T., Hermansky H. and Pavel M., "On the importance of various modulation frequencies for speech recognition," Proc. Eurospeech, p.p ,
[10] Yoshizawa S., Hayasaka N., Wada N. and Miyanaga Y., "Cepstral amplitude range normalization for noise robust speech recognition," IEICE Trans.
on Information and Systems, Vol.E87-D, No.8, p.p , Aug
[11] Hermansky H. and Morgan N., "RASTA processing of speech," IEEE Trans. Speech and Audio Process., vol.2, p.p , Oct
[12] Furui S., "Speaker-independent isolated word recognition using dynamic features of speech spectrum," IEEE Trans. on Acoust., Speech, and Signal Process., vol. ASSP-34, no.1, p.p , Feb

Naoya Wada received the B.E. and M.E. degrees in Electrical Engineering from Hokkaido University, Japan in 2001 and 2003, respectively. He is currently studying at Graduate School of Information Science and Technology, Hokkaido University. His research interests are digital signal processing, speech analysis, and speech recognition.

Shingo Yoshizawa received the B.E. and M.E. degrees in Electrical Engineering from Hokkaido University, Japan in 2001 and 2003, respectively. He is currently studying at Graduate School of Information Science and Technology, Hokkaido University. His research interests are speech processing, wireless communication systems, and VLSI architecture.

Noboru Hayasaka received the B.E. and M.E. degrees in Electrical Engineering from Hokkaido University, Japan in 2002 and 2004, respectively. He is currently studying at Graduate School of Information Science and Technology, Hokkaido University. His research interests are digital signal processing, speech analysis, and speech recognition.

Yoshikazu Miyanaga received the B.S., M.S., and Dr.Eng. degrees from Hokkaido University, Japan in 1979, 1981, and 1986, respectively. Since 1983, he has been with Hokkaido University, Japan, where he is a Professor at the Graduate School of Information Science and Technology. His research interests are adaptive signal processing, non-linear signal processing, and parallel-pipelined VLSI systems. He is a member of IEICE, the Information Processing Society of Japan, and the Acoustical Society of Japan.


SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN Yu Wang and Mike Brookes Department of Electrical and Electronic Engineering, Exhibition Road, Imperial College London,

More information

Speech Synthesis using Mel-Cepstral Coefficient Feature

Speech Synthesis using Mel-Cepstral Coefficient Feature Speech Synthesis using Mel-Cepstral Coefficient Feature By Lu Wang Senior Thesis in Electrical Engineering University of Illinois at Urbana-Champaign Advisor: Professor Mark Hasegawa-Johnson May 2018 Abstract

More information

AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS

AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS Kuldeep Kumar 1, R. K. Aggarwal 1 and Ankita Jain 2 1 Department of Computer Engineering, National Institute

More information

High-speed Noise Cancellation with Microphone Array

High-speed Noise Cancellation with Microphone Array Noise Cancellation a Posteriori Probability, Maximum Criteria Independent Component Analysis High-speed Noise Cancellation with Microphone Array We propose the use of a microphone array based on independent

More information

Robust Speech Recognition and its ROBOT implementation

Robust Speech Recognition and its ROBOT implementation Robust Speech Recognition and its ROBOT implementation Yoshikazu Miyanaga Hokkaido University Conditions for Speech Recognition Short Isolated Speech: words, phrase (

More information

RECENTLY, there has been an increasing interest in noisy

RECENTLY, there has been an increasing interest in noisy IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 52, NO. 9, SEPTEMBER 2005 535 Warped Discrete Cosine Transform-Based Noisy Speech Enhancement Joon-Hyuk Chang, Member, IEEE Abstract In

More information

Spectral estimation using higher-lag autocorrelation coefficients with applications to speech recognition

Spectral estimation using higher-lag autocorrelation coefficients with applications to speech recognition Spectral estimation using higher-lag autocorrelation coefficients with applications to speech recognition Author Shannon, Ben, Paliwal, Kuldip Published 25 Conference Title The 8th International Symposium

More information

Effective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a

Effective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a R E S E A R C H R E P O R T I D I A P Effective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a IDIAP RR 7-7 January 8 submitted for publication a IDIAP Research Institute,

More information

Different Approaches of Spectral Subtraction Method for Speech Enhancement

Different Approaches of Spectral Subtraction Method for Speech Enhancement ISSN 2249 5460 Available online at www.internationalejournals.com International ejournals International Journal of Mathematical Sciences, Technology and Humanities 95 (2013 1056 1062 Different Approaches

More information

Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis

Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis Mohini Avatade & S.L. Sahare Electronics & Telecommunication Department, Cummins

More information

Robust telephone speech recognition based on channel compensation

Robust telephone speech recognition based on channel compensation Pattern Recognition 32 (1999) 1061}1067 Robust telephone speech recognition based on channel compensation Jiqing Han*, Wen Gao Department of Computer Science and Engineering, Harbin Institute of Technology,

More information

Speech Enhancement Using a Mixture-Maximum Model

Speech Enhancement Using a Mixture-Maximum Model IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 10, NO. 6, SEPTEMBER 2002 341 Speech Enhancement Using a Mixture-Maximum Model David Burshtein, Senior Member, IEEE, and Sharon Gannot, Member, IEEE

More information

Using RASTA in task independent TANDEM feature extraction

Using RASTA in task independent TANDEM feature extraction R E S E A R C H R E P O R T I D I A P Using RASTA in task independent TANDEM feature extraction Guillermo Aradilla a John Dines a Sunil Sivadas a b IDIAP RR 04-22 April 2004 D a l l e M o l l e I n s t

More information

Performance Analysiss of Speech Enhancement Algorithm for Robust Speech Recognition System

Performance Analysiss of Speech Enhancement Algorithm for Robust Speech Recognition System Performance Analysiss of Speech Enhancement Algorithm for Robust Speech Recognition System C.GANESH BABU 1, Dr.P..T.VANATHI 2 R.RAMACHANDRAN 3, M.SENTHIL RAJAA 3, R.VENGATESH 3 1 Research Scholar (PSGCT)

More information

Power Function-Based Power Distribution Normalization Algorithm for Robust Speech Recognition

Power Function-Based Power Distribution Normalization Algorithm for Robust Speech Recognition Power Function-Based Power Distribution Normalization Algorithm for Robust Speech Recognition Chanwoo Kim 1 and Richard M. Stern Department of Electrical and Computer Engineering and Language Technologies

More information

Auditory Based Feature Vectors for Speech Recognition Systems

Auditory Based Feature Vectors for Speech Recognition Systems Auditory Based Feature Vectors for Speech Recognition Systems Dr. Waleed H. Abdulla Electrical & Computer Engineering Department The University of Auckland, New Zealand [w.abdulla@auckland.ac.nz] 1 Outlines

More information

A STUDY ON CEPSTRAL SUB-BAND NORMALIZATION FOR ROBUST ASR

A STUDY ON CEPSTRAL SUB-BAND NORMALIZATION FOR ROBUST ASR A STUDY ON CEPSTRAL SUB-BAND NORMALIZATION FOR ROBUST ASR Syu-Siang Wang 1, Jeih-weih Hung, Yu Tsao 1 1 Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan Dept. of Electrical

More information

PDF hosted at the Radboud Repository of the Radboud University Nijmegen

PDF hosted at the Radboud Repository of the Radboud University Nijmegen PDF hosted at the Radboud Repository of the Radboud University Nijmegen The following full text is an author's version which may differ from the publisher's version. For additional information about this

More information

Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise

Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise Noha KORANY 1 Alexandria University, Egypt ABSTRACT The paper applies spectral analysis to

More information

Enhancement of Speech Signal by Adaptation of Scales and Thresholds of Bionic Wavelet Transform Coefficients

Enhancement of Speech Signal by Adaptation of Scales and Thresholds of Bionic Wavelet Transform Coefficients ISSN (Print) : 232 3765 An ISO 3297: 27 Certified Organization Vol. 3, Special Issue 3, April 214 Paiyanoor-63 14, Tamil Nadu, India Enhancement of Speech Signal by Adaptation of Scales and Thresholds

More information

Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation

Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Peter J. Murphy and Olatunji O. Akande, Department of Electronic and Computer Engineering University

More information

Performance analysis of voice activity detection algorithm for robust speech recognition system under different noisy environment

Performance analysis of voice activity detection algorithm for robust speech recognition system under different noisy environment BABU et al: VOICE ACTIVITY DETECTION ALGORITHM FOR ROBUST SPEECH RECOGNITION SYSTEM Journal of Scientific & Industrial Research Vol. 69, July 2010, pp. 515-522 515 Performance analysis of voice activity

More information

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING. Department of Signal Theory and Communications. c/ Gran Capitán s/n, Campus Nord, Edificio D5

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING. Department of Signal Theory and Communications. c/ Gran Capitán s/n, Campus Nord, Edificio D5 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING Javier Hernando Department of Signal Theory and Communications Polytechnical University of Catalonia c/ Gran Capitán s/n, Campus Nord, Edificio D5 08034

More information

Signal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2

Signal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2 Signal Processing for Speech Applications - Part 2-1 Signal Processing For Speech Applications - Part 2 May 14, 2013 Signal Processing for Speech Applications - Part 2-2 References Huang et al., Chapter

More information

Comparison of Spectral Analysis Methods for Automatic Speech Recognition

Comparison of Spectral Analysis Methods for Automatic Speech Recognition INTERSPEECH 2013 Comparison of Spectral Analysis Methods for Automatic Speech Recognition Venkata Neelima Parinam, Chandra Vootkuri, Stephen A. Zahorian Department of Electrical and Computer Engineering

More information

KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM

KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM Shruthi S Prabhu 1, Nayana C G 2, Ashwini B N 3, Dr. Parameshachari B D 4 Assistant Professor, Department of Telecommunication Engineering, GSSSIETW,

More information

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE - @ Ramon E Prieto et al Robust Pitch Tracking ROUST PITCH TRACKIN USIN LINEAR RERESSION OF THE PHASE Ramon E Prieto, Sora Kim 2 Electrical Engineering Department, Stanford University, rprieto@stanfordedu

More information

A ROBUST FRONTEND FOR ASR: COMBINING DENOISING, NOISE MASKING AND FEATURE NORMALIZATION. Maarten Van Segbroeck and Shrikanth S.

A ROBUST FRONTEND FOR ASR: COMBINING DENOISING, NOISE MASKING AND FEATURE NORMALIZATION. Maarten Van Segbroeck and Shrikanth S. A ROBUST FRONTEND FOR ASR: COMBINING DENOISING, NOISE MASKING AND FEATURE NORMALIZATION Maarten Van Segbroeck and Shrikanth S. Narayanan Signal Analysis and Interpretation Lab, University of Southern California,

More information

Frequency Domain Analysis for Noise Suppression Using Spectral Processing Methods for Degraded Speech Signal in Speech Enhancement

Frequency Domain Analysis for Noise Suppression Using Spectral Processing Methods for Degraded Speech Signal in Speech Enhancement Frequency Domain Analysis for Noise Suppression Using Spectral Processing Methods for Degraded Speech Signal in Speech Enhancement 1 Zeeshan Hashmi Khateeb, 2 Gopalaiah 1,2 Department of Instrumentation

More information

DWT and LPC based feature extraction methods for isolated word recognition

DWT and LPC based feature extraction methods for isolated word recognition RESEARCH Open Access DWT and LPC based feature extraction methods for isolated word recognition Navnath S Nehe 1* and Raghunath S Holambe 2 Abstract In this article, new feature extraction methods, which

More information

A Correlation-Maximization Denoising Filter Used as An Enhancement Frontend for Noise Robust Bird Call Classification

A Correlation-Maximization Denoising Filter Used as An Enhancement Frontend for Noise Robust Bird Call Classification A Correlation-Maximization Denoising Filter Used as An Enhancement Frontend for Noise Robust Bird Call Classification Wei Chu and Abeer Alwan Speech Processing and Auditory Perception Laboratory Department

More information

Evaluation of clipping-noise suppression of stationary-noisy speech based on spectral compensation

Evaluation of clipping-noise suppression of stationary-noisy speech based on spectral compensation Evaluation of clipping-noise suppression of stationary-noisy speech based on spectral compensation Takahiro FUKUMORI ; Makoto HAYAKAWA ; Masato NAKAYAMA 2 ; Takanobu NISHIURA 2 ; Yoichi YAMASHITA 2 Graduate

More information

SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT

SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT RASHMI MAKHIJANI Department of CSE, G. H. R.C.E., Near CRPF Campus,Hingna Road, Nagpur, Maharashtra, India rashmi.makhijani2002@gmail.com

More information

Robust Voice Activity Detection Based on Discrete Wavelet. Transform

Robust Voice Activity Detection Based on Discrete Wavelet. Transform Robust Voice Activity Detection Based on Discrete Wavelet Transform Kun-Ching Wang Department of Information Technology & Communication Shin Chien University kunching@mail.kh.usc.edu.tw Abstract This paper

More information

Online Blind Channel Normalization Using BPF-Based Modulation Frequency Filtering

Online Blind Channel Normalization Using BPF-Based Modulation Frequency Filtering Online Blind Channel Normalization Using BPF-Based Modulation Frequency Filtering Yun-Kyung Lee, o-young Jung, and Jeon Gue Par We propose a new bandpass filter (BPF)-based online channel normalization

More information

SPEECH ENHANCEMENT BASED ON ITERATIVE WIENER FILTER USING COMPLEX SPEECH ANALYSIS

SPEECH ENHANCEMENT BASED ON ITERATIVE WIENER FILTER USING COMPLEX SPEECH ANALYSIS SPEECH ENHANCEMENT BASED ON TERATVE WENER FLTER USNG COMPLEX SPEECH ANALYSS Keiichi Funaki Computing & Networking Center, Univ. o the Ryukyus Senbaru, Nishihara, Okinawa, 93-3, Japan phone: +(8)98-895-8946,

More information

Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter

Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Ching-Ta Lu, Kun-Fu Tseng 2, Chih-Tsung Chen 2 Department of Information Communication, Asia University, Taichung, Taiwan, ROC

More information

International Journal of Engineering and Techniques - Volume 1 Issue 6, Nov Dec 2015

International Journal of Engineering and Techniques - Volume 1 Issue 6, Nov Dec 2015 RESEARCH ARTICLE OPEN ACCESS A Comparative Study on Feature Extraction Technique for Isolated Word Speech Recognition Easwari.N 1, Ponmuthuramalingam.P 2 1,2 (PG & Research Department of Computer Science,

More information

Performance Analysis of MFCC and LPCC Techniques in Automatic Speech Recognition

Performance Analysis of MFCC and LPCC Techniques in Automatic Speech Recognition www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume - 3 Issue - 8 August, 2014 Page No. 7727-7732 Performance Analysis of MFCC and LPCC Techniques in Automatic

More information

REAL-TIME BROADBAND NOISE REDUCTION

REAL-TIME BROADBAND NOISE REDUCTION REAL-TIME BROADBAND NOISE REDUCTION Robert Hoeldrich and Markus Lorber Institute of Electronic Music Graz Jakoministrasse 3-5, A-8010 Graz, Austria email: robert.hoeldrich@mhsg.ac.at Abstract A real-time

More information

Wavelet Speech Enhancement based on the Teager Energy Operator

Wavelet Speech Enhancement based on the Teager Energy Operator Wavelet Speech Enhancement based on the Teager Energy Operator Mohammed Bahoura and Jean Rouat ERMETIS, DSA, Université du Québec à Chicoutimi, Chicoutimi, Québec, G7H 2B1, Canada. Abstract We propose

More information

Dimension Reduction of the Modulation Spectrogram for Speaker Verification

Dimension Reduction of the Modulation Spectrogram for Speaker Verification Dimension Reduction of the Modulation Spectrogram for Speaker Verification Tomi Kinnunen Speech and Image Processing Unit Department of Computer Science University of Joensuu, Finland Kong Aik Lee and

More information

Automatic Text-Independent. Speaker. Recognition Approaches Using Binaural Inputs

Automatic Text-Independent. Speaker. Recognition Approaches Using Binaural Inputs Automatic Text-Independent Speaker Recognition Approaches Using Binaural Inputs Karim Youssef, Sylvain Argentieri and Jean-Luc Zarader 1 Outline Automatic speaker recognition: introduction Designed systems

More information

Mikko Myllymäki and Tuomas Virtanen

Mikko Myllymäki and Tuomas Virtanen NON-STATIONARY NOISE MODEL COMPENSATION IN VOICE ACTIVITY DETECTION Mikko Myllymäki and Tuomas Virtanen Department of Signal Processing, Tampere University of Technology Korkeakoulunkatu 1, 3370, Tampere,

More information

Perceptually Motivated Linear Prediction Cepstral Features for Network Speech Recognition

Perceptually Motivated Linear Prediction Cepstral Features for Network Speech Recognition Perceptually Motivated Linear Prediction Cepstral Features for Network Speech Recognition Aadel Alatwi, Stephen So, Kuldip K. Paliwal Signal Processing Laboratory Griffith University, Brisbane, QLD, 4111,

More information

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches Performance study of Text-independent Speaker identification system using & I for Telephone and Microphone Speeches Ruchi Chaudhary, National Technical Research Organization Abstract: A state-of-the-art

More information

Speech and Music Discrimination based on Signal Modulation Spectrum.

Speech and Music Discrimination based on Signal Modulation Spectrum. Speech and Music Discrimination based on Signal Modulation Spectrum. Pavel Balabko June 24, 1999 1 Introduction. This work is devoted to the problem of automatic speech and music discrimination. As we

More information

Chapter 4 SPEECH ENHANCEMENT

Chapter 4 SPEECH ENHANCEMENT 44 Chapter 4 SPEECH ENHANCEMENT 4.1 INTRODUCTION: Enhancement is defined as improvement in the value or Quality of something. Speech enhancement is defined as the improvement in intelligibility and/or

More information

Cepstrum alanysis of speech signals

Cepstrum alanysis of speech signals Cepstrum alanysis of speech signals ELEC-E5520 Speech and language processing methods Spring 2016 Mikko Kurimo 1 /48 Contents Literature and other material Idea and history of cepstrum Cepstrum and LP

More information

Separating Voiced Segments from Music File using MFCC, ZCR and GMM

Separating Voiced Segments from Music File using MFCC, ZCR and GMM Separating Voiced Segments from Music File using MFCC, ZCR and GMM Mr. Prashant P. Zirmite 1, Mr. Mahesh K. Patil 2, Mr. Santosh P. Salgar 3,Mr. Veeresh M. Metigoudar 4 1,2,3,4Assistant Professor, Dept.

More information

Automotive three-microphone voice activity detector and noise-canceller

Automotive three-microphone voice activity detector and noise-canceller Res. Lett. Inf. Math. Sci., 005, Vol. 7, pp 47-55 47 Available online at http://iims.massey.ac.nz/research/letters/ Automotive three-microphone voice activity detector and noise-canceller Z. QI and T.J.MOIR

More information

Feature Extraction Using 2-D Autoregressive Models For Speaker Recognition

Feature Extraction Using 2-D Autoregressive Models For Speaker Recognition Feature Extraction Using 2-D Autoregressive Models For Speaker Recognition Sriram Ganapathy 1, Samuel Thomas 1 and Hynek Hermansky 1,2 1 Dept. of ECE, Johns Hopkins University, USA 2 Human Language Technology

More information

Single Channel Speaker Segregation using Sinusoidal Residual Modeling

Single Channel Speaker Segregation using Sinusoidal Residual Modeling NCC 2009, January 16-18, IIT Guwahati 294 Single Channel Speaker Segregation using Sinusoidal Residual Modeling Rajesh M Hegde and A. Srinivas Dept. of Electrical Engineering Indian Institute of Technology

More information

Calibration of Microphone Arrays for Improved Speech Recognition

Calibration of Microphone Arrays for Improved Speech Recognition MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Calibration of Microphone Arrays for Improved Speech Recognition Michael L. Seltzer, Bhiksha Raj TR-2001-43 December 2001 Abstract We present

More information

Speech Enhancement Based On Noise Reduction

Speech Enhancement Based On Noise Reduction Speech Enhancement Based On Noise Reduction Kundan Kumar Singh Electrical Engineering Department University Of Rochester ksingh11@z.rochester.edu ABSTRACT This paper addresses the problem of signal distortion

More information

Design and Implementation on a Sub-band based Acoustic Echo Cancellation Approach

Design and Implementation on a Sub-band based Acoustic Echo Cancellation Approach Vol., No. 6, 0 Design and Implementation on a Sub-band based Acoustic Echo Cancellation Approach Zhixin Chen ILX Lightwave Corporation Bozeman, Montana, USA chen.zhixin.mt@gmail.com Abstract This paper

More information

DERIVATION OF TRAPS IN AUDITORY DOMAIN

DERIVATION OF TRAPS IN AUDITORY DOMAIN DERIVATION OF TRAPS IN AUDITORY DOMAIN Petr Motlíček, Doctoral Degree Programme (4) Dept. of Computer Graphics and Multimedia, FIT, BUT E-mail: motlicek@fit.vutbr.cz Supervised by: Dr. Jan Černocký, Prof.

More information

SYNTHETIC SPEECH DETECTION USING TEMPORAL MODULATION FEATURE

SYNTHETIC SPEECH DETECTION USING TEMPORAL MODULATION FEATURE SYNTHETIC SPEECH DETECTION USING TEMPORAL MODULATION FEATURE Zhizheng Wu 1,2, Xiong Xiao 2, Eng Siong Chng 1,2, Haizhou Li 1,2,3 1 School of Computer Engineering, Nanyang Technological University (NTU),

More information

Voice Activity Detection

Voice Activity Detection Voice Activity Detection Speech Processing Tom Bäckström Aalto University October 2015 Introduction Voice activity detection (VAD) (or speech activity detection, or speech detection) refers to a class

More information

SIGNAL PROCESSING FOR ROBUST SPEECH RECOGNITION MOTIVATED BY AUDITORY PROCESSING CHANWOO KIM

SIGNAL PROCESSING FOR ROBUST SPEECH RECOGNITION MOTIVATED BY AUDITORY PROCESSING CHANWOO KIM SIGNAL PROCESSING FOR ROBUST SPEECH RECOGNITION MOTIVATED BY AUDITORY PROCESSING CHANWOO KIM MAY 21 ABSTRACT Although automatic speech recognition systems have dramatically improved in recent decades,

More information

24 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 1, JANUARY /$ IEEE

24 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 1, JANUARY /$ IEEE 24 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 1, JANUARY 2009 Speech Enhancement, Gain, and Noise Spectrum Adaptation Using Approximate Bayesian Estimation Jiucang Hao, Hagai

More information

Relative phase information for detecting human speech and spoofed speech

Relative phase information for detecting human speech and spoofed speech Relative phase information for detecting human speech and spoofed speech Longbiao Wang 1, Yohei Yoshida 1, Yuta Kawakami 1 and Seiichi Nakagawa 2 1 Nagaoka University of Technology, Japan 2 Toyohashi University

More information

Can binary masks improve intelligibility?

Can binary masks improve intelligibility? Can binary masks improve intelligibility? Mike Brookes (Imperial College London) & Mark Huckvale (University College London) Apparently so... 2 How does it work? 3 Time-frequency grid of local SNR + +

More information

An Improved Voice Activity Detection Based on Deep Belief Networks

An Improved Voice Activity Detection Based on Deep Belief Networks e-issn 2455 1392 Volume 2 Issue 4, April 2016 pp. 676-683 Scientific Journal Impact Factor : 3.468 http://www.ijcter.com An Improved Voice Activity Detection Based on Deep Belief Networks Shabeeba T. K.

More information

Time-Frequency Distributions for Automatic Speech Recognition

Time-Frequency Distributions for Automatic Speech Recognition 196 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 9, NO. 3, MARCH 2001 Time-Frequency Distributions for Automatic Speech Recognition Alexandros Potamianos, Member, IEEE, and Petros Maragos, Fellow,

More information

EXTRACTING a desired speech signal from noisy speech

EXTRACTING a desired speech signal from noisy speech IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 47, NO. 3, MARCH 1999 665 An Adaptive Noise Canceller with Low Signal Distortion for Speech Codecs Shigeji Ikeda and Akihiko Sugiyama, Member, IEEE Abstract

More information

Perceptual Speech Enhancement Using Multi_band Spectral Attenuation Filter

Perceptual Speech Enhancement Using Multi_band Spectral Attenuation Filter Perceptual Speech Enhancement Using Multi_band Spectral Attenuation Filter Sana Alaya, Novlène Zoghlami and Zied Lachiri Signal, Image and Information Technology Laboratory National Engineering School

More information

Monophony/Polyphony Classification System using Fourier of Fourier Transform

Monophony/Polyphony Classification System using Fourier of Fourier Transform International Journal of Electronics Engineering, 2 (2), 2010, pp. 299 303 Monophony/Polyphony Classification System using Fourier of Fourier Transform Kalyani Akant 1, Rajesh Pande 2, and S.S. Limaye

More information

Epoch Extraction From Speech Signals K. Sri Rama Murty and B. Yegnanarayana, Senior Member, IEEE

Epoch Extraction From Speech Signals K. Sri Rama Murty and B. Yegnanarayana, Senior Member, IEEE 1602 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 16, NO. 8, NOVEMBER 2008 Epoch Extraction From Speech Signals K. Sri Rama Murty and B. Yegnanarayana, Senior Member, IEEE Abstract

More information

SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS

SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS AKSHAY CHANDRASHEKARAN ANOOP RAMAKRISHNA akshayc@cmu.edu anoopr@andrew.cmu.edu ABHISHEK JAIN GE YANG ajain2@andrew.cmu.edu younger@cmu.edu NIDHI KOHLI R

More information

Speech Synthesis; Pitch Detection and Vocoders

Speech Synthesis; Pitch Detection and Vocoders Speech Synthesis; Pitch Detection and Vocoders Tai-Shih Chi ( 冀泰石 ) Department of Communication Engineering National Chiao Tung University May. 29, 2008 Speech Synthesis Basic components of the text-to-speech

More information

Audio Fingerprinting using Fractional Fourier Transform

Audio Fingerprinting using Fractional Fourier Transform Audio Fingerprinting using Fractional Fourier Transform Swati V. Sutar 1, D. G. Bhalke 2 1 (Department of Electronics & Telecommunication, JSPM s RSCOE college of Engineering Pune, India) 2 (Department,

More information

works must be obtained from the IEE

works must be obtained from the IEE Title A filtered-x LMS algorithm for sinu Effects of frequency mismatch Author(s) Hinamoto, Y; Sakai, H Citation IEEE SIGNAL PROCESSING LETTERS (200 262 Issue Date 2007-04 URL http://hdl.hle.net/2433/50542

More information

SIMULATION VOICE RECOGNITION SYSTEM FOR CONTROLING ROBOTIC APPLICATIONS

SIMULATION VOICE RECOGNITION SYSTEM FOR CONTROLING ROBOTIC APPLICATIONS SIMULATION VOICE RECOGNITION SYSTEM FOR CONTROLING ROBOTIC APPLICATIONS 1 WAHYU KUSUMA R., 2 PRINCE BRAVE GUHYAPATI V 1 Computer Laboratory Staff., Department of Information Systems, Gunadarma University,

More information

Voiced/nonvoiced detection based on robustness of voiced epochs

Voiced/nonvoiced detection based on robustness of voiced epochs Voiced/nonvoiced detection based on robustness of voiced epochs by N. Dhananjaya, B.Yegnanarayana in IEEE Signal Processing Letters, 17, 3 : 273-276 Report No: IIIT/TR/2010/50 Centre for Language Technologies

More information
