LEVERAGING JOINTLY SPATIAL, TEMPORAL AND MODULATION ENHANCEMENT IN CREATING NOISE-ROBUST FEATURES FOR SPEECH RECOGNITION


1 HSIN-JU HSIEH, 2 HAO-TENG FAN, 3 JEIH-WEIH HUNG
1,2,3 Dept. of Electrical Engineering, National Chi Nan University, Taiwan, Republic of China
1 s @mail1.ncnu.edu.tw, 2 s @ncnu.edu.tw, 3 jwhung@ncnu.edu.tw

Abstract- This paper proposes adopting various fusion schemes of spatial-, temporal- and modulation-domain speech feature enhancement techniques in order to achieve superior speech recognition performance in noise-corrupted environments. With the mel-frequency cepstral coefficients (MFCC) as the standard speech feature representation, the spatial-domain techniques involve short-time intra-frame feature enhancement, the temporal-domain techniques compensate for the noise distortion that exists in the long-term inter-frame MFCC time stream, and the modulation-domain techniques operate on the Fourier transform of an MFCC time stream. The evaluation experiments conducted on the connected-digit Aurora-2 database reveal that each of the spatial/temporal enhancement techniques adopted here performs better than the unprocessed MFCC baseline, and that integrating the respective spatial-, temporal- and modulation-domain techniques can yield even better recognition accuracy than any individual component method under a wide range of noise-corrupted environments. These results clearly demonstrate that the techniques in the three domains treat noise in different aspects and are therefore complementary to each other.

Keywords- Noise Robustness, Speech Recognition, Spatial Processing, Temporal Processing, Modulation Domain.

I. INTRODUCTION

Most state-of-the-art automatic speech recognition (ASR) systems perform well in a controlled laboratory environment. However, their performance usually degrades dramatically when they are applied outside the laboratory in real-world applications.
The performance degradation is often caused by interfering sources and distortions, usually termed environment variability. This variability and the resulting environmental mismatch between the development and application situations may be caused by additive noise, channel distortion, different speaker characteristics, etc. To alleviate this mismatch, a great number of robustness algorithms have been proposed, thereby broadening the application field of speech recognition. These robustness algorithms can be roughly classified into three schools: signal enhancement, feature compensation and model adaptation.

First, signal enhancement aims to improve the quality and intelligibility of speech signals. The corresponding techniques include spectral subtraction (SS) [1]-[3], short-time spectral amplitude estimation based on the minimum mean-squared error criterion (MMSE-STSA) [4], MMSE-based log-spectral amplitude estimation (MMSE log-STSA) [5], Wiener filtering [6, 7], Kalman filtering [8], modulation spectral subtraction (ModSpecSub) [9] and the minimum mean-square error short-time spectral modulation magnitude estimator (MME) [10], just to name a few.

Next, the general purpose of feature compensation is to build a speech feature representation that is robust to noise. Most of these methods focus on refining the conventional speech features, such as linear predictive coefficients (LPC) [11], mel-frequency cepstral coefficients (MFCC) [12] and perceptual linear prediction (PLP) [13], which behave well in clean, noise-free situations but are vulnerable to noise/interference.
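As a concrete illustration of the signal-enhancement school mentioned above, the core of basic magnitude spectral subtraction [1] can be sketched as follows. This is a simplified single-frame sketch, not the exact algorithm of any cited work; the function name, the `floor` parameter and its value are our illustrative choices.

```python
import numpy as np

def spectral_subtraction(noisy, noise_est, floor=0.02):
    """Basic magnitude spectral subtraction on one analysis frame.

    noisy     : complex STFT coefficients of a noisy-speech frame
    noise_est : estimated noise magnitude spectrum (e.g., averaged over
                leading non-speech frames)
    floor     : spectral floor that limits musical-noise artifacts
    """
    mag, phase = np.abs(noisy), np.angle(noisy)
    clean_mag = mag - noise_est                      # subtract the noise estimate
    clean_mag = np.maximum(clean_mag, floor * mag)   # half-wave rectify with a floor
    return clean_mag * np.exp(1j * phase)            # re-attach the noisy phase
```

In practice the frame would come from an STFT of the noisy waveform, and the enhanced frames would be overlap-added back into a time signal.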
One primary direction of this category is to compensate the statistics of temporal feature streams; popular feature statistics compensation methods include cepstral mean subtraction (CMS) [14], mean and variance normalization (MVN) [15], cepstral histogram equalization (CHN) [16], higher-order cepstral moment normalization (HOCMN) [17] and cepstral shape normalization (CSN) [18].

Finally, the last school of methods, including parallel model combination (PMC) [19], speech and noise decomposition (SND) [20], vector Taylor series (VTS) [21], maximum a posteriori (MAP) [22], maximum likelihood linear regression (MLLR) [23], statistical re-estimation (STAR) and maximum mutual information (MMI) [24, 25], etc., focuses on tuning the acoustic models in the recognizer with respect to the noise conditions of the application. These methods take the noise characteristics into account within the recognition procedure rather than eliminating the noise effect in the input signals/features.

In recent years, our research group has focused on developing noise-robustness techniques that primarily fall into the category of feature compensation mentioned earlier. In particular, these techniques enhance the widely used MFCC speech features from different perspectives, namely the temporal, spatial and modulation domains. Therefore, in this paper we explore the effectiveness of pairing any two developed techniques that dwell in different domains, and we investigate whether such a pairing results in better

performance than each individual component technique. According to recognition experiments conducted on the well-known Aurora-2 database and task [26], in most cases the noise robustness algorithms in different domains benefit one another and accordingly produce even more noise-robust speech features. These results further show that the presented algorithms in different domains deal with different traits of the noise effect, so using them together can further alleviate the degradation of MFCC speech features caused by noise.

Figure 1. Three domains of MFCC features and the respective robustness algorithms

The remainder of the paper is organized as follows: Section II briefly reviews the various noise robustness algorithms in the three different domains that we have presented. The experimental setup is provided in Section III, and Section IV gives the detailed experimental results for the various integrations of any two algorithms, together with the corresponding discussions. Finally, Section V contains concluding remarks and future work.

II. REVIEW OF THE ROBUSTNESS ALGORITHMS

2.1 Spatial-domain techniques

The spatial-domain techniques mentioned here take into consideration the mutual correlation among the intra-frame MFCC features; the associated algorithms we developed in [27] are the weighted spatial MVN and spatial HEQ, abbreviated WS-MVN and WS-HEQ, respectively. Briefly speaking, WS-MVN and WS-HEQ adopt the idea of S-HEQ [28] and divide the MFCC features within every individual frame into low and high sub-bands. The two sub-bands are then weighted according to their relative influence on recognition accuracy, and finally the weighted sub-band MFCC time sequences are enhanced by either MVN or HEQ.
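The weighted sub-band idea can be sketched as follows. This is a simplified illustration, not the exact WS-MVN algorithm of [27]: the split index and the sub-band weights below are illustrative placeholders (the actual weights are tuned as described in [27]), and plain per-dimension MVN is used as the final normalization stage.

```python
import numpy as np

def ws_mvn(mfcc, split=6, w_low=1.0, w_high=0.5):
    """Simplified weighted sub-band MVN in the spirit of WS-MVN [27].

    mfcc  : (T, D) matrix of MFCC vectors (T frames, D coefficients)
    split : cepstral index dividing the low/high intra-frame sub-bands
    w_low, w_high : illustrative sub-band weights

    Each intra-frame sub-band is weighted, then every cepstral dimension
    is mean/variance-normalized along the time axis.
    """
    x = mfcc.astype(float).copy()
    x[:, :split] *= w_low        # low intra-frame sub-band
    x[:, split:] *= w_high       # high intra-frame sub-band
    mu = x.mean(axis=0)
    sd = x.std(axis=0) + 1e-8    # guard against division by zero
    return (x - mu) / sd
```

WS-HEQ follows the same weighting scheme but replaces the final MVN step with histogram equalization of each cepstral dimension.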
We have shown in [27] that WS-MVN and WS-HEQ behave better than S-HEQ and provide promising results in promoting the robustness of MFCC features, and they reveal good additive properties when applied together with well-known robustness algorithms such as MVN plus ARMA filtering (MVA) [29] and temporal structure normalization (TSN) [30].

2.2 Temporal-domain techniques

As pointed out in the introduction, the well-known CMS, MVN and CHN process the temporal MFCC sequences by compensating the associated statistics. Here we introduce three other temporal processing algorithms that we developed in [31, 32]: cepstral wavelet denoising (WD), sub-band temporal MVN (SB-TMVN) and sub-band temporal HEQ (SB-THEQ). All three algorithms take advantage of the discrete wavelet transform (DWT) to split each cepstral temporal sequence into several sub-bands (approximation and detail parts). Then WD applies a thresholding scheme to remove the relatively small-valued components in each individual sub-band, while SB-TMVN and SB-THEQ adopt a statistics compensation procedure to normalize the mean, variance or histogram of each sub-band temporal sequence. It has been shown that SB-TMVN and SB-THEQ outperform their full-band counterparts, viz. MVN [15] and CHN [16], and that WD behaves better than the conventional wavelet threshold denoising algorithm that operates on the speech waveform directly.

2.3 Modulation-domain techniques

By applying the Fourier transform to an MFCC temporal sequence, the corresponding modulation spectrum can be obtained. The noise effect can be clearly observed in the cepstral modulation spectrum, and thus the respective modulation-domain robustness techniques are developed to compensate the modulation spectrum directly. Our recent research has come up with a series of robustness algorithms

in the modulation domain, and some of them are sub-band modulation spectral MVN (SB-MSMVN) [33], sub-band modulation spectral HEQ (SB-MSHEQ) [33] and modulation spectrum power-law expansion (MSPLE) [34]. Briefly speaking, SB-MSMVN and SB-MSHEQ first split the magnitude component of the cepstral modulation spectrum into several segments (i.e., sub-bands), and then employ MVN and HEQ to compensate the statistics of each segment. Besides, MSPLE applies a power-law operation to the entire magnitude modulation spectrum in order to highlight the lower-frequency components, which are commonly viewed as more beneficial for speech recognition than the higher-frequency components. SB-MSMVN and SB-MSHEQ have been shown to outperform their full-band counterparts, and we have demonstrated that a simple power-law operation as in MSPLE can improve the recognition accuracy significantly.

III. EXPERIMENTAL SETUP

The efficacy of the various integrations of the robustness techniques mentioned in Section II was evaluated on the noisy Aurora-2 database [26]. Briefly speaking, Aurora-2 is a subset of TIDIGITS, which consists of speech signals uttered by US adults. The task associated with Aurora-2 is to recognize connected digit utterances interfered with various noise sources at different signal-to-noise ratios (SNRs). In the mode of clean-condition training plus multi-condition testing, the acoustic models are trained on 8,440 clean noise-free utterances, and the testing data is divided into three sets: Test Sets A and B contain utterances corrupted by additive noise, and Test Set C is composed of utterances with both additive noise and channel distortion. There are eight noise types in total and two channel characteristics. Furthermore, the acoustic model for each digit in the Aurora-2 task is a left-to-right continuous-density HMM with 16 states, each of which is a 3-mixture GMM. Regarding speech feature extraction, each utterance of the training and testing sets was represented by a series of 13 static features augmented with their first- and second-order delta coefficients, resulting in a 39-dimensional MFCC feature vector. The training and recognition tests used the HTK recognition toolkit [35], following the setup originally defined for the ETSI evaluations [26].

IV. EXPERIMENTAL RESULTS AND DISCUSSIONS

4.1 The pairing of temporal- and modulation-domain techniques

At the outset, we evaluate the mode in which the original MFCC features are first enhanced by the well-known temporal-domain method, temporal MVN (T-MVN), and then further processed by the presented modulation-domain techniques, SB-MSMVN, SB-MSHEQ and MSPLE. The resulting accuracy rates are summarized in Figure 2. From this figure, several findings can be made:
1. T-MVN brings a significant accuracy improvement over the baseline. However, it behaves worse than SB-MSMVN (77.96% in accuracy) and SB-MSHEQ (83.85%).
2. With T-MVN as the pre-processing method, the performance of SB-MSMVN and MSPLE is further promoted, while SB-MSHEQ drops slightly, possibly due to over-normalization.
3. Adding MSPLE to T-MVN benefits T-MVN considerably, providing an absolute accuracy improvement of 7.16%. This result implies that T-MVN plus MSPLE is well suited for application due to its computational efficiency together with good performance.

Figure 2. The averaged recognition accuracy rates for one temporal-domain method, T-MVN, and three modulation-domain methods, MSPLE, SB-MSMVN and SB-MSHEQ, together with some possible types of integration.

Figure 3. The averaged recognition accuracy rates for one spatial-domain method, WS-HEQ, and three temporal-domain methods, WD, SB-TMVN and SB-THEQ, together with some possible types of integration.
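The MSPLE operation referenced above admits a very compact sketch on an MFCC matrix. This is illustrative only: the exponent value below is a placeholder (the actual power-law setting is chosen as described in [34]), and the phase of the modulation spectrum is simply kept unchanged.

```python
import numpy as np

def msple(mfcc, gamma=1.1):
    """Sketch of modulation spectrum power-law expansion (cf. MSPLE [34]).

    mfcc  : (T, D) MFCC matrix; each column is one cepstral time series
    gamma : power-law exponent (illustrative value)

    The magnitude modulation spectrum of each cepstral sequence is raised
    to the power gamma (phase kept), then transformed back to a feature
    stream of the original length.
    """
    spec = np.fft.rfft(mfcc, axis=0)            # modulation spectrum per dimension
    mag, phase = np.abs(spec), np.angle(spec)
    new_spec = (mag ** gamma) * np.exp(1j * phase)   # power-law expansion of magnitude
    return np.fft.irfft(new_spec, n=mfcc.shape[0], axis=0)
```

With gamma = 1 the operation reduces to the identity, which makes the power-law step easy to sanity-check.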
4.2 The pairing of spatial- and temporal-domain techniques

Next, we present the integration of the spatial-domain method, WS-HEQ, with each of the three temporal-domain methods, WD, SB-TMVN and SB-THEQ. The corresponding evaluation results are shown in Figure 3. From this figure, we have several observations:
1. Every integration gives rise to an additive effect and shows superior accuracy rates in comparison with any individual component method. For example, "WS-HEQ plus SB-TMVN" (85.80% in accuracy) outperforms WS-HEQ (84.99%) and SB-TMVN (80.62%). Therefore, it is evident that the temporal-domain techniques can further improve the discriminability of the speech features and reduce the noise distortion left by WS-HEQ.
2. Among the three temporal-domain methods, SB-TMVN is the most effective partner for WS-HEQ, providing the optimal accuracy. Despite the fact that SB-THEQ outperforms SB-TMVN in isolation, combining SB-THEQ with WS-HEQ possibly results in over-compensation and lower accuracy rates. This result also indicates that the simpler SB-TMVN (relative to SB-THEQ) can behave better when integrated with WS-HEQ.

4.3 The pairing of spatial- and modulation-domain techniques

Finally, we explore the performance of fusing WS-HEQ with each of the three modulation-domain methods, SB-MSMVN, SB-MSHEQ and MSPLE. The respective recognition accuracy rates, averaged over all noise types and levels in the three test sets, are shown in Figure 4. This figure reveals that:
1. Most of the combinative procedures produce better results than the individual component methods. For example, the integration of WS-HEQ and SB-MSMVN (85.72% in accuracy) behaves better than WS-HEQ (84.99%) or SB-MSMVN (77.96%) alone.
2. When the feature sequences are pre-processed by WS-HEQ, the three modulation-domain methods used here behave very close to one another. This implies that MSPLE is a better choice among the three for integration with WS-HEQ, since it has the simplest computation and achieves nearly optimal recognition accuracy in the integration.
3. Compared with the two sub-band modulation-domain methods, MSPLE only processes the lower band and can still promote the recognition performance as well. In particular, further compensating the WS-HEQ-processed features with MSPLE can result in better performance, again evidencing the robustness capability of MSPLE.
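All of the fusions evaluated above amount to cascading component enhancers on the MFCC feature stream. The following sketch composes two simplified stand-ins, plain per-dimension MVN [15] for the feature-domain stage and a power-law modulation stage in the spirit of MSPLE [34]; the function names and the gamma value are ours, not the exact algorithms evaluated in this paper.

```python
import numpy as np

def mvn(feats):
    """Stand-in feature-domain stage: per-dimension mean/variance
    normalization along the time axis (cf. MVN [15])."""
    mu = feats.mean(axis=0)
    sd = feats.std(axis=0) + 1e-8   # guard against division by zero
    return (feats - mu) / sd

def mod_power_law(feats, gamma=1.1):
    """Stand-in modulation-domain stage: power-law expansion of the
    magnitude modulation spectrum (cf. MSPLE [34]); gamma illustrative."""
    spec = np.fft.rfft(feats, axis=0)
    out = (np.abs(spec) ** gamma) * np.exp(1j * np.angle(spec))
    return np.fft.irfft(out, n=feats.shape[0], axis=0)

def fused_frontend(mfcc):
    """Cascade the two stages, mirroring the integrations above."""
    return mod_power_law(mvn(mfcc))
```

Because both stages are non-linear, swapping their order generally changes the result, which is exactly the ordering effect raised as future work in the conclusions.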
V. CONCLUSIONS AND FUTURE WORK

In this paper, we have demonstrated the preferable recognition performance obtained by integrating robustness techniques from different domains for MFCC features. Most types of integration are simple to implement and very applicable in real-world scenarios. In the near future, we are going to adopt the speech features enhanced by the presented architectures in state-of-the-art deep neural network (DNN) models to evaluate the respective performance. Another direction is to alter the implementation order of the component techniques in the integration to see the corresponding effect, since these techniques are mostly non-linear operations on the original features.

Figure 4. The averaged recognition accuracy rates for one spatial-domain method, WS-HEQ, and three modulation-domain methods, MSPLE, SB-MSMVN and SB-MSHEQ, together with some possible types of integration.

REFERENCES

[1] S. Boll, Suppression of acoustic noise in speech using spectral subtraction, IEEE Transactions on Acoustics, Speech and Signal Processing, 27(2).
[2] M. Berouti, R. Schwartz and J. Makhoul, Enhancement of speech corrupted by acoustic noise, in Proceedings of the Signal Processing.
[3] S. Kamath and P. Loizou, A multi-band spectral subtraction method for enhancing speech corrupted by colored noise, in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. IV-4164, 2002.
[4] Y. Ephraim and D. Malah, Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator, IEEE Transactions on Acoustics, Speech and Signal Processing, 32(6).
[5] Y. Ephraim and D. Malah, Speech enhancement using a minimum mean-square error log-spectral amplitude estimator, IEEE Transactions on Acoustics, Speech and Signal Processing, 33(2).
[6] C. Plapous, C. Marro and P. Scalart, Improved signal-to-noise ratio estimation for speech enhancement, IEEE Transactions on Audio, Speech and Language Processing, 14(6).
[7] P. Scalart and J. V. Filho, Speech enhancement based on a priori signal to noise estimation, in Proceedings of the Signal Processing.
[8] V. Grancharov and J. S. B. Kleijn, On causal algorithms for speech enhancement, IEEE Transactions on Audio, Speech, and Language Processing, 14(3).
[9] K. Paliwal, K. Wojcicki and B. Schwerin, Single-channel speech enhancement using spectral subtraction in the short-time modulation domain, Speech Communication, 52(5).
[10] K. Paliwal, B. Schwerin and K. Wojcicki, Speech enhancement using minimum mean-square error short-time spectral modulation magnitude estimator, Speech Communication, 54(2).
[11] B. S. Atal, The history of linear prediction, IEEE Signal Processing Magazine, 23(2).
[12] S. B. Davis and P. Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Transactions on Acoustics, Speech and Signal Processing, 28(4).
[13] H. Hermansky, Perceptual linear predictive (PLP) analysis of speech, Journal of the Acoustical Society of America, 87(4), 1990.

[14] S. Furui, Cepstral analysis technique for automatic speaker verification, IEEE Transactions on Acoustics, Speech and Signal Processing, 29(2).
[15] S. Tibrewala and H. Hermansky, Multiband and adaptation approaches to robust speech recognition, in Proceedings of the Eurospeech Conference on Speech Communications and Technology.
[16] F. Hilger and H. Ney, Quantile based histogram equalization for noise robust large vocabulary speech recognition, IEEE Transactions on Audio, Speech, and Language Processing, 14(3).
[17] C.-W. Hsu and L.-S. Lee, Higher order cepstral moment normalization for improved robust speech recognition, IEEE Transactions on Audio, Speech, and Language Processing, 17(2).
[18] J. Du and R.-H. Wang, Cepstral shape normalization (CSN) for robust speech recognition, in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing.
[19] J.-W. Hung, J.-L. Shen and L.-S. Lee, New approaches for domain transformation and parameter combination for improved accuracy in parallel model combination (PMC) techniques, IEEE Transactions on Speech and Audio Processing, 9(8).
[20] J. H. Holmes and N. C. Sedgwick, Noise compensation for speech recognition using probabilistic models, in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing.
[21] A. Acero, L. Deng, T. Kristjansson and J. Zhang, HMM adaptation using vector Taylor series for noisy speech recognition, in Proceedings of the International Conference on Spoken Language Processing.
[22] J.-L. Gauvain and C.-H. Lee, Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, IEEE Transactions on Speech and Audio Processing, 2(2).
[23] C. J. Leggetter and P. C. Woodland, Maximum likelihood linear regression for speaker adaptation of continuous density HMMs, Computer Speech and Language, 9(2).
[24] M. J. F. Gales and S. J. Young, Cepstral parameter compensation for HMM recognition in noise, Speech Communication, 12(3).
[25] L. Bahl, P. Brown, P. de Souza and R. Mercer, Maximum mutual information estimation of hidden Markov model parameters for speech recognition, in Proceedings of the Signal Processing.
[26] H. G. Hirsch and D. Pearce, The AURORA experimental framework for the performance evaluation of speech recognition systems under noisy conditions, in Proceedings of the 2000 Automatic Speech Recognition: Challenges for the New Millennium.
[27] J.-W. Hung and H.-T. Fan, Intra-frame cepstral sub-band weighting and histogram equalization for noise-robust speech recognition, EURASIP Journal on Audio, Speech, and Music Processing, 2013:29, Dec. 2013.
[28] V. Joshi, R. Bilgi, S. Umesh, L. Garcia and M. C. Benitez, Sub-band level histogram equalization for robust speech recognition, in Proceedings of the International Conference on Spoken Language Processing, 2011.
[29] C.-P. Chen and J. Bilmes, MVA processing of speech features, IEEE Transactions on Audio, Speech, and Language Processing, 15(1), 2007.
[30] X. Xiao, E. S. Chng and H. Li, Normalization of the speech modulation spectra for robust speech recognition, IEEE Transactions on Audio, Speech, and Language Processing, 16(8).
[31] H.-T. Fan, J.-Y. Lee, J.-W. Hung and I-C. Lu, Leveraging wavelet de-noising in temporal sequences of speech features for noise-robust speech recognition, in Proceedings of the International Conference on Intelligent Information Processing (ICIIP).
[32] J.-W. Hung and H.-T. Fan, Subband feature statistics normalization techniques based on a discrete wavelet transform for robust speech recognition, IEEE Signal Processing Letters, June 2009.
[33] W.-H. Tu, S.-Y. Huang and J.-W. Hung, Sub-band modulation spectrum compensation for robust speech recognition, in 2009 Automatic Speech Recognition and Understanding Workshop, Dec. 2009.
[34] H.-T. Fan, Z.-H. Ye and J.-W. Hung, Modulation spectrum power-law expansion for robust speech recognition, in Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Oct.
[35]

More information

Chapter 4 SPEECH ENHANCEMENT

Chapter 4 SPEECH ENHANCEMENT 44 Chapter 4 SPEECH ENHANCEMENT 4.1 INTRODUCTION: Enhancement is defined as improvement in the value or Quality of something. Speech enhancement is defined as the improvement in intelligibility and/or

More information

Noise Estimation and Noise Removal Techniques for Speech Recognition in Adverse Environment

Noise Estimation and Noise Removal Techniques for Speech Recognition in Adverse Environment Noise Estimation and Noise Removal Techniques for Speech Recognition in Adverse Environment Urmila Shrawankar 1,3 and Vilas Thakare 2 1 IEEE Student Member & Research Scholar, (CSE), SGB Amravati University,

More information

JOINT NOISE AND MASK AWARE TRAINING FOR DNN-BASED SPEECH ENHANCEMENT WITH SUB-BAND FEATURES

JOINT NOISE AND MASK AWARE TRAINING FOR DNN-BASED SPEECH ENHANCEMENT WITH SUB-BAND FEATURES JOINT NOISE AND MASK AWARE TRAINING FOR DNN-BASED SPEECH ENHANCEMENT WITH SUB-BAND FEATURES Qing Wang 1, Jun Du 1, Li-Rong Dai 1, Chin-Hui Lee 2 1 University of Science and Technology of China, P. R. China

More information

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches Performance study of Text-independent Speaker identification system using & I for Telephone and Microphone Speeches Ruchi Chaudhary, National Technical Research Organization Abstract: A state-of-the-art

More information

CNMF-BASED ACOUSTIC FEATURES FOR NOISE-ROBUST ASR

CNMF-BASED ACOUSTIC FEATURES FOR NOISE-ROBUST ASR CNMF-BASED ACOUSTIC FEATURES FOR NOISE-ROBUST ASR Colin Vaz 1, Dimitrios Dimitriadis 2, Samuel Thomas 2, and Shrikanth Narayanan 1 1 Signal Analysis and Interpretation Lab, University of Southern California,

More information

Speech Enhancement Using a Mixture-Maximum Model

Speech Enhancement Using a Mixture-Maximum Model IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 10, NO. 6, SEPTEMBER 2002 341 Speech Enhancement Using a Mixture-Maximum Model David Burshtein, Senior Member, IEEE, and Sharon Gannot, Member, IEEE

More information

Mel Spectrum Analysis of Speech Recognition using Single Microphone

Mel Spectrum Analysis of Speech Recognition using Single Microphone International Journal of Engineering Research in Electronics and Communication Mel Spectrum Analysis of Speech Recognition using Single Microphone [1] Lakshmi S.A, [2] Cholavendan M [1] PG Scholar, Sree

More information

IMPROVEMENTS TO THE IBM SPEECH ACTIVITY DETECTION SYSTEM FOR THE DARPA RATS PROGRAM

IMPROVEMENTS TO THE IBM SPEECH ACTIVITY DETECTION SYSTEM FOR THE DARPA RATS PROGRAM IMPROVEMENTS TO THE IBM SPEECH ACTIVITY DETECTION SYSTEM FOR THE DARPA RATS PROGRAM Samuel Thomas 1, George Saon 1, Maarten Van Segbroeck 2 and Shrikanth S. Narayanan 2 1 IBM T.J. Watson Research Center,

More information

PERFORMANCE ANALYSIS OF SPEECH SIGNAL ENHANCEMENT TECHNIQUES FOR NOISY TAMIL SPEECH RECOGNITION

PERFORMANCE ANALYSIS OF SPEECH SIGNAL ENHANCEMENT TECHNIQUES FOR NOISY TAMIL SPEECH RECOGNITION Journal of Engineering Science and Technology Vol. 12, No. 4 (2017) 972-986 School of Engineering, Taylor s University PERFORMANCE ANALYSIS OF SPEECH SIGNAL ENHANCEMENT TECHNIQUES FOR NOISY TAMIL SPEECH

More information

24 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 1, JANUARY /$ IEEE

24 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 1, JANUARY /$ IEEE 24 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 1, JANUARY 2009 Speech Enhancement, Gain, and Noise Spectrum Adaptation Using Approximate Bayesian Estimation Jiucang Hao, Hagai

More information

IMPROVING WIDEBAND SPEECH RECOGNITION USING MIXED-BANDWIDTH TRAINING DATA IN CD-DNN-HMM

IMPROVING WIDEBAND SPEECH RECOGNITION USING MIXED-BANDWIDTH TRAINING DATA IN CD-DNN-HMM IMPROVING WIDEBAND SPEECH RECOGNITION USING MIXED-BANDWIDTH TRAINING DATA IN CD-DNN-HMM Jinyu Li, Dong Yu, Jui-Ting Huang, and Yifan Gong Microsoft Corporation, One Microsoft Way, Redmond, WA 98052 ABSTRACT

More information

SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS

SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS AKSHAY CHANDRASHEKARAN ANOOP RAMAKRISHNA akshayc@cmu.edu anoopr@andrew.cmu.edu ABHISHEK JAIN GE YANG ajain2@andrew.cmu.edu younger@cmu.edu NIDHI KOHLI R

More information

Robust Speaker Identification for Meetings: UPC CLEAR 07 Meeting Room Evaluation System

Robust Speaker Identification for Meetings: UPC CLEAR 07 Meeting Room Evaluation System Robust Speaker Identification for Meetings: UPC CLEAR 07 Meeting Room Evaluation System Jordi Luque and Javier Hernando Technical University of Catalonia (UPC) Jordi Girona, 1-3 D5, 08034 Barcelona, Spain

More information

Speech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction

Speech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 7, Issue, Ver. I (Mar. - Apr. 7), PP 4-46 e-issn: 9 4, p-issn No. : 9 497 www.iosrjournals.org Speech Enhancement Using Spectral Flatness Measure

More information

IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 19, NO. 8, NOVEMBER

IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 19, NO. 8, NOVEMBER IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 19, NO. 8, NOVEMBER 2011 2439 Transcribing Mandarin Broadcast Speech Using Multi-Layer Perceptron Acoustic Features Fabio Valente, Member,

More information

Performance Analysis of MFCC and LPCC Techniques in Automatic Speech Recognition

Performance Analysis of MFCC and LPCC Techniques in Automatic Speech Recognition www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume - 3 Issue - 8 August, 2014 Page No. 7727-7732 Performance Analysis of MFCC and LPCC Techniques in Automatic

More information

Advances in Applied and Pure Mathematics

Advances in Applied and Pure Mathematics Enhancement of speech signal based on application of the Maximum a Posterior Estimator of Magnitude-Squared Spectrum in Stationary Bionic Wavelet Domain MOURAD TALBI, ANIS BEN AICHA 1 mouradtalbi196@yahoo.fr,

More information

Robust Low-Resource Sound Localization in Correlated Noise

Robust Low-Resource Sound Localization in Correlated Noise INTERSPEECH 2014 Robust Low-Resource Sound Localization in Correlated Noise Lorin Netsch, Jacek Stachurski Texas Instruments, Inc. netsch@ti.com, jacek@ti.com Abstract In this paper we address the problem

More information

IMPROVING MICROPHONE ARRAY SPEECH RECOGNITION WITH COCHLEAR IMPLANT-LIKE SPECTRALLY REDUCED SPEECH

IMPROVING MICROPHONE ARRAY SPEECH RECOGNITION WITH COCHLEAR IMPLANT-LIKE SPECTRALLY REDUCED SPEECH RESEARCH REPORT IDIAP IMPROVING MICROPHONE ARRAY SPEECH RECOGNITION WITH COCHLEAR IMPLANT-LIKE SPECTRALLY REDUCED SPEECH Cong-Thanh Do Mohammad J. Taghizadeh Philip N. Garner Idiap-RR-40-2011 DECEMBER

More information

Performance Analysiss of Speech Enhancement Algorithm for Robust Speech Recognition System

Performance Analysiss of Speech Enhancement Algorithm for Robust Speech Recognition System Performance Analysiss of Speech Enhancement Algorithm for Robust Speech Recognition System C.GANESH BABU 1, Dr.P..T.VANATHI 2 R.RAMACHANDRAN 3, M.SENTHIL RAJAA 3, R.VENGATESH 3 1 Research Scholar (PSGCT)

More information

Adaptive Speech Enhancement Using Partial Differential Equations and Back Propagation Neural Networks

Adaptive Speech Enhancement Using Partial Differential Equations and Back Propagation Neural Networks Australian Journal of Basic and Applied Sciences, 4(7): 2093-2098, 2010 ISSN 1991-8178 Adaptive Speech Enhancement Using Partial Differential Equations and Back Propagation Neural Networks 1 Mojtaba Bandarabadi,

More information

Performance analysis of voice activity detection algorithm for robust speech recognition system under different noisy environment

Performance analysis of voice activity detection algorithm for robust speech recognition system under different noisy environment BABU et al: VOICE ACTIVITY DETECTION ALGORITHM FOR ROBUST SPEECH RECOGNITION SYSTEM Journal of Scientific & Industrial Research Vol. 69, July 2010, pp. 515-522 515 Performance analysis of voice activity

More information

DERIVATION OF TRAPS IN AUDITORY DOMAIN

DERIVATION OF TRAPS IN AUDITORY DOMAIN DERIVATION OF TRAPS IN AUDITORY DOMAIN Petr Motlíček, Doctoral Degree Programme (4) Dept. of Computer Graphics and Multimedia, FIT, BUT E-mail: motlicek@fit.vutbr.cz Supervised by: Dr. Jan Černocký, Prof.

More information

Relative phase information for detecting human speech and spoofed speech

Relative phase information for detecting human speech and spoofed speech Relative phase information for detecting human speech and spoofed speech Longbiao Wang 1, Yohei Yoshida 1, Yuta Kawakami 1 and Seiichi Nakagawa 2 1 Nagaoka University of Technology, Japan 2 Toyohashi University

More information

DWT and LPC based feature extraction methods for isolated word recognition

DWT and LPC based feature extraction methods for isolated word recognition RESEARCH Open Access DWT and LPC based feature extraction methods for isolated word recognition Navnath S Nehe 1* and Raghunath S Holambe 2 Abstract In this article, new feature extraction methods, which

More information

Robust Voice Activity Detection Based on Discrete Wavelet. Transform

Robust Voice Activity Detection Based on Discrete Wavelet. Transform Robust Voice Activity Detection Based on Discrete Wavelet Transform Kun-Ching Wang Department of Information Technology & Communication Shin Chien University kunching@mail.kh.usc.edu.tw Abstract This paper

More information

RECENTLY, there has been an increasing interest in noisy

RECENTLY, there has been an increasing interest in noisy IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 52, NO. 9, SEPTEMBER 2005 535 Warped Discrete Cosine Transform-Based Noisy Speech Enhancement Joon-Hyuk Chang, Member, IEEE Abstract In

More information

A New Framework for Supervised Speech Enhancement in the Time Domain

A New Framework for Supervised Speech Enhancement in the Time Domain Interspeech 2018 2-6 September 2018, Hyderabad A New Framework for Supervised Speech Enhancement in the Time Domain Ashutosh Pandey 1 and Deliang Wang 1,2 1 Department of Computer Science and Engineering,

More information

Comparison of Spectral Analysis Methods for Automatic Speech Recognition

Comparison of Spectral Analysis Methods for Automatic Speech Recognition INTERSPEECH 2013 Comparison of Spectral Analysis Methods for Automatic Speech Recognition Venkata Neelima Parinam, Chandra Vootkuri, Stephen A. Zahorian Department of Electrical and Computer Engineering

More information

Speech Enhancement: Reduction of Additive Noise in the Digital Processing of Speech

Speech Enhancement: Reduction of Additive Noise in the Digital Processing of Speech Speech Enhancement: Reduction of Additive Noise in the Digital Processing of Speech Project Proposal Avner Halevy Department of Mathematics University of Maryland, College Park ahalevy at math.umd.edu

More information

Electronic disguised voice identification based on Mel- Frequency Cepstral Coefficient analysis

Electronic disguised voice identification based on Mel- Frequency Cepstral Coefficient analysis International Journal of Scientific and Research Publications, Volume 5, Issue 11, November 2015 412 Electronic disguised voice identification based on Mel- Frequency Cepstral Coefficient analysis Shalate

More information

Online Blind Channel Normalization Using BPF-Based Modulation Frequency Filtering

Online Blind Channel Normalization Using BPF-Based Modulation Frequency Filtering Online Blind Channel Normalization Using BPF-Based Modulation Frequency Filtering Yun-Kyung Lee, o-young Jung, and Jeon Gue Par We propose a new bandpass filter (BPF)-based online channel normalization

More information

Spectral estimation using higher-lag autocorrelation coefficients with applications to speech recognition

Spectral estimation using higher-lag autocorrelation coefficients with applications to speech recognition Spectral estimation using higher-lag autocorrelation coefficients with applications to speech recognition Author Shannon, Ben, Paliwal, Kuldip Published 25 Conference Title The 8th International Symposium

More information

Feature Extraction Using 2-D Autoregressive Models For Speaker Recognition

Feature Extraction Using 2-D Autoregressive Models For Speaker Recognition Feature Extraction Using 2-D Autoregressive Models For Speaker Recognition Sriram Ganapathy 1, Samuel Thomas 1 and Hynek Hermansky 1,2 1 Dept. of ECE, Johns Hopkins University, USA 2 Human Language Technology

More information

Robustness (cont.); End-to-end systems

Robustness (cont.); End-to-end systems Robustness (cont.); End-to-end systems Steve Renals Automatic Speech Recognition ASR Lecture 18 27 March 2017 ASR Lecture 18 Robustness (cont.); End-to-end systems 1 Robust Speech Recognition ASR Lecture

More information

An Improved Voice Activity Detection Based on Deep Belief Networks

An Improved Voice Activity Detection Based on Deep Belief Networks e-issn 2455 1392 Volume 2 Issue 4, April 2016 pp. 676-683 Scientific Journal Impact Factor : 3.468 http://www.ijcter.com An Improved Voice Activity Detection Based on Deep Belief Networks Shabeeba T. K.

More information

Dimension Reduction of the Modulation Spectrogram for Speaker Verification

Dimension Reduction of the Modulation Spectrogram for Speaker Verification Dimension Reduction of the Modulation Spectrogram for Speaker Verification Tomi Kinnunen Speech and Image Processing Unit Department of Computer Science University of Joensuu, Finland Kong Aik Lee and

More information

Speech Enhancement for Nonstationary Noise Environments

Speech Enhancement for Nonstationary Noise Environments Signal & Image Processing : An International Journal (SIPIJ) Vol., No.4, December Speech Enhancement for Nonstationary Noise Environments Sandhya Hawaldar and Manasi Dixit Department of Electronics, KIT

More information

HIGH RESOLUTION SIGNAL RECONSTRUCTION

HIGH RESOLUTION SIGNAL RECONSTRUCTION HIGH RESOLUTION SIGNAL RECONSTRUCTION Trausti Kristjansson Machine Learning and Applied Statistics Microsoft Research traustik@microsoft.com John Hershey University of California, San Diego Machine Perception

More information

Change Point Determination in Audio Data Using Auditory Features

Change Point Determination in Audio Data Using Auditory Features INTL JOURNAL OF ELECTRONICS AND TELECOMMUNICATIONS, 0, VOL., NO., PP. 8 90 Manuscript received April, 0; revised June, 0. DOI: /eletel-0-00 Change Point Determination in Audio Data Using Auditory Features

More information

Robust speech recognition system using bidirectional Kalman filter

Robust speech recognition system using bidirectional Kalman filter IET Signal Processing Research Article Robust speech recognition system using bidirectional Kalman filter ISSN 1751-9675 Received on 31st October 2013 Revised on 13th July 2014 Accepted on 24th April 2015

More information

Robust telephone speech recognition based on channel compensation

Robust telephone speech recognition based on channel compensation Pattern Recognition 32 (1999) 1061}1067 Robust telephone speech recognition based on channel compensation Jiqing Han*, Wen Gao Department of Computer Science and Engineering, Harbin Institute of Technology,

More information

SIMULATION VOICE RECOGNITION SYSTEM FOR CONTROLING ROBOTIC APPLICATIONS

SIMULATION VOICE RECOGNITION SYSTEM FOR CONTROLING ROBOTIC APPLICATIONS SIMULATION VOICE RECOGNITION SYSTEM FOR CONTROLING ROBOTIC APPLICATIONS 1 WAHYU KUSUMA R., 2 PRINCE BRAVE GUHYAPATI V 1 Computer Laboratory Staff., Department of Information Systems, Gunadarma University,

More information

A Real Time Noise-Robust Speech Recognition System

A Real Time Noise-Robust Speech Recognition System A Real Time Noise-Robust Speech Recognition System 7 A Real Time Noise-Robust Speech Recognition System Naoya Wada, Shingo Yoshizawa, and Yoshikazu Miyanaga, Non-members ABSTRACT This paper introduces

More information

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS 1 S.PRASANNA VENKATESH, 2 NITIN NARAYAN, 3 K.SAILESH BHARATHWAAJ, 4 M.P.ACTLIN JEEVA, 5 P.VIJAYALAKSHMI 1,2,3,4,5 SSN College of Engineering,

More information

SIGNAL PROCESSING FOR ROBUST SPEECH RECOGNITION MOTIVATED BY AUDITORY PROCESSING CHANWOO KIM

SIGNAL PROCESSING FOR ROBUST SPEECH RECOGNITION MOTIVATED BY AUDITORY PROCESSING CHANWOO KIM SIGNAL PROCESSING FOR ROBUST SPEECH RECOGNITION MOTIVATED BY AUDITORY PROCESSING CHANWOO KIM MAY 21 ABSTRACT Although automatic speech recognition systems have dramatically improved in recent decades,

More information

Speech Enhancement Techniques using Wiener Filter and Subspace Filter

Speech Enhancement Techniques using Wiener Filter and Subspace Filter IJSTE - International Journal of Science Technology & Engineering Volume 3 Issue 05 November 2016 ISSN (online): 2349-784X Speech Enhancement Techniques using Wiener Filter and Subspace Filter Ankeeta

More information

Raw Waveform-based Speech Enhancement by Fully Convolutional Networks

Raw Waveform-based Speech Enhancement by Fully Convolutional Networks Raw Waveform-based Speech Enhancement by Fully Convolutional Networks Szu-Wei Fu *, Yu Tsao *, Xugang Lu and Hisashi Kawai * Research Center for Information Technology Innovation, Academia Sinica, Taipei,

More information

A SUPERVISED SIGNAL-TO-NOISE RATIO ESTIMATION OF SPEECH SIGNALS. Pavlos Papadopoulos, Andreas Tsiartas, James Gibson, and Shrikanth Narayanan

A SUPERVISED SIGNAL-TO-NOISE RATIO ESTIMATION OF SPEECH SIGNALS. Pavlos Papadopoulos, Andreas Tsiartas, James Gibson, and Shrikanth Narayanan IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP) A SUPERVISED SIGNAL-TO-NOISE RATIO ESTIMATION OF SPEECH SIGNALS Pavlos Papadopoulos, Andreas Tsiartas, James Gibson, and

More information

MMSE STSA Based Techniques for Single channel Speech Enhancement Application Simit Shah 1, Roma Patel 2

MMSE STSA Based Techniques for Single channel Speech Enhancement Application Simit Shah 1, Roma Patel 2 MMSE STSA Based Techniques for Single channel Speech Enhancement Application Simit Shah 1, Roma Patel 2 1 Electronics and Communication Department, Parul institute of engineering and technology, Vadodara,

More information

Auditory Based Feature Vectors for Speech Recognition Systems

Auditory Based Feature Vectors for Speech Recognition Systems Auditory Based Feature Vectors for Speech Recognition Systems Dr. Waleed H. Abdulla Electrical & Computer Engineering Department The University of Auckland, New Zealand [w.abdulla@auckland.ac.nz] 1 Outlines

More information

POSSIBLY the most noticeable difference when performing

POSSIBLY the most noticeable difference when performing IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 7, SEPTEMBER 2007 2011 Acoustic Beamforming for Speaker Diarization of Meetings Xavier Anguera, Associate Member, IEEE, Chuck Wooters,

More information

KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM

KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM Shruthi S Prabhu 1, Nayana C G 2, Ashwini B N 3, Dr. Parameshachari B D 4 Assistant Professor, Department of Telecommunication Engineering, GSSSIETW,

More information

Comparative Performance Analysis of Speech Enhancement Methods

Comparative Performance Analysis of Speech Enhancement Methods International Journal of Innovative Research in Electronics and Communications (IJIREC) Volume 3, Issue 2, 2016, PP 15-23 ISSN 2349-4042 (Print) & ISSN 2349-4050 (Online) www.arcjournals.org Comparative

More information

Speech Synthesis using Mel-Cepstral Coefficient Feature

Speech Synthesis using Mel-Cepstral Coefficient Feature Speech Synthesis using Mel-Cepstral Coefficient Feature By Lu Wang Senior Thesis in Electrical Engineering University of Illinois at Urbana-Champaign Advisor: Professor Mark Hasegawa-Johnson May 2018 Abstract

More information

Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech

Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech INTERSPEECH 5 Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech M. A. Tuğtekin Turan and Engin Erzin Multimedia, Vision and Graphics Laboratory,

More information

Automatic Morse Code Recognition Under Low SNR

Automatic Morse Code Recognition Under Low SNR 2nd International Conference on Mechanical, Electronic, Control and Automation Engineering (MECAE 2018) Automatic Morse Code Recognition Under Low SNR Xianyu Wanga, Qi Zhaob, Cheng Mac, * and Jianping

More information

Perceptually Motivated Linear Prediction Cepstral Features for Network Speech Recognition

Perceptually Motivated Linear Prediction Cepstral Features for Network Speech Recognition Perceptually Motivated Linear Prediction Cepstral Features for Network Speech Recognition Aadel Alatwi, Stephen So, Kuldip K. Paliwal Signal Processing Laboratory Griffith University, Brisbane, QLD, 4111,

More information

Implementation of SYMLET Wavelets to Removal of Gaussian Additive Noise from Speech Signal

Implementation of SYMLET Wavelets to Removal of Gaussian Additive Noise from Speech Signal Implementation of SYMLET Wavelets to Removal of Gaussian Additive Noise from Speech Signal Abstract: MAHESH S. CHAVAN, * NIKOS MASTORAKIS, MANJUSHA N. CHAVAN, *** M.S. GAIKWAD Department of Electronics

More information

Single channel noise reduction

Single channel noise reduction Single channel noise reduction Basics and processing used for ETSI STF 94 ETSI Workshop on Speech and Noise in Wideband Communication Claude Marro France Telecom ETSI 007. All rights reserved Outline Scope

More information

On Single-Channel Speech Enhancement and On Non-Linear Modulation-Domain Kalman Filtering

On Single-Channel Speech Enhancement and On Non-Linear Modulation-Domain Kalman Filtering 1 On Single-Channel Speech Enhancement and On Non-Linear Modulation-Domain Kalman Filtering Nikolaos Dionelis, https://www.commsp.ee.ic.ac.uk/~sap/people-nikolaos-dionelis/ nikolaos.dionelis11@imperial.ac.uk,

More information

Auditory motivated front-end for noisy speech using spectro-temporal modulation filtering

Auditory motivated front-end for noisy speech using spectro-temporal modulation filtering Auditory motivated front-end for noisy speech using spectro-temporal modulation filtering Sriram Ganapathy a) and Mohamed Omar IBM T.J. Watson Research Center, Yorktown Heights, New York 10562 ganapath@us.ibm.com,

More information

IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 18, NO. 6, AUGUST

IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 18, NO. 6, AUGUST IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 18, NO. 6, AUGUST 2010 1127 Speech Enhancement Using Gaussian Scale Mixture Models Jiucang Hao, Te-Won Lee, Senior Member, IEEE, and Terrence

More information

Speech Enhancement in Modulation Domain Using Codebook-based Speech and Noise Estimation

Speech Enhancement in Modulation Domain Using Codebook-based Speech and Noise Estimation Speech Enhancement in Modulation Domain Using Codebook-based Speech and Noise Estimation Vidhyasagar Mani, Benoit Champagne Dept. of Electrical and Computer Engineering McGill University, 3480 University

More information

Wavelet Speech Enhancement based on the Teager Energy Operator

Wavelet Speech Enhancement based on the Teager Energy Operator Wavelet Speech Enhancement based on the Teager Energy Operator Mohammed Bahoura and Jean Rouat ERMETIS, DSA, Université du Québec à Chicoutimi, Chicoutimi, Québec, G7H 2B1, Canada. Abstract We propose

More information

Speech Enhancement based on Fractional Fourier transform

Speech Enhancement based on Fractional Fourier transform Speech Enhancement based on Fractional Fourier transform JIGFAG WAG School of Information Science and Engineering Hunan International Economics University Changsha, China, postcode:4005 e-mail: matlab_bysj@6.com

More information

Electric Guitar Pickups Recognition

Electric Guitar Pickups Recognition Electric Guitar Pickups Recognition Warren Jonhow Lee warrenjo@stanford.edu Yi-Chun Chen yichunc@stanford.edu Abstract Electric guitar pickups convert vibration of strings to eletric signals and thus direcly

More information

SOUND SOURCE RECOGNITION AND MODELING

SOUND SOURCE RECOGNITION AND MODELING SOUND SOURCE RECOGNITION AND MODELING CASA seminar, summer 2000 Antti Eronen antti.eronen@tut.fi Contents: Basics of human sound source recognition Timbre Voice recognition Recognition of environmental

More information

A Spectral Conversion Approach to Single- Channel Speech Enhancement

A Spectral Conversion Approach to Single- Channel Speech Enhancement University of Pennsylvania ScholarlyCommons Departmental Papers (ESE) Department of Electrical & Systems Engineering May 2007 A Spectral Conversion Approach to Single- Channel Speech Enhancement Athanasios

More information

The Munich 2011 CHiME Challenge Contribution: BLSTM-NMF Speech Enhancement and Recognition for Reverberated Multisource Environments

The Munich 2011 CHiME Challenge Contribution: BLSTM-NMF Speech Enhancement and Recognition for Reverberated Multisource Environments The Munich 2011 CHiME Challenge Contribution: BLSTM-NMF Speech Enhancement and Recognition for Reverberated Multisource Environments Felix Weninger, Jürgen Geiger, Martin Wöllmer, Björn Schuller, Gerhard

More information