SPEECH PARAMETERIZATION FOR AUTOMATIC SPEECH RECOGNITION IN NOISY CONDITIONS
Bojana Gajić
Department of Telecommunications, Norwegian University of Science and Technology
7491 Trondheim, Norway
gajic@tele.ntnu.no

Kuldip K. Paliwal
School of Microelectronic Engineering, Griffith University
Brisbane, QLD 4111, Australia
K.Paliwal@me.gu.edu.au

ABSTRACT

This paper is concerned with increasing the robustness of automatic speech recognition (ASR) systems against additive background noise, by finding speech parameters that are less influenced by changes in acoustic environments than the conventional ones. Inspired by the good robustness of auditory based speech parameterization methods, we compare the steps involved with those in the conventional methods from the signal processing point of view. The use of dominant spectral frequencies is believed to be an important reason for the superior robustness of the auditory based methods. A new speech parameterization method is described that is conceptually similar to auditory based methods, while retaining the low computational cost of the conventional methods. Evaluation on an ASR task has shown that the new method outperformed the conventional methods in the presence of various background noises.

1. INTRODUCTION

State-of-the-art automatic speech recognition (ASR) systems are capable of achieving very high recognition accuracy when tested in laboratory conditions. However, they usually experience a dramatic decrease in performance when used in real-world applications. One of the main reasons for such behavior is the presence of background noise in the testing environment that has not been observed during system training. This problem becomes especially important for ASR on mobile devices, as the acoustic environment is constantly changing and cannot be accounted for during system training. One way to overcome this problem is to find a speech parameterization that is invariant to changing acoustic environments.
The most commonly used speech parameters are based on the energy information derived from the short-term speech spectrum. However, the dominant spectral frequencies are less influenced by additive noise than the energy information. Thus, it is expected that the robustness of ASR systems could be improved if the dominant spectral frequencies are efficiently incorporated into speech parameter vectors.

The paper is organized as follows. It starts with an overview of ASR systems in Section 2, and describes the robustness problem with possible solutions in Section 3. Section 4 summarizes the main processing steps involved in conventional and auditory based speech parameterization methods and describes a new method that combines the advantages of both classes of methods. An experimental study performed to compare the performance of the different parameterization methods on an ASR task in various acoustic environments is described in Section 5. Finally, the major conclusions are summarized in Section 6.

2. THE ASR SYSTEM

The aim of automatic speech recognition (ASR) is to transform a given spoken utterance into the corresponding transcription. A block diagram of an ASR system is shown in Figure 1. Before the system can be used, it has to learn the characteristic speech patterns from a large speech database with accompanying transcriptions. A set of stochastic models (hidden Markov models) is trained, each corresponding to one speech unit (for example, a phoneme). In addition, a lexicon is prepared to describe how the words are built up from the basic speech units, as well as a language model describing the relationship between words. The models, lexicon and language model are then used to determine the most likely transcription of an incoming spoken utterance.

The speech parameterization block is used to extract from the speech waveform the information relevant for discriminating between different speech sounds. The information is presented as a sequence of parameter vectors.
This paper describes several different approaches to speech parameterization, and compares their performance on an ASR task in various noisy conditions.

[Figure 1: Block diagram of an ASR system]

3. THE ROBUSTNESS PROBLEM

Robustness of an ASR system is the system's ability to successfully deal with different aspects of variability in the speech signal. Some of the common variabilities that occur in speech signals are listed below:

- Pronunciation variations between speakers depending on the speakers' voice characteristics, dialect, social class, etc.
- Pronunciation variations for a given speaker depending on mood, emotions, context, etc.
- Variations in the acoustic environment.
- Variations in the transmission channel.

A number of techniques have been proposed to increase the robustness of ASR systems. Nevertheless, lack of robustness remains a major obstacle to the reliable use of ASR technology in many real-world applications.

As mobile hand-held terminals become more common, robustness against variations in the acoustic environment becomes increasingly important. State-of-the-art ASR systems experience a dramatic performance degradation when the acoustic environment differs from the one observed in training. In the following, we list the major classes of approaches for overcoming this problem.

- Multiconditional training: The idea is to train a separate set of models for each background environment likely to occur during system use. For a given acoustic environment, the most likely set of models is then found and used during the recognition process.
- Noise reduction: This approach is concerned with reducing the presence of noise in the speech signal before it is sent to the recognizer. When the models are trained in noise-free environments, this reduces the mismatch between the input speech signal and the models. The most common approach is spectral subtraction of the noise.
- Model compensation and adaptation: Instead of modifying the speech signal to better comply with the models, in this approach the models are changed according to the statistical characteristics of the noise to better comply with the noisy speech.
- Robust speech parameterization: The aim is to find a speech representation that is invariant to changes of the acoustic environment. Note that this approach differs from the other approaches in that it does not require knowledge of a particular acoustic environment during the use of the system.

In the rest of this paper, we focus on robust speech parameterization.

4. SPEECH PARAMETERIZATION

This section starts with a summary of the major processing steps involved in conventional methods for speech parameterization. It proceeds by explaining the idea behind auditory based methods that have been shown to outperform the conventional methods in noisy conditions. The major differences between the two classes of methods are then explained from the signal processing point of view. At the end, a new parameterization method is described that combines the advantages of both conventional and auditory based methods.

4.1. Conventional Methods

Conventional methods for speech parameterization are based on extracting information from the short-term power spectrum of speech. The speech signal is divided into overlapping speech frames of 20-30 ms length, as the speech signal can be regarded as stationary over such short intervals. The short-term power spectrum is estimated for each frame using either the discrete Fourier transform (DFT), the fast Fourier transform (FFT), filter bank analysis or linear prediction analysis. The resulting spectral representation is usually modified by applying some auditory motivated processing. At the end, it is usual to perform a decorrelation transformation, as this simplifies the recognition process.

Mel-frequency cepstrum coefficients (MFCC) are the most widely used speech parameters for ASR. Figure 2 illustrates the major processing steps involved in their computation.

[Figure 2: Illustration of MFCC computation]

The short-term speech spectrum is estimated using the FFT. It is passed through a filter bank consisting of overlapping triangular bandpass filters uniformly distributed along the perceptually based mel-frequency scale. The choice of the filter bank is motivated by knowledge of human hearing. A vector of subband log-energies is then computed and sent to a discrete cosine transform (DCT) for decorrelation purposes. The resulting DCT coefficients, referred to as MFCC, serve as the final representation of the given speech frame.

In the case of noisy speech, the subband energies are affected by noise, and the resulting speech representation differs from the one for clean speech. Thus, if an ASR system is trained on clean speech and used in noisy conditions, the mismatch can cause a large performance degradation.

4.2. Auditory Based Methods

Humans have a fascinating ability to recognize speech in noisy acoustic environments. Thus, there is a belief that the robustness of ASR systems could be considerably improved by simulating the processes in the human auditory system. However, not all the processes in human speech recognition are well understood, and auditory based methods for speech parameterization have to rely on some heuristics.

Probably the best known auditory based parameters for ASR are the so-called Ensemble Interval Histograms (EIH) [1]. In this paper, we present a slight modification of these parameters referred to as Zero Crossings with Peak Amplitudes (ZCPA) [2].
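For comparison with the auditory based processing, the MFCC steps of Section 4.1 (FFT power spectrum, triangular mel filter bank, log-energies, DCT) can be sketched for a single frame as follows. This is only an illustrative sketch, not the implementation used in the experiments: the function names, the Hamming window, the 256-point FFT, the particular mel-scale formula and the choice to drop the zeroth DCT coefficient are all assumptions that the text does not specify.

```python
import numpy as np

def hz_to_mel(f):
    # One common mel-scale formula (assumed; the paper does not state which it uses).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, fs):
    """Overlapping triangular filters uniformly spaced on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def dct2(x, n_out):
    """First n_out coefficients of an unnormalized DCT-II of the vector x."""
    n = x.size
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (m + 0.5) / n) @ x

def mfcc_frame(frame, fs=8000, n_filters=24, n_ceps=12, n_fft=256):
    """MFCC vector for one speech frame (1-D NumPy array)."""
    windowed = frame * np.hamming(frame.size)           # assumed window
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2   # short-term power spectrum
    energies = mel_filter_bank(n_filters, n_fft, fs) @ power
    log_energies = np.log(energies + 1e-10)             # small floor avoids log(0)
    # Assumption: keep coefficients 1..n_ceps and drop the zeroth one.
    return dct2(log_energies, n_ceps + 1)[1:]
```

Calling `mfcc_frame` on a 25 ms frame sampled at 8 kHz (200 samples) returns a 12-dimensional vector. Because every step after the FFT operates on subband energies, any noise energy added to a subband changes the output directly, which is the sensitivity discussed above.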
The ZCPA parameters have been shown to outperform both the EIH and all of the conventional parameterization methods in the presence of additive noise. An illustration of the ZCPA method is shown in Figure 3.

[Figure 3: Illustration of ZCPA computation]

A frame of the given speech signal is passed through a bank of bandpass filters. The filtering is done in the time domain. The resulting subband signals are sent to zero-crossing detectors. The interval between each pair of successive zero-crossings is measured together with the signal peak amplitude between the zero crossings. Then, the inverse intervals between successive zero crossings over all the subband signals are recorded in a histogram. Each histogram entry is weighted by the logarithm of the corresponding peak amplitude. Finally, the DCT is performed for decorrelation purposes.

Note that the ZCPA computation represents an alternative way of performing spectral analysis. The inverse intervals between successive zero-crossings represent the instantaneous dominant frequencies of the subband signal. The peak amplitudes, on the other hand, represent a measure of the instantaneous energy of the subband signal. The histogram bins containing the dominant frequencies are increased by the corresponding energy measures. Thus the resulting histogram is an alternative representation of the signal spectrum.

While MFCC is based only on the subband energy computation, ZCPA efficiently combines the energy and dominant frequency information. We believe that this difference can be a part of the explanation for ZCPA's superior performance in noisy conditions. The dominant speech frequencies are much less affected by the presence of additive noise than the subband energy measures. Thus, incorporation of the dominant frequencies in the speech parameter vector can lead to increased robustness against additive noise.

However, the ZCPA computation is prohibitively expensive for use in practical ASR systems. This is due to the time-domain processing and the need for heavy interpolation of the higher frequency subband signals in order to obtain precise zero-crossing locations.

4.3. Subband Spectral Centroid Histograms

Motivated by the good noise robustness of the ZCPA parameters and the computational efficiency of the MFCC parameters, we looked for a new parameterization method that would be more robust than MFCC, but have an acceptable computational cost. We believed that this could be achieved by finding a more computationally efficient method for incorporating the dominant frequency information.

In [3] it has been shown that Subband Spectral Centroids (SSC) are closely related to the dominant speech frequencies. Using SSC as additional features to MFCC has been shown to increase the robustness of ASR systems against additive noise [3, 4, 5, 6, 7]. We proposed a new framework for combining the SSC and subband energies through the construction of Subband Spectral Centroid Histograms (SSCH) [8, 9].

An illustration of the processing steps involved in the SSCH computation is shown in Figure 4. The speech power spectrum is estimated using the FFT, and filtering is performed in the frequency domain to produce a number of subband signals.
The spectrum estimation and filtering steps are analogous to those of the MFCC method. The dominant frequency of each subband signal is estimated by the subband centroid. In addition, a subband energy measure is computed in a similar way as for the MFCC method. The dominant frequency and energy information over all the subbands are combined in a single histogram in the same way as for the ZCPA method. Finally, the DCT is performed for decorrelation purposes.

[Figure 4: Illustration of SSCH computation]

This method uses conceptually the same information as the ZCPA method. However, note that the dominant frequencies are now estimated from the short-term power spectrum. This is a disadvantage in noisy conditions, as the spectrum itself is corrupted by noise. On the other hand, the fact that the processing is done in the spectral domain dramatically reduces the computational cost compared to ZCPA. It is now of the same order as that of the MFCC computation.

5. EXPERIMENTAL STUDY

This section describes an experimental study performed to compare the performance of the described methods on an ASR task in various background conditions.

5.1. Task and Database

The methods were evaluated on the ISOLET Spoken Letter Database [10] down-sampled to 8 kHz. The database consists of English letters spoken in isolation, recorded in a quiet room. Two repetitions of each word were recorded for each speaker. Utterances from 90 speakers were used for training, while utterances from 30 speakers were used for evaluation. Although the vocabulary consisting of 26 English letters is rather small, this is not a simple recognition task, since the vocabulary words are very short and highly confusable.

Noisy speech was artificially created by adding to the original test set four different noise types at four different signal-to-noise ratios (SNR). The noise types are white Gaussian noise, factory noise, car noise and background speech. The last three noise types were taken from the NOISEX database, where they are referred to as factory1, volvo and babble noise, respectively. A segment of the noise file equal in length to the speech file was randomly extracted and added to the speech file at the required SNR. The SNR was computed as the ratio between the maximal frame energy of the speech file and the average energy of the noise segment. Computing the SNR in this way makes it independent of the duration of the surrounding silence in the speech files.

Model training and recognition were performed using the speech recognition toolkit HTK [11]. One hidden Markov model (HMM) with five states and five Gaussian mixtures per state was trained for each vocabulary word.

5.2. Choice of Free Parameters

In the following we summarize the most important parameters involved in the MFCC, ZCPA and SSCH computation.

- MFCC: Frame length was set to 25 ms. The filter bank consisted of 24 overlapping triangular filters uniformly spaced along the mel-frequency scale. 12 DCT coefficients were used. This is the standard parameter setting for the MFCC computation. It has not been optimized for the particular task.
- ZCPA: The filter bank consisted of 20 bandpass FIR filters linearly spaced on the Bark-frequency scale (a perceptually based frequency scale similar to the mel-frequency scale), with bandwidths equal to 2 Bark. The filters had order 61 and were designed using the windowing method. Frequency-dependent frame lengths equal to 20/f_c were used, where f_c is the center frequency of the corresponding bandpass filter. The number of histogram bins was 26. The number of DCT coefficients was 12.
- SSCH: Frame length was set to 25 ms. The filter bank consisted of 65 rectangular filters. In the low frequency range, the filter bandwidth was 300 Hz and the filters were linearly spaced along the frequency scale. In the high frequency region, the filter bandwidth was 2 Bark and the filters were linearly spaced along the Bark-frequency scale.
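The SSCH processing chain of Section 4.3 (FFT power spectrum, rectangular subbands, one centroid and one energy per subband, an energy-weighted histogram of centroids, DCT) can be sketched for a single frame as below. This is a simplified illustration under stated assumptions, not the authors' implementation: all 65 subbands are given a 300 Hz bandwidth with centers uniformly spaced in Hz (the Bark-spaced high-frequency filters are not modeled), the 26 histogram bins are uniform in Hz, the weight is log(1 + energy) so that it stays non-negative, and the zeroth DCT coefficient is dropped. The function names, the Hamming window and the 256-point FFT are likewise hypothetical choices.

```python
import numpy as np

def ssc_histogram(frame, fs=8000, n_subbands=65, bandwidth=300.0,
                  n_bins=26, n_fft=256):
    """Histogram of subband spectral centroids weighted by log subband energy."""
    windowed = frame * np.hamming(frame.size)           # assumed window
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2   # short-term power spectrum
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)

    # Assumption: overlapping rectangular subbands, centers uniform in Hz.
    centers = np.linspace(bandwidth / 2.0, fs / 2.0 - bandwidth / 2.0, n_subbands)
    # Assumption: histogram bins uniform in Hz over [0, fs/2].
    bin_edges = np.linspace(0.0, fs / 2.0, n_bins + 1)

    hist = np.zeros(n_bins)
    for fc in centers:
        in_band = np.abs(freqs - fc) <= bandwidth / 2.0
        p = power[in_band]
        energy = p.sum()
        if energy <= 0.0:
            continue  # silent subband contributes nothing
        # Subband centroid: power-weighted mean frequency (dominant-frequency estimate).
        centroid = np.sum(freqs[in_band] * p) / energy
        b = int(np.searchsorted(bin_edges, centroid, side="right")) - 1
        b = min(max(b, 0), n_bins - 1)
        # Assumption: log(1 + energy) as the energy weight.
        hist[b] += np.log1p(energy)
    return hist

def dct2(x, n_out):
    """First n_out coefficients of an unnormalized DCT-II of the vector x."""
    n = x.size
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (m + 0.5) / n) @ x

def ssch_frame(frame, n_ceps=12, **kwargs):
    """SSCH parameter vector for one frame: DCT of the centroid histogram."""
    hist = ssc_histogram(frame, **kwargs)
    # Assumption: keep coefficients 1..n_ceps and drop the zeroth one.
    return dct2(hist, n_ceps + 1)[1:]
```

For a pure tone, most of the histogram weight lands in the bin that contains the tone frequency, because every subband that overlaps the spectral peak reports a centroid close to it. Everything is computed from a single FFT per frame, which is why the cost stays comparable to that of MFCC.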
For SSCH, 12 DCT coefficients were computed from 26 histogram bins.

Delta and delta-delta parameters were computed in addition to the static parameters for all of the methods, resulting in 36-dimensional parameter vectors.

5.3. Experimental Results

Table 1 shows the results of the evaluation of the MFCC, SSCH and ZCPA parameterization methods on both clean and noisy versions of the ISOLET database. Model training was performed using clean speech. The recognition performance was measured in terms of word accuracy.

[Table 1: Word accuracy for different parameterization methods in various acoustic environments. Panels: a) White Gaussian noise, b) Car noise, c) Factory noise, d) Background speech. Rows in each panel: MFCC, SSCH, ZCPA. Columns: clean speech and the tested SNR levels. The numeric entries are not recoverable from this transcription.]

Looking at the results in Table 1, we see that MFCC performs best on clean speech. However, even in the presence of only a small amount of noise, the situation changes completely, and MFCC becomes the worst of the three methods. This confirms the lack of robustness of the MFCC parameters.

SSCH is significantly more robust than MFCC for all the noise types. The improvement is largest for car noise, and smallest in the presence of background speech. The relatively poor performance in the presence of background speech is probably due to the existence of speech-like spectral peaks in the background signal.

SSCH even outperforms ZCPA in the case of car noise, while ZCPA is more robust in the presence of the other noise types. However, it is important to note that ZCPA cannot be used in place of SSCH in practical applications, due to its prohibitive computational cost.

6. CONCLUSIONS

In this paper, we addressed the robustness problem of ASR systems against additive background noise. One way of overcoming this problem is to find a speech parameterization that is less influenced by additive noise than the conventional parameters. We compared the steps involved in conventional and auditory based methods, and concluded that the superior performance of the auditory methods can be explained by the incorporation of the dominant spectral frequencies into parameter vectors. A new speech parameterization method was described that computes the dominant spectral frequencies in a more efficient way, from the short-term spectrum of speech. This method also outperformed the conventional methods in noisy conditions, confirming the importance of utilizing the dominant spectral frequencies for increasing the robustness of ASR systems.

REFERENCES

[1] O. Ghitza, "Auditory models and human performance in tasks related to speech coding and speech recognition," IEEE Trans. on Speech and Audio Processing, vol. 2, January.
[2] D.-S. Kim, S.-Y. Lee, and R. M. Kil, "Auditory processing of speech signals for robust speech recognition in real-world noisy environments," IEEE Trans. on Speech and Audio Processing, vol. 7, January.
[3] K. K. Paliwal, "Spectral subband centroid features for speech recognition," in Proc. ICASSP, vol. 2, May.
[4] S. Tsuge, T. Fukada, and H. Singer, "Speaker normalized spectral subband parameters for noise robust speech recognition," in Proc. ICASSP, May.
[5] D. Albesano, R. D. Mori, R. Gemello, and F. Mana, "A study of the effect of adding new dimensions to trajectories in the acoustic space," in Proc. EUROSPEECH, vol. 4, September.
[6] R. D. Mori, D. Albesano, R. Gemello, and F. Mana, "Ear-model derived features for automatic speech recognition," in Proc. ICASSP.
[7] E. Gjelsvik, "Modification of front-end processing for robust speech recognition," Diploma thesis, Norwegian University of Science and Technology, June.
[8] B. Gajić and K. K. Paliwal, "Robust feature extraction using subband spectral centroid histograms," in Proc. ICASSP, May 2001.
[9] B. Gajić and K. K. Paliwal, "Robust parameters for speech recognition based on subband spectral centroid histograms," in Proc. EUROSPEECH, September.
[10] R. A. Cole, Y. K. Muthusamy, and M. Fanty, "The ISOLET spoken letter database," Technical report CSE, Oregon Graduate Institute of Science and Technology, Beaverton, OR, USA, March.
[11] S. Young, D. Kershaw, J. Odell, D. Ollason, V. Valtchev, and P. Woodland, The HTK Book. Entropic.
More informationCalibration of Microphone Arrays for Improved Speech Recognition
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Calibration of Microphone Arrays for Improved Speech Recognition Michael L. Seltzer, Bhiksha Raj TR-2001-43 December 2001 Abstract We present
More informationA Correlation-Maximization Denoising Filter Used as An Enhancement Frontend for Noise Robust Bird Call Classification
A Correlation-Maximization Denoising Filter Used as An Enhancement Frontend for Noise Robust Bird Call Classification Wei Chu and Abeer Alwan Speech Processing and Auditory Perception Laboratory Department
More informationA Real Time Noise-Robust Speech Recognition System
A Real Time Noise-Robust Speech Recognition System 7 A Real Time Noise-Robust Speech Recognition System Naoya Wada, Shingo Yoshizawa, and Yoshikazu Miyanaga, Non-members ABSTRACT This paper introduces
More informationSPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes
SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN Yu Wang and Mike Brookes Department of Electrical and Electronic Engineering, Exhibition Road, Imperial College London,
More informationMikko Myllymäki and Tuomas Virtanen
NON-STATIONARY NOISE MODEL COMPENSATION IN VOICE ACTIVITY DETECTION Mikko Myllymäki and Tuomas Virtanen Department of Signal Processing, Tampere University of Technology Korkeakoulunkatu 1, 3370, Tampere,
More informationA MATLAB Model of Hybrid Active Filter Based on SVPWM Technique
International Journal o Electrical Engineering. ISSN 0974-2158 olume 5, Number 5 (2012), pp. 557-569 International Research Publication House http://www.irphouse.com A MATLAB Model o Hybrid Active Filter
More informationRECENTLY, there has been an increasing interest in noisy
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 52, NO. 9, SEPTEMBER 2005 535 Warped Discrete Cosine Transform-Based Noisy Speech Enhancement Joon-Hyuk Chang, Member, IEEE Abstract In
More informationEnhancing the Complex-valued Acoustic Spectrograms in Modulation Domain for Creating Noise-Robust Features in Speech Recognition
Proceedings of APSIPA Annual Summit and Conference 15 16-19 December 15 Enhancing the Complex-valued Acoustic Spectrograms in Modulation Domain for Creating Noise-Robust Features in Speech Recognition
More informationFatigue Life Assessment Using Signal Processing Techniques
Fatigue Lie Assessment Using Signal Processing Techniques S. ABDULLAH 1, M. Z. NUAWI, C. K. E. NIZWAN, A. ZAHARIM, Z. M. NOPIAH Engineering Faculty, Universiti Kebangsaan Malaysia 43600 UKM Bangi, Selangor,
More informationPerformance Analysis of MFCC and LPCC Techniques in Automatic Speech Recognition
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume - 3 Issue - 8 August, 2014 Page No. 7727-7732 Performance Analysis of MFCC and LPCC Techniques in Automatic
More informationCHAPTER 3 SPEECH ENHANCEMENT ALGORITHMS
46 CHAPTER 3 SPEECH ENHANCEMENT ALGORITHMS 3.1 INTRODUCTION Personal communication of today is impaired by nearly ubiquitous noise. Speech communication becomes difficult under these conditions; speech
More informationTIME-FREQUENCY ANALYSIS OF NON-STATIONARY THREE PHASE SIGNALS. Z. Leonowicz T. Lobos
Copyright IFAC 15th Triennial World Congress, Barcelona, Spain TIME-FREQUENCY ANALYSIS OF NON-STATIONARY THREE PHASE SIGNALS Z. Leonowicz T. Lobos Wroclaw University o Technology Pl. Grunwaldzki 13, 537
More informationAN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS Kuldeep Kumar 1, R. K. Aggarwal 1 and Ankita Jain 2 1 Department of Computer Engineering, National Institute
More informationCepstrum alanysis of speech signals
Cepstrum alanysis of speech signals ELEC-E5520 Speech and language processing methods Spring 2016 Mikko Kurimo 1 /48 Contents Literature and other material Idea and history of cepstrum Cepstrum and LP
More informationHigh Speed Communication Circuits and Systems Lecture 10 Mixers
High Speed Communication Circuits and Systems Lecture Mixers Michael H. Perrott March 5, 24 Copyright 24 by Michael H. Perrott All rights reserved. Mixer Design or Wireless Systems From Antenna and Bandpass
More informationSpeech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction
IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 7, Issue, Ver. I (Mar. - Apr. 7), PP 4-46 e-issn: 9 4, p-issn No. : 9 497 www.iosrjournals.org Speech Enhancement Using Spectral Flatness Measure
More informationSolid State Relays & Its
Solid State Relays & Its Applications Presented By Dr. Mostaa Abdel-Geliel Course Objectives Know new techniques in relay industries. Understand the types o static relays and its components. Understand
More informationNoise Removal from ECG Signal and Performance Analysis Using Different Filter
International Journal o Innovative Research in Electronics and Communication (IJIREC) Volume. 1, Issue 2, May 214, PP.32-39 ISSN 2349-442 (Print) & ISSN 2349-45 (Online) www.arcjournal.org Noise Removal
More informationSEG/San Antonio 2007 Annual Meeting. Summary. Morlet wavelet transform
Xiaogui Miao*, CGGVeritas, Calgary, Canada, Xiao-gui_miao@cggveritas.com Dragana Todorovic-Marinic and Tyler Klatt, Encana, Calgary Canada Summary Most geologic changes have a seismic response but sometimes
More informationIMPROVING MICROPHONE ARRAY SPEECH RECOGNITION WITH COCHLEAR IMPLANT-LIKE SPECTRALLY REDUCED SPEECH
RESEARCH REPORT IDIAP IMPROVING MICROPHONE ARRAY SPEECH RECOGNITION WITH COCHLEAR IMPLANT-LIKE SPECTRALLY REDUCED SPEECH Cong-Thanh Do Mohammad J. Taghizadeh Philip N. Garner Idiap-RR-40-2011 DECEMBER
More information651 Analysis of LSF frame selection in voice conversion
651 Analysis of LSF frame selection in voice conversion Elina Helander 1, Jani Nurminen 2, Moncef Gabbouj 1 1 Institute of Signal Processing, Tampere University of Technology, Finland 2 Noia Technology
More informationClassification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise
Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise Noha KORANY 1 Alexandria University, Egypt ABSTRACT The paper applies spectral analysis to
More informationDetection and direction-finding of spread spectrum signals using correlation and narrowband interference rejection
Detection and direction-inding o spread spectrum signals using correlation and narrowband intererence rejection Ulrika Ahnström,2,JohanFalk,3, Peter Händel,3, Maria Wikström Department o Electronic Warare
More informationImproving ASR performance on PDA by contamination of training data
Improving ASR performance on PDA by contamination of training data Christophe Ris and Laurent Couvreur Multitel & FPMS-TCTS, Avenue Copernic, B-7 Mons, Belgium ris,couvreur@multitel.be Abstract Automatic
More informationPerformance analysis of voice activity detection algorithm for robust speech recognition system under different noisy environment
BABU et al: VOICE ACTIVITY DETECTION ALGORITHM FOR ROBUST SPEECH RECOGNITION SYSTEM Journal of Scientific & Industrial Research Vol. 69, July 2010, pp. 515-522 515 Performance analysis of voice activity
More informationOptimizing Reception Performance of new UWB Pulse shape over Multipath Channel using MMSE Adaptive Algorithm
IOSR Journal o Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 05, Issue 01 (January. 2015), V1 PP 44-57 www.iosrjen.org Optimizing Reception Perormance o new UWB Pulse shape over Multipath
More informationGammatone Cepstral Coefficient for Speaker Identification
Gammatone Cepstral Coefficient for Speaker Identification Rahana Fathima 1, Raseena P E 2 M. Tech Student, Ilahia college of Engineering and Technology, Muvattupuzha, Kerala, India 1 Asst. Professor, Ilahia
More informationAudio Restoration Based on DSP Tools
Audio Restoration Based on DSP Tools EECS 451 Final Project Report Nan Wu School of Electrical Engineering and Computer Science University of Michigan Ann Arbor, MI, United States wunan@umich.edu Abstract
More informationAdvanced audio analysis. Martin Gasser
Advanced audio analysis Martin Gasser Motivation Which methods are common in MIR research? How can we parameterize audio signals? Interesting dimensions of audio: Spectral/ time/melody structure, high
More informationMachine recognition of speech trained on data from New Jersey Labs
Machine recognition of speech trained on data from New Jersey Labs Frequency response (peak around 5 Hz) Impulse response (effective length around 200 ms) 41 RASTA filter 10 attenuation [db] 40 1 10 modulation
More informationSPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT
SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT RASHMI MAKHIJANI Department of CSE, G. H. R.C.E., Near CRPF Campus,Hingna Road, Nagpur, Maharashtra, India rashmi.makhijani2002@gmail.com
More informationTraditional Analog Modulation Techniques
Chapter 5 Traditional Analog Modulation Techniques Mikael Olosson 2002 2007 Modulation techniques are mainly used to transmit inormation in a given requency band. The reason or that may be that the channel
More informationSOUND SOURCE RECOGNITION AND MODELING
SOUND SOURCE RECOGNITION AND MODELING CASA seminar, summer 2000 Antti Eronen antti.eronen@tut.fi Contents: Basics of human sound source recognition Timbre Voice recognition Recognition of environmental
More informationFrequency Domain Analysis for Noise Suppression Using Spectral Processing Methods for Degraded Speech Signal in Speech Enhancement
Frequency Domain Analysis for Noise Suppression Using Spectral Processing Methods for Degraded Speech Signal in Speech Enhancement 1 Zeeshan Hashmi Khateeb, 2 Gopalaiah 1,2 Department of Instrumentation
More informationSimultaneous Recognition of Speech Commands by a Robot using a Small Microphone Array
2012 2nd International Conference on Computer Design and Engineering (ICCDE 2012) IPCSIT vol. 49 (2012) (2012) IACSIT Press, Singapore DOI: 10.7763/IPCSIT.2012.V49.14 Simultaneous Recognition of Speech
More informationA multi-class method for detecting audio events in news broadcasts
A multi-class method for detecting audio events in news broadcasts Sergios Petridis, Theodoros Giannakopoulos, and Stavros Perantonis Computational Intelligence Laboratory, Institute of Informatics and
More informationSpeech Synthesis using Mel-Cepstral Coefficient Feature
Speech Synthesis using Mel-Cepstral Coefficient Feature By Lu Wang Senior Thesis in Electrical Engineering University of Illinois at Urbana-Champaign Advisor: Professor Mark Hasegawa-Johnson May 2018 Abstract
More informationSpread-Spectrum Technique in Sigma-Delta Modulators
Spread-Spectrum Technique in Sigma-Delta Modulators by Eric C. Moule Submitted in Partial Fulillment o the Requirements or the Degree Doctor o Philosophy Supervised by Proessor Zeljko Ignjatovic Department
More informationMultimedia Signal Processing: Theory and Applications in Speech, Music and Communications
Brochure More information from http://www.researchandmarkets.com/reports/569388/ Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications Description: Multimedia Signal
More informationImplementation of an Intelligent Target Classifier with Bicoherence Feature Set
ISSN: 39-8753 International Journal o Innovative Research in Science, (An ISO 397: 007 Certiied Organization Vol. 3, Issue, November 04 Implementation o an Intelligent Target Classiier with Bicoherence
More informationAudio Fingerprinting using Fractional Fourier Transform
Audio Fingerprinting using Fractional Fourier Transform Swati V. Sutar 1, D. G. Bhalke 2 1 (Department of Electronics & Telecommunication, JSPM s RSCOE college of Engineering Pune, India) 2 (Department,
More informationA new zoom algorithm and its use in frequency estimation
Waves Wavelets Fractals Adv. Anal. 5; :7 Research Article Open Access Manuel D. Ortigueira, António S. Serralheiro, and J. A. Tenreiro Machado A new zoom algorithm and its use in requency estimation DOI.55/wwaa-5-
More informationRhythmic Similarity -- a quick paper review. Presented by: Shi Yong March 15, 2007 Music Technology, McGill University
Rhythmic Similarity -- a quick paper review Presented by: Shi Yong March 15, 2007 Music Technology, McGill University Contents Introduction Three examples J. Foote 2001, 2002 J. Paulus 2002 S. Dixon 2004
More informationElectronic disguised voice identification based on Mel- Frequency Cepstral Coefficient analysis
International Journal of Scientific and Research Publications, Volume 5, Issue 11, November 2015 412 Electronic disguised voice identification based on Mel- Frequency Cepstral Coefficient analysis Shalate
More informationSingle channel speech separation in modulation frequency domain based on a novel pitch range estimation method
RESEARCH Open Access Single channel speech separation in modulation requency domain based on a novel pitch range estimation method Azar Mahmoodzadeh 1, Hamid Reza Abutalebi 1*, Hamid Soltanian-Zadeh 2,3
More informationRobustness (cont.); End-to-end systems
Robustness (cont.); End-to-end systems Steve Renals Automatic Speech Recognition ASR Lecture 18 27 March 2017 ASR Lecture 18 Robustness (cont.); End-to-end systems 1 Robust Speech Recognition ASR Lecture
More informationECEN 5014, Spring 2013 Special Topics: Active Microwave Circuits and MMICs Zoya Popovic, University of Colorado, Boulder
ECEN 5014, Spring 2013 Special Topics: Active Microwave Circuits and MMICs Zoya Popovic, University o Colorado, Boulder LECTURE 13 PHASE NOISE L13.1. INTRODUCTION The requency stability o an oscillator
More informationGlobal Design Analysis for Highly Repeatable Solid-state Klystron Modulators
CERN-ACC-2-8 Davide.Aguglia@cern.ch Global Design Analysis or Highly Repeatable Solid-state Klystron Modulators Anthony Dal Gobbo and Davide Aguglia, Member, IEEE CERN, Geneva, Switzerland Keywords: Power
More informationMOST MODERN automatic speech recognition (ASR)
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 5, NO. 5, SEPTEMBER 1997 451 A Model of Dynamic Auditory Perception and Its Application to Robust Word Recognition Brian Strope and Abeer Alwan, Member,
More informationLousy Processing Increases Energy Efficiency in Massive MIMO Systems
1 Lousy Processing Increases Energy Eiciency in Massive MIMO Systems Sara Gunnarsson, Micaela Bortas, Yanxiang Huang, Cheng-Ming Chen, Liesbet Van der Perre and Ove Edors Department o EIT, Lund University,
More informationSIGNAL PROCESSING FOR ROBUST SPEECH RECOGNITION MOTIVATED BY AUDITORY PROCESSING CHANWOO KIM
SIGNAL PROCESSING FOR ROBUST SPEECH RECOGNITION MOTIVATED BY AUDITORY PROCESSING CHANWOO KIM MAY 21 ABSTRACT Although automatic speech recognition systems have dramatically improved in recent decades,
More informationNonuniform multi level crossing for signal reconstruction
6 Nonuniform multi level crossing for signal reconstruction 6.1 Introduction In recent years, there has been considerable interest in level crossing algorithms for sampling continuous time signals. Driven
More informationMODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS
MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS 1 S.PRASANNA VENKATESH, 2 NITIN NARAYAN, 3 K.SAILESH BHARATHWAAJ, 4 M.P.ACTLIN JEEVA, 5 P.VIJAYALAKSHMI 1,2,3,4,5 SSN College of Engineering,
More informationReduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter
Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Ching-Ta Lu, Kun-Fu Tseng 2, Chih-Tsung Chen 2 Department of Information Communication, Asia University, Taichung, Taiwan, ROC
More informationFrequency-Foldback Technique Optimizes PFC Efficiency Over The Full Load Range
ISSUE: October 2012 Frequency-Foldback Technique Optimizes PFC Eiciency Over The Full Load Range by Joel Turchi, ON Semiconductor, Toulouse, France Environmental concerns lead to new eiciency requirements
More informationSpeech Enhancement Using a Mixture-Maximum Model
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 10, NO. 6, SEPTEMBER 2002 341 Speech Enhancement Using a Mixture-Maximum Model David Burshtein, Senior Member, IEEE, and Sharon Gannot, Member, IEEE
More information