A comparative study on feature extraction techniques in speech recognition

Size: px
Start display at page:

Download "A comparative study on feature extraction techniques in speech recognition"

Transcription

1 A comparative study on feature techniques in speech recognition Smita B. Magre Department of C. S. and I.T., Dr. Babasaheb Ambedkar Marathwada University, Aurangabad ABSTRACT Automatic speech recognition (ASR) has created nice strides with the event of digital signal process hardware and software package. But despite of those advances, machines cannot match the performance of their human counterparts in terms of accuracy and speed, especially simply just in case of speaker independent speech recognition. So nowadays good portion of speech recognition analysis is focused on speaker independent speech recognition drawback. The reasons are its big selection of applications, and limitations of obtainable techniques of speech recognition. This paper provides outline of technique developed in each stage of speech recognition. This paper helps in selecting the technique beside their relative advantages & disadvantages. Comparative study of various techniques is finished as per stages. This paper is concludes with the choice on feature direction for developing technique in human pc interface system using Marathi Language. General Terms Modeling technique, speech processing, signal processing Keywords- Speech Recognition; Feature Extraction; MFCC; LPC;PCA;LDA;WAVELET;DTW. 1. INTRODUCTION Speech recognition is also known as automatic speech recognition or computer speech recognition which means understanding voice of the computer and performing any required task or the ability to match a voice against a provided or acquired vocabulary. The task is to getting a computer to understand spoken language. By understand we mean to react appropriately and convert the input speech into another medium e.g. text. Speech recognition is therefore sometimes referred to as speech-to-text (STT).speech processing is one of the exciting areas of signal processing. A speech recognition system consists of a microphone, for the person to speak into; speech recognition software; a computer to take and interpret the speech; a good quality soundcard for input and/or output; a proper and good pronunciation. Ratnadeep R. Deshmukh Department of C.S. and I.T., Dr. Babasaheb Ambedkar Marathwada University, Aurangabad ratnadeep_deshmukh@yahoo.co. in Pukhraj P. Shrishrimal Department of C.S. and I.T., Dr. Babasaheb Ambedkar Marathwada University, Aurangabad pukhraj.shrishrimal@gmail.com 1.1 Topology of Speech Recognition System Speaker Dependent: These systems require a user to train the system according to his or her voice. Speaker Independent Systems: This system does not require a user to train the system i.e. they are developed to operate for any speaker. Isolated word recognizer: Accept one word at a time. These recognition systems allow us to speak naturally continuous. Connected word systems: It allows speaker to speak slowly and distinctly each word with a short pause i.e. planned speech. Spontaneous recognition systems: It allows us to speak spontaneously [3]. 1.2 Overview of Speech Recognition All paragraphs must be indented. All paragraphs must be justified, i.e. both left-justified and right-justified. A speech recognition system consists of five blocks: - Feature, Acoustic modeling, Pronunciation modeling, Decoder. The process of speech recognition begins with a speaker creating an utterance which consists of the sound waves. These sound waves are then captured by a microphone and converted into electrical signals. These electrical signals are then converted into digital form to make them understandable by the speech-system. Speech signal is then converted into discrete sequence of feature vectors, which is assumed to contain only the relevant information about given utterance that is important for its correct recognition. An important property of feature is the suppression of information irrelevant for correct classification such as information about speaker (e.g. fundamental frequency) and information about transmission channel (e.g. characteristic of a microphone). Finally recognition component finds the best match in the knowledge base, for the incoming feature vectors. Sometimes, however the information conveyed by these feature vectors may be correlated and less discriminative which may slow down the further processing. Feature methods like Mel frequency Cepstral coefficient (MFCC) provide some way to get uncorrelated vectors by means of discrete cosine transforms (DCT).

2 index scaling of the repetition frequency, using Δf = (fh fl )/ (M + 1) wherever fh is that the highest frequency of the filter bank on the mel scale, computed from using equation given on top of, fl is that the lowest frequency in mel scale, having a corresponding and M is that the range of filter bank. The values thought-about for the parameters within the gift study are: f = 16KHzand f = 0 Hz.The middle frequencies on the mel scale are given by: Figure 1: Outline of Speech Recognition System [4]. 2. FEATURE EXTRACTION First of all, recording of various speech samples of each word of the vocabulary is done by different speakers. After the speech samples are collected; they are converted from analog to digital form by sampling at a frequency of 16 khz. Sampling means recording the speech signals at a regular interval. The collected data is now quantized if required to eliminate noise in speech samples. The collected speech samples are then passed through the feature, feature training & feature testing stages. Feature transforms the incoming sound into an internal representation such that it is possible to reconstruct the original signal from it. There are various techniques to extract features like MFCC, PLP, RAST, LPCC, PCA, LDA, Wavelet, DTW but mostly used is MFCC. Figure 2: Feature Extraction Diagram [1]. 2.1 MFCC: Mel-Frequency Cepstral Coefficients Mel Frequency Cepstral Coefficients (MFCC) is one amongst the most normally used feature methodology in speech recognition. The technique is named FFT based mostly which implies that feature vectors square measure extracted from the frequency spectra of the windowed speech frames. The Mel frequency filter bank may be a series of triangular bandpass filters. The filter bank relies on a non-linear frequency scale referred to as the mel scale. An a1000 thousand Hz tone is outlined as having a pitch of1000 mel. Below a thousand Hz, the Mel scale is more or less linear to the linear frequency scale. On top of the a thousand Hz point of reference, the connection between Mel scale and therefore the linear frequency scale is non-linear and more or less power. The following equation describes the mathematical relationship between the Mel scale and therefore the linear frequency scale f = ln( f ) The Mel frequency filter bank contains triangular band pass filters in such the simplest way that lower boundary of one filter is placed at the middle frequency of the previous filter and also the higher boundary placed within the center frequency of the next filter. a set frequency resolution within the Mel scale is computed, appreciate a f () = f () + mf () + f (), M m M The center frequency in Hertz, is given by () f = 700(e. 1) Above Equation is inserted into equation of f to give the mel filter bank. Finally, the MFCCs are obtained bycomputing the discrete cosine transform of using c(l) = X (m)cos (l π M (m 1 2 )) For l = 1, 2, 3,.., M where c(l) is the lth MFCC. The time derivative is approximated by a linear regression coefficient over a finite window, which is defined as c (l) = k c (m) G, 1 l M Where is the lthcepstral coefficient at time t and G is a constant used to make the variances of the derivative terms equalto those with the original cepstral coefficients. Figure 3: Steps involved in MFCC Feature Extraction Advantage As the frequency bands are positioned logarithmically in MFCC, it approximates the human system response more closely than any other system Disadvantage MFCC values are not very robust in the presence of additive noise, and so it is common to normalize their values in speech recognition systems to lessen the influence of noise

3 2.1.3 Applications MFCCs are commonly used as features in speech recognition systems, such as the systems which can automatically recognize numbers spoken into a telephone. They are also common in speaker recognition, which is the task of recognizing people from their voices. MFCCs are also increasingly finding uses in music information retrieval applications such as genre classification, audio similarity measures, etc m = 0,1, p Where the highest autocorrelation value, p, is the order of the LPC analysis LPC Analysis The next processing step is the LPC analysis, which converts each frame of p + 1 autocorrelations into LPC parameter set by using Durbin s method. This can formally be given as the following algorithm: 2.2 Linear Prediction Coefficient LPC (Linear Predictive coding) analyzes the speech signal by estimating the formants, removing speech signal, and estimating the intensity and frequency of the remaining buzz. The process is called inverse filtering, and the remaining signal is called the residue. In LPC system, each expressed as a linear combination of the previous samples. This equation is called as linear predictive coding [9]. The coefficients of the difference equation characterize the formants. The basic steps of LPC processor include the following [5]: Preemphasis The digitized speech signals (n)is put through a low order digital system, signal and to make it less susceptible to finite precision effects later in the signal processing. preemphasizer network is related to the input to the network, s (n), by difference equation: s (n) = s(n) as(n 1) Frame Blocking The output of preemphasis step,s (n), is blocked into frames of N samples, with adjacent frames being separated by M samples. If x(n)l is the lth frame of speech, and there are L frames within entire speech signal, then [5]. x (n) = s (Ml + n) windowing After frame blocking, the next step is to window each individual frame so as to minimize the signal discontinuities at the beginning and end of each frame. If we define the window as w(n), 0 n N 1, then the result of windowing is the signal: x (n) = x (n)w(n), 0 n N 1 Typical window is the Hamming window, which has the form w(n) = cos 2πn, 0 n N 1 N 1 Autocorrelation Analysis: The next step is to auto correlate each frame of windowed signal in order to give r (m) = x (n) x (n + m), E () = r(0) k = r(i) α () = k E α r( i j ), 1 i p α () = α () k α (), 1 j i 1 E () = (1 k )E By solving above equations recursively for i = 1,2,, p the LPC coefficient, am, is given as a = α () c = k m. c. a, m > p Figure 4: Block Diagram of LPC Advantages One of the most powerful signal analysis techniques is the method of linear prediction. LPC is to minimize the sum of the squared differences between the original speech signal and the estimated speech signal over a finite duration. 2.3 Perceptually Based Linear Predictive Analysis (PLP) PLP analysis models perceptually motivated auditory spectrum by a low order all pole function, using

4 theautocorrelation LP technique. Basic concept of PLP method is shown in block diagram of Fig. 5 Figure 5: Block Diagram of PLP Speech Analysis Method [7]. It involves two major steps: obtaining auditory spectrum, approximating the auditory spectrum by an all pole model. Auditory spectrum is derived From the speech waveform by critical-band filtering, equal loudness curve preemphasis, and intensity loudness root compression. Eighteen critical band filter outputs with their center frequencies equally spaced in bark domain, are defined as Ω(w)=6ln(( ) +((w/1200π)+1). ) Center frequency of kth critical band Ω = 0.994k Advantages PLP coefficients are often used because they approximate well the high-energy regions of the speech spectrum while simultaneously smoothing out the fine harmonic structure, which is often characteristic of the individual but not of the underlying linguistic unit. LPC, however, approximates the speech spectrum equally well at all frequencies, and this representation is contrary to known principles of human hearing. The spectral resolution of human hearing is roughly linear up to 800 or 1000 Hz, but it decreases with increasing frequency above this linear range. PLP incorporates critical-band spectralresolution into its spectrum estimate by remapping the frequency axis to the Bark scale and integrating the energy in the critical bands to produce a critical-band spectrum approximation 2.4 Dynamic Time Warping (DWT) In time series analysis, dynamic time warping (DTW) is an algorithm for measuring similarity between two temporal sequences which may vary in time or speed. For instance, similarities in walking patterns could be detected using DTW, even if one person was walking faster than the Π Q = [E(w ) C (w). P(w)dw] c (W)=10 (ΩΩ.), Ω Ω-0.5 = 1, Ω -0.5 Ω Ω +0.5 = 10.(ΩΩ.), Ω +0.5 Ω The output thus obtained is linearly interpolated to give interpolated auditory spectrum. The interpolated auditory spectrum is approximated by fifth order all pole model spectrum. The IDFT of interpolated auditory spectrum provides first six terms of autocorrelation function. These are used in solution of Yule Walker equations [7] to obtain five autoregressive coefficients of all-pole filter. The PLP analysis provides similar results as with LPC analysis but the order of PLP model is half of LP model. This allows computational and storage saving for ASR. Also it provides better performance to cross speaker ASR. other, or if there were accelerations and decelerations during the course of an observation. DTW has been applied to temporal sequences of video, audio and graphics data indeed, any data which can be turned into a linear sequence can be analyzed with DTW. Figure 6: Block Diagram of DWT Advantages Increased speed of recognition. Reduced storing space for the reference template. Constraints could be made in finding the optimal path. Increased recognition rate. A threshold can be used in order to stop the process if the error is too great Disadvantage To find the best reference template for a certain word. Choosing the appropriate reference template is a difficult task 2.5 Wavelet The speech is a non stationary signal. The Fourier transform (FT) isn't suitable for the analysis of such non stationary signal as a result of it gives only the frequency data of signal however doesn't provide the data about at what time which frequency is present. The windowed short-time ft (STFT) provides the temporal data concerning the frequency content of signal. A disadvantage of the STFT is its fastened time resolution attributable to fixed window length. The WT, with its versatile time-frequency window, is an which tool for the analysis of non stationary signals like speech fixed have both short high frequency bursts and long quasi-stationary component also.

5 2.5.1 Advantages Wavelet transform have been used for speech feature in which the energies of wavelet decomposed sub-bands have been used in place of Mel filtered sub-band energies. Because of its better energy compaction property. Wavelet transform-based features give better recognition accuracy than LPC and MFCC The WT has a better capability to model the details of unvoiced sound portions. Better time resolution than Fourier Transform. To compensate for linear channel distortions the analysis library provides the power to perform rasta filtering. The rasta filter is used either within the log spectral or cepstral domains. In result the rasta filter band passes every feature coefficient. Linear channel distortions seem as an additive Figure 8: Block Diagram of RASTA. constant in each the log spectral and therefore the cepstral domains. The high-pass portion of the equivalent band pass filter alleviates the result of convolution noise introduced inthe channel. The low-pass filtering helps in smoothing frame to border spectral changes. 2.7 Principal Component Analysis (PCA) Figure 7: Block Diagram of Wavelet. The main advantage of wavelets is that they offer a simultaneous localization in time and frequency domain. Wavelet is that, using fast wavelet transform, it is computationally very fast. Wavelets have the great advantage of being able to separate the fine details in a signal. Very small wavelets can be used to isolate very fine details in a signal, while very large wavelets can identify coarse details. A wavelet transform can be used to decompose a signal into component wavelets. PCA is thought a Principle part Analysis this is often a statistical analytical tool that's used to explore kind and cluster information. PCA take an oversized variety of correlate (interrelated) variables and rework this information into a smaller variety of unrelated variables (principal components) whereas holding largest quantity of variation, so creating it easier to work the information and build predictions. PCA could be a method of distinguishing patterns in information, and expressing the information in such some way on highlight their similarities and variations. Since a pattern in information is hard to seek out in information of high dimension, wherever the posh of graphical illustration isn't offered, PCA could be a powerful tool for analyzing information. Figure 9: Block Diagram of PCA Advantage High dimensionality reduction techniques Disadvantage The cost of computing DWT as compare to DCT may be higher. It requires longer compression time Application Signal processing. Data compression. Speech recognition. Computer graphics and multi-fractal analysis. 2.6 Relative Spectra Filtering of Log Domain Coefficients (RASTA)

6 2.7.2 Disadvantage 3. COMPARATIVE ANALYSIS The results of PCA depend on the scaling of the variables. The applicability of PCA is limited by certain assumptions made in its derivation. The absence of a probability density model and associated likelihood measure. Various methods for Feature Extraction in speech recognition are broadly shown in the following table 1. Table 1: Feature Extraction Methods [1] Method property comments 2.8 Combined LPC and MFCC The determination algorithms MFCC and LPC coefficients expressing the basic speech features are developed by author. Combined use of cepstrals of MFCC and LPC in speech recognition system is suggested by author to improve the reliability of speech recognition system. The recognition system is divided into MFCC and LPC-based recognition subsystems. The training and recognition processes are realized in both subsystems separately, and recognition system gets the decision being the same results of each subsystems. Author claimed that, results in decrease of error rate during recognition Steps of Combined Use of Cepstral of MFCC and LPC 1. The speech signals is passed through a first-order FIR high pass filter 2. Voice activation detection (VAD). Locating the endpoints of an utterance in a speech signal with the help of some commonly used methods such as short-term energy estimate Es, short-term power estimate Ps, shortterm zero crossing rate Zs etc. 3. Then the mean and variance for the measures calculated for background noise, assuming that the first 5 blocks are background noise. 4. Framing. 5. Windowing. 6. Calculating of MFCC features. 7. Calculating of LPC features. 8. The speech recognition system consists of MFCC and LPC-based two subsystems. These subsystems are trained by neural networks with MFCC and LPC features The Recognition Process Stages 1. In MFCC and LPC based recognition subsystems recognition processes are realized in parallel. 2. The recognition results of MFCC and LPC based recognition subsystems are compared and the speech recognition system confirms the result, which confirmed by the both subsystems. Principal Component Analysis(PCA) Linear Discriminant Analysis(LDA) Linear Predictive coding Cepstral Analysis Mel-frequency cepstrum (MFFCs) Independent Component Analysis (ICA) Cepstral Analysis Non linear feature Linear map; fast; eigenvector-based Non linear feature Supervised linear map; fast; eigenvector-based Static feature 10 to 16 lower order coefficient, Static feature Power spectrum Power spectrum is computed by performing Fourier Analysis Non linear feature Linear map, iterative non- Gaussian Static feature Power spectrum Traditional, eigenvector based also known as karhuneu-loeve expansion; good for Gaussian data. Better than PCA for classification; It is used for feature Extraction at lower order Used to represent spectral envelope This method is used for find our features Blind course separation, used for de-mixing non- Gaussian distributed sources(features) Used to represent spectral envelope Since the MFCC and LPC [2] methods are applied to the overlapping frames of speech signal, the dimension of feature vector depends on dimension of frames. At the same time, the number of frames depends on the length of speech signal, sampling frequency, frame step, frame length. An author use sampling frequency is 16khs, the frame step is 160 samples, and the frame length is 400 samples. The other problem of speech recognition is the same speech has different time duration. Even when the same person repeats the same speech, it has the different time durations. Author suggested that, for partially removing the problem, time durations are led to the same scale. When the dimension of scale defined for the speech signal increases, then the dimension of feature vector corresponding to the signal also increases. Mel-frequency scale analysis Kernel based feature method Static feature Spectral analysis Non linear transformations, Spectral analysis is done with a fixed resolution along a subjective frequency scale i.e. Mel-frequency scale. Dimensionality reduction leads to better classification and it is used to remove noisy and

7 Wavelet Dynamic feature s i)lpc ii)mfccs Spectral subtraction RASTA filtering Integrated Phoneme subspace method Better time resolution than Fourier Transform Acceleration and delta coefficients i.e. II and III order derivatives of normal LPC and MFCCs coefficients Robust Feature method For Noisy speech A transformation based on PCA+LDA+ICA redundant features, and improvement in classification error It replaces the fixed bandwidth of Fourier transform with one proportional to frequency which allow better time resolution at high frequencies than Fourier Transform It is used by dynamic or runtime Feature It is used basis on Spectrogram It is find out Feature in Noisy data Higher Accuracy than the existing methods Engineering Trends and Technology- Volume4Issue [5] Leena R Mehta, S. P. Mahajan, Amol S Dabhade, Comparative Study of MFCC and LPC For Marathi Isolated Word Recognition System, International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering Vol. 2, Issue 6, June 2013 [6] Kayte Charansing Nathoosing, Isolated Word Recognition for Marathi Language using VQ and HMM Science Research Reporter 2(2): , April 2012 [7] Manish P. Kesarkar, Feature Extraction for Speech Recognition M.Tech. Credit Seminar Report, Electronic Systems Group, EE. Dept, IIT Bombay, Submitted November2003. [8] Hynek Hermansky Perceptual Liner Predictive (PLP) Analysis Of Speech speech technology laboratory, division Panasonic technologies, Journal Acoustic Society of America,Vol. 87,No 4.April [9] Bharti W. Gawali, Santosh Gaikwad, Pravin Yannawar, Suresh C. Mehrotra, Marathi Isolated Word Recognition System using MFCC and DTW Features, ACEEE International Journal on Information Technology, Vol. 01, No. 01, Mar [10] К. R. Aida Zade, C. Ardil and S. S. Rustamov, Investigation of Combined use of MFCC and LPC Features in Speech Recognition Systems, Proceeding of World Academy of Science, Engineering and Technology Volume 13 May 2006 ISSN CONCLUSION We have discussed some features techniques and their advantages and disadvantages. Some new methods are developed using combination of more techniques. Authors have claimed improvement in performance. There is a need to develop new hybrid methods that will give better performance in robust speech recognition area. 5. REFERENCES [1] Santosh K. Gaikwad, Bharti W. Gawali, Pravin Yannawar A Review on Speech Recognition Technique International Journal of Computer Applications ( ). Volume 10 No.3, November [2] Urmila Shrawankar, Techniques for Feature Extraction in Speech Recognition System: A Comparative Study. International Journal Of Computer Applications In Engineering, Technology and Sciences (IJCAETS), 6 May 2013 [3] M. A. Anusuya, S. K.Katti Speech Recognition by Machine: A Review (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009 [4] Preeti Saini, Parneet Kaur, Automatic Speech Recognition: A Review International Journal of

8

International Journal of Engineering and Techniques - Volume 1 Issue 6, Nov Dec 2015

International Journal of Engineering and Techniques - Volume 1 Issue 6, Nov Dec 2015 RESEARCH ARTICLE OPEN ACCESS A Comparative Study on Feature Extraction Technique for Isolated Word Speech Recognition Easwari.N 1, Ponmuthuramalingam.P 2 1,2 (PG & Research Department of Computer Science,

More information

Mel Spectrum Analysis of Speech Recognition using Single Microphone

Mel Spectrum Analysis of Speech Recognition using Single Microphone International Journal of Engineering Research in Electronics and Communication Mel Spectrum Analysis of Speech Recognition using Single Microphone [1] Lakshmi S.A, [2] Cholavendan M [1] PG Scholar, Sree

More information

Performance Analysis of MFCC and LPCC Techniques in Automatic Speech Recognition

Performance Analysis of MFCC and LPCC Techniques in Automatic Speech Recognition www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume - 3 Issue - 8 August, 2014 Page No. 7727-7732 Performance Analysis of MFCC and LPCC Techniques in Automatic

More information

Speech Signal Analysis

Speech Signal Analysis Speech Signal Analysis Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 2&3 14,18 January 216 ASR Lectures 2&3 Speech Signal Analysis 1 Overview Speech Signal Analysis for

More information

speech signal S(n). This involves a transformation of S(n) into another signal or a set of signals

speech signal S(n). This involves a transformation of S(n) into another signal or a set of signals 16 3. SPEECH ANALYSIS 3.1 INTRODUCTION TO SPEECH ANALYSIS Many speech processing [22] applications exploits speech production and perception to accomplish speech analysis. By speech analysis we extract

More information

Speech Synthesis using Mel-Cepstral Coefficient Feature

Speech Synthesis using Mel-Cepstral Coefficient Feature Speech Synthesis using Mel-Cepstral Coefficient Feature By Lu Wang Senior Thesis in Electrical Engineering University of Illinois at Urbana-Champaign Advisor: Professor Mark Hasegawa-Johnson May 2018 Abstract

More information

Auditory Based Feature Vectors for Speech Recognition Systems

Auditory Based Feature Vectors for Speech Recognition Systems Auditory Based Feature Vectors for Speech Recognition Systems Dr. Waleed H. Abdulla Electrical & Computer Engineering Department The University of Auckland, New Zealand [w.abdulla@auckland.ac.nz] 1 Outlines

More information

Signal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2

Signal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2 Signal Processing for Speech Applications - Part 2-1 Signal Processing For Speech Applications - Part 2 May 14, 2013 Signal Processing for Speech Applications - Part 2-2 References Huang et al., Chapter

More information

RASTA-PLP SPEECH ANALYSIS. Aruna Bayya. Phil Kohn y TR December 1991

RASTA-PLP SPEECH ANALYSIS. Aruna Bayya. Phil Kohn y TR December 1991 RASTA-PLP SPEECH ANALYSIS Hynek Hermansky Nelson Morgan y Aruna Bayya Phil Kohn y TR-91-069 December 1991 Abstract Most speech parameter estimation techniques are easily inuenced by the frequency response

More information

Cepstrum alanysis of speech signals

Cepstrum alanysis of speech signals Cepstrum alanysis of speech signals ELEC-E5520 Speech and language processing methods Spring 2016 Mikko Kurimo 1 /48 Contents Literature and other material Idea and history of cepstrum Cepstrum and LP

More information

University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005

University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 Lecture 5 Slides Jan 26 th, 2005 Outline of Today s Lecture Announcements Filter-bank analysis

More information

AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS

AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS Kuldeep Kumar 1, R. K. Aggarwal 1 and Ankita Jain 2 1 Department of Computer Engineering, National Institute

More information

EE482: Digital Signal Processing Applications

EE482: Digital Signal Processing Applications Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 12 Speech Signal Processing 14/03/25 http://www.ee.unlv.edu/~b1morris/ee482/

More information

Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm

Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm A.T. Rajamanickam, N.P.Subiramaniyam, A.Balamurugan*,

More information

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter 1 Gupteswar Sahu, 2 D. Arun Kumar, 3 M. Bala Krishna and 4 Jami Venkata Suman Assistant Professor, Department of ECE,

More information

Isolated Digit Recognition Using MFCC AND DTW

Isolated Digit Recognition Using MFCC AND DTW MarutiLimkar a, RamaRao b & VidyaSagvekar c a Terna collegeof Engineering, Department of Electronics Engineering, Mumbai University, India b Vidyalankar Institute of Technology, Department ofelectronics

More information

Advanced audio analysis. Martin Gasser

Advanced audio analysis. Martin Gasser Advanced audio analysis Martin Gasser Motivation Which methods are common in MIR research? How can we parameterize audio signals? Interesting dimensions of audio: Spectral/ time/melody structure, high

More information

Chapter 4 SPEECH ENHANCEMENT

Chapter 4 SPEECH ENHANCEMENT 44 Chapter 4 SPEECH ENHANCEMENT 4.1 INTRODUCTION: Enhancement is defined as improvement in the value or Quality of something. Speech enhancement is defined as the improvement in intelligibility and/or

More information

Audio Signal Compression using DCT and LPC Techniques

Audio Signal Compression using DCT and LPC Techniques Audio Signal Compression using DCT and LPC Techniques P. Sandhya Rani#1, D.Nanaji#2, V.Ramesh#3,K.V.S. Kiran#4 #Student, Department of ECE, Lendi Institute Of Engineering And Technology, Vizianagaram,

More information

SPEech Feature Toolbox (SPEFT) Design and Emotional Speech Feature Extraction

SPEech Feature Toolbox (SPEFT) Design and Emotional Speech Feature Extraction SPEech Feature Toolbox (SPEFT) Design and Emotional Speech Feature Extraction by Xi Li A thesis submitted to the Faculty of Graduate School, Marquette University, in Partial Fulfillment of the Requirements

More information

Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise

Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise Noha KORANY 1 Alexandria University, Egypt ABSTRACT The paper applies spectral analysis to

More information

Different Approaches of Spectral Subtraction Method for Speech Enhancement

Different Approaches of Spectral Subtraction Method for Speech Enhancement ISSN 2249 5460 Available online at www.internationalejournals.com International ejournals International Journal of Mathematical Sciences, Technology and Humanities 95 (2013 1056 1062 Different Approaches

More information

Rhythmic Similarity -- a quick paper review. Presented by: Shi Yong March 15, 2007 Music Technology, McGill University

Rhythmic Similarity -- a quick paper review. Presented by: Shi Yong March 15, 2007 Music Technology, McGill University Rhythmic Similarity -- a quick paper review Presented by: Shi Yong March 15, 2007 Music Technology, McGill University Contents Introduction Three examples J. Foote 2001, 2002 J. Paulus 2002 S. Dixon 2004

More information

EC 6501 DIGITAL COMMUNICATION UNIT - II PART A

EC 6501 DIGITAL COMMUNICATION UNIT - II PART A EC 6501 DIGITAL COMMUNICATION 1.What is the need of prediction filtering? UNIT - II PART A [N/D-16] Prediction filtering is used mostly in audio signal processing and speech processing for representing

More information

Speech Enhancement using Wiener filtering

Speech Enhancement using Wiener filtering Speech Enhancement using Wiener filtering S. Chirtmay and M. Tahernezhadi Department of Electrical Engineering Northern Illinois University DeKalb, IL 60115 ABSTRACT The problem of reducing the disturbing

More information

E : Lecture 8 Source-Filter Processing. E : Lecture 8 Source-Filter Processing / 21

E : Lecture 8 Source-Filter Processing. E : Lecture 8 Source-Filter Processing / 21 E85.267: Lecture 8 Source-Filter Processing E85.267: Lecture 8 Source-Filter Processing 21-4-1 1 / 21 Source-filter analysis/synthesis n f Spectral envelope Spectral envelope Analysis Source signal n 1

More information

Dimension Reduction of the Modulation Spectrogram for Speaker Verification

Dimension Reduction of the Modulation Spectrogram for Speaker Verification Dimension Reduction of the Modulation Spectrogram for Speaker Verification Tomi Kinnunen Speech and Image Processing Unit Department of Computer Science University of Joensuu, Finland Kong Aik Lee and

More information

High-speed Noise Cancellation with Microphone Array

High-speed Noise Cancellation with Microphone Array Noise Cancellation a Posteriori Probability, Maximum Criteria Independent Component Analysis High-speed Noise Cancellation with Microphone Array We propose the use of a microphone array based on independent

More information

International Journal of Modern Trends in Engineering and Research e-issn No.: , Date: 2-4 July, 2015

International Journal of Modern Trends in Engineering and Research   e-issn No.: , Date: 2-4 July, 2015 International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:2349-9745, Date: 2-4 July, 2015 Analysis of Speech Signal Using Graphic User Interface Solly Joy 1, Savitha

More information

Using RASTA in task independent TANDEM feature extraction

Using RASTA in task independent TANDEM feature extraction R E S E A R C H R E P O R T I D I A P Using RASTA in task independent TANDEM feature extraction Guillermo Aradilla a John Dines a Sunil Sivadas a b IDIAP RR 04-22 April 2004 D a l l e M o l l e I n s t

More information

Signal segmentation and waveform characterization. Biosignal processing, S Autumn 2012

Signal segmentation and waveform characterization. Biosignal processing, S Autumn 2012 Signal segmentation and waveform characterization Biosignal processing, 5173S Autumn 01 Short-time analysis of signals Signal statistics may vary in time: nonstationary how to compute signal characterizations?

More information

Adaptive Filters Application of Linear Prediction

Adaptive Filters Application of Linear Prediction Adaptive Filters Application of Linear Prediction Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Electrical Engineering and Information Technology Digital Signal Processing

More information

Drum Transcription Based on Independent Subspace Analysis

Drum Transcription Based on Independent Subspace Analysis Report for EE 391 Special Studies and Reports for Electrical Engineering Drum Transcription Based on Independent Subspace Analysis Yinyi Guo Center for Computer Research in Music and Acoustics, Stanford,

More information

Speech Synthesis; Pitch Detection and Vocoders

Speech Synthesis; Pitch Detection and Vocoders Speech Synthesis; Pitch Detection and Vocoders Tai-Shih Chi ( 冀泰石 ) Department of Communication Engineering National Chiao Tung University May. 29, 2008 Speech Synthesis Basic components of the text-to-speech

More information

Overview of Code Excited Linear Predictive Coder

Overview of Code Excited Linear Predictive Coder Overview of Code Excited Linear Predictive Coder Minal Mulye 1, Sonal Jagtap 2 1 PG Student, 2 Assistant Professor, Department of E&TC, Smt. Kashibai Navale College of Engg, Pune, India Abstract Advances

More information

Robust Voice Activity Detection Based on Discrete Wavelet. Transform

Robust Voice Activity Detection Based on Discrete Wavelet. Transform Robust Voice Activity Detection Based on Discrete Wavelet Transform Kun-Ching Wang Department of Information Technology & Communication Shin Chien University kunching@mail.kh.usc.edu.tw Abstract This paper

More information

Audio processing methods on marine mammal vocalizations

Audio processing methods on marine mammal vocalizations Audio processing methods on marine mammal vocalizations Xanadu Halkias Laboratory for the Recognition and Organization of Speech and Audio http://labrosa.ee.columbia.edu Sound to Signal sound is pressure

More information

Spectral estimation using higher-lag autocorrelation coefficients with applications to speech recognition

Spectral estimation using higher-lag autocorrelation coefficients with applications to speech recognition Spectral estimation using higher-lag autocorrelation coefficients with applications to speech recognition Author Shannon, Ben, Paliwal, Kuldip Published 25 Conference Title The 8th International Symposium

More information

Signal Processing Toolbox

Signal Processing Toolbox Signal Processing Toolbox Perform signal processing, analysis, and algorithm development Signal Processing Toolbox provides industry-standard algorithms for analog and digital signal processing (DSP).

More information

Speech and Music Discrimination based on Signal Modulation Spectrum.

Speech and Music Discrimination based on Signal Modulation Spectrum. Speech and Music Discrimination based on Signal Modulation Spectrum. Pavel Balabko June 24, 1999 1 Introduction. This work is devoted to the problem of automatic speech and music discrimination. As we

More information

Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications

Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications Brochure More information from http://www.researchandmarkets.com/reports/569388/ Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications Description: Multimedia Signal

More information

L19: Prosodic modification of speech

L19: Prosodic modification of speech L19: Prosodic modification of speech Time-domain pitch synchronous overlap add (TD-PSOLA) Linear-prediction PSOLA Frequency-domain PSOLA Sinusoidal models Harmonic + noise models STRAIGHT This lecture

More information

Speech Enhancement Based On Noise Reduction

Speech Enhancement Based On Noise Reduction Speech Enhancement Based On Noise Reduction Kundan Kumar Singh Electrical Engineering Department University Of Rochester ksingh11@z.rochester.edu ABSTRACT This paper addresses the problem of signal distortion

More information

DERIVATION OF TRAPS IN AUDITORY DOMAIN

DERIVATION OF TRAPS IN AUDITORY DOMAIN DERIVATION OF TRAPS IN AUDITORY DOMAIN Petr Motlíček, Doctoral Degree Programme (4) Dept. of Computer Graphics and Multimedia, FIT, BUT E-mail: motlicek@fit.vutbr.cz Supervised by: Dr. Jan Černocký, Prof.

More information

Machine recognition of speech trained on data from New Jersey Labs

Machine recognition of speech trained on data from New Jersey Labs Machine recognition of speech trained on data from New Jersey Labs Frequency response (peak around 5 Hz) Impulse response (effective length around 200 ms) 41 RASTA filter 10 attenuation [db] 40 1 10 modulation

More information

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS 1 S.PRASANNA VENKATESH, 2 NITIN NARAYAN, 3 K.SAILESH BHARATHWAAJ, 4 M.P.ACTLIN JEEVA, 5 P.VIJAYALAKSHMI 1,2,3,4,5 SSN College of Engineering,

More information

Sound Synthesis Methods

Sound Synthesis Methods Sound Synthesis Methods Matti Vihola, mvihola@cs.tut.fi 23rd August 2001 1 Objectives The objective of sound synthesis is to create sounds that are Musically interesting Preferably realistic (sounds like

More information

SOUND SOURCE RECOGNITION AND MODELING

SOUND SOURCE RECOGNITION AND MODELING SOUND SOURCE RECOGNITION AND MODELING CASA seminar, summer 2000 Antti Eronen antti.eronen@tut.fi Contents: Basics of human sound source recognition Timbre Voice recognition Recognition of environmental

More information

Converting Speaking Voice into Singing Voice

Converting Speaking Voice into Singing Voice Converting Speaking Voice into Singing Voice 1 st place of the Synthesis of Singing Challenge 2007: Vocal Conversion from Speaking to Singing Voice using STRAIGHT by Takeshi Saitou et al. 1 STRAIGHT Speech

More information

SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT

SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT RASHMI MAKHIJANI Department of CSE, G. H. R.C.E., Near CRPF Campus,Hingna Road, Nagpur, Maharashtra, India rashmi.makhijani2002@gmail.com

More information

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches Performance study of Text-independent Speaker identification system using & I for Telephone and Microphone Speeches Ruchi Chaudhary, National Technical Research Organization Abstract: A state-of-the-art

More information

Performance Analysiss of Speech Enhancement Algorithm for Robust Speech Recognition System

Performance Analysiss of Speech Enhancement Algorithm for Robust Speech Recognition System Performance Analysiss of Speech Enhancement Algorithm for Robust Speech Recognition System C.GANESH BABU 1, Dr.P..T.VANATHI 2 R.RAMACHANDRAN 3, M.SENTHIL RAJAA 3, R.VENGATESH 3 1 Research Scholar (PSGCT)

More information

Audio Fingerprinting using Fractional Fourier Transform

Audio Fingerprinting using Fractional Fourier Transform Audio Fingerprinting using Fractional Fourier Transform Swati V. Sutar 1, D. G. Bhalke 2 1 (Department of Electronics & Telecommunication, JSPM s RSCOE college of Engineering Pune, India) 2 (Department,

More information

I D I A P. On Factorizing Spectral Dynamics for Robust Speech Recognition R E S E A R C H R E P O R T. Iain McCowan a Hemant Misra a,b

I D I A P. On Factorizing Spectral Dynamics for Robust Speech Recognition R E S E A R C H R E P O R T. Iain McCowan a Hemant Misra a,b R E S E A R C H R E P O R T I D I A P On Factorizing Spectral Dynamics for Robust Speech Recognition a Vivek Tyagi Hervé Bourlard a,b IDIAP RR 3-33 June 23 Iain McCowan a Hemant Misra a,b to appear in

More information

EE482: Digital Signal Processing Applications

EE482: Digital Signal Processing Applications Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 14 Quiz 04 Review 14/04/07 http://www.ee.unlv.edu/~b1morris/ee482/

More information

A STUDY ON CEPSTRAL SUB-BAND NORMALIZATION FOR ROBUST ASR

A STUDY ON CEPSTRAL SUB-BAND NORMALIZATION FOR ROBUST ASR A STUDY ON CEPSTRAL SUB-BAND NORMALIZATION FOR ROBUST ASR Syu-Siang Wang 1, Jeih-weih Hung, Yu Tsao 1 1 Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan Dept. of Electrical

More information

I D I A P. Mel-Cepstrum Modulation Spectrum (MCMS) Features for Robust ASR R E S E A R C H R E P O R T. Iain McCowan a Hemant Misra a,b

I D I A P. Mel-Cepstrum Modulation Spectrum (MCMS) Features for Robust ASR R E S E A R C H R E P O R T. Iain McCowan a Hemant Misra a,b R E S E A R C H R E P O R T I D I A P Mel-Cepstrum Modulation Spectrum (MCMS) Features for Robust ASR a Vivek Tyagi Hervé Bourlard a,b IDIAP RR 3-47 September 23 Iain McCowan a Hemant Misra a,b to appear

More information

T Automatic Speech Recognition: From Theory to Practice

T Automatic Speech Recognition: From Theory to Practice Automatic Speech Recognition: From Theory to Practice http://www.cis.hut.fi/opinnot// September 27, 2004 Prof. Bryan Pellom Department of Computer Science Center for Spoken Language Research University

More information

CG401 Advanced Signal Processing. Dr Stuart Lawson Room A330 Tel: January 2003

CG401 Advanced Signal Processing. Dr Stuart Lawson Room A330 Tel: January 2003 CG40 Advanced Dr Stuart Lawson Room A330 Tel: 23780 e-mail: ssl@eng.warwick.ac.uk 03 January 2003 Lecture : Overview INTRODUCTION What is a signal? An information-bearing quantity. Examples of -D and 2-D

More information

Isolated Word Recognition Based on Combination of Multiple Noise-Robust Techniques

Isolated Word Recognition Based on Combination of Multiple Noise-Robust Techniques Isolated Word Recognition Based on Combination of Multiple Noise-Robust Techniques 81 Isolated Word Recognition Based on Combination of Multiple Noise-Robust Techniques Noboru Hayasaka 1, Non-member ABSTRACT

More information

Speech Recognition using FIR Wiener Filter

Speech Recognition using FIR Wiener Filter Speech Recognition using FIR Wiener Filter Deepak 1, Vikas Mittal 2 1 Department of Electronics & Communication Engineering, Maharishi Markandeshwar University, Mullana (Ambala), INDIA 2 Department of

More information

Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis

Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis Mohini Avatade & S.L. Sahare Electronics & Telecommunication Department, Cummins

More information

Performance analysis of voice activity detection algorithm for robust speech recognition system under different noisy environment

Performance analysis of voice activity detection algorithm for robust speech recognition system under different noisy environment BABU et al: VOICE ACTIVITY DETECTION ALGORITHM FOR ROBUST SPEECH RECOGNITION SYSTEM Journal of Scientific & Industrial Research Vol. 69, July 2010, pp. 515-522 515 Performance analysis of voice activity

More information

An Improved Voice Activity Detection Based on Deep Belief Networks

An Improved Voice Activity Detection Based on Deep Belief Networks e-issn 2455 1392 Volume 2 Issue 4, April 2016 pp. 676-683 Scientific Journal Impact Factor : 3.468 http://www.ijcter.com An Improved Voice Activity Detection Based on Deep Belief Networks Shabeeba T. K.

More information

Auditory modelling for speech processing in the perceptual domain

Auditory modelling for speech processing in the perceptual domain ANZIAM J. 45 (E) ppc964 C980, 2004 C964 Auditory modelling for speech processing in the perceptual domain L. Lin E. Ambikairajah W. H. Holmes (Received 8 August 2003; revised 28 January 2004) Abstract

More information

Communications Theory and Engineering

Communications Theory and Engineering Communications Theory and Engineering Master's Degree in Electronic Engineering Sapienza University of Rome A.A. 2018-2019 Speech and telephone speech Based on a voice production model Parametric representation

More information

Enhancement of Speech Signal by Adaptation of Scales and Thresholds of Bionic Wavelet Transform Coefficients

Enhancement of Speech Signal by Adaptation of Scales and Thresholds of Bionic Wavelet Transform Coefficients ISSN (Print) : 232 3765 An ISO 3297: 27 Certified Organization Vol. 3, Special Issue 3, April 214 Paiyanoor-63 14, Tamil Nadu, India Enhancement of Speech Signal by Adaptation of Scales and Thresholds

More information

Lab 3 FFT based Spectrum Analyzer

Lab 3 FFT based Spectrum Analyzer ECEn 487 Digital Signal Processing Laboratory Lab 3 FFT based Spectrum Analyzer Due Dates This is a three week lab. All TA check off must be completed prior to the beginning of class on the lab book submission

More information

Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012

Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012 Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012 o Music signal characteristics o Perceptual attributes and acoustic properties o Signal representations for pitch detection o STFT o Sinusoidal model o

More information

Introduction of Audio and Music

Introduction of Audio and Music 1 Introduction of Audio and Music Wei-Ta Chu 2009/12/3 Outline 2 Introduction of Audio Signals Introduction of Music 3 Introduction of Audio Signals Wei-Ta Chu 2009/12/3 Li and Drew, Fundamentals of Multimedia,

More information

Voice Activity Detection

Voice Activity Detection Voice Activity Detection Speech Processing Tom Bäckström Aalto University October 2015 Introduction Voice activity detection (VAD) (or speech activity detection, or speech detection) refers to a class

More information

X. SPEECH ANALYSIS. Prof. M. Halle G. W. Hughes H. J. Jacobsen A. I. Engel F. Poza A. VOWEL IDENTIFIER

X. SPEECH ANALYSIS. Prof. M. Halle G. W. Hughes H. J. Jacobsen A. I. Engel F. Poza A. VOWEL IDENTIFIER X. SPEECH ANALYSIS Prof. M. Halle G. W. Hughes H. J. Jacobsen A. I. Engel F. Poza A. VOWEL IDENTIFIER Most vowel identifiers constructed in the past were designed on the principle of "pattern matching";

More information

Speech/Music Change Point Detection using Sonogram and AANN

Speech/Music Change Point Detection using Sonogram and AANN International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 6, Number 1 (2016), pp. 45-49 International Research Publications House http://www. irphouse.com Speech/Music Change

More information

MOST MODERN automatic speech recognition (ASR)

MOST MODERN automatic speech recognition (ASR) IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 5, NO. 5, SEPTEMBER 1997 451 A Model of Dynamic Auditory Perception and Its Application to Robust Word Recognition Brian Strope and Abeer Alwan, Member,

More information

Dimension Reduction of the Modulation Spectrogram for Speaker Verification

Dimension Reduction of the Modulation Spectrogram for Speaker Verification Dimension Reduction of the Modulation Spectrogram for Speaker Verification Tomi Kinnunen Speech and Image Processing Unit Department of Computer Science University of Joensuu, Finland tkinnu@cs.joensuu.fi

More information

Evoked Potentials (EPs)

Evoked Potentials (EPs) EVOKED POTENTIALS Evoked Potentials (EPs) Event-related brain activity where the stimulus is usually of sensory origin. Acquired with conventional EEG electrodes. Time-synchronized = time interval from

More information

Determination of Variation Ranges of the Psola Transformation Parameters by Using Their Influence on the Acoustic Parameters of Speech

Determination of Variation Ranges of the Psola Transformation Parameters by Using Their Influence on the Acoustic Parameters of Speech Determination of Variation Ranges of the Psola Transformation Parameters by Using Their Influence on the Acoustic Parameters of Speech L. Demri1, L. Falek2, H. Teffahi3, and A.Djeradi4 Speech Communication

More information

SIMULATION VOICE RECOGNITION SYSTEM FOR CONTROLING ROBOTIC APPLICATIONS

SIMULATION VOICE RECOGNITION SYSTEM FOR CONTROLING ROBOTIC APPLICATIONS SIMULATION VOICE RECOGNITION SYSTEM FOR CONTROLING ROBOTIC APPLICATIONS 1 WAHYU KUSUMA R., 2 PRINCE BRAVE GUHYAPATI V 1 Computer Laboratory Staff., Department of Information Systems, Gunadarma University,

More information

Lecture 6: Speech modeling and synthesis

Lecture 6: Speech modeling and synthesis EE E682: Speech & Audio Processing & Recognition Lecture 6: Speech modeling and synthesis 1 2 3 4 5 Modeling speech signals Spectral and cepstral models Linear Predictive models (LPC) Other signal models

More information

Comparison of Spectral Analysis Methods for Automatic Speech Recognition

Comparison of Spectral Analysis Methods for Automatic Speech Recognition INTERSPEECH 2013 Comparison of Spectral Analysis Methods for Automatic Speech Recognition Venkata Neelima Parinam, Chandra Vootkuri, Stephen A. Zahorian Department of Electrical and Computer Engineering

More information

Enhanced Waveform Interpolative Coding at 4 kbps

Enhanced Waveform Interpolative Coding at 4 kbps Enhanced Waveform Interpolative Coding at 4 kbps Oded Gottesman, and Allen Gersho Signal Compression Lab. University of California, Santa Barbara E-mail: [oded, gersho]@scl.ece.ucsb.edu Signal Compression

More information

Perceptually Motivated Linear Prediction Cepstral Features for Network Speech Recognition

Perceptually Motivated Linear Prediction Cepstral Features for Network Speech Recognition Perceptually Motivated Linear Prediction Cepstral Features for Network Speech Recognition Aadel Alatwi, Stephen So, Kuldip K. Paliwal Signal Processing Laboratory Griffith University, Brisbane, QLD, 4111,

More information

MFCC AND GMM BASED TAMIL LANGUAGE SPEAKER IDENTIFICATION SYSTEM

MFCC AND GMM BASED TAMIL LANGUAGE SPEAKER IDENTIFICATION SYSTEM www.advancejournals.org Open Access Scientific Publisher MFCC AND GMM BASED TAMIL LANGUAGE SPEAKER IDENTIFICATION SYSTEM ABSTRACT- P. Santhiya 1, T. Jayasankar 1 1 AUT (BIT campus), Tiruchirappalli, India

More information

Lecture 5: Speech modeling. The speech signal

Lecture 5: Speech modeling. The speech signal EE E68: Speech & Audio Processing & Recognition Lecture 5: Speech modeling 1 3 4 5 Modeling speech signals Spectral and cepstral models Linear Predictive models (LPC) Other signal models Speech synthesis

More information

Audio Similarity. Mark Zadel MUMT 611 March 8, Audio Similarity p.1/23

Audio Similarity. Mark Zadel MUMT 611 March 8, Audio Similarity p.1/23 Audio Similarity Mark Zadel MUMT 611 March 8, 2004 Audio Similarity p.1/23 Overview MFCCs Foote Content-Based Retrieval of Music and Audio (1997) Logan, Salomon A Music Similarity Function Based On Signal

More information

Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter

Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Ching-Ta Lu, Kun-Fu Tseng 2, Chih-Tsung Chen 2 Department of Information Communication, Asia University, Taichung, Taiwan, ROC

More information

Automatic Text-Independent. Speaker. Recognition Approaches Using Binaural Inputs

Automatic Text-Independent. Speaker. Recognition Approaches Using Binaural Inputs Automatic Text-Independent Speaker Recognition Approaches Using Binaural Inputs Karim Youssef, Sylvain Argentieri and Jean-Luc Zarader 1 Outline Automatic speaker recognition: introduction Designed systems

More information

Design and Implementation of Speech Recognition Systems

Design and Implementation of Speech Recognition Systems Design and Implementation of Speech Recognition Systems Spring 2013 Class 3: Feature Computation 30 Jan 2013 1 First Step: Feature Extraction Speech recognition is a type of pattern recognition problem

More information

Chapter IV THEORY OF CELP CODING

Chapter IV THEORY OF CELP CODING Chapter IV THEORY OF CELP CODING CHAPTER IV THEORY OF CELP CODING 4.1 Introduction Wavefonn coders fail to produce high quality speech at bit rate lower than 16 kbps. Source coders, such as LPC vocoders,

More information

Speech Coding in the Frequency Domain

Speech Coding in the Frequency Domain Speech Coding in the Frequency Domain Speech Processing Advanced Topics Tom Bäckström Aalto University October 215 Introduction The speech production model can be used to efficiently encode speech signals.

More information

Lab 8. Signal Analysis Using Matlab Simulink

Lab 8. Signal Analysis Using Matlab Simulink E E 2 7 5 Lab June 30, 2006 Lab 8. Signal Analysis Using Matlab Simulink Introduction The Matlab Simulink software allows you to model digital signals, examine power spectra of digital signals, represent

More information

SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes

SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN Yu Wang and Mike Brookes Department of Electrical and Electronic Engineering, Exhibition Road, Imperial College London,

More information

Pattern Recognition. Part 6: Bandwidth Extension. Gerhard Schmidt

Pattern Recognition. Part 6: Bandwidth Extension. Gerhard Schmidt Pattern Recognition Part 6: Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Institute of Electrical and Information Engineering Digital Signal Processing and System Theory

More information

Robust Algorithms For Speech Reconstruction On Mobile Devices

Robust Algorithms For Speech Reconstruction On Mobile Devices Robust Algorithms For Speech Reconstruction On Mobile Devices XU SHAO A Thesis presented for the degree of Doctor of Philosophy Speech Group School of Computing Sciences University of East Anglia England

More information

Modulation Spectrum Power-law Expansion for Robust Speech Recognition

Modulation Spectrum Power-law Expansion for Robust Speech Recognition Modulation Spectrum Power-law Expansion for Robust Speech Recognition Hao-Teng Fan, Zi-Hao Ye and Jeih-weih Hung Department of Electrical Engineering, National Chi Nan University, Nantou, Taiwan E-mail:

More information

A LPC-PEV Based VAD for Word Boundary Detection

A LPC-PEV Based VAD for Word Boundary Detection 14 A LPC-PEV Based VAD for Word Boundary Detection Syed Abbas Ali (A), NajmiGhaniHaider (B) and Mahmood Khan Pathan (C) (A) Faculty of Computer &Information Systems Engineering, N.E.D University of Engg.

More information

Effective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a

Effective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a R E S E A R C H R E P O R T I D I A P Effective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a IDIAP RR 7-7 January 8 submitted for publication a IDIAP Research Institute,

More information

Voice Excited Lpc for Speech Compression by V/Uv Classification

Voice Excited Lpc for Speech Compression by V/Uv Classification IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 6, Issue 3, Ver. II (May. -Jun. 2016), PP 65-69 e-issn: 2319 4200, p-issn No. : 2319 4197 www.iosrjournals.org Voice Excited Lpc for Speech

More information

Chapter 3. Speech Enhancement and Detection Techniques: Transform Domain

Chapter 3. Speech Enhancement and Detection Techniques: Transform Domain Speech Enhancement and Detection Techniques: Transform Domain 43 This chapter describes techniques for additive noise removal which are transform domain methods and based mostly on short time Fourier transform

More information

Lecture 2: SIGNALS. 1 st semester By: Elham Sunbu

Lecture 2: SIGNALS. 1 st semester By: Elham Sunbu Lecture 2: SIGNALS 1 st semester 1439-2017 1 By: Elham Sunbu OUTLINE Signals and the classification of signals Sine wave Time and frequency domains Composite signals Signal bandwidth Digital signal Signal

More information