MFCC AND GMM BASED TAMIL LANGUAGE SPEAKER IDENTIFICATION SYSTEM

P. Santhiya¹, T. Jayasankar¹
¹AUT (BIT Campus), Tiruchirappalli, India
Correspondence should be addressed to P. Santhiya
Received March 27, 2015; Accepted April 02, 2015; Published June 25, 2015

Copyright: 2015 P. Santhiya et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Cite This Article: Santhiya, P., Jayasankar, T. (2015). MFCC and GMM based Tamil language speaker identification system. Advances in Engineering & Scientific Research, 1(1), 1-6.

ABSTRACT

Speaker identification is the process of automatically identifying who is speaking on the basis of the individual information included in speech waves. It is one of the most useful biometric recognition techniques in a world where insecurity is a major threat, and many organizations such as banks, institutions and industries currently use it to provide greater security for their vast databases. Speaker identification mainly involves two modules, namely feature extraction and feature matching. Feature extraction extracts a small amount of data from the speaker's voice signal that can later be used to represent that speaker. Feature matching is the procedure of identifying the unknown speaker by comparing the features extracted from his/her voice input with those already stored in the speech database. In this paper we present an overview of feature extraction using Mel Frequency Cepstrum Coefficients (MFCC), the most widely used feature in speaker identification systems, and of the GMM feature matching technique, applied to Tamil speaker identification.

KEY WORDS - Mel Frequency Cepstrum Coefficients, GMM, feature matching, feature extraction, DCT.

INTRODUCTION

Speaker identification (SI) refers to the process of identifying an individual by extracting and processing information from his/her speech. It is the task of finding the best-matching speaker for an unknown speaker in a database of known speakers, and it is mainly a part of speech processing. An SI system enables people to have secure information and property access.

Speaker identification methods can be divided into two categories. In open-set SI, a reference model for the unknown speaker may not exist, so an additional decision alternative is required when the unknown speaker matches none of the models. In closed-set SI, a set of N distinct speaker models is stored in the identification system by extracting parameters from the speech samples of N speakers. In the identification task, similar parameters are first extracted from the new speech input, and the system then decides which of the N known speakers best matches the input speech parameters [10].

Speaker identification systems are further divided into text-independent and text-dependent methods [1]. A text-dependent method requires the speaker to utter key words or sentences with the same text for both training and identification trials, whereas in a text-independent method the utterances provided during identification trials may be independent of those used in the training phase [2].

SPEAKER IDENTIFICATION SYSTEM MODULE

A speaker identification system is composed of the following modules: front-end processing, speaker modelling, speaker database and decision logic.

Figure 1: Block Diagram of Speaker identification system

Front-end processing
This is the first step in creating feature vectors. It is the signal processing part, which converts the sampled speech signal into a set of feature vectors characterizing the properties of speech that can separate different speakers. Front-end processing is performed in both the training and testing phases. Its objective is to modify the speech signal so that it is more suitable for feature extraction analysis. The front-end processing operations are noise cancelling, framing, windowing and pre-emphasis.

Speaker database
The speaker models are stored here. A model is obtained for each speaker using the feature vectors extracted from that speaker, and these models are used for identification of the unknown speaker during the testing phase.

Decision logic
This module makes the final decision about the identity of the speaker by comparing the unknown speaker to all models in the database and selecting the best matching model.

Feature extraction module
MFCC is one of the important feature extraction techniques for speaker identification. The goal of feature extraction is to find a set of properties of an utterance that have acoustic correlations to the speech signal, that is, parameters that can be computed or estimated through processing of the signal waveform. Such parameters are termed features. Feature extraction includes measuring some important characteristic of the signal, such as energy or frequency response, augmenting these measurements with perceptually meaningful derived measurements, and statistically conditioning these numbers to form observation vectors. The speaker identification system accordingly includes the following processes [7]: pre-processing, feature extraction, feature matching and modeling.

Figure 2: Steps involved in Speaker identification system

Modeling
The objective of the modeling technique is to generate a model for each speaker using the specific feature vectors extracted from that speaker. It performs a reduction of the feature data by modelling the distributions of the feature vectors. Speech recognition also operates in two modes: speaker dependent and speaker independent. In the speaker-independent mode the machine should ignore the speaker-specific characteristics of the speech signal and extract the intended message, whereas in the speaker-dependent mode it should extract the speaker characteristics in the acoustic signal. The main aim of speaker identification is comparing a speech signal from an unknown speaker against a database of known speakers.

Pre-processing

Noise Removal
Noise degrades the performance of a speaker identification system. De-noising is performed here by a wavelet decomposition technique: the original signal is decomposed, the detail coefficients are thresholded, and the signal is reconstructed. The decomposition portion of de-noising is accomplished via the Discrete Wavelet Transform (DWT). The DWT is commonly implemented using dyadic multirate filter banks, which are sets of filters that divide a signal frequency band into sub-bands; these filter banks are comprised of low-pass, high-pass, or band-pass filters.
If the filter banks are wavelet filter banks consisting of special low-pass and high-pass wavelet filters, then the outputs of the low-pass filter are the approximation coefficients and the outputs of the high-pass filter are the detail coefficients. The process of obtaining the approximation and detail coefficients is called decomposition. If a threshold operation is applied to the output of the DWT and the wavelet coefficients that fall below a specified value are removed, the system performs a "de-noising" function. There are two different threshold operations. In the first, hard thresholding, coefficients whose absolute values are lower than the threshold are set to zero. The second technique, soft thresholding, extends hard thresholding by also shrinking the remaining nonzero coefficients toward zero. We have picked Daubechies 4 (db4) as our analysis wavelet with a three-level decomposition, which provides sufficient noise reduction.

Figure 3: Wavelet Decomposition

The a3(n) trace represents the third-level approximation coefficients, which are the high-scale, low-frequency components; a3(n) is similar to the original signal. The other three waveforms (d3(n), d2(n) and d1(n)) are the detail coefficients, which are the low-scale, high-frequency components.
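As an illustration of this stage, here is a minimal sketch of the three-level db4 de-noising using the PyWavelets package. The paper does not state how the threshold is chosen, so the median-based universal threshold used below is an assumption for illustration only.

```python
import numpy as np
import pywt

def wavelet_denoise(signal, wavelet="db4", level=3):
    # Decompose: for a 3-level transform this returns [a3, d3, d2, d1].
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    approx, details = coeffs[0], coeffs[1:]
    # Noise estimate from the finest detail band d1 (assumed rule, not
    # from the paper): universal threshold with a median-based sigma.
    sigma = np.median(np.abs(details[-1])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(signal)))
    # Soft-threshold only the detail coefficients; keep a3 intact.
    details = [pywt.threshold(d, thr, mode="soft") for d in details]
    # Reconstruct the de-noised signal from the thresholded coefficients.
    return pywt.waverec([approx] + details, wavelet)[: len(signal)]
```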
Mel frequency cepstral coefficient estimation
After the background noise has been removed from the voice signal, feature extraction begins. Feature extraction is the process of obtaining different features of the voice signal, such as amplitude, pitch and the vocal tract shape; it is the task of finding a parameter set from the input voice signal. The extracted features should satisfy some criteria in dealing with the speech signal [11]:
- stable over time;
- occur frequently and naturally in speech;
- not susceptible to mimicry;
- easy to measure;
- show little fluctuation from one speaking environment to another;
- discriminate between speakers while being tolerant of intra-speaker variability.

We use Mel Frequency Cepstrum Coefficients (MFCC) to extract features from the voice signal. MFCC is a series of calculations that uses the cepstrum with a nonlinear frequency axis following the mel scale. To obtain the mel cepstrum, the voice signal is first windowed using an analysis window and then the Discrete Fourier Transform is computed. The main purpose of MFCC is to mimic the behaviour of the human ear. MFCC estimation is subdivided into five phases or blocks [7][8].

Figure 4: Block diagram of MFCC

Frame blocking
Speech is a non-stationary signal. If the frame is too long, signal properties may change too much across the window, adversely affecting the time resolution. If the frame is too short, the resolution of narrowband components is sacrificed, adversely affecting the frequency resolution; there is a trade-off between the two. Here we choose 256 samples per frame, with 128 samples overlapping between adjacent frames. Overlapping frames are used to capture information that may occur at the frame boundaries. The number of frames is obtained by dividing the total number of samples in the input speech file by 128, and the last frame may require zero padding to cover all input samples. All frames are stored as rows in a single matrix, with the number of rows equal to the number of frames and the number of columns equal to 256, the frame width.

Figure 5: Frame of Input Signal
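The frame blocking described above maps directly onto a few lines of NumPy; this sketch assumes the 256-sample frame and 128-sample hop from the text.

```python
import numpy as np

def frame_blocking(samples, frame_len=256, hop=128):
    # Number of frames = total samples / 128, rounded up; the last
    # frame is zero-padded so that every input sample is covered.
    n_frames = int(np.ceil(len(samples) / hop))
    pad = n_frames * hop + (frame_len - hop) - len(samples)
    padded = np.pad(samples, (0, max(pad, 0)))
    # One row per frame, 256 columns, adjacent frames overlap by 128.
    return np.stack([padded[i * hop : i * hop + frame_len]
                     for i in range(n_frames)])
```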

Windowing
The next step is to window each individual frame so as to minimize the signal discontinuities at its beginning and end. The concept is to minimize spectral distortion by using the window to taper the signal to zero at the beginning and end of each frame [9]. If we define the window as $w(n)$, $0 \le n \le N-1$, where $N$ is the number of samples in each frame, then the result of windowing is the signal

$y_1(n) = x_1(n)\, w(n), \quad 0 \le n \le N-1$ (1)

Typically the Hamming window is used, which has the form

$w(n) = 0.54 - 0.46 \cos\!\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$ (2)

Figure 6: Windowed Input Signal

FFT block
Spectral information means the energy levels at different frequencies in the given window. The time-domain data is converted into the frequency domain by applying the Discrete Fourier Transform (DFT), given by equation (3):

$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \quad k = 0, 1, \ldots, N-1$ (3)

Here $x(n)$ represents an input frame of 256 samples and $X(k)$ its DFT; we use a 256-point FFT algorithm to compute it. The FFT output is a set of complex numbers, i.e. real and imaginary parts. Speech recognition systems deal with real data, so the complex phase is ignored. Denoting the real and imaginary parts of $X(k)$ by $\mathrm{Re}(X(k))$ and $\mathrm{Im}(X(k))$, the spectral magnitude of the speech signal is obtained using equation (4):

$|X(k)| = \sqrt{\mathrm{Re}(X(k))^2 + \mathrm{Im}(X(k))^2}$ (4)

The spectral magnitudes of each frame are stored as rows in a single matrix, with the number of rows equal to the number of frames and the number of columns equal to 256, the frame width.
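Equations (1)–(4) can be sketched as follows; the Hamming window and the magnitude computation are applied row-wise to the frame matrix built above.

```python
import numpy as np

def spectral_magnitudes(frames):
    # Hamming window, equation (2): w(n) = 0.54 - 0.46 cos(2*pi*n/(N-1)).
    n = frames.shape[1]
    w = 0.54 - 0.46 * np.cos(2.0 * np.pi * np.arange(n) / (n - 1))
    # Windowing, equation (1): y1(n) = x1(n) * w(n), per frame.
    windowed = frames * w
    # 256-point FFT per frame, then |X(k)| as in equation (4).
    return np.abs(np.fft.fft(windowed, n=n, axis=1))
```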
Mel frequency wrapping block
Mel-frequency analysis of speech is based on human perception experiments. Human ears have been shown to be more sensitive, with higher resolution, at low frequencies than at high frequencies, so the filter bank is designed to emphasize the low frequencies over the high frequencies. Moreover, the voice signal does not follow the linear frequency scale used in the FFT. Hence a perceptual scale of pitches equal in distance, the Mel scale, is used for feature extraction. Mel scale frequency is proportional to the logarithm of the linear frequency, reflecting human perception; the logarithm is used because our ears work in decibels. Figure 7 shows frequencies on the Mel scale plotted against frequencies on the linear scale. Equation (5) converts a linear-scale frequency into a Mel-scale frequency [3]:

$\mathrm{mel}(f) = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right)$ (5)

Triangular band-pass filters or Gaussian filters are used to extract the spectral envelope, which is constituted by the dominant frequency components of the speech signal. Mel-frequency filters are therefore triangular band-pass or Gaussian filters non-uniformly spaced on the linear frequency axis and uniformly spaced on the Mel frequency axis, with more filters in the low-frequency region and fewer in the high-frequency region.

Figure 7: Plot of mel frequencies v/s linear frequencies

The magnitude response of each filter is equal to unity at its centre and decreases linearly to zero at the centre frequencies of the two adjacent filters. We use a Mel frequency filter bank with Q = 20 triangular or Gaussian overlapping filters; as m increases, the spacing between the centres of two adjacent filters increases on the linear scale while remaining constant on the Mel scale. The FFT spectrum is passed through the Mel filters to obtain the Mel spectrum: the filter bank is applied to every 256-sample frame and the filtered response is computed. In the frequency domain, filtering is obtained by multiplying the FFT of the signal by the transfer function of the filter on an element-by-element basis.
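A sketch of the 20-filter triangular Mel filter bank and its application follows. The paper does not state the sampling rate, so the 8 kHz value below is an assumption; only the first half of the 256-point spectrum (129 bins) is needed because the magnitude spectrum of a real signal is symmetric.

```python
import numpy as np

def mel_filterbank(n_filters=20, n_fft=256, fs=8000):
    # Equation (5): mel(f) = 2595 log10(1 + f/700), plus its inverse.
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # Filter edges equally spaced on the Mel axis, from 0 to fs/2.
    edges = imel(np.linspace(0.0, mel(fs / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        # Unity at the centre, decreasing linearly to zero at the
        # centre frequencies of the two adjacent filters.
        fbank[m - 1, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        fbank[m - 1, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
    return fbank

# Element-by-element filtering in the frequency domain amounts to a
# matrix product: mel_spectrum = magnitudes[:, :129] @ mel_filterbank().T
```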

Logarithm of filter energies
The human ear smooths the spectrum and operates on an approximately logarithmic scale. We use equation (6) to compute the log-energy, i.e. the logarithm of the sum of the filtered components for each filter:

$S(m) = \ln\!\left(\sum_{k=0}^{N-1} |X(k)|\, H_m(k)\right), \quad 1 \le m \le Q$ (6)

where m ranges from 1 to Q, the number of filters, and $H_m(k)$ denotes the response of the m-th filter. Each bin per frame per filter thus holds the log-energy obtained as the logarithm of the weighted sum of spectral magnitudes in that filter-bank channel. Hence we get 20 numeric values for each frame at the output of this stage, stored in a matrix with the number of rows equal to the number of frames and the number of columns equal to 20, the number of filters in the filter bank.

DCT block
The discrete cosine transform (DCT) converts the log power spectrum from the Mel frequency domain back into the time domain. The DCT gathers most of the information of the signal into its lower-order coefficients, resulting in a significant reduction in computational cost. Equation (7) represents the discrete cosine transform:

$c(l) = \sum_{m=1}^{Q} S(m) \cos\!\left(\frac{\pi l (m - 0.5)}{Q}\right), \quad l = 1, 2, \ldots, L$ (7)

where c(l) is the l-th Mel Frequency Cepstral Coefficient. The number of coefficients L typically ranges between 8 and 19; we choose L = 19, obtaining 19 coefficients for each frame. The output of this stage is a matrix with the number of rows equal to the number of frames and the number of columns equal to L = 19. Thus cepstral analysis is performed on the Mel spectrum to obtain the Mel Frequency Cepstrum Coefficients (MFCC).

Figure 8: Cepstrum of speech sample
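Equations (6) and (7) — the log filter-bank energies and the DCT that yields 19 cepstral coefficients per frame — can be sketched with SciPy's DCT routine. An orthonormal type-II DCT is assumed here, since the paper does not specify the normalization.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc_from_mel(mel_spectrum, n_ceps=19):
    # Equation (6): log of the weighted sum of spectral magnitudes in
    # each filter-bank channel; a small floor avoids log(0).
    log_energies = np.log(mel_spectrum + 1e-10)
    # Equation (7): DCT of the log energies; keep the first 19
    # lower-order coefficients, which carry most of the information.
    return dct(log_energies, type=2, axis=1, norm="ortho")[:, :n_ceps]
```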
Feature matching module
The use of Gaussian mixture models for modeling speaker identity is motivated by the interpretation that the Gaussian components represent some general speaker-dependent spectral shapes, and by the capability of Gaussian mixtures to model arbitrary densities [5][6]. A GMM is a weighted sum of M component densities, given by

$p(\vec{x} \mid \lambda) = \sum_{i=1}^{M} p_i\, b_i(\vec{x})$ (8)

where $\vec{x}$ is a D-dimensional speech feature vector, $b_i(\vec{x})$, $i = 1, \ldots, M$, are the component densities and $p_i$, $i = 1, \ldots, M$, are the mixture weights. Each component density is a D-variate Gaussian function of the form

$b_i(\vec{x}) = \frac{1}{(2\pi)^{D/2} |\Sigma_i|^{1/2}} \exp\!\left(-\frac{1}{2}(\vec{x} - \vec{\mu}_i)^T \Sigma_i^{-1} (\vec{x} - \vec{\mu}_i)\right)$ (9)

with mean vector $\vec{\mu}_i$ and covariance matrix $\Sigma_i$. The mixture weights satisfy

$\sum_{i=1}^{M} p_i = 1$ (10)

For speaker identification, each speaker is represented by a GMM $\lambda$, which is completely parameterized by its mixture weights, means and covariance matrices, collectively represented as

$\lambda = \{p_i, \vec{\mu}_i, \Sigma_i\}, \quad i = 1, \ldots, M$ (11)

For computational ease and improved performance, the covariance matrices are constrained to be diagonal. There are two principal motivations for using GMMs to model speaker identity. The first is that the components of such a multi-modal density may represent some underlying set of acoustic classes. It is reasonable to assume that the acoustic space corresponding to a speaker's voice can be characterized by a set of acoustic classes representing broad phonetic events such as vowels, nasals or fricatives. These acoustic classes reflect some general speaker-dependent vocal tract configurations that are useful for characterizing speaker identity, and the spectral shape of the i-th acoustic class can in turn be represented by the mean $\vec{\mu}_i$ and covariance matrix $\Sigma_i$. Because all the training and testing speech is unlabelled, the acoustic classes are hidden, in that the class of an observation is unknown. The second motivation is that a linear combination of Gaussian basis functions is capable of modelling a large class of sample distributions; a GMM can form smooth approximations to arbitrarily shaped densities. Several techniques can be used to estimate the parameters of a GMM $\lambda$ describing the distribution of the training feature vectors; by far the most popular and well-established is Maximum Likelihood (ML) estimation. These GMMs are trained separately on each speaker's enrolment data using the Expectation Maximization (EM) algorithm.
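A sketch of per-speaker GMM training with EM follows, using scikit-learn's GaussianMixture as a stand-in for a hand-written EM loop over equations (12)–(15). The diagonal-covariance constraint follows the text, while M = 32 components is an illustrative assumption (the paper does not give M).

```python
from sklearn.mixture import GaussianMixture

def train_speaker_models(enrolment_features, n_components=32):
    # enrolment_features: dict mapping speaker id -> (frames x 19) MFCC
    # matrix. One GMM lambda_k is fitted by EM per enrolled speaker.
    models = {}
    for speaker, feats in enrolment_features.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag")  # diagonal Sigma_i
        models[speaker] = gmm.fit(feats)
    return models
```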

The update equations, which guarantee a monotonic increase in the model's likelihood value, are:

Mixture weights: $\bar{p}_i = \frac{1}{T} \sum_{t=1}^{T} p(i \mid \vec{x}_t, \lambda)$ (12)

Means: $\bar{\mu}_i = \frac{\sum_{t=1}^{T} p(i \mid \vec{x}_t, \lambda)\, \vec{x}_t}{\sum_{t=1}^{T} p(i \mid \vec{x}_t, \lambda)}$ (13)

Variances: $\bar{\sigma}_i^2 = \frac{\sum_{t=1}^{T} p(i \mid \vec{x}_t, \lambda)\, x_t^2}{\sum_{t=1}^{T} p(i \mid \vec{x}_t, \lambda)} - \bar{\mu}_i^2$ (14)

where $\sigma_i^2$, $x_t$ and $\mu_i$ refer to arbitrary elements of the vectors $\vec{\sigma}_i^2$, $\vec{x}_t$ and $\vec{\mu}_i$, respectively. The a posteriori probability for acoustic class i is given by

$p(i \mid \vec{x}_t, \lambda) = \frac{p_i\, b_i(\vec{x}_t)}{\sum_{k=1}^{M} p_k\, b_k(\vec{x}_t)}$ (15)

In speaker identification, given a group of S speakers $\{1, 2, \ldots, S\}$ represented by GMMs $\lambda_1, \lambda_2, \ldots, \lambda_S$, the objective is to find the speaker model with the maximum a posteriori probability for a given test sequence $X = \{\vec{x}_1, \ldots, \vec{x}_T\}$:

$\hat{S} = \arg\max_{1 \le k \le S} \Pr(\lambda_k \mid X)$ (16)

Assuming that all speakers are equally likely and that the observations are independent, and since $p(X)$ is the same for all speakers, this simplifies to

$\hat{S} = \arg\max_{1 \le k \le S} \sum_{t=1}^{T} \log p(\vec{x}_t \mid \lambda_k)$ (17)

Each GMM outputs a probability for each frame, which is multiplied across all the frames (or, equivalently, summed in the log domain); the classifier makes its decision based on these product posterior probabilities.
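The decision rule of equation (17) then reduces to summing per-frame log-likelihoods under each speaker model and taking the arg max, as this sketch (continuing the hypothetical train_speaker_models above) shows.

```python
def identify_speaker(models, test_features):
    # Equation (17): sum log p(x_t | lambda_k) over all test frames;
    # score_samples returns the per-frame log-likelihoods.
    totals = {speaker: gmm.score_samples(test_features).sum()
              for speaker, gmm in models.items()}
    # The best-matching model identifies the speaker.
    return max(totals, key=totals.get)
```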
CONCLUSION

This paper has reviewed the performance of speaker identification systems using various feature extraction and matching techniques. The MFCC algorithm is used in our system because it has the least false acceptance ratio, and the GMM model is used for feature matching in order to improve system performance and achieve high accuracy. The speakers were trained and tested using MFCC features and GMM models, which give a better identification rate for speaker features. In future work, pitch information will be integrated with the MFCC features, and the performance of the speaker identification system will be analysed in the presence of noise.

REFERENCES

[1] Douglas A. Reynolds and Richard C. Rose, "Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models," IEEE Transactions on Speech and Audio Processing, Vol. 3, No. 1, January 1995.
[2] Joseph P. Campbell, Jr., "Speaker Recognition: A Tutorial," Proceedings of the IEEE, Vol. 85, No. 9, September 1997.
[3] Ashutosh Parab, Joyeb Mulla, Pankaj Bhadoria and Vikram Bangar, "Speaker Recognition Using MFCC and GMM," Journal of Research in Electrical and Electronics Engineering (ISTP-JREEE), Volume 3, Issue 2, March 2014.
[4] Seiichi Nakagawa, Longbiao Wang and Shinji Ohtsuka, "Speaker Identification and Verification by Combining MFCC and Phase Information," IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, No. 4, May 2012.
[5] D. A. Reynolds, "Speaker Identification and Verification Using Gaussian Mixture Speaker Models," Speech Communication, Vol. 17, No. 1-2, pp. 91-108, 1995.
[6] D. A. Reynolds, T. F. Quatieri and R. Dunn, "Speaker Verification Using Adapted Gaussian Mixture Models," Digital Signal Processing, Vol. 10, No. 1-3, pp. 19-41, 2000.
[7] Md. Rashidul Hasan, Mustafa Jamil, Md. Golam Rabbani and Md. Saifur Rahman, "Speaker Identification Using Mel Frequency Cepstral Coefficients," 3rd International Conference on Electrical & Computer Engineering (ICECE), Dhaka, Bangladesh, December 2004.
[8] Maxim Sidorov, Alexander Schmitt, Sergey Zablotskiy and Wolfgang Minker, "Survey of Automated Speaker Identification Methods," International Conference on Intelligent Environments.
[9] M. G. Sumithra and A. K. Devika, "A Study on Feature Extraction Techniques for Text Independent Speaker Identification," Bannari Amman Institute of Technology, Sathyamangalam, 2012.
[10] Alfredo Maesa and Fabio Garzia, "Text Independent Automatic Speaker Recognition Using Mel Frequency Cepstrum Coefficients and Gaussian Mixture Model," Volume 3, No. 4, October.
[11] Nisha V. S. and M. Jayasheela, "Survey on Feature Extraction and Matching Techniques for Speaker Recognition Systems," International Journal of Advanced Research in Electronics and Communication Engineering (IJARECE), Volume 2, Issue 3, March 2013.
