Novel Temporal and Spectral Features Derived from TEO for Classification of Normal and Dysphonic Voices


Hemant A. Patil 1, Pallavi N. Baljekar 2, T. K. Basu 3

1 Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar, Gujarat, India. 2 Manipal Institute of Technology (MIT), Manipal, Karnataka, India. 3 Institute of Technology and Marine Engineering (ITME), Amira, West Bengal, India. hemant_patil@daiict.ac.in, pallavi.baljekar@learner.manipal.edu, basutk0@yahoo.co.in

Abstract. In this paper, various temporal features (i.e., zero-crossing rate and short-time energy) and spectral features (spectral flux and spectral centroid) are derived from the Teager energy operator (TEO) profile of the speech waveform. The efficacy of these features is analyzed for the classification of normal and dysphonic voices by comparing their performance with that of the same features derived from the linear prediction (LP) residual and from the speech waveform itself. In addition, the effectiveness of fusing these features with the state-of-the-art Mel frequency cepstral coefficient (MFCC) feature set is investigated, to understand whether they provide complementary information. The classifier used is a 2nd-order polynomial classifier, with experiments carried out on a subset of the Massachusetts Eye and Ear Infirmary (MEEI) database.

Keywords: Dysphonia, TEO, LP residual, zero-crossing rate, short-time energy, spectral flux, spectral centroid, polynomial classifier.

1 Introduction

The main motivation for investigating features for dysphonia detection is to build a robust and reliable system for non-intrusive evaluation of a patient's voice to detect pathologies in the larynx and the vocal tract. Pathologies such as vocal nodules, cysts, polyps, etc. are nodular masses present either on the glottis or along the walls of the vocal tract.
As a result, they change the airflow properties through the glottis, either through the increased mass of the vocal folds, which alters the periodicity of vocal-fold vibration, or through incomplete closure of the vocal folds caused by the presence of these masses on their edges. On the other hand, pathologies such as paralysis, caused by damage to the recurrent and/or superior laryngeal nerve, affect the motor function of the larynx and thus cause asymmetric vibration of the vocal folds, which may produce transient or permanent diplophonia. The net effect of these pathologies is therefore to modify the airflow properties, especially at the source. Consequently, many of the parameters which have been developed for voice pathology detection have been derived from the linear prediction (LP) residual [1] or the electroglottograph (EGG) [1], which are considered representative of the airflow properties at the glottis. These features characterize the variability at the source, either in amplitude, represented as shimmer [1], or in fundamental frequency (i.e., pitch), called jitter [1]. In pathological voices, incomplete closure of the vocal folds allows an escape of air, which increases the turbulence perceived in the voice. Thus, apart from these perturbation measures (i.e., shimmer and jitter), various noise measures [2] have also been derived from the speech signal to exploit this perceived turbulence. In this paper, an attempt is made to derive features from the Teager energy operator (TEO) profile of the speech signal, which captures the glottal airflow properties more effectively by also accounting for the nonlinear sources of voice production, namely the vortices. In this study, four features (two temporal and two spectral) have been used. These features have also been extracted from the speech signal and the LP residual, to compare against the performance of the TEO profile.

The organization of the paper is as follows. In Section 2, the robustness of the TEO in capturing source-related information is discussed. In Section 3, the computational details of the features used are briefly described. Section 4 gives details of the experimental setup and describes the experiments conducted and the results obtained. Finally, Section 5 summarizes our findings and discusses future research directions.

2 TEO as Source Information

The TEO was first proposed by the Teagers in [3].
The Teagers showed that the airflow is not laminar, as assumed by the linear source-filter theory, but separates into various paths, leading to the generation of vortices which provide the excitation to the vocal tract during the closed phase. The TEO is an operator which captures the energy of these vortices. It is proportional to the square of both amplitude and frequency, and is defined in both the discrete and the continuous domain. For the discrete case it is defined as:

ψ{x(n)} = x^2(n) − x(n−1)·x(n+1) ≈ A^2·ω^2, (1)

for small values of ω, i.e., sin(ω) ≈ ω.

Fig. 1(a) and (b) depict the speech signal and the corresponding differenced EGG taken from the CMU-ARCTIC database [4]. Fig. 1(c) shows the corresponding TEO profile. It is interesting to see in this figure that the peaks in the TEO profile lie in close proximity to the locations of the peaks in the differenced EGG waveform, which correspond to the glottal closure instants (GCI), indicating that the TEO successfully captures the airflow properties at the glottis (in particular, glottal activity). Moreover, the height of the peaks in the TEO profile is correlated with the height of the peaks in the differenced EGG waveform, showing that the TEO profile of speech is a robust indicator of the airflow properties at the source, i.e., the glottis. Fig. 2 depicts the TEO profile for a normal speaker and for a pathological speaker suffering from vocal nodules, taken from the Massachusetts Eye and Ear Infirmary (MEEI) database. As can be seen, for a normal speaker, due to complete glottal closure, there is not much turbulence at the source, which is reflected in the regularity of the TEO profile peaks. In the case of the pathological voice, by contrast, incomplete closure causes increased turbulence at the source, which is reflected in the irregular structure of the running estimate of signal energy given by the TEO profile. This again reiterates that the TEO is very good at capturing the airflow properties at the glottal source.

Fig. 1. TEO as a source feature: (a) speech signal, (b) differenced EGG pulses corresponding to GCI, (c) corresponding TEO profile of speech.

Fig. 2. TEO profile of speech for (a) normal phonation and (b) a person suffering from vocal nodules.

Two prominent differences can be observed in the plots in Fig. 2. Firstly, the peaks for the normal phonation are almost all of the same height, whereas the peaks in the TEO profile for the pathological case are more non-uniform in height, showing greater variability in the energy at the GCI and thus greater amplitude variability. Secondly, the zero-crossing rate (ZCR) at the GCI appears uniform in the case of normal speech, with hardly any zero-crossings between the GCI, whereas in the case of the pathological voice there is an increased number of zero-crossings, especially between the GCI, due to the escape of air caused by incomplete closure of the vocal folds, and hence an increased turbulence perceived in the speech. Thus, we need to extract suitable temporal and spectral features which exploit these characteristics and give high classification accuracy.
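As an illustrative sketch (not the authors' code), the discrete operator in Eq. (1) can be computed over a waveform with a few lines of NumPy; the pure-tone example below simply checks the A^2·ω^2 approximation for a small ω:

```python
import numpy as np

def teo(x):
    """Discrete Teager energy operator: psi{x(n)} = x^2(n) - x(n-1)*x(n+1).

    The first and last samples have no complete neighbourhood, so the
    profile is two samples shorter than the input.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# For a tone x(n) = A*cos(omega*n), the TEO output is A^2 * sin^2(omega),
# which is approximately A^2 * omega^2 when omega is small (sin(omega) ~ omega).
n = np.arange(1000)
A, omega = 0.5, 0.05
profile = teo(A * np.cos(omega * n))
print(profile.mean(), (A * omega) ** 2)  # close agreement
```

For speech, the same function applied to a voiced segment yields the TEO profile whose peaks the paper relates to the glottal closure instants.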

3 Features Used

The main requirements for selecting the features were: firstly, they should exploit the characteristics of a pathological voice, i.e., increased noise components, breakdown of periodic structure, greater high-frequency content and greater amplitude variations; secondly, they should be simple to compute and reproduce; and lastly, they should be independent of any determination of the pitch period. Thus, for this application we used four conventional features: two temporal features, the zero-crossing rate (ZCR) and short-time energy (STE), and two spectral features, spectral flux (SF) and spectral centroid (SC). The performance of these features was also compared with Mel frequency cepstral coefficients (MFCC) derived from the speech signal, since this is the state-of-the-art feature which has given very good classification accuracy [5]. This section describes the computational details of these temporal and spectral features.

Short-time Average Zero-Crossing Rate (ZCR). As defined in [6], a zero-crossing is said to occur if there is an algebraic sign change between two consecutive samples of a speech signal. The rate at which zero-crossings occur is thus an indication of the frequency content of the signal. It is defined as:

Z_n = (1/2N) Σ_m |sgn[x(m)] − sgn[x(m−1)]| · w(n−m), (2)

where Z_n is the ZCR of the n-th frame, N is the frame length and w(n) is the Hamming window. ZCR has been used for voiced/unvoiced detection [6], since unvoiced speech is known to have higher frequency content. As various previous studies have shown that pathological voices have much higher frequency content than normal speech signals [1], we expected this feature to serve as an indication of the frequency content of the signal and of the presence of noise components, since higher frequency components are mainly attributed to noise.

Short-Time Energy (STE). It is known that pathological speech has greater amplitude variations than normal voices, and this was the major motivation for using the shimmer parameter in earlier studies [1]. The short-time energy is another parameter that has been used to account for amplitude variations, especially to distinguish voiced from unvoiced speech [6]. In this paper, we wanted to investigate whether it could capture the difference in amplitude variations between pathological and normal speech. The short-time energy is defined as:

E_n = Σ_m [x(m)·w(n−m)]^2, (3)

where E_n is the energy of the n-th frame, x(m) is the signal and w(n) is the Hamming window.

Spectral Centroid (SC). The spectral centroid measures the brightness of a sound, i.e., it measures where most of the power in a speech segment is located. It has been previously used for the detection of clinical depression [7] and for speech recognition, where the spectral centroid was computed in various sub-bands and found to behave similarly to the formant frequencies, to provide information complementary to cepstral features, and to be robust to noise [8]. It is the magnitude-weighted average frequency of the spectrum and thus gives an idea of the frequency range in which most of the power of the spectrum lies. We wanted to analyze whether there is a shift of the spectral centroid towards higher frequencies for the pathological voice. The spectral centroid is defined as:

SC = Σ_{k=0}^{N−1} F(k)·|X(k)| / Σ_{k=0}^{N−1} |X(k)|, (4)

where |X(k)| is the magnitude of bin number k, and F(k) is the center frequency of that bin.

Spectral Flux (SF). The spectral flux is defined as the difference between the power spectra of two consecutive speech frames; it thus measures the frame-to-frame variability of the spectral shape. It has previously been used for detecting depression [7] and for speaker recognition [9], among other applications. Since the pathological voice is known to be less periodic than normal signals, a pathological voice signal is expected to show greater frequency variation, which is also what motivated the jitter parameter. We expected this parameter to capture the breakdown of periodic structure and the variation in frequency content of the pathological voice. The spectral flux is defined as:

SF(n) = Σ_{k=−N/2}^{N/2−1} H(|X(n,k)| − |X(n−1,k)|)^2, (5)

where X(n,k) is the k-th frequency bin of the n-th frame and H(x) = (x + |x|)/2 is the half-wave rectifier function.

4 Experimental Results

The corpus used for the experiments is the commercially available MEEI database [10]. For this work, a subset of 173 pathological and 53 normal speakers was used, following [2].
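The four features of Eqs. (2)-(5) can be sketched per frame as below. This is my own illustration, not the authors' implementation: Eq. (2) is shown with a rectangular rather than a Hamming window, and the spectra are one-sided FFT magnitudes. Applying these to frames of the TEO profile, rather than the raw waveform, gives the TEO-derived variants studied in the paper.

```python
import numpy as np

def zcr(frame):
    """Zero-crossing rate, Eq. (2) with a rectangular window:
    sign changes per sample of the frame."""
    s = np.sign(frame)
    return np.sum(np.abs(np.diff(s))) / (2.0 * len(frame))

def ste(frame):
    """Short-time energy of one Hamming-windowed frame, Eq. (3)."""
    return np.sum((frame * np.hamming(len(frame))) ** 2)

def spectral_centroid(frame, fs):
    """Magnitude-weighted mean frequency of the frame spectrum, Eq. (4)."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    return np.sum(freqs * mag) / (np.sum(mag) + 1e-12)

def spectral_flux(frame, prev_frame):
    """Half-wave-rectified frame-to-frame spectral difference, Eq. (5)."""
    d = np.abs(np.fft.rfft(frame)) - np.abs(np.fft.rfft(prev_frame))
    h = (d + np.abs(d)) / 2.0  # half-wave rectifier H(x) = (x + |x|)/2
    return np.sum(h ** 2)
```

As a sanity check, for a 256-sample frame of a 1 kHz tone at a 25 kHz sampling rate, zcr comes out near 2·1000/25000 = 0.08, and spectral_flux of a frame against itself is exactly zero.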
The MEEI database consists of samples recorded at either 50 kHz or 25 kHz; hence, all samples were downsampled to a 25 kHz sampling frequency. Since the number of pathological samples is approximately 3 times the number of normal samples, 1 s of pathological data per patient and 3 s of normal data of sustained phonation /ah/ per control speaker were used for training and testing. The signals were blocked into frames of 256 samples, corresponding to 10.24 ms. The features extracted per frame were given to a 2nd-order polynomial classifier to generate the true and false scores [11]. A 4-fold cross-validation scheme, repeated 12 times for a total of 48 trials, was carried out, using 75% of the samples for training and 25% for testing, with the training and testing subsets kept independent of each other. The classification accuracy (Ac) was calculated as an average over all 48 trials (i.e., 2688 genuine and 2688 impostor trials).

Fig. 3. DET plots comparing features derived from the LP residual, the TEO profile of speech and the speech waveform, for the following features: (a) ZCR, (b) STE, (c) SC, (d) SF.

Table 1. Comparison of the temporal and spectral features derived from the speech waveform, the LP residual (LP Res) and the TEO profile of speech. Rows: ZCR-Speech, ZCR-LP Res, ZCR-TEO, STE-Speech, STE-LP Res, STE-TEO, SC-Speech, SC-LP Res, SC-TEO, SF-Speech, SF-LP Res, SF-TEO; columns: EER (%), Ac (%).

Table 2. Comparison of results of fusing the features derived from the TEO profile with MFCC. Rows: MFCC, MFCC+ZCR, MFCC+STE, MFCC+SF, MFCC+SC, MFCC+ZCR+STE+SC+SF; columns: EER (%), Ac (%).
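The frame-level scores behind these results come from a 2nd-order polynomial classifier, for which the paper defers to [11]. The sketch below is a minimal classifier in that spirit, an assumption-laden illustration rather than the authors' implementation: each frame's feature vector is expanded into its second-order monomials, a linear discriminant is fit by least squares to ideal outputs of 1 (target class) and 0 (non-target), and an utterance score is the mean classifier output over its frames.

```python
import numpy as np

def poly_expand(x):
    """2nd-order polynomial basis of a feature vector x:
    [1, x_i, x_i * x_j for i <= j]."""
    x = np.asarray(x, dtype=float)
    cross = np.outer(x, x)[np.triu_indices(len(x))]
    return np.concatenate(([1.0], x, cross))

def train(frames_pos, frames_neg):
    """Fit w by least squares so that w . p(x) is ~1 for target-class
    frames and ~0 for the rest (after Campbell et al. [11])."""
    M = np.array([poly_expand(f) for f in np.vstack([frames_pos, frames_neg])])
    y = np.concatenate([np.ones(len(frames_pos)), np.zeros(len(frames_neg))])
    w, *_ = np.linalg.lstsq(M, y, rcond=None)
    return w

def score(w, frames):
    """Utterance-level score: mean classifier output over all frames."""
    return float(np.mean([poly_expand(f) @ w for f in frames]))

# Toy demo on two well-separated synthetic 4-dimensional "feature" clusters.
rng = np.random.default_rng(0)
pos = rng.normal(1.0, 0.3, size=(200, 4))   # stand-in for pathological frames
neg = rng.normal(-1.0, 0.3, size=(200, 4))  # stand-in for normal frames
w = train(pos, neg)
print(score(w, pos), score(w, neg))  # high for pos, low for neg
```

The least-squares fit makes training a single linear solve, which is one reason this family of classifiers is cheap compared with iterative training.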

Fig. 4. DET curves for feature-level fusion of MFCC with (a) ZCR, (b) SC and (c) ZCR+STE+SF+SC.

The detection error tradeoff (DET) plots [12] comparing the performance of the speech waveform, its TEO profile and the LP residual are shown for each of the 4 features in Fig. 3, and the corresponding equal error rate (EER) values and accuracies are listed in Table 1. As can be seen, in 3 of the 4 cases the TEO profile performs best in comparison with the LP residual and the speech waveform; only for the spectral flux feature does the LP residual perform better. Among the 4 features, the ZCR performs best. We also investigated whether these features provide information complementary to the MFCC features, computed according to [13]; the values are listed in Table 2. From Table 2 we see that, when each feature is fused separately with MFCC, ZCR and SC give the lowest EER. However, on plotting the DET curves shown in Fig. 4(a) and (b), it was seen that for the SC parameter certain points come very close to the DET curve for MFCC alone, while for ZCR the two DET curves are separated at all points. Thus, as expected from the individual results, ZCR performs best when fused with MFCC, followed by SC and STE, while fusion with SF actually reduces the classification accuracy, showing that it does not contribute complementary information. We also fused MFCC with ZCR, STE, SF and SC together; Fig. 4(c) depicts the DET plot of this MFCC+ZCR+STE+SC+SF fusion. It can be observed that there is not much improvement in EER in Fig. 4(c) compared with Fig. 4(a). Thus, even though this fusion did increase the classification accuracy, the increase was not much greater than that of ZCR+MFCC.
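For reference, the EER quoted alongside each DET curve is the operating point where the false-rejection and false-acceptance rates are equal. A naive threshold sweep over the genuine and impostor score populations (my own illustration, unrelated to the DET tooling of [12]) looks like this:

```python
import numpy as np

def eer(genuine, impostor):
    """Equal error rate via a naive threshold sweep: at each candidate
    threshold, compute the false-rejection rate (genuine scores below it)
    and the false-acceptance rate (impostor scores at or above it), and
    return the average of the two at the point where they are closest."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    best_gap, best_eer = np.inf, 1.0
    for t in np.sort(np.concatenate([genuine, impostor])):
        frr = np.mean(genuine < t)     # genuine trials rejected
        far = np.mean(impostor >= t)   # impostor trials accepted
        if abs(frr - far) < best_gap:
            best_gap, best_eer = abs(frr - far), (frr + far) / 2.0
    return best_eer

# One genuine score below one impostor score -> 1 error in 5 on each side.
g = np.array([0.9, 0.8, 0.7, 0.6, 0.4])
i = np.array([0.5, 0.3, 0.2, 0.1, 0.05])
print(eer(g, i))  # -> 0.2
```

On real score sets the FRR and FAR rarely cross at exactly equal values, which is why the sketch interpolates by averaging the two rates at the closest crossing.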
So, the increase in accuracy does not outweigh the increase in computation, and the best feature from our analysis is therefore the ZCR, which does well both individually and when fused with MFCC. The reason for the good performance of the ZCR can be understood intuitively from the TEO profile plots shown in Fig. 2 for the sustained phonation of a normal person and of a person suffering from vocal nodules. As can be observed from the plots, the zero-crossing pattern for the normal person is almost periodic, with zero-crossings occurring approximately at each GCI, whereas for the pathological voice, due to increased

turbulence at the source, there are also many spurious zero-crossings between the peaks. For this reason, pathological voices show a significantly higher ZCR, and this feature therefore performs best in this dichotomy.

5 Summary and Conclusion

In this paper, it is observed that most of the spectral and temporal features derived from the TEO profile of the speech signal give relatively better performance than the same features derived from either the LP residual or the speech waveform. Consequently, we believe that the TEO is a more effective operator for capturing the glottal airflow properties than the LP residual. The advantage of the proposed method is that it uses simple features, which can be implemented and computed easily and which are independent of ad hoc pitch estimation methods. Moreover, we have shown that most of the proposed features do provide some complementary information to the MFCC features, increasing the classification accuracy by almost 1% in most cases and decreasing the EER by about 1% as well. From this work, we infer that the ZCR gave the best relative performance, and hence we would like to investigate this feature further by analyzing the ZCR in different frequency bands and by varying the analysis window length and the degree of the polynomial classifier, to obtain the best performance. Moreover, it is evident that the characteristics of the TEO profile in the vicinity of the GCI play a significant role; in future work we would therefore like to develop speech features which capture this information for voice pathology classification.

Acknowledgements

The authors would like to thank the authorities of DA-IICT Gandhinagar for their kind support in carrying out this work. They would also like to thank the ECE Dept., Manipal Institute of Technology, India, for providing the MEEI database, without which this work would not have been possible.

References

1. Davis, S.B.: Acoustic Characteristics of Normal and Pathological Voices. Haskins Laboratories: Status Report on Speech Research, vol. 54, pp. (1978)
2. Parsa, V., Jamieson, D.G.: Identification of Pathological Voices Using Glottal Noise Measures. J. Speech, Language, Hearing Res., vol. 43, no. 2, pp. (2000)
3. Teager, H.M., Teager, S.M.: Evidence for Nonlinear Sound Production Mechanisms in the Vocal Tract. In: Speech Production and Speech Modelling, W.J. Hardcastle and A. Marchal (eds.), pp. , Kluwer, Netherlands (1990)
4. CMU-ARCTIC speech synthesis databases.
5. Markaki, M., Stylianou, Y., Arias-Londoño, J.D., Godino-Llorente, J.I.: Dysphonia Detection Based on Modulation Spectral Features and Cepstral Coefficients. In: IEEE Proc. Int. Conf. Acoust., Speech, Signal Processing (ICASSP), pp. (2010)
6. Rabiner, L.R., Schafer, R.W.: Digital Processing of Speech Signals. Prentice-Hall, Englewood Cliffs, NJ (1978)
7. Low, L.S.A., Maddage, N.C., Lech, M., Sheeber, L., Allen, N.: Influence of Acoustic Low-Level Descriptors in the Detection of Clinical Depression in Adolescents. In: IEEE Proc. Int. Conf. Acoust., Speech, Signal Processing (ICASSP), pp. (2010)
8. Paliwal, K.K.: Spectral Subband Centroid Features for Speech Recognition. In: IEEE Proc. Int. Conf. Acoust., Speech, Signal Processing (ICASSP) (1998)
9. Hosseinzadeh, D., Krishnan, S.: Combining Vocal Source and MFCC Features for Enhanced Speaker Recognition Performance Using GMMs. In: Proc. of IEEE 9th Workshop on Multimedia Signal Processing, pp. (2007)
10. Kay Elemetrics Corp.: Disordered Voice Database Model 4337, Version 1.03, Massachusetts Eye and Ear Infirmary Voice and Speech Lab (2002)
11. Campbell, W.M., Assaleh, K.T., Broun, C.C.: Speaker Recognition with Polynomial Classifiers. IEEE Transactions on Speech and Audio Processing, vol. 10, no. 4, pp. (2002)
12. Martin, A.F., Doddington, G., Kamm, T., Ordowski, M., Przybocki, M.: The DET Curve in Assessment of Detection Task Performance. In: Proc. Eurospeech '97, vol. 4, pp. , Rhodes, Greece (1997)
13. Davis, S.B., Mermelstein, P.: Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-28, no. 4, pp. (1980)


More information

Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech

Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech INTERSPEECH 5 Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech M. A. Tuğtekin Turan and Engin Erzin Multimedia, Vision and Graphics Laboratory,

More information

Significance of Teager Energy Operator Phase for Replay Spoof Detection

Significance of Teager Energy Operator Phase for Replay Spoof Detection Significance of Teager Energy Operator Phase for Replay Spoof Detection Prasad A. Tapkir and Hemant A. Patil Speech Research Lab, Dhirubhai Ambani Institute of Information and Communication Technology,

More information

Isolated Digit Recognition Using MFCC AND DTW

Isolated Digit Recognition Using MFCC AND DTW MarutiLimkar a, RamaRao b & VidyaSagvekar c a Terna collegeof Engineering, Department of Electronics Engineering, Mumbai University, India b Vidyalankar Institute of Technology, Department ofelectronics

More information

SOUND SOURCE RECOGNITION AND MODELING

SOUND SOURCE RECOGNITION AND MODELING SOUND SOURCE RECOGNITION AND MODELING CASA seminar, summer 2000 Antti Eronen antti.eronen@tut.fi Contents: Basics of human sound source recognition Timbre Voice recognition Recognition of environmental

More information

Performance Analysis of MFCC and LPCC Techniques in Automatic Speech Recognition

Performance Analysis of MFCC and LPCC Techniques in Automatic Speech Recognition www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume - 3 Issue - 8 August, 2014 Page No. 7727-7732 Performance Analysis of MFCC and LPCC Techniques in Automatic

More information

Different Approaches of Spectral Subtraction Method for Speech Enhancement

Different Approaches of Spectral Subtraction Method for Speech Enhancement ISSN 2249 5460 Available online at www.internationalejournals.com International ejournals International Journal of Mathematical Sciences, Technology and Humanities 95 (2013 1056 1062 Different Approaches

More information

Perturbation analysis using a moving window for disordered voices JiYeoun Lee, Seong Hee Choi

Perturbation analysis using a moving window for disordered voices JiYeoun Lee, Seong Hee Choi Perturbation analysis using a moving window for disordered voices JiYeoun Lee, Seong Hee Choi Abstract Voices from patients with voice disordered tend to be less periodic and contain larger perturbations.

More information

Communications Theory and Engineering

Communications Theory and Engineering Communications Theory and Engineering Master's Degree in Electronic Engineering Sapienza University of Rome A.A. 2018-2019 Speech and telephone speech Based on a voice production model Parametric representation

More information

Pitch Period of Speech Signals Preface, Determination and Transformation

Pitch Period of Speech Signals Preface, Determination and Transformation Pitch Period of Speech Signals Preface, Determination and Transformation Mohammad Hossein Saeidinezhad 1, Bahareh Karamsichani 2, Ehsan Movahedi 3 1 Islamic Azad university, Najafabad Branch, Saidinezhad@yahoo.com

More information

Relative phase information for detecting human speech and spoofed speech

Relative phase information for detecting human speech and spoofed speech Relative phase information for detecting human speech and spoofed speech Longbiao Wang 1, Yohei Yoshida 1, Yuta Kawakami 1 and Seiichi Nakagawa 2 1 Nagaoka University of Technology, Japan 2 Toyohashi University

More information

Project 0: Part 2 A second hands-on lab on Speech Processing Frequency-domain processing

Project 0: Part 2 A second hands-on lab on Speech Processing Frequency-domain processing Project : Part 2 A second hands-on lab on Speech Processing Frequency-domain processing February 24, 217 During this lab, you will have a first contact on frequency domain analysis of speech signals. You

More information

ScienceDirect. Accuracy of Jitter and Shimmer Measurements

ScienceDirect. Accuracy of Jitter and Shimmer Measurements Available online at www.sciencedirect.com ScienceDirect Procedia Technology 16 (2014 ) 1190 1199 CENTERIS 2014 - Conference on ENTERprise Information Systems / ProjMAN 2014 - International Conference on

More information

Speech Signal Analysis

Speech Signal Analysis Speech Signal Analysis Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 2&3 14,18 January 216 ASR Lectures 2&3 Speech Signal Analysis 1 Overview Speech Signal Analysis for

More information

Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation

Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Peter J. Murphy and Olatunji O. Akande, Department of Electronic and Computer Engineering University

More information

ENHANCED ROBUSTNESS TO UNVOICED SPEECH AND NOISE IN THE DYPSA ALGORITHM FOR IDENTIFICATION OF GLOTTAL CLOSURE INSTANTS

ENHANCED ROBUSTNESS TO UNVOICED SPEECH AND NOISE IN THE DYPSA ALGORITHM FOR IDENTIFICATION OF GLOTTAL CLOSURE INSTANTS ENHANCED ROBUSTNESS TO UNVOICED SPEECH AND NOISE IN THE DYPSA ALGORITHM FOR IDENTIFICATION OF GLOTTAL CLOSURE INSTANTS Hania Maqsood 1, Jon Gudnason 2, Patrick A. Naylor 2 1 Bahria Institue of Management

More information

Dimension Reduction of the Modulation Spectrogram for Speaker Verification

Dimension Reduction of the Modulation Spectrogram for Speaker Verification Dimension Reduction of the Modulation Spectrogram for Speaker Verification Tomi Kinnunen Speech and Image Processing Unit Department of Computer Science University of Joensuu, Finland tkinnu@cs.joensuu.fi

More information

Singing Voice Detection. Applications of Music Processing. Singing Voice Detection. Singing Voice Detection. Singing Voice Detection

Singing Voice Detection. Applications of Music Processing. Singing Voice Detection. Singing Voice Detection. Singing Voice Detection Detection Lecture usic Processing Applications of usic Processing Christian Dittmar International Audio Laboratories Erlangen christian.dittmar@audiolabs-erlangen.de Important pre-requisite for: usic segmentation

More information

EVALUATION OF PITCH ESTIMATION IN NOISY SPEECH FOR APPLICATION IN NON-INTRUSIVE SPEECH QUALITY ASSESSMENT

EVALUATION OF PITCH ESTIMATION IN NOISY SPEECH FOR APPLICATION IN NON-INTRUSIVE SPEECH QUALITY ASSESSMENT EVALUATION OF PITCH ESTIMATION IN NOISY SPEECH FOR APPLICATION IN NON-INTRUSIVE SPEECH QUALITY ASSESSMENT Dushyant Sharma, Patrick. A. Naylor Department of Electrical and Electronic Engineering, Imperial

More information

Converting Speaking Voice into Singing Voice

Converting Speaking Voice into Singing Voice Converting Speaking Voice into Singing Voice 1 st place of the Synthesis of Singing Challenge 2007: Vocal Conversion from Speaking to Singing Voice using STRAIGHT by Takeshi Saitou et al. 1 STRAIGHT Speech

More information

SYNTHETIC SPEECH DETECTION USING TEMPORAL MODULATION FEATURE

SYNTHETIC SPEECH DETECTION USING TEMPORAL MODULATION FEATURE SYNTHETIC SPEECH DETECTION USING TEMPORAL MODULATION FEATURE Zhizheng Wu 1,2, Xiong Xiao 2, Eng Siong Chng 1,2, Haizhou Li 1,2,3 1 School of Computer Engineering, Nanyang Technological University (NTU),

More information

Synthesis Algorithms and Validation

Synthesis Algorithms and Validation Chapter 5 Synthesis Algorithms and Validation An essential step in the study of pathological voices is re-synthesis; clear and immediate evidence of the success and accuracy of modeling efforts is provided

More information

Speech Synthesis; Pitch Detection and Vocoders

Speech Synthesis; Pitch Detection and Vocoders Speech Synthesis; Pitch Detection and Vocoders Tai-Shih Chi ( 冀泰石 ) Department of Communication Engineering National Chiao Tung University May. 29, 2008 Speech Synthesis Basic components of the text-to-speech

More information

NOISE ESTIMATION IN A SINGLE CHANNEL

NOISE ESTIMATION IN A SINGLE CHANNEL SPEECH ENHANCEMENT FOR CROSS-TALK INTERFERENCE by Levent M. Arslan and John H.L. Hansen Robust Speech Processing Laboratory Department of Electrical Engineering Box 99 Duke University Durham, North Carolina

More information

Linguistic Phonetics. Spectral Analysis

Linguistic Phonetics. Spectral Analysis 24.963 Linguistic Phonetics Spectral Analysis 4 4 Frequency (Hz) 1 Reading for next week: Liljencrants & Lindblom 1972. Assignment: Lip-rounding assignment, due 1/15. 2 Spectral analysis techniques There

More information

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter 1 Gupteswar Sahu, 2 D. Arun Kumar, 3 M. Bala Krishna and 4 Jami Venkata Suman Assistant Professor, Department of ECE,

More information

Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012

Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012 Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012 o Music signal characteristics o Perceptual attributes and acoustic properties o Signal representations for pitch detection o STFT o Sinusoidal model o

More information

651 Analysis of LSF frame selection in voice conversion

651 Analysis of LSF frame selection in voice conversion 651 Analysis of LSF frame selection in voice conversion Elina Helander 1, Jani Nurminen 2, Moncef Gabbouj 1 1 Institute of Signal Processing, Tampere University of Technology, Finland 2 Noia Technology

More information

Electronic disguised voice identification based on Mel- Frequency Cepstral Coefficient analysis

Electronic disguised voice identification based on Mel- Frequency Cepstral Coefficient analysis International Journal of Scientific and Research Publications, Volume 5, Issue 11, November 2015 412 Electronic disguised voice identification based on Mel- Frequency Cepstral Coefficient analysis Shalate

More information

Audio Fingerprinting using Fractional Fourier Transform

Audio Fingerprinting using Fractional Fourier Transform Audio Fingerprinting using Fractional Fourier Transform Swati V. Sutar 1, D. G. Bhalke 2 1 (Department of Electronics & Telecommunication, JSPM s RSCOE college of Engineering Pune, India) 2 (Department,

More information

Research Article Jitter Estimation Algorithms for Detection of Pathological Voices

Research Article Jitter Estimation Algorithms for Detection of Pathological Voices Hindawi Publishing Corporation EURASIP Journal on Advances in Signal Processing Volume 29, Article ID 567875, 9 pages doi:1.1155/29/567875 Research Article Jitter Estimation Algorithms for Detection of

More information

INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)

INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) Proceedings of the 2 nd International Conference on Current Trends in Engineering and Management ICCTEM -214 ISSN

More information

Chapter IV THEORY OF CELP CODING

Chapter IV THEORY OF CELP CODING Chapter IV THEORY OF CELP CODING CHAPTER IV THEORY OF CELP CODING 4.1 Introduction Wavefonn coders fail to produce high quality speech at bit rate lower than 16 kbps. Source coders, such as LPC vocoders,

More information

Glottal source model selection for stationary singing-voice by low-band envelope matching

Glottal source model selection for stationary singing-voice by low-band envelope matching Glottal source model selection for stationary singing-voice by low-band envelope matching Fernando Villavicencio Yamaha Corporation, Corporate Research & Development Center, 3 Matsunokijima, Iwata, Shizuoka,

More information

Learning to Unlearn and Relearn Speech Signal Processing using Neural Networks: current and future perspectives

Learning to Unlearn and Relearn Speech Signal Processing using Neural Networks: current and future perspectives Learning to Unlearn and Relearn Speech Signal Processing using Neural Networks: current and future perspectives Mathew Magimai Doss Collaborators: Vinayak Abrol, Selen Hande Kabil, Hannah Muckenhirn, Dimitri

More information

Speech Coding using Linear Prediction

Speech Coding using Linear Prediction Speech Coding using Linear Prediction Jesper Kjær Nielsen Aalborg University and Bang & Olufsen jkn@es.aau.dk September 10, 2015 1 Background Speech is generated when air is pushed from the lungs through

More information

NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or

NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or other reproductions of copyrighted material. Any copying

More information

NCCF ACF. cepstrum coef. error signal > samples

NCCF ACF. cepstrum coef. error signal > samples ESTIMATION OF FUNDAMENTAL FREQUENCY IN SPEECH Petr Motl»cek 1 Abstract This paper presents an application of one method for improving fundamental frequency detection from the speech. The method is based

More information

Wavelet Speech Enhancement based on the Teager Energy Operator

Wavelet Speech Enhancement based on the Teager Energy Operator Wavelet Speech Enhancement based on the Teager Energy Operator Mohammed Bahoura and Jean Rouat ERMETIS, DSA, Université du Québec à Chicoutimi, Chicoutimi, Québec, G7H 2B1, Canada. Abstract We propose

More information

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS 1 S.PRASANNA VENKATESH, 2 NITIN NARAYAN, 3 K.SAILESH BHARATHWAAJ, 4 M.P.ACTLIN JEEVA, 5 P.VIJAYALAKSHMI 1,2,3,4,5 SSN College of Engineering,

More information

AN ANALYSIS OF ITERATIVE ALGORITHM FOR ESTIMATION OF HARMONICS-TO-NOISE RATIO IN SPEECH

AN ANALYSIS OF ITERATIVE ALGORITHM FOR ESTIMATION OF HARMONICS-TO-NOISE RATIO IN SPEECH AN ANALYSIS OF ITERATIVE ALGORITHM FOR ESTIMATION OF HARMONICS-TO-NOISE RATIO IN SPEECH A. Stráník, R. Čmejla Department of Circuit Theory, Faculty of Electrical Engineering, CTU in Prague Abstract Acoustic

More information

QUANTILE BASED NOISE ESTIMATION FOR SPECTRAL SUBTRACTION OF SELF LEAKAGE NOISE IN ELECTROLARYNGEAL SPEECH

QUANTILE BASED NOISE ESTIMATION FOR SPECTRAL SUBTRACTION OF SELF LEAKAGE NOISE IN ELECTROLARYNGEAL SPEECH International Conference on Systemics, Cybernetics and Informatics, February 12 15, 2004 QUANTILE BASED NOISE ESTIMATION FOR SPECTRAL SUBTRACTION OF SELF LEAKAGE NOISE IN ELECTROLARYNGEAL SPEECH Santosh

More information

L19: Prosodic modification of speech

L19: Prosodic modification of speech L19: Prosodic modification of speech Time-domain pitch synchronous overlap add (TD-PSOLA) Linear-prediction PSOLA Frequency-domain PSOLA Sinusoidal models Harmonic + noise models STRAIGHT This lecture

More information

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE - @ Ramon E Prieto et al Robust Pitch Tracking ROUST PITCH TRACKIN USIN LINEAR RERESSION OF THE PHASE Ramon E Prieto, Sora Kim 2 Electrical Engineering Department, Stanford University, rprieto@stanfordedu

More information

EC 6501 DIGITAL COMMUNICATION UNIT - II PART A

EC 6501 DIGITAL COMMUNICATION UNIT - II PART A EC 6501 DIGITAL COMMUNICATION 1.What is the need of prediction filtering? UNIT - II PART A [N/D-16] Prediction filtering is used mostly in audio signal processing and speech processing for representing

More information

Performance analysis of voice activity detection algorithm for robust speech recognition system under different noisy environment

Performance analysis of voice activity detection algorithm for robust speech recognition system under different noisy environment BABU et al: VOICE ACTIVITY DETECTION ALGORITHM FOR ROBUST SPEECH RECOGNITION SYSTEM Journal of Scientific & Industrial Research Vol. 69, July 2010, pp. 515-522 515 Performance analysis of voice activity

More information

CHAPTER 3. ACOUSTIC MEASURES OF GLOTTAL CHARACTERISTICS 39 and from periodic glottal sources (Shadle, 1985; Stevens, 1993). The ratio of the amplitude of the harmonics at 3 khz to the noise amplitude in

More information

Change Point Determination in Audio Data Using Auditory Features

Change Point Determination in Audio Data Using Auditory Features INTL JOURNAL OF ELECTRONICS AND TELECOMMUNICATIONS, 0, VOL., NO., PP. 8 90 Manuscript received April, 0; revised June, 0. DOI: /eletel-0-00 Change Point Determination in Audio Data Using Auditory Features

More information

ASPIRATION NOISE DURING PHONATION: SYNTHESIS, ANALYSIS, AND PITCH-SCALE MODIFICATION DARYUSH MEHTA

ASPIRATION NOISE DURING PHONATION: SYNTHESIS, ANALYSIS, AND PITCH-SCALE MODIFICATION DARYUSH MEHTA ASPIRATION NOISE DURING PHONATION: SYNTHESIS, ANALYSIS, AND PITCH-SCALE MODIFICATION by DARYUSH MEHTA B.S., Electrical Engineering (23) University of Florida SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING

More information

Source-Filter Theory 1

Source-Filter Theory 1 Source-Filter Theory 1 Vocal tract as sound production device Sound production by the vocal tract can be understood by analogy to a wind or brass instrument. sound generation sound shaping (or filtering)

More information

Enhanced Waveform Interpolative Coding at 4 kbps

Enhanced Waveform Interpolative Coding at 4 kbps Enhanced Waveform Interpolative Coding at 4 kbps Oded Gottesman, and Allen Gersho Signal Compression Lab. University of California, Santa Barbara E-mail: [oded, gersho]@scl.ece.ucsb.edu Signal Compression

More information

Advanced audio analysis. Martin Gasser

Advanced audio analysis. Martin Gasser Advanced audio analysis Martin Gasser Motivation Which methods are common in MIR research? How can we parameterize audio signals? Interesting dimensions of audio: Spectral/ time/melody structure, high

More information

Analysis/synthesis coding

Analysis/synthesis coding TSBK06 speech coding p.1/32 Analysis/synthesis coding Many speech coders are based on a principle called analysis/synthesis coding. Instead of coding a waveform, as is normally done in general audio coders

More information

Speech Processing. Undergraduate course code: LASC10061 Postgraduate course code: LASC11065

Speech Processing. Undergraduate course code: LASC10061 Postgraduate course code: LASC11065 Speech Processing Undergraduate course code: LASC10061 Postgraduate course code: LASC11065 All course materials and handouts are the same for both versions. Differences: credits (20 for UG, 10 for PG);

More information

Signal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2

Signal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2 Signal Processing for Speech Applications - Part 2-1 Signal Processing For Speech Applications - Part 2 May 14, 2013 Signal Processing for Speech Applications - Part 2-2 References Huang et al., Chapter

More information

Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter

Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Ching-Ta Lu, Kun-Fu Tseng 2, Chih-Tsung Chen 2 Department of Information Communication, Asia University, Taichung, Taiwan, ROC

More information

Feature Selection and Extraction of Audio Signal

Feature Selection and Extraction of Audio Signal Feature Selection and Extraction of Audio Signal Jasleen 1, Dawood Dilber 2 P.G. Student, Department of Electronics and Communication Engineering, Amity University, Noida, U.P, India 1 P.G. Student, Department

More information

GLOTTAL EXCITATION EXTRACTION OF VOICED SPEECH - JOINTLY PARAMETRIC AND NONPARAMETRIC APPROACHES

GLOTTAL EXCITATION EXTRACTION OF VOICED SPEECH - JOINTLY PARAMETRIC AND NONPARAMETRIC APPROACHES Clemson University TigerPrints All Dissertations Dissertations 5-2012 GLOTTAL EXCITATION EXTRACTION OF VOICED SPEECH - JOINTLY PARAMETRIC AND NONPARAMETRIC APPROACHES Yiqiao Chen Clemson University, rls_lms@yahoo.com

More information

EE482: Digital Signal Processing Applications

EE482: Digital Signal Processing Applications Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 12 Speech Signal Processing 14/03/25 http://www.ee.unlv.edu/~b1morris/ee482/

More information

Temporally Weighted Linear Prediction Features for Speaker Verification in Additive Noise

Temporally Weighted Linear Prediction Features for Speaker Verification in Additive Noise Temporally Weighted Linear Prediction Features for Speaker Verification in Additive Noise Rahim Saeidi 1, Jouni Pohjalainen 2, Tomi Kinnunen 1 and Paavo Alku 2 1 School of Computing, University of Eastern

More information

CHARACTERIZATION OF PATHOLOGICAL VOICE SIGNALS BASED ON CLASSICAL ACOUSTIC ANALYSIS

CHARACTERIZATION OF PATHOLOGICAL VOICE SIGNALS BASED ON CLASSICAL ACOUSTIC ANALYSIS CHARACTERIZATION OF PATHOLOGICAL VOICE SIGNALS BASED ON CLASSICAL ACOUSTIC ANALYSIS Robert Rice Brandt 1, Benedito Guimarães Aguiar Neto 2, Raimundo Carlos Silvério Freire 3, Joseana Macedo Fechine 4,

More information

YOUR WAVELET BASED PITCH DETECTION AND VOICED/UNVOICED DECISION

YOUR WAVELET BASED PITCH DETECTION AND VOICED/UNVOICED DECISION American Journal of Engineering and Technology Research Vol. 3, No., 03 YOUR WAVELET BASED PITCH DETECTION AND VOICED/UNVOICED DECISION Yinan Kong Department of Electronic Engineering, Macquarie University

More information

Hungarian Speech Synthesis Using a Phase Exact HNM Approach

Hungarian Speech Synthesis Using a Phase Exact HNM Approach Hungarian Speech Synthesis Using a Phase Exact HNM Approach Kornél Kovács 1, András Kocsor 2, and László Tóth 3 Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University

More information

Vocoder (LPC) Analysis by Variation of Input Parameters and Signals

Vocoder (LPC) Analysis by Variation of Input Parameters and Signals ISCA Journal of Engineering Sciences ISCA J. Engineering Sci. Vocoder (LPC) Analysis by Variation of Input Parameters and Signals Abstract Gupta Rajani, Mehta Alok K. and Tiwari Vebhav Truba College of

More information

Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm

Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm A.T. Rajamanickam, N.P.Subiramaniyam, A.Balamurugan*,

More information