Novel Temporal and Spectral Features Derived from TEO for Classification of Normal and Dysphonic Voices
Hemant A. Patil 1, Pallavi N. Baljekar 2, T. K. Basu 3

1 Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar, Gujarat, India.
2 Manipal Institute of Technology (MIT), Manipal, Karnataka, India.
3 Institute of Technology and Marine Engineering (ITME), Amira, West Bengal, India.

hemant_patil@daiict.ac.in, pallavi.baljekar@learner.manipal.edu, basutk0@yahoo.co.in

Abstract. In this paper, various temporal features (i.e., zero-crossing rate and short-time energy) and spectral features (spectral flux and spectral centroid) are derived from the Teager energy operator (TEO) profile of the speech waveform. The efficacy of these features for the classification of normal and dysphonic voices is analyzed by comparing their performance with that of the same features derived from the linear prediction (LP) residual and from the speech waveform itself. In addition, the effectiveness of fusing these features with the state-of-the-art Mel frequency cepstral coefficients (MFCC) feature set is investigated, to determine whether these features provide complementary information. The classifier used is a 2nd order polynomial classifier, with experiments carried out on a subset of the Massachusetts Eye and Ear Infirmary (MEEI) database.

Keywords: Dysphonia, TEO, LP residual, zero-crossing rate, short-time energy, spectral flux, spectral centroid, polynomial classifier.

1 Introduction

The main motivation for investigating features for dysphonia detection is to build a robust and reliable system for non-intrusive evaluation of a patient's voice, to detect pathologies in the larynx and the vocal tract. Pathologies such as vocal nodules, cysts and polyps are nodular masses present either on the glottis or along the walls of the vocal tract.
As a result, they change the airflow properties through the glottis: the increased mass of the vocal folds alters the periodicity of their vibration, and masses on the edge of the folds prevent complete glottal closure. On the other hand, pathologies such as paralysis, caused by damage to the recurrent and/or superior laryngeal nerve, affect the motor function of the larynx and thus cause asymmetric vibration of the vocal folds, which may produce transient or permanent diplophonia. The net effect of these pathologies is therefore to modify the airflow properties, especially at the source. Consequently, many of the parameters developed for voice pathology detection have been derived from the linear prediction (LP) residual [1] or the Electroglottograph (EGG) [1], both of which are considered representative of the airflow properties at the glottis. These features characterize the variability at the source, either in amplitude (shimmer [1]) or in fundamental frequency, i.e., pitch (jitter [1]). In pathological voices, the incomplete closure of the vocal folds allows air to escape, which increases the turbulence perceived in the voice. Thus, apart from these perturbation measures (i.e., shimmer and jitter), various noise measures [2] have also been derived from the speech signal to exploit this perceived turbulence.

In this paper, an attempt is made to derive features from the Teager energy operator (TEO) profile of the speech signal, which captures the glottal airflow properties more effectively by also accounting for the nonlinear sources of voice production, namely the vortices. In this study, four features (two temporal and two spectral) are used. These features are also extracted from the speech signal and the LP residual to compare against the performance of the TEO profile.

The organization of the paper is as follows. In Section 2, the robustness of the TEO in capturing source-related information is discussed. In Section 3, the computational details of the features used are briefly described. Section 4 gives details of the experimental setup and describes the experiments conducted and the results obtained. Finally, Section 5 summarizes our findings and discusses future research directions.

2 TEO as Source Information

The TEO was first proposed by the Teagers in [3].
The Teagers showed that the airflow is not laminar, as assumed by the linear source-filter theory, but separates into various paths, generating vortices which provide the excitation to the vocal tract during the closed phase. The TEO is an operator which captures the energy of these vortices. It is proportional to the square of both amplitude and frequency, and is defined in both the discrete and continuous domains. For the discrete case it is defined as:

ψ{x(n)} = x^2(n) − x(n−1)·x(n+1) ≈ A^2 ω^2,  (1)

for small values of ω, i.e., sin(ω) ≈ ω.

Fig. 1(a) and (b) depict the speech signal and the corresponding differenced EGG taken from the CMU-ARCTIC database [4]. Fig. 1(c) shows the corresponding TEO profile. It is interesting to see in this figure that the peaks in the TEO profile lie in close proximity to the peaks of the differenced EGG waveform, which correspond to the glottal closure instants (GCI), indicating that the TEO successfully captures the airflow properties at the glottis (in particular, glottal activity). Moreover, the height of the peaks in the TEO profile is correlated with the peaks in the differenced EGG waveform, showing that the TEO profile of speech is a robust indicator of the airflow properties at the source, i.e., the glottis. Fig. 2 depicts the TEO profile for a normal speaker and for a pathological speaker suffering from vocal nodules, taken from the Massachusetts Eye and Ear Infirmary (MEEI)
database. As can be seen, for a normal speaker, complete glottal closure means there is not much turbulence at the source, which is reflected in the regularity of the TEO profile peaks. In the case of the pathological voice, on the other hand, incomplete closure produces increased turbulence at the source, which is reflected in the irregular structure of the running estimate of signal energy via the TEO profile. This evidence again reiterates that the TEO is very good at capturing the airflow properties at the glottal source.

Fig. 1. TEO as a source feature: (a) speech signal (b) differenced EGG pulses corresponding to GCI (c) corresponding TEO profile of speech.

Fig. 2. TEO profile of speech for (a) normal phonation and (b) a person suffering from vocal nodules.

Two prominent differences can be observed in the plots in Fig. 2. First, the peaks for normal phonation are almost all the same height, whereas the peaks in the TEO profile for the pathological case are more non-uniform in height, showing greater variability in the energy at the GCI and thus greater amplitude variability. Second, the zero-crossing rate (ZCR) at the GCI is uniform for normal speech, which has hardly any zero-crossings between the GCI, whereas the pathological voice shows an increased number of zero-crossings, especially between the GCI, due to the escape of air caused by the incomplete closure of the vocal folds and the resulting increase in perceived turbulence. Thus, we need to extract suitable temporal and spectral features which exploit these characteristics and give high classification accuracy.
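The discrete operator of Eq. (1) is straightforward to compute. The following NumPy sketch is our own illustration (not the authors' code); it applies the operator sample by sample and verifies the A^2·sin^2(ω) identity on a pure tone, which reduces to A^2·ω^2 for small ω:

```python
import numpy as np

def teo(x):
    """Discrete Teager energy operator: psi[x(n)] = x(n)^2 - x(n-1) * x(n+1)."""
    x = np.asarray(x, dtype=float)
    psi = np.empty_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    psi[0], psi[-1] = psi[1], psi[-2]   # replicate edge values (one common convention)
    return psi

# For a pure tone x(n) = A*cos(omega*n), the TEO equals A^2 * sin(omega)^2 exactly.
fs = 25000.0
omega = 2 * np.pi * 100.0 / fs          # 100 Hz tone at a 25 kHz sampling rate
x = 0.5 * np.cos(omega * np.arange(1000))
print(np.allclose(teo(x)[1:-1], 0.25 * np.sin(omega) ** 2))
```

Applied to a speech frame rather than a tone, the same function yields the TEO profile whose peak regularity is compared in Fig. 2.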
3 Features Used

The main requirements for selecting the features were as follows. First, a feature should exploit the characteristics of a pathological voice: increased noise components, breakdown of periodic structure, greater high-frequency content and greater amplitude variations. Second, it should be simple to compute and reproduce. Lastly, it should be independent of any determination of the pitch period. Thus, for this application we used four conventional features: two temporal features, the zero-crossing rate (ZCR) and short-time energy (STE), and two spectral features, spectral flux (SF) and spectral centroid (SC). The performance of these features was also compared with Mel frequency cepstral coefficients (MFCC) derived from the speech signal, since this is the state-of-the-art feature set, which has given very good classification accuracy [5]. This section describes the computational details of these temporal and spectral features.

Short-time Average Zero-Crossing Rate (ZCR). As defined in [6], a zero-crossing is said to occur if there is an algebraic sign change between two consecutive samples of a speech signal. The rate at which zero-crossings occur is thus an indication of the frequency content of the signal. It is defined as:

Z_n = (1/2N) Σ_m |sgn[x(m)] − sgn[x(m−1)]| · w(n−m),  (2)

where Z_n is the ZCR for the n-th frame, N is the frame length and w(n) is the Hamming window. ZCR has been used for voiced/unvoiced detection [6], since unvoiced speech is known to have higher frequency content. As various previous studies have shown that pathological voices have much higher frequency content than normal speech signals [1], we considered this feature a suitable indicator of the frequency content of a signal and of the presence of noise components, since higher frequency components are mainly attributable to noise.

Short-Time Energy (STE).
Pathological speech is known to have greater amplitude variations than normal voices, which was the major motivation for the shimmer parameter used in earlier studies [1]. Short-time energy is another parameter that has been used to capture amplitude variations, especially to distinguish voiced from unvoiced speech [6]. In this paper, we investigate whether it can capture the difference in amplitude variation between pathological and normal speech. The short-time energy is defined as:

E_n = Σ_m [x(m)·w(n−m)]^2,  (3)

where E_n is the energy of the n-th frame, x(m) is the signal and w(n) is the Hamming window.

Spectral Centroid (SC). The spectral centroid measures the brightness of a sound, i.e., it measures where most of the power in a speech segment is located. It has been
previously used for the detection of clinical depression [7] and for speech recognition, where spectral centroids computed in various sub-bands were found to be similar to formant frequencies, to provide information complementary to cepstral features, and to be robust to noise [8]. The spectral centroid is the weighted average frequency of the spectrum and thus indicates the frequency range in which most of the power of the spectrum lies. We wanted to analyze whether the spectral centroid shifts towards higher frequencies for the pathological voice. It is defined as:

SC = Σ_{k=0}^{N−1} X(k)·F(k) / Σ_{k=0}^{N−1} X(k),  (4)

where X(k) represents the magnitude of bin number k, and F(k) represents the center frequency of that bin.

Spectral Flux (SF). The spectral flux is defined as the difference between the power spectra of two consecutive speech frames; it thus measures the frame-to-frame variability in spectral shape. It has previously been used for detecting depression [7] and for speaker recognition [9], among other applications. Since the pathological voice is known to be less periodic than normal signals, it is expected to show higher frequency variations, which motivated the use of the jitter parameter. We expected this parameter to capture the breakdown of periodic structure and the variation in frequency content of the pathological voice. The spectral flux is defined as:

SF(n) = Σ_{k=−N/2}^{N/2−1} H(|X(n,k)| − |X(n−1,k)|)^2,  (5)

where X(n,k) is the k-th frequency bin of the n-th frame and H(x) = (x + |x|)/2 is the half-wave rectifier function.

4 Experimental Results

The corpus used for the experiments is the commercially available MEEI database [10]. For this work, a subset of 173 pathological and 53 normal speakers was used, following [2].
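The four features of Section 3 can be sketched in NumPy as follows. This is our own illustration of Eqs. (2)-(5) on Hamming-windowed frames, not the authors' implementation, and the function names and the normalization details are ours:

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=256):
    """Block the signal into non-overlapping Hamming-windowed frames."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

def zcr(frames):
    """Eq. (2): fraction of sign changes per frame (Hamming window already applied)."""
    s = np.sign(frames)
    return 0.5 * np.mean(np.abs(np.diff(s, axis=1)), axis=1)

def ste(frames):
    """Eq. (3): short-time energy of each windowed frame."""
    return np.sum(frames ** 2, axis=1)

def spectral_centroid(frames, fs):
    """Eq. (4): magnitude-weighted mean frequency of each frame."""
    mag = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / fs)
    return np.sum(mag * freqs, axis=1) / (np.sum(mag, axis=1) + 1e-12)

def spectral_flux(frames):
    """Eq. (5): half-wave-rectified frame-to-frame change in magnitude spectra."""
    mag = np.abs(np.fft.rfft(frames, axis=1))
    d = np.diff(mag, axis=0)
    h = 0.5 * (d + np.abs(d))           # half-wave rectifier H(x) = (x + |x|)/2
    return np.sum(h ** 2, axis=1)

fs = 25000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 200 * t)         # 1 s synthetic "phonation" stand-in
fr = frame_signal(x)
print(zcr(fr).shape, ste(fr).shape, spectral_centroid(fr, fs).shape, spectral_flux(fr).shape)
```

In the experiments these functions would be applied three times per recording: to the speech waveform, to its LP residual, and to its TEO profile.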
The MEEI database consists of samples at either 50 kHz or 25 kHz, hence all samples were downsampled to a 25 kHz sampling frequency. Since the number of pathological samples is approximately 3 times the number of normal samples, 1 s of pathological data per patient and 3 s of normal data of sustained phonation /ah/ per control speaker were used for training and testing. The signals were blocked into frames of 256 samples, corresponding to 10.24 ms. The features extracted per frame were given to a 2nd order polynomial classifier to generate the true and false scores [11]. A 4-fold cross-validation scheme repeated 12 times, giving a total of 48 trials, was carried out, using 75% of the samples for training and 25% for testing, with the training
and testing subsets kept independent of each other. The classification accuracy (Ac) was calculated as an average over all these 48 trials (i.e., 688 genuine and 688 impostor trials).

Fig. 3. DET plots for comparison of features derived from the LP residual, the TEO profile of speech and the speech waveform for the following features: (a) ZCR (b) STE (c) SC (d) SF.

Table 1. A comparison of the temporal and spectral features (EER (%) and Ac (%)) derived from the speech waveform, the LP residual (LP Res) and the TEO profile of speech. Rows: ZCR-Speech, ZCR-LP Res, ZCR-TEO, STE-Speech, STE-LP Res, STE-TEO, SC-Speech, SC-LP Res, SC-TEO, SF-Speech, SF-LP Res, SF-TEO.

Table 2. Comparison of results (EER (%) and Ac (%)) of fusing the features derived from the TEO profile with MFCC. Rows: MFCC, MFCC+ZCR, MFCC+STE, MFCC+SF, MFCC+SC, MFCC+ZCR+STE+SC+SF.
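The equal error rate reported alongside the accuracies is the operating point at which the false-acceptance and false-rejection rates of the true/false scores coincide. A minimal sketch of its computation (our own illustration, not the authors' evaluation code):

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Sweep thresholds over the scores; return the point where FAR and FRR meet."""
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    far = np.array([np.mean(impostor >= t) for t in thresholds])  # false accepts
    frr = np.array([np.mean(genuine < t) for t in thresholds])    # false rejects
    i = np.argmin(np.abs(far - frr))
    return 0.5 * (far[i] + frr[i])

# Perfectly separated genuine and impostor scores give an EER of 0.
print(equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))   # -> 0.0
```

Plotting FRR against FAR over all thresholds, on normal-deviate axes, gives the DET curve of [12].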
Fig. 4. DET curves for feature-level fusion of MFCC with (a) ZCR, (b) SC and (c) ZCR+STE+SF+SC.

The detection error tradeoff (DET) plots [12] comparing the performance of the speech waveform, its TEO profile and the LP residual are shown for each of the 4 features in Fig. 3, and the corresponding equal error rate (EER) values and accuracies are listed in Table 1. As can be seen, in 3 of the 4 cases the TEO profile performs best in comparison with the LP residual and the speech waveform; only for the spectral flux feature does the LP residual perform better. Among the 4 features, ZCR performs best.

Apart from this, we also investigated whether these features provide information complementary to the MFCC features, computed according to [13]; the values are listed in Table 2. From Table 2 we see that, when each feature is fused separately with MFCC, ZCR and SC give the lowest EER. However, in the DET curves shown in Fig. 4(a) and (b), certain points for the SC parameter come very close to the DET curve for MFCC, while for ZCR the two DET curves are separated at all points. Thus, as expected from the results of the individual features, ZCR performs best when fused with MFCC, followed by SC and STE, while fusion with SF actually reduces the classification accuracy, showing that it provides no complementary information. We also fused MFCC with ZCR, STE, SF and SC together; Fig. 4(c) depicts the DET plot of MFCC+ZCR+STE+SC+SF. It can be observed that there is not much improvement in EER in Fig. 4(c) compared with Fig. 4(a): although this fusion did increase the classification accuracy, the increase was not much greater than that of ZCR+MFCC.
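The 2nd order polynomial classifier of [11] scores each frame with a linear model applied to the degree-2 monomial expansion of its feature vector. A minimal sketch of that expansion (our own illustration; the function name is ours):

```python
import numpy as np
from itertools import combinations_with_replacement

def poly2_expand(X):
    """Map each row [x1..xd] to all monomials up to degree 2: [1, x_i, x_i*x_j]."""
    n, d = X.shape
    cols = [np.ones(n)] + [X[:, i] for i in range(d)]
    cols += [X[:, i] * X[:, j] for i, j in combinations_with_replacement(range(d), 2)]
    return np.stack(cols, axis=1)

# A 3-dimensional feature vector expands to 1 + 3 + 6 = 10 terms.
P = poly2_expand(np.array([[1.0, 2.0, 3.0]]))
print(P.shape, P[0].tolist())
```

Training then reduces to a least-squares fit of weights w on the expanded frames, with targets of 1 for the in-class frames and 0 otherwise; averaging the per-frame scores P·w over an utterance yields its true or false score.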
So the increase in accuracy does not outweigh the increase in computation, and the best feature from our analysis is the ZCR, which does well both individually and when fused with MFCC.

The good performance of ZCR can be understood intuitively from the TEO profile plots shown in Fig. 2 for the sustained phonation of a normal person and of a person suffering from vocal nodules. As can be observed from the plots, the ZCR for the normal person is almost periodic, with zero-crossings occurring approximately at each GCI, whereas for the pathological voice, due to increased
turbulence at the source, there are many spurious zero-crossings between the peaks as well. For this reason, pathological voices show a significantly higher ZCR, and this feature therefore performs best in this dichotomy.

5 Summary and Conclusion

In this paper, it is observed that most of the spectral and temporal features derived from the TEO profile of the speech signal perform better than the same features derived from either the LP residual or the speech waveform. Consequently, we believe that the TEO is a more effective operator for capturing the glottal airflow properties than the LP residual. The advantage of the proposed method is that it uses simple features, which can be implemented and computed easily and are independent of ad hoc pitch estimation methods. Moreover, we have shown that most of the proposed features do provide some complementary information to the MFCC features, increasing the classification accuracy by almost 1% in most cases and decreasing the EER by 1% as well.

From this work, we infer that the ZCR gave the best relative performance, and hence we would like to investigate this feature further by analyzing the ZCR in different frequency bands and by varying the analysis window length and the degree of the polynomial classifier to obtain the best performance. Moreover, it is evident that the characteristics of the TEO profile in the vicinity of the GCI play a significant role, and in the future we would like to develop speech features which capture this information for voice pathology classification.

Acknowledgements. The authors would like to thank the authorities of DA-IICT Gandhinagar for their kind support in carrying out this work. They would also like to thank the ECE Dept., Manipal Institute of Technology, India for providing the MEEI database, without which this work would not have been possible.

References

1. Davis, S.B.: Acoustic Characteristics of Normal and Pathological Voices.
Haskins Laboratories: Status Report on Speech Research, vol. 54 (1978)
2. Parsa, V., Jamieson, D.G.: Identification of Pathological Voices Using Glottal Noise Measures. J. Speech, Language, Hearing Res., vol. 43, no. 2 (2000)
3. Teager, H.M., Teager, S.M.: Evidence for Nonlinear Sound Production Mechanisms in the Vocal Tract. In: Speech Production and Speech Modelling, W.J. Hardcastle and A. Marchal (eds.). Kluwer, Netherlands (1990)
4. CMU-ARCTIC speech synthesis databases.
5. Markaki, M., Stylianou, Y., Arias-Londoño, J.D., Godino-Llorente, J.I.: Dysphonia Detection Based on Modulation Spectral Features and Cepstral Coefficients. In: IEEE Proc. Int. Conf. Acoust., Speech, Signal Processing (ICASSP) (2010)
6. Rabiner, L.R., Schafer, R.W.: Digital Processing of Speech Signals. Prentice-Hall, Englewood Cliffs, NJ (1978)
7. Low, L.S.A., Maddage, N.C., Lech, M., Sheeber, L., Allen, N.: Influence of Acoustic Low-Level Descriptors in the Detection of Clinical Depression in Adolescents. In: IEEE Proc. Int. Conf. Acoust., Speech, Signal Processing (ICASSP) (2010)
8. Paliwal, K.K.: Spectral Subband Centroid Features for Speech Recognition. In: IEEE Proc. Int. Conf. Acoust., Speech, Signal Processing (ICASSP) (1998)
9. Hossienzadeh, D., Krishnan, S.: Combining Vocal Source and MFCC Features for Enhanced Speaker Recognition Performance Using GMMs. In: Proc. of IEEE 9th Workshop on Multimedia Signal Processing (2007)
10. Kay Elemetrics Corp.: Disordered Voice Database Model 4337, Version 1.03, Massachusetts Eye and Ear Infirmary Voice and Speech Lab (2002)
11. Campbell, W.M., Assaleh, K.T., Broun, C.C.: Speaker Recognition with Polynomial Classifiers. IEEE Transactions on Speech and Audio Processing, vol. 10, no. 4 (2002)
12. Martin, A.F., Doddington, G., Kamm, T., Ordowski, M., Przybocki, M.: The DET Curve in Assessment of Detection Task Performance. In: Proc. Eurospeech '97, vol. 4, Rhodes, Greece (1997)
13. Davis, S.B., Mermelstein, P.: Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-28, no. 4 (1980)
More informationProject 0: Part 2 A second hands-on lab on Speech Processing Frequency-domain processing
Project : Part 2 A second hands-on lab on Speech Processing Frequency-domain processing February 24, 217 During this lab, you will have a first contact on frequency domain analysis of speech signals. You
More informationScienceDirect. Accuracy of Jitter and Shimmer Measurements
Available online at www.sciencedirect.com ScienceDirect Procedia Technology 16 (2014 ) 1190 1199 CENTERIS 2014 - Conference on ENTERprise Information Systems / ProjMAN 2014 - International Conference on
More informationSpeech Signal Analysis
Speech Signal Analysis Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 2&3 14,18 January 216 ASR Lectures 2&3 Speech Signal Analysis 1 Overview Speech Signal Analysis for
More informationQuantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation
Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Peter J. Murphy and Olatunji O. Akande, Department of Electronic and Computer Engineering University
More informationENHANCED ROBUSTNESS TO UNVOICED SPEECH AND NOISE IN THE DYPSA ALGORITHM FOR IDENTIFICATION OF GLOTTAL CLOSURE INSTANTS
ENHANCED ROBUSTNESS TO UNVOICED SPEECH AND NOISE IN THE DYPSA ALGORITHM FOR IDENTIFICATION OF GLOTTAL CLOSURE INSTANTS Hania Maqsood 1, Jon Gudnason 2, Patrick A. Naylor 2 1 Bahria Institue of Management
More informationDimension Reduction of the Modulation Spectrogram for Speaker Verification
Dimension Reduction of the Modulation Spectrogram for Speaker Verification Tomi Kinnunen Speech and Image Processing Unit Department of Computer Science University of Joensuu, Finland tkinnu@cs.joensuu.fi
More informationSinging Voice Detection. Applications of Music Processing. Singing Voice Detection. Singing Voice Detection. Singing Voice Detection
Detection Lecture usic Processing Applications of usic Processing Christian Dittmar International Audio Laboratories Erlangen christian.dittmar@audiolabs-erlangen.de Important pre-requisite for: usic segmentation
More informationEVALUATION OF PITCH ESTIMATION IN NOISY SPEECH FOR APPLICATION IN NON-INTRUSIVE SPEECH QUALITY ASSESSMENT
EVALUATION OF PITCH ESTIMATION IN NOISY SPEECH FOR APPLICATION IN NON-INTRUSIVE SPEECH QUALITY ASSESSMENT Dushyant Sharma, Patrick. A. Naylor Department of Electrical and Electronic Engineering, Imperial
More informationConverting Speaking Voice into Singing Voice
Converting Speaking Voice into Singing Voice 1 st place of the Synthesis of Singing Challenge 2007: Vocal Conversion from Speaking to Singing Voice using STRAIGHT by Takeshi Saitou et al. 1 STRAIGHT Speech
More informationSYNTHETIC SPEECH DETECTION USING TEMPORAL MODULATION FEATURE
SYNTHETIC SPEECH DETECTION USING TEMPORAL MODULATION FEATURE Zhizheng Wu 1,2, Xiong Xiao 2, Eng Siong Chng 1,2, Haizhou Li 1,2,3 1 School of Computer Engineering, Nanyang Technological University (NTU),
More informationSynthesis Algorithms and Validation
Chapter 5 Synthesis Algorithms and Validation An essential step in the study of pathological voices is re-synthesis; clear and immediate evidence of the success and accuracy of modeling efforts is provided
More informationSpeech Synthesis; Pitch Detection and Vocoders
Speech Synthesis; Pitch Detection and Vocoders Tai-Shih Chi ( 冀泰石 ) Department of Communication Engineering National Chiao Tung University May. 29, 2008 Speech Synthesis Basic components of the text-to-speech
More informationNOISE ESTIMATION IN A SINGLE CHANNEL
SPEECH ENHANCEMENT FOR CROSS-TALK INTERFERENCE by Levent M. Arslan and John H.L. Hansen Robust Speech Processing Laboratory Department of Electrical Engineering Box 99 Duke University Durham, North Carolina
More informationLinguistic Phonetics. Spectral Analysis
24.963 Linguistic Phonetics Spectral Analysis 4 4 Frequency (Hz) 1 Reading for next week: Liljencrants & Lindblom 1972. Assignment: Lip-rounding assignment, due 1/15. 2 Spectral analysis techniques There
More informationSpeech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter
Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter 1 Gupteswar Sahu, 2 D. Arun Kumar, 3 M. Bala Krishna and 4 Jami Venkata Suman Assistant Professor, Department of ECE,
More informationPreeti Rao 2 nd CompMusicWorkshop, Istanbul 2012
Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012 o Music signal characteristics o Perceptual attributes and acoustic properties o Signal representations for pitch detection o STFT o Sinusoidal model o
More information651 Analysis of LSF frame selection in voice conversion
651 Analysis of LSF frame selection in voice conversion Elina Helander 1, Jani Nurminen 2, Moncef Gabbouj 1 1 Institute of Signal Processing, Tampere University of Technology, Finland 2 Noia Technology
More informationElectronic disguised voice identification based on Mel- Frequency Cepstral Coefficient analysis
International Journal of Scientific and Research Publications, Volume 5, Issue 11, November 2015 412 Electronic disguised voice identification based on Mel- Frequency Cepstral Coefficient analysis Shalate
More informationAudio Fingerprinting using Fractional Fourier Transform
Audio Fingerprinting using Fractional Fourier Transform Swati V. Sutar 1, D. G. Bhalke 2 1 (Department of Electronics & Telecommunication, JSPM s RSCOE college of Engineering Pune, India) 2 (Department,
More informationResearch Article Jitter Estimation Algorithms for Detection of Pathological Voices
Hindawi Publishing Corporation EURASIP Journal on Advances in Signal Processing Volume 29, Article ID 567875, 9 pages doi:1.1155/29/567875 Research Article Jitter Estimation Algorithms for Detection of
More informationINTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)
INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) Proceedings of the 2 nd International Conference on Current Trends in Engineering and Management ICCTEM -214 ISSN
More informationChapter IV THEORY OF CELP CODING
Chapter IV THEORY OF CELP CODING CHAPTER IV THEORY OF CELP CODING 4.1 Introduction Wavefonn coders fail to produce high quality speech at bit rate lower than 16 kbps. Source coders, such as LPC vocoders,
More informationGlottal source model selection for stationary singing-voice by low-band envelope matching
Glottal source model selection for stationary singing-voice by low-band envelope matching Fernando Villavicencio Yamaha Corporation, Corporate Research & Development Center, 3 Matsunokijima, Iwata, Shizuoka,
More informationLearning to Unlearn and Relearn Speech Signal Processing using Neural Networks: current and future perspectives
Learning to Unlearn and Relearn Speech Signal Processing using Neural Networks: current and future perspectives Mathew Magimai Doss Collaborators: Vinayak Abrol, Selen Hande Kabil, Hannah Muckenhirn, Dimitri
More informationSpeech Coding using Linear Prediction
Speech Coding using Linear Prediction Jesper Kjær Nielsen Aalborg University and Bang & Olufsen jkn@es.aau.dk September 10, 2015 1 Background Speech is generated when air is pushed from the lungs through
More informationNOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or
NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or other reproductions of copyrighted material. Any copying
More informationNCCF ACF. cepstrum coef. error signal > samples
ESTIMATION OF FUNDAMENTAL FREQUENCY IN SPEECH Petr Motl»cek 1 Abstract This paper presents an application of one method for improving fundamental frequency detection from the speech. The method is based
More informationWavelet Speech Enhancement based on the Teager Energy Operator
Wavelet Speech Enhancement based on the Teager Energy Operator Mohammed Bahoura and Jean Rouat ERMETIS, DSA, Université du Québec à Chicoutimi, Chicoutimi, Québec, G7H 2B1, Canada. Abstract We propose
More informationMODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS
MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS 1 S.PRASANNA VENKATESH, 2 NITIN NARAYAN, 3 K.SAILESH BHARATHWAAJ, 4 M.P.ACTLIN JEEVA, 5 P.VIJAYALAKSHMI 1,2,3,4,5 SSN College of Engineering,
More informationAN ANALYSIS OF ITERATIVE ALGORITHM FOR ESTIMATION OF HARMONICS-TO-NOISE RATIO IN SPEECH
AN ANALYSIS OF ITERATIVE ALGORITHM FOR ESTIMATION OF HARMONICS-TO-NOISE RATIO IN SPEECH A. Stráník, R. Čmejla Department of Circuit Theory, Faculty of Electrical Engineering, CTU in Prague Abstract Acoustic
More informationQUANTILE BASED NOISE ESTIMATION FOR SPECTRAL SUBTRACTION OF SELF LEAKAGE NOISE IN ELECTROLARYNGEAL SPEECH
International Conference on Systemics, Cybernetics and Informatics, February 12 15, 2004 QUANTILE BASED NOISE ESTIMATION FOR SPECTRAL SUBTRACTION OF SELF LEAKAGE NOISE IN ELECTROLARYNGEAL SPEECH Santosh
More informationL19: Prosodic modification of speech
L19: Prosodic modification of speech Time-domain pitch synchronous overlap add (TD-PSOLA) Linear-prediction PSOLA Frequency-domain PSOLA Sinusoidal models Harmonic + noise models STRAIGHT This lecture
More informationROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE
- @ Ramon E Prieto et al Robust Pitch Tracking ROUST PITCH TRACKIN USIN LINEAR RERESSION OF THE PHASE Ramon E Prieto, Sora Kim 2 Electrical Engineering Department, Stanford University, rprieto@stanfordedu
More informationEC 6501 DIGITAL COMMUNICATION UNIT - II PART A
EC 6501 DIGITAL COMMUNICATION 1.What is the need of prediction filtering? UNIT - II PART A [N/D-16] Prediction filtering is used mostly in audio signal processing and speech processing for representing
More informationPerformance analysis of voice activity detection algorithm for robust speech recognition system under different noisy environment
BABU et al: VOICE ACTIVITY DETECTION ALGORITHM FOR ROBUST SPEECH RECOGNITION SYSTEM Journal of Scientific & Industrial Research Vol. 69, July 2010, pp. 515-522 515 Performance analysis of voice activity
More informationCHAPTER 3. ACOUSTIC MEASURES OF GLOTTAL CHARACTERISTICS 39 and from periodic glottal sources (Shadle, 1985; Stevens, 1993). The ratio of the amplitude of the harmonics at 3 khz to the noise amplitude in
More informationChange Point Determination in Audio Data Using Auditory Features
INTL JOURNAL OF ELECTRONICS AND TELECOMMUNICATIONS, 0, VOL., NO., PP. 8 90 Manuscript received April, 0; revised June, 0. DOI: /eletel-0-00 Change Point Determination in Audio Data Using Auditory Features
More informationASPIRATION NOISE DURING PHONATION: SYNTHESIS, ANALYSIS, AND PITCH-SCALE MODIFICATION DARYUSH MEHTA
ASPIRATION NOISE DURING PHONATION: SYNTHESIS, ANALYSIS, AND PITCH-SCALE MODIFICATION by DARYUSH MEHTA B.S., Electrical Engineering (23) University of Florida SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING
More informationSource-Filter Theory 1
Source-Filter Theory 1 Vocal tract as sound production device Sound production by the vocal tract can be understood by analogy to a wind or brass instrument. sound generation sound shaping (or filtering)
More informationEnhanced Waveform Interpolative Coding at 4 kbps
Enhanced Waveform Interpolative Coding at 4 kbps Oded Gottesman, and Allen Gersho Signal Compression Lab. University of California, Santa Barbara E-mail: [oded, gersho]@scl.ece.ucsb.edu Signal Compression
More informationAdvanced audio analysis. Martin Gasser
Advanced audio analysis Martin Gasser Motivation Which methods are common in MIR research? How can we parameterize audio signals? Interesting dimensions of audio: Spectral/ time/melody structure, high
More informationAnalysis/synthesis coding
TSBK06 speech coding p.1/32 Analysis/synthesis coding Many speech coders are based on a principle called analysis/synthesis coding. Instead of coding a waveform, as is normally done in general audio coders
More informationSpeech Processing. Undergraduate course code: LASC10061 Postgraduate course code: LASC11065
Speech Processing Undergraduate course code: LASC10061 Postgraduate course code: LASC11065 All course materials and handouts are the same for both versions. Differences: credits (20 for UG, 10 for PG);
More informationSignal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2
Signal Processing for Speech Applications - Part 2-1 Signal Processing For Speech Applications - Part 2 May 14, 2013 Signal Processing for Speech Applications - Part 2-2 References Huang et al., Chapter
More informationReduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter
Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Ching-Ta Lu, Kun-Fu Tseng 2, Chih-Tsung Chen 2 Department of Information Communication, Asia University, Taichung, Taiwan, ROC
More informationFeature Selection and Extraction of Audio Signal
Feature Selection and Extraction of Audio Signal Jasleen 1, Dawood Dilber 2 P.G. Student, Department of Electronics and Communication Engineering, Amity University, Noida, U.P, India 1 P.G. Student, Department
More informationGLOTTAL EXCITATION EXTRACTION OF VOICED SPEECH - JOINTLY PARAMETRIC AND NONPARAMETRIC APPROACHES
Clemson University TigerPrints All Dissertations Dissertations 5-2012 GLOTTAL EXCITATION EXTRACTION OF VOICED SPEECH - JOINTLY PARAMETRIC AND NONPARAMETRIC APPROACHES Yiqiao Chen Clemson University, rls_lms@yahoo.com
More informationEE482: Digital Signal Processing Applications
Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 12 Speech Signal Processing 14/03/25 http://www.ee.unlv.edu/~b1morris/ee482/
More informationTemporally Weighted Linear Prediction Features for Speaker Verification in Additive Noise
Temporally Weighted Linear Prediction Features for Speaker Verification in Additive Noise Rahim Saeidi 1, Jouni Pohjalainen 2, Tomi Kinnunen 1 and Paavo Alku 2 1 School of Computing, University of Eastern
More informationCHARACTERIZATION OF PATHOLOGICAL VOICE SIGNALS BASED ON CLASSICAL ACOUSTIC ANALYSIS
CHARACTERIZATION OF PATHOLOGICAL VOICE SIGNALS BASED ON CLASSICAL ACOUSTIC ANALYSIS Robert Rice Brandt 1, Benedito Guimarães Aguiar Neto 2, Raimundo Carlos Silvério Freire 3, Joseana Macedo Fechine 4,
More informationYOUR WAVELET BASED PITCH DETECTION AND VOICED/UNVOICED DECISION
American Journal of Engineering and Technology Research Vol. 3, No., 03 YOUR WAVELET BASED PITCH DETECTION AND VOICED/UNVOICED DECISION Yinan Kong Department of Electronic Engineering, Macquarie University
More informationHungarian Speech Synthesis Using a Phase Exact HNM Approach
Hungarian Speech Synthesis Using a Phase Exact HNM Approach Kornél Kovács 1, András Kocsor 2, and László Tóth 3 Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University
More informationVocoder (LPC) Analysis by Variation of Input Parameters and Signals
ISCA Journal of Engineering Sciences ISCA J. Engineering Sci. Vocoder (LPC) Analysis by Variation of Input Parameters and Signals Abstract Gupta Rajani, Mehta Alok K. and Tiwari Vebhav Truba College of
More informationSpeech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm
International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm A.T. Rajamanickam, N.P.Subiramaniyam, A.Balamurugan*,
More information