Exploring Modulation Spectrum Features for Speech-Based Depression Level Classification


INTERSPEECH 2014

Elif Bozkurt 1, Orith Toledo-Ronen 2, Alexander Sorin 2, Ron Hoory 2
1 Multimedia, Vision and Graphics Laboratory, Koç University, Istanbul, Turkey
2 IBM Research Haifa, Haifa University Mount Carmel, Haifa, Israel
ebozkurt@ku.edu.tr, {oritht, sorin, hoory}@il.ibm.com

Abstract

In this paper, we propose a Modulation Spectrum-based manageable feature set for the detection of depressed speech. The Modulation Spectrum (MS) is obtained from the conventional speech spectrogram by spectral analysis along the temporal trajectories of the acoustic frequency bins. While the MS representation of speech provides a rich, high-dimensional joint-frequency description, the extraction of discriminative features from it remains an open question. We propose a lower-dimensional representation that first applies a Mel-frequency filterbank in the acoustic frequency domain and a Discrete Cosine Transform in the modulation frequency domain, and then performs feature selection in both domains. We compare and fuse the proposed feature set with other complementary prosodic and spectral features at the feature and decision levels. In our experiments, we use Support Vector Machines to discriminate depressed speech in a speaker-independent fashion. Feature-level fusion of the proposed MS-based features with other prosodic and spectral features after dimension reduction provides up to ~9% improvement over the baseline results and also correlates the most with clinical ratings of the patients' depression level.

Index Terms: depression assessment, modulation spectrum, prosody, feature fusion, decision fusion

1. Introduction

Characterization of the emotional expression of speech and its relation to the overall state of the speaker is a challenging task, yet one that would open new avenues for health-care technologies. While emotions are a part of everyday communication, emotional or mood disorders such as clinical depression remain a critical public health concern [1]. A large body of research suggests that the analysis of voice patterns can lead to objective tools for the characterization of depression in speech [2, 3]. One of the goals of this research is to find objectively measurable speech features that can distinguish the speaking patterns of individuals with a diagnosis of clinical depression on a speaker-independent basis. We particularly focus on Modulation Spectrum (MS) features, which capture long-term dynamic characteristics of the speech signal [4, 5, 6]. In its original definition, the MS is a high-dimensional representation. We employ a Mel filterbank and the Discrete Cosine Transform (DCT) to obtain lower-dimensional representations in the acoustic and modulation frequency domains, respectively. We hypothesize that energy modulations in particular frequency ranges may be more discriminative for depressive speech recognition, and we experimentally select a joint subset of Mel and DCT bins for better performance. As a secondary goal, we wish to explore how the MS-based feature set compares with commonly used prosodic and spectral features, and whether these features have fusion potential at the feature and decision levels.
Experiments for the two-class depression classification problem (depressed vs. non-depressed) are performed using Support Vector Machine (SVM) classifiers implemented in speaker-independent configurations on the free-speech recordings of the Mundt dataset referred to in [3].

1.1. Related Work

The perceptual qualities of depression in voice have been most commonly studied with regard to prosodic and vocal tract perturbations [7, 8, 9, 10]. Studies have shown that the second formant location is the most affected by depressive speech; patients with major depressive disorder had decreased second formant (F2) measurements [7]. Energy variability has been shown to decrease with increasing levels of depression [10, 11]. Speech rate in the form of combined phone-duration measures [12], statistics of pitch and energy features [13], and voice quality measures [14] have also been useful for detecting depression symptoms. Spectral features such as Mel Frequency Cepstral Coefficients (MFCCs), power spectral density, and spectral tilt also potentially carry useful information for the classification of depression [8, 15, 16, 17]. In addition, glottal measures have been analyzed [17, 18].

More recently, Cummins et al. [11] investigated the effects of depression on speech by analyzing MS features on the trisyllabic PATAKA recordings of the Mundt dataset [3] used in our study. The authors apply log mean subtraction along each acoustic frequency during MS feature extraction and report 66.9% weighted accuracy using 10-fold cross validation for the two-class depression recognition problem. In a later study, the authors investigate the covariance structure of a Gaussian Mixture Model (GMM) to capture depression-related information [19] on the Grandfather read-speech passages of the same dataset. The best classification result for the two-class problem is 68.6%, obtained when the variance and weight parameters of the Gaussians are updated during adaptation. Sturim et al. also investigate the free-speech recordings of the same dataset in a leave-one-recording-out fashion. They focus on depression severity as the class distinction and apply joint factor analysis with Wiener filtering to model speaker and channel variation [20]. The authors test their system using MFCCs and shifted delta cepstral features modeled with GMMs; the proposed system yields a 20-30% gain in equal error rate for the two-class depression classification task.

The rest of the paper is organized as follows. In Section 2, we summarize the MS feature extraction steps. In Section 3, we set up the depression classification problem. In Section 4, we describe our results with baseline and proposed features. Finally, in Section 5, we provide conclusions and directions for future work.

2. Modulation Spectrum

Modulation spectral analysis tries to capture the long-term dynamics within an acoustic signal, typically through a two-dimensional joint acoustic-frequency and modulation-frequency representation [4, 5, 6]. Acoustic frequency refers to the frequency variable of the conventional spectrogram derived from the short-term Fourier transform (STFT), whereas modulation frequency captures time-varying information through the temporal modulation of the signal.

The computation of the joint acoustic-modulation frequency spectrum is carried out in two phases. First, the speech spectrogram is computed using an N_A-point FFT (Fast Fourier Transform) on each pre-emphasized, Hamming-windowed overlapping frame. Let S[n, k] denote the STFT of the speech signal as a function of frame index n and acoustic frequency index k (0 <= k <= N_A/2). The modulation spectrum is derived from the analysis of the magnitude spectrogram |S[n, k]| within longer-duration windows (of length M frames) with some overlap. Each window corresponds to a two-dimensional time-frequency context; e.g., the context starting at frame n_0 with a length of M frames consists of all frequency bands within the time interval [n_0, n_0+M-1]. The temporal trajectory of the k-th frequency band within a time-frequency context is denoted as

T(n_0, M, k) = ( |S[n_0, k]|, |S[n_0+1, k]|, ..., |S[n_0+M-1, k]| )    (1)

A second, N_M-point FFT is then applied to the mean-normalized, Mel-filtered, and Hamming-windowed T(n_0, M, k) to produce the modulation spectrum MS(n_0, M, k, q), where q is the modulation frequency index (0 <= q <= N_M/2). In our setup, a standard N-component Mel filterbank is used to reduce both the dimensionality of the acoustic frequency domain and the correlations between the frequency sub-bands. Additionally, a Discrete Cosine Transform (DCT) is applied along the modulation-frequency axis of MS(n_0, M, k, q) to reduce the modulation-frequency dimensionality, yielding an (N_M/2+1)-dimensional vector of DCT coefficients for each acoustic bin. We retain the lowest D coefficients, including the DC coefficient, which preserves the most significant signal energy. The frame-level MS features thus have N-by-D dimensionality.
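As a concrete illustration of this two-phase pipeline (our sketch, not the authors' code), the following Python fragment computes frame-level MS features with librosa and scipy. The parameter values follow Section 4.2 (32 ms windows, 17 ms shift, N_A = 256, M = 25, N_M = 128, N = 26, D = 10); the sampling rate and the 50% context overlap are assumptions, since the paper only states that contexts have "some overlap".

```python
import numpy as np
import librosa
from scipy.fft import dct

def ms_features(y, sr=8000, n_fft=256, win_ms=32, shift_ms=17,
                M=25, n_mod_fft=128, n_mels=26, D=10):
    win = int(sr * win_ms / 1000)
    hop = int(sr * shift_ms / 1000)
    # Phase 1: magnitude spectrogram of the pre-emphasized, Hamming-windowed signal.
    S = np.abs(librosa.stft(librosa.effects.preemphasis(y), n_fft=n_fft,
                            win_length=win, hop_length=hop, window="hamming"))
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    feats = []
    # Phase 2: FFT along the temporal trajectories T(n0, M, k) of Eq. (1).
    for n0 in range(0, S.shape[1] - M, M // 2):
        T = S[:, n0:n0 + M]                    # acoustic-frequency trajectories
        T = T - T.mean(axis=1, keepdims=True)  # mean normalization (DC removal)
        T = (mel_fb @ T) * np.hamming(M)       # Mel filtering, Hamming window
        MS = np.abs(np.fft.rfft(T, n=n_mod_fft, axis=1))  # MS(n0, M, k, q)
        # DCT along the modulation-frequency axis; keep the lowest D coefficients.
        feats.append(dct(MS, axis=1, norm="ortho")[:, :D])
    return np.stack(feats)                     # (n_contexts, n_mels, D)
```

Following Section 4.2, the per-bin mean normalization is applied before Mel filtering; variance normalization and log compression are omitted, as the paper reports no gain from either.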
3. Experimental Setup

We use the in-clinic speech recordings of the database originally collected by Mundt et al. for a depression severity study [3]. The database contains voice samples from 35 patients (20 F / 15 M, ages 20 to 68 years) undergoing depression treatment over a six-week period. The depression severity of the participants was assessed during clinical interviews at one-week intervals and evaluated using the Hamilton Rating Scale for Depression (HAMD) over the course of treatment [21]. The HAMD assessment has 17 symptom sub-topics, each with its own score. We use the total HAMD score of the individual ratings as the ground truth in defining classes in our study. Recordings with a total HAMD score greater than or equal to 17 are assigned to the depressed (Dep) category, and the rest of the recordings are assigned to the non-depressed (non-Dep) category. In this study, 257 free-speech recordings are labeled as non-depressed and the remaining recordings as depressed.

Speech features in our study may be considered in two main categories: prosodic and spectral. While these categories do not cover all measurable speech features, they form the basis of the feature extraction described in this work. We tested all features using their sentence-level statistics ("functionals") consisting of maximum, minimum, variance, standard deviation, skewness, kurtosis, quartiles 1, 2, and 3, and percentiles 1.0 and 99.0.

3.1. Baseline features

We use the openSMILE [22] and Praat [23] toolkits for baseline feature extraction. All baseline features are extracted on a frame basis within windows of 25 ms with 10 ms frame shifts. Statistical functionals are then calculated per recording from the frame-level features. The prosodic features for this study are pitch (F0) and intensity (I), both extracted using Praat. The vocal tract is commonly quantified by the formant frequencies, which are the primary resonances determined by the vocal tract shape during speech production. In this study, the vocal tract spectral structure is quantified by the first (F1), second (F2), and third (F3) formant center frequencies and their bandwidths (BW1, BW2, BW3), extracted in Praat. The formant center frequencies and bandwidths each represent a unique feature sub-category for analysis. In addition, we extract Mel Frequency Cepstral Coefficients [0-14] (MFCCs) and Line Spectral Pairs [0-7] (LSPs) using the emobase2010 configuration of the openSMILE toolkit.

3.2. Classification setup

Speaker-independent experiments were performed in a leave-one-speaker-out cross-validation (LOSO CV) manner, using the data from each of the 35 speakers as the test set in turn and the data from the other 34 speakers as the training set. The class accuracies are computed on the overall dataset. The classification performance is then evaluated by the unweighted average recall (UAR), which is the arithmetic average of the individual class accuracies. In addition to the UAR, we also provide the recall rate on the two classes (Dep, non-Dep) for more insight. We use the LibSVM [24] implementation of Support Vector Machines and employ a linear kernel in all experiments, with feature scale normalization and class weights of [0.45, 0.55] for the non-Dep and Dep categories, respectively.
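A minimal sketch of this setup follows. scikit-learn's SVC wraps LibSVM, so it approximates the classifier used here; the array names, the label construction (y = 1 when the total HAMD score is >= 17), and the per-fold scaler are illustrative assumptions rather than the authors' exact configuration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.metrics import recall_score, balanced_accuracy_score

def loso_uar(X, y, groups):
    # X: recording-level functionals, y: 1 = Dep (total HAMD >= 17),
    # groups: speaker IDs for the leave-one-speaker-out splits.
    preds = np.empty_like(y)
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
        scaler = StandardScaler().fit(X[train_idx])   # feature scale normalization
        clf = SVC(kernel="linear",
                  class_weight={0: 0.45, 1: 0.55})    # non-Dep / Dep weights
        clf.fit(scaler.transform(X[train_idx]), y[train_idx])
        preds[test_idx] = clf.predict(scaler.transform(X[test_idx]))
    # UAR = arithmetic mean of the per-class recalls over the pooled
    # predictions from all held-out speakers.
    per_class_recall = recall_score(y, preds, average=None)
    return balanced_accuracy_score(y, preds), per_class_recall
```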

4. Experimental Results

4.1. Baseline results

We first present baseline results with well-known speech acoustic features of two categories: prosodic and spectral features. Table 1 compares several standard feature sets. The upper part of the table shows the classification performance of the individual prosodic features, and the lower part that of the spectral features. Among the prosodic features, intensity (I) is the most discriminative, whereas F0 has a lower classification rate due to its speaker dependency. Among the spectral features, MFCCs are discriminative but are also very characteristic of the speaker. MS performs the best, but is very close to the other standard spectral feature sets. For reference, the chance-level classification accuracy is 50%.

Table 1. Baseline classification rates with prosodic and spectral feature sets (F0, I, F1, F2, F3, BW1, BW2, BW3, MFCC, LSP, MS).

4.2. Modulation spectrum features: parameter setting

Modulation spectrum (MS) features are a joint acoustic and modulation frequency representation of speech signals, obtained by the simultaneous spectral analysis of all frequency bins. Thus, the frame shift and the time-frequency context length (M) are two crucial parameters: the frame shift determines the sampling rate of the modulation frequency domain, while M controls the resolution of the MS. We extract the STFT of the speech signals within windows of length 32 ms and frame shifts of 17 ms. We tested several values for M (10, 20, 25, and 30) and selected M = 25 (corresponding to an analysis window of length 425 ms), which gave the best performance, to create a valid baseline for the MS features. Additionally, we apply mean normalization of the frequency bins (DC removal) prior to Mel filtering. Variance normalization following mean normalization does not improve the results, and log compression of the STFT outputs or of the MS components does not increase the recognition rates either.

We apply an N_A = 256-point FFT for the calculation of the STFT components and an N_M = 128-point FFT for the calculation of the MS components. Thus, the original frame-level MS feature vector has 129 x 65 = 8385 dimensions. For feature dimension reduction, we apply a Mel filterbank with N = 26 components in the acoustic frequency domain and the DCT in the modulation frequency domain. We retain the first D = 10 components of the DCTs, which results in a feature vector size of 2860 at the functionals level.

4.3. Modulation spectrum features: bin selection

Our manageable feature set is a subset of the Mel and DCT bins of the Modulation Spectrum representation of the speech signal. Figure 1 shows the classification performance of several selections of the Mel bins as a function of the number of DCT coefficients (always starting from coefficient 1). The best result is achieved by taking the Mel bins in the middle range [6-19], corresponding to a frequency range from 668 to 2000 Hz, with an increasing gain as the number of DCT coefficients is reduced down to 1.

Figure 1: UAR classification performance of the MS features with Mel band selection for a varying number of DCT components.

In Table 3, we summarize the best result of the MS feature selection, using only the first DCT coefficient and the Mel bins in the range 6-19, in comparison with the original MS features without selection. We denote this selected set of features as MS_sel. We can see a dramatic improvement in the accuracy of the depressed class, with some degradation on the non-depressed class and an overall UAR improvement.

Table 3. Classification performance of the MS selected features (Mel bins 6-19 and 1st DCT coefficient) in comparison with the original MS features with no selection (MS, MS_sel).
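The construction of MS_sel and of the recording-level functionals can be sketched as follows, continuing the earlier illustrative code (bin indices are treated as 1-based, as in the paper). With all 26 Mel bins and D = 10, the eleven functionals over the 260 frame-level coefficients give the 2860-dimensional vector quoted above; for MS_sel, the 14 selected Mel bins and the single DCT coefficient give 14 x 11 = 154 dimensions.

```python
import numpy as np
from scipy import stats

def select_ms_bins(ms, mel_lo=6, mel_hi=19, n_dct=1):
    # ms: (n_contexts, n_mels, D) frame-level MS features from ms_features().
    sel = ms[:, mel_lo - 1:mel_hi, :n_dct]    # Mel bins 6-19, lowest DCT bins
    return sel.reshape(sel.shape[0], -1)      # (n_contexts, 14 * n_dct)

def functionals(frames):
    # Recording-level statistics of Section 3: max, min, variance, std,
    # skewness, kurtosis, quartiles 1-3, and percentiles 1.0 and 99.0.
    q1, q2, q3 = np.percentile(frames, [25, 50, 75], axis=0)
    return np.concatenate([
        frames.max(axis=0), frames.min(axis=0),
        frames.var(axis=0), frames.std(axis=0),
        stats.skew(frames, axis=0), stats.kurtosis(frames, axis=0),
        q1, q2, q3,
        np.percentile(frames, 1.0, axis=0), np.percentile(frames, 99.0, axis=0),
    ])
```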
4.4. Feature fusion results

We present the performance of several combinations of formant and prosody features in Table 4. We start by fusing the three formant frequencies (F123) and the three bandwidths (BW123), with moderate performance. Next, we add the intensity (I) features to F123 and get a ~4% improvement. Adding BW123 to these features gives only a marginal gain, and adding F0 degrades the performance. Finally, the combination of the three top-performing individual features from Table 1 (I, F2, and BW3) gives the best fusion performance in terms of UAR.

Table 4. Classification performance for feature-level fusion of prosody and formant features (F123, BW123, I+F123, I+F123+BW123, F0+I+F123+BW123, I+F2+BW3).

Next, in the upper part of Table 5, we show the results of fusing the MS_sel features with other features. None of the feature combinations gives any gain beyond the performance of the MS_sel features alone, so our next step was to apply PCA for dimension reduction. Our experimentation with PCA on the MS_sel feature set did not yield any performance improvement, but for other features (e.g., MFCC) some gain was achieved, probably due to redundancies in the feature representation. In the lower part of Table 5, we show the results of combining the complete MS_sel feature set with a second feature set reduced by PCA.

Table 5. Classification performance of fusing the MS_sel features with other feature sets (MFCC, LSP, I, I+F2+BW3, I+F2+BW3+MFCC, I+F2+BW3+LSP), before and after applying PCA on the 2nd feature set.
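A minimal sketch of this feature-level fusion follows, assuming recording-level functional matrices X_ms (MS_sel) and X_other (e.g., I+F2+BW3+MFCC). Fitting the PCA on the training split of each LOSO fold is our assumption; the paper does not state this detail.

```python
import numpy as np
from sklearn.decomposition import PCA

def fuse_with_pca(X_ms_train, X_other_train, X_ms_test, X_other_test,
                  n_components=5):
    # PCA is fit on training data only, applied to both splits, and the
    # reduced second set is concatenated with the complete MS_sel set.
    pca = PCA(n_components=n_components).fit(X_other_train)
    train = np.hstack([X_ms_train, pca.transform(X_other_train)])
    test = np.hstack([X_ms_test, pca.transform(X_other_test)])
    return train, test
```

The number of retained components, n_components, is the quantity swept along the horizontal axis of Figure 2 below.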

To better understand the behavior of the PCA dimension reduction on the second feature set in fusion with the MS_sel features, Figure 2 shows the performance of the MS_sel fusion with four other feature sets as a function of the number of PCA components of the second feature set. The horizontal dotted line is the MS_sel baseline performance. By selecting a few principal components from the second feature set and fusing them with the MS_sel features, we are able to improve the classification performance, especially with the feature set (I+F2+BW3+MFCC).

Figure 2. Classification performance of PCA on the second feature set in fusion with the MS_sel features (MS+MFCC, MS+LSP, MS+I, MS+(I+F2+BW3+MFCC)).

4.5. Correlation results

All the classification experiments in the previous sections were performed in the commonly used two-class setup of Dep vs. non-Dep classification, based on setting a threshold on the total clinical HAMD score. In such a setup, the classes are very broad. To mitigate the sensitivity of the results to this thresholding, we measured the correlation between the classification result and the clinical total HAMD score. Since the HAMD score is measured on an ordinal scale and its relationship to the classification result is monotonic, we used the Spearman rank correlation. The correlation coefficients, along with their two-tailed p-values, are shown in Table 6 for several feature sets, for feature fusion, and for some decision fusion experiments. In feature fusion, the features are combined and one classification experiment is performed, on which the correlation is measured. In decision fusion, two or more classification experiments are performed, each with a different feature set; the classification results are averaged, and the correlation is measured on the fused decision. As we can see, fusion at the feature level is more powerful than fusion at the decision level.

Table 6. Spearman correlation between the classification result and the clinical total HAMD score. Single feature sets: MFCC, LSP, I+F2+BW3, MS_sel. Feature fusion: MS_sel+I+F2+BW3, MS_sel+(I+F2+BW3_PCA3), MS_sel+(I+F2+BW3+MFCC), MS_sel+(I+F2+BW3+MFCC_PCA5). Decision fusion: MS_sel+MFCC, MS_sel+LSP, MS_sel+LSP+MFCC, MS_sel+(I+F2+BW3), MS_sel+(I+F2+BW3_PCA3), MS_sel+(I+F2+BW3)+MFCC, MS_sel+(I+F2+BW3)+LSP.
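The correlation analysis can be sketched as follows, assuming per-recording binary predictions from two systems (e.g., the MS_sel and MFCC classifiers) and the clinical total HAMD scores; scipy's spearmanr returns the rank correlation together with a two-tailed p-value. For fusions of more than two systems, the averaging simply extends over all outputs.

```python
import numpy as np
from scipy.stats import spearmanr

def decision_fusion_correlation(preds_a, preds_b, hamd_scores):
    # Average the two systems' decisions, then correlate the fused
    # decision with the ordinal clinical HAMD score.
    fused = (np.asarray(preds_a) + np.asarray(preds_b)) / 2.0
    rho, p_value = spearmanr(fused, hamd_scores)
    return rho, p_value
```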
5. Conclusions

Our results clearly suggest that the proposed modulation spectrum-based manageable feature set improves the overall discrimination of depressed speech from non-depressed speech. The selected joint subset of Mel and DCT bins in the MS brings a ~7% UAR improvement over the conventional MS feature set. Feature fusion of this feature set with formant, intensity, and MFCC features further advances the recognition rates when PCA dimension reduction is applied to the second feature set. The correlation results also indicate that our feature-fusion classification results are more correlated with the clinical rating scores than decision fusion of the same feature sets.

Future research will involve analysis on other datasets and improvements to the feature selection strategy, so that an objective analysis tool may be designed for clinical practice.

6. Acknowledgements

The authors would like to thank Dr. James C. Mundt for providing the dataset, which was collected under U.S. National Institute of Mental Health Grant R43MH. This work is supported by the Dem@Care FP7 project, partially funded by the EC.

7. References

[1] Greenberg, P. E., Stiglin, L. E., Finkelstein, S. D., and Berndt, E. R., "Depression: a neglected major illness", Journal of Clinical Psychiatry, vol. 54, 1993.
[2] Darby, J. K., "Speech and voice parameters of depression: a pilot study", J. Commun. Disord., vol. 17, 1984.
[3] Mundt, J. C., Snyder, P. J., Cannizzaro, M. S., Chappie, K., and Geralts, D. S., "Voice acoustic measures of depression severity and treatment response collected via interactive voice response (IVR) technology", Journal of Neurolinguistics, vol. 20, 2007.
[4] Ivanov, A. and Chen, X., "Modulation Spectrum Analysis for Speaker Personality Trait Recognition", in INTERSPEECH, ISCA, 2012.
[5] Markaki, M., Stylianou, Y., Arias-Londoño, J. D., and Godino-Llorente, J. I., "Dysphonia detection based on modulation spectral features and cepstral coefficients", in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp. 5162-5165, March 2010.
[6] Wu, S., Falk, T. H., and Chan, W. Y., "Automatic speech emotion recognition using modulation spectral features", Speech Communication, vol. 53, no. 5, 2011.
[7] Flint, A. J., et al., "Abnormal speech articulation, psychomotor retardation, and subcortical dysfunction in major depression", Journal of Psychiatric Research, vol. 27, no. 3, 1993.
[8] France, D. J., Shiavi, R. G., Silverman, S., Silverman, M., and Wilkes, M., "Acoustical properties of speech as indicators of depression and suicidal risk", IEEE Transactions on Biomedical Engineering, vol. 47, 2000.
[9] Ozdas, A., Shiavi, R. G., Silverman, S. E., Silverman, M. K., and Wilkes, D. M., "Investigation of vocal jitter and glottal flow spectrum as possible cues for depression and near-term suicidal risk", IEEE Transactions on Biomedical Engineering, vol. 51, 2004.
[10] Quatieri, T. F. and Malyska, N., "Vocal-Source Biomarkers for Depression: A Link to Psychomotor Activity", in INTERSPEECH, Portland, USA, 2012.
[11] Cummins, N., Epps, J., and Ambikairajah, E., "Spectro-Temporal Analysis of Speech Affected by Depression and Psychomotor Retardation", in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2013.
[12] Trevino, A., Quatieri, T., and Malyska, N., "Phonologically-based biomarkers for major depressive disorder", EURASIP Journal on Advances in Signal Processing, pp. 1-18, 2011.
[13] Sanchez, M. H., Vergyri, D., Ferrer, L., Richey, C., Garcia, P., Knoth, B., and Jarrold, W., "Using Prosodic and Spectral Features in Detecting Depression in Elderly Males", in INTERSPEECH, 2011.
[14] Scherer, S., Stratou, G., Gratch, J., and Morency, L.-P., "Investigating voice quality as a speaker-independent indicator of depression and PTSD", in INTERSPEECH, ISCA, 2013.
[15] Low, L. S. A., Maddage, N. C., et al., "Mel frequency cepstral feature and Gaussian Mixtures for modeling clinical depression in adolescents", in Proc. IEEE Int. Conf. on Cognitive Informatics, 2009.
[16] Yingthawornsuk, T., Keskinpala, H. K., Wilkes, D. M., Shiavi, R. G., and Salomon, R. M., "Direct Acoustic Feature Using Iterative EM Algorithm and Spectral Energy for Classifying Suicidal Risk", in INTERSPEECH, Antwerp, Belgium, 2007.
[17] Moore, E., Clements, M. A., Peifer, J. W., and Weisser, L., "Critical Analysis of the Impact of Glottal Features in the Classification of Clinical Depression in Speech", IEEE Transactions on Biomedical Engineering, vol. 55, no. 1, 2008.
[18] Moore, E., Clements, M., Peifer, J., and Weisser, L., "Comparing objective feature statistics of speech for classifying clinical depression", in Proc. 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 17-20, 2004.
[19] Cummins, N., Epps, J., Sethu, V., Breakspear, M., and Goecke, R., "Modeling spectral variability for the classification of depressed speech", in INTERSPEECH, ISCA, 2013.
[20] Sturim, D., Torres-Carrasquillo, P. A., Quatieri, T. F., Malyska, N., and McCree, A., "Automatic Detection of Depression in Speech Using Gaussian Mixture Modeling with Factor Analysis", in INTERSPEECH, 2011.
[21] Hamilton, M., "A rating scale for depression", Journal of Neurology, Neurosurgery and Psychiatry, vol. 23, 1960.
[22] Eyben, F., Wöllmer, M., and Schuller, B., "openSMILE - The Munich Versatile and Fast Open-Source Audio Feature Extractor", in Proc. ACM Multimedia (MM), ACM, Firenze, Italy, 2010.
[23] Boersma, P. and Weenink, D., "Praat: doing phonetics by computer" [Computer program], 2013, available at http://www.praat.org/.
[24] Chang, C.-C. and Lin, C.-J., "LIBSVM: a library for support vector machines", software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
