Exploring Modulation Spectrum Features for Speech-Based Depression Level Classification
INTERSPEECH 2014

Exploring Modulation Spectrum Features for Speech-Based Depression Level Classification

Elif Bozkurt 1, Orith Toledo-Ronen 2, Alexander Sorin 2, Ron Hoory 2
1 Multimedia, Vision and Graphics Laboratory, Koç University, Istanbul, Turkey
2 IBM Research Haifa, Haifa University Mount Carmel, Haifa, Israel
ebozkurt@ku.edu.tr, {oritht, sorin, hoory}@il.ibm.com

Abstract

In this paper, we propose a Modulation Spectrum-based manageable feature set for the detection of depressed speech. The Modulation Spectrum (MS) is obtained from the conventional speech spectrogram by spectral analysis along the temporal trajectories of the acoustic frequency bins. While the MS representation of speech provides rich, high-dimensional joint frequency information, extracting discriminative features from it remains an open question. We propose a lower-dimensional representation, which first employs a Mel-frequency filterbank in the acoustic frequency domain and a Discrete Cosine Transform in the modulation frequency domain, and then applies feature selection in both domains. We compare and fuse the proposed feature set with other complementary prosodic and spectral features at the feature and decision levels. In our experiments, we use Support Vector Machines to discriminate depressed speech in a speaker-independent fashion. Feature-level fusion of the proposed MS-based features with other prosodic and spectral features after dimension reduction provides up to ~9% improvement over the baseline results and also correlates the most with clinical ratings of patients' depression level.

Index Terms: depression assessment, modulation spectrum, prosody, feature fusion, decision fusion

1. Introduction

Characterization of the emotional expression of speech and its relation to the overall state of the speaker is a challenging task, yet one that would provide new avenues for health care technologies.
While emotions are a part of everyday communication, emotional or mood disorders such as clinical depression remain a critical public health concern [1]. A large body of research suggests that the analysis of voice patterns can lead to objective analysis tools for the characterization of depression in speech [2, 3]. One of the goals of this research is to find objectively measurable speech features that can distinguish the speaking patterns of individuals with a diagnosis of clinical depression on a speaker-independent basis. We particularly focus on Modulation Spectrum (MS) features, which capture long-term dynamic characteristics of the speech signal [4, 5, 6]. In its original definition, the MS is a high-dimensional representation. We employ a Mel filterbank and a Discrete Cosine Transformation (DCT) to obtain lower-dimensional representations in the acoustic and modulation frequency domains, respectively. We hypothesize that energy modulations in particular frequency ranges may be more discriminative for depressive speech recognition, and we experimentally select a joint subset of Mel and DCT bins for better performance. As a secondary goal, we wish to explore how the MS-based feature set compares with commonly used prosodic and spectral features, and whether these features have fusion potential at the feature and decision levels. Experiments for the two-class depression classification problem (depressed vs. non-depressed) are performed using Support Vector Machine (SVM) classifiers in speaker-independent configurations on the free speech recordings of the Mundt dataset referred to in [3].

1.1. Related Work

The perceptual qualities of depression in voice have been most commonly studied with regard to prosodic and vocal tract perturbations [7, 8, 9, 10]. Studies have shown that the second formant location is most affected by depressive speech: patients with major depressive disorder had decreased second formant (F2) measurements [7].
Energy variability has been shown to decrease with increasing levels of depression [10]. Speech rate as combined phone-duration measures [12], statistics of pitch and energy features [13], and voice quality measures [14] have also been useful for detecting depression symptoms. Spectral features such as Mel frequency Cepstral Coefficients (MFCCs), power spectral density, and spectral tilt also potentially carry useful information for the classification of depression [8, 15, 16, 17]. In addition, glottal measures have been analyzed [17, 18]. More recently, Cummins et al. [11] investigated the effects of depression on speech by analyzing MS features on the trisyllabic sequence PATAKA recordings of the Mundt dataset [3] used in our study. The authors apply log mean subtraction along each acoustic frequency during MS feature extraction and report 66.9% weighted accuracy using 10-fold CV for the two-class depression recognition problem. In a later study, the authors investigate the covariance structure of a Gaussian Mixture Model (GMM) to capture depression-related information [19] on Grandfather read-speech passages of the same dataset. The best classification result for the two-class depression recognition problem is 68.6%, obtained when the variance and weight parameters of the Gaussians are updated during adaptation. Sturim et al. also investigate the free speech recordings of the same dataset in a leave-one-recording-out fashion. They focus on depression severity as the class distinction and apply joint factor analysis with Wiener filtering to model speaker and channel variation [20]. The authors test their system using MFCCs and shifted delta cepstral features modeled with GMMs. The proposed system yields a 20-30% equal-error-rate gain for the two-class depression classification task. The rest of the paper is organized as follows. In Section 2, we summarize the MS feature extraction steps. In Section 3, we set up the depression classification problem.
In Section 4, we describe our results with the baseline and proposed features. Finally, in Section 5, we provide conclusions and directions for future work.

Copyright 2014 ISCA, September 2014, Singapore
2. Modulation Spectrum

Modulation spectral analysis aims to capture long-term dynamics within an acoustic signal, typically as a two-dimensional joint acoustic frequency and modulation frequency representation [4, 5, 6]. Acoustic frequency is the frequency variable of the conventional spectrogram derived from the short-term Fourier transform (STFT), whereas modulation frequency captures time-varying information through temporal modulation of the signal. The computation of the joint acoustic-modulation frequency spectrum is carried out in two phases. First, the speech spectrogram is computed using an N_A-point FFT (Fast Fourier Transform) on each pre-emphasized, Hamming-windowed overlapping frame. Let S[n, k] denote the STFT of the speech signal as a function of frame index n and acoustic frequency index k (0 <= k <= N_A/2). The modulation spectrum is derived from the analysis of the magnitude spectrogram, |S[n, k]|, within longer-duration windows (of length M frames) with some overlap. Each window corresponds to a two-dimensional time-frequency context; e.g., the context starting at frame n_0 and having a length of M frames consists of all frequency bands within the time interval [n_0, n_0+M-1]. The temporal trajectory of the k-th frequency band within a time-frequency context is denoted as

T(n_0, M, k) = ( |S(n_0, k)|, |S(n_0+1, k)|, ..., |S(n_0+M-1, k)| )    (1)

A second, N_M-point FFT is then applied to the mean-normalized, Mel-filtered, and Hamming-windowed T(n_0, M, k) to produce the modulation spectrum MS(n_0, M, k, q), where q is the modulation frequency index (0 <= q <= N_M/2). In our setup, a standard N-component Mel filterbank is used to effectively reduce both the dimensionality of the acoustic frequency domain and the correlations between the frequency sub-bands. Additionally, a Discrete Cosine Transformation (DCT) is applied to each modulation spectrum MS(n_0, M, k, q) to reduce the modulation frequency domain dimensionality, yielding an (N_M/2+1)-dimensional vector of DCT coefficients for each acoustic bin.
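As a concrete illustration, the two-stage extraction (STFT, Mel filtering, mean normalization, second FFT along each temporal trajectory, DCT truncation) might be sketched as below. The parameter defaults follow the values reported later in the paper (N_A = 256, N_M = 128, M = 25, D = 10, N = 26, 32 ms windows with 17 ms shifts); the 8 kHz sampling rate, the 0.97 pre-emphasis coefficient, the context hop, and the filterbank construction are assumptions not stated in the text.

```python
import numpy as np
from scipy.fftpack import dct
from scipy.signal import stft, get_window

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular Mel filterbank (an assumed standard construction)."""
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = mel2hz(np.linspace(0.0, hz2mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft / 2 + 1) * pts / (sr / 2.0)).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return fb

def ms_features(x, sr, n_a=256, n_m=128, m=25, d=10, n_mel=26, hop=5):
    """Frame-level MS features: one (n_mel x d) vector per time-frequency context."""
    x = np.append(x[0], x[1:] - 0.97 * x[:-1])          # pre-emphasis (coefficient assumed)
    win = int(0.032 * sr)                               # 32 ms analysis window
    step = int(0.017 * sr)                              # 17 ms frame shift
    _, _, S = stft(x, fs=sr, window='hamming', nperseg=win,
                   noverlap=win - step, nfft=n_a, boundary=None)
    S = np.abs(S)                                       # magnitude spectrogram |S[n, k]|
    E = mel_filterbank(n_mel, n_a, sr) @ S              # Mel filtering: (n_mel, n_frames)
    ham = get_window('hamming', m)
    feats = []
    for n0 in range(0, E.shape[1] - m + 1, hop):        # contexts of M frames
        T = E[:, n0:n0 + m]
        T = T - T.mean(axis=1, keepdims=True)           # mean normalization (DC removal)
        MS = np.abs(np.fft.rfft(T * ham, n=n_m, axis=1))  # second FFT: (n_mel, n_m//2+1)
        C = dct(MS, type=2, norm='ortho', axis=1)[:, :d]  # keep lowest D DCT coefficients
        feats.append(C.ravel())
    return np.asarray(feats)                            # (n_contexts, n_mel * d)
```

With these defaults each context yields a 26 x 10 = 260-dimensional frame-level vector, matching the dimensionality used in Section 4.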
We retain the lowest D coefficients, including the DC coefficient, which preserves the most significant signal energy. The frame-level MS features thus have N-by-D dimensionality.

3. Experimental Setup

We use the in-clinic speech recordings of the database originally collected by Mundt et al. for a depression severity study [3]. The database contains voice samples from 35 patients (20 female / 15 male, ages 20 to 68 years) undergoing depression treatment over a six-week period. The depression severity level of the participants was observed during clinical interviews at one-week intervals and evaluated using the Hamilton Rating Scale for Depression (HAMD) over the course of treatment [21]. The HAMD assessment has 17 symptom sub-topics, each with its own score. We use the total HAMD score of the individual ratings as the ground truth when defining classes in our study. Recordings with a total HAMD score greater than or equal to 17 are assigned to the depressed category, and the rest of the recordings are assigned to the non-depressed category. In this study, 257 samples of free speech recordings are labeled as non-depressed and the remaining 2 are labeled as depressed.

Speech features in our study may be considered in two main categories: prosodic and spectral. While these categories are not all-inclusive of measurable speech features, they form the basis of the feature extraction described in this work. We tested all features using their sentence-level statistics ("functionals"), consisting of maximum, minimum, variance, standard deviation, skewness, kurtosis, quartiles 1, 2 & 3, and percentiles 1.0 &

3.1. Baseline features

We use the openSMILE [22] and Praat [23] toolkits for baseline feature extraction. All the baseline features are extracted on a frame basis within windows of 25 ms with 10 ms frame shifts. Then, statistical functionals are calculated per recording from the frame-level features.
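The per-recording functionals listed above amount to eleven statistics per frame-level dimension (consistent with the 2860 = 260 x 11 functionals-level size quoted in Section 4). A sketch of their computation follows; the upper percentile value (99.0) is an assumption, since the list in the text is truncated after "percentiles 1.0 &":

```python
import numpy as np
from scipy.stats import kurtosis, skew

def functionals(frames):
    """Recording-level statistics over frame-level features.

    frames: (n_frames, n_dims) array; returns a vector of 11 * n_dims values.
    """
    q1, q2, q3 = np.percentile(frames, [25, 50, 75], axis=0)  # quartiles 1, 2, 3
    return np.concatenate([
        frames.max(axis=0),
        frames.min(axis=0),
        frames.var(axis=0),
        frames.std(axis=0),
        skew(frames, axis=0),
        kurtosis(frames, axis=0),
        q1, q2, q3,
        np.percentile(frames, 1.0, axis=0),    # lower percentile (1.0, from the text)
        np.percentile(frames, 99.0, axis=0),   # upper percentile (assumed 99.0)
    ])
```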
The prosodic-category features for this study are pitch (F0) and intensity (I), both extracted using Praat. The vocal tract is commonly quantified by the formant frequencies, which are the primary resonances determined by the vocal tract shape during speech production. In this study, the vocal tract spectral structure was quantified by the first (F1), second (F2), and third (F3) formant center frequencies and their bandwidths (BW1, BW2, BW3), extracted in Praat. The formant center frequencies and bandwidths each represent a unique feature sub-category for analysis. In addition, we extract Mel Frequency Cepstral Coefficients [0-14] (MFCCs) and Line Spectral Pairs [0-7] (LSPs) using the emobase2010 configuration of the openSMILE toolkit.

3.2. Classification setup

Speaker-independent experiments were performed in a leave-one-speaker-out cross-validation (LOSO CV) manner, using the data from each of the 35 speakers as the test set in turn and the data from the other 34 speakers as the training set. The class accuracies are computed on the overall dataset. The classification performance is then evaluated by the unweighted average recall (UAR), which is the arithmetic average of the individual class accuracies. In addition to the UAR, we also provide the recall rate on the two classes (depressed, non-depressed) for more insight. We use the LibSVM [24] implementation of Support Vector Machines and employ the linear kernel in all experiments, with feature scale normalization and class weights of [0.45, 0.55] for the non-depressed and depressed categories, respectively.

4. Experimental Results

4.1. Baseline results

We first present baseline results with well-known speech acoustic features of two categories: prosody and spectral features. In Table 1, we present a comparison of several standard feature sets: the upper part of the table shows the classification performance of the individual prosody features, and the lower part that of the spectral features.
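The evaluation protocol just described (leave-one-speaker-out CV, a linear SVM with class weights [0.45, 0.55], UAR scoring) can be sketched as below. This is a scikit-learn approximation of the LibSVM setup; the use of a standard scaler for "feature scale normalization" is an assumption.

```python
import numpy as np
from sklearn.metrics import recall_score
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def loso_uar(X, y, speakers):
    """Leave-one-speaker-out CV; UAR = unweighted average of per-class recalls.

    y: 0 = non-depressed, 1 = depressed; class weights follow the paper.
    """
    preds = np.empty_like(y)
    for train, test in LeaveOneGroupOut().split(X, y, groups=speakers):
        clf = make_pipeline(
            StandardScaler(),
            SVC(kernel='linear', class_weight={0: 0.45, 1: 0.55}))
        clf.fit(X[train], y[train])
        preds[test] = clf.predict(X[test])
    # Pool all held-out predictions, then average the per-class recalls.
    return recall_score(y, preds, average='macro')
```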
Among the prosody features, intensity (I) is the most discriminative, whereas F0 has a lower classification rate due to its speaker dependency. Among the spectral features, the MFCCs are discriminative, but are also very characteristic of the speaker. We can see that MS performs the best, but is very close to the other standard spectral feature sets. Additionally, the random classification accuracy is calculated as %.
Table 1. Baseline classification rates with prosodic and spectral feature sets (F0, I, F1, F2, F3, BW1, BW2, BW3, MFCC, LSP, MS).

4.2. Modulation spectrum features: parameter setting

Modulation spectrum (MS) features are a joint acoustic and modulation frequency representation of speech signals, obtained by simultaneous spectral analysis of all frequency bins. Thus, the frame shift and the time-frequency context length (M) are two crucial parameters: the frame shift determines the sampling rate of the modulation frequency domain, while M controls the resolution of the MS. We extract the STFT of the speech signals within windows of length 32 ms with frame shifts of 17 ms. We tested several values for M (10, 20, 25, and 30) and selected M = 25 (corresponding to an analysis window of length 425 ms), which gave the best performance, to create a valid baseline for the MS features. Additionally, we apply mean normalization of the frequency bins (DC removal) prior to Mel filtering. However, variance normalization following mean normalization does not improve results. Moreover, log compression of the STFT outputs or the MS components does not increase recognition rates. We apply an N_A = 256 point FFT for the calculation of the STFT components and an N_M = 128 point FFT for the calculation of the MS components. Thus, the original feature vector size for frame-level MS features is (N_A/2+1) x (N_M/2+1) = 129 x 65 = 8385. For feature dimension reduction, we apply a Mel filterbank with N = 26 components in the acoustic frequency domain and a DCT in the modulation frequency domain. We retain the first D = 10 components of the DCTs, which results in a feature vector size of 2860 at the functionals level.

4.3. Modulation spectrum features: bin selection

Our manageable feature set is a subset of the Mel and DCT bins of the Modulation Spectrum representation of the speech signal. In Figure 1, we see the classification performance of several selections of the Mel bins as a function of the number of DCT coefficients (always starting from coefficient 1).
As we can see, the best result is achieved by taking the Mel bins in the middle range [6-19], corresponding to a frequency range of 668 to 2000 Hz, with an increasing gain as the number of DCT coefficients is reduced down to 1. In Table 3, we summarize the best result of the MS feature selection, using only the first DCT coefficient and Mel bins in the range 6-19, in comparison with the original MS features without selection. We denote this selected set of features as MS_sel. We can see a dramatic improvement in the accuracy of the depressed class, with some degradation on the non-depressed class and an overall UAR improvement.

Figure 1: UAR classification performance of the MS features with Mel band selection for a varying number of DCT components.

Table 3. Classification performance of the MS selected features (Mel bins 6-19 and 1st DCT) in comparison with the original MS features with no selection (MS vs. MS_sel).

4.4. Feature fusion results

We present the performance of several combinations of formant and prosody features in Table 4. We start by fusing the three formant frequencies (F123) and the three bandwidths (BW123), with moderate performance. Next, we add the intensity (I) features to F123 and get a ~4% improvement. Adding BW123 gives only a marginal gain, and adding F0 degrades the performance. Finally, the combination of the three top-performing individual features from Table 1 (I, F2, and BW3) gives the best fusion performance, at % UAR.

Table 4. Classification performance for feature-level fusion of prosody and formant features (F123, BW123, I+F123, I+F123+BW123, F0+I+F123+BW123, I+F2+BW3).

Next, in the upper part of Table 5, we show the results of fusing the MS_sel features with other features. We can see that none of the feature combinations gives any gain beyond the performance of the MS_sel features, so our next step was to apply PCA for dimension reduction.
Our experimentation with PCA on the MS_sel feature set did not yield any performance improvement, but for other features (e.g., MFCC) some gain was achieved, probably due to redundancies in the feature representation. In the lower part of Table 5, we show the results of
combining the complete MS_sel feature set with a second feature set reduced by PCA.

Table 5. Classification performance of fusing the MS_sel features with other feature sets (MFCC, LSP, I, I+F2+BW3, I+F2+BW3+MFCC, I+F2+BW3+LSP), before and after applying PCA on the 2nd feature set.

To better understand the behavior of the PCA dimension reduction on the second feature set in fusion with the MS_sel features, we show in Figure 2 the performance of MS_sel fusion with four other feature sets (MS+MFCC, MS+LSP, MS+I, MS+(I+F2+BW3+MFCC)) as a function of the number of PCA components of the second feature set. The horizontal dotted line is the MS_sel baseline performance. We can see that by selecting a few principal components from the second feature set and fusing them with the MS_sel features, we are able to improve the classification performance, especially with the feature set I+F2+BW3+MFCC.

Figure 2. Classification performance of PCA on the second feature set in fusion with the MS_sel features.

4.5. Correlation results

All the classification experiments in the previous sections were performed in a commonly used two-class setup of depressed vs. non-depressed classification, based on setting a threshold on the total clinical HAMD score. In such a setup, the classes are very broad. To avoid this sensitivity of the results, we measured the correlation between the classification result and the clinical total HAMD score. Since the HAMD score is measured on an ordinal scale and its relationship to the classification result is monotonic, we used the Spearman rank correlation. The correlation coefficients, along with their two-tailed p-values, are shown in Table 6 for several feature sets, for feature fusion, and for some decision fusion experiments. In feature fusion, the features are combined and one classification experiment is performed, on which the correlation is measured.
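The two fusion schemes used here, and the Spearman check against the clinical scores, can be sketched as below. Function and variable names are illustrative; the decision-averaging step follows the decision-fusion description in this section.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.decomposition import PCA

def fuse_with_pca(ms_train, other_train, ms_test, other_test, n_components):
    """Feature-level fusion: keep the MS_sel set intact, reduce the second
    feature set with PCA (fit on training data only), then concatenate."""
    pca = PCA(n_components=n_components).fit(other_train)
    X_train = np.hstack([ms_train, pca.transform(other_train)])
    X_test = np.hstack([ms_test, pca.transform(other_test)])
    return X_train, X_test

def decision_fusion_corr(decision_lists, hamd_scores):
    """Decision fusion: average the per-recording decisions of separately
    trained classifiers, then correlate the fused decision with the total
    HAMD score (Spearman, since HAMD is ordinal)."""
    fused = np.mean(decision_lists, axis=0)
    rho, p = spearmanr(fused, hamd_scores)  # two-sided p-value
    return rho, p
```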
In decision fusion, two or more classification experiments are performed, each with a different feature set; the classification results are averaged, and the correlation is measured on the fused decision. As we can see, fusion at the feature level is more powerful than fusion at the decision level.

Table 6. Spearman correlation between the classification result and the clinical total HAMD score for several feature sets, feature fusion, and decision fusion. Feature sets: MFCC; LSP; I+F2+BW3; MS_sel. Feature fusion: MS_sel+I+F2+BW3; MS_sel+(I+F2+BW3_PCA3); MS_sel+(I+F2+BW3+MFCC); MS_sel+(I+F2+BW3+MFCC_PCA5). Decision fusion: MS_sel+MFCC; MS_sel+LSP; MS_sel+LSP+MFCC; MS_sel+(I+F2+BW3); MS_sel+(I+F2+BW3_PCA3); MS_sel+(I+F2+BW3)+MFCC; MS_sel+(I+F2+BW3)+LSP.

5. Conclusions

Our results clearly suggest that the proposed modulation spectrum-based manageable feature set improves the overall discrimination of depressed speech from non-depressed speech. The selected joint subset of Mel and DCT bins in the MS brings a ~7% UAR improvement over the conventional MS feature set performance. Feature fusion of this feature set with formant, intensity, and MFCC features further advances recognition rates, up to % UAR, when PCA dimension reduction is applied to the second feature set. Correlation results also indicate that our feature fusion classification results are more correlated with clinical rating scores than decision fusion of the same feature sets. Future research will involve analysis of other datasets and improvements to the feature selection strategy, so that an objective analysis tool may be designed for clinical practice.

6. Acknowledgements

The authors would like to thank Dr. James C. Mundt for providing the dataset, which was collected under U.S. National Institute of Mental Health Grant R43MH. This work is supported by the Dem@Care FP7 project, partially funded by the EC under contract number
7. References

[1] Greenberg, P. E., Stiglin, L. E., Finkelstein, S. D. and Berndt, E. R., "Depression: A neglected major illness", Journal of Clinical Psychiatry, vol. 54.
[2] Darby, J. K., "Speech and voice parameters of depression: A pilot study", J. Commun. Disord., vol. 17.
[3] Mundt, J. C., Snyder, P. J., Cannizzaro, M. S., Chappie, K. and Geralts, D. S., "Voice acoustic measures of depression severity and treatment response collected via interactive voice response (IVR) technology", Journal of Neurolinguistics, vol. 20.
[4] Ivanov, A. and Chen, X., "Modulation Spectrum Analysis for Speaker Personality Trait Recognition", INTERSPEECH, ISCA.
[5] Markaki, M., Stylianou, Y., Arias-Londoño, J. D. and Godino-Llorente, J. I., "Dysphonia detection based on modulation spectral features and cepstral coefficients", in Proc. ICASSP 2010, pp. 5162-5165.
[6] Wu, S., Falk, T. H. and Chan, W. Y., "Automatic speech emotion recognition using modulation spectral features", Speech Communication, vol. 53 (5).
[7] Flint, A. J., et al., "Abnormal speech articulation, psychomotor retardation, and subcortical dysfunction in major depression", Journal of Psychiatric Research, vol. 27 (3).
[8] France, D. J., Shiavi, R. G., Silverman, S., Silverman, M. and Wilkes, M., "Acoustical properties of speech as indicators of depression and suicidal risk", IEEE Transactions on Biomedical Engineering, vol. 47.
[9] Ozdas, A., Shiavi, R. G., Silverman, S. E., Silverman, M. K. and Wilkes, D. M., "Investigation of vocal jitter and glottal flow spectrum as possible cues for depression and near-term suicidal risk", IEEE Transactions on Biomedical Engineering, vol. 51.
[10] Quatieri, T. F. and Malyska, N., "Vocal-Source Biomarkers for Depression: A Link to Psychomotor Activity", in INTERSPEECH 2012, Portland, USA.
[11] Cummins, N., Epps, J. and Ambikairajah, E., "Spectro-Temporal Analysis of Speech Affected by Depression and Psychomotor Retardation", in Proc. ICASSP 2013.
[12] Trevino, A., Quatieri, T. and Malyska, N., "Phonologically-based biomarkers for major depressive disorder", EURASIP Journal on Advances in Signal Processing, pp. 1-18.
[13] Sanchez, M. H., Vergyri, D., Ferrer, L., Richey, C., Garcia, P., Knoth, B. and Jarrold, W., "Using Prosodic and Spectral Features in Detecting Depression in Elderly Males", INTERSPEECH.
[14] Scherer, S., Stratou, G., Gratch, J. and Morency, L.-P., "Investigating voice quality as a speaker-independent indicator of depression and PTSD", INTERSPEECH 2013, ISCA.
[15] Low, L. S. A., Maddage, N. C., et al., "Mel frequency cepstral feature and Gaussian Mixtures for modeling clinical depression in adolescents", in Proc. IEEE Int. Conf. on Cognitive Informatics, 2009.
[16] Yingthawornsuk, T., Keskinpala, H. K., Wilkes, D. M., Shiavi, R. G. and Salomon, R. M., "Direct Acoustic Feature Using Iterative EM Algorithm and Spectral Energy for Classifying Suicidal Risk", INTERSPEECH 2007, Antwerp, Belgium.
[17] Moore, E., Clements, M. A., Peifer, J. W. and Weisser, L., "Critical Analysis of the Impact of Glottal Features in the Classification of Clinical Depression in Speech", IEEE Trans. Biomed. Engineering, vol. 55 (1).
[18] Moore, E., Clements, M., Peifer, J. and Weisser, L., "Comparing objective feature statistics of speech for classifying clinical depression", IEEE 26th Annual International Conference of the Engineering in Medicine and Biology Society, pp. 17-20, 2004.
[19] Cummins, N., Epps, J., Sethu, V., Breakspear, M. and Goecke, R., "Modeling spectral variability for the classification of depressed speech", INTERSPEECH, ISCA.
[20] Sturim, D., Torres-Carrasquillo, P. A., Quatieri, T. F., Malyska, N. and McCree, A., "Automatic Detection of Depression in Speech Using Gaussian Mixture Modeling with Factor Analysis", INTERSPEECH.
[21] Hamilton, M., "A rating scale for depression", J. Neurol. Neurosurg. Psychiat., vol. 23.
[22] Eyben, F., Wöllmer, M. and Schuller, B., "openSMILE - The Munich Versatile and Fast Open-Source Audio Feature Extractor", in Proc. ACM Multimedia (MM), ACM, Firenze, Italy.
[23] Boersma, P. and Weenink, D. (2013), "Praat: doing phonetics by computer" [Computer program].
[24] Chang, C.-C. and Lin, C.-J., "LIBSVM: a library for support vector machines" [Software].
More informationAN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS Kuldeep Kumar 1, R. K. Aggarwal 1 and Ankita Jain 2 1 Department of Computer Engineering, National Institute
More informationGammatone Cepstral Coefficient for Speaker Identification
Gammatone Cepstral Coefficient for Speaker Identification Rahana Fathima 1, Raseena P E 2 M. Tech Student, Ilahia college of Engineering and Technology, Muvattupuzha, Kerala, India 1 Asst. Professor, Ilahia
More informationMFCC AND GMM BASED TAMIL LANGUAGE SPEAKER IDENTIFICATION SYSTEM
www.advancejournals.org Open Access Scientific Publisher MFCC AND GMM BASED TAMIL LANGUAGE SPEAKER IDENTIFICATION SYSTEM ABSTRACT- P. Santhiya 1, T. Jayasankar 1 1 AUT (BIT campus), Tiruchirappalli, India
More informationA multi-class method for detecting audio events in news broadcasts
A multi-class method for detecting audio events in news broadcasts Sergios Petridis, Theodoros Giannakopoulos, and Stavros Perantonis Computational Intelligence Laboratory, Institute of Informatics and
More informationCampus Location Recognition using Audio Signals
1 Campus Location Recognition using Audio Signals James Sun,Reid Westwood SUNetID:jsun2015,rwestwoo Email: jsun2015@stanford.edu, rwestwoo@stanford.edu I. INTRODUCTION People use sound both consciously
More informationSynchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech
INTERSPEECH 5 Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech M. A. Tuğtekin Turan and Engin Erzin Multimedia, Vision and Graphics Laboratory,
More informationAdvanced audio analysis. Martin Gasser
Advanced audio analysis Martin Gasser Motivation Which methods are common in MIR research? How can we parameterize audio signals? Interesting dimensions of audio: Spectral/ time/melody structure, high
More informationREVERBERATION-BASED FEATURE EXTRACTION FOR ACOUSTIC SCENE CLASSIFICATION. Miloš Marković, Jürgen Geiger
REVERBERATION-BASED FEATURE EXTRACTION FOR ACOUSTIC SCENE CLASSIFICATION Miloš Marković, Jürgen Geiger Huawei Technologies Düsseldorf GmbH, European Research Center, Munich, Germany ABSTRACT 1 We present
More informationUsing RASTA in task independent TANDEM feature extraction
R E S E A R C H R E P O R T I D I A P Using RASTA in task independent TANDEM feature extraction Guillermo Aradilla a John Dines a Sunil Sivadas a b IDIAP RR 04-22 April 2004 D a l l e M o l l e I n s t
More informationCP-JKU SUBMISSIONS FOR DCASE-2016: A HYBRID APPROACH USING BINAURAL I-VECTORS AND DEEP CONVOLUTIONAL NEURAL NETWORKS
CP-JKU SUBMISSIONS FOR DCASE-2016: A HYBRID APPROACH USING BINAURAL I-VECTORS AND DEEP CONVOLUTIONAL NEURAL NETWORKS Hamid Eghbal-Zadeh Bernhard Lehner Matthias Dorfer Gerhard Widmer Department of Computational
More informationAspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification. Daryush Mehta
Aspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification Daryush Mehta SHBT 03 Research Advisor: Thomas F. Quatieri Speech and Hearing Biosciences and Technology 1 Summary Studied
More informationA CONSTRUCTION OF COMPACT MFCC-TYPE FEATURES USING SHORT-TIME STATISTICS FOR APPLICATIONS IN AUDIO SEGMENTATION
17th European Signal Processing Conference (EUSIPCO 2009) Glasgow, Scotland, August 24-28, 2009 A CONSTRUCTION OF COMPACT MFCC-TYPE FEATURES USING SHORT-TIME STATISTICS FOR APPLICATIONS IN AUDIO SEGMENTATION
More informationDigital Speech Processing and Coding
ENEE408G Spring 2006 Lecture-2 Digital Speech Processing and Coding Spring 06 Instructor: Shihab Shamma Electrical & Computer Engineering University of Maryland, College Park http://www.ece.umd.edu/class/enee408g/
More informationEpoch Extraction From Emotional Speech
Epoch Extraction From al Speech D Govind and S R M Prasanna Department of Electronics and Electrical Engineering Indian Institute of Technology Guwahati Email:{dgovind,prasanna}@iitg.ernet.in Abstract
More informationSpeech and Music Discrimination based on Signal Modulation Spectrum.
Speech and Music Discrimination based on Signal Modulation Spectrum. Pavel Balabko June 24, 1999 1 Introduction. This work is devoted to the problem of automatic speech and music discrimination. As we
More informationSeparating Voiced Segments from Music File using MFCC, ZCR and GMM
Separating Voiced Segments from Music File using MFCC, ZCR and GMM Mr. Prashant P. Zirmite 1, Mr. Mahesh K. Patil 2, Mr. Santosh P. Salgar 3,Mr. Veeresh M. Metigoudar 4 1,2,3,4Assistant Professor, Dept.
More informationNoise Reduction on the Raw Signal of Emotiv EEG Neuroheadset
Noise Reduction on the Raw Signal of Emotiv EEG Neuroheadset Raimond-Hendrik Tunnel Institute of Computer Science, University of Tartu Liivi 2 Tartu, Estonia jee7@ut.ee ABSTRACT In this paper, we describe
More informationSpeech Enhancement using Wiener filtering
Speech Enhancement using Wiener filtering S. Chirtmay and M. Tahernezhadi Department of Electrical Engineering Northern Illinois University DeKalb, IL 60115 ABSTRACT The problem of reducing the disturbing
More informationSpeech Synthesis; Pitch Detection and Vocoders
Speech Synthesis; Pitch Detection and Vocoders Tai-Shih Chi ( 冀泰石 ) Department of Communication Engineering National Chiao Tung University May. 29, 2008 Speech Synthesis Basic components of the text-to-speech
More informationAuditory Based Feature Vectors for Speech Recognition Systems
Auditory Based Feature Vectors for Speech Recognition Systems Dr. Waleed H. Abdulla Electrical & Computer Engineering Department The University of Auckland, New Zealand [w.abdulla@auckland.ac.nz] 1 Outlines
More informationROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE
- @ Ramon E Prieto et al Robust Pitch Tracking ROUST PITCH TRACKIN USIN LINEAR RERESSION OF THE PHASE Ramon E Prieto, Sora Kim 2 Electrical Engineering Department, Stanford University, rprieto@stanfordedu
More informationIdentification of disguised voices using feature extraction and classification
Identification of disguised voices using feature extraction and classification Lini T Lal, Avani Nath N.J, Dept. of Electronics and Communication, TKMIT, Kollam, Kerala, India linithyvila23@gmail.com,
More informationSOUND SOURCE RECOGNITION AND MODELING
SOUND SOURCE RECOGNITION AND MODELING CASA seminar, summer 2000 Antti Eronen antti.eronen@tut.fi Contents: Basics of human sound source recognition Timbre Voice recognition Recognition of environmental
More informationProject 0: Part 2 A second hands-on lab on Speech Processing Frequency-domain processing
Project : Part 2 A second hands-on lab on Speech Processing Frequency-domain processing February 24, 217 During this lab, you will have a first contact on frequency domain analysis of speech signals. You
More informationSignal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2
Signal Processing for Speech Applications - Part 2-1 Signal Processing For Speech Applications - Part 2 May 14, 2013 Signal Processing for Speech Applications - Part 2-2 References Huang et al., Chapter
More informationAugmenting Short-term Cepstral Features with Long-term Discriminative Features for Speaker Verification of Telephone Data
INTERSPEECH 2013 Augmenting Short-term Cepstral Features with Long-term Discriminative Features for Speaker Verification of Telephone Data Cong-Thanh Do 1, Claude Barras 1, Viet-Bac Le 2, Achintya K. Sarkar
More informationInternational Journal of Modern Trends in Engineering and Research e-issn No.: , Date: 2-4 July, 2015
International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:2349-9745, Date: 2-4 July, 2015 Analysis of Speech Signal Using Graphic User Interface Solly Joy 1, Savitha
More informationSpeech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction
IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 7, Issue, Ver. I (Mar. - Apr. 7), PP 4-46 e-issn: 9 4, p-issn No. : 9 497 www.iosrjournals.org Speech Enhancement Using Spectral Flatness Measure
More informationReading: Johnson Ch , Ch.5.5 (today); Liljencrants & Lindblom; Stevens (Tues) reminder: no class on Thursday.
L105/205 Phonetics Scarborough Handout 7 10/18/05 Reading: Johnson Ch.2.3.3-2.3.6, Ch.5.5 (today); Liljencrants & Lindblom; Stevens (Tues) reminder: no class on Thursday Spectral Analysis 1. There are
More informationSpeech Synthesis using Mel-Cepstral Coefficient Feature
Speech Synthesis using Mel-Cepstral Coefficient Feature By Lu Wang Senior Thesis in Electrical Engineering University of Illinois at Urbana-Champaign Advisor: Professor Mark Hasegawa-Johnson May 2018 Abstract
More informationThe SRI AVEC-2014 Evaluation System
The SRI AVEC-2014 Evaluation System Vikramjit Mitra vikramjit.mitra@sri.com Andreas Kathol andreas.kathol@sri.com Elizabeth Shriberg elizabeth.shriberg@sri.com Colleen Richey colleen.richey@sri.com Martin
More informationMeasuring the complexity of sound
PRAMANA c Indian Academy of Sciences Vol. 77, No. 5 journal of November 2011 physics pp. 811 816 Measuring the complexity of sound NANDINI CHATTERJEE SINGH National Brain Research Centre, NH-8, Nainwal
More informationModulation Domain Spectral Subtraction for Speech Enhancement
Modulation Domain Spectral Subtraction for Speech Enhancement Author Paliwal, Kuldip, Schwerin, Belinda, Wojcicki, Kamil Published 9 Conference Title Proceedings of Interspeech 9 Copyright Statement 9
More informationRobust Speaker Identification for Meetings: UPC CLEAR 07 Meeting Room Evaluation System
Robust Speaker Identification for Meetings: UPC CLEAR 07 Meeting Room Evaluation System Jordi Luque and Javier Hernando Technical University of Catalonia (UPC) Jordi Girona, 1-3 D5, 08034 Barcelona, Spain
More informationA Method for Voiced/Unvoiced Classification of Noisy Speech by Analyzing Time-Domain Features of Spectrogram Image
Science Journal of Circuits, Systems and Signal Processing 2017; 6(2): 11-17 http://www.sciencepublishinggroup.com/j/cssp doi: 10.11648/j.cssp.20170602.12 ISSN: 2326-9065 (Print); ISSN: 2326-9073 (Online)
More informationEffective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a
R E S E A R C H R E P O R T I D I A P Effective post-processing for single-channel frequency-domain speech enhancement Weifeng Li a IDIAP RR 7-7 January 8 submitted for publication a IDIAP Research Institute,
More informationAutomatic Text-Independent. Speaker. Recognition Approaches Using Binaural Inputs
Automatic Text-Independent Speaker Recognition Approaches Using Binaural Inputs Karim Youssef, Sylvain Argentieri and Jean-Luc Zarader 1 Outline Automatic speaker recognition: introduction Designed systems
More informationChapter 4 SPEECH ENHANCEMENT
44 Chapter 4 SPEECH ENHANCEMENT 4.1 INTRODUCTION: Enhancement is defined as improvement in the value or Quality of something. Speech enhancement is defined as the improvement in intelligibility and/or
More informationAn Optimization of Audio Classification and Segmentation using GASOM Algorithm
An Optimization of Audio Classification and Segmentation using GASOM Algorithm Dabbabi Karim, Cherif Adnen Research Unity of Processing and Analysis of Electrical and Energetic Systems Faculty of Sciences
More informationFeasibility of Vocal Emotion Conversion on Modulation Spectrogram for Simulated Cochlear Implants
Feasibility of Vocal Emotion Conversion on Modulation Spectrogram for Simulated Cochlear Implants Zhi Zhu, Ryota Miyauchi, Yukiko Araki, and Masashi Unoki School of Information Science, Japan Advanced
More informationDetermination of instants of significant excitation in speech using Hilbert envelope and group delay function
Determination of instants of significant excitation in speech using Hilbert envelope and group delay function by K. Sreenivasa Rao, S. R. M. Prasanna, B.Yegnanarayana in IEEE Signal Processing Letters,
More informationSPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT
SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT RASHMI MAKHIJANI Department of CSE, G. H. R.C.E., Near CRPF Campus,Hingna Road, Nagpur, Maharashtra, India rashmi.makhijani2002@gmail.com
More informationCan binary masks improve intelligibility?
Can binary masks improve intelligibility? Mike Brookes (Imperial College London) & Mark Huckvale (University College London) Apparently so... 2 How does it work? 3 Time-frequency grid of local SNR + +
More informationSinging Voice Detection. Applications of Music Processing. Singing Voice Detection. Singing Voice Detection. Singing Voice Detection
Detection Lecture usic Processing Applications of usic Processing Christian Dittmar International Audio Laboratories Erlangen christian.dittmar@audiolabs-erlangen.de Important pre-requisite for: usic segmentation
More informationAutomatic Morse Code Recognition Under Low SNR
2nd International Conference on Mechanical, Electronic, Control and Automation Engineering (MECAE 2018) Automatic Morse Code Recognition Under Low SNR Xianyu Wanga, Qi Zhaob, Cheng Mac, * and Jianping
More informationQuantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation
Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Peter J. Murphy and Olatunji O. Akande, Department of Electronic and Computer Engineering University
More informationA New Scheme for No Reference Image Quality Assessment
Author manuscript, published in "3rd International Conference on Image Processing Theory, Tools and Applications, Istanbul : Turkey (2012)" A New Scheme for No Reference Image Quality Assessment Aladine
More informationLong Range Acoustic Classification
Approved for public release; distribution is unlimited. Long Range Acoustic Classification Authors: Ned B. Thammakhoune, Stephen W. Lang Sanders a Lockheed Martin Company P. O. Box 868 Nashua, New Hampshire
More informationAdvances in Speech Signal Processing for Voice Quality Assessment
Processing for Part II University of Crete, Computer Science Dept., Multimedia Informatics Lab yannis@csd.uoc.gr Bilbao, 2011 September 1 Multi-linear Algebra Features selection 2 Introduction Application:
More informationEC 6501 DIGITAL COMMUNICATION UNIT - II PART A
EC 6501 DIGITAL COMMUNICATION 1.What is the need of prediction filtering? UNIT - II PART A [N/D-16] Prediction filtering is used mostly in audio signal processing and speech processing for representing
More informationChange Point Determination in Audio Data Using Auditory Features
INTL JOURNAL OF ELECTRONICS AND TELECOMMUNICATIONS, 0, VOL., NO., PP. 8 90 Manuscript received April, 0; revised June, 0. DOI: /eletel-0-00 Change Point Determination in Audio Data Using Auditory Features
More informationInternational Journal of Engineering and Techniques - Volume 1 Issue 6, Nov Dec 2015
RESEARCH ARTICLE OPEN ACCESS A Comparative Study on Feature Extraction Technique for Isolated Word Speech Recognition Easwari.N 1, Ponmuthuramalingam.P 2 1,2 (PG & Research Department of Computer Science,
More informationIsolated Word Recognition Based on Combination of Multiple Noise-Robust Techniques
Isolated Word Recognition Based on Combination of Multiple Noise-Robust Techniques 81 Isolated Word Recognition Based on Combination of Multiple Noise-Robust Techniques Noboru Hayasaka 1, Non-member ABSTRACT
More informationSONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS
SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS AKSHAY CHANDRASHEKARAN ANOOP RAMAKRISHNA akshayc@cmu.edu anoopr@andrew.cmu.edu ABHISHEK JAIN GE YANG ajain2@andrew.cmu.edu younger@cmu.edu NIDHI KOHLI R
More informationThe Effects of Noise on Acoustic Parameters
The Effects of Noise on Acoustic Parameters * 1 Turgut Özseven and 2 Muharrem Düğenci 1 Turhal Vocational School, Gaziosmanpaşa University, Turkey * 2 Faculty of Engineering, Department of Industrial Engineering
More informationAudio Fingerprinting using Fractional Fourier Transform
Audio Fingerprinting using Fractional Fourier Transform Swati V. Sutar 1, D. G. Bhalke 2 1 (Department of Electronics & Telecommunication, JSPM s RSCOE college of Engineering Pune, India) 2 (Department,
More informationSound Recognition. ~ CSE 352 Team 3 ~ Jason Park Evan Glover. Kevin Lui Aman Rawat. Prof. Anita Wasilewska
Sound Recognition ~ CSE 352 Team 3 ~ Jason Park Evan Glover Kevin Lui Aman Rawat Prof. Anita Wasilewska What is Sound? Sound is a vibration that propagates as a typically audible mechanical wave of pressure
More informationAn Efficient Extraction of Vocal Portion from Music Accompaniment Using Trend Estimation
An Efficient Extraction of Vocal Portion from Music Accompaniment Using Trend Estimation Aisvarya V 1, Suganthy M 2 PG Student [Comm. Systems], Dept. of ECE, Sree Sastha Institute of Engg. & Tech., Chennai,
More informationMonitoring Infant s Emotional Cry in Domestic Environments using the Capsule Network Architecture
Interspeech 2018 2-6 September 2018, Hyderabad Monitoring Infant s Emotional Cry in Domestic Environments using the Capsule Network Architecture M. A. Tuğtekin Turan and Engin Erzin Multimedia, Vision
More informationRobust Voice Activity Detection Based on Discrete Wavelet. Transform
Robust Voice Activity Detection Based on Discrete Wavelet Transform Kun-Ching Wang Department of Information Technology & Communication Shin Chien University kunching@mail.kh.usc.edu.tw Abstract This paper
More informationCombining Voice Activity Detection Algorithms by Decision Fusion
Combining Voice Activity Detection Algorithms by Decision Fusion Evgeny Karpov, Zaur Nasibov, Tomi Kinnunen, Pasi Fränti Speech and Image Processing Unit, University of Eastern Finland, Joensuu, Finland
More informationReduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter
Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Ching-Ta Lu, Kun-Fu Tseng 2, Chih-Tsung Chen 2 Department of Information Communication, Asia University, Taichung, Taiwan, ROC
More informationspeech signal S(n). This involves a transformation of S(n) into another signal or a set of signals
16 3. SPEECH ANALYSIS 3.1 INTRODUCTION TO SPEECH ANALYSIS Many speech processing [22] applications exploits speech production and perception to accomplish speech analysis. By speech analysis we extract
More informationADAPTIVE NOISE LEVEL ESTIMATION
Proc. of the 9 th Int. Conference on Digital Audio Effects (DAFx-6), Montreal, Canada, September 18-2, 26 ADAPTIVE NOISE LEVEL ESTIMATION Chunghsin Yeh Analysis/Synthesis team IRCAM/CNRS-STMS, Paris, France
More informationA Correlation-Maximization Denoising Filter Used as An Enhancement Frontend for Noise Robust Bird Call Classification
A Correlation-Maximization Denoising Filter Used as An Enhancement Frontend for Noise Robust Bird Call Classification Wei Chu and Abeer Alwan Speech Processing and Auditory Perception Laboratory Department
More informationAn Improved Voice Activity Detection Based on Deep Belief Networks
e-issn 2455 1392 Volume 2 Issue 4, April 2016 pp. 676-683 Scientific Journal Impact Factor : 3.468 http://www.ijcter.com An Improved Voice Activity Detection Based on Deep Belief Networks Shabeeba T. K.
More informationModulation Spectrum Power-law Expansion for Robust Speech Recognition
Modulation Spectrum Power-law Expansion for Robust Speech Recognition Hao-Teng Fan, Zi-Hao Ye and Jeih-weih Hung Department of Electrical Engineering, National Chi Nan University, Nantou, Taiwan E-mail:
More informationNCCF ACF. cepstrum coef. error signal > samples
ESTIMATION OF FUNDAMENTAL FREQUENCY IN SPEECH Petr Motl»cek 1 Abstract This paper presents an application of one method for improving fundamental frequency detection from the speech. The method is based
More informationSINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum
SINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase Reassigned Spectrum Geoffroy Peeters, Xavier Rodet Ircam - Centre Georges-Pompidou Analysis/Synthesis Team, 1, pl. Igor
More information