Introducing COVAREP: A collaborative voice analysis repository for speech technologies
  John Kane
  Wednesday November 27th, 2013
  SIGMEDIA-group TCD
  COVAREP - Open-source speech processing repository
Introduction
  (a) Gilles Degottex
  (b) Thomas Drugman
  (c) Tuomo Raitio
  (d) Stefan Scherer
Motivation
  ...open, well-documented, and well-tested scientific code is essential not only to reproducibility in modern scientific research, but to the very progression of research itself.
Related toolkits
  KALDI - Speech recognition toolkit
  - Speech processing toolkit
  VOICEBOX - Speech analysis toolkit
Solution?
  Fast, effective results every time
COVAREP - Aims
  Website: http://covarep.github.io/covarep/index.html
  GitHub: https://github.com/covarep/covarep
COVAREP - Aims
  More reproducible research
  Increase the availability and impact of speech processing algorithms
  Participation and feedback
COVAREP - Scope
  Broad scope - any speech signal processing algorithms
  Speech analysis, synthesis, conversion, transformation, speech quality, enhancement, glottal source/voice quality analysis, etc.
  Use! Contribute!
Overview of COVAREP
  [Diagram blocks: Speech Signal, Polarity Detection, Pitch Tracking, GCI, Spectral Envelope, Glottal Flow, Sinusoidal Modeling, Formant Tracking, Glottal Flow Parameterization, Phase-based Representation]
  Highlighted groups:
  1. Periodicity
  2. Spectral envelope
  3. Sine modelling
  4. Glottal analysis
  5. Phase analysis
COVAREP - Periodicity & synchronicity
  [Diagram: COVAREP overview with "1. Periodicity" highlighted]
COVAREP - Periodicity & synchronicity
  Polarity detection
  f0 and voicing decision extraction
  Detection of glottal closure instants (GCIs)
Periodicity & synchronicity - F0 extraction
  [Figure: speech amplitude spectrum; amplitude (dB) vs. frequency (Hz), 0-5000 Hz]
  [Figure: residual spectrum, i.e. the envelope-removed speech amplitude spectrum; amplitude (dB) vs. frequency (Hz), 0-5000 Hz]
  \[ \mathrm{SRH}(f) = E(f) + \sum_{k=2}^{N} \left[ E(k f) - E\big((k - 0.5) f\big) \right], \quad f \in [F_{0,\min}, F_{0,\max}] \]
  where E is the residual spectrum, f is frequency (Hz) and N is the number of harmonics considered
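The SRH criterion on this slide can be sketched in a few lines of Python. This is only an illustration, not the COVAREP implementation: it assumes the residual amplitude spectrum E is already available as a plain list sampled every 1 Hz, and the names `srh` and `estimate_f0`, the default search range and the number of harmonics are all hypothetical choices.

```python
def srh(E, f, n_harm=5):
    """Summation of residual harmonics at candidate frequency f (Hz).

    E is assumed to be a residual amplitude spectrum sampled every 1 Hz,
    so E[i] is the value at i Hz. Terms beyond the end of the spectrum
    are treated as zero.
    """
    def at(freq):
        idx = int(round(freq))
        return E[idx] if 0 <= idx < len(E) else 0.0

    total = at(f)
    for k in range(2, n_harm + 1):
        # Reward energy at harmonic k, penalise energy halfway between harmonics.
        total += at(k * f) - at((k - 0.5) * f)
    return total


def estimate_f0(E, f0_min=50, f0_max=400, n_harm=5):
    """Return the integer candidate in [f0_min, f0_max] that maximises SRH."""
    candidates = range(f0_min, f0_max + 1)
    return max(candidates, key=lambda f: srh(E, f, n_harm))


# Toy residual spectrum (assumption): unit peaks at the harmonics of 120 Hz.
spectrum = [0.0] * 2000
for h in range(1, 11):
    spectrum[120 * h] = 1.0

f0_hat = estimate_f0(spectrum)
```

The subtracted inter-harmonic term is what keeps multiples of the true f0 from scoring as highly as f0 itself: for a candidate at twice the true value, the half-way points fall on real harmonics and cancel the sum.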
Periodicity & synchronicity - F0 extraction
  [Figure: residual harmonic summation over time; frequency (Hz) vs. time (seconds)]
COVAREP - Periodicity & synchronicity
  [Figure, first panel: frequency (Hz) vs. time (s)]
  [Figure, second panel: glottal flow (GF) derivative with GCIs; amplitude vs. time (s)]
  Detected glottal closure instants
COVAREP - Spectral envelope estimation
  [Diagram: COVAREP overview with "2. Spectral envelope" highlighted]
COVAREP - Spectral envelope estimation
  Discrete all-pole (DAP) model
  True envelope (TE) - spectral envelope by iterative cepstral smoothing
  Weighted linear prediction
  Conversion from envelope to Mel-Frequency Cepstral Coefficients (MFCC)
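The "iterative cepstral smoothing" behind the True Envelope can be illustrated with a small pure-Python sketch. It is a simplified version under stated assumptions, not the COVAREP code: the input is a log-amplitude half-spectrum (bins from 0 to the Nyquist frequency), the cepstral order must be smaller than the number of bins minus one, and the names `cepstral_smooth` and `true_envelope`, the tolerance and the iteration cap are hypothetical.

```python
import math


def cepstral_smooth(log_spec, order):
    """Low-order cosine-series (cepstral) fit of a log half-spectrum."""
    n = len(log_spec)
    m = 2 * (n - 1)  # length of the equivalent even-symmetric full spectrum
    # Real cepstrum of the symmetric log spectrum, truncated at `order`.
    ceps = []
    for q in range(order + 1):
        acc = log_spec[0] + ((-1) ** q) * log_spec[-1]
        for i in range(1, n - 1):
            acc += 2.0 * log_spec[i] * math.cos(math.pi * q * i / (n - 1))
        ceps.append(acc / m)
    # Back to the frequency domain using only the retained coefficients.
    smooth = []
    for i in range(n):
        val = ceps[0]
        for q in range(1, order + 1):
            val += 2.0 * ceps[q] * math.cos(math.pi * q * i / (n - 1))
        smooth.append(val)
    return smooth


def true_envelope(log_spec, order, tol=0.1, max_iter=200):
    """Iterate: raise the target to max(target, envelope), then re-smooth.

    Stops once the envelope is no more than `tol` (same units as log_spec)
    below the original spectrum at every bin, or after `max_iter` passes.
    Returns (envelope, converged).
    """
    target = list(log_spec)
    envelope = cepstral_smooth(target, order)
    for _ in range(max_iter):
        if max(a - v for a, v in zip(log_spec, envelope)) <= tol:
            return envelope, True
        target = [max(a, v) for a, v in zip(target, envelope)]
        envelope = cepstral_smooth(target, order)
    converged = max(a - v for a, v in zip(log_spec, envelope)) <= tol
    return envelope, converged


# Toy log-amplitude spectrum in dB (assumption): a tilt with a peak every 8 bins.
toy = [-0.3 * i + (20.0 if i % 8 == 0 else 0.0) for i in range(65)]
envelope, converged = true_envelope(toy, order=8)
```

A single cepstral smoothing pass runs through the middle of the harmonic peaks and valleys; repeatedly replacing the target with the maximum of target and envelope pushes the smoothed curve up towards the peaks.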
COVAREP - Spectral envelope estimation
  [Figure: speech amplitude spectrum; amplitude (dB) vs. frequency (Hz), 0-8000 Hz]
COVAREP - Spectral envelope estimation
  [Figure: speech spectrum with mel-spaced triangular filters; amplitude (dB) and filter weight (0-1) vs. frequency (Hz), 0-8000 Hz]
COVAREP - Spectral envelope estimation
  [Figure: speech spectrum with TE ("True Envelope") spectral envelope; amplitude (dB) vs. frequency (Hz), 0-8000 Hz]
COVAREP - Spectral envelope estimation
  [Figure: TE spectral envelope with mel-spaced triangular filters; amplitude (dB) and filter weight (0-1) vs. frequency (Hz), 0-8000 Hz]
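Passing a spectral envelope through mel-spaced triangular filters and then taking a log and a DCT is the usual route to cepstral coefficients. The sketch below shows one way this could look. It makes several assumptions that are not on the slides: the mel mapping 2595 log10(1 + f/700), 20 filters, 13 coefficients, an unnormalised DCT-II, and an envelope given in dB on a uniform frequency grid. All function names are hypothetical and do not correspond to COVAREP functions.

```python
import math


def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_edges(n_filters, f_min, f_max):
    """Edge frequencies (Hz) of n_filters triangles, equally spaced in mel."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]


def triangle(f_hz, left, centre, right):
    """Weight of a triangular filter that peaks at 1.0 at its centre."""
    if left < f_hz <= centre:
        return (f_hz - left) / (centre - left)
    if centre < f_hz < right:
        return (right - f_hz) / (right - centre)
    return 0.0


def envelope_to_mfcc(freqs_hz, env_db, n_filters=20, n_coeffs=13):
    """Mel filterbank energies of an envelope, then log and DCT-II."""
    edges = mel_filter_edges(n_filters, freqs_hz[0], freqs_hz[-1])
    power = [10.0 ** (db / 10.0) for db in env_db]
    log_energies = []
    for j in range(n_filters):
        left, centre, right = edges[j], edges[j + 1], edges[j + 2]
        energy = sum(p * triangle(f, left, centre, right)
                     for f, p in zip(freqs_hz, power))
        log_energies.append(math.log(max(energy, 1e-12)))  # floor avoids log(0)
    return [
        sum(e * math.cos(math.pi * k * (j + 0.5) / n_filters)
            for j, e in enumerate(log_energies))
        for k in range(n_coeffs)
    ]


# Toy envelope (assumption): a straight spectral tilt from 0 dB down to -40 dB.
freqs = [i * 8000.0 / 512 for i in range(513)]
env_db = [-40.0 * f / 8000.0 for f in freqs]
mfcc = envelope_to_mfcc(freqs, env_db)
```

Because the filters are equally spaced in mel, their widths in Hz grow with frequency, which is why the triangles in the figures are narrow at low frequencies and wide near 8 kHz.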
COVAREP - Sinusoidal modelling
  [Diagram: COVAREP overview with "3. Sine modelling" highlighted]
COVAREP - Sinusoidal modelling
  Harmonic model
  Quasi-Harmonic Model (QHM)
  Adaptive Harmonic Model (aHM)
  Harmonic synthesis
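As a point of reference for the models listed above, harmonic synthesis in its simplest stationary form sums cosines at integer multiples of f0. The sketch below covers only that plain harmonic model with constant f0, amplitudes and phases; QHM and aHM are not sketched. The function name `synthesize_harmonics` and its parameters are hypothetical.

```python
import math


def synthesize_harmonics(f0, amps, phases, fs, n_samples):
    """Sum of harmonics: s[n] = sum_k a_k * cos(2*pi*k*f0*n/fs + phi_k).

    amps[0] and phases[0] belong to the first harmonic (k = 1).
    f0 and fs are in Hz; all parameters are held constant over the frame.
    """
    signal = []
    for n in range(n_samples):
        t = n / fs
        sample = sum(
            a * math.cos(2.0 * math.pi * (k + 1) * f0 * t + p)
            for k, (a, p) in enumerate(zip(amps, phases))
        )
        signal.append(sample)
    return signal


# One period of a 100 Hz tone with two harmonics, sampled at 8 kHz.
frame = synthesize_harmonics(100.0, [1.0, 0.5], [0.0, 0.0], 8000.0, 80)
```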
COVAREP - Glottal analysis
  [Diagram: COVAREP overview with "4. Glottal analysis" highlighted]
COVAREP - Glottal analysis
  Deconvolution of glottal source and vocal tract components
  Algorithms for parameterising the glottal source
  Detection of changes in tone-of-voice and voice quality
COVAREP - Glottal analysis
  Vocal effort
COVAREP - Glottal analysis
  [Figure: wavelet decomposition of an impulse; frequency band (125 Hz to 8000 Hz) vs. time (seconds)]
COVAREP - Glottal analysis
  [Figure: all peaks across the different frequency bands (125 Hz, 250 Hz, 500 Hz, 1 kHz, 2 kHz, 4 kHz, 8 kHz) for breathy (top) and tense (bottom) speech samples; amplitude vs. time (seconds)]
COVAREP - Phase processing
  [Diagram: COVAREP overview with "5. Phase analysis" highlighted]
COVAREP - Phase processing
  Relative phase shift - speaker verification
  Phase distortion - emotional valence detection
  Chirp group delay representation - detection of voice disorders
Emotion classification experiment
  Speech data: Berlin emotion database (10 speakers, 7 acted emotions, 500+ utterances)
  Class labelling: Emotion vs non-emotion (binary), Passive-neutral-active (3-class)
  Feature extraction: Using COVAREP v1.1.0
  Classification: Support vector machines (RBF kernel)
  Validation: Speaker independent, leave-one-speaker-out
Emotion classification experiment
  Feature sets
  MFCC: Standard Mel-frequency cepstral coefficients
  TE-MFCC: MFCCs derived from True Envelope representation
  Glottal/VQ: Glottal and voice quality related features
  ALL: TE-MFCC and Glottal/VQ combined
  SEL: 10 most discriminative features
  Speaker independent - Leave-one-speaker-out classification experiments
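Leave-one-speaker-out validation means that every fold holds out all utterances of one speaker, so the classifier is never tested on a voice it was trained on. A minimal sketch of the fold construction follows; the function name and the toy speaker labels are hypothetical, and the SVM training step is left out.

```python
def leave_one_speaker_out(speaker_ids):
    """Yield (held_out_speaker, train_indices, test_indices) for each speaker."""
    for held_out in sorted(set(speaker_ids)):
        train = [i for i, s in enumerate(speaker_ids) if s != held_out]
        test = [i for i, s in enumerate(speaker_ids) if s == held_out]
        yield held_out, train, test


# Toy example (assumption): five utterances from three speakers.
speakers = ["s1", "s1", "s2", "s3", "s3"]
folds = list(leave_one_speaker_out(speakers))
```

With the 10 speakers described above, this scheme would give 10 folds, each trained on the other 9 speakers.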
Emotion classification experiment - Results
  [Figure, two panels: peakslope and Rd for each emotion (Neutral, Anger, Bored, Disgust, Fear, Happy, Sad)]
Emotion classification experiment - Results
  [Figure: error (%) per feature set (MFCCs, TE_MFCCs, Glottal/VQ, ALL, SEL) for the emotion vs. neutral and activation (3-class) tasks]
Emotion classification experiment - Results
  Table: Confusion matrix (%)

                  MFCCs                Glottal/VQ
              Neutral  Emotion     Neutral  Emotion
    Neutral      48       52          82       18
    Emotion      18       82          27       73
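The confusion matrix can be summarised by the mean of the per-class recalls. The short calculation below assumes that rows are the true class and columns the predicted class, which the slide does not state explicitly, although each row does sum to 100%. The function names are hypothetical.

```python
# Row-wise percentages copied from the table (rows assumed to be the true class).
confusion = {
    "MFCCs": [[48, 52], [18, 82]],
    "Glottal/VQ": [[82, 18], [27, 73]],
}


def per_class_recall(matrix):
    """Diagonal entry of each row divided by the row total."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def mean_recall(matrix):
    recalls = per_class_recall(matrix)
    return sum(recalls) / len(recalls)


summary = {name: mean_recall(m) for name, m in confusion.items()}
```

Under that assumption the mean recall is (48 + 82) / 2 = 65% for MFCCs and (82 + 73) / 2 = 77.5% for Glottal/VQ, with the difference coming mainly from the Neutral class (48% vs. 82%).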
Potential applications for COVAREP algorithms
  Speech synthesis
  Speech recognition
  Modelling variation in speaking styles and affective states
  Speaker verification
  Voice pathology detection
  Lots of others!!
COVAREP summary
  Repository of open-source speech processing algorithms
  Cross-university/country effort
  Fast access to newly developed state-of-the-art algorithms
  Improve visibility and impact
  More reproducible research
... and finally!
Thank you!
  Resources:
  Website: http://covarep.github.io/covarep/
  GitHub: https://github.com/covarep/covarep
  Paper: Degottex, G., Kane, J., Drugman, T., Raitio, T., "COVAREP - A collaborative voice analysis repository for speech technologies", submitted to ICASSP 2014