Introducing COVAREP: A collaborative voice analysis repository for speech technologies

John Kane
Wednesday November 27th, 2013
SIGMEDIA-group, TCD
COVAREP - Open-source speech processing repository

Introduction
(a) Gilles Degottex (b) Thomas Drugman (c) Tuomo Raitio (d) Stefan Scherer

Motivation
...open, well-documented, and well-tested scientific code is essential not only to reproducibility in modern scientific research, but to the very progression of research itself.

Related toolkits
- KALDI - Speech recognition toolkit
- (name missing) - Speech processing toolkit
- VOICEBOX - Speech analysis toolkit

Solution?
Fast, effective results every time

COVAREP - Aims
Website: http://covarep.github.io/covarep/index.html
GitHub: https://github.com/covarep/covarep

COVAREP - Aims
- More reproducible research
- Increase the availability and impact of speech processing algorithms
- Participation and feedback

COVAREP - Scope
- Broad scope: any speech signal processing algorithm
- Speech analysis, synthesis, conversion, transformation, speech quality, enhancement, glottal source/voice quality analysis, etc.
- Use! Contribute!

Overview of COVAREP
[Diagram blocks: Speech Signal, Polarity Detection, Pitch Tracking, GCI, Spectral Envelope, Glottal Flow, Sinusoidal Modeling, Phase-based Representation, Formant Tracking, Glottal Flow Parameterization]
Algorithm groups:
1. Periodicity
2. Spectral envelope
3. Sine modelling
4. Glottal analysis
5. Phase analysis

COVAREP - Periodicity & synchronicity
- Polarity detection
- f0 and voicing decision extraction
- Detection of glottal closure instants (GCIs)

Periodicity & synchronicity - F0 extraction
[Figure: speech amplitude spectrum (top) and envelope-removed residual amplitude spectrum (bottom); amplitude (dB) against frequency (Hz), 0 to 5000 Hz]
\[ \mathrm{SRH}(f) = E(f) + \sum_{k=2}^{N} \left[ E(k f) - E\big((k - 0.5) f\big) \right], \qquad f \in [F_{0,\min}, F_{0,\max}] \]
where E is the residual spectrum, f is frequency (Hz) and N is the number of harmonics considered.
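The summation above can be sketched in a few lines of Python. This is only an illustration of the formula, not COVAREP's implementation: the function names, the nearest-bin lookup, the 1 Hz candidate grid and the toy spectrum are all assumptions made for the example.

```python
def srh(residual_spectrum, f, n_harmonics, hz_per_bin=1.0):
    """Evaluate SRH(f) = E(f) + sum_{k=2}^{N} [E(k f) - E((k - 0.5) f)]."""
    def E(freq_hz):
        # Nearest-bin lookup (an assumption); out-of-range frequencies contribute zero.
        idx = int(round(freq_hz / hz_per_bin))
        return residual_spectrum[idx] if 0 <= idx < len(residual_spectrum) else 0.0

    total = E(f)
    for k in range(2, n_harmonics + 1):
        # Reward energy at the k-th harmonic, penalise energy halfway between harmonics.
        total += E(k * f) - E((k - 0.5) * f)
    return total


def estimate_f0(residual_spectrum, f0_min, f0_max, n_harmonics=5, hz_per_bin=1.0):
    """Return the candidate in [f0_min, f0_max] (1 Hz steps) that maximises SRH."""
    candidates = range(int(f0_min), int(f0_max) + 1)
    return max(candidates, key=lambda f: srh(residual_spectrum, f, n_harmonics, hz_per_bin))


# Hypothetical residual spectrum: 1 Hz bins with unit peaks at multiples of 120 Hz.
toy_spectrum = [0.0] * 2000
for harmonic in range(1, 11):
    toy_spectrum[120 * harmonic] = 1.0

f0_estimate = estimate_f0(toy_spectrum, f0_min=50, f0_max=400)
```

The subtracted half-harmonic terms are what keep a candidate at half the true f0 from scoring as well as the true value: its odd harmonics fall on empty bins, so it collects fewer positive terms.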

Periodicity & synchronicity - F0 extraction
[Figure: residual harmonic summation over time; frequency (Hz), 50 to 250, against time (seconds)]

COVAREP - Periodicity & synchronicity
[Figure: detected glottal closure instants. Top panel: frequency [Hz], 0 to 5000, against time; bottom panel: glottal flow (GF) derivative with GCIs, amplitude against time [s], 0.56 to 0.70]

COVAREP - Spectral envelope estimation
- Discrete all-pole (DAP) model
- True envelope (TE): spectral envelope by iterative cepstral smoothing
- Weighted linear prediction
- Conversion from envelope to Mel-frequency cepstral coefficients (MFCC)

COVAREP - Spectral envelope estimation
[Figure: speech amplitude spectrum; amplitude (dB) against frequency (Hz), 0 to 8000 Hz]

COVAREP - Spectral envelope estimation
[Figure: speech spectrum with mel-spaced triangular filters; amplitude (dB) and filter weight (0 to 1) against frequency (Hz), 0 to 8000 Hz]

COVAREP - Spectral envelope estimation
[Figure: speech spectrum with TE ("True Envelope") spectral envelope; amplitude (dB) against frequency (Hz), 0 to 8000 Hz]

COVAREP - Spectral envelope estimation
[Figure: TE spectral envelope with mel-spaced triangular filters; amplitude (dB) and filter weight (0 to 1) against frequency (Hz), 0 to 8000 Hz]
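The conversion from a spectral envelope to MFCC-style coefficients through mel-spaced triangular filters can be sketched as follows. This is a minimal illustration, not COVAREP's code: the function names, the mel formula constants (2595 and 700), the filter count and the use of a plain DCT-II are assumptions, and the envelope is taken to be linear power values rather than dB.

```python
import math


def hz_to_mel(hz):
    # One common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale.

    Bin i is assumed to sit at frequency i * (sample_rate / 2) / (n_bins - 1).
    """
    max_mel = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(max_mel * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = (sample_rate / 2.0) / (n_bins - 1)
    bank = []
    for m in range(1, n_filters + 1):
        lo, centre, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        weights = []
        for i in range(n_bins):
            f = i * bin_hz
            if lo < f <= centre:
                weights.append((f - lo) / (centre - lo))   # rising slope
            elif centre < f < hi:
                weights.append((hi - f) / (hi - centre))   # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def envelope_to_mfcc(power_envelope, sample_rate, n_filters=24, n_coeffs=13):
    """Log filterbank energies of the envelope followed by a DCT-II."""
    bank = mel_filterbank(n_filters, len(power_envelope), sample_rate)
    log_energies = [
        math.log(max(sum(w * p for w, p in zip(weights, power_envelope)), 1e-12))
        for weights in bank
    ]
    return [
        sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
            for m, e in enumerate(log_energies))
        for c in range(n_coeffs)
    ]


# Hypothetical flat envelope with 257 bins at a 16 kHz sampling rate.
envelope = [1.0] * 257
mfcc = envelope_to_mfcc(envelope, sample_rate=16000)
```

Passing a True Envelope estimate instead of a raw power spectrum as `power_envelope` would correspond to the TE-based variant shown in the last figure.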

COVAREP - Sinusoidal modelling
- Harmonic model
- Quasi-Harmonic Model (QHM)
- Adaptive Harmonic Model (aHM)
- Harmonic synthesis
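As a rough illustration of harmonic synthesis, the sketch below sums cosines at integer multiples of a fixed f0. It assumes constant f0, amplitudes and phases over the frame, and the function name and parameters are hypothetical. The quasi-harmonic and adaptive models listed above are more elaborate and are not reproduced here.

```python
import math


def synthesize_harmonics(f0, amplitudes, phases, sample_rate, n_samples):
    """Sum of cosines at k * f0, where harmonic k uses amplitudes[k - 1] and phases[k - 1]."""
    signal = []
    for n in range(n_samples):
        t = n / sample_rate
        sample = sum(
            a * math.cos(2.0 * math.pi * k * f0 * t + phi)
            for k, (a, phi) in enumerate(zip(amplitudes, phases), start=1)
        )
        signal.append(sample)
    return signal


# Hypothetical frame: three harmonics of 100 Hz, 20 ms at 8 kHz.
frame = synthesize_harmonics(
    f0=100.0,
    amplitudes=[1.0, 0.5, 0.25],
    phases=[0.0, 0.0, 0.0],
    sample_rate=8000,
    n_samples=160,
)
```

With these values the waveform repeats every 80 samples, which is one period of the 100 Hz fundamental at 8 kHz.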

COVAREP - Glottal analysis
- Deconvolution of glottal source and vocal tract components
- Algorithms for parameterising the glottal source
- Detection of changes in tone-of-voice and voice quality

COVAREP - Glottal analysis
Vocal effort

COVAREP - Glottal analysis
[Figure: wavelet decomposition of an impulse; frequency bands from 125 Hz to 8000 Hz against time (seconds), 0 to 0.02]

COVAREP - Glottal analysis
[Figure: all peaks across the different frequency bands (125 Hz, 250 Hz, 500 Hz, 1 kHz, 2 kHz, 4 kHz, 8 kHz) for breathy (top) and tense (bottom) speech samples; amplitude against time (seconds), 0.3 to 0.35]

COVAREP - Phase processing
- Relative phase shift: speaker verification
- Phase distortion: emotional valence detection
- Chirp group delay representation: detection of voice disorders

Emotion classification experiment
- Speech data: Berlin emotion database (10 speakers, 7 acted emotions, 500+ utterances)
- Class labelling: emotion vs non-emotion (binary), passive-neutral-active (3-class)
- Feature extraction: using COVAREP v1.1.0
- Classification: support vector machines (RBF kernel)
- Validation: speaker independent, leave-one-speaker-out
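The leave-one-speaker-out protocol can be sketched as a generator of train/test index splits, so that no speaker appears on both sides of a fold. The function name and the speaker labels below are hypothetical, and the classifier itself is omitted.

```python
def leave_one_speaker_out(speaker_ids):
    """Yield (held_out_speaker, train_indices, test_indices), one fold per speaker."""
    for held_out in sorted(set(speaker_ids)):
        train = [i for i, s in enumerate(speaker_ids) if s != held_out]
        test = [i for i, s in enumerate(speaker_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical speaker label for each utterance.
speakers = ["s01", "s01", "s02", "s03", "s03", "s03"]
folds = list(leave_one_speaker_out(speakers))
```

With 10 speakers, as in the database described above, this would give 10 folds, each testing on all utterances of one unseen speaker.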

Emotion classification experiment
Feature sets:
- MFCC: standard Mel-frequency cepstral coefficients
- TE-MFCC: MFCCs derived from True Envelope representation
- Glottal/VQ: glottal and voice quality related features
- ALL: TE-MFCC and Glottal/VQ combined
- SEL: 10 most discriminative features
Speaker independent, leave-one-speaker-out classification experiments

Emotion classification experiment - Results
[Figure: peakslope (top) and Rd (bottom) for each emotion: Neutral, Anger, Bored, Disgust, Fear, Happy, Sad]

Emotion classification experiment - Results
[Figure: error (%) for the feature sets MFCCs, TE_MFCCs, Glottal/VQ, ALL and SEL, for the "Emotion vs neutral" and "Activation (3 class)" tasks]

Emotion classification experiment - Results
Table: Confusion matrix (%)

             MFCCs                Glottal/VQ
             Neutral   Emotion    Neutral   Emotion
Neutral      48        52         82        18
Emotion      18        82         27        73
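One way to compare the two halves of the table is per-class recall and its unweighted average. The sketch below assumes that rows are the true class and columns the predicted class (each row sums to 100). The slides do not report this metric, so it is only a reading aid.

```python
def per_class_recall(confusion):
    """Recall (%) per class for a matrix whose rows are true classes."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(confusion)]


def unweighted_average_recall(confusion):
    recalls = per_class_recall(confusion)
    return sum(recalls) / len(recalls)


# Values copied from the table, class order [Neutral, Emotion].
mfcc_confusion = [[48, 52], [18, 82]]
glottal_vq_confusion = [[82, 18], [27, 73]]

uar_mfcc = unweighted_average_recall(mfcc_confusion)              # 65.0
uar_glottal_vq = unweighted_average_recall(glottal_vq_confusion)  # 77.5
```

Under that reading, MFCCs label 48% of neutral utterances correctly against 82% for the Glottal/VQ features, while recall on the emotion class drops from 82% to 73%.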


Potential applications for COVAREP algorithms
- Speech synthesis
- Speech recognition
- Modelling variation in speaking styles and affective states
- Speaker verification
- Voice pathology detection
- Lots of others!

COVAREP summary
- Repository of open-source speech processing algorithms
- Cross-university/country effort
- Fast access to newly developed state-of-the-art algorithms
- Improve visibility and impact
- More reproducible research

... and finally!

Thank you!
Resources:
- Website: http://covarep.github.io/covarep/
- GitHub: https://github.com/covarep/covarep
- Paper: Degottex, G., Kane, J., Drugman, T., Raitio, T., "COVAREP - A collaborative voice analysis repository for speech technologies", submitted to ICASSP 2014