Introducing COVAREP: A collaborative voice analysis repository for speech technologies

Introducing COVAREP: A collaborative voice analysis repository for speech technologies John Kane Wednesday November 27th, 2013 SIGMEDIA-group TCD COVAREP - Open-source speech processing repository 1

Introduction: (a) Gilles Degottex, (b) Thomas Drugman, (c) Tuomo Raitio, (d) Stefan Scherer

Motivation: "...open, well-documented, and well-tested scientific code is essential not only to reproducibility in modern scientific research, but to the very progression of research itself."

Related toolkits: KALDI - Speech recognition toolkit; [...] - Speech processing toolkit; VOICEBOX - Speech analysis toolkit

Solution? Fast, effective results every time

COVAREP - Aims. Website: http://covarep.github.io/covarep/index.html GitHub: https://github.com/covarep/covarep

COVAREP - Aims: more reproducible research; increase the availability and impact of speech processing algorithms; participation and feedback.

COVAREP - Scope. Broad scope: any speech signal processing algorithms (speech analysis, synthesis, conversion, transformation, speech quality, enhancement, glottal source/voice quality analysis, etc.). Use! Contribute!

Overview of COVAREP. [Block diagram with the blocks: Speech Signal, Polarity Detection, Pitch Tracking, GCI, Spectral Envelope, Glottal Flow, Sinusoidal Modeling, Phase-based Representation, Formant Tracking, Glottal Flow Parameterization.] The diagram groups these blocks into five areas: 1. Periodicity; 2. Spectral envelope; 3. Sine modelling; 4. Glottal analysis; 5. Phase analysis.

COVAREP - Periodicity & synchronicity (area 1 of the overview): polarity detection; f0 and voicing decision extraction; detection of glottal closure instants (GCIs).

Periodicity & synchronicity - F0 extraction

[Figure: speech amplitude spectrum; amplitude (dB) vs. frequency (Hz), 0-5000 Hz.]

[Figure: residual spectrum, i.e. the envelope-removed speech amplitude spectrum; amplitude (dB) vs. frequency (Hz), 0-5000 Hz.]

$$\mathrm{SRH}(f) = E(f) + \sum_{k=2}^{N} \left[ E(k f) - E\big((k - 0.5) f\big) \right], \quad f \in [F_{0,\min}, F_{0,\max}]$$

where E is the residual spectrum, f is frequency (Hz) and N is the number of harmonics considered.
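The SRH expression above can be evaluated directly on a sampled residual spectrum. The following is a minimal sketch, not COVAREP's own implementation: the function name `srh`, the nearest-bin lookup, the 1 Hz search grid and the toy spectrum are all assumptions made for illustration.

```python
def srh(residual_spectrum, bin_hz, f0_min, f0_max, n_harmonics):
    """Return (f0, score) maximising SRH(f) over integer frequencies in Hz.

    residual_spectrum: amplitude of the residual spectrum E, one value per bin.
    bin_hz: frequency spacing between consecutive bins (assumed uniform).
    """
    def e(freq_hz):
        # Nearest-bin lookup; frequencies outside the spectrum contribute 0.
        idx = int(round(freq_hz / bin_hz))
        if 0 <= idx < len(residual_spectrum):
            return residual_spectrum[idx]
        return 0.0

    best_f0, best_score = None, float("-inf")
    for f in range(int(f0_min), int(f0_max) + 1):
        # E(f) plus, for each harmonic k >= 2, the harmonic amplitude
        # minus the amplitude halfway between harmonics k-1 and k.
        score = e(f)
        for k in range(2, n_harmonics + 1):
            score += e(k * f) - e((k - 0.5) * f)
        if score > best_score:
            best_f0, best_score = f, score
    return best_f0, best_score


# Toy residual spectrum (hypothetical data): 1 Hz bins, unit peaks at multiples of 120 Hz.
toy_spectrum = [0.0] * 2000
for harmonic_hz in range(120, 2000, 120):
    toy_spectrum[harmonic_hz] = 1.0

print(srh(toy_spectrum, bin_hz=1.0, f0_min=50, f0_max=300, n_harmonics=5))
```

The subtracted inter-harmonic term is what penalises candidates at twice the true f0: at 240 Hz the point halfway between the first and second harmonic (360 Hz) is itself a spectral peak and lowers the score.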

Periodicity & synchronicity - F0 extraction. [Figure: residual harmonic summation over time; frequency (Hz), 50-250, vs. time (seconds), 0.5-3.]

COVAREP - Periodicity & synchronicity. [Figure, two panels: frequency [Hz], 0-5000, vs. time; glottal flow (GF) derivative with GCIs, amplitude vs. time [s], 0.56-0.7. Caption: detected glottal closure instants.]

COVAREP - Spectral envelope estimation (area 2 of the overview): Discrete all-pole (DAP) model; True Envelope (TE), a spectral envelope obtained by iterative cepstral smoothing; weighted linear prediction; conversion from envelope to Mel-Frequency Cepstral Coefficients (MFCCs).
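To make "iterative cepstral smoothing" concrete, here is a rough sketch of the usual True Envelope iteration: smooth the log spectrum by truncating its cepstrum, raise the smoothing target to the maximum of the original spectrum and the current envelope, and repeat until the envelope covers the spectral peaks. This is an illustration under stated assumptions and not COVAREP's code: the function names, the naive O(n^2) DFT, the tolerance and the toy spectrum are all hypothetical.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive discrete Fourier transform (fine for short illustrative inputs)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(values[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def cepstral_smooth(log_amp, order):
    """Keep only cepstral coefficients up to `order` (symmetric low-quefrency lifter)."""
    n = len(log_amp)
    cepstrum = dft(log_amp, inverse=True)
    liftered = [c if (i <= order or i >= n - order) else 0.0
                for i, c in enumerate(cepstrum)]
    return [v.real for v in dft(liftered)]


def true_envelope(log_amp, order, max_iter=50, tol=0.1):
    """Iteratively smooth max(spectrum, envelope) until peaks exceed it by at most tol."""
    envelope = cepstral_smooth(log_amp, order)
    for _ in range(max_iter):
        if max(a - v for a, v in zip(log_amp, envelope)) <= tol:
            break
        target = [max(a, v) for a, v in zip(log_amp, envelope)]
        envelope = cepstral_smooth(target, order)
    return envelope


# Toy symmetric log-amplitude spectrum (hypothetical): a gentle tilt plus harmonic peaks.
n_bins = 64
toy_log_amp = []
for k in range(n_bins):
    d = min(k, n_bins - k)
    toy_log_amp.append(-0.05 * d + (2.0 if d % 8 == 0 else 0.0))

toy_envelope = true_envelope(toy_log_amp, order=6)
```

Because each smoothing target is never below the original spectrum, the envelope's mean level can only rise relative to plain cepstral smoothing, which is how the iteration moves the curve from the spectral average towards the harmonic peaks.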

COVAREP - Spectral envelope estimation. [Figure: speech amplitude spectrum; amplitude (dB) vs. frequency (Hz), 0-8000 Hz.]

COVAREP - Spectral envelope estimation. [Figure: speech spectrum with mel-spaced triangular filters; amplitude (dB) and filter weight (0-1) vs. frequency (Hz), 0-8000 Hz.]

COVAREP - Spectral envelope estimation. [Figure: speech spectrum with TE ("True Envelope") spectral envelope; amplitude (dB) vs. frequency (Hz), 0-8000 Hz.]

COVAREP - Spectral envelope estimation. [Figure: TE spectral envelope with mel-spaced triangular filters; amplitude (dB) and filter weight (0-1) vs. frequency (Hz), 0-8000 Hz.]
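The step from a spectral envelope to MFCCs through mel-spaced triangular filters can be sketched as follows. This is a generic textbook-style construction, not COVAREP's implementation: the mel formula used (2595 log10(1 + f/700)) is one common convention, and the function names, filter count and coefficient count are assumptions.

```python
import math


def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale.

    Bins are assumed to span 0 Hz to the Nyquist frequency uniformly.
    """
    nyquist = sample_rate / 2.0
    edges_hz = [mel_to_hz(hz_to_mel(nyquist) * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bin_hz = nyquist / (n_bins - 1)
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[j - 1], edges_hz[j], edges_hz[j + 1]
        row = []
        for b in range(n_bins):
            f = b * bin_hz
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def envelope_to_mfcc(power_envelope, sample_rate, n_filters=20, n_coeffs=13):
    """Filterbank energies -> log -> unnormalised DCT-II."""
    bank = mel_filterbank(n_filters, len(power_envelope), sample_rate)
    log_energies = [
        math.log(max(sum(w * p for w, p in zip(row, power_envelope)), 1e-12))
        for row in bank
    ]
    return [
        sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
            for j, e in enumerate(log_energies))
        for c in range(n_coeffs)
    ]


# Hypothetical input: a flat power envelope with 257 bins at 16 kHz sampling.
flat_envelope = [1.0] * 257
mfcc = envelope_to_mfcc(flat_envelope, sample_rate=16000)
```

Feeding the function a smooth envelope (such as a True Envelope) instead of the raw harmonic spectrum is the only difference between the TE-MFCC and standard MFCC pipelines in this sketch.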

COVAREP - Sinusoidal modelling (area 3 of the overview): harmonic model; Quasi-Harmonic Model (QHM); adaptive Harmonic Model (aHM); harmonic synthesis.
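As a point of reference for the harmonic model and harmonic synthesis, a stationary harmonic signal is a sum of sinusoids at integer multiples of f0. The sketch below shows only that basic idea with constant parameters; it is not QHM or aHM, and the function name and parameters are hypothetical.

```python
import math


def synthesize_harmonics(f0, amplitudes, phases, sample_rate, n_samples):
    """Sum of cosines at k * f0 (k = 1, 2, ...) with fixed amplitudes and phases.

    Harmonics at or above the Nyquist frequency are skipped to avoid aliasing.
    """
    nyquist = sample_rate / 2.0
    signal = []
    for n in range(n_samples):
        t = n / sample_rate
        sample = 0.0
        for k, (amp, phase) in enumerate(zip(amplitudes, phases), start=1):
            if k * f0 < nyquist:
                sample += amp * math.cos(2.0 * math.pi * k * f0 * t + phase)
        signal.append(sample)
    return signal


# Two harmonics of a 100 Hz tone, two periods at 8 kHz (hypothetical values).
tone = synthesize_harmonics(100.0, [1.0, 0.5], [0.0, 0.0], 8000, 160)
```

Adaptive variants replace the constant f0, amplitudes and phases with time-varying tracks, which this fixed-parameter sketch deliberately leaves out.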

COVAREP - Glottal analysis (area 4 of the overview): deconvolution of glottal source and vocal tract components; algorithms for parameterising the glottal source; detection of changes in tone-of-voice and voice quality.

COVAREP - Glottal analysis: vocal effort

COVAREP - Glottal analysis. [Figure: wavelet decomposition of an impulse; frequency (Hz), 125-8000 in octave steps, vs. time (seconds), 0-0.02.]

COVAREP - Glottal analysis. [Figure: all peaks across the different frequency bands (125 Hz, 250 Hz, 500 Hz, 1 kHz, 2 kHz, 4 kHz, 8 kHz) for breathy (top) and tense (bottom) speech samples; amplitude (0-1) vs. time (seconds), 0.3-0.35.]

COVAREP - Phase processing (phase analysis area of the overview): relative phase shift, for speaker verification; phase distortion, for emotional valence detection; chirp group delay representation, for detection of voice disorders.

Emotion classification experiment. Speech data: Berlin emotion database (10 speakers, 7 acted emotions, 500+ utterances). Class labelling: emotion vs. non-emotion (binary); passive-neutral-active (3-class). Feature extraction: using COVAREP v1.1.0. Classification: support vector machines (RBF kernel). Validation: speaker independent, leave-one-speaker-out.

Emotion classification experiment - Feature sets. MFCC: standard Mel-frequency cepstral coefficients. TE-MFCC: MFCCs derived from the True Envelope representation. Glottal/VQ: glottal and voice quality related features. ALL: TE-MFCC and Glottal/VQ combined. SEL: 10 most discriminative features. All experiments are speaker independent (leave-one-speaker-out classification).
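Leave-one-speaker-out validation means that every fold holds out all utterances of one speaker, so the classifier is never tested on a voice it was trained on. A minimal sketch of the fold construction is given below; the function name and the speaker labels are hypothetical, and the SVM training itself is left out.

```python
def leave_one_speaker_out(speaker_ids):
    """Yield (held_out_speaker, train_indices, test_indices) for each speaker."""
    for held_out in sorted(set(speaker_ids)):
        train = [i for i, s in enumerate(speaker_ids) if s != held_out]
        test = [i for i, s in enumerate(speaker_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical speaker label per utterance.
utterance_speakers = ["a", "a", "b", "c", "c", "b"]
for speaker, train_idx, test_idx in leave_one_speaker_out(utterance_speakers):
    # Train on train_idx, evaluate on test_idx, then average across folds.
    print(speaker, train_idx, test_idx)
```

With the 10 speakers of the database described above, this procedure would give 10 folds.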

Emotion classification experiment - Results. [Figure, two panels: peakslope and Rd for each emotion category (Neutral, Anger, Bored, Disgust, Fear, Happy, Sad).]

Emotion classification experiment - Results. [Figure: error (%) for the emotion vs. neutral and activation (3-class) tasks with the MFCCs, TE_MFCCs, Glottal/VQ, ALL and SEL feature sets.]

Emotion classification experiment - Results

Table: Confusion matrix (%)

            MFCCs               Glottal/VQ
            Neutral  Emotion    Neutral  Emotion
Neutral     48       52         82       18
Emotion     18       82         27       73
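The confusion matrix can be summarised with a short worked calculation. The sketch below assumes that each row is a true class and that the entries are row-normalised percentages (each row does sum to 100); under that assumption the diagonal entries are per-class recalls, and their mean is the unweighted average recall. The dictionary layout and function name are illustrative.

```python
# Percentages copied from the confusion matrix; each tuple is (Neutral, Emotion).
# Assumption: the row label is the true class.
confusions = {
    "MFCCs": {"Neutral": (48, 52), "Emotion": (18, 82)},
    "Glottal/VQ": {"Neutral": (82, 18), "Emotion": (27, 73)},
}


def unweighted_average_recall(rows):
    """Mean of the diagonal entries (per-class recall, in percent)."""
    neutral_recall = rows["Neutral"][0]
    emotion_recall = rows["Emotion"][1]
    return (neutral_recall + emotion_recall) / 2


for feature_set, rows in confusions.items():
    print(feature_set, unweighted_average_recall(rows))
# MFCCs 65.0
# Glottal/VQ 77.5
```

Under this reading, the Glottal/VQ features trade some recall on the emotion class (73 vs. 82) for a much higher recall on neutral speech (82 vs. 48).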

Potential applications for COVAREP algorithms: speech synthesis; speech recognition; modelling variation in speaking styles and affective states; speaker verification; voice pathology detection; lots of others!

COVAREP summary: repository of open-source speech processing algorithms; cross-university/country effort; fast access to newly developed state-of-the-art algorithms; improved visibility and impact; more reproducible research.

... and finally!

Thank you! Resources: Website: http://covarep.github.io/covarep/ GitHub: https://github.com/covarep/covarep Paper: Degottex, G., Kane, J., Drugman, T., Raitio, T., "COVAREP - A collaborative voice analysis repository for speech technologies", submitted to ICASSP 2014.