Introducing COVAREP: A collaborative voice analysis repository for speech technologies
|
|
- Stuart Shaw
- 5 years ago
- Views:
Transcription
1 Introducing COVAREP: A collaborative voice analysis repository for speech technologies John Kane Wednesday November 27th, 2013 SIGMEDIA-group TCD COVAREP - Open-source speech processing repository 1
2 Introduction (a) Gilles Degottex (b) Thomas Drugman (c) Tuomo Raitio (d) Stefan Scherer COVAREP - Open-source speech processing repository 2
3 Motivation...open, well-documented, and well-tested scientific code is essential not only to reproducibility in modern scientific research, but to the very progression of research itself. COVAREP - Open-source speech processing repository 3
4 Related toolkits KALDI - Speech recognition toolkit - Speech processing toolkit VOICEBOX - Speech analysis toolkit COVAREP - Open-source speech processing repository 4
5 Solution? Fast, effective results every time COVAREP - Open-source speech processing repository 5
6 COVAREP - Aims Website: GitHub: COVAREP - Open-source speech processing repository 6
7 COVAREP - Aims More reproducible research Increase the availability and impact of speech processing algorithms Participation and feedback COVAREP - Open-source speech processing repository 7
8 COVAREP - Scope Broad scope - any speech signal processing algorithms Speech analysis, synthesis, conversion, transformation, speech quality, enhancement, glottal source/voice quality analysis, etc. Use! Contribute! COVAREP - Open-source speech processing repository 8
9 Overview of COVAREP Speech Signal Polarity Detection Pitch Tracking GCI SpectraldEnvelope d GlottaldFlow Sinusoidal Modeling Phase-based Representation Formant Tracking GlottaldFlow Parameterization COVAREP - Open-source speech processing repository 9
10 Overview of COVAREP Speech Signal 1. Periodicity Pitch Tracking Polarity Detection GCI Spectral Envelope Glottal Flow Sinusoidal Modeling Formant Tracking Glottal Flow Parameterization Phase-based Representation COVAREP - Open-source speech processing repository 10
11 Overview of COVAREP Speech Signal 1. Periodicity Pitch Tracking Polarity Detection GCI 2. Spectral envelope Spectral Envelope Glottal Flow Sinusoidal Modeling Formant Tracking Glottal Flow Parameterization Phase-based Representation COVAREP - Open-source speech processing repository 11
12 Overview of COVAREP Speech Signal 1. Periodicity Pitch Tracking Polarity Detection GCI 2. Spectral envelope Spectral Envelope Glottal Flow Sinusoidal Modeling Formant Tracking Glottal Flow Parameterization Phase-based Representation 3. Sine modelling COVAREP - Open-source speech processing repository 12
13 Overview of COVAREP Speech Signal 1. Periodicity Pitch Tracking Polarity Detection GCI 2. Spectral envelope Spectral Envelope Glottal Flow Sinusoidal Modeling Formant Tracking Glottal Flow Parameterization Phase-based Representation 3. Sine modelling 4. Glottal analysis COVAREP - Open-source speech processing repository 13
14 Overview of COVAREP Speech Signal 1. Periodicity Pitch Tracking Polarity Detection GCI 2. Spectral envelope Spectral Envelope Glottal Flow 4. Phase analysis Sinusoidal Modeling Formant Tracking Glottal Flow Parameterization Phase-based Representation 3. Sine modelling 4. Glottal analysis COVAREP - Open-source speech processing repository 14
15 COVAREP - Periodicity & synchronicity Speech Signal 1. Periodicity Pitch Tracking Polarity Detection GCI Spectral Envelope Glottal Flow Sinusoidal Modeling Formant Tracking Glottal Flow Parameterization Phase-based Representation COVAREP - Open-source speech processing repository 15
16 COVAREP - Periodicity & synchronicity Polarity detection f 0 and voicing decision extraction Detection of glottal closure instants COVAREP - Open-source speech processing repository 16
17 Periodicity & synchronicity - F0 extraction 50 Speech spectrum Amplitude (db) Frequency (Hz) Speech amplitude spectrum COVAREP - Open-source speech processing repository 17
18 Periodicity & synchronicity - F0 extraction 50 Speech spectrum Amplitude (db) Frequency (Hz) Residual spectrum 0 Amplitude (db) Frequency (Hz) Envelope-removed speech amplitude spectrum COVAREP - Open-source speech processing repository 18
19 Periodicity & synchronicity - F0 extraction 50 Speech spectrum Amplitude (db) Frequency (Hz) Residual spectrum 0 Amplitude (db) Frequency (Hz) SRH(f) = E(f )+ N k=2 [E(k f ) E((k 0.5) f )] for f [F 0 min, F 0 max ] where E is the residual spectrum, f is frequency (Hz) and N is the number of harmonics considered COVAREP - Open-source speech processing repository 19
20 Periodicity & synchronicity - F0 extraction 250 Residual harmonic summation Frequency (Hz) Time (seconds) Residual harmonic summation over time COVAREP - Open-source speech processing repository 20
21 5000 Frequency [Hz] COVAREP - Periodicity & synchronicity Glottal Flow (GF) derivative with GCIs 0.1 Amplitude Time [s] Detected glottal closure instants COVAREP - Open-source speech processing repository 21
22 COVAREP - Spectral envelope estimation 2. Spectral envelope Speech Signal Pitch Tracking Polarity Detection GCI Spectral Envelope Glottal Flow Sinusoidal Modeling Formant Tracking Glottal Flow Parameterization Phase-based Representation COVAREP - Open-source speech processing repository 22
23 COVAREP - Spectral envelope estimation Discrete all-pole (DAP) model True envelope (TE) - spectral envelope by iterative cepstral smoothing Weighted linear prediction Conversion from envelope to Mel-Frequency Cepstral Coefficients (MFCC) COVAREP - Open-source speech processing repository 23
24 COVAREP - Spectral envelope estimation 30 Speech spectrum Amplitude (db) Frequency (Hz) Speech amplitude spectrum COVAREP - Open-source speech processing repository 24
25 COVAREP - Spectral envelope estimation 30 Speech spectrum with mel spaced filters Amplitude (db) Frequency (Hz) Speech spectrum with mel-spaced triangular filters COVAREP - Open-source speech processing repository 25
26 COVAREP - Spectral envelope estimation 40 Speech spectrum with "True Envelope" 20 0 Amplitude (db) Frequency (Hz) Speech spectrum with TE spectral envelope COVAREP - Open-source speech processing repository 26
27 COVAREP - Spectral envelope estimation 30 "True Envelope" spectrum with mel spaced filters Amplitude (db) Frequency (Hz) TE spectral envelope with mel-spaced triangular filters COVAREP - Open-source speech processing repository 27
28 COVAREP - Sinusoidal modelling Speech Signal Pitch Tracking Polarity Detection GCI Spectral Envelope Glottal Flow Sinusoidal Modeling Formant Tracking Glottal Flow Parameterization Phase-based Representation 3. Sine modelling COVAREP - Open-source speech processing repository 28
29 COVAREP - Sinusoidal modelling Harmonic model Quasi-Harmonic Model (QHM) Adaptive Harmonic Model (ahm) Harmonic synthesis COVAREP - Open-source speech processing repository 29
30 COVAREP - Glottal analysis Speech Signal Pitch Tracking Polarity Detection GCI Spectral Envelope Glottal Flow Sinusoidal Modeling Formant Tracking Glottal Flow Parameterization Phase-based Representation 4. Glottal analysis COVAREP - Open-source speech processing repository 30
31 COVAREP - Glottal analysis COVAREP - Open-source speech processing repository 31
32 COVAREP - Glottal analysis Deconvolution of glottal source and vocal tract components Algorithms for parameterising the glottal source Detection of changes in tone-of-voice and voice quality COVAREP - Open-source speech processing repository 32
33 COVAREP - Glottal analysis Vocal effort COVAREP - Open-source speech processing repository 33
34 COVAREP - Glottal analysis Frequency (Hz) Time (seconds) Wavelet decomposition of an impulse COVAREP - Open-source speech processing repository 34
35 COVAREP - Glottal analysis Amplitude Amplitude Time (seconds) 125 Hz 250 Hz 500 Hz 1 khz 2 khz 4 khz 8 khz Time (seconds) All peaks across the different frequency bands for breathy (top) and tense (bottom) speech samples COVAREP - Open-source speech processing repository 35
36 COVAREP - Phase processing Speech Signal Pitch Tracking Polarity Detection GCI Spectral Envelope Glottal Flow 4. Phase analysis Sinusoidal Modeling Formant Tracking Glottal Flow Parameterization Phase-based Representation COVAREP - Open-source speech processing repository 36
37 COVAREP - Phase processing Relative phase shift - speaker verification Phase distortion - emotional valence detection Chirp group delay represenation - detection of voice disorders COVAREP - Open-source speech processing repository 37
38 Emotion classification experiment Speech data: Berlin emotion database (10 speakers, 7 acted emotions, 500+ utterances) Class labellng: Emotion vs non-emotion (binary), Passive-neutral-active (3-class) Feature extraction: Using COVAREP v1.1.0 Classification: Support vector machines (RBF kernel) Validation: Speaker independent, leave-one-speaker-out COVAREP - Open-source speech processing repository 38
39 Emotion classification experiment Feature sets MFCC: Standard Mel-frequency cepstral coefficients TE-MFCC MFCCs derived from True Envelope representation Glottal/VQ: Glottal and voice quality related features ALL: TE-MFCC and Glottal/VQ combined SEL: 10 most discriminative features Speaker independent - Leave-one-speaker-out classification experiments COVAREP - Open-source speech processing repository 39
40 Emotion classification experiment - Results 0 peakslope Neutral Anger Bored Disgust Fear Happy Sad 2 Rd Neutral Anger Bored Disgust Fear Happy Sad COVAREP - Open-source speech processing repository 40
41 Emotion classification experiment - Results 40 Emotion vs neutral Activation (3 class) Error (%) MFCCs TE_MFCCs Glottal/VQ ALL SEL COVAREP - Open-source speech processing repository 41
42 Emotion classification experiment - Results Table: Confusion matrix (%) MFCCs Glottal/VQ Neutral Emotion Neutral Emotion Neutral Emotion COVAREP - Open-source speech processing repository 42
43 Emotion classification experiment - Results COVAREP - Open-source speech processing repository 43
44 Potential applications for COVAREP algorithms Speech synthesis Speech recognition Modelling variation in speaking styles and affective states Speaker verification Voice pathology detection Lots of others!! COVAREP - Open-source speech processing repository 44
45 COVAREP summary Repository of open-source speech processing algorithms Cross-unversity/country effort Fast access to newly developed state-of-the-art algorithms Improve visability and impact More reproducible research COVAREP - Open-source speech processing repository 45
46 ... and finally! COVAREP - Open-source speech processing repository 46
47 Thank you! Resources: Website: GitHub: Paper: Degottex, G., Kane, J., Drugman, T., Raitio, T., COVAREP - A collaborative voice analysis repository for speech technologies, Submitted to ICASSP 2014 COVAREP - Open-source speech processing repository 47
A Full-Band Adaptive Harmonic Representation of Speech
A Full-Band Adaptive Harmonic Representation of Speech Gilles Degottex and Yannis Stylianou {degottex,yannis}@csd.uoc.gr University of Crete - FORTH - Swiss National Science Foundation G. Degottex & Y.
More informationMel Spectrum Analysis of Speech Recognition using Single Microphone
International Journal of Engineering Research in Electronics and Communication Mel Spectrum Analysis of Speech Recognition using Single Microphone [1] Lakshmi S.A, [2] Cholavendan M [1] PG Scholar, Sree
More informationBetween physics and perception signal models for high level audio processing. Axel Röbel. Analysis / synthesis team, IRCAM. DAFx 2010 iem Graz
Between physics and perception signal models for high level audio processing Axel Röbel Analysis / synthesis team, IRCAM DAFx 2010 iem Graz Overview Introduction High level control of signal transformation
More informationDimension Reduction of the Modulation Spectrogram for Speaker Verification
Dimension Reduction of the Modulation Spectrogram for Speaker Verification Tomi Kinnunen Speech and Image Processing Unit Department of Computer Science University of Joensuu, Finland Kong Aik Lee and
More informationSound Recognition. ~ CSE 352 Team 3 ~ Jason Park Evan Glover. Kevin Lui Aman Rawat. Prof. Anita Wasilewska
Sound Recognition ~ CSE 352 Team 3 ~ Jason Park Evan Glover Kevin Lui Aman Rawat Prof. Anita Wasilewska What is Sound? Sound is a vibration that propagates as a typically audible mechanical wave of pressure
More informationApplications of Music Processing
Lecture Music Processing Applications of Music Processing Christian Dittmar International Audio Laboratories Erlangen christian.dittmar@audiolabs-erlangen.de Singing Voice Detection Important pre-requisite
More informationLinguistic Phonetics. Spectral Analysis
24.963 Linguistic Phonetics Spectral Analysis 4 4 Frequency (Hz) 1 Reading for next week: Liljencrants & Lindblom 1972. Assignment: Lip-rounding assignment, due 1/15. 2 Spectral analysis techniques There
More informationPerformance Analysis of MFCC and LPCC Techniques in Automatic Speech Recognition
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume - 3 Issue - 8 August, 2014 Page No. 7727-7732 Performance Analysis of MFCC and LPCC Techniques in Automatic
More informationAdvanced audio analysis. Martin Gasser
Advanced audio analysis Martin Gasser Motivation Which methods are common in MIR research? How can we parameterize audio signals? Interesting dimensions of audio: Spectral/ time/melody structure, high
More informationGlottal source model selection for stationary singing-voice by low-band envelope matching
Glottal source model selection for stationary singing-voice by low-band envelope matching Fernando Villavicencio Yamaha Corporation, Corporate Research & Development Center, 3 Matsunokijima, Iwata, Shizuoka,
More informationspeech signal S(n). This involves a transformation of S(n) into another signal or a set of signals
16 3. SPEECH ANALYSIS 3.1 INTRODUCTION TO SPEECH ANALYSIS Many speech processing [22] applications exploits speech production and perception to accomplish speech analysis. By speech analysis we extract
More informationSinging Voice Detection. Applications of Music Processing. Singing Voice Detection. Singing Voice Detection. Singing Voice Detection
Detection Lecture usic Processing Applications of usic Processing Christian Dittmar International Audio Laboratories Erlangen christian.dittmar@audiolabs-erlangen.de Important pre-requisite for: usic segmentation
More informationSpeech Signal Analysis
Speech Signal Analysis Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 2&3 14,18 January 216 ASR Lectures 2&3 Speech Signal Analysis 1 Overview Speech Signal Analysis for
More informationDetecting Speech Polarity with High-Order Statistics
Detecting Speech Polarity with High-Order Statistics Thomas Drugman, Thierry Dutoit TCTS Lab, University of Mons, Belgium Abstract. Inverting the speech polarity, which is dependent upon the recording
More informationLearning to Unlearn and Relearn Speech Signal Processing using Neural Networks: current and future perspectives
Learning to Unlearn and Relearn Speech Signal Processing using Neural Networks: current and future perspectives Mathew Magimai Doss Collaborators: Vinayak Abrol, Selen Hande Kabil, Hannah Muckenhirn, Dimitri
More informationSpeech Synthesis using Mel-Cepstral Coefficient Feature
Speech Synthesis using Mel-Cepstral Coefficient Feature By Lu Wang Senior Thesis in Electrical Engineering University of Illinois at Urbana-Champaign Advisor: Professor Mark Hasegawa-Johnson May 2018 Abstract
More informationIsolated Digit Recognition Using MFCC AND DTW
MarutiLimkar a, RamaRao b & VidyaSagvekar c a Terna collegeof Engineering, Department of Electronics Engineering, Mumbai University, India b Vidyalankar Institute of Technology, Department ofelectronics
More informationOn the glottal flow derivative waveform and its properties
COMPUTER SCIENCE DEPARTMENT UNIVERSITY OF CRETE On the glottal flow derivative waveform and its properties A time/frequency study George P. Kafentzis Bachelor s Dissertation 29/2/2008 Supervisor: Yannis
More informationL19: Prosodic modification of speech
L19: Prosodic modification of speech Time-domain pitch synchronous overlap add (TD-PSOLA) Linear-prediction PSOLA Frequency-domain PSOLA Sinusoidal models Harmonic + noise models STRAIGHT This lecture
More informationSlovak University of Technology and Planned Research in Voice De-Identification. Anna Pribilova
Slovak University of Technology and Planned Research in Voice De-Identification Anna Pribilova SLOVAK UNIVERSITY OF TECHNOLOGY IN BRATISLAVA the oldest and the largest university of technology in Slovakia
More informationEpoch Extraction From Emotional Speech
Epoch Extraction From al Speech D Govind and S R M Prasanna Department of Electronics and Electrical Engineering Indian Institute of Technology Guwahati Email:{dgovind,prasanna}@iitg.ernet.in Abstract
More informationCS 188: Artificial Intelligence Spring Speech in an Hour
CS 188: Artificial Intelligence Spring 2006 Lecture 19: Speech Recognition 3/23/2006 Dan Klein UC Berkeley Many slides from Dan Jurafsky Speech in an Hour Speech input is an acoustic wave form s p ee ch
More informationQuantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation
Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Peter J. Murphy and Olatunji O. Akande, Department of Electronic and Computer Engineering University
More informationAalto Aparat A Freely Available Tool for Glottal Inverse Filtering and Voice Source Parameterization
[LOGO] Aalto Aparat A Freely Available Tool for Glottal Inverse Filtering and Voice Source Parameterization Paavo Alku, Hilla Pohjalainen, Manu Airaksinen Aalto University, Department of Signal Processing
More informationIMPROVING QUALITY OF SPEECH SYNTHESIS IN INDIAN LANGUAGES. P. K. Lehana and P. C. Pandey
Workshop on Spoken Language Processing - 2003, TIFR, Mumbai, India, January 9-11, 2003 149 IMPROVING QUALITY OF SPEECH SYNTHESIS IN INDIAN LANGUAGES P. K. Lehana and P. C. Pandey Department of Electrical
More informationSpeech Synthesis; Pitch Detection and Vocoders
Speech Synthesis; Pitch Detection and Vocoders Tai-Shih Chi ( 冀泰石 ) Department of Communication Engineering National Chiao Tung University May. 29, 2008 Speech Synthesis Basic components of the text-to-speech
More informationUniversity of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005
University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 Lecture 5 Slides Jan 26 th, 2005 Outline of Today s Lecture Announcements Filter-bank analysis
More informationDetermination of instants of significant excitation in speech using Hilbert envelope and group delay function
Determination of instants of significant excitation in speech using Hilbert envelope and group delay function by K. Sreenivasa Rao, S. R. M. Prasanna, B.Yegnanarayana in IEEE Signal Processing Letters,
More informationDigital Speech Processing and Coding
ENEE408G Spring 2006 Lecture-2 Digital Speech Processing and Coding Spring 06 Instructor: Shihab Shamma Electrical & Computer Engineering University of Maryland, College Park http://www.ece.umd.edu/class/enee408g/
More informationConverting Speaking Voice into Singing Voice
Converting Speaking Voice into Singing Voice 1 st place of the Synthesis of Singing Challenge 2007: Vocal Conversion from Speaking to Singing Voice using STRAIGHT by Takeshi Saitou et al. 1 STRAIGHT Speech
More informationMel- frequency cepstral coefficients (MFCCs) and gammatone filter banks
SGN- 14006 Audio and Speech Processing Pasi PerQlä SGN- 14006 2015 Mel- frequency cepstral coefficients (MFCCs) and gammatone filter banks Slides for this lecture are based on those created by Katariina
More informationSignal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2
Signal Processing for Speech Applications - Part 2-1 Signal Processing For Speech Applications - Part 2 May 14, 2013 Signal Processing for Speech Applications - Part 2-2 References Huang et al., Chapter
More informationEpoch Extraction From Speech Signals K. Sri Rama Murty and B. Yegnanarayana, Senior Member, IEEE
1602 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 16, NO. 8, NOVEMBER 2008 Epoch Extraction From Speech Signals K. Sri Rama Murty and B. Yegnanarayana, Senior Member, IEEE Abstract
More informationComplex Sounds. Reading: Yost Ch. 4
Complex Sounds Reading: Yost Ch. 4 Natural Sounds Most sounds in our everyday lives are not simple sinusoidal sounds, but are complex sounds, consisting of a sum of many sinusoids. The amplitude and frequency
More informationSOUND SOURCE RECOGNITION AND MODELING
SOUND SOURCE RECOGNITION AND MODELING CASA seminar, summer 2000 Antti Eronen antti.eronen@tut.fi Contents: Basics of human sound source recognition Timbre Voice recognition Recognition of environmental
More informationCepstrum alanysis of speech signals
Cepstrum alanysis of speech signals ELEC-E5520 Speech and language processing methods Spring 2016 Mikko Kurimo 1 /48 Contents Literature and other material Idea and history of cepstrum Cepstrum and LP
More informationGLOTTAL EXCITATION EXTRACTION OF VOICED SPEECH - JOINTLY PARAMETRIC AND NONPARAMETRIC APPROACHES
Clemson University TigerPrints All Dissertations Dissertations 5-2012 GLOTTAL EXCITATION EXTRACTION OF VOICED SPEECH - JOINTLY PARAMETRIC AND NONPARAMETRIC APPROACHES Yiqiao Chen Clemson University, rls_lms@yahoo.com
More informationRobust Algorithms For Speech Reconstruction On Mobile Devices
Robust Algorithms For Speech Reconstruction On Mobile Devices XU SHAO A Thesis presented for the degree of Doctor of Philosophy Speech Group School of Computing Sciences University of East Anglia England
More informationSpeech Processing. Undergraduate course code: LASC10061 Postgraduate course code: LASC11065
Speech Processing Undergraduate course code: LASC10061 Postgraduate course code: LASC11065 All course materials and handouts are the same for both versions. Differences: credits (20 for UG, 10 for PG);
More informationAspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification. Daryush Mehta
Aspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification Daryush Mehta SHBT 03 Research Advisor: Thomas F. Quatieri Speech and Hearing Biosciences and Technology 1 Summary Studied
More informationSOURCE-filter modeling of speech is based on exciting. Glottal Spectral Separation for Speech Synthesis
IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING 1 Glottal Spectral Separation for Speech Synthesis João P. Cabral, Korin Richmond, Member, IEEE, Junichi Yamagishi, Member, IEEE, and Steve Renals,
More informationEE482: Digital Signal Processing Applications
Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 12 Speech Signal Processing 14/03/25 http://www.ee.unlv.edu/~b1morris/ee482/
More informationWaveSurfer. Basic acoustics part 2 Spectrograms, resonance, vowels. Spectrogram. See Rogers chapter 7 8
WaveSurfer. Basic acoustics part 2 Spectrograms, resonance, vowels See Rogers chapter 7 8 Allows us to see Waveform Spectrogram (color or gray) Spectral section short-time spectrum = spectrum of a brief
More informationAutomatic Text-Independent. Speaker. Recognition Approaches Using Binaural Inputs
Automatic Text-Independent Speaker Recognition Approaches Using Binaural Inputs Karim Youssef, Sylvain Argentieri and Jean-Luc Zarader 1 Outline Automatic speaker recognition: introduction Designed systems
More informationSub-band Envelope Approach to Obtain Instants of Significant Excitation in Speech
Sub-band Envelope Approach to Obtain Instants of Significant Excitation in Speech Vikram Ramesh Lakkavalli, K V Vijay Girish, A G Ramakrishnan Medical Intelligence and Language Engineering (MILE) Laboratory
More informationSPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT
SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT RASHMI MAKHIJANI Department of CSE, G. H. R.C.E., Near CRPF Campus,Hingna Road, Nagpur, Maharashtra, India rashmi.makhijani2002@gmail.com
More informationCommunications Theory and Engineering
Communications Theory and Engineering Master's Degree in Electronic Engineering Sapienza University of Rome A.A. 2018-2019 Speech and telephone speech Based on a voice production model Parametric representation
More informationPreeti Rao 2 nd CompMusicWorkshop, Istanbul 2012
Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012 o Music signal characteristics o Perceptual attributes and acoustic properties o Signal representations for pitch detection o STFT o Sinusoidal model o
More informationSPEECH AND SPECTRAL ANALYSIS
SPEECH AND SPECTRAL ANALYSIS 1 Sound waves: production in general: acoustic interference vibration (carried by some propagation medium) variations in air pressure speech: actions of the articulatory organs
More informationAdaptive Filters Application of Linear Prediction
Adaptive Filters Application of Linear Prediction Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Electrical Engineering and Information Technology Digital Signal Processing
More informationPerformance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches
Performance study of Text-independent Speaker identification system using & I for Telephone and Microphone Speeches Ruchi Chaudhary, National Technical Research Organization Abstract: A state-of-the-art
More informationVOICE QUALITY SYNTHESIS WITH THE BANDWIDTH ENHANCED SINUSOIDAL MODEL
VOICE QUALITY SYNTHESIS WITH THE BANDWIDTH ENHANCED SINUSOIDAL MODEL Narsimh Kamath Vishweshwara Rao Preeti Rao NIT Karnataka EE Dept, IIT-Bombay EE Dept, IIT-Bombay narsimh@gmail.com vishu@ee.iitb.ac.in
More informationSPEech Feature Toolbox (SPEFT) Design and Emotional Speech Feature Extraction
SPEech Feature Toolbox (SPEFT) Design and Emotional Speech Feature Extraction by Xi Li A thesis submitted to the Faculty of Graduate School, Marquette University, in Partial Fulfillment of the Requirements
More informationParameterization of the glottal source with the phase plane plot
INTERSPEECH 2014 Parameterization of the glottal source with the phase plane plot Manu Airaksinen, Paavo Alku Department of Signal Processing and Acoustics, Aalto University, Finland manu.airaksinen@aalto.fi,
More informationSpeech Enhancement using Wiener filtering
Speech Enhancement using Wiener filtering S. Chirtmay and M. Tahernezhadi Department of Electrical Engineering Northern Illinois University DeKalb, IL 60115 ABSTRACT The problem of reducing the disturbing
More informationSpectral estimation using higher-lag autocorrelation coefficients with applications to speech recognition
Spectral estimation using higher-lag autocorrelation coefficients with applications to speech recognition Author Shannon, Ben, Paliwal, Kuldip Published 25 Conference Title The 8th International Symposium
More informationReading: Johnson Ch , Ch.5.5 (today); Liljencrants & Lindblom; Stevens (Tues) reminder: no class on Thursday.
L105/205 Phonetics Scarborough Handout 7 10/18/05 Reading: Johnson Ch.2.3.3-2.3.6, Ch.5.5 (today); Liljencrants & Lindblom; Stevens (Tues) reminder: no class on Thursday Spectral Analysis 1. There are
More informationGlottal inverse filtering based on quadratic programming
INTERSPEECH 25 Glottal inverse filtering based on quadratic programming Manu Airaksinen, Tom Bäckström 2, Paavo Alku Department of Signal Processing and Acoustics, Aalto University, Finland 2 International
More informationInternational Journal of Engineering and Techniques - Volume 1 Issue 6, Nov Dec 2015
RESEARCH ARTICLE OPEN ACCESS A Comparative Study on Feature Extraction Technique for Isolated Word Speech Recognition Easwari.N 1, Ponmuthuramalingam.P 2 1,2 (PG & Research Department of Computer Science,
More informationEdinburgh Research Explorer
Edinburgh Research Explorer Voice source modelling using deep neural networks for statistical parametric speech synthesis Citation for published version: Raitio, T, Lu, H, Kane, J, Suni, A, Vainio, M,
More informationAudio Signal Compression using DCT and LPC Techniques
Audio Signal Compression using DCT and LPC Techniques P. Sandhya Rani#1, D.Nanaji#2, V.Ramesh#3,K.V.S. Kiran#4 #Student, Department of ECE, Lendi Institute Of Engineering And Technology, Vizianagaram,
More informationA Comparative Study of Formant Frequencies Estimation Techniques
A Comparative Study of Formant Frequencies Estimation Techniques DORRA GARGOURI, Med ALI KAMMOUN and AHMED BEN HAMIDA Unité de traitement de l information et électronique médicale, ENIS University of Sfax
More informationTools for Advanced Sound & Vibration Analysis
Tools for Advanced Sound & Vibration Ravichandran Raghavan Technical Marketing Engineer Agenda NI Sound and Vibration Measurement Suite Advanced Signal Processing Algorithms Time- Quefrency and Cepstrum
More informationRelative phase information for detecting human speech and spoofed speech
Relative phase information for detecting human speech and spoofed speech Longbiao Wang 1, Yohei Yoshida 1, Yuta Kawakami 1 and Seiichi Nakagawa 2 1 Nagaoka University of Technology, Japan 2 Toyohashi University
More informationLecture 6: Speech modeling and synthesis
EE E682: Speech & Audio Processing & Recognition Lecture 6: Speech modeling and synthesis 1 2 3 4 5 Modeling speech signals Spectral and cepstral models Linear Predictive models (LPC) Other signal models
More informationStructure of Speech. Physical acoustics Time-domain representation Frequency domain representation Sound shaping
Structure of Speech Physical acoustics Time-domain representation Frequency domain representation Sound shaping Speech acoustics Source-Filter Theory Speech Source characteristics Speech Filter characteristics
More informationDiscrete Fourier Transform (DFT)
Amplitude Amplitude Discrete Fourier Transform (DFT) DFT transforms the time domain signal samples to the frequency domain components. DFT Signal Spectrum Time Frequency DFT is often used to do frequency
More informationLecture 5: Speech modeling. The speech signal
EE E68: Speech & Audio Processing & Recognition Lecture 5: Speech modeling 1 3 4 5 Modeling speech signals Spectral and cepstral models Linear Predictive models (LPC) Other signal models Speech synthesis
More informationTE 302 DISCRETE SIGNALS AND SYSTEMS. Chapter 1: INTRODUCTION
TE 302 DISCRETE SIGNALS AND SYSTEMS Study on the behavior and processing of information bearing functions as they are currently used in human communication and the systems involved. Chapter 1: INTRODUCTION
More informationOverview of Code Excited Linear Predictive Coder
Overview of Code Excited Linear Predictive Coder Minal Mulye 1, Sonal Jagtap 2 1 PG Student, 2 Assistant Professor, Department of E&TC, Smt. Kashibai Navale College of Engg, Pune, India Abstract Advances
More informationPattern Recognition. Part 6: Bandwidth Extension. Gerhard Schmidt
Pattern Recognition Part 6: Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Institute of Electrical and Information Engineering Digital Signal Processing and System Theory
More informationVoice Activity Detection
Voice Activity Detection Speech Processing Tom Bäckström Aalto University October 2015 Introduction Voice activity detection (VAD) (or speech activity detection, or speech detection) refers to a class
More informationSPEECH ANALYSIS-SYNTHESIS FOR SPEAKER CHARACTERISTIC MODIFICATION
M.Tech. Credit Seminar Report, Electronic Systems Group, EE Dept, IIT Bombay, submitted November 04 SPEECH ANALYSIS-SYNTHESIS FOR SPEAKER CHARACTERISTIC MODIFICATION G. Gidda Reddy (Roll no. 04307046)
More informationHungarian Speech Synthesis Using a Phase Exact HNM Approach
Hungarian Speech Synthesis Using a Phase Exact HNM Approach Kornél Kovács 1, András Kocsor 2, and László Tóth 3 Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University
More informationSynthesis Algorithms and Validation
Chapter 5 Synthesis Algorithms and Validation An essential step in the study of pathological voices is re-synthesis; clear and immediate evidence of the success and accuracy of modeling efforts is provided
More informationE : Lecture 8 Source-Filter Processing. E : Lecture 8 Source-Filter Processing / 21
E85.267: Lecture 8 Source-Filter Processing E85.267: Lecture 8 Source-Filter Processing 21-4-1 1 / 21 Source-filter analysis/synthesis n f Spectral envelope Spectral envelope Analysis Source signal n 1
More informationNovel Temporal and Spectral Features Derived from TEO for Classification of Normal and Dysphonic Voices
Novel Temporal and Spectral Features Derived from TEO for Classification of Normal and Dysphonic Voices Hemant A.Patil 1, Pallavi N. Baljekar T. K. Basu 3 1 Dhirubhai Ambani Institute of Information and
More informationMFCC AND GMM BASED TAMIL LANGUAGE SPEAKER IDENTIFICATION SYSTEM
www.advancejournals.org Open Access Scientific Publisher MFCC AND GMM BASED TAMIL LANGUAGE SPEAKER IDENTIFICATION SYSTEM ABSTRACT- P. Santhiya 1, T. Jayasankar 1 1 AUT (BIT campus), Tiruchirappalli, India
More informationPitch Period of Speech Signals Preface, Determination and Transformation
Pitch Period of Speech Signals Preface, Determination and Transformation Mohammad Hossein Saeidinezhad 1, Bahareh Karamsichani 2, Ehsan Movahedi 3 1 Islamic Azad university, Najafabad Branch, Saidinezhad@yahoo.com
More informationNon-stationary Analysis/Synthesis using Spectrum Peak Shape Distortion, Phase and Reassignment
Non-stationary Analysis/Synthesis using Spectrum Peak Shape Distortion, Phase Reassignment Geoffroy Peeters, Xavier Rodet Ircam - Centre Georges-Pompidou, Analysis/Synthesis Team, 1, pl. Igor Stravinsky,
More informationNCCF ACF. cepstrum coef. error signal > samples
ESTIMATION OF FUNDAMENTAL FREQUENCY IN SPEECH Petr Motl»cek 1 Abstract This paper presents an application of one method for improving fundamental frequency detection from the speech. The method is based
More informationON THE RELATIONSHIP BETWEEN INSTANTANEOUS FREQUENCY AND PITCH IN. 1 Introduction. Zied Mnasri 1, Hamid Amiri 1
ON THE RELATIONSHIP BETWEEN INSTANTANEOUS FREQUENCY AND PITCH IN SPEECH SIGNALS Zied Mnasri 1, Hamid Amiri 1 1 Electrical engineering dept, National School of Engineering in Tunis, University Tunis El
More informationAutonomous Vehicle Speaker Verification System
Autonomous Vehicle Speaker Verification System Functional Requirements List and Performance Specifications Aaron Pfalzgraf Christopher Sullivan Project Advisor: Dr. Jose Sanchez 4 November 2013 AVSVS 2
More informationSINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum
SINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase Reassigned Spectrum Geoffroy Peeters, Xavier Rodet Ircam - Centre Georges-Pompidou Analysis/Synthesis Team, 1, pl. Igor
More informationEffects of Reverberation on Pitch, Onset/Offset, and Binaural Cues
Effects of Reverberation on Pitch, Onset/Offset, and Binaural Cues DeLiang Wang Perception & Neurodynamics Lab The Ohio State University Outline of presentation Introduction Human performance Reverberation
More informationINTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)
INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) Proceedings of the 2 nd International Conference on Current Trends in Engineering and Management ICCTEM -214 ISSN
More informationVoiced/nonvoiced detection based on robustness of voiced epochs
Voiced/nonvoiced detection based on robustness of voiced epochs by N. Dhananjaya, B.Yegnanarayana in IEEE Signal Processing Letters, 17, 3 : 273-276 Report No: IIIT/TR/2010/50 Centre for Language Technologies
More informationAudio processing methods on marine mammal vocalizations
Audio processing methods on marine mammal vocalizations Xanadu Halkias Laboratory for the Recognition and Organization of Speech and Audio http://labrosa.ee.columbia.edu Sound to Signal sound is pressure
More informationLearning Human Context through Unobtrusive Methods
Learning Human Context through Unobtrusive Methods WINLAB, Rutgers University We care about our contexts Glasses Meeting Vigo: your first energy meter Watch Necklace Wristband Fitbit: Get Fit, Sleep Better,
More informationVocal effort modification for singing synthesis
INTERSPEECH 2016 September 8 12, 2016, San Francisco, USA Vocal effort modification for singing synthesis Olivier Perrotin, Christophe d Alessandro LIMSI, CNRS, Université Paris-Saclay, France olivier.perrotin@limsi.fr
More informationProject 0: Part 2 A second hands-on lab on Speech Processing Frequency-domain processing
Project : Part 2 A second hands-on lab on Speech Processing Frequency-domain processing February 24, 217 During this lab, you will have a first contact on frequency domain analysis of speech signals. You
More informationElectric Guitar Pickups Recognition
Electric Guitar Pickups Recognition Warren Jonhow Lee warrenjo@stanford.edu Yi-Chun Chen yichunc@stanford.edu Abstract Electric guitar pickups convert vibration of strings to eletric signals and thus direcly
More informationVocal Command Recognition Using Parallel Processing of Multiple Confidence-Weighted Algorithms in an FPGA
Vocal Command Recognition Using Parallel Processing of Multiple Confidence-Weighted Algorithms in an FPGA ECE-492/3 Senior Design Project Spring 2015 Electrical and Computer Engineering Department Volgenau
More informationLinear Predictive Coding *
OpenStax-CNX module: m45345 1 Linear Predictive Coding * Kiefer Forseth This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 1 LPC Implementation Linear
More informationAnnouncements. Today. Speech and Language. State Path Trellis. HMMs: MLE Queries. Introduction to Artificial Intelligence. V22.
Introduction to Artificial Intelligence Announcements V22.0472-001 Fall 2009 Lecture 19: Speech Recognition & Viterbi Decoding Rob Fergus Dept of Computer Science, Courant Institute, NYU Slides from John
More informationThe GlottHMM Entry for Blizzard Challenge 2011: Utilizing Source Unit Selection in HMM-Based Speech Synthesis for Improved Excitation Generation
The GlottHMM ntry for Blizzard Challenge 2011: Utilizing Source Unit Selection in HMM-Based Speech Synthesis for Improved xcitation Generation Antti Suni 1, Tuomo Raitio 2, Martti Vainio 1, Paavo Alku
More informationUSING A WHITE NOISE SOURCE TO CHARACTERIZE A GLOTTAL SOURCE WAVEFORM FOR IMPLEMENTATION IN A SPEECH SYNTHESIS SYSTEM
USING A WHITE NOISE SOURCE TO CHARACTERIZE A GLOTTAL SOURCE WAVEFORM FOR IMPLEMENTATION IN A SPEECH SYNTHESIS SYSTEM by Brandon R. Graham A report submitted in partial fulfillment of the requirements for
More informationAN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS Kuldeep Kumar 1, R. K. Aggarwal 1 and Ankita Jain 2 1 Department of Computer Engineering, National Institute
More informationINTRODUCTION TO ACOUSTIC PHONETICS 2 Hilary Term, week 6 22 February 2006
1. Resonators and Filters INTRODUCTION TO ACOUSTIC PHONETICS 2 Hilary Term, week 6 22 February 2006 Different vibrating objects are tuned to specific frequencies; these frequencies at which a particular
More informationSINUSOIDAL MODELING. EE6641 Analysis and Synthesis of Audio Signals. Yi-Wen Liu Nov 3, 2015
1 SINUSOIDAL MODELING EE6641 Analysis and Synthesis of Audio Signals Yi-Wen Liu Nov 3, 2015 2 Last time: Spectral Estimation Resolution Scenario: multiple peaks in the spectrum Choice of window type and
More information