A Full-Band Adaptive Harmonic Representation of Speech


Gilles Degottex and Yannis Stylianou
{degottex,yannis}@csd.uoc.gr
University of Crete - FORTH - Swiss National Science Foundation
September the 10th 2012

The Sinusoidal and Harmonic Models

[Figure: DFT amplitude spectrum with the harmonics marked. Vertical axis: Amplitude [dB]; horizontal axis: 0 to 4000.]

- Can fit any monophonic signal; we use it for speech
- The sinusoids can be harmonic, quasi-harmonic, or adaptive...

Time-Frequency Representations

DFT (constant frequency basis):
$s(t) = \sum_{k=0}^{K} a_k e^{j\phi_k(t)}$, with $\phi_k(t) = k\,(2\pi/K)\,t$

FChT [1] (linear frequency basis):
$s(t) = \sum_{k=0}^{K} a_k e^{j\phi_k(t)}$, with $\phi_k(t) = k\,(2\pi/K + \alpha t)\,t$

[Figures: one time-frequency plot per basis. Horizontal axis: Time [s], 0.05 to 0.15; vertical axis: 2000 to 3500.]

[1] M. Kepesi and L. Weruaga, Adaptive Chirp-based time-frequency analysis of speech signals, Speech Communication, 2006.
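To make the difference between the two phase laws concrete, here is a minimal sketch that builds the phase of one basis function for each case and estimates its instantaneous frequency by finite differences. The function names, the discrete time index `n`, and the value of `alpha` are assumptions chosen for illustration; this is not code from the presentation.

```python
import numpy as np

def dft_phase(k, K, n):
    """phi_k(t) = k (2*pi/K) t: constant-frequency basis."""
    return k * (2 * np.pi / K) * n

def chirp_phase(k, K, alpha, n):
    """phi_k(t) = k (2*pi/K + alpha*t) t: the frequency changes linearly in time."""
    return k * (2 * np.pi / K + alpha * n) * n

def inst_freq(phase):
    """Finite-difference estimate of the instantaneous frequency (rad/sample)."""
    return np.diff(phase)

K = 64
n = np.arange(K, dtype=float)
alpha = 1e-4  # hypothetical chirp rate, for illustration only

f_const = inst_freq(dft_phase(3, K, n))           # flat track
f_chirp = inst_freq(chirp_phase(3, K, alpha, n))  # track with constant slope 2*k*alpha
```

With the DFT phase the frequency track is flat, while with the chirp phase it rises by `2 * k * alpha` radians per sample at every step, which is why the slope of each track scales with the harmonic index `k`.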

The Adaptive Quasi-Harmonic + Noise Model (aqhnm) [1]

We can adapt the frequency basis to follow the frequency tracks:
- Adaptive Quasi-Harmonic Model (aqhm) [1]: $\phi_k(t) = \frac{2\pi}{f_s} \int_0^t f_k(\tau)\,d\tau$

For speech representation in the high frequencies:
- Amplitude modulated noise (aqhnm) [2]

[1] Y. Pantazis, O. Rosec and Y. Stylianou, Adaptive AM-FM Signal Decomposition With Application to Speech Analysis, IEEE Trans. on Audio, Speech, and Language Processing, 2010.
[2] Y. Pantazis, G. Tzedakis, O. Rosec, Y. Stylianou, Analysis/Synthesis of Speech based on an Adaptive Quasi-Harmonic plus Noise Model, ICASSP, 2010.

The new ideas

1) From FChT, harmonics exist in high frequencies
   -> Use a full-band representation
2) Quasi-harmonicity can be useful for analysis but maybe not necessary for encoding/decoding
   -> Use the strict harmonicity and keep the adaptivity: aqhnm -> ahm

The Adaptive Harmonic Model (ahm)

$s(t) = \sum_{k=-K}^{K} a_k(t)\, e^{j\phi_k(t)}$, with $\phi_k(t) = k\,\frac{2\pi}{f_s} \int_0^t f_0(\tau)\,d\tau$

- $a_k(t)$: amplitude and phase (complex-valued function), interpolated from $a_k^{t_i}$ at time $t_i$
- $f_0(t)$: fundamental frequency, interpolated from $f_0^{t_i}$ at time $t_i$

Parameters at a time $t_i$: $\{f_0^{t_i}, a_k^{t_i}\}$, $k \in \{0, \ldots, K_i\}$
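As an illustration of how a signal could be resynthesized from these parameters, the sketch below interpolates f0 and the complex amplitudes between anchor times, integrates f0 into a phase, and sums the harmonics. Several choices here are assumptions and are not stated on the slide: linear interpolation of both f0 and the amplitudes, a running sum as the discrete approximation of the integral, a fixed number of harmonics K at all anchors, and conjugate symmetry a_{-k} = conj(a_k) so that the output is real. The function name `synthesize_ahm` is hypothetical.

```python
import numpy as np

def synthesize_ahm(f0_anchors, a_anchors, t_anchors, n_samples, fs):
    """Sketch of s(t) = sum_{k=-K..K} a_k(t) exp(j*phi_k(t)).

    f0_anchors: (I,) f0 values in Hz at the anchor instants
    a_anchors:  (I, K+1) complex amplitudes a_k, k = 0..K, at the anchors
    t_anchors:  (I,) anchor positions in samples, increasing
    """
    n = np.arange(n_samples)
    # f0(t) interpolated from the anchor values (linear interpolation: assumption)
    f0 = np.interp(n, t_anchors, f0_anchors)
    # phi_1(t) = (2*pi/fs) * integral_0^t f0(tau) dtau, approximated by a running sum
    phi1 = 2 * np.pi / fs * np.concatenate(([0.0], np.cumsum(f0[:-1])))
    K = a_anchors.shape[1] - 1
    s = np.zeros(n_samples)
    for k in range(K + 1):
        # interpolate real and imaginary parts of a_k(t) separately
        a_k = (np.interp(n, t_anchors, a_anchors[:, k].real)
               + 1j * np.interp(n, t_anchors, a_anchors[:, k].imag))
        comp = a_k * np.exp(1j * k * phi1)
        # assumed symmetry a_{-k} = conj(a_k): the +k and -k terms sum to 2*Re(.)
        s += comp.real if k == 0 else 2 * comp.real
    return s

# Usage: one harmonic with a_1 = 0.5 and a constant f0 of 100 Hz
fs = 8000.0
a = np.array([[0.0, 0.5], [0.0, 0.5]], dtype=complex)
s = synthesize_ahm(np.array([100.0, 100.0]), a, np.array([0.0, 79.0]), 80, fs)
```

In the usage example the two conjugate terms with amplitude 0.5 add up to a unit-amplitude cosine at 100 Hz. Because every harmonic phase is k times the same integrated f0 phase, all tracks follow the f0 curve exactly.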

The problem of estimation for full-band representation

A small f0 error propagates by multiplication: $f_k = k \cdot f_0$

[Figure: amplitude spectrum. Vertical axis: Amplitude [dB]; horizontal axis: 0 to 4000.]

Question: How to estimate harmonics up to Nyquist?
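A short worked example of this multiplication effect: since f_k = k f0, an error of Δf0 on the fundamental becomes an error of k Δf0 on the k-th harmonic. The numbers below (f0 = 100 Hz, error of 0.5 Hz, sampling rate of 16 kHz) are hypothetical values chosen for illustration, and the function name is an assumption.

```python
def harmonic_frequency_errors(f0_hz, f0_error_hz, fs_hz):
    """List (k, k*f0, k*error) for every harmonic up to the Nyquist frequency."""
    n_harm = int((fs_hz / 2) // f0_hz)
    return [(k, k * f0_hz, k * f0_error_hz) for k in range(1, n_harm + 1)]

rows = harmonic_frequency_errors(100.0, 0.5, 16000.0)
first, last = rows[0], rows[-1]
# first -> (1, 100.0, 0.5): half a hertz off at the fundamental
# last  -> (80, 8000.0, 40.0): 40 Hz off at the highest harmonic
```

With these values, an error that is negligible at the fundamental reaches 40% of the spacing between neighbouring harmonics at the top of the band.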

The Adaptive Iterative Refinement (AIR)

1. Assume first that the f0 error is small for the low harmonics
2. Then the frequency correction mechanism of QHM [1] can be used
3. We can therefore increase the harmonic level
4. Correct the frequencies
5. Increase the harmonic level
(steps 4 and 5 alternate on the following builds of the slide)

[Figure, repeated at each step: amplitude spectrum. Vertical axis: Amplitude [dB]; horizontal axis: 0 to 1600.]

[1] Y. Pantazis, O. Rosec and Y. Stylianou, Iterative Estimation of Sinusoidal Signal Parameters, IEEE Signal Processing Letters, 2010.
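The alternation between correcting the frequencies and raising the harmonic level can be sketched as follows. This is a heavily simplified illustration and not the authors' algorithm: it works on a single frame with a constant f0 (the adaptive, time-varying basis is left out), it uses the QHM least-squares fit with the usual first-order frequency correction, and the way f0 is updated (an amplitude-weighted average of the per-harmonic corrections divided by k) as well as the doubling schedule for the harmonic level are assumptions. The names `qhm_fit` and `air_f0` are hypothetical.

```python
import numpy as np

def qhm_fit(x, freqs_hz, fs):
    """Least-squares fit of sum_k (a_k + t*b_k) exp(j*2*pi*f_k*t/fs) to a frame."""
    n = len(x)
    t = np.arange(n) - (n - 1) / 2.0           # centred time axis, in samples
    w = np.hanning(n)                          # analysis window (assumed choice)
    E = np.exp(2j * np.pi * np.outer(t, freqs_hz) / fs)
    B = np.hstack([E, t[:, None] * E]) * w[:, None]
    coef, *_ = np.linalg.lstsq(B, (w * x).astype(complex), rcond=None)
    K = len(freqs_hz)
    return coef[:K], coef[K:]                  # a_k, b_k

def air_f0(x, f0_init, fs, k_start=2, n_corr=3):
    """Refine f0 while gradually raising the number of harmonics (sketch)."""
    f0 = float(f0_init)
    # highest harmonic strictly below Nyquist, computed once from the initial f0
    k_max = int(np.ceil(fs / 2.0 / f0)) - 1
    K = min(k_start, k_max)
    while True:
        for _ in range(n_corr):                # "correct the frequencies"
            ks = np.arange(-K, K + 1)
            a, b = qhm_fit(x, ks * f0, fs)
            pos = ks > 0
            # per-harmonic correction: eta_k = fs/(2*pi) * Im(b_k conj(a_k)) / |a_k|^2
            # amplitude-weighted mean of eta_k / k, written without dividing by |a_k|^2
            num = np.sum(np.imag(b[pos] * np.conj(a[pos])) / ks[pos])
            f0 += fs / (2 * np.pi) * num / np.sum(np.abs(a[pos]) ** 2)
        if K >= k_max:
            return f0
        K = min(2 * K, k_max)                  # "increase the harmonic level"

# Usage on a synthetic harmonic frame with a true f0 of 100 Hz
fs = 8000.0
n = np.arange(241)                             # about three pitch periods
x = sum(np.cos(2 * np.pi * k * 100.0 * n / fs + 0.3 * k) / k for k in range(1, 40))
f0_est = air_f0(x, 101.0, fs)
```

The design point the sketch tries to show is the ordering: with only a few low harmonics the frequency mismatch k Δf0 stays small enough for the correction to be reliable, and each corrected f0 then makes the next, higher set of harmonics safe to include.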

Evaluation: Listening test

[Figure: impairment scores on a scale from 5 to 1 for Original, ahm-AIR, aqhnm and SM, grouped as Total, Male voices and Female voices.]

- 6 languages to represent voice variability
- Female and male voices for each language: 12 sounds
- 20 listeners answered

Conclusions
+ The perceived quality of ahm-AIR is almost perfect
- Compared to SM: stable frequency tracks in ahm
- Compared to aqhnm: no noise model in ahm, also more stable

Conclusions

Points to remember
- Adaptive Harmonic Model (ahm): frequency tracks adapted to the f0 curve, simple harmonicity
- Dedicated algorithm, Adaptive Iterative Refinement (AIR), to localize the harmonic structures in the high frequencies
- Quasi-perfect perceived quality according to a listening test
- Fewer parameters than aqhnm and SM

Future work
- Forthcoming paper with more evaluations, parameter accuracy, etc.
- The good resynthesis quality is promising before starting to build higher-level models (e.g. spectral envelopes)
