A Full-Band Adaptive Harmonic Representation of Speech

Gilles Degottex and Yannis Stylianou
{degottex,yannis}@csd.uoc.gr
University of Crete - FORTH - Swiss National Science Foundation
G. Degottex & Y. Stylianou (UOC/FORTH/SNSF) A Full-Band Adaptive HM of Speech September the 10th 2012 1 / 11

The Sinusoidal and Harmonic Models
[Figure: DFT amplitude spectrum with the harmonics marked; Amplitude [dB] vs. frequency, 0 to 4000]
These models can fit any monophonic signal; here we use them for speech.
The sinusoids can be harmonic, quasi-harmonic, adaptive, etc.

Time-Frequency Representations
DFT: $s(t) = \sum_{k=0}^{K} a_k e^{j\phi_k(t)}$, $\phi_k(t) = k \frac{2\pi}{K} t$ (constant frequency basis)
FChT¹: $s(t) = \sum_{k=0}^{K} a_k e^{j\phi_k(t)}$, $\phi_k(t) = k \left(\frac{2\pi}{K} + \alpha t\right) t$ (linear frequency basis)
[Figure, one panel per basis: time-frequency representation, frequency axis 2000 to 3500 vs. Time [s], 0.05 to 0.15]
¹ M. Kepesi and L. Weruaga, Adaptive Chirp-based time-frequency analysis of speech signals, Speech Communication, 2006.
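To make the two bases concrete, here is a minimal Python sketch that projects a discrete signal onto the atoms $e^{j\phi_k(n)}$. It takes the sample index n as t and sets K = N. This is a naive O(N²) evaluation of the formulas above, not Kepesi and Weruaga's implementation. The helper names and the test values (N = 64, α = 0.0005, k0 = 5) are assumptions chosen for illustration. Differentiating $k(2\pi/N + \alpha n)n$ gives $k(2\pi/N + 2\alpha n)$, so each chirp atom's frequency rises linearly with time.

```python
import cmath
import math


def chirp_phase(k, N, alpha, n):
    """phi_k(n) = k * (2*pi/N + alpha*n) * n; alpha = 0 gives the DFT phase."""
    return k * (2 * math.pi / N + alpha * n) * n


def instantaneous_frequency(k, N, alpha, n):
    """d(phi_k)/dn in rad/sample: constant if alpha = 0, linear in n otherwise."""
    return k * (2 * math.pi / N + 2 * alpha * n)


def chirp_projection(x, alpha):
    """Inner products <x, e^{j phi_k}> for k = 0..N-1 (naive O(N^2) sketch)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-1j * chirp_phase(k, N, alpha, n)) for n in range(N))
        for k in range(N)
    ]


# Test signal: exactly the k0-th chirp atom (illustrative values).
N, alpha, k0 = 64, 0.0005, 5
x = [cmath.exp(1j * chirp_phase(k0, N, alpha, n)) for n in range(N)]

dft_like = chirp_projection(x, 0.0)     # constant frequency basis
fcht_like = chirp_projection(x, alpha)  # matching linear frequency basis

# The matching chirp basis puts all the energy in bin k0 (|X[k0]| = N);
# the constant-frequency basis smears it over several bins.
print(abs(fcht_like[k0]), abs(dft_like[k0]))
```

In this sketch the chirp rate α is assumed to be known. On real speech it would have to be chosen or estimated for each analysis frame.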

The Adaptive Quasi-Harmonic + Noise Model (aQHNM)
We can adapt the frequency basis to follow the frequency tracks: the Adaptive Quasi-Harmonic Model (aQHM)¹, with $\phi_k(t) = \frac{2\pi}{f_s} \int_0^t f_k(\tau)\, d\tau$.
For speech representation in the high frequencies: amplitude-modulated noise (aQHNM)².
¹ Y. Pantazis, O. Rosec and Y. Stylianou, Adaptive AM-FM Signal Decomposition With Application to Speech Analysis, IEEE Trans. on Audio, Speech, and Language Processing, 2010.
² Y. Pantazis, G. Tzedakis, O. Rosec and Y. Stylianou, Analysis/Synthesis of Speech based on an Adaptive Quasi-Harmonic plus Noise Model, ICASSP, 2010.
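In discrete time, the aQHM phase is a running integral of the sampled frequency track. The sketch below assumes that $f_k$ is given in Hz at every sample and that t counts samples, as in the formula. The trapezoidal rule and the constant amplitude are simplifications chosen for this sketch, not necessarily what the cited work uses. The function names are hypothetical.

```python
import math


def phase_from_track(f_track, fs):
    """phi(n) = (2*pi/fs) * integral_0^n f(tau) dtau, with f in Hz sampled at
    every sample n and the integral approximated by the trapezoidal rule."""
    phase = [0.0]
    for n in range(1, len(f_track)):
        phase.append(phase[-1] + (2 * math.pi / fs) * 0.5 * (f_track[n - 1] + f_track[n]))
    return phase


def aqhm_component(amplitude, f_track, fs):
    """One real component amplitude * cos(phi_k(n)) following the given track
    (constant amplitude for brevity)."""
    return [amplitude * math.cos(p) for p in phase_from_track(f_track, fs)]


fs = 16000
# A component gliding linearly from 1000 Hz to 1100 Hz over 320 samples (20 ms).
track = [1000 + 100 * n / 319 for n in range(320)]
component = aqhm_component(0.5, track, fs)
```

The trapezoidal rule is exact for piecewise-linear tracks, so a linearly interpolated frequency track yields the exact quadratic phase.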

The new ideas
1) The FChT shows that harmonics exist in the high frequencies: use a full-band representation.
2) Quasi-harmonicity can be useful for analysis but may not be necessary for encoding/decoding: use strict harmonicity and keep the adaptivity (aQHNM → aHM).

The Adaptive Harmonic Model (aHM)
$s(t) = \sum_{k=-K}^{K} a_k(t)\, e^{j\phi_k(t)}, \quad \phi_k(t) = k \frac{2\pi}{f_s} \int_0^t f_0(\tau)\, d\tau$
$a_k(t)$: amplitude and phase (complex-valued function), interpolated from the values $a_k^{t_i}$ at the times $t_i$
$f_0(t)$: fundamental frequency, interpolated from the values $f_0^{t_i}$ at the times $t_i$
Parameters at a time $t_i$: $\{f_0^{t_i}, a_k^{t_i}\}$, $k \in \{0, \ldots, K_i\}$ (for a real signal, $a_{-k} = a_k^*$, so the negative harmonics need not be stored)
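A minimal synthesis sketch of the model above follows. It makes several assumptions of its own, not taken from the slides: K is the same at every anchor, the complex amplitudes are interpolated linearly (real and imaginary parts), f0 is interpolated linearly, and the phase integral uses the trapezoidal rule. Strict harmonicity appears as every harmonic sharing the single phase $\Phi(n)$, with $\phi_k = k\Phi$. The names `interp` and `ahm_synthesis` are hypothetical.

```python
import cmath
import math


def interp(times, values, t):
    """Piecewise-linear interpolation, clamped outside [times[0], times[-1]].
    Works for real or complex values."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    for i in range(len(times) - 1):
        if t <= times[i + 1]:
            w = (t - times[i]) / (times[i + 1] - times[i])
            return (1 - w) * values[i] + w * values[i + 1]


def ahm_synthesis(times, f0s, amps, fs, n_samples):
    """s(n) = sum_{k=-K}^{K} a_k(n) e^{j k Phi(n)} for a real signal.

    times: anchor instants t_i (samples); f0s: f0 at each t_i (Hz);
    amps: per anchor, complex amplitudes [a_0, ..., a_K] (same K everywhere).
    Phi(n) = (2*pi/fs) * integral_0^n f0, so phi_k = k * Phi.
    """
    K = len(amps[0]) - 1
    out, Phi, prev_f0 = [], 0.0, interp(times, f0s, 0)
    for n in range(n_samples):
        f0 = interp(times, f0s, n)
        if n > 0:
            Phi += (2 * math.pi / fs) * 0.5 * (prev_f0 + f0)  # trapezoidal rule
        prev_f0 = f0
        a = [interp(times, [frame[k] for frame in amps], n) for k in range(K + 1)]
        # a_{-k} = conj(a_k): the negative-k terms double the real part.
        s = a[0].real + 2 * sum((a[k] * cmath.exp(1j * k * Phi)).real for k in range(1, K + 1))
        out.append(s)
    return out


fs = 8000
# One harmonic, a_1 = 0.5, constant f0 = 200 Hz: s(n) = cos(2*pi*200*n/fs).
s = ahm_synthesis([0, 799], [200.0, 200.0], [[0.0, 0.5], [0.0, 0.5]], fs, 800)
```

With a varying f0 track, all harmonics follow it together, because they share $\Phi(n)$. This is the "frequency tracks adapted to the f0 curve" property.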

The problem of estimation for a full-band representation
A small f0 error propagates by multiplication: since $f_k = k f_0$, an error $\delta$ on $f_0$ becomes an error $k\delta$ on the k-th harmonic.
[Figure: amplitude spectrum, Amplitude [dB] vs. frequency, 0 to 4000]
Question: how can the harmonics be estimated up to the Nyquist frequency?
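A small worked example of the $k\delta$ growth follows. The numbers are illustrative and not from the slides: f0 = 100 Hz estimated as 102 Hz, with Nyquist at 4000 Hz. The threshold "error reaches half the harmonic spacing" is a criterion introduced here. For instance, at k = 25 the prediction 25 × 102 = 2550 Hz lies exactly halfway between the true 25th (2500 Hz) and 26th (2600 Hz) harmonics.

```python
import math


def harmonic_errors(f0_true, f0_est, nyquist):
    """Error k*|f0_est - f0_true| (Hz) for each harmonic k*f0_est below Nyquist."""
    delta = abs(f0_est - f0_true)
    n_harm = int(nyquist // f0_est)
    return [(k, k * delta) for k in range(1, n_harm + 1)]


def first_ambiguous_harmonic(f0_true, f0_est):
    """Smallest k with k*delta >= f0_true/2: from there on, k*f0_est is at least as
    close to a neighbouring harmonic as to the k-th one (None if f0 is exact)."""
    delta = abs(f0_est - f0_true)
    if delta == 0:
        return None
    return math.ceil(f0_true / (2 * delta))


errors = harmonic_errors(100.0, 102.0, 4000.0)
print(errors[0], errors[-1])                      # (1, 2.0) (39, 78.0)
print(first_ambiguous_harmonic(100.0, 102.0))     # 25
```

A 2% error on f0 is harmless for the first harmonics, but it exceeds half the harmonic spacing well before Nyquist.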

The Adaptive Iterative Refinement (AIR)
[Figure, shown at each step: amplitude spectrum, Amplitude [dB] vs. frequency, 0 to 1600]
First, assume that the f0 error is small for the low harmonics.

Then the frequency correction mechanism of QHM¹ can be used on these harmonics.
¹ Y. Pantazis, O. Rosec and Y. Stylianou, Iterative Estimation of Sinusoidal Signal Parameters, IEEE Signal Processing Letters, 2010.
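The QHM frequency correction¹ rests on a first-order expansion. Suppose the true frequency is $\hat f + \eta$. After demodulation by $\hat f$, the component becomes $A e^{j2\pi\eta t} \approx A(1 + j2\pi\eta t)$. Fitting $a + bt$ therefore gives $b \approx j2\pi\eta a$, hence $\eta \approx \mathrm{Im}(a^* b)/(2\pi|a|^2)$. The sketch below applies this to a single complex exponential, with a rectangular window and no other components. The cited method estimates all components jointly; this sketch does not. The function name and test values are hypothetical. The expansion only holds when $\eta$ is small compared with the inverse window length, which is why the low harmonics, whose errors are smallest, are corrected first.

```python
import cmath
import math


def qhm_frequency_correction(x, fs, f_hat):
    """One QHM-style frequency-mismatch estimate for a single complex component.

    Fits x[n] ~ (a + b*t_n) * exp(j*2*pi*f_hat*t_n) by least squares, with t_n in
    seconds and centred on the window, and returns
    eta = (Re(a)*Im(b) - Im(a)*Re(b)) / (2*pi*|a|^2), in Hz.
    """
    N = len(x)
    t = [(n - (N - 1) / 2) / fs for n in range(N)]
    # Demodulate by the current frequency guess.
    y = [x[n] * cmath.exp(-2j * math.pi * f_hat * t[n]) for n in range(N)]
    # With centred t, the regressors 1 and t are orthogonal: closed-form LS.
    a = sum(y) / N
    b = sum(tn * yn for tn, yn in zip(t, y)) / sum(tn * tn for tn in t)
    return (a.real * b.imag - a.imag * b.real) / (2 * math.pi * abs(a) ** 2)


fs, N, f_true = 16000, 320, 1005.0   # 20 ms window
x = [0.8 * cmath.exp(1j * (2 * math.pi * f_true * n / fs + 0.3)) for n in range(N)]

f = 1000.0                            # initial guess, 5 Hz too low
for it in range(3):
    f += qhm_frequency_correction(x, fs, f)
    print(it, f)
```

Each pass shrinks the mismatch. The first correction is already within a small fraction of a hertz, and the following passes converge quickly.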

We can therefore increase the harmonic level, i.e., include more harmonics in the fit.

Correct the frequencies.

Increase the harmonic level.
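The steps above can be illustrated with a toy simulation in which the frequency correction is replaced by a hypothetical oracle. The oracle snaps a guess to the nearest true harmonic. This is not the authors' algorithm: the oracle, the least-squares f0 update and the level schedule (`start=4`, `step=4`) are all assumptions of this sketch. The oracle only captures the property the argument relies on: a local correction recovers the k-th harmonic only when the guess k·f0 is closer to it than to its neighbours. The schedule works only if the starting level satisfies start·δ < f0/2.

```python
def snap_to_harmonic(freq_guess, f0_true):
    """Hypothetical oracle standing in for a local frequency correction:
    it returns the true harmonic closest to the guess."""
    return round(freq_guess / f0_true) * f0_true


def correct_all_at_once(f0_est, f0_true, n_harm):
    """No refinement: every harmonic is corrected starting from k*f0_est."""
    return [snap_to_harmonic(k * f0_est, f0_true) for k in range(1, n_harm + 1)]


def air_toy(f0_est, f0_true, n_harm, start=4, step=4):
    """AIR-like loop: correct the first `level` harmonics, re-estimate f0 from
    them, then raise the harmonic level and repeat until n_harm is reached."""
    level = min(start, n_harm)
    while True:
        ks = range(1, level + 1)
        freqs = [snap_to_harmonic(k * f0_est, f0_true) for k in ks]
        # Least-squares fit of f_k = k*f0 over the harmonics corrected so far.
        f0_est = sum(k * f for k, f in zip(ks, freqs)) / sum(k * k for k in ks)
        if level == n_harm:
            return freqs
        level = min(level + step, n_harm)


f0_true, f0_est, n_harm = 100.0, 103.0, 39   # 3 Hz initial error, up to 3900 Hz
naive = correct_all_at_once(f0_est, f0_true, n_harm)
refined = air_toy(f0_est, f0_true, n_harm)
wrong_naive = [k for k, f in zip(range(1, n_harm + 1), naive) if f != k * f0_true]
wrong_air = [k for k, f in zip(range(1, n_harm + 1), refined) if f != k * f0_true]
print(wrong_naive[:3], wrong_air)   # [17, 18, 19] []
```

Correcting every harmonic from the initial f0 locks harmonics 17 and above onto their neighbours. Climbing level by level, and refining f0 at each level, keeps every guess inside the capture range.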

Evaluation: Listening test
[Figure: impairment scores (scale 5 to 1) for Original, aHM-AIR, aQHNM and SM; groups: Total, Male voices, Female voices]
- 6 languages to represent voice variability
- Female and male voices for each language (12 sounds)
- 20 listeners answered
Conclusions
- The perceived quality of aHM-AIR is almost perfect.
- Compared to SM: stable frequency tracks in aHM.
- Compared to aQHNM: no noise model in aHM, and also more stable.

Conclusions
Points to remember
- Adaptive Harmonic Model (aHM): frequency tracks adapted to the f0 curve, with simple harmonicity
- A dedicated algorithm, the Adaptive Iterative Refinement (AIR), to localize the harmonic structures in the high frequencies
- Quasi-perfect perceived quality according to a listening test
- Fewer parameters than aQHNM and SM
Future work
- Forthcoming paper with more evaluations, parameter accuracy, etc.
- The good resynthesis quality makes aHM a promising basis for building higher-level models (e.g., spectral envelopes)
