E85.267: Lecture 8 Source-Filter Processing


Frederick Crawford
5 years ago

1 E85.267: Lecture 8 Source-Filter Processing

2 Source-filter analysis/synthesis
[Figure: block diagram. Analysis splits the input into a spectral envelope and a source signal; a transformation stage feeds synthesis.]
Separate:
  Source/excitation: fine time/frequency structure (e.g. pitch)
  Filter: broad spectral shape (resonances)
Similar to subtractive synthesis
Satisfying physical interpretation for real-world signals
Easier to make sense of than e.g. phase

3 Human speech production
Reasonable approximation to speech signals:
  Source is oscillation of the vocal cords
    e.g. normal speech (varying pitches) vs. whispering
  Filtered by the vocal tract (throat + tongue + lips)
    e.g. "oooh" vs. "aaah"
    resonances = formants
Both are time-varying

4 Source filter model
Excitation source -> Resonance filter
[Figure: time signal of the prediction error e(n) (x-axis: n); magnitude spectra |X(f)| and G·|H(f)| in dB (x-axis: f/kHz)]

5 Formants in speech
[Figure: spectrogram of a spoken phrase with phone and word labels ("has", "watch", "thin", "as", "a", "dime")]

6 How to separate the source and filter?
[Figure: the source signal x(n) is whitened by 1/H1(z) to give e1(n); source-signal processing followed by H2(z) produces y(n). Spectral envelope estimation (channel vocoder, LPC, or cepstrum) feeds a spectral envelope transformation.]
Short-time analysis: for each frame, estimate the spectral envelope (filter response)
  1. Channel vocoder (frequency-domain)
  2. Linear Predictive Coding (LPC) (time-domain)
  3. Cepstral analysis
The source signal is what's left over (the residual) after whitening

7 Channel vocoder
Wideband STFT filterbank, but using relatively few filters
  Linearly spaced with equal bandwidth (STFT)
  Logarithmically spaced (constant-Q filter bank)
Take RMS energy in each frequency band
[Figure (a): x(n) feeds band-pass filters BP 1 ... BP k; each band output x_BPi(n) is squared and low-pass filtered to give x_RMSi(n). Figure (b): octave-spaced and equally-spaced channel stacking of BP 1 ... BP k along the frequency axis f.]
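
The band-energy measurement can be sketched as follows. This is only an illustration under stated assumptions, not the lecture's implementation: it uses equal-width bands on a single frame, a naive DFT in place of a real filterbank, and the names `dft`, `band_rms`, `frame` and `envelope` are hypothetical.

```python
import cmath
import math

def dft(frame):
    """Naive O(N^2) DFT, adequate for short illustrative frames."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def band_rms(frame, num_bands):
    """RMS spectral magnitude in num_bands equal-width bands from 0 to Nyquist.

    Assumes (for simplicity) that num_bands divides len(frame) // 2.
    """
    spectrum = dft(frame)
    half = len(frame) // 2                 # keep bins 0 .. N/2 - 1
    mags = [abs(spectrum[k]) for k in range(half)]
    width = half // num_bands              # bins per band
    return [math.sqrt(sum(m * m for m in mags[b * width:(b + 1) * width]) / width)
            for b in range(num_bands)]

# Hypothetical test frame: one sinusoid centred exactly on DFT bin 5.
frame = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
envelope = band_rms(frame, num_bands=4)    # 8 bins per band, so bin 5 falls in band 0
```

With only four bands the result is a very coarse spectral envelope: all of the energy shows up in the lowest band and the fine structure (the exact bin) is discarded, which is the point of using relatively few filters.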

8 Channel vocoder using FFT
[Figure: short-time spectrum and spectral envelope, X(f)/dB vs. f/Hz]
Lowpass filter the magnitude of each STFT frame,
i.e. filter the columns of the spectrogram

9 Linear predictive coding
Predict the next input sample as a linear combination of previous samples
[Figure: x(n) passes through a chain of z^-1 delays weighted by a_1, a_2, ..., a_p to form \hat{x}(n), which is subtracted from x(n) to give e(n)]
Filter is described by a few filter coefficients for each frame:
  $\hat{x}[n] = \sum_{k=1}^{p} a_k \, x[n-k]$
Excitation is what's left after filtering (the residual, aka prediction error):
  $e[n] = x[n] - \hat{x}[n] = x[n] - \sum_{k=1}^{p} a_k \, x[n-k]$
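
A minimal sketch of the prediction-error computation, assuming the coefficients are already known and that samples before the start of the frame are zero. The function name `lpc_residual` and the test signal are illustrative choices, not part of the lecture.

```python
def lpc_residual(x, a):
    """Prediction error e[n] = x[n] - sum_{k=1..p} a[k-1] * x[n-k].

    Samples before the start of the frame are treated as zero.
    """
    p = len(a)
    e = []
    for n in range(len(x)):
        prediction = sum(a[k - 1] * x[n - k] for k in range(1, p + 1) if n - k >= 0)
        e.append(x[n] - prediction)
    return e

# Hypothetical example: a decaying exponential x[n] = 0.9**n is predicted
# exactly by a single coefficient a_1 = 0.9, so only the first sample
# (which has no history to predict from) survives in the residual.
x = [0.9 ** n for n in range(8)]
e = lpc_residual(x, [0.9])
```

Here the residual is a single impulse: the "source" that excites the one-pole "filter".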

10 LPC analysis/synthesis
[Figure: (a) LPC analysis: the output \hat{x}(n) of P(z) is subtracted from x(n) to give e(n). (b) LPC synthesis: \tilde{e}(n) is added to the output of P(z), fed back from y(n), to give y(n).]
P(z) is just an FIR filter: $P(z) = \sum_{k=1}^{p} a_k z^{-k}$
Excitation is still a filtered version of the input: $E(z) = X(z)\,(1 - P(z))$
For synthesis, pass the (approximate) excitation through the inverse filter:
  $Y(z) = \tilde{E}(z)\,H(z)$, where $H(z) = \dfrac{1}{1 - P(z)}$
All-pole autoregressive (AR) modeling
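
The analysis/synthesis pair can be checked with a short round trip. This is a sketch under assumptions: the coefficients and test frame are arbitrary illustrative values, and the function names `lpc_analysis` and `lpc_synthesis` are hypothetical.

```python
def lpc_analysis(x, a):
    """FIR filter 1 - P(z): e[n] = x[n] - sum_k a_k x[n-k]."""
    p = len(a)
    return [x[n] - sum(a[k - 1] * x[n - k] for k in range(1, p + 1) if n >= k)
            for n in range(len(x))]

def lpc_synthesis(e, a):
    """All-pole filter H(z) = 1 / (1 - P(z)): y[n] = e[n] + sum_k a_k y[n-k]."""
    p = len(a)
    y = []
    for n in range(len(e)):
        feedback = sum(a[k - 1] * y[n - k] for k in range(1, p + 1) if n >= k)
        y.append(e[n] + feedback)
    return y

x = [0.0, 1.0, 0.5, -0.25, -1.0, 0.3, 0.8, -0.6]   # arbitrary test frame
a = [1.2, -0.5]                                     # arbitrary coefficients, poles at radius ~0.71
y = lpc_synthesis(lpc_analysis(x, a), a)            # should reproduce x up to rounding
```

When the unmodified residual is fed back through H(z) the input is recovered; transformations come from changing either the excitation or the coefficients before synthesis.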

11 LPC - varying filter order
LPC filter H(z) models the spectrum of x[n]
Minimizing the energy of the residual e[n] gives the optimal coefficients:
  $\{a_k\} = \arg\min_{\{a_k\}} \sum_n \Big( x[n] - \sum_k a_k \, x[n-k] \Big)^2$
The approximation improves with increasing filter order p
[Figure: spectra of the original and of LPC filters, |X(f)|/dB vs. f/kHz, for p = 1, 2, 4, 6, 8, ...]

12 Estimating LPC parameters
Set the derivative of $\sum_n e^2[n]$ w.r.t. $a_k$ to zero and solve for $a_k$:
  $\dfrac{\partial}{\partial a_k} \sum_n e^2[n] = 0$
End up with p linear equations involving autocorrelations of x:
  $\sum_m x[m]\,x[m-k] = \sum_i a_i \sum_m x[m-i]\,x[m-k], \quad k = 1, \dots, p$
Solve using the Levinson-Durbin recursion
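
A minimal sketch of the autocorrelation method with a Levinson-Durbin solver, assuming the frame is zero outside its own samples so that the equations become Toeplitz in the autocorrelation r[k]. The function names and the test signal are illustrative assumptions.

```python
def autocorr(x, max_lag):
    """r[k] = sum_m x[m] * x[m-k] for k = 0..max_lag (frame assumed zero outside)."""
    return [sum(x[m] * x[m - k] for m in range(k, len(x))) for k in range(max_lag + 1)]

def levinson_durbin(r, p):
    """Solve sum_i a_i r[|k-i|] = r[k], k = 1..p.

    Returns (a, err): a = [a_1, ..., a_p] in the predictor convention
    x_hat[n] = sum_k a_k x[n-k], and err = residual energy at order p.
    """
    a = []
    err = r[0]
    for i in range(1, p + 1):
        # Reflection coefficient for order i.
        k = (r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))) / err
        # Update the lower-order coefficients, then append the new one.
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= 1.0 - k * k
    return a, err

# Hypothetical test signal: x[n] = 0.9**n is a one-pole impulse response,
# so the order-2 solution should be close to a_1 = 0.9, a_2 = 0.
x = [0.9 ** n for n in range(200)]
r = autocorr(x, 2)
a, err = levinson_durbin(r, 2)
```

The recursion builds the order-p solution from the order-(p-1) one, and the error term shrinks (or stays equal) at every step, which matches the observation that the fit improves with increasing filter order.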

13 LPC example
[Figure: windowed original and LPC residual (time / samp); original spectrum, LPC spectrum and residual spectrum (dB vs. freq / Hz); filter poles in the z-plane]


14 Short-Time LP Analysis
Short-time LPC analysis: solve LPC for each ~2 ms frame
[Figure: spectrograms (freq / kHz vs. time / s) and pole positions (Imaginary Part vs. Real Part)]

15 Cepstral analysis
cepstrum = String.reverse( spec ) + trum
Entire lexicon of funny anagrams
Insight: source and filter add in the log spectral domain, which makes them easy to separate:
  $X(z) = E(z)\,H(z)$
  $\log X(z) = \log E(z) + \log H(z)$
[Figure: real cepstrum and spectral envelope. y(n) = x(n) * h(n) -> FFT -> Y(k) -> log|Y(k)| -> IFFT -> c(n). A low-pass window w_LP(n) selects c_h(n), whose FFT gives the spectral envelope C_h(k) = log|H(k)|; a high-pass window w_HP(n) selects c_x(n), whose FFT gives the source C_x(k) = log|X(k)|.]

16 Liftering example
By low-pass liftering the cepstrum we obtain the spectral envelope of the signal

17 Liftering example 2
Original waveform has excitation fine structure convolved with resonances
DFT shows harmonics modulated by resonances
Log DFT is the sum of a harmonic comb and resonant bumps
IDFT separates out resonant bumps (low quefrency) and regular, fine structure ("pitch pulse")
Selecting the low-n cepstrum separates resonance information (deconvolution / "liftering")
[Figure: waveform and min. phase IR (samps); abs(dft) and liftered (freq / Hz); log(abs(dft)) and liftered (dB vs. freq / Hz); real cepstrum and lifter with the pitch pulse marked (quefrency)]
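
The liftering steps above can be sketched on a synthetic frame. Everything here is an assumption made for illustration: a naive DFT stands in for the FFT, the frame is built by circular convolution so that the spectra multiply exactly, the excitation is a decaying pulse train with period 16, the resonance is a one-pole response, and all function and variable names are hypothetical.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) DFT / inverse DFT for short illustrative frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def real_cepstrum(frame):
    """IDFT of the log magnitude spectrum (tiny floor guards against log(0))."""
    log_mag = [math.log(abs(v) + 1e-12) for v in dft(frame)]
    return [v.real for v in dft(log_mag, inverse=True)]

def liftered_envelope(cepstrum, cutoff):
    """Low-pass lifter: keep quefrencies below cutoff (and their mirror images),
    then transform back to a smoothed log magnitude spectrum."""
    n = len(cepstrum)
    kept = [c if (q < cutoff or q > n - cutoff) else 0.0
            for q, c in enumerate(cepstrum)]
    return [v.real for v in dft(kept)]

def circular_convolve(e, h):
    n = len(e)
    return [sum(e[m] * h[(t - m) % n] for m in range(n)) for t in range(n)]

n = 64
# Excitation: pulses every 16 samples with decaying amplitude (no spectral zeros).
excitation = [0.8 ** (t // 16) if t % 16 == 0 else 0.0 for t in range(n)]
# Resonance: one-pole impulse response, a smooth spectral bump at low frequency.
impulse_response = [0.5 ** t for t in range(n)]
frame = circular_convolve(excitation, impulse_response)

cepstrum = real_cepstrum(frame)
envelope = liftered_envelope(cepstrum, cutoff=8)
# The largest cepstral value away from the low-quefrency region marks the pitch pulse.
pitch_period = max(range(8, n // 2), key=lambda q: cepstrum[q])
```

In this construction the excitation contributes to the cepstrum only at multiples of the pitch period, so a lifter that keeps quefrencies below 8 returns the log spectrum of the resonance up to a constant offset, while the peak search recovers the period of 16 samples.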

18 Applications - Speech coding
LPC analysis on ~2 ms frames gives a prediction filter A(z) and a residual e[n]
  Recombining them should yield perfect s[n]
  Coding applications further compress e[n]
[Figure: encoder/decoder. Encoder: input s[n] -> LPC analysis -> filter coefficients {a_i} (spectrum |1/A(e^{jω})|) and residual e[n], each represented & encoded. Decoder: excitation generator -> \hat{e}[n] -> all-pole filter $H(z) = \dfrac{1}{1 - \sum_i a_i z^{-i}}$ -> output \hat{s}[n]. Source: E4896 Music Signal Processing (Dan Ellis).]
e.g. simple pitch tracker -> "buzz-hiss" encoding
[Figure: pitch period values with 16 ms frame boundaries, time / s]
Low bitrate speech codec used in cell phones is based on LPC
Quantize LPC filter parameters, use a crude approximation to the residual
Many different ways to represent filter params: prediction coefficients {a_k}, roots of 1 - P(z), line spectral frequencies
Switch between noise and pulse train for excitation
Use a codebook of excitations (CELP: Code Excited Linear Prediction)

19 Applications - Cross-synthesis/Vocoding
[Figure: spectrograms (freq / Hz vs. time / s) of the original (mpgr1_sx419) and of a noise-excited LPC resynthesis with pole freqs]
Reconstruct using excitation from one sound and filter from another
Whisperization: replace excitation with white noise
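
Whisperization of a single frame can be sketched as below. This is a simplified illustration under assumptions, not the procedure used to make the figure: it processes one frame with no overlap-add, estimates the filter with the autocorrelation method, matches the noise to the residual's RMS level, and uses a synthetic "voiced" test frame. All names are hypothetical.

```python
import math
import random

def autocorr(x, max_lag):
    return [sum(x[m] * x[m - k] for m in range(k, len(x))) for k in range(max_lag + 1)]

def levinson_durbin(r, p):
    """Predictor coefficients [a_1..a_p] from autocorrelation r[0..p]."""
    a, err = [], r[0]
    for i in range(1, p + 1):
        k = (r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))) / err
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= 1.0 - k * k
    return a

def inverse_filter(x, a):
    """Residual e[n] = x[n] - sum_k a_k x[n-k]."""
    return [x[n] - sum(a[k - 1] * x[n - k] for k in range(1, len(a) + 1) if n >= k)
            for n in range(len(x))]

def all_pole_filter(e, a):
    """y[n] = e[n] + sum_k a_k y[n-k]."""
    y = []
    for n in range(len(e)):
        y.append(e[n] + sum(a[k - 1] * y[n - k] for k in range(1, len(a) + 1) if n >= k))
    return y

def whisperize(x, order, rng):
    """Keep the LPC filter of x, replace its excitation with white noise."""
    a = levinson_durbin(autocorr(x, order), order)
    e = inverse_filter(x, a)
    gain = math.sqrt(sum(v * v for v in e) / len(e))      # match residual RMS
    noise = [gain * rng.gauss(0.0, 1.0) for _ in e]
    return all_pole_filter(noise, a), a, e

# Synthetic "voiced" frame: pulse train (period 20) through a two-pole resonance.
pulses = [1.0 if n % 20 == 0 else 0.0 for n in range(200)]
voiced = all_pole_filter(pulses, [1.2, -0.5])
whisper, a_est, residual = whisperize(voiced, order=2, rng=random.Random(0))
```

Cross-synthesis follows the same pattern: instead of noise, pass the residual of a second sound through `all_pole_filter` with the coefficients estimated from the first.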

20 Still more applications
Process formants independent of pitch:
  Pitch-shifting while preserving formants
  Shift formants while preserving pitch
  Voice transformation
  Pitch-analysis
[Figure: spectrograms (Frequency vs. Time) of the original and of a warped LPC resynthesis; the warp changes frequencies but not magnitudes. Source: Music Signal Processing (Dan Ellis), [...]edu/~dpwe/resources/matlab/polewarp/]

21 Reading
DAFX, Source-Filter Processing
