E85.267: Lecture 8 Source-Filter Processing E85.267: Lecture 8 Source-Filter Processing 21-4-1 1 / 21
Source-filter analysis/synthesis
[Figure: block diagram: Analysis (spectral envelope + source signal) -> Transformation -> Synthesis]
Separate a signal into:
- Source/excitation: fine time/frequency structure (e.g. pitch)
- Filter: broad spectral shape (resonances)
Similar to subtractive synthesis
Satisfying physical interpretation for real-world signals
Easier to make sense of than e.g. phase
Human speech production
Reasonable approximation to speech signals:
- Source is the oscillation of the vocal cords
  e.g. normal speech (varying pitches) vs. whispering
- Filtered by the vocal tract (throat + tongue + lips)
  e.g. "oooh" vs. "aaah"
  resonances = formants
- Both are time-varying
Source-filter model
[Figure: excitation source (time signal of prediction error e(n) vs. n) and resonance filter (magnitude spectra of X(f) and G H(f) in dB vs. f in kHz)]
Formants in speech
[Figure: speech spectrogram with phonetic transcription; legible word labels: "has", "watch thin as a dime"]
How to separate the source and filter?
[Figure: x(n) -> 1/H1(z) -> e1(n) -> source signal processing -> H2(z) -> y(n); spectral envelope estimation (channel vocoder, LPC, cepstrum) feeds a spectral envelope transformation]
Short-time analysis
For each frame, estimate the spectral envelope (filter response):
1. Channel vocoder (frequency-domain)
2. Linear Predictive Coding (LPC) (time-domain)
3. Cepstral analysis
Source signal is what's left over (the residual) after whitening
Channel vocoder
Like a wideband STFT filterbank, but using relatively few filters:
- Linearly spaced with equal bandwidth (STFT)
- Logarithmically spaced (constant-Q filter bank)
Take the RMS energy in each frequency band
[Figure: (a) x(n) -> BP k -> x_BPk(n) -> ( )^2 -> LP -> x_RMSk(n), for channels 1..k; (b) octave-spaced vs. equally-spaced channel stacking along f]
Channel vocoder using FFT
[Figure: short-time spectrum and spectral envelope; X(f) in dB vs. f in Hz]
Lowpass filter the magnitude of each STFT frame, i.e. filter the columns of the spectrogram
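The smoothing step can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the helper names (`dft_magnitude`, `smooth_across_frequency`), the moving-average smoother, and the toy pulse-train frame are all assumptions. A naive DFT keeps the sketch dependency-free; an FFT routine would be the usual choice in practice.

```python
import cmath
import math

def dft_magnitude(frame):
    """Magnitude of the DFT of one frame (naive O(N^2) DFT, bins 0..N/2)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags

def smooth_across_frequency(mags, width=5):
    """Moving-average lowpass along the frequency axis (shorter windows at the edges)."""
    half = width // 2
    out = []
    for k in range(len(mags)):
        lo, hi = max(0, k - half), min(len(mags), k + half + 1)
        out.append(sum(mags[lo:hi]) / (hi - lo))
    return out

# Toy frame (assumed): Hann-windowed pulse train with an 8-sample period,
# so the magnitude spectrum shows harmonics every 8 bins.
N = 64
frame = [(1.0 if t % 8 == 0 else 0.0) * (0.5 - 0.5 * math.cos(2 * math.pi * t / N))
         for t in range(N)]
mags = dft_magnitude(frame)
envelope = smooth_across_frequency(mags, width=9)
```

The smoothing width trades detail for smoothness: a window at least as wide as the harmonic spacing averages out the harmonic peaks and leaves only the broad shape.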
Linear predictive coding
Predict the next input sample as a linear combination of previous samples
[Figure: FIR predictor: x(n) passes through delays z^-1 with taps a_1, a_2, ..., a_p to give x̂(n); e(n) = x(n) - x̂(n)]
Filter is described by a few filter coefficients for each frame:
  $\hat{x}[n] = \sum_{k=1}^{p} a_k x[n-k]$
Excitation is what's left after filtering (the residual, aka prediction error):
  $e[n] = x[n] - \hat{x}[n] = x[n] - \sum_{k=1}^{p} a_k x[n-k]$
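A minimal sketch of the prediction-error computation for one frame, assuming the coefficients a_k are already known and that samples before the start of the frame are zero. The function name `lpc_residual` and the toy signal are illustrative only.

```python
def lpc_residual(x, a):
    """e[n] = x[n] - sum_{k=1..p} a[k-1] * x[n-k], taking x[n] = 0 for n < 0."""
    p = len(a)
    e = []
    for n in range(len(x)):
        prediction = sum(a[k - 1] * x[n - k] for k in range(1, p + 1) if n - k >= 0)
        e.append(x[n] - prediction)
    return e

# Toy signal (assumed): x[n] = 0.9 * x[n-1] exactly after the first sample,
# so a one-tap predictor with a_1 = 0.9 leaves only the initial impulse.
x = [0.9 ** n for n in range(8)]
e = lpc_residual(x, a=[0.9])
```

Here the residual is (numerically) a single impulse at n = 0: the predictor has removed everything it could explain, which is the "whitening" view of LPC analysis.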
LPC analysis/synthesis
[Figure: (a) LPC analysis: x(n) minus x̂(n) = output of P(z) gives e(n); (b) LPC synthesis: ẽ(n) plus feedback through P(z) gives y(n)]
P(z) is just an FIR filter:
  $P(z) = \sum_{k=1}^{p} a_k z^{-k}$
Excitation is still a filtered version of the input:
  $E(z) = X(z)\,(1 - P(z))$
For synthesis, pass the (approximate) excitation through the inverse filter:
  $Y(z) = \tilde{E}(z) H(z), \qquad H(z) = \frac{1}{1 - P(z)}$
All-pole, autoregressive (AR) modeling
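The synthesis side can be sketched as a direct recursion implementing H(z) = 1 / (1 - P(z)). This is an illustrative sketch: the name `lpc_synthesis` and the zero initial filter state are assumptions.

```python
def lpc_synthesis(excitation, a):
    """All-pole filter: y[n] = e[n] + sum_{k=1..p} a[k-1] * y[n-k], zero initial state."""
    y = []
    for n, e_n in enumerate(excitation):
        feedback = sum(a[k - 1] * y[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
        y.append(e_n + feedback)
    return y

# Driving the filter with a unit impulse returns its impulse response;
# for a_1 = 0.9 that is y[n] = 0.9 ** n.
impulse = [1.0] + [0.0] * 7
y = lpc_synthesis(impulse, a=[0.9])
```

Because the recursion feeds back its own output, the filter is only stable when all roots of 1 - P(z) lie inside the unit circle.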
LPC - varying filter order
LPC filter H(z) models the spectrum of x[n]
Minimizing the energy of the residual e[n] gives the optimal coefficients:
  $\{a_k\} = \arg\min_{a_k} \sum_n \left( x[n] - \sum_k a_k x[n-k] \right)^2$
The approximation improves with increasing filter order p
[Figure: spectra of original and LPC filters for several filter orders p; X(f) in dB vs. f in kHz]
Estimating LPC parameters
Set the derivative of $\sum_n e^2[n]$ w.r.t. $a_k$ to zero and solve for $a_k$:
  $\frac{\partial}{\partial a_k} \sum_n e^2[n] = 0$
End up with p linear equations (one per k = 1, ..., p) involving autocorrelations of x:
  $\sum_m x[m]\, x[m-k] = \sum_i a_i \sum_m x[m-i]\, x[m-k]$
Solve using the Levinson-Durbin recursion
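A sketch of the autocorrelation method with the Levinson-Durbin recursion, under stated assumptions: the frame is treated as zero outside its bounds, and the function names and the toy signal (a decaying sinusoid) are illustrative only. The recursion also returns the residual energy at the chosen order, which makes the effect of the filter order visible.

```python
import math

def autocorr(x, max_lag):
    """r[k] = sum_m x[m] * x[m-k] for k = 0..max_lag (frame assumed zero outside)."""
    n = len(x)
    return [sum(x[m] * x[m - k] for m in range(k, n)) for k in range(max_lag + 1)]

def levinson_durbin(r, p):
    """Solve sum_i a_i r[|k-i|] = r[k], k = 1..p. Returns ([a_1..a_p], residual energy)."""
    a = []
    err = r[0]
    for i in range(1, p + 1):
        # Reflection coefficient for order i, using the order-(i-1) solution.
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / err
        # Update the lower-order coefficients, then append a_i = k.
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= 1.0 - k * k
    return a, err

# Toy frame (assumed): decaying sinusoid, which is close to a two-pole resonance.
x = [0.95 ** n * math.sin(0.3 * n) for n in range(200)]
r = autocorr(x, 4)
errors = {p: levinson_durbin(r, p)[1] for p in (1, 2, 4)}
```

Each step multiplies the residual energy by (1 - k^2), so the error can only shrink as the order grows, consistent with the approximation improving with p.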
LPC example
[Figure: windowed original and LPC residual (time / samp); original spectrum, LPC spectrum, and residual spectrum (dB vs. freq / Hz); filter poles in the z-plane]
Short-time LPC analysis
Solve LPC for each ~20 ms frame
[Figure: spectrograms (freq / kHz vs. time / s) and a z-plane pole plot (real part vs. imaginary part)]
Cepstral analysis
cepstrum = String.reverse("spec") + "trum"
Entire lexicon of funny anagrams
Insight: source and filter add in the log spectral domain, which makes them easy to separate:
  $X(z) = E(z) H(z) \;\Rightarrow\; \log X(z) = \log E(z) + \log H(z)$
[Figure: real cepstrum and spectral envelope: y(n) = x(n) * h(n) -> FFT -> Y(k) -> log|Y(k)| -> IFFT -> c(n); low-pass window w_LP(n) gives c_h(n) -> FFT -> C_h(k) = log|H(k)| (envelope); high-pass window w_HP(n) gives c_x(n) -> FFT -> C_x(k) = log|X(k)| (source)]
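The additivity can be checked numerically. This is a minimal sketch with assumed helper names (`dft`, `real_cepstrum`, `circular_convolve`) and toy signals; it uses circular convolution so that the DFT product relation holds exactly, and a naive DFT to stay dependency-free (an FFT would be the usual choice).

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) DFT (or inverse DFT) of a sequence; fine for short frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def real_cepstrum(x):
    """c[n] = IDFT(log |DFT(x)|); assumes no DFT bin is exactly zero."""
    log_mag = [math.log(abs(v)) for v in dft(x)]
    return [v.real for v in dft(log_mag, inverse=True)]

def circular_convolve(e, h):
    """Circular convolution, so that DFT(x) = DFT(e) * DFT(h) bin by bin."""
    n = len(e)
    return [sum(e[m] * h[(t - m) % n] for m in range(n)) for t in range(n)]

# Toy source and filter (assumed): two pulses, and a decaying impulse response.
N = 16
excitation = [1.0 if t == 0 else 0.6 if t == 4 else 0.0 for t in range(N)]
impulse_response = [0.5 ** t for t in range(N)]
x = circular_convolve(excitation, impulse_response)

c_x = real_cepstrum(x)
c_e = real_cepstrum(excitation)
c_h = real_cepstrum(impulse_response)
```

Up to rounding, `c_x[n]` equals `c_e[n] + c_h[n]` for every n: convolution in time has become addition in the cepstral domain.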
Liftering example
By low-pass liftering the cepstrum, we obtain the spectral envelope of the signal
Liftering example 2
- Original waveform has excitation fine structure convolved with resonances
- DFT shows harmonics modulated by resonances
- Log DFT is the sum of a harmonic comb and resonant bumps
- IDFT separates out resonant bumps (low quefrency) and regular fine structure ("pitch pulse")
- Selecting the low-n cepstrum separates out the resonance information (deconvolution / "liftering")
[Figure: waveform and min. phase IR (samps); abs(dft) and liftered (freq / Hz); log(abs(dft)) and liftered (dB vs. freq / Hz); real cepstrum and lifter, with the pitch pulse marked (quefrency)]
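The liftering step can be sketched on a synthetic log spectrum. Everything here is an assumption made for illustration: the log spectrum is built directly as a slow cosine ("resonant bump") plus a fast cosine ("harmonic comb"), and the helper names and cutoff are hypothetical. A naive DFT keeps the sketch dependency-free.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) DFT (or inverse DFT) of a sequence; fine for short frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def lowpass_lifter(cepstrum, cutoff):
    """Keep quefrencies 0..cutoff and their mirror images; zero everything else."""
    n = len(cepstrum)
    return [c if (q <= cutoff or q >= n - cutoff) else 0.0
            for q, c in enumerate(cepstrum)]

N = 32
smooth = [math.cos(2 * math.pi * k / N) for k in range(N)]            # broad shape
ripple = [0.3 * math.cos(2 * math.pi * 8 * k / N) for k in range(N)]  # fine structure
log_spectrum = [s + r for s, r in zip(smooth, ripple)]

cepstrum = [v.real for v in dft(log_spectrum, inverse=True)]  # IDFT of log spectrum
liftered = lowpass_lifter(cepstrum, cutoff=4)                 # select low quefrencies
log_envelope = [v.real for v in dft(liftered)]                # back to log spectrum
```

The fast ripple lands at quefrency 8 (and its mirror), above the cutoff, so `log_envelope` recovers only the smooth component. With real speech the cutoff would need to sit below the quefrency of the pitch pulse.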
Applications - Speech coding
LP analysis on ~20 ms frames gives prediction filter A(z) and residual e[n]
- Recombining them should yield perfect s[n]
- Coding applications further compress e[n], e.g. simple pitch tracker -> "buzz-hiss" encoding
[Figure: encoder: input s[n] -> LPC analysis -> filter coefficients {a_i} (spectrum 1/A(e^{jω})) and residual e[n], each represented & encoded; decoder: excitation generator -> ê[n] -> all-pole filter $H(z) = \frac{1}{1 - \sum_i a_i z^{-i}}$ -> output ŝ[n]; lower panel: pitch period values and 16 ms frame boundaries vs. time / s. Figure from E4896 Music Signal Processing (Dan Ellis)]
Low bitrate speech codec used in cell phones is based on LPC
Quantize LPC filter parameters, use a crude approximation to the residual
- Many different ways to represent filter params: prediction coefficients {a_k}, roots of 1 - P(z), line spectral frequencies
- Switch between noise and pulse train for excitation
- Use a codebook of excitations (CELP: Code Excited Linear Prediction)
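A minimal sketch of a "buzz-hiss" excitation generator, assuming a per-frame voiced/unvoiced decision and a pitch period in samples are already available (e.g. from a pitch tracker). The function name, parameters, and unit gains are hypothetical; a real codec would also transmit and apply a gain.

```python
import random

def buzz_hiss_excitation(n_samples, voiced, pitch_period=80, seed=0):
    """Pulse train ("buzz") for voiced frames, white Gaussian noise ("hiss") otherwise."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

# One voiced and one unvoiced frame of 160 samples each.
buzz = buzz_hiss_excitation(160, voiced=True, pitch_period=80)
hiss = buzz_hiss_excitation(160, voiced=False)
```

Either excitation would then be passed through the decoder's all-pole filter in place of the true residual, which is why only the filter parameters, the voicing flag, and the pitch period need to be encoded.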
Applications - Cross-synthesis/Vocoding
[Figure: spectrograms (freq / Hz vs. time / s): original (mpgr1_sx419) and noise-excited LPC resynthesis with pole freqs]
Reconstruct using excitation from one sound and filter from another
Whisperization: replace excitation with white noise
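Cross-synthesis for a single frame can be sketched as "whiten with one filter, re-color with another". This is an illustrative sketch only: the function names are hypothetical, both sets of LPC coefficients are assumed to have been estimated already, and filter state starts at zero.

```python
def lpc_residual(x, a):
    """Analysis: e[n] = x[n] - sum_k a[k-1] * x[n-k], with x[n] = 0 for n < 0."""
    return [x[n] - sum(a[k - 1] * x[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
            for n in range(len(x))]

def all_pole_filter(e, a):
    """Synthesis: y[n] = e[n] + sum_k a[k-1] * y[n-k], zero initial state."""
    y = []
    for n, e_n in enumerate(e):
        y.append(e_n + sum(a[k - 1] * y[n - k]
                           for k in range(1, len(a) + 1) if n - k >= 0))
    return y

def cross_synthesize(x_source, a_source, a_filter):
    """Excitation from one sound (x_source, a_source), filter from another (a_filter)."""
    return all_pole_filter(lpc_residual(x_source, a_source), a_filter)

# Toy frame (assumed): a signal with envelope a_1 = 0.9 re-colored with a_1 = 0.5.
x_source = [0.9 ** n for n in range(8)]
y = cross_synthesize(x_source, a_source=[0.9], a_filter=[0.5])
```

Using the same coefficients for both arguments returns the original frame. Whisperization follows the same pattern, with white noise passed to `all_pole_filter` instead of the residual.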
Still more applications
Process formants independent of pitch
- Pitch-shifting while preserving formants
- Shift formants while preserving pitch
  http://www.ee.columbia.edu/~dpwe/resources/matlab/polewarp/
Voice transformation
Pitch-analysis
[Figure: spectrograms (frequency vs. time) of the original and of a warped LPC resynthesis; the warp changes pole frequencies but not magnitudes. Figure from Dan Ellis]
Reading
DAFX 9.1-9.3 - Source-Filter Processing