E85.267: Lecture 8 Source-Filter Processing
21-4-1

Source-filter analysis/synthesis
[Figure: analysis, transformation and synthesis stages operating on the spectral envelope and the source signal.]
Separate:
  Source/excitation: fine time/frequency structure (e.g. pitch)
  Filter: broad spectral shape (resonances)
Similar to subtractive synthesis
Satisfying physical interpretation for real-world signals
Easier to make sense of than e.g. phase

Human speech production
Reasonable approximation to speech signals:
  Source is the oscillation of the vocal cords
    e.g. normal speech (varying pitches) vs whispering
  Filtered by the vocal tract (throat + tongue + lips)
    e.g. "oooh" vs "aaah"
    resonances = formants
Both are time-varying

Source filter model
[Figure: excitation source followed by resonance filter. Left: time signal of the prediction error e(n) against n. Right: magnitude spectra X(f) and G·H(f) in dB against f/kHz.]

Formants in speech
[Figure: spectrogram annotated with phone labels and the words "has", "a", "watch", "thin", "as", "a", "dime".]

How to separate the source and filter?
[Figure: block diagram from source signal x(n) to output y(n) with source signal processing in between; spectral envelope estimation (channel vocoder, LPC or cepstrum) feeds a spectral envelope transformation.]
Short-time analysis
For each frame, estimate the spectral envelope (filter response):
  1. Channel vocoder (frequency-domain)
  2. Linear Predictive Coding (LPC) (time-domain)
  3. Cepstral analysis
The source signal is what's left over (the residual) after whitening

Channel vocoder
Wideband STFT filterbank, but using relatively few filters:
  Linearly spaced with equal bandwidth (STFT)
  Logarithmically spaced (constant-Q filter bank)
Take the RMS energy in each frequency band
[Figure: (a) x(n) feeds bandpass filters BP 1 ... BP k; each output x_BPk(n) is squared and lowpass filtered (LP) to give x_RMSk(n). (b) Octave-spaced and equally-spaced channel stacking of BP 1 ... BP k along f.]
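The band-energy measurement can be sketched with an FFT standing in for the bank of bandpass filters. This is only a minimal illustration under stated assumptions: the function name `band_rms`, the 8 kHz sample rate, the frame length, the FFT size and the number of bands are all made up for the example, and only the equally spaced layout is shown.

```python
import numpy as np

def band_rms(frame, n_bands=8, n_fft=1024):
    """RMS spectral magnitude in n_bands equal-width bands of one frame."""
    windowed = frame * np.hanning(len(frame))
    mag = np.abs(np.fft.rfft(windowed, n_fft))
    # Equal-bandwidth channels: split the FFT bins into contiguous groups.
    bands = np.array_split(mag, n_bands)
    return np.array([np.sqrt(np.mean(b ** 2)) for b in bands])

fs = 8000                                   # assumed sample rate in Hz
n = np.arange(512)
tone = np.sin(2 * np.pi * 1250 * n / fs)    # test tone at 1250 Hz
energies = band_rms(tone)
loudest_band = int(np.argmax(energies))     # index of the band holding the tone
```

With eight bands covering 0 to 4 kHz, each band is roughly 500 Hz wide, so the 1250 Hz tone should land in the third band (index 2). A logarithmically spaced version would only change how the bins are grouped.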

Channel vocoder using FFT
[Figure: short-time spectrum and spectral envelope, X(f)/dB against f/Hz.]
Lowpass filter the magnitude of each STFT frame, i.e. filter the columns of the spectrogram
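One way to picture "lowpass filtering a spectrogram column" is a moving average along the frequency axis. The sketch below assumes a moving-average smoother, an odd `smooth_bins` width and a synthetic pulse-train frame; none of these choices come from the slides.

```python
import numpy as np

def smoothed_envelope(frame, n_fft=1024, smooth_bins=31):
    """Smooth one STFT magnitude frame along frequency (smooth_bins must be odd)."""
    windowed = frame * np.hanning(len(frame))
    mag = np.abs(np.fft.rfft(windowed, n_fft))
    half = smooth_bins // 2
    # Reflect-pad so the band edges are not pulled toward zero.
    padded = np.pad(mag, half, mode="reflect")
    kernel = np.ones(smooth_bins) / smooth_bins
    envelope = np.convolve(padded, kernel, mode="valid")
    return mag, envelope

frame = np.zeros(512)
frame[::80] = 1.0            # pulse train, period 80 samples (assumed test signal)
mag, envelope = smoothed_envelope(frame)
```

The raw magnitude `mag` shows the harmonic comb of the pulse train, while `envelope` keeps only the slowly varying shape. A wider `smooth_bins` gives a smoother but less detailed envelope.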

Linear predictive coding
Predict the next input sample as a linear combination of previous samples:
$$\hat{x}[n] = \sum_{k=1}^{p} a_k x[n-k]$$
The filter is described by a few filter coefficients for each frame
The excitation is what's left after filtering (the residual, a.k.a. prediction error):
$$e[n] = x[n] - \hat{x}[n] = x[n] - \sum_{k=1}^{p} a_k x[n-k]$$
[Figure: prediction filter built from unit delays $z^{-1}$ with taps $a_1, a_2, \dots, a_p$; the prediction $\hat{x}(n)$ is subtracted from $x(n)$ to give $e(n)$.]
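The prediction-error formula translates almost directly into code. In this sketch the function name `lpc_residual` and the toy signal are assumptions, and samples before the start of the signal are treated as zero.

```python
import numpy as np

def lpc_residual(x, a):
    """e[n] = x[n] - sum_{k=1..p} a[k-1] * x[n-k], taking x[n] = 0 for n < 0."""
    x = np.asarray(x, dtype=float)
    e = x.copy()
    for k, a_k in enumerate(a, start=1):
        e[k:] -= a_k * x[:-k]    # subtract the contribution of the k-th past sample
    return e

# A decaying exponential is perfectly predicted by a single coefficient a_1 = 0.9.
x = 0.9 ** np.arange(8)
e = lpc_residual(x, [0.9])
```

For this toy signal the residual is zero everywhere except at the first sample, where there is no past sample to predict from.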

LPC analysis/synthesis
[Figure: (a) LPC analysis: the prediction $\hat{x}(n)$ from $P(z)$ is subtracted from $x(n)$ to give $e(n)$. (b) LPC synthesis: $\tilde{e}(n)$ drives a feedback loop through $P(z)$ to give $y(n)$.]
$P(z)$ is just an FIR filter:
$$P(z) = \sum_{k=1}^{p} a_k z^{-k}$$
The excitation is still a filtered version of the input:
$$E(z) = X(z)\,(1 - P(z))$$
For synthesis, pass the (approximate) excitation through the inverse filter:
$$Y(z) = \tilde{E}(z) H(z), \qquad H(z) = \frac{1}{1 - P(z)}$$
All-pole, autoregressive (AR) modeling
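A small round trip illustrates that analysis with $1 - P(z)$ and synthesis with $1/(1 - P(z))$ undo each other. The function names, the random test signal and the coefficient values are assumptions chosen for the example.

```python
import numpy as np

def lpc_analysis(x, a):
    """E(z) = X(z) (1 - P(z)): FIR filtering with taps [1, -a_1, ..., -a_p]."""
    taps = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    return np.convolve(x, taps)[:len(x)]

def lpc_synthesis(e, a):
    """Y(z) = E(z) / (1 - P(z)): y[n] = e[n] + sum_k a_k y[n-k]."""
    y = np.zeros(len(e))
    for n in range(len(e)):
        y[n] = e[n]
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:
                y[n] += a_k * y[n - k]   # feedback from earlier outputs
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(200)             # arbitrary test signal
a = [1.3, -0.6]                          # assumed coefficients (stable all-pole filter)
e = lpc_analysis(x, a)
y = lpc_synthesis(e, a)                  # should reproduce x
```

When the excitation is passed through unchanged, `y` matches `x` up to rounding error. Coding and transformation applications replace `e` with an approximation before synthesis.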

LPC - varying filter order
The LPC filter $H(z)$ models the spectrum of $x[n]$
Minimizing the energy of the residual $e[n]$ gives the optimal coefficients:
$$\{a_k\} = \arg\min_{a_k} \sum_n \Big( x[n] - \sum_k a_k x[n-k] \Big)^2$$
The approximation improves with increasing filter order $p$
[Figure: spectra of the original and of LPC filters of several orders p, |X(f)|/dB against f/kHz.]

Estimating LPC parameters
Set the derivative of $\sum_n e^2[n]$ w.r.t. $a_k$ to zero and solve for $a_k$:
$$\frac{\partial}{\partial a_k} \sum_n e^2[n] = 0$$
End up with $p$ linear equations involving autocorrelations of $x$:
$$\sum_m x[m]\,x[m-k] = \sum_i a_i \sum_m x[m-i]\,x[m-k], \qquad k = 1, \dots, p$$
Solve using the Levinson-Durbin recursion
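The recursion can be sketched as follows for the predictor convention $\hat{x}[n] = \sum_k a_k x[n-k]$. The helper names and the synthetic AR(2) test signal are assumptions, and the autocorrelation is the plain (unwindowed) estimate.

```python
import numpy as np

def autocorrelation(x, p):
    """r[k] = sum_m x[m] x[m-k] for k = 0..p."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(p + 1)])

def levinson_durbin(r, p):
    """Solve sum_i a_i r[|k-i|] = r[k], k = 1..p; returns (a, residual energy)."""
    a = np.zeros(p)
    err = r[0]
    for i in range(p):
        # Reflection coefficient for order i + 1.
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err
        a_new = a.copy()
        a_new[i] = k
        a_new[:i] = a[:i] - k * a[:i][::-1]
        a = a_new
        err *= 1.0 - k * k
    return a, err

# Synthetic AR(2) signal: x[n] = 1.3 x[n-1] - 0.6 x[n-2] + white noise.
rng = np.random.default_rng(0)
w = rng.standard_normal(5000)
x = np.zeros(5000)
for n in range(5000):
    x[n] = w[n]
    if n >= 1:
        x[n] += 1.3 * x[n - 1]
    if n >= 2:
        x[n] -= 0.6 * x[n - 2]

r = autocorrelation(x, 2)
a_hat, err = levinson_durbin(r, 2)       # should be close to [1.3, -0.6]
```

The recursion exploits the Toeplitz structure of the autocorrelation matrix, so it needs on the order of $p^2$ operations rather than the $p^3$ of a general linear solve.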

LPC example
[Figure: windowed original and LPC residual against time / samp; original spectrum, LPC spectrum and residual spectrum in dB against freq / Hz; filter poles in the z-plane.]

Short-time LPC analysis
Solve LPC for each ~2 ms frame
[Figure: spectrograms (freq / kHz against time / s) and LPC filter poles in the z-plane (imaginary part against real part).]

Cepstral analysis
cepstrum = String.reverse("spec") + "trum"
Entire lexicon of funny anagrams
Insight: source and filter add in the log spectral domain, which makes them easy to separate:
$$X(z) = E(z) H(z) \quad\Rightarrow\quad \log X(z) = \log E(z) + \log H(z)$$
[Figure: real cepstrum. y(n) = x(n) * h(n) goes through FFT, log and IFFT to give c(n). A low-pass window w_LP(n) gives c_h(n), whose FFT is the spectral envelope C_h(k) = log H(k); a high-pass window w_HP(n) gives c_x(n), whose FFT is the source envelope C_x(k) = log X(k).]

Liftering example
By low-pass liftering the cepstrum we obtain the spectral envelope of the signal

Liftering example 2
Original waveform has excitation fine structure convolved with resonances
DFT shows harmonics modulated by resonances
Log DFT is the sum of a harmonic comb and resonant bumps
IDFT separates out resonant bumps (low quefrency) and regular, fine structure ("pitch pulse")
Selecting the low-n cepstrum separates the resonance information (deconvolution / "liftering")
[Figure: four panels. Waveform and min. phase IR (samps); abs(dft) and liftered (freq / Hz); log(abs(dft)) and liftered in dB (freq / Hz); real cepstrum and lifter with the pitch pulse marked (quefrency).]
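The sequence of steps above (DFT, log magnitude, IDFT, keep the low-quefrency part, DFT again) can be sketched in a few lines. The function name `cepstral_envelope`, the cut-off `n_keep`, the 8 kHz sample rate and the synthetic "pulse train through one resonance" frame are all assumptions for illustration.

```python
import numpy as np

def cepstral_envelope(frame, n_fft=1024, n_keep=30):
    """Return (log magnitude, real cepstrum, low-pass liftered log envelope)."""
    spectrum = np.fft.fft(frame * np.hanning(len(frame)), n_fft)
    log_mag = np.log(np.abs(spectrum) + 1e-12)    # small offset avoids log(0)
    cepstrum = np.fft.ifft(log_mag).real          # real cepstrum
    lifter = np.zeros(n_fft)
    lifter[:n_keep] = 1.0                         # keep low quefrencies ...
    if n_keep > 1:
        lifter[-(n_keep - 1):] = 1.0              # ... and their mirror image
    log_env = np.fft.fft(cepstrum * lifter).real
    return log_mag, cepstrum, log_env

# Synthetic voiced frame: pulse train (period 80 samples) through one resonance.
fs = 8000                                         # assumed sample rate in Hz
excitation = np.zeros(512)
excitation[::80] = 1.0
radius, theta = 0.9, 2 * np.pi * 1000 / fs        # assumed resonance near 1 kHz
frame = np.zeros(512)
for n in range(512):
    frame[n] = excitation[n]
    if n >= 1:
        frame[n] += 2 * radius * np.cos(theta) * frame[n - 1]
    if n >= 2:
        frame[n] -= radius * radius * frame[n - 2]

log_mag, cepstrum, log_env = cepstral_envelope(frame)
# The "pitch pulse" shows up as a cepstral peak near the pitch period.
pitch_quefrency = 60 + int(np.argmax(cepstrum[60:100]))
```

Here `log_env` is the smooth resonance curve and the peak in `cepstrum` near quefrency 80 reflects the pitch period. Keeping the high-quefrency part instead would isolate the excitation fine structure.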

Applications - Speech coding
Low bitrate speech codec used in cell phones is based on LPC
Quantize the LPC filter parameters, use a crude approximation to the residual
Many different ways to represent filter params: prediction coefficients $\{a_k\}$, roots of $1 - P(z)$, line spectral frequencies
Switch between noise and pulse train for excitation
Use a codebook of excitations (CELP: Code Excited Linear Prediction)
[Figure (from E4896 Music Signal Processing, Dan Ellis): LP analysis on ~2ms frames gives prediction filter A(z) and residual e[n]; recombining them should yield perfect s[n]; coding applications further compress e[n]. Encoder: input s[n] goes through LPC analysis; the filter coefficients {a_i} and the residual e[n] are each represented and encoded. Decoder: an excitation generator drives the all-pole filter $H(z) = 1 / (1 - \sum_i a_i z^{-i})$ to give the output $\hat{s}[n]$. E.g. simple pitch tracker gives "buzz-hiss" encoding. Plot: pitch period values and 16 ms frame boundaries against time / s.]
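The "switch between noise and pulse train" idea can be illustrated with a tiny excitation generator. The function name, its parameters and the per-frame voiced flag are hypothetical, and gain matching to the true residual is left out of this sketch.

```python
import numpy as np

def buzz_hiss_excitation(n_samples, voiced, pitch_period=80, rng=None):
    """Pulse train ("buzz") for voiced frames, white noise ("hiss") otherwise."""
    if voiced:
        e = np.zeros(n_samples)
        e[::pitch_period] = 1.0          # one pulse per pitch period
        return e
    rng = np.random.default_rng() if rng is None else rng
    return rng.standard_normal(n_samples)

buzz = buzz_hiss_excitation(160, voiced=True, pitch_period=40)
hiss = buzz_hiss_excitation(160, voiced=False, rng=np.random.default_rng(0))
```

A decoder of this kind only needs the filter coefficients, a voiced/unvoiced decision and a pitch period per frame, which is why the approximation to the residual is so cheap to transmit.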

Applications - Cross-synthesis/Vocoding
[Figure: spectrograms (freq / Hz against time / s) of the original (mpgr1_sx419) and of a noise-excited LPC resynthesis with pole freqs.]
Reconstruct using the excitation from one sound and the filter from another
Whisperization: replace the excitation with white noise
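Whisperization of a single frame might look like the sketch below: estimate the LPC filter, discard the residual, and drive the all-pole filter with white noise of the same RMS. Everything here is an assumption made for illustration (function names, order `p`, the direct linear solve in place of Levinson-Durbin, the synthetic frame); a real system would process overlapping frames.

```python
import numpy as np

def lpc_coefficients(frame, p):
    """Autocorrelation-method LPC via a direct solve of the normal equations."""
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:])

def whisperize(frame, p=6, seed=0):
    a = lpc_coefficients(frame, p)
    taps = np.concatenate(([1.0], -a))
    residual = np.convolve(frame, taps)[:len(frame)]      # original excitation
    noise = np.random.default_rng(seed).standard_normal(len(frame))
    noise *= np.sqrt(np.mean(residual ** 2) / np.mean(noise ** 2))  # match RMS
    out = np.zeros(len(frame))
    for n in range(len(frame)):                           # all-pole synthesis
        out[n] = noise[n]
        for k in range(1, p + 1):
            if n - k >= 0:
                out[n] += a[k - 1] * out[n - k]
    return out, residual, noise

# Synthetic voiced frame: pulse train through a resonance, plus a little noise.
rng = np.random.default_rng(1)
voiced = np.zeros(512)
voiced[::80] = 1.0
for n in range(2, 512):
    voiced[n] += 1.2 * voiced[n - 1] - 0.8 * voiced[n - 2]
frame = np.hanning(512) * (voiced + 0.01 * rng.standard_normal(512))
out, residual, noise = whisperize(frame)
```

Cross-synthesis follows the same pattern, except that the excitation is the residual of a second sound rather than noise.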

Still more applications
Process formants independent of pitch:
  Pitch-shifting while preserving formants
  Shift formants while preserving pitch
    http://www.ee.columbia.edu/~dpwe/resources/matlab/polewarp/
  Voice transformation
  Pitch-analysis
[Figure (from Dan Ellis): spectrograms (frequency against time) of the original and of a warped LPC resynthesis, α = -0.2; the warp changes pole frequencies but not magnitudes.]
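Shifting formants while leaving the excitation alone can be sketched by moving the LPC poles in angle and keeping their radii. The linear angle scaling below is an assumption chosen for simplicity; it is not the warping used by the polewarp code linked above, and the function name is hypothetical.

```python
import numpy as np

def warp_pole_angles(a, factor):
    """Scale the angles of the complex poles of 1 / (1 - sum_k a_k z^-k)."""
    poly = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    poles = np.roots(poly)
    radius, angle = np.abs(poles), np.angle(poles)
    is_complex = np.abs(poles.imag) > 1e-9
    # Real poles are left untouched so the coefficients stay real.
    new_angle = np.where(is_complex, np.clip(angle * factor, -np.pi, np.pi), angle)
    warped = radius * np.exp(1j * new_angle)
    new_poly = np.real(np.poly(warped))
    return -new_poly[1:]

# One resonance: pole pair at radius 0.9, angle 0.5 rad.
a = [2 * 0.9 * np.cos(0.5), -0.81]
a_warped = warp_pole_angles(a, 1.2)       # pole pair moves to angle 0.6 rad
```

Resynthesizing with `a_warped` and the original residual moves the resonance frequencies while the pitch, carried by the residual, is unchanged.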

Reading
DAFX 9.1-9.3: Source-Filter Processing