Chapter 7. Frequency-Domain Representations of Speech Signals

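General Discrete-Time Model of Speech Production

Voiced speech: $A_V\,P(z)\,G(z)\,V(z)\,R(z)$
Unvoiced speech: $A_N\,N(z)\,V(z)\,R(z)$

Here $A_V$ and $A_N$ are gains, $P(z)$ is the periodic (pitch) excitation, $G(z)$ is the glottal pulse model, $N(z)$ is the noise excitation, $V(z)$ is the vocal tract, and $R(z)$ is the radiation at the lips.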
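To make the model concrete, the following minimal sketch synthesizes a voiced and an unvoiced signal by passing an excitation through a cascade that stands in for $G(z)$, $V(z)$ and $R(z)$. Every numeric choice is an assumption made for illustration: the 8 kHz sampling rate, the 100 Hz pitch, the two-pole glottal lowpass at radius 0.96, the three formant frequencies and bandwidths, and the first-difference radiation model $R(z) = 1 - z^{-1}$. The helper names are hypothetical.

```python
import numpy as np

def all_pole(x, a):
    """Filter x through 1/A(z); a = [1, a1, ..., ap] (direct-form recursion)."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for i in range(1, len(a)):
            if n - i >= 0:
                acc -= a[i] * y[n - i]
        y[n] = acc
    return y

def resonator(freq_hz, bw_hz, fs):
    """A(z) coefficients for one conjugate pole pair (one formant)."""
    r = np.exp(-np.pi * bw_hz / fs)
    theta = 2 * np.pi * freq_hz / fs
    return [1.0, -2 * r * np.cos(theta), r * r]

# Hypothetical formants (center Hz, bandwidth Hz) standing in for V(z).
FORMANTS = [(500, 60), (1500, 90), (2500, 120)]

def vocal_tract_and_radiation(e, fs):
    y = e
    for f, bw in FORMANTS:                               # V(z): cascade of resonators
        y = all_pole(y, resonator(f, bw, fs))
    return np.concatenate(([y[0]], np.diff(y)))          # R(z) = 1 - z^{-1}

def voiced(fs, f0, dur, av=1.0):
    n = int(fs * dur)
    p = np.zeros(n)
    p[:: int(fs / f0)] = 1.0                             # P(z): impulse train at the pitch period
    g = all_pole(p, [1.0, -2 * 0.96, 0.96 ** 2])         # G(z): assumed two-pole glottal lowpass
    return av * vocal_tract_and_radiation(g, fs)

def unvoiced(fs, dur, an=0.1, seed=0):
    noise = np.random.default_rng(seed).standard_normal(int(fs * dur))   # N(z): white noise
    return an * vocal_tract_and_radiation(noise, fs)

v = voiced(8000, 100, 0.5)
u = unvoiced(8000, 0.5)
```

Every filter in the cascade is stable, so once the start-up transient has decayed the voiced output repeats exactly with the pitch period (80 samples here). The unvoiced output has the same spectral envelope but no periodic structure.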

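DTFT and DFT of Speech

The DTFT and the $L$-point DFT of a speech signal $x[n]$ can be computed as
$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x[n]\,e^{-j\omega n}, \qquad X[k] = \sum_{n=0}^{L-1} x[n]\,e^{-j(2\pi k/L)n}, \quad k = 0, 1, \ldots, L-1.$
For a signal of at most $L$ samples, the DFT samples the DTFT at $\omega_k = 2\pi k/L$. Using a value of $L = 25000$ gives the following plot.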

[Figure: 25000-point DFT of speech; panels: magnitude and log magnitude (dB).]
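Here is a minimal sketch of that computation. It assumes NumPy and uses a synthetic stand-in signal, because the speech samples behind the figure are not available. `np.fft.fft(x, n=L)` zero-pads to `L` points, and each bin equals the DTFT evaluated at $\omega_k = 2\pi k/L$. The helper names are hypothetical.

```python
import numpy as np

def dtft(x, omega):
    """X(e^{j omega}) = sum_n x[n] e^{-j omega n}, with x[n] = 0 outside 0..len(x)-1."""
    n = np.arange(len(x))
    return np.sum(x * np.exp(-1j * omega * n))

def log_magnitude_dft(x, L):
    """L-point DFT (zero-padded to L samples) and its log magnitude in dB."""
    X = np.fft.fft(x, n=L)
    return X, 20 * np.log10(np.abs(X) + 1e-12)   # small floor avoids log10(0)

x = np.random.default_rng(1).standard_normal(300)   # stand-in for speech samples
X, X_db = log_magnitude_dft(x, 25000)
```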

Why STFT for Speech Signals

- Steady-state sounds, such as vowels, are produced by periodic excitation of a linear system. The speech spectrum is therefore the product of the excitation spectrum and the vocal tract frequency response.
- Speech is a time-varying signal, so a more sophisticated analysis is needed to reflect its time-varying properties:
  - changes occur at syllabic rates (about 10 times per second);
  - over fixed time intervals of 10-30 ms, the properties of most speech signals are relatively constant.

Frequency-Domain Processing

- Coding: transform, subband, and homomorphic coders, and channel vocoders.
- Restoration, enhancement, and modification: noise and reverberation removal, and time-scale modification (speed-up and slow-down of speech).

Overview of Lecture

- Define the time-varying Fourier transform (STFT) analysis method.
- Define synthesis methods from the time-varying FT (filter bank summation, overlap addition).
- Show how the time-varying FT can be viewed in terms of a bank of filters.
- Model computation methods based on the FFT.
- Applications to vocoders, spectrum displays, formant estimation, and pitch period estimation.

Short-Time Fourier Transform (STFT)

Speech is not a stationary signal: its properties change with time. A single representation based on all the samples of a speech utterance therefore has, for the most part, no meaning. Instead, we define a time-dependent Fourier transform (TDFT, or STFT) of speech. It is updated periodically as the speech properties change over time.

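Definition of STFT

$X_{\hat n}(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} x[m]\,w[\hat n - m]\,e^{-j\hat\omega m}$

Here $w[\hat n - m]$ is a window sequence that selects the portion of $x[m]$ around time $\hat n$.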
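The definition can be evaluated literally, one $(\hat n, \hat\omega)$ pair at a time. The function below is a minimal sketch under these assumptions: NumPy, a causal window stored as `w[0..L-1]`, and signal values of zero outside the stored range. `stft_point` is a hypothetical name. The time origin is tied to the signal, so the phase factor uses the absolute index $m$.

```python
import numpy as np

def stft_point(x, w, n_hat, omega_hat):
    """X_{n_hat}(e^{j omega_hat}) = sum_m x[m] w[n_hat - m] e^{-j omega_hat m}.

    x is zero outside 0..len(x)-1 and w is zero outside 0..len(w)-1, so only
    m in [n_hat - len(w) + 1, n_hat] can contribute.
    """
    total = 0j
    for m in range(max(0, n_hat - len(w) + 1), min(len(x), n_hat + 1)):
        total += x[m] * w[n_hat - m] * np.exp(-1j * omega_hat * m)
    return total

x = np.random.default_rng(0).standard_normal(64)
# With a flat window covering all of x, the STFT at n_hat = 63 is the ordinary DFT.
X5 = stft_point(x, np.ones(64), 63, 2 * np.pi * 5 / 64)
```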

Short-Time Fourier Transform

The STFT is a function of two variables: the time index $\hat n$, which is discrete, and the frequency variable $\hat\omega$, which is continuous.

STFT: Different Time Origins

The STFT can be viewed as having two different time origins:
1. a time origin tied to the signal $x[n]$;
2. a time origin tied to the window signal $w[-m]$.

Interpretations of STFT

There are two distinct interpretations of $X_{\hat n}(e^{j\hat\omega})$:
1. Assume $\hat n$ is fixed. Then $X_{\hat n}(e^{j\hat\omega})$ is simply the normal Fourier transform of the sequence $w[\hat n - m]\,x[m]$, $-\infty < m < \infty$. So, for fixed $\hat n$, $X_{\hat n}(e^{j\hat\omega})$ has the same properties as a normal Fourier transform.
2. Consider $X_{\hat n}(e^{j\hat\omega})$ as a function of the time index $\hat n$, with $\hat\omega$ fixed. Then $X_{\hat n}(e^{j\hat\omega})$ has the form of a convolution of the signal $x[\hat n]\,e^{-j\hat\omega\hat n}$ with the window $w[\hat n]$. This leads to an interpretation as linear filtering of the frequency-modulated signal $x[\hat n]\,e^{-j\hat\omega\hat n}$ by $w[\hat n]$.

We now consider each of these interpretations of the STFT in more detail.
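Both readings produce the same numbers. The minimal sketch below assumes NumPy and a causal window. It evaluates interpretation 1 as an ordinary DTFT of the windowed sequence at a single $\hat n$. It evaluates interpretation 2 for every $\hat n$ at once, by applying `np.convolve` to the modulated signal and $w[n]$. The function names are hypothetical.

```python
import numpy as np

def stft_fixed_n(x, w, n_hat, omega_hat):
    """Interpretation 1: ordinary DTFT of w[n_hat - m] x[m] evaluated at omega_hat."""
    m = np.arange(len(x))
    idx = n_hat - m
    valid = (idx >= 0) & (idx < len(w))            # window taps that overlap the signal
    return np.sum(x[valid] * w[idx[valid]] * np.exp(-1j * omega_hat * m[valid]))

def stft_fixed_omega(x, w, omega_hat):
    """Interpretation 2: X_n(e^{j omega_hat}) for every n as (x[n] e^{-j omega_hat n}) * w[n]."""
    n = np.arange(len(x))
    modulated = x * np.exp(-1j * omega_hat * n)    # shift omega_hat down to zero frequency
    return np.convolve(modulated, w)[: len(x)]     # lowpass filtering by the window

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
w = np.hamming(31)
track = stft_fixed_omega(x, w, 0.3)                # one frequency, all time indices
```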

DTFT Interpretation of STFT

Fourier Transform Interpretation

For fixed $\hat n$, consider $X_{\hat n}(e^{j\hat\omega})$ as the normal Fourier transform of the sequence $w[\hat n - m]\,x[m]$, $-\infty < m < \infty$. The window $w[\hat n - m]$ slides along the sequence $x[m]$ and defines a new STFT for every value of $\hat n$.

What are the conditions for the existence of the STFT? The sequence $w[\hat n - m]\,x[m]$ must be absolutely summable for all values of $\hat n$. This holds in practice for three reasons:
- $|x[n]| \le 32767$ (for 16-bit sampling);
- $|w[n]| \le 1$ (normalized window level);
- the window duration is usually finite.

So $w[\hat n - m]\,x[m]$ is absolutely summable for all $\hat n$.

Signal Recovery from STFT

For a given value of $\hat n$, $X_{\hat n}(e^{j\hat\omega})$ has the same properties as a normal Fourier transform, so the input sequence can be recovered exactly. $X_{\hat n}(e^{j\hat\omega})$ is the normal Fourier transform of the windowed sequence $w[\hat n - m]\,x[m]$. Evaluating its inverse Fourier transform at $m = \hat n$ therefore gives
$w[0]\,x[\hat n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} X_{\hat n}(e^{j\hat\omega})\,e^{j\hat\omega\hat n}\,d\hat\omega.$
Assuming the window satisfies $w[0] \ne 0$ (a trivial requirement), it follows that
$x[\hat n] = \frac{1}{2\pi\,w[0]}\int_{-\pi}^{\pi} X_{\hat n}(e^{j\hat\omega})\,e^{j\hat\omega\hat n}\,d\hat\omega.$

With the requirement $w[0] \ne 0$, the sequence $x[\hat n]$ can be recovered from $X_{\hat n}(e^{j\hat\omega})$ if $X_{\hat n}(e^{j\hat\omega})$ is known for all values of $\hat\omega$ over one complete period.
- This is a sample-by-sample recovery process.
- $X_{\hat n}(e^{j\hat\omega})$ must be known for every value of $\hat n$ and for all $\hat\omega$.
- The whole windowed sequence $w[\hat n - m]\,x[m]$ can also be recovered. This does not guarantee that $x[m]$ can be recovered, since $w[\hat n - m]$ can equal 0.
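Numerically, the integral can be replaced by an average over $M$ uniformly spaced frequencies. For a causal window of length $L$ and $M \ge L$, this Riemann sum is exact: $X_{\hat n}(e^{j\hat\omega})\,e^{j\hat\omega\hat n}$ is then a trigonometric polynomial with exponents $0, \ldots, L-1$. The minimal sketch below assumes NumPy. `stft_samples` and `recover_sample` are hypothetical helpers, and the grid size $M$ is our own parameter.

```python
import numpy as np

def stft_samples(x, w, n_hat, M):
    """X_{n_hat}(e^{j 2 pi k / M}) for k = 0..M-1 (causal window w[0..L-1])."""
    omegas = 2 * np.pi * np.arange(M) / M
    m = np.arange(max(0, n_hat - len(w) + 1), n_hat + 1)
    seg = x[m] * w[n_hat - m]                            # w[n_hat - m] x[m]
    return np.exp(-1j * np.outer(omegas, m)) @ seg, omegas

def recover_sample(x, w, n_hat, M):
    """x[n_hat] = (1 / (2 pi w[0])) * integral of X e^{j omega n_hat}, as an M-point average."""
    X, omegas = stft_samples(x, w, n_hat, M)
    return np.real(np.mean(X * np.exp(1j * omegas * n_hat))) / w[0]

rng = np.random.default_rng(0)
x = rng.standard_normal(100)
w = np.hamming(40)                                       # w[0] = 0.08, nonzero as required
x99 = recover_sample(x, w, 99, M=40)                     # M >= L: exact
x99_bad = recover_sample(x, w, 99, M=20)                 # M < L: aliased
```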

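Alternative Forms of STFT

1. Real and imaginary parts: $X_{\hat n}(e^{j\hat\omega}) = \mathrm{Re}\{X_{\hat n}(e^{j\hat\omega})\} + j\,\mathrm{Im}\{X_{\hat n}(e^{j\hat\omega})\}$.
2. Magnitude and phase: $X_{\hat n}(e^{j\hat\omega}) = |X_{\hat n}(e^{j\hat\omega})|\,e^{j\theta_{\hat n}(\hat\omega)}$.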

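Role of Window in STFT

The window $w[\hat n - m]$ does two things:
- it chooses the portion of $x[m]$ to be analyzed;
- its shape determines the nature of $X_{\hat n}(e^{j\hat\omega})$.

For fixed $\hat n$, $X_{\hat n}(e^{j\hat\omega})$ is the normal FT of $w[\hat n - m]\,x[m]$. Consider the normal FTs of $x[n]$ and $w[n]$ individually:
$X(e^{j\omega}) = \sum_{m=-\infty}^{\infty} x[m]\,e^{-j\omega m}, \qquad W(e^{j\omega}) = \sum_{m=-\infty}^{\infty} w[m]\,e^{-j\omega m}.$
Then, for fixed $\hat n$, the normal Fourier transform of the product $w[\hat n - m]\,x[m]$ is the convolution of the transforms of $w[\hat n - m]$ and $x[m]$:
$X_{\hat n}(e^{j\hat\omega}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W(e^{j\theta})\,e^{j\theta\hat n}\,X(e^{j(\hat\omega+\theta)})\,d\theta.$
In the limiting case $w[n] = 1$ for all $n$, $W(e^{j\theta}) = 2\pi\delta(\theta)$ and $X_{\hat n}(e^{j\hat\omega}) = X(e^{j\hat\omega})$. We get the same result no matter where the window is shifted.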

Interpretation of Role of Window

$X_{\hat n}(e^{j\hat\omega})$ is the convolution of $X(e^{j\hat\omega})$ with the FT of the shifted window sequence, $W(e^{j\theta})\,e^{j\theta\hat n}$.
- Strictly, $X(e^{j\hat\omega})$ has no real meaning for speech, since the properties of $x[n]$ vary with time.
- Consider instead $x[n]$ defined over the window duration and extended for all time with the same properties. Then $X(e^{j\hat\omega})$ does exist, with properties that reflect the sound within the window.
- $X_{\hat n}(e^{j\hat\omega})$ is then a smoothed version of the FT of the part of $x[n]$ that lies within the window $w$.

Windows in STFT

Consider rectangular and Hamming windows. For both, the width of the main spectral lobe is inversely proportional to the window length, while the sidelobe levels are essentially independent of the window length.
- Rectangular window: a flat window of length $L$ samples. The first zero in its frequency response occurs at $F_S/L$, and its sidelobe levels are about -13 dB or lower.
- Hamming window: a raised-cosine window of length $L$ samples. The first zero in its frequency response occurs at $2F_S/L$, and its sidelobe levels are -40 dB or lower.
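The main-lobe and sidelobe figures can be checked numerically by evaluating each window's DTFT on a dense grid (a zero-padded FFT). This minimal sketch assumes NumPy, $L = 100$, and the periodic form of the Hamming window, $w[n] = 0.54 - 0.46\cos(2\pi n/L)$, whose first zero falls exactly at $2F_S/L$. `window_spectrum_stats` is a hypothetical helper. The rectangular window's peak sidelobe should come out near -13 dB and the Hamming window's a little below -40 dB.

```python
import numpy as np

def window_spectrum_stats(w, nfft=1 << 16):
    """Return (first spectral null as a fraction of F_S, peak sidelobe level in dB)."""
    W = np.abs(np.fft.rfft(w, nfft))           # dense samples of |W(e^{j omega})| on [0, pi]
    W_db = 20 * np.log10(W / W[0] + 1e-300)    # normalize to the DC (main-lobe peak) value
    k = 1
    while W[k + 1] < W[k]:                     # walk down the main lobe to its first minimum
        k += 1
    return k / nfft, W_db[k:].max()            # highest level beyond the main lobe

L = 100
rect = np.ones(L)
hamm = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(L) / L)   # periodic Hamming window
null_r, psl_r = window_spectrum_stats(rect)
null_h, psl_h = window_spectrum_stats(hamm)
print(f"rectangular: first null {null_r * L:.3f} F_S/L, peak sidelobe {psl_r:.1f} dB")
print(f"Hamming:     first null {null_h * L:.3f} F_S/L, peak sidelobe {psl_h:.1f} dB")
```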

[Figure: $L = 2M+1$-point Hamming window and its corresponding DTFT.]

[Figure: frequency responses of windows.]

[Figures: effect of window length, Hamming window (HW) and rectangular window (RW).]

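Relation to Short-Time Autocorrelation

For each value of $\hat n$, $X_{\hat n}(e^{j\hat\omega})$ is the discrete-time Fourier transform of $w[\hat n - m]\,x[m]$. It follows that
$S_{\hat n}(e^{j\hat\omega}) = |X_{\hat n}(e^{j\hat\omega})|^2$
is the Fourier transform of
$R_{\hat n}[k] = \sum_{m=-\infty}^{\infty} w[\hat n - m]\,x[m]\,w[\hat n - k - m]\,x[m + k],$
which is the short-time autocorrelation function of the previous chapter. These equations therefore relate the short-time spectrum to the short-time autocorrelation.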
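This relation can be checked on one analysis frame. The minimal sketch below assumes NumPy, a 64-point Hamming window, and hypothetical helper names. It compares the direct short-time autocorrelation with the inverse DFT of $|X|^2$. The DFT length must be at least $2L - 1$ so that the circular autocorrelation computed by the FFT does not wrap around.

```python
import numpy as np

def windowed_segment(x, w, n_hat):
    """y[m] = w[n_hat - m] x[m] for m = n_hat-L+1 .. n_hat (in increasing m)."""
    L = len(w)
    m = np.arange(n_hat - L + 1, n_hat + 1)
    return x[m] * w[n_hat - m]

def short_time_autocorr(y, max_lag):
    """R[k] = sum_m y[m] y[m+k] for k = 0..max_lag."""
    return np.array([np.dot(y[: len(y) - k], y[k:]) for k in range(max_lag + 1)])

def autocorr_via_stft(y, max_lag):
    """Inverse DFT of |X|^2 using N = 2L - 1 points (no circular wrap-around)."""
    N = 2 * len(y) - 1
    S = np.abs(np.fft.fft(y, N)) ** 2
    return np.fft.ifft(S).real[: max_lag + 1]

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
w = np.hamming(64)
y = windowed_segment(x, w, n_hat=300)
R_direct = short_time_autocorr(y, 63)
R_fft = autocorr_via_stft(y, 63)
```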

[Figure: short-time autocorrelation and STFT.]

Summary of FT View of STFT

- Interpret $X_{\hat n}(e^{j\hat\omega})$ as the normal Fourier transform of the sequence $w[\hat n - m]\,x[m]$, $-\infty < m < \infty$.
- The properties of this Fourier transform depend on the window:
  - the frequency resolution of $X_{\hat n}(e^{j\hat\omega})$ varies inversely with the length of the window, so long windows are wanted for high resolution;
  - $x[n]$ should be relatively stationary (non-time-varying) for the duration of the window to give the most stable spectrum, so short windows are wanted.
- As usual in speech processing, a compromise is needed between good temporal resolution (short windows) and good frequency resolution (long windows).

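Linear Filtering Interpretation of STFT

The linear filtering interpretation takes two forms:
1. Modulation-lowpass filter form: $X_{\hat n}(e^{j\hat\omega}) = \left(x[n]\,e^{-j\hat\omega n}\right) * w[n]\,\big|_{n=\hat n}$. The signal is modulated by $e^{-j\hat\omega n}$, which shifts frequency $\hat\omega$ down to zero, and is then lowpass filtered by the window $w[n]$.
2. Bandpass filter-demodulation form: $X_{\hat n}(e^{j\hat\omega}) = e^{-j\hat\omega\hat n}\left(x[n] * w[n]\,e^{j\hat\omega n}\right)\big|_{n=\hat n}$. The signal is filtered by the bandpass filter with impulse response $w[n]\,e^{j\hat\omega n}$, and the output is then demodulated by $e^{-j\hat\omega\hat n}$.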

Summary: STFT

- Fixed value of $\hat n$, varying $\hat\omega$: DFT interpretation.
- Fixed value of $\hat\omega$, varying $\hat n$: filter bank interpretation.

[Figures: summaries of the DFT interpretation, the modulation/lowpass filter form, and the bandpass filter/demodulation form.]

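STFT Magnitude Only

Many applications need only the magnitude of the STFT, not the phase. In such cases the bandpass filter implementation is less complex. The demodulating factor has unit magnitude, so
$|X_{\hat n}(e^{j\hat\omega})| = |y_{\hat\omega}[\hat n]|, \qquad y_{\hat\omega}[n] = x[n] * \left(w[n]\,e^{j\hat\omega n}\right),$
and the magnitude is obtained directly from the bandpass filter output, without demodulation.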
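Here is a quick numerical check of this shortcut, as a minimal sketch. It assumes NumPy, a causal 41-point Hamming window, and an arbitrary analysis frequency; the function names are hypothetical. The first function follows the modulation-lowpass form. The second keeps only the bandpass filter. Their magnitudes agree sample for sample.

```python
import numpy as np

def stft_modulation_lowpass(x, w, omega):
    """X_n(e^{j omega}) for all n: modulate by e^{-j omega n}, then filter with w[n]."""
    n = np.arange(len(x))
    return np.convolve(x * np.exp(-1j * omega * n), w)[: len(x)]

def stft_magnitude_bandpass(x, w, omega):
    """|X_n(e^{j omega})| from the bandpass output alone; demodulation is skipped."""
    h = w * np.exp(1j * omega * np.arange(len(w)))   # bandpass impulse response w[n] e^{j omega n}
    return np.abs(np.convolve(x, h)[: len(x)])

rng = np.random.default_rng(0)
x = rng.standard_normal(400)
w = np.hamming(41)
omega = 2 * np.pi * 0.1
mag_full = np.abs(stft_modulation_lowpass(x, w, omega))
mag_bp = stft_magnitude_bandpass(x, w, omega)
```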

Sampling Rates of STFT

The STFT must be sampled in both time and frequency to produce an unaliased representation from which $x[n]$ can be exactly recovered.

Sampling Rate in Time

To determine the sampling rate in time, we take a linear filtering view:
1. $X_{\hat n}(e^{j\hat\omega})$ is the output of a filter with impulse response $w[n]$;
2. $W(e^{j\omega})$ is a lowpass response with an effective bandwidth of $B$ Hz.

The effective bandwidth of $X_{\hat n}(e^{j\hat\omega})$, as a function of $\hat n$, is therefore $B$ Hz. To avoid aliasing, $X_{\hat n}(e^{j\hat\omega})$ must be sampled at a rate of $2B$ samples/second.

Sampling Rate in Frequency

$X_{\hat n}(e^{j\hat\omega})$ is periodic in $\hat\omega$ with period $2\pi$, so it only needs to be sampled over an interval of length $2\pi$. We need an appropriate finite set of frequencies, $\hat\omega_k = 2\pi k/N$, $k = 0, 1, \ldots, N-1$, at which $X_{\hat n}(e^{j\hat\omega})$ must be specified to recover $x[n]$ exactly. Use the Fourier transform interpretation of $X_{\hat n}(e^{j\hat\omega})$:
1. If the window $w[n]$ is time-limited, then the inverse transform of $X_{\hat n}(e^{j\hat\omega})$ is time-limited.
2. The inverse Fourier transform of $X_{\hat n}(e^{j\hat\omega})$ is the signal $x[m]\,w[\hat n - m]$, whose duration is $L$ samples (the duration of $w[n]$). By the sampling theorem, $X_{\hat n}(e^{j\hat\omega})$ must therefore be sampled in frequency at the set $\hat\omega_k = 2\pi k/N$, $k = 0, 1, \ldots, N-1$, with $N \ge L$, to recover $x[n]$ exactly from $X_{\hat n}(e^{j\hat\omega})$.

For example, a Hamming window of duration $L = 400$ samples requires the STFT to be evaluated at no fewer than 400 uniformly spaced frequencies around the unit circle.

Total Sampling Rate of STFT

The total sampling rate of the STFT is the product of the sampling rates in time and frequency:
$SR = SR(\text{time}) \times SR(\text{frequency}) = 2B \times L$ samples/sec,
where $B$ is the frequency bandwidth of the window (Hz) and $L$ is the time width of the window (samples).
- For most windows of interest, $B$ is a multiple of $F_S/L$, i.e., $B = C\,F_S/L$ Hz. Here $C = 1$ for the rectangular window and $C = 2$ for the Hamming window, so $SR = 2C\,F_S$ samples/second.
- This defines an oversampling rate, $SR/F_S = 2C$, of the STFT relative to the conventional sampling representation of $x[n]$. For the RW, $2C = 2$; for the HW, $2C = 4$. The range of oversampling is therefore 2 to 4.
- This oversampling gives a very flexible representation of the speech signal.
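The bookkeeping can be written out directly. This minimal sketch uses $F_S = 10$ kHz, a value chosen only for illustration, together with the $L = 400$ window from the frequency-sampling example. `stft_sampling_rates` is a hypothetical helper.

```python
def stft_sampling_rates(fs, L, C):
    """Return (B, time rate, total rate, oversampling factor) for B = C * fs / L."""
    B = C * fs / L             # effective window bandwidth (Hz)
    sr_time = 2 * B            # STFT samples per second in each frequency channel
    sr_freq = L                # minimum number of frequencies (N >= L)
    total = sr_time * sr_freq  # STFT samples per second, equal to 2 * C * fs
    return B, sr_time, total, total / fs

print(stft_sampling_rates(10000, 400, 2))  # Hamming window:     (50.0, 100.0, 40000.0, 4.0)
print(stft_sampling_rates(10000, 400, 1))  # rectangular window: (25.0, 50.0, 20000.0, 2.0)
```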

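Sampling the STFT: DFT Notation

Let $w[-m] \ne 0$ for $0 \le m \le L-1$ (a finite-duration window with no zero-valued samples). Sample the STFT every $R$ samples in time and at $N$ frequencies:
$X_r[k] = X_{rR}(e^{j2\pi k/N}) = \sum_{m=rR}^{rR+L-1} x[m]\,w[rR - m]\,e^{-j(2\pi k/N)m}, \quad k = 0, 1, \ldots, N-1.$
- If $L \le N$, then
$x[m]\,w[rR - m] = \frac{1}{N}\sum_{k=0}^{N-1} X_r[k]\,e^{j(2\pi k/N)m}, \quad rR \le m \le rR+L-1.$
The DFT is defined with no aliasing, so the windowed sequence can be recovered exactly using the inverse DFT.
- If $R \le L$, then all samples of $x[n]$ can be recovered from $X_r[k]$. If $R > L$, there are gaps in the sequence.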
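The sketch below implements this sampled STFT and its inversion under the same conventions. It assumes NumPy, `w_rev[m]` stores $w[-m]$ for $m = 0, \ldots, L-1$, and the helper names and parameter values are hypothetical. Each frame is an $N$-point FFT of the windowed segment, multiplied by the linear phase $e^{-j2\pi k rR/N}$ that comes from tying the time origin to the signal. Recovery inverts each frame and divides out the window, which works because no window sample is zero.

```python
import numpy as np

def sampled_stft(x, w_rev, R, N):
    """X_r[k] = sum_{m=rR}^{rR+L-1} x[m] w[rR-m] e^{-j 2 pi k m / N}; w_rev[m] = w[-m]."""
    L = len(w_rev)
    assert N >= L, "N >= L: no time aliasing"
    k = np.arange(N)
    frames = []
    for r in range((len(x) - L) // R + 1):       # frames whose window fits inside x
        seg = x[r * R : r * R + L] * w_rev       # x[rR+m] w[-m], m = 0..L-1
        frames.append(np.fft.fft(seg, N) * np.exp(-2j * np.pi * k * r * R / N))
    return np.array(frames)

def recover_signal(X, w_rev, R, length):
    """Invert each frame over m = rR..rR+L-1 and divide out the window (needs R <= L)."""
    N, L = X.shape[1], len(w_rev)
    k = np.arange(N)
    x_hat = np.zeros(length)
    for r, Xr in enumerate(X):
        m = r * R + np.arange(L)                                  # absolute time indices
        seg = (np.exp(2j * np.pi * np.outer(m, k) / N) @ Xr) / N  # x[m] w[rR-m]
        x_hat[m] = seg.real / w_rev
    return x_hat

rng = np.random.default_rng(0)
L, R, N = 64, 48, 128
x = rng.standard_normal(L + 19 * R)       # length chosen so the frames tile x exactly
X = sampled_stft(x, np.hamming(L), R, N)
x_hat = recover_signal(X, np.hamming(L), R, len(x))
```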

Spectrographic Displays

The sound spectrograph was one of the earliest embodiments of time-dependent spectrum analysis techniques.
- The time-varying average energy in the output of a variable-frequency bandpass filter is measured and used as a crude measure of the STFT.
- An ingenious electromechanical system records this energy on special electrostatic paper called Teledeltos paper (an electrical recording paper).
- The result is a two-dimensional representation of the time-dependent spectrum. Frequency is on the vertical axis and time on the horizontal axis, and spectrum magnitude is represented by the darkness of the marking.
- Wide bandpass filters (300 Hz bandwidth) provide good temporal resolution and poor frequency resolution: they resolve pitch pulses in time but not in frequency. This is called a wideband spectrogram.
- Narrow bandpass filters (45 Hz bandwidth) provide good frequency resolution and poor time resolution: they resolve pitch harmonics in frequency but not pitch pulses in time. This is called a narrowband spectrogram.

[Figure: conventional spectrogram of "Every salt breeze comes from the sea".]

Digital Speech Spectrograms

Wideband spectrogram:
- follows broad spectral peaks (formants) over time;
- resolves most individual pitch periods as vertical striations, since the impulse response of the analyzing filter is comparable in duration to a pitch period (consider what happens for low-pitched males and for high-pitched females);
- shows no vertical pitch striations for unvoiced speech.

Narrowband spectrogram:
- resolves individual harmonics in voiced regions;
- still shows the formant frequencies;
- usually shows the fundamental frequency;
- shows no strong structure in unvoiced regions.

Digital Speech Spectrograms: Parameters

Speech parameters ("This is a test"): sampling rate 16 kHz; duration 1.406 seconds; male speaker.

Wideband spectrogram parameters:
- analysis window: Hamming window
- analysis window duration: 6 ms (96 samples)
- analysis window shift: 0.625 ms (10 samples)
- FFT size: 512

Narrowband spectrogram parameters:
- analysis window: Hamming window
- analysis window duration: 60 ms (960 samples)
- analysis window shift: 6 ms (96 samples)
- FFT size: 1024
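Both parameter sets can be reproduced with a few lines of NumPy. This minimal sketch is not the original Matlab example. It substitutes a synthetic 100 Hz impulse train for the speech recording and uses NumPy's symmetric Hamming window; `spectrogram_db` is a hypothetical helper. Rows are frequency bins (`nfft/2 + 1` of them) and columns are analysis frames.

```python
import numpy as np

def spectrogram_db(x, win_len, hop, nfft):
    """Magnitude (dB) of Hamming-windowed frames; rows = frequency bins, columns = frames."""
    w = np.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] * w for i in range(n_frames)])
    S = np.abs(np.fft.rfft(frames, n=nfft, axis=1)).T
    return 20 * np.log10(S + 1e-10)

fs = 16000
# Stand-in for "This is a test": 1.406 s impulse train with a 100 Hz pitch.
x = np.zeros(int(round(1.406 * fs)))
x[:: fs // 100] = 1.0
wide = spectrogram_db(x, win_len=96, hop=10, nfft=512)       # 6 ms window, 0.625 ms shift
narrow = spectrogram_db(x, win_len=960, hop=96, nfft=1024)   # 60 ms window, 6 ms shift
```

In the narrowband result each 60 ms frame spans six pitch periods, so the 100 Hz harmonics appear as separate spectral lines. The 6 ms wideband window is shorter than one period, so each pulse shows up in its own frames instead.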

[Figure: digital speech spectrograms with a 6 ms (96-sample) window and a 60 ms (960-sample) window.]

[Figure: spectrograms of "She had your dark suit in", male speaker; nfft = 1024, L = 80, R = 5 and nfft = 1024, L = 800, R = 10.]

[Figure: spectrograms of "She had your dark suit in", female speaker; nfft = 1024, L = 80, R = 5 and nfft = 1024, L = 800, R = 10.]

A Summary of the STFT Recovery Methods Introduced

Method #1: $X_{\hat n}(e^{j\hat\omega})$ is the normal Fourier transform of the windowed sequence $w[\hat n - m]\,x[m]$. Therefore, provided $w[0] \ne 0$, the sequence $x[\hat n]$ can be recovered from $X_{\hat n}(e^{j\hat\omega})$ if $X_{\hat n}(e^{j\hat\omega})$ is known for every value of $\hat n$ and for all $\hat\omega$.

Method #2: $X_{\hat n}(e^{j\hat\omega})$ can be recovered from its sampled version if $R \le F_S/(2B)$ and $N \ge L$, where $B$ is the window bandwidth.

Method #3 (DFT notation): assume a finite-duration window with $w[-m] \ne 0$ for $0 \le m \le L-1$ (no zero-valued samples). If $L \le N$, the DFT is defined with no aliasing, so each windowed segment can be recovered exactly using the inverse DFT. If $R \le L$, all samples can be recovered from $X_r[k]$; if $R > L$, there are gaps in the sequence.

Overlap Addition (OLA) Method

The OLA method is based on the normal FT interpretation of the short-time spectrum.
- $x[m]$ can be reconstructed by computing the inverse DFT of $X_{\hat n}(e^{j\omega_k})$ and dividing out the window (assumed nonzero for all samples).
- This process gives $L$ signal values of $x[m]$ for each window, so the window can be moved by $L$ samples and the process repeated.
- This procedure is theoretically valid with $R \le L \le N$.
- It is not practical, because small changes in $X_{rR}(e^{j\omega_k})$ are amplified when the inverse DFT is divided by the window.

In the practical OLA method, the summation is over overlapping analysis sections. For each value of $rR$ at which $X_{rR}(e^{j\omega_k})$ is measured, an inverse DFT gives the windowed segment $y_r[n] = x[n]\,w[rR - n]$, and the segments are summed:
$y[n] = \sum_{r=-\infty}^{\infty} y_r[n] = x[n]\sum_{r=-\infty}^{\infty} w[rR - n].$
The condition for exact reconstruction of $x[n]$ is therefore
$\tilde w[n] = \sum_{r=-\infty}^{\infty} w[rR - n] = C \quad \text{for all } n.$


[Figure: overlap addition of Bartlett and Hann windows, $L = 2M+1$, $R = M$.]
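The time-domain condition $\sum_r w[rR - n] = C$ is easy to check numerically for these two windows. This minimal sketch assumes NumPy's `np.bartlett` and `np.hanning` (both of length $2M+1$ with zero end samples), $M = 50$, and a hypothetical helper that overlap-adds shifted copies of a window. For symmetric windows, the time reversal in $w[rR - n]$ does not change the sum.

```python
import numpy as np

def shifted_window_sum(w, R, num_shifts):
    """Overlap-add num_shifts copies of w, each shifted R samples from the previous one."""
    L = len(w)
    total = np.zeros(L + R * (num_shifts - 1))
    for r in range(num_shifts):
        total[r * R : r * R + L] += w
    return total

M = 50
for name, w in (("Bartlett", np.bartlett(2 * M + 1)), ("Hann", np.hanning(2 * M + 1))):
    s = shifted_window_sum(w, M, 10)
    steady = s[len(w) : -len(w)]      # region where every overlapping copy is present
    print(name, steady.min(), steady.max())
```

Both sums are flat at 1 away from the ends, so OLA with $R = M$ reconstructs $x[n]$ exactly for these windows.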

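Spectral Condition

Let $w[n] \leftrightarrow W(e^{j\omega})$; for a real window, $w[-n] \leftrightarrow W^*(e^{j\omega})$. The sum of shifted windows is periodic with period $R$ and can be written as
$\tilde w[n] = \sum_{r=-\infty}^{\infty} w[rR - n] = \frac{1}{R}\sum_{k=0}^{R-1} W^*(e^{j2\pi k/R})\,e^{j2\pi k n/R}.$
One sufficient condition for perfect reconstruction is
$W(e^{j2\pi k/R}) = 0, \quad k = 1, 2, \ldots, R-1,$
in which case $\tilde w[n] = W(e^{j0})/R$.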

[Figure: window spectra.]

Hamming Window Spectra

[Figure: DTFTs of even-length, odd-length, and modified odd-to-even-length Hamming windows.]

Odd-to-even: truncate from $L = 2M+1$ to $L = 2M$ by simply zeroing the last sample. The resulting zeros, spaced at $2\pi/R$, give perfect reconstruction using OLA.

Overlap Addition (OLA) Method: Hamming Window Example

- $w[n]$ is an $L$-point Hamming window with $R = L/4$, a 4:1 time overlap for the HW.
- Assume $x[n] = 0$ for $n < 0$; the first analysis section begins at $n = L/4$.
- Four overlapping sections contribute to each interval.
- The $N$-point FFTs use $L$ speech samples, with $N - L$ zeros padded at the end. The padding allows modifications without significant aliasing effects.

For a given value of $n$:
$y[n] = x[n]w[R-n] + x[n]w[2R-n] + x[n]w[3R-n] + x[n]w[4R-n]$
$\quad\;\;= x[n]\left(w[R-n] + w[2R-n] + w[3R-n] + w[4R-n]\right) = x[n]\,\frac{W(e^{j0})}{R}.$
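Here is a minimal OLA analysis-synthesis loop in this configuration, assuming NumPy. It uses a periodic $L$-point Hamming window (the odd-to-even form, so the shifted windows sum to a constant), hop $R = L/4$, and $N$-point FFTs with $N - L$ zeros of padding. The optional `modify` argument is a hypothetical placeholder for a spectral modification. With no modification, the output equals $x[n]\,W(e^{j0})/R$, which for this window is $4 \times 0.54 = 2.16$.

```python
import numpy as np

def ola_analysis_synthesis(x, L, N, modify=None):
    """STFT with an L-point periodic Hamming window, hop R = L/4 and N-point FFTs,
    followed by overlap-add of the inverse FFTs. Returns (y, gain W(e^{j0})/R)."""
    assert L % 4 == 0 and N >= L
    R = L // 4
    w = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(L) / L)   # zeros of W at multiples of 2*pi/R
    xp = np.concatenate((np.zeros(L), x, np.zeros(L)))       # pad so every sample gets 4 frames
    y = np.zeros(len(xp) + N)
    for start in range(0, len(xp) - L + 1, R):
        X = np.fft.fft(xp[start : start + L] * w, N)         # N - L zeros padded at the end
        if modify is not None:
            X = modify(X)                                     # hypothetical modification hook
        y[start : start + N] += np.fft.ifft(X).real
    gain = w.sum() / R
    return y[L : L + len(x)], gain

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y, gain = ola_analysis_synthesis(x, L=64, N=128)
```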

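Filter Bank Summation (FBS)

The filter bank interpretation of the STFT shows that, for any frequency $\omega_k$,
$X_n(e^{j\omega_k}) = \sum_{m=-\infty}^{\infty} x[m]\,w_k[n-m]\,e^{-j\omega_k m}$
is a lowpass representation of the signal in a band centered at $\omega_k$. Here $w_k[n]$ is the lowpass window used at frequency $\omega_k$.

Define a bandpass filter
$h_k[n] = w_k[n]\,e^{j\omega_k n}$
and substitute it into the equation above to give
$X_n(e^{j\omega_k}) = e^{-j\omega_k n}\sum_{m=-\infty}^{\infty} x[m]\,h_k[n-m].$
$X_n(e^{j\omega_k})$ is therefore obtained by bandpass filtering $x[n]$ and then modulating with the complex exponential $e^{-j\omega_k n}$. We can express this in the form
$y_k[n] = X_n(e^{j\omega_k})\,e^{j\omega_k n} = \sum_{m=-\infty}^{\infty} x[m]\,h_k[n-m],$
so $y_k[n]$ is the output of a bandpass filter with impulse response $h_k[n]$.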


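Consider a set of $N$ bandpass filters, uniformly spaced at $\omega_k = 2\pi k/N$, $k = 0, 1, \ldots, N-1$, so that the entire frequency band is covered. Also assume the window is the same for all channels, i.e., $w_k[n] = w[n]$. Adding together all the bandpass outputs gives the composite response
$y[n] = \sum_{k=0}^{N-1} y_k[n] = \sum_{k=0}^{N-1} X_n(e^{j\omega_k})\,e^{j\omega_k n}.$
If $W(e^{j\omega})$ is properly sampled in frequency ($N \ge L$, where $L$ is the window duration), then it can be shown that
$\frac{1}{N}\sum_{k=0}^{N-1} W(e^{j(\omega - \omega_k)}) = w[0].$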

Proof of FBS Formula

Suppose $W(e^{j\omega})$ is sampled in frequency at $N$ uniformly spaced points $\omega_k = 2\pi k/N$. The inverse discrete Fourier transform of the sampled version of $W(e^{j\omega})$ is
$\frac{1}{N}\sum_{k=0}^{N-1} W(e^{j2\pi k/N})\,e^{j2\pi k n/N} = \sum_{r=-\infty}^{\infty} w[n + rN],$
i.e., an aliased version of $w[n]$. (Recall that sampling in frequency corresponds to convolution with an impulse train in time, that is, time aliasing.)

If $w[n]$ has a duration of $L$ samples, then choosing $N \ge L$ ensures that sampling $W(e^{j\omega})$ in frequency causes no aliasing. Evaluating the aliased formula at $n = 0$ then gives
$\frac{1}{N}\sum_{k=0}^{N-1} W(e^{j2\pi k/N}) = w[0].$
The FBS formula is equivalent to this result because, by the sampling theorem, any set of $N$ uniformly spaced samples of $W(e^{j\omega})$ is adequate.
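The aliasing identity can be confirmed numerically. This minimal sketch assumes NumPy, a 50-point Hamming window stored causally as `w[0..L-1]`, and two hypothetical helpers. One computes the inverse DFT of $N$ samples of $W(e^{j\omega})$; the other folds $w[n]$ modulo $N$ directly. With $N \ge L$ the value at $n = 0$ is just $w[0]$; with $N < L$ it picks up aliased terms.

```python
import numpy as np

def aliased_window(w, N):
    """(1/N) sum_k W(e^{j 2 pi k / N}) e^{j 2 pi k n / N} for n = 0..N-1."""
    n = np.arange(len(w))
    k = np.arange(N)
    W_k = np.exp(-2j * np.pi * np.outer(k, n) / N) @ w   # N samples of the DTFT
    return np.fft.ifft(W_k).real

def time_aliased(w, N):
    """sum_r w[n + rN] for n = 0..N-1 (w is zero outside 0..len(w)-1)."""
    out = np.zeros(N)
    for i, v in enumerate(w):
        out[i % N] += v
    return out

w = np.hamming(50)
print(aliased_window(w, 64)[0], w[0])            # N >= L: equals w[0]
print(aliased_window(w, 32)[0], w[0] + w[32])    # N < L: w[0] plus an aliased term
```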

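The impulse response of the composite filter bank system is
$\tilde h[n] = \sum_{k=0}^{N-1} h_k[n] = w[n]\sum_{k=0}^{N-1} e^{j\omega_k n},$
so the composite output is $y[n] = x[n] * \tilde h[n]$. In the frequency domain, $\tilde H(e^{j\omega}) = \sum_{k=0}^{N-1} W(e^{j(\omega - \omega_k)}) = N\,w[0]$. For the FBS method, the reconstructed signal is therefore
$y[n] = N\,w[0]\,x[n]$
if $W(e^{j\omega})$ is sampled properly in frequency. This result is independent of the shape of $w[n]$.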


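FBS Reconstruction

The composite impulse response of the FBS system is
$\tilde h[n] = \sum_{k=0}^{N-1} w[n]\,e^{j2\pi k n/N}.$
Define the composite of the terms being summed as
$p[n] = \frac{1}{N}\sum_{k=0}^{N-1} e^{j2\pi k n/N},$
so that $\tilde h[n] = N\,w[n]\,p[n]$. For $-\infty < n < \infty$, $p[n]$ is a periodic train of impulses of the form
$p[n] = \sum_{r=-\infty}^{\infty} \delta[n - rN],$
giving
$\tilde h[n] = N\sum_{r=-\infty}^{\infty} w[rN]\,\delta[n - rN].$
The composite impulse response is therefore the window sequence sampled at intervals of $N$ samples.

Consider the impulse response of an ideal lowpass filter with cutoff frequency $\pi/N$,
$w[n] = \frac{\sin(\pi n/N)}{\pi n}.$
For this ideal LPF, $w[rN] = 0$ for $r \ne 0$, giving $\tilde h[n] = N\,w[0]\,\delta[n]$.

[Figure: other cases where perfect reconstruction is obtained.]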

Summary of FBS Reconstruction

For perfect reconstruction using FBS methods:
1. $w[n]$ does not need to be either time-limited or frequency-limited to reconstruct $x[n]$ exactly from $X_n(e^{j\omega_k})$;
2. $w[n]$ just needs equally spaced zeros, $N$ samples apart ($w[rN] = 0$ for $r \ne 0$), for theoretically perfect reconstruction.

Exact reconstruction of the input is therefore possible with fewer frequency channels than the sampling theorem requires. The key issue is how to design digital filters that meet these criteria.
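The sketch below runs a complete filter bank summation, assuming NumPy. It builds $N$ channels $h_k[n] = w[n]e^{j2\pi kn/N}$, implements them with `np.convolve`, sums the outputs, and compares the result with $N\,w[0]\,x[n]$. The window is stored with an offset `n0` so that non-causal windows are allowed; the function name and both test windows are our own choices. The second case uses a 129-tap, Hamming-tapered sinc with zeros every 16 samples. It reconstructs exactly with only $N = 16$ channels, even though $N < L$.

```python
import numpy as np

def fbs_reconstruct(x, w, n0, N):
    """Sum the outputs of N bandpass channels h_k[n] = w[n] e^{j 2 pi k n / N}.

    w[i] is the window value at time i - n0 (n0 = array index of time zero).
    Returns y[n] for n = 0..len(x)-1, with x taken as zero outside that range.
    """
    t = np.arange(len(w)) - n0                         # true time index of each tap
    y = np.zeros(len(x), dtype=complex)
    for k in range(N):
        h_k = w * np.exp(2j * np.pi * k * t / N)       # bandpass filter for channel k
        y += np.convolve(x, h_k)[n0 : n0 + len(x)]     # shift so output index = time n
    return y.real

x = np.random.default_rng(0).standard_normal(500)

# Case 1: causal 32-point Hamming window, N = L = 32 channels.
w1 = np.hamming(32)
y1 = fbs_reconstruct(x, w1, n0=0, N=32)                # expect 32 * w1[0] * x

# Case 2: 129-tap Hamming-tapered sinc centred on n = 0, zeros at multiples of 16.
t = np.arange(-64, 65)
w2 = np.sinc(t / 16) * np.hamming(129)
y2 = fbs_reconstruct(x, w2, n0=64, N=16)               # expect 16 * w2[64] * x = 16 * x
```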

[Figure: practical implementation of FBS.]


FBS and OLA Comparisons

- The filter bank summation method depends on a sampling relation in frequency; the overlap addition method depends on a sampling relation in time.
- FBS requires that sampling in frequency be such that the window transform obeys $\frac{1}{N}\sum_{k=0}^{N-1} W(e^{j(\omega - \omega_k)}) = w[0]$.
- OLA requires that sampling in time be such that the window obeys $\sum_{r=-\infty}^{\infty} w[rR - n] = W(e^{j0})/R$.
- The key to short-time Fourier analysis is the ability to modify the short-time spectrum (via quantization, noise enhancement, signal enhancement, speed-up/slow-down, etc.) and recover an "unaliased" modified signal.

Applications of STFT

- Vocoders (voice coders), which code speech at rates much lower than waveform coders.
- Removal of additive noise.
- De-reverberation.
- Speed-up and slow-down of speech, for speed learning and as aids for the handicapped.

Coding of STFT

Elements of the STFT:
1. a set of frequencies $\{\omega_k\}$ chosen to cover the frequency range of interest;
2. $w_k[n]$, a set of lowpass analysis windows;
3. $P_k$, a set of complex gains that make the composite frequency response as close to ideal as possible.

The goal is to sample the STFT at rates lower than the sampling rate of $x[n]$.

Coding of STFT: Example

Non-uniform coding and quantization:
- 28 channels, each sampled at 100 samples/sec (this gives a small amount of aliasing);
- log magnitude and phase are coded: 3 bits for log magnitude and 4 bits for phase in channels 1-10, and 2 bits for log magnitude and 3 bits for phase in channels 11-28;
- total rate of 16 kbps.
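The 16 kbps total follows directly from the bit allocation above. A short Python check, using no numbers beyond those listed:

```python
# Channels 1-10: 3 bits (log magnitude) + 4 bits (phase) per sample.
# Channels 11-28: 2 bits (log magnitude) + 3 bits (phase) per sample.
bits_low = 10 * (3 + 4)             # 70 bits
bits_high = (28 - 10) * (2 + 3)     # 18 channels * 5 bits = 90 bits
bits_per_frame = bits_low + bits_high
rate_bps = bits_per_frame * 100     # every channel is sampled 100 times per second
print(bits_per_frame, rate_bps)     # 160 16000
```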

The Phase Vocoder

The phase vocoder is used for speed-up and slow-down of speech:
- speed-up: divide the center frequency and the phase derivative by $q$;
- slow-down: multiply the center frequency and the phase derivative by $q$.

Examples of Rate Changes in Speech

[Audio examples, female and male speakers: original rate, sped up, sped up more, slowed down, slowed down more; and with the sampling rate modified by +30% and -30%.]

[Figures: phase vocoder output, time expanded and time compressed.]

Channel Vocoder

- Interpret the STFT so that each channel can be thought of as a bandpass filter with center frequency $\omega_k$.
- The magnitude of the STFT can be approximated by envelope detection on the bandpass filter output.
- Analyzer: a bank of channels. Excitation information is needed in place of the phase component, supplied by a voiced/unvoiced (V/UV) detector and a pitch detector.
- Synthesizer: the channel signals control the channel amplitudes. The excitation signals control the detailed structure of the output for a given channel, and the V/UV decision chooses the excitation source.
- The result is highly reverberant speech, because there is no control at all over the composite filter bank response.

Channel vocoder rates: 1200-9600 bps, with 600 bps for pitch and V/UV. Pitch and timing are easy to modify.
