Topic 2. Signal Processing Review. (Some slides are adapted from Bryan Pardo's course slides on Machine Perception of Music)

Recording Sound
Mechanical Vibration
Pressure Waves
Motion->Voltage Transducer
Voltage over time

Microphones
http://www.mediacollege.com/audio/microphones/how-microphones-work.html

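Pure Tone = Sine Wave
[Figure: a 440 Hz sine wave with one period T marked; axes: Time (ms) vs. Amplitude]
$x(t) = A \sin(2\pi f t + \phi)$
where t is time, A is the amplitude, f is the frequency, and $\phi$ is the initial phase.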

Reminders
Frequency, f = 1/T, is measured in cycles per second, a.k.a. hertz (Hz).
One cycle contains 2π radians.
Angular frequency, Ω, is measured in radians per second and is related to frequency by Ω = 2πf.
So we can rewrite the sine wave as $x(t) = A \sin(\Omega t + \phi)$.
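The relations between period, frequency, and angular frequency can be sketched in a few lines of Python. This is only an illustration: the 440 Hz value is an assumed reading of the pure-tone example, and the function name `sine_wave` is hypothetical.

```python
import math

def sine_wave(amplitude, freq_hz, phase, t):
    """Evaluate x(t) = A sin(Omega t + phi), with Omega = 2 pi f."""
    omega = 2.0 * math.pi * freq_hz  # angular frequency in rad/s
    return amplitude * math.sin(omega * t + phase)

freq_hz = 440.0                      # assumed example frequency
period_s = 1.0 / freq_hz             # T = 1 / f
omega = 2.0 * math.pi * freq_hz      # Omega = 2 pi f

# A quarter of a period after t = 0 the sine (with zero phase) reaches its peak A.
peak = sine_wave(2.0, freq_hz, 0.0, period_s / 4.0)
```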

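Fourier Transform
[Figure: left, a sine wave, Time (ms) vs. Amplitude; right, its spectrum X(f) with peaks at -440 Hz and 440 Hz, Frequency (Hz) vs. Amplitude]
$X(f) = \int_{-\infty}^{\infty} x(t) e^{-j 2\pi f t} \, dt$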

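We can also write
$X(\Omega) = \int_{-\infty}^{\infty} x(t) e^{-j \Omega t} \, dt$
[Figure: left, the same sine wave, Time (ms) vs. Amplitude; right, its spectrum with peaks at -440·2π and 440·2π, Angular Frequency (radians per second) vs. Amplitude]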

Complex Tone = Sum of Sine Waves
[Figure: three sine waves at 220 Hz, 660 Hz, and 1100 Hz are added to form one complex waveform]

Frequency Domain
[Figure: left, the complex tone, Time (ms) vs. Amplitude; right, its spectrum X(f) with peaks at 220, 660, and 1100 Hz, Frequency (Hz) vs. Amplitude]
$X(f) = \int_{-\infty}^{\infty} x(t) e^{-j 2\pi f t} \, dt$

Harmonic Sound
One or more sine waves.
Strong components at integer multiples of a fundamental frequency (F0) in the range of human hearing (20 Hz ~ 20,000 Hz).
Examples:
220 + 660 + 1100 is harmonic
220 + 375 + 770 is not harmonic
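One rough way to test the two examples is to take the greatest common divisor of the component frequencies as a candidate fundamental and ask whether it is audible. This is a sketch under stated assumptions: frequencies are exact integers in Hz, the examples are read as 220 + 660 + 1100 Hz and 220 + 375 + 770 Hz, and the names `candidate_fundamental` and `is_harmonic` are hypothetical. Real signals would need a tolerance-based method.

```python
from functools import reduce
from math import gcd

def candidate_fundamental(freqs_hz):
    """Largest frequency (in Hz) of which every component is an integer multiple."""
    return reduce(gcd, freqs_hz)

def is_harmonic(freqs_hz, lowest_audible_hz=20):
    """Harmonic if the candidate fundamental lies in the audible range."""
    return candidate_fundamental(freqs_hz) >= lowest_audible_hz

harmonic_example = [220, 660, 1100]    # multiples 1, 3, 5 of 220 Hz
inharmonic_example = [220, 375, 770]   # only common divisor is 5 Hz, below hearing
```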

Noise
Lots of sines at random freqs. = NOISE
Example: a sum of sines with random frequencies drawn from a bounded range.
[Figure: the resulting noise waveform]

How strong is the signal?
Instantaneous value? Average value? Something else?
[Figure: a sine wave x(t) and a noise waveform]

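Acoustical or Electrical
Acoustical: average intensity $I = \frac{1}{\rho c} \cdot \frac{1}{T_D} \int_0^{T_D} x^2(t) \, dt$, where $\rho$ is the density and c is the sound speed. View x(t) as sound pressure.
Electrical: average power $P = \frac{1}{R} \cdot \frac{1}{T_D} \int_0^{T_D} x^2(t) \, dt$, where R is the resistance. View x(t) as electric voltage.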

Root-Mean-Square (RMS)
$x_{RMS} = \sqrt{\frac{1}{T_D} \int_0^{T_D} x^2(t) \, dt}$
$T_D$ should be long enough.
x(t) should have 0 mean, otherwise the DC component will be integrated.
For sinusoids: $x_{RMS} = \sqrt{\frac{1}{T} \int_0^{T} A^2 \sin^2(2\pi f t) \, dt} = \sqrt{A^2/2} = 0.707A$
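The 0.707A value for sinusoids can be checked by hand. A short worked derivation, assuming the integration time T is exactly one period (so fT = 1) and using the identity $\sin^2\theta = (1 - \cos 2\theta)/2$:

```latex
\begin{aligned}
x_{RMS}^2 &= \frac{1}{T}\int_0^T A^2 \sin^2(2\pi f t)\,dt
           = \frac{A^2}{2T}\int_0^T \bigl(1 - \cos(4\pi f t)\bigr)\,dt \\
          &= \frac{A^2}{2T}\left[\,t - \frac{\sin(4\pi f t)}{4\pi f}\right]_0^T
           = \frac{A^2}{2}
           \qquad \text{since } \sin(4\pi f T) = \sin(4\pi) = 0, \\
x_{RMS}   &= \frac{A}{\sqrt{2}} \approx 0.707A .
\end{aligned}
```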

Sound Pressure Level (SPL)
Softest audible sound intensity: 0.000000000001 watt/m^2 (10^-12 watt/m^2)
Threshold of pain is around 1 watt/m^2
12 orders of magnitude difference
A log scale helps with this
The decibel (dB) scale is a log scale, with respect to a reference value

The Decibel
A logarithmic measurement that expresses the magnitude of a physical quantity (e.g., power or intensity) relative to a specified reference level.
Since it expresses a ratio of two (same unit) quantities, it is dimensionless.
$L - L_{ref} = 10 \log_{10} \frac{I}{I_{ref}} = 20 \log_{10} \frac{x_{RMS}}{x_{ref,RMS}}$
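The two forms of the decibel formula can be sketched as a pair of small Python helpers. The function names are hypothetical, and the 10^-12 watt/m^2 reference used in the example is the hearing-threshold intensity, taken here as an assumption.

```python
import math

def intensity_level_db(intensity, ref_intensity):
    """Level difference in dB from a ratio of intensities (or powers)."""
    return 10.0 * math.log10(intensity / ref_intensity)

def amplitude_level_db(x_rms, x_ref_rms):
    """Level difference in dB from a ratio of RMS amplitudes."""
    return 20.0 * math.log10(x_rms / x_ref_rms)

# 1 watt/m^2 relative to an assumed 1e-12 watt/m^2 reference: 12 orders of magnitude.
pain_level_db = intensity_level_db(1.0, 1e-12)

# Intensity goes as amplitude squared, so a 10x amplitude ratio is a 100x intensity ratio.
same_level = (amplitude_level_db(10.0, 1.0), intensity_level_db(100.0, 1.0))
```

The factor of 20 in the amplitude form comes from the squaring: 10 log10(x^2) = 20 log10(x).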

Lots of references!
dB-SPL: A measure of sound pressure level. 0 dB-SPL is approximately the quietest sound a human can hear, roughly the sound of a mosquito flying 3 meters away.
dBFS: relative to digital full-scale. 0 dBFS is the maximum allowable signal. Values typically negative.
dBV: relative to 1 Volt RMS. 0 dBV = 1 V.
dBu: relative to 0.775 Volts RMS with an unloaded, open circuit.
dBmV: relative to 1 millivolt across 75 Ω. Widely used in cable television networks.

Typical Values
Jet engine at 3m        140 dB-SPL
Pain threshold          130 dB-SPL
Loud motorcycle, 5m     110 dB-SPL
Vacuum cleaner           80 dB-SPL
Quiet restaurant         50 dB-SPL
Rustling leaves          20 dB-SPL
Human breathing, 3m      10 dB-SPL
Hearing threshold         0 dB-SPL

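Digital Sampling
[Figure: a waveform measured once per sample interval (TIME axis) and rounded to the nearest quantization increment (AMPLITUDE axis); each sample is stored as a binary code, and the RECONSTRUCTION is built from those samples]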

More quantization levels = more dynamic range
[Figure: the same kind of plot with a finer quantization increment and longer binary codes per sample; axes: TIME (sample interval) vs. AMPLITUDE]

Bit Depth and Dynamics
More bits = more quantization levels = better sound
Compact Disc: 16 bits = 65,536 levels
POTS (plain old telephone service): 8 bits = 256 levels
Signal-to-quantization-noise ratio (SQNR), if the signal is uniformly distributed in the whole range:
$SQNR = 20 \log_{10} 2^N \approx 6.02N$ dB
E.g., N = 16 bits depth gives about 96 dB SQNR.
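The level counts and the roughly 6 dB-per-bit rule can be reproduced with a short calculation. A minimal sketch; the function names are hypothetical.

```python
import math

def quantization_levels(bits):
    """Number of distinct amplitude levels for a given bit depth."""
    return 2 ** bits

def sqnr_db(bits):
    """SQNR = 20 log10(2^N) dB, for a signal uniformly distributed over the full range."""
    return 20.0 * math.log10(2 ** bits)

cd_levels = quantization_levels(16)      # Compact Disc
phone_levels = quantization_levels(8)    # POTS
cd_sqnr = sqnr_db(16)
db_per_bit = sqnr_db(1)                  # 20 log10(2)
```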

RMS
[Figure: a sine wave with its samples marked as red dots; Amplitude vs. time]
The red dots form the discrete signal x[n].
$x_{RMS} = \sqrt{\frac{1}{N} \sum_{n=0}^{N-1} x^2[n]}$
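The discrete RMS formula translates directly into code. The sketch below applies it to one full period of a sampled sinusoid, where the result should match the A/sqrt(2) value known for sinusoids; the amplitude and sample count are arbitrary illustrative choices.

```python
import math

def rms(samples):
    """Root-mean-square of a discrete signal x[n], n = 0..N-1."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

amplitude = 2.0          # illustrative value
n_samples = 1000         # illustrative value, covering exactly one period
tone = [amplitude * math.sin(2.0 * math.pi * n / n_samples) for n in range(n_samples)]

tone_rms = rms(tone)     # expected to be close to amplitude / sqrt(2)
```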

Aliasing and Nyquist
[Figure: a waveform sampled once per sample interval; axes: TIME vs. AMPLITUDE]

Nyquist-Shannon Sampling Theorem
You can't reproduce the signal if your sample rate isn't faster than twice the highest frequency in the signal.
Nyquist rate: twice the frequency of the highest frequency in the signal. A property of the continuous-time signal.
Nyquist frequency: half of the sampling rate. A property of the discrete-time system.
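The two definitions are easy to mix up, so here is a minimal sketch that keeps them apart. The function names and the example rates are illustrative assumptions, not values from the slides.

```python
def nyquist_rate(highest_freq_hz):
    """Property of the signal: twice its highest frequency."""
    return 2.0 * highest_freq_hz

def nyquist_frequency(sample_rate_hz):
    """Property of the sampling system: half the sampling rate."""
    return sample_rate_hz / 2.0

def can_reconstruct(highest_freq_hz, sample_rate_hz):
    """True when the sample rate is faster than the signal's Nyquist rate."""
    return sample_rate_hz > nyquist_rate(highest_freq_hz)

audio_ok = can_reconstruct(20000.0, 44100.0)   # 20 kHz content at a 44.1 kHz rate
too_slow = can_reconstruct(1800.0, 2000.0)     # 1800 Hz content at a 2 kHz rate
```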

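Discrete-Time Fourier Transform (DTFT)
The red dots form the discrete signal x[n], where n = 0, ±1, ±2, ...
$X(\omega) = \sum_{n=-\infty}^{\infty} x[n] e^{-j \omega n}$
X(ω) is periodic. We often only show the range from -π to π.
ω is a continuous variable.
[Figure: left, a sampled sine wave, time vs. Amplitude; right, X(ω) with ticks at -2π, -π, π, and 2π, Angular frequency ω vs. Amplitude]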

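Relation between FT and DTFT
[Figure: a continuous waveform and its samples; Time (ms) vs. Amplitude]
Sampling: $x[n] = x_c(nT)$
FT: $X_c(\Omega) = \int_{-\infty}^{\infty} x_c(t) e^{-j \Omega t} \, dt$
DTFT: $X(\omega) = \sum_{n=-\infty}^{\infty} x[n] e^{-j \omega n}$
$X(\omega) = \frac{1}{T} \sum_{k=-\infty}^{\infty} X_c\left(\frac{\omega}{T} + \frac{2\pi k}{T}\right)$
Scaling: ω = ΩT, i.e., ω = 2π corresponds to Ω = 2π/T = 2πf_s, which corresponds to f = f_s.
Repetition: X(ω) contains infinite copies of X_c, spaced by 2π.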

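Aliasing
[Figure, top: X_c(Ω) of a complex tone 900 Hz + 1800 Hz, with peaks at Ω = ±1800π and ±3600π]
[Figure, middle: X(ω) at sampling rate = 8000 Hz, with the highest peaks at ω = ±3600π/8000, inside the range -π to π]
[Figure, bottom: X(ω) at sampling rate = 2000 Hz, with peaks at ω = ±1800π/2000 and ±3600π/2000; the 1800 Hz component falls outside the range -π to π and shows up as an alias at 200 Hz]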
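The frequency at which a sampled sinusoid appears can be sketched by folding it onto the nearest multiple of the sampling rate. This is an illustration only: the function name is hypothetical, and the example reads the slide's tone as 900 Hz + 1800 Hz sampled at 8000 Hz and at 2000 Hz.

```python
def aliased_frequency(freq_hz, sample_rate_hz):
    """Apparent frequency (0 .. fs/2) of a real sinusoid after sampling at fs."""
    nearest_multiple = round(freq_hz / sample_rate_hz) * sample_rate_hz
    return abs(freq_hz - nearest_multiple)

# At 8000 Hz both components lie below the Nyquist frequency and are unchanged.
at_8k = [aliased_frequency(f, 8000) for f in (900, 1800)]

# At 2000 Hz the Nyquist frequency is 1000 Hz, so the 1800 Hz component folds down.
at_2k = [aliased_frequency(f, 2000) for f in (900, 1800)]
```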

Fourier Series
FT and DTFT do not require the signal to be periodic, i.e., the signal may contain arbitrary frequencies, which is why the frequency domain is continuous.
Now, if the signal is periodic: $x(t + mT) = x(t), \; \forall m \in \mathbb{Z}$
It can be reproduced by a series of sine and cosine functions:
$x(t) = A_0 + \sum_{n=1}^{\infty} \left[ A_n \cos(\Omega_n t) + B_n \sin(\Omega_n t) \right]$
In other words, the frequency domain is discrete.

Discrete Fourier Transform (DFT)
FT and DTFT are great, but the infinite integral or summations are hard to deal with.
In digital computers, everything is discrete, including both the signal and its spectrum.
$X[k] = \sum_{n=0}^{N-1} x[n] e^{-j 2\pi k n / N}$
where k is the frequency domain index, n is the time domain index, and N is the length of the signal, i.e., the length of the DFT.

DFT and IDFT
DFT: $X[k] = \sum_{n=0}^{N-1} x[n] e^{-j 2\pi k n / N}$
IDFT: $x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] e^{j 2\pi k n / N}$
Both x[n] and X[k] are discrete and of length N.
Treats x[n] as if it were infinite and periodic.
Treats X[k] as if it were infinite and periodic.
Only one period is involved in calculation.
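The DFT and IDFT sums can be written out directly. The sketch below is a deliberately naive O(N^2) transcription of the two formulas for illustration, not how a production transform is computed; the function names are hypothetical.

```python
import cmath

def dft(x):
    """X[k] = sum over n of x[n] exp(-j 2 pi k n / N)."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_points) for n in range(n_points))
        for k in range(n_points)
    ]

def idft(spectrum):
    """x[n] = (1/N) sum over k of X[k] exp(+j 2 pi k n / N)."""
    n_points = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / n_points) for k in range(n_points))
        / n_points
        for n in range(n_points)
    ]

signal = [1.0, 2.0, 0.0, -1.0]
recovered = idft(dft(signal))   # complex values whose real parts match the signal
```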

Discrete Fourier Transform
If the time-domain signal has no imaginary part (like an audio signal) then the frequency-domain signal is conjugate symmetric around N/2.
[Figure: the time domain x[n] (real portion and imaginary portion, indices 0 to N-1) is mapped by the DFT to the frequency domain X[k] (real portion and imaginary portion, indices 0 to N-1, with DC at k = 0 and fs/2 at k = N/2); the IDFT maps back]
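Conjugate symmetry means X[k] equals the complex conjugate of X[N-k]. A small sketch that checks this numerically, using a naive DFT so the example is self-contained; the function names and test signals are illustrative assumptions.

```python
import cmath

def dft(x):
    """Naive DFT: X[k] = sum over n of x[n] exp(-j 2 pi k n / N)."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_points) for n in range(n_points))
        for k in range(n_points)
    ]

def is_conjugate_symmetric(spectrum, tol=1e-9):
    """Check X[k] == conj(X[N - k]) for k = 1 .. N-1."""
    n_points = len(spectrum)
    return all(
        abs(spectrum[k] - spectrum[n_points - k].conjugate()) < tol
        for k in range(1, n_points)
    )

real_signal = [0.5, 1.0, -2.0, 3.0, 0.0, 1.5]
complex_signal = [1j, 0.0, 0.0, 0.0]   # has an imaginary part, so symmetry is lost

real_is_symmetric = is_conjugate_symmetric(dft(real_signal))
complex_is_symmetric = is_conjugate_symmetric(dft(complex_signal))
```

This symmetry is why only bins 0 through N/2 need to be shown or stored for real audio.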

Kinds of Fourier Transforms
Fourier Transform: signals continuous, aperiodic; spectrum aperiodic, continuous
Fourier Series: signals continuous, periodic; spectrum aperiodic, discrete
Discrete Time Fourier Transform: signals discrete, aperiodic; spectrum periodic, continuous
Discrete Fourier Transform: signals discrete, periodic; spectrum periodic, discrete

Duality
                                              Time domain continuous         Time domain discrete
                                              (frequency domain aperiodic)   (frequency domain periodic)
Frequency domain continuous
(time domain aperiodic)                       Fourier Transform              DTFT
Frequency domain discrete
(time domain periodic)                        Fourier Series                 DFT

The FFT
Fast Fourier Transform
A much, much faster way to do the DFT
Introduced by Carl F. Gauss in 1805
Rediscovered by J.W. Cooley and John Tukey in 1965
The Cooley-Tukey algorithm is the one we use today (mostly)
Big O notation for this is O(N log N)
Matlab functions fft and ifft are standard.
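To show where the O(N log N) cost comes from, here is a minimal recursive radix-2 decimation-in-time sketch in the Cooley-Tukey style. It assumes the input length is a power of two and is meant as an illustration, not a replacement for a library routine such as Matlab's fft.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two (assumption of this sketch)."""
    n_points = len(x)
    if n_points == 0 or n_points & (n_points - 1):
        raise ValueError("length must be a power of two")
    if n_points == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # DFT of even-indexed samples, length N/2
    odd = fft(x[1::2])    # DFT of odd-indexed samples, length N/2
    out = [0j] * n_points
    for k in range(n_points // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n_points) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n_points // 2] = even[k] - twiddle
    return out

impulse_spectrum = fft([1, 0, 0, 0, 0, 0, 0, 0])   # flat spectrum, every bin equal to 1
sine_spectrum = fft([0.0, 1.0, 0.0, -1.0])         # one cycle of a sine over 4 samples
```

Each level of recursion does O(N) work combining the halves, and there are log2(N) levels.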

Windowing
A function that is zero-valued outside of some chosen interval.
When a signal (data) is multiplied by a window function, the product is zero-valued outside the interval: all that is left is the "view" through the window.
x[n] × w[n] = z[n]
Example: windowing x[n] with a rectangular window

Some famous windows
Note: we assume w[n] = 0 outside some range [0, N]
Rectangular: $w[n] = 1$
Triangular (Bartlett): $w[n] = \frac{2}{N-1} \left( \frac{N-1}{2} - \left| n - \frac{N-1}{2} \right| \right)$
Hann: $w[n] = 0.5 \left( 1 - \cos \frac{2\pi n}{N-1} \right)$
[Figure: each window plotted as amplitude vs. sample]
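The three window formulas can be sketched as plain Python functions. The assumption here is that each window has N samples indexed n = 0 .. N-1 with N at least 2, matching the N-1 that appears in the formulas; the function names are hypothetical.

```python
import math

def rectangular(n_points):
    """w[n] = 1 inside the window."""
    return [1.0] * n_points

def bartlett(n_points):
    """Triangular window: rises linearly to 1 at the centre, falls back to 0."""
    half = (n_points - 1) / 2.0
    return [(2.0 / (n_points - 1)) * (half - abs(n - half)) for n in range(n_points)]

def hann(n_points):
    """w[n] = 0.5 (1 - cos(2 pi n / (N - 1)))."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / (n_points - 1))) for n in range(n_points)]

def apply_window(signal, window):
    """z[n] = x[n] w[n]."""
    return [x * w for x, w in zip(signal, window)]

windowed = apply_window([1.0, 1.0, 1.0, 1.0, 1.0], hann(5))
```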

Why window shape matters
Don't forget that a DFT assumes the signal in the window is periodic.
The boundary conditions mess things up unless you manage to have a window whose length is exactly 1 period of your signal.
Making the edges of the window less prominent helps suppress undesirable artifacts.

Fourier Transform of Windows
We want
- Narrow main lobe
- Low sidelobes
[Figure: spectrum of a window with the main lobe and sidelobes labeled; Normalized angular frequency vs. Amplitude (dB)]

Which window is better?
Hann window: $w[n] = 0.5 \left( 1 - \cos \frac{2\pi n}{N-1} \right)$
Hamming window: $w[n] = 0.54 - 0.46 \cos \frac{2\pi n}{N-1}$
[Figure: spectra of the two windows side by side; Normalized angular frequency vs. Amplitude (dB)]

Multiplication vs. Convolution
Time domain        Frequency domain
x[n] · y[n]        (1/N) X[k] * Y[k]
x[n] * y[n]        X[k] · Y[k]
(here · is multiplication and * is convolution)
Windowing is multiplication in time domain, so the spectrum will be a convolution between the signal's spectrum and the window's spectrum.
Convolution in time domain takes O(N^2), but if we perform it in the frequency domain:
FFT takes O(N log N)
Multiplication takes O(N)
IFFT takes O(N log N)
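The second row of the table can be checked numerically: circular convolution in time should equal pointwise multiplication of the DFTs. The sketch below uses a naive O(N^2) DFT to stay self-contained, so it demonstrates the identity rather than the speed-up, which would need an FFT. All function names are hypothetical.

```python
import cmath

def dft(x, sign=-1):
    """Naive DFT (sign=-1) or unnormalized inverse DFT (sign=+1)."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / n_points) for n in range(n_points))
        for k in range(n_points)
    ]

def circular_convolve_direct(x, y):
    """z[n] = sum over m of x[m] y[(n - m) mod N], computed in O(N^2)."""
    n_points = len(x)
    return [
        sum(x[m] * y[(n - m) % n_points] for m in range(n_points))
        for n in range(n_points)
    ]

def circular_convolve_via_dft(x, y):
    """Transform both signals, multiply bin by bin, transform back and scale by 1/N."""
    product = [a * b for a, b in zip(dft(x), dft(y))]
    return [value / len(x) for value in dft(product, sign=+1)]

x = [1.0, 2.0, 3.0, 4.0]
y = [0.0, 1.0, 0.0, 0.0]          # a one-sample delay
direct = circular_convolve_direct(x, y)
via_dft = circular_convolve_via_dft(x, y)
```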

Windowed Signal
[Figure: two panels showing a signal before and after windowing]

Spectrum of Windowed Signal
[Figure: spectrum of the windowed signal; Frequency (Hz) vs. Amplitude (dB)]
Two sinusoids
Sampling rate: 10 kHz
Window length: 100 (i.e., 100/10K = 0.01 s)
FFT length: 400 (i.e., 4 times zero padding)

Zero Padding
Add zeros after (or before) the signal to make it longer.
Perform DFT on the padded signal.
[Figure: the windowed signal followed by the padded zeros]

Why Zero Padding?
Zero padding in time domain gives the ideal interpolation in the frequency domain.
It doesn't increase (the real) frequency resolution!
4 times is generally enough.
Here the resolution is always fs/L = 100 Hz.
[Figure: three spectra of the same signal with no zero padding, 4 times zero padding, and 8 times zero padding; Frequency (Hz) vs. Amplitude (dB)]
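One way to see that zero padding only interpolates is to compare spectra: every 4th bin of the 4-times padded DFT should coincide with a bin of the unpadded DFT, and the bins in between are new samples of the same underlying curve. A minimal sketch with a naive DFT; the signal values and function names are arbitrary illustrations.

```python
import cmath

def dft(x):
    """Naive DFT: X[k] = sum over n of x[n] exp(-j 2 pi k n / N)."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_points) for n in range(n_points))
        for k in range(n_points)
    ]

def zero_pad(signal, factor):
    """Append zeros so the result is `factor` times as long as the signal."""
    return list(signal) + [0.0] * (len(signal) * (factor - 1))

signal = [1.0, 2.0, 0.5, -1.0]
plain_spectrum = dft(signal)
padded_spectrum = dft(zero_pad(signal, 4))   # 16 bins instead of 4
```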

How to increase frequency resolution?
Time-frequency resolution tradeoff: $\Delta t \cdot \Delta f = 1$, with $\Delta t$ in seconds and $\Delta f$ in Hz
[Figure: three spectra of the same signal with window length 10 ms, 20 ms, and 40 ms; Frequency (Hz) vs. Amplitude (dB)]

Short time Fourier Transform
Break signal into frames
Window each frame
Calculate DFT of each windowed frame
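The three steps can be sketched as follows. This is a minimal illustration with a naive DFT and a Hann window; the frame length, hop size, test signal, and function names are all assumptions chosen for the example.

```python
import cmath
import math

def hann(n_points):
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / (n_points - 1))) for n in range(n_points)]

def dft(x):
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_points) for n in range(n_points))
        for k in range(n_points)
    ]

def stft(signal, frame_len, hop):
    """Return one DFT spectrum per frame; incomplete trailing frames are dropped."""
    window = hann(frame_len)
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]               # 1. break into frames
        windowed = [s * w for s, w in zip(frame, window)]     # 2. window each frame
        spectra.append(dft(windowed))                         # 3. DFT of each frame
    return spectra

# Test tone that completes 2 cycles per 16-sample frame, so it should peak at bin 2.
signal = [math.sin(2.0 * math.pi * 2 * n / 16) for n in range(64)]
spectra = stft(signal, frame_len=16, hop=8)
peak_bin = max(range(9), key=lambda k: abs(spectra[0][k]))
```

Stacking the magnitudes of these spectra side by side, one column per frame, gives a spectrogram.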

The Spectrogram
There is a spectrogram function in Matlab.

A Fun Example (Thanks to Robert Remez)

Overlap-Add Synthesis
IDFT on each spectrum.
Use the complex, full spectrum.
Don't forget the phase (often using the original phase).
If you do it right, the time signal you get is real.
(optional) Multiply with a synthesis window (e.g., Hamming) to suppress signals at edges.
Do not divide by the analysis window.
Overlap and add different frames together.
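The final overlap-and-add step can be sketched on its own, assuming the frames have already been brought back to the time domain by the IDFT and all have the same length. The function name and the toy frames are illustrative.

```python
def overlap_add(frames, hop):
    """Place frame m at offset m * hop and sum wherever frames overlap."""
    frame_len = len(frames[0])
    output = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for m, frame in enumerate(frames):
        for n, value in enumerate(frame):
            output[m * hop + n] += value
    return output

# Two 4-sample frames with a hop of 2: the middle two samples overlap and are summed.
frames = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
result = overlap_add(frames, hop=2)
```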

Constant Overlap Add (COLA)
Windows of all frames add up to a constant function:
$\sum_{m} w[n - mR] = \text{const}$
where m is the frame index, w is the window function, and R is the frame hop size.
Perfect reconstruction!
Requires special design of w and R (L is the window size):
Rectangular window: $R \le L$, with L an integer multiple of R
Triangular window: $R = L/k, \; k \ge 2, \; k \in \mathbb{N}$
Hamming/Hann window: $R = L/(2k), \; k \in \mathbb{N}$
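The COLA condition can be tested numerically by summing shifted copies of a window and checking that the sum is flat away from the edges. This sketch makes one assumption worth flagging: it uses the periodic Hann variant (denominator L rather than L-1), for which the sum at R = L/2 is exactly constant. The function names are hypothetical.

```python
import math

def periodic_hann(length):
    """Hann window with denominator L (periodic variant, an assumption of this sketch)."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / length)) for n in range(length)]

def overlap_sum(window, hop, n_frames):
    """Sum of n_frames copies of the window, each shifted by a further `hop` samples."""
    total = [0.0] * (hop * (n_frames - 1) + len(window))
    for m in range(n_frames):
        for n, w in enumerate(window):
            total[m * hop + n] += w
    return total

def is_cola(window, hop, tol=1e-9):
    """True if the shifted windows sum to a constant away from the signal edges."""
    length = len(window)
    n_frames = 3 * (length // hop) + 3
    total = overlap_sum(window, hop, n_frames)
    steady = total[length:len(total) - length]   # skip the ramp-up and ramp-down regions
    return max(steady) - min(steady) < tol

hann_half_overlap = is_cola(periodic_hann(8), hop=4)   # R = L/2
rect_no_overlap = is_cola([1.0] * 8, hop=8)            # R = L
rect_bad_hop = is_cola([1.0] * 8, hop=3)               # L is not a multiple of R
```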

Shepard Tones
Barber's pole
Continuous Risset scale

Shepard Tones
Make a sound composed of sine waves spaced at octave intervals.
Control their amplitudes by imposing a Gaussian (or something like it) filter in the log-frequency dimension.
Move all the sine waves up a musical ½ step.
Wrap around in frequency.
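The recipe above can be sketched by computing the frequency and amplitude of each sine component for a given number of half steps. Every parameter here (base frequency, number of octaves, centre and width of the Gaussian) is an illustrative assumption rather than a value from the slides, and the function name is hypothetical.

```python
import math

def shepard_components(step, base_hz=27.5, n_octaves=8, center_octave=4.0, sigma=1.5):
    """Return sorted (frequency_hz, amplitude) pairs after moving up `step` half steps."""
    components = []
    for i in range(n_octaves):
        # Position in octaves above base_hz; the modulo wraps components around in frequency.
        position = (i + step / 12.0) % n_octaves
        freq_hz = base_hz * 2.0 ** position
        # Gaussian amplitude envelope along the log-frequency (octave) axis.
        amplitude = math.exp(-0.5 * ((position - center_octave) / sigma) ** 2)
        components.append((freq_hz, amplitude))
    return sorted(components)

start = shepard_components(0)
one_half_step_up = shepard_components(1)
one_octave_up = shepard_components(12)   # 12 half steps later the set is back where it began
```

Because the components that wrap around re-enter at the quiet low end of the envelope, the tone can keep rising step after step while the set of frequencies repeats every octave.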