Advanced audio analysis. Martin Gasser

1 Advanced audio analysis Martin Gasser

2 Motivation. Which methods are common in MIR research? How can we parameterize audio signals? Interesting dimensions of audio: spectral/time/melody structure, high-level descriptions. Which properties of the signals are captured by the features?

3 Topics. STFT, phase vocoder; constant-Q transform; source-filter analysis (LPC, cepstrum, MFCC); spectral modeling synthesis; beat tracking; pitch estimation; chord/key recognition.

4 STFT. Short-time Fourier transform. Take DFTs of (overlapping) frames of audio data. Before the DFT, multiply the data with a window function. Efficiently implemented via the FFT (e.g., FFTW). The resolution of the STFT is limited by the sample rate and number of bins, and by the window type (the spectrum is convolved with the DFT of the window function).
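The framing, windowing and FFT steps above can be sketched as follows. This is a minimal illustration, not the lecture's own code: NumPy, the Hann window, and the frame and hop sizes are assumptions chosen for the example.

```python
import numpy as np

def stft(x, frame_len=1024, hop=256):
    """Return a (num_frames, frame_len // 2 + 1) array of complex STFT bins."""
    window = np.hanning(frame_len)                      # window applied before each DFT
    num_frames = 1 + (len(x) - frame_len) // hop        # only full frames are used
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(num_frames)])
    return np.fft.rfft(frames * window, axis=1)         # FFT of each windowed frame

# Illustrative input: one second of a 1 kHz sine sampled at 8 kHz.
sr = 8000
x = np.sin(2 * np.pi * 1000.0 * np.arange(sr) / sr)
S = stft(x)
peak_bin = int(np.argmax(np.abs(S[0])))
peak_hz = peak_bin * sr / 1024                          # bin spacing is sr / frame_len
```

With these settings the bin spacing is 8000 / 1024 ≈ 7.8 Hz, which is the frequency resolution limit mentioned on the slide; a longer frame narrows the bins but blurs events in time.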

5 Phase vocoder. Analysis/resynthesis method based on the STFT. Independent modification of magnitude and phase values in STFT bins. High-quality pitch shifting, time stretching, and other effects.

6 Problems of STFT. Window size/type has to be manually adjusted to the data. Equal time/frequency resolution for all frequency bands. Human auditory perception has good frequency resolution in lower bands and good time resolution in upper bands. The ratio of center frequency to bandwidth of auditory filters ("filter Q") is approximately constant.

7 Constant Q transform. The window length of each basis sinusoid is inversely proportional to its center frequency. Center frequencies are logarithmically spaced (no 0 frequency!). The basis matrix is not invertible → there is no unique inversion (yet?). Efficient implementation: leverages the sparsity of the basis functions in the frequency domain.

8 Fast CQT. Time kernel: $k$ (dense). Spectral kernel: $K$ (sparse), the DFT of the time kernel.
$$X_{cq}[k_{cq}] = \sum_{n=0}^{N-1} x[n]\,k^*[n, k_{cq}] = \frac{1}{N}\sum_{k=0}^{N-1} X[k]\,K^*[k, k_{cq}]$$
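The equality of the time-domain and frequency-domain sums can be checked numerically. The sketch below assumes NumPy, a Hamming window, 12 bins per octave and a lowest frequency of 110 Hz; none of these values come from the slides, and the spectral kernel is left dense here (a practical implementation would zero its near-zero entries to obtain the sparse kernel).

```python
import numpy as np

def cqt_kernels(sr, f_min, bins_per_octave, n_bins, fft_len):
    """Build the dense time kernel and its DFT (the spectral kernel)."""
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)   # constant filter Q
    time_kernel = np.zeros((n_bins, fft_len), dtype=complex)
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)     # log-spaced center frequency
        n_k = int(np.ceil(q * sr / f_k))               # window length shrinks as f_k grows
        n = np.arange(n_k)
        time_kernel[k, :n_k] = np.hamming(n_k) / n_k * np.exp(2j * np.pi * q * n / n_k)
    spectral_kernel = np.fft.fft(time_kernel, axis=1)
    return time_kernel, spectral_kernel

sr, fft_len = 8000, 2048
time_kernel, spectral_kernel = cqt_kernels(
    sr, f_min=110.0, bins_per_octave=12, n_bins=36, fft_len=fft_len)

# Illustrative input: a 440 Hz sine, i.e. two octaves (24 bins) above f_min.
x = np.sin(2 * np.pi * 440.0 * np.arange(fft_len) / sr)

cq_time = time_kernel.conj() @ x                               # sum over n of x[n] k*[n, k_cq]
cq_freq = spectral_kernel.conj() @ np.fft.fft(x) / fft_len     # (1/N) sum over k of X[k] K*[k, k_cq]
strongest_bin = int(np.argmax(np.abs(cq_time)))
```

Both routes give the same coefficients; the frequency-domain route becomes cheaper once the spectral kernel is stored sparsely, because a single FFT of the input is shared by all CQT bins.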

9 STFT vs. CQT

10 SMS. Spectral modeling synthesis. Enhancement of the tracking phase vocoder. Tries to separate the signal into sinusoidal and residual (filtered white noise) parts. Store sinusoidal tracks and filter coefficients. Mixed bottom-up/top-down approach. Usage: transcription, high-quality time stretching/pitch shifting.

11 Algorithm

12 Deterministic part. (a) Peak picking. (b) Peak interpolation to increase accuracy.

13 Deterministic part. (c) Peak tracking.
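Steps (a) and (b) can be sketched as below. The slides do not say which interpolation is used; this example assumes the common choice of fitting a parabola through the log-magnitude of a local maximum and its two neighbors. The function name, threshold and test signal are illustrative, and peak tracking (c) is not covered.

```python
import numpy as np

def pick_peaks(mag_db, threshold_db=-60.0):
    """Return (fractional_bin, interpolated_db) for each local maximum above the threshold."""
    peaks = []
    for k in range(1, len(mag_db) - 1):
        a, b, c = mag_db[k - 1], mag_db[k], mag_db[k + 1]
        if b > threshold_db and b > a and b >= c:          # (a) peak picking
            offset = 0.5 * (a - c) / (a - 2.0 * b + c)     # (b) vertex of the fitted parabola
            peaks.append((k + offset, b - 0.25 * (a - c) * offset))
    return peaks

# Illustrative input: a sine placed between bins 130 and 131.
sr, n_fft = 8000, 1024
true_bin = 130.3
x = np.sin(2 * np.pi * true_bin * np.arange(n_fft) / n_fft)
mag = np.abs(np.fft.rfft(x * np.hanning(n_fft)))
mag_db = 20.0 * np.log10(mag / mag.max() + 1e-12)          # dB relative to the largest bin

peaks = pick_peaks(mag_db)
strongest_bin, strongest_db = max(peaks, key=lambda p: p[1])
strongest_hz = strongest_bin * sr / n_fft
```

The interpolated position lands much closer to the true frequency than the nearest integer bin does, which is what makes the sinusoidal tracks accurate enough for resynthesis.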

14 Stochastic part. Spectral subtraction can be done in the frequency or time domain. Frequency domain: synthesize the spectral shape of each sinusoid (main lobe of the window function) and resynthesize. Time domain: use phase-matched additive synthesis. The ideal residual is stochastic.

15 Stochastic part. Perform amplitude rescaling in order to reduce smearing artifacts: compare the residual to the original signal; whenever residual > original, reduce the amplitude of the residual. Model the spectral envelope of the resulting signal (smoothed DFT, LPC, cepstrum).

16 Critical steps. Spectral analysis: currently STFT; can we improve? Additive resynthesis. Smearing at transients!

17 Source-filter analysis. Idea: signal = excitation * resonance (convolution). Models human speech production and many musical instruments. Excitation: broadband pitched source signal (e.g., glottal pulse train). Resonance: slowly varying filter (e.g., vocal tract) → formants.

18 Source-filter analysis

19 Source-filter analysis. The source signal is convolved with a time-varying filter. How to deconvolve the resulting signal? How to calculate the coefficients of the filter? Applications: pitch tracking, speech recognition/synthesis, music similarity, ...

20 Linear Predictive Coding. Analysis: optimize the coefficients of a predictive model (FIR filter) such that the prediction error is minimized. Difference between input signal and prediction: residual. Inverse filter: all-pole (IIR) filter. Resynthesis: use the (compressed) residual as input to the inverse filter.

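21 LPC maths.
$$e(n) = x(n) - \sum_{k=1}^{p} a_k\,x(n-k)$$
$$\frac{\partial E\{e^2(n)\}}{\partial a_i} = 2E\left\{e(n)\frac{\partial e(n)}{\partial a_i}\right\} = -2E\{e(n)\,x(n-i)\} = -2E\left\{\left[x(n) - \sum_{k=1}^{p} a_k\,x(n-k)\right]x(n-i)\right\} = 0$$
Normal equations (Toeplitz matrix):
$$\sum_{k=1}^{p} a_k\,E\{x(n-k)\,x(n-i)\} = E\{x(n)\,x(n-i)\}$$
$$\sum_{k=1}^{p} a_k\,r_{xx}(i-k) = r_{xx}(i), \quad i = 1, \ldots, p$$
Efficient solution: Levinson-Durbin recursion.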
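A minimal sketch of the Levinson-Durbin recursion for the normal equations, using the slide's sign convention (the prediction is the sum of a_k x(n-k)). NumPy, the function names and the synthetic AR(2) test signal are assumptions made for the example.

```python
import numpy as np

def autocorr(x, max_lag):
    """Unnormalized autocorrelation r_xx(0..max_lag)."""
    return np.array([np.dot(x[: len(x) - i], x[i:]) for i in range(max_lag + 1)])

def levinson_durbin(r, order):
    """Solve sum_k a_k r(i-k) = r(i), i = 1..order. Returns (a, prediction error power)."""
    a = np.zeros(order)
    err = r[0]
    for i in range(order):
        # Reflection coefficient for model order i + 1.
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err
        a_prev = a[:i].copy()
        a[:i] = a_prev - k * a_prev[::-1]    # update the lower-order coefficients
        a[i] = k
        err *= 1.0 - k * k                   # error shrinks with every added order
    return a, err

# Illustrative input: noise filtered by a known all-pole (AR(2)) filter.
rng = np.random.default_rng(0)
true_a = np.array([1.3, -0.6])
noise = rng.standard_normal(5000)
x = np.zeros(len(noise))
for n in range(2, len(x)):
    x[n] = true_a[0] * x[n - 1] + true_a[1] * x[n - 2] + noise[n]

r = autocorr(x, 2)
a, err = levinson_durbin(r, 2)
```

The recursion exploits the Toeplitz structure and needs on the order of p² operations, instead of the p³ of a general linear solver; the recovered `a` should be close to `true_a`.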

22 Cepstral techniques. "Cepstrum": spectrum of log(abs(spectrum)). Spectrum of the signal: spectrum of the source × spectrum of the filter. "Quefrency": abscissa of the cepstrum plot; unit of quefrency: time (!). "Cepstrogram": plot of time intervals vs. spectral periodicities. "Liftering": filtering in the cepstral domain.

23 Cepstrum. Inverse transform (DFT) of the (liftered) cepstrum → spectral envelope.
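Low-pass liftering can be sketched as follows: keep only the low-quefrency cepstral coefficients and transform back to obtain a smoothed log spectrum. NumPy, the Hann window, the cutoff `n_keep` and the harmonic test signal are assumptions for illustration.

```python
import numpy as np

def cepstral_envelope(frame, n_keep=30):
    """Return (log-magnitude spectrum, liftered spectral envelope) of one frame."""
    n = len(frame)
    log_mag = np.log(np.abs(np.fft.fft(frame * np.hanning(n))) + 1e-12)
    cepstrum = np.fft.ifft(log_mag).real         # real cepstrum
    lifter = np.zeros(n)
    lifter[:n_keep] = 1.0                        # keep low quefrencies ...
    lifter[n - n_keep + 1:] = 1.0                # ... and their mirrored counterparts
    envelope = np.fft.fft(cepstrum * lifter).real
    return log_mag, envelope

# Illustrative input: 200 Hz tone with 15 harmonics, sampled at 8 kHz.
sr, n = 8000, 1024
t = np.arange(n) / sr
frame = sum(np.sin(2 * np.pi * 200.0 * h * t) / h for h in range(1, 16))
log_mag, envelope = cepstral_envelope(frame, n_keep=30)
```

Here the pitch period is 40 samples, so a cutoff of 30 removes the harmonic ripple (source) and leaves the slowly varying envelope (filter).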

24 MFCC. $\mathrm{MFCC}(x) = \mathrm{DCT}(\mathrm{Mel}(\log|\mathrm{DFT}(x)|))$. Logarithm: transforms the product spectrum into a sum. Mel: perceptual scale of pitches judged by listeners to be equal in distance from one another. DCT: decorrelates the signal (DCT-II); spectral envelope (timbre) → low coefficients.
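A compact MFCC sketch for a single frame. Note the assumptions: it takes the logarithm after the mel filterbank, which is the more common ordering and differs from the order written in the formula above; the mel formula (2595 log10(1 + f/700)), the triangular filters, 26 bands and 13 coefficients are conventional choices, not values from the slides.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters with centers equally spaced on the mel scale."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    bin_hz = np.arange(n_fft // 2 + 1) * sr / n_fft
    fb = np.zeros((n_mels, len(bin_hz)))
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m : m + 3]
        rising = (bin_hz - lo) / (mid - lo)
        falling = (hi - bin_hz) / (hi - mid)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mfcc(frame, sr, n_mels=26, n_coeffs=13):
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    log_mel = np.log(mel_filterbank(sr, len(frame), n_mels) @ mag + 1e-10)
    m = np.arange(n_mels)
    # DCT-II basis: row k holds cos(pi / n_mels * (m + 0.5) * k).
    dct_basis = np.cos(np.pi / n_mels * (m[None, :] + 0.5) * np.arange(n_coeffs)[:, None])
    return dct_basis @ log_mel

# Illustrative input: a noise frame, and the same frame at twice the gain.
rng = np.random.default_rng(0)
frame = rng.standard_normal(1024)
c_quiet = mfcc(frame, sr=16000)
c_loud = mfcc(2.0 * frame, sr=16000)
```

Because the logarithm turns a gain change into an additive constant, only coefficient 0 differs between `c_quiet` and `c_loud`; the remaining low coefficients describe the envelope shape (timbre).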

25 Music similarity. Model timbre as a Gaussian distribution: $\mu = \frac{1}{n}\sum_i x_i$, $E(XX^T) = \frac{1}{n}\sum_i x_i x_i^T$, $\Sigma = E(XX^T) - \mu\mu^T$. Compute the similarity between distributions (KL divergence, earth mover's distance, ...). Simple genre classification. "Training": labeled reference samples. Nearest-neighbor classification.
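The Gaussian timbre model and a KL-based distance can be sketched as follows. The closed-form KL divergence between two Gaussians is standard; symmetrizing it by summing both directions is one common choice and an assumption here, as are the function names and the random stand-in feature matrices (in practice the rows would be per-frame features such as MFCCs).

```python
import numpy as np

def fit_gaussian(X):
    """X has shape (n_frames, n_dims). Returns (mean, covariance) as on the slide."""
    mu = X.mean(axis=0)
    cov = X.T @ X / len(X) - np.outer(mu, mu)    # E(XX^T) - mu mu^T
    return mu, cov

def kl_gauss(mu0, cov0, mu1, cov1):
    """KL divergence KL(N0 || N1) between two multivariate Gaussians."""
    d = len(mu0)
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d + logdet1 - logdet0)

def symmetric_kl(g0, g1):
    return kl_gauss(*g0, *g1) + kl_gauss(*g1, *g0)

# Stand-in feature matrices for three "tracks" (hypothetical data).
rng = np.random.default_rng(0)
track_a = rng.standard_normal((500, 4))
track_b = rng.standard_normal((500, 4))              # same distribution as track_a
track_c = rng.standard_normal((500, 4)) * 2.0 + 3.0  # a different "timbre"
g_a, g_b, g_c = (fit_gaussian(t) for t in (track_a, track_b, track_c))
d_ab = symmetric_kl(g_a, g_b)
d_ac = symmetric_kl(g_a, g_c)
```

A nearest-neighbor classifier would assign a query track the genre label of the reference track with the smallest such distance.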

26 High-level music analysis. Beat tracking: track the locations of beats. Tempo estimation: find the (perceptual) tempo of a musical piece. Pitch estimation. Chord/key estimation.

27 Beat tracking. First step: onset detection; can be done in the spectral or time domain. Causal/real-time methods: model the beat as a dynamically excited oscillator. Offline methods: cluster inter-onset intervals (IOIs) and find the most plausible beat hypothesis.

28 Scheirer's algorithm. Subband decomposition (6 bands). Input half-wave rectified envelopes to a resonator filterbank (150 bands ~ bpm). Choose the resonator with max. output over all bands (→ tempo).

29 Scheirer's algo cont'd. Beat phase determination can be done by inspecting the output or internal state of the winning oscillator. Pros: predicts what is happening NOW (in contrast to simple autocorrelation, which performs its calculation "after the fact"). Cons: discretizes tempo.

30 Dixon's algorithm. Non-causal. IOI clustering. Multiple agents.

31 Dixon's algo cont'd. Onset detection: "surfboard" method: calculate the amplitude envelope of the signal; linear regression of the envelope. Use IOI clusters as input to agents which predict beat times.
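The IOI clustering stage can be illustrated with a simplified greedy clustering. This is a sketch under stated assumptions, not Dixon's full method: intervals between all pairs of onsets are considered, the 25 ms cluster width and 2.5 s maximum interval are illustrative parameters, and cluster merging, scoring and the beat-predicting agents are omitted.

```python
def cluster_iois(onsets, max_ioi=2.5, width=0.025):
    """Greedily cluster inter-onset intervals; return (mean interval, count), largest first."""
    clusters = []                                   # each cluster is a list of intervals
    for i in range(len(onsets)):
        for j in range(i + 1, len(onsets)):
            ioi = onsets[j] - onsets[i]
            if ioi > max_ioi:
                continue
            for c in clusters:
                if abs(sum(c) / len(c) - ioi) < width:   # close to the cluster mean
                    c.append(ioi)
                    break
            else:
                clusters.append([ioi])                   # no match: start a new cluster
    summary = [(sum(c) / len(c), len(c)) for c in clusters]
    return sorted(summary, key=lambda item: -item[1])

# Illustrative onset times (seconds): roughly one onset every 0.5 s.
onsets = [0.0, 0.51, 1.0, 1.49, 2.0, 2.51]
clusters = cluster_iois(onsets)
best_interval, best_count = clusters[0]
tempo_bpm = 60.0 / best_interval
```

The largest cluster gives a tempo hypothesis of about 120 bpm; the smaller clusters near 1.0 s and 1.5 s correspond to integer multiples of the beat period.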

32 Pitch estimation. Task: find the fundamental frequency in a signal. Problems: the lowest peak is not always the fundamental frequency; the perceived fundamental may not even be physically present.

33 Pitch estimation. Time-domain: zero-crossing rate; maxima in the autocorrelation $\varphi(\tau)$; minima in the magnitude difference $\psi(\tau)$. Frequency-domain: cepstrum; maximum likelihood, HPS.
$$\varphi(\tau) = \frac{1}{N}\sum_{n=0}^{N-1} x(n)\,x(n+\tau)$$
$$\psi(\tau) = \frac{1}{N}\sum_{n=0}^{N-1} |x(n) - x(n+\tau)|$$
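The two time-domain measures can be sketched directly from their definitions. The test signal below is an assumption chosen to illustrate the missing-fundamental problem: it contains only harmonics 2 to 4 of 200 Hz. The search range (150 to 400 Hz) is likewise illustrative; it also keeps the lag of two periods out of the search, where both measures would tie.

```python
import numpy as np

def autocorrelation(x, max_lag):
    n = len(x) - max_lag                      # same number of terms for every lag
    return np.array([np.dot(x[:n], x[lag:lag + n]) / n for lag in range(max_lag + 1)])

def magnitude_difference(x, max_lag):
    n = len(x) - max_lag
    return np.array([np.mean(np.abs(x[:n] - x[lag:lag + n])) for lag in range(max_lag + 1)])

def pitch_from_lags(values, sr, f_min, f_max, find_max=True):
    """Pick the best lag between sr / f_max and sr / f_min and convert it to Hz."""
    lo, hi = int(sr / f_max), int(sr / f_min)
    segment = values[lo:hi + 1]
    lag = lo + (np.argmax(segment) if find_max else np.argmin(segment))
    return sr / lag

# Illustrative input: harmonics 2, 3 and 4 of 200 Hz; the fundamental itself is absent.
sr = 8000
t = np.arange(2048) / sr
x = sum(np.sin(2 * np.pi * 200.0 * h * t) for h in (2, 3, 4))

f_acf = pitch_from_lags(autocorrelation(x, 60), sr, 150.0, 400.0, find_max=True)
f_amdf = pitch_from_lags(magnitude_difference(x, 60), sr, 150.0, 400.0, find_max=False)
```

Both measures report 200 Hz because they respond to the period of the waveform rather than to the lowest spectral peak, which here sits at 400 Hz.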

34 Cepstrum pitch detection. Real cepstrum: $C(x) = \mathrm{IFFT}(\log(|\mathrm{DFT}(x)|))$. The log scales values into a usable range. Regular partials appear as peaks in the cepstrum. Unit of quefrency is ms (period).
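A minimal sketch of cepstral pitch detection: compute the real cepstrum and look for the strongest peak in the quefrency range that corresponds to plausible pitch periods. NumPy, the Hann window, the search range and the pulse-train test signal are assumptions for the example.

```python
import numpy as np

def cepstrum_pitch(frame, sr, f_min=80.0, f_max=500.0):
    """Estimate the pitch in Hz from the largest real-cepstrum peak."""
    spectrum = np.fft.fft(frame * np.hanning(len(frame)))
    cepstrum = np.fft.ifft(np.log(np.abs(spectrum) + 1e-12)).real
    lo, hi = int(sr / f_max), int(sr / f_min)        # quefrency range in samples
    quefrency = lo + np.argmax(cepstrum[lo:hi + 1])
    return sr / quefrency

# Illustrative input: a pulse train with a period of 40 samples (200 Hz at 8 kHz).
sr = 8000
frame = np.zeros(1024)
frame[::40] = 1.0
f0 = cepstrum_pitch(frame, sr)
period_ms = 1000.0 / f0
```

The regularly spaced partials of the pulse train produce a ripple in the log spectrum, and that ripple shows up as a cepstral peak at a quefrency of 5 ms, the pitch period.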

35 HPS, ML. Harmonic product spectrum: $Y(\omega) = \prod_{r=1}^{R} |X(\omega r)|$, $\hat{Y} = \max_{\omega_i} Y(\omega_i)$. Maximum likelihood: correlate ideal spectra with the input. Ideal spectrum: pulse train starting at $\omega$, convolved with the analysis window function. Select the spectral template with max. correlation.
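The harmonic product spectrum can be sketched by multiplying the magnitude spectrum with copies of itself decimated by 2, 3, ..., R. NumPy, R = 4 and the test tone (whose second partial is stronger than its fundamental) are assumptions for illustration.

```python
import numpy as np

def hps_pitch(frame, sr, n_harmonics=4):
    """Estimate the pitch in Hz as the maximum of the harmonic product spectrum."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    limit = len(mag) // n_harmonics           # highest bin whose R-th multiple exists
    product = np.ones(limit)
    for r in range(1, n_harmonics + 1):
        product *= mag[::r][:limit]           # |X(r * omega)| sampled on the same grid
    k = np.argmax(product[1:]) + 1            # skip the DC bin
    return k * sr / len(frame)

# Illustrative input: 250 Hz tone whose second partial is the strongest.
sr, n = 8000, 1024
t = np.arange(n) / sr
amplitudes = [0.3, 1.0, 0.5, 0.4, 0.3, 0.2]
frame = sum(a * np.sin(2 * np.pi * 250.0 * (h + 1) * t) for h, a in enumerate(amplitudes))

f0 = hps_pitch(frame, sr)
strongest_partial_hz = np.argmax(np.abs(np.fft.rfft(frame * np.hanning(n)))) * sr / n
```

Picking the largest spectral peak would report 500 Hz here, while the product is large only where all of the first R multiples carry energy, which happens at the 250 Hz fundamental.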

36 Key/chord recognition. Chroma: fold down the spectral representation to 12 bins; one bin covers one pitch class. Correlate chroma vectors with pitch-class distribution templates.
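Chroma folding and template correlation can be sketched as follows. The binary major/minor triad templates, the A4 = 440 Hz reference, the 55 Hz lower cutoff and the function names are assumptions for the example; key recognition would typically use weighted key profiles in place of the triad templates.

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
TRIADS = {"major": (0, 4, 7), "minor": (0, 3, 7)}   # intervals above the root, in semitones

def chroma_from_frame(frame, sr, f_min=55.0):
    """Fold the magnitude spectrum of one frame into 12 pitch-class bins (0 = C)."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.arange(len(mag)) * sr / len(frame)
    valid = freqs >= f_min                                  # ignore DC and very low bins
    midi = 69.0 + 12.0 * np.log2(freqs[valid] / 440.0)      # A4 = 440 Hz = MIDI note 69
    pitch_class = np.round(midi).astype(int) % 12
    return np.bincount(pitch_class, weights=mag[valid], minlength=12)

def best_chord(chroma):
    """Return the (root, quality) whose binary template correlates best with the chroma."""
    best = None
    for root in range(12):
        for quality, intervals in TRIADS.items():
            template = np.zeros(12)
            template[[(root + i) % 12 for i in intervals]] = 1.0
            score = np.corrcoef(chroma, template)[0, 1]
            if best is None or score > best[0]:
                best = (score, NOTE_NAMES[root], quality)
    return best[1], best[2]

# Illustrative input: C major triad (C4, E4, G4) as three sines.
sr, n = 8000, 4096
t = np.arange(n) / sr
frame = sum(np.sin(2 * np.pi * f * t) for f in (261.63, 329.63, 392.00))
chroma = chroma_from_frame(frame, sr)
chord = best_chord(chroma)
```

Octave information is discarded by the fold, so the same template matches the chord in any register or inversion.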

37 Thank you!
