Applications of Music Processing

1 Lecture Music Processing: Applications of Music Processing. Christian Dittmar, International Audio Laboratories Erlangen

2 Singing Voice Detection. Important prerequisite for: music segmentation, music thumbnailing (preview version), singing voice transcription, singing voice separation, lyrics alignment, lyrics recognition.

3 Singing Voice Detection. Detect singing voice activity over the course of a recording. Assumptions: real-world, polyphonic music recordings are analyzed; the singing voice performs the dominant melody above the accompaniment. (Figure axis: time in seconds.)

4 Singing Voice Detection. Challenges: complex characteristics of the singing voice; large diversity of accompaniment music; the accompaniment may play the same melody as the singing; pitch-fluctuating instruments may be similar to singing. (Figure panels: stable pitch, fluctuating pitch.)

5 Singing Voice Detection. Common approach: frame-wise extraction of audio features, followed by classification via machine learning. (Figure axis: time in seconds.)

6 Audio Feature Extraction. Frame-wise processing: hop size Q, block size K, window function w(n), signal frame x(n). Compute for each analysis frame: time-domain features, spectral features, cepstral features, and others.
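The frame-wise processing on this slide can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the function names `hann` and `frames`, the choice of a Hann window for w(n), and the policy of dropping an incomplete final block are all assumptions made for the example.

```python
import math


def hann(K):
    """Periodic Hann window w(n) of length K (an assumed choice of window)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / K) for n in range(K)]


def frames(signal, K, Q, window=None):
    """Cut `signal` into blocks of K samples, advancing by hop size Q.

    Each block is multiplied sample-wise by the window w(n). Trailing
    samples that do not fill a complete block are dropped.
    """
    if window is None:
        window = hann(K)
    out = []
    for start in range(0, len(signal) - K + 1, Q):
        block = signal[start:start + K]
        out.append([w * x for w, x in zip(window, block)])
    return out


# Hypothetical test signal: 100 samples of a sinusoid.
signal = [math.sin(2.0 * math.pi * 5.0 * n / 100.0) for n in range(100)]
blocks = frames(signal, K=32, Q=16)
print(len(blocks), "frames of", len(blocks[0]), "samples each")
```

With Q smaller than K, consecutive frames overlap; each windowed frame is then the input for the time-domain, spectral, and cepstral features listed above.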

7 Audio Feature Extraction. Time-domain features: Zero Crossing Rate (ZCR), which distinguishes high-pitched from low-pitched content; Linear Prediction Coefficients (LPC), which encode the spectral envelope.
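As an illustration of why the ZCR separates high-pitched from low-pitched content, the sketch below counts sign changes between adjacent samples. The function name, the normalization by the number of sample pairs, and the two test tones (100 Hz and 2000 Hz at an assumed 8 kHz sampling rate) are choices made for this example only.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = 0
    for a, b in zip(frame, frame[1:]):
        if (a >= 0.0) != (b >= 0.0):
            crossings += 1
    return crossings / (len(frame) - 1)


def tone(freq_hz, fs=8000.0, n_samples=400, phase=0.1):
    """Hypothetical test tone; the small phase offset avoids exact zeros."""
    return [math.sin(2.0 * math.pi * freq_hz * n / fs + phase)
            for n in range(n_samples)]


low = zero_crossing_rate(tone(100.0))
high = zero_crossing_rate(tone(2000.0))
print("ZCR low tone:", low, "ZCR high tone:", high)
```

The higher tone changes sign far more often per frame, so its ZCR is larger.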

8 Audio Feature Extraction. Spectral features: spectrogram with linear vs. logarithmic frequency spacing; Spectral Flatness (SF), Spectral Centroid (SC), and many others. (Figure panels: STFT spectrogram [dB] and Gabor wavelet spectrogram [dB]; axes: frequency [Hz] over time [sec].)

9 Audio Feature Extraction. Cepstral features, with singing voice as an example. Excitation: vibration of the vocal folds. Filter: resonance of the vocal tract. In the time domain the model is convolutive: excitation * filter. In the magnitude spectrum it is multiplicative: excitation · filter. In the log-magnitude spectrum it is additive: excitation + filter. Liftering separates the log-magnitude spectrum into a smooth spectral envelope and a fine-structured excitation. (Figure: extraction of the spectral envelope via cepstral liftering, showing observed spectrum, spectral envelope, and excitation spectrum over frequency in Hz.)
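The liftering step can be sketched with the real cepstrum, that is, the inverse DFT of the log-magnitude spectrum. The code below is a small, deliberately slow illustration using a naive DFT from the standard library; the function names, the `cutoff` parameter, the `eps` floor inside the logarithm, and the test frame are assumptions for this example and do not come from the slides.

```python
import cmath
import math


def dft(x):
    """Naive O(K^2) discrete Fourier transform."""
    K = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / K) for n in range(K))
            for k in range(K)]


def idft(X):
    """Naive O(K^2) inverse discrete Fourier transform."""
    K = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / K) for k in range(K)) / K
            for n in range(K)]


def lifter(frame, cutoff, eps=1e-10):
    """Return (log_magnitude, log_envelope, log_excitation) for one frame.

    Only cepstral coefficients with quefrency index below `cutoff`
    (and their mirror images) are kept; transforming them back yields
    the smooth envelope. The remainder is the fine-structured excitation.
    """
    log_mag = [math.log(abs(X) + eps) for X in dft(frame)]
    cepstrum = [c.real for c in idft(log_mag)]
    K = len(cepstrum)
    kept = [c if (n < cutoff or n > K - cutoff) else 0.0
            for n, c in enumerate(cepstrum)]
    log_env = [v.real for v in dft(kept)]
    log_exc = [m - e for m, e in zip(log_mag, log_env)]
    return log_mag, log_env, log_exc


# Hypothetical frame: two sinusoids, 64 samples.
frame = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(64)]
log_mag, log_env, log_exc = lifter(frame, cutoff=8)
print(len(log_mag), len(log_env), len(log_exc))
```

Because the model is additive in the log-magnitude domain, envelope and excitation sum back to the observed log-spectrum. A smaller `cutoff` gives a smoother envelope.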

10 Machine Learning. Applications to audio signals: speech recognition, speaker recognition, singing voice detection, genre classification, instrument recognition, chord recognition, etc.

11 Machine Learning. Learning principles: unsupervised learning (find structures in data); supervised learning (a human observer provides ground truth); semi-supervised learning (combination of the above principles); reinforcement learning (feedback of confident classifications to the training).

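12 The Feature Space. Geometric and algebraic interpretation of ML problems. Features contain numerical values; concatenating several features gives dimensionality M. The data set contains N observations (cardinality N). Illustrative example: SFM & SCF of 6 complex tones, where $s_k$ is the spectral value in bin $k$, $f_k$ its frequency, and $K$ the number of bins:

$$\mathrm{SF} = \frac{\sqrt[K]{\prod_{k=0}^{K-1} s_k}}{\frac{1}{K}\sum_{k=0}^{K-1} s_k}, \qquad \mathrm{SC} = \frac{\sum_{k=0}^{K-1} f_k \, s_k}{\sum_{k=0}^{K-1} s_k}$$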
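The two features of the illustrative example can be computed directly from their definitions: spectral flatness as the ratio of geometric to arithmetic mean of the spectral values, and spectral centroid as the magnitude-weighted mean frequency. In the sketch below the function names, the small `eps` guard against log(0) and division by zero, and the toy spectra are assumptions for illustration.

```python
import math


def spectral_flatness(s, eps=1e-12):
    """Geometric mean of the spectral values divided by their arithmetic mean."""
    K = len(s)
    # The geometric mean is computed in the log domain for numerical stability.
    log_geometric_mean = sum(math.log(v + eps) for v in s) / K
    arithmetic_mean = sum(s) / K
    return math.exp(log_geometric_mean) / (arithmetic_mean + eps)


def spectral_centroid(s, f, eps=1e-12):
    """Mean of the bin frequencies f_k, weighted by the spectral values s_k."""
    return sum(fk * sk for fk, sk in zip(f, s)) / (sum(s) + eps)


# Hypothetical 4-bin spectra: one noise-like (flat), one tone-like (peaky).
freqs = [0.0, 100.0, 200.0, 300.0]
flat = [1.0, 1.0, 1.0, 1.0]
peaky = [0.0, 0.0, 1.0, 0.0]
print("SF flat:", spectral_flatness(flat), "SF peaky:", spectral_flatness(peaky))
print("SC flat:", spectral_centroid(flat, freqs), "SC peaky:", spectral_centroid(peaky, freqs))
```

A flat, noise-like spectrum gives a flatness close to 1, while a single spectral peak gives a value close to 0. Each recording then becomes one point (SF, SC) in a feature space with M = 2.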

13 The Feature Space. Each feature has one value, so M = 2. Number of observations: N = 6. (Table: M = 2 columns, Spectral Centroid and Spectral Flatness; N = 6 rows: lpnoisetone.wav, noisetone.wav, hpnoisetone.wav, harmonicnoise.wav, pianotone.wav, harmonictone.wav.)

14 The Feature Space. Each feature has one value, so M = 2. Number of observations: N = 6. Mapping of features: SC to the y-axis, SF to the x-axis. Scatter plot with unnormalized axes. (Figure: scatter plot of Spectral Flatness vs. Spectral Centroid for lpnoisetone.wav, noisetone.wav, hpnoisetone.wav, harmonicnoisetone.wav, pianotone.wav, harmonictone.wav.)

15 The Feature Space. Each feature has one value, so M = 2. Number of observations: N = 6. Mapping of features: SC to the y-axis, SF to the x-axis. Scatter plot with unnormalized axes. Target class labels are provided by manual annotation. (Table columns: Target Labels, Spectral Centroid, Spectral Flatness.)

16 Classification methods: k-Nearest Neighbours (kNN). (Plot legend: singing voice, accompaniment, unknown data.)

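17 Classification methods: k-Nearest Neighbours (kNN). (Plot legend: singing voice, accompaniment, unknown data.) Distance measures between feature vectors $x$ and $y$ of dimensionality $M$:

L1 distance (Manhattan): $$d_1 = \sum_{m=1}^{M} |x_m - y_m|$$

L2 distance (Euclidean): $$d_2 = \sqrt{\sum_{m=1}^{M} (x_m - y_m)^2}$$

L∞ distance (Maximum): $$d_\infty = \max\left(|x_1 - y_1|, \ldots, |x_M - y_M|\right)$$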
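A minimal sketch of k-nearest-neighbour classification with the three distance measures is given below. The training points, the labels, and the function names are hypothetical and only serve to illustrate the majority vote; real features would usually be normalized first so that no single axis dominates the distance.

```python
from collections import Counter


def l1(x, y):
    """Manhattan distance."""
    return sum(abs(a - b) for a, b in zip(x, y))


def l2(x, y):
    """Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def linf(x, y):
    """Maximum distance."""
    return max(abs(a - b) for a, b in zip(x, y))


def knn_predict(train_x, train_y, query, k=3, dist=l2):
    """Label `query` by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors with manually assigned labels.
train_x = [(0.10, 0.80), (0.20, 0.90), (0.15, 0.70),
           (0.80, 0.20), (0.90, 0.10), (0.70, 0.30)]
train_y = ["singing", "singing", "singing",
           "accompaniment", "accompaniment", "accompaniment"]

for d in (l1, l2, linf):
    print(d.__name__, knn_predict(train_x, train_y, (0.20, 0.75), k=3, dist=d))
```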

18 Classification methods: Decision Trees (DT). (Plot legend: singing voice, accompaniment, unknown data.)

19 Classification methods: Random Forests (RF). (Plot legend: singing voice, accompaniment, unknown data.)

20 Classification methods: Gaussian Mixture Models (GMM). (Plot legend: singing voice, accompaniment, unknown data.)

21 Classification methods: Gaussian Mixture Models (GMM). (Plot legend: singing voice, accompaniment, unknown data, Gauss components.)

22 Classification methods: Support Vector Machines (SVM). Decision function based on sgn(·). (Plot legend: singing voice, accompaniment, unknown data.)

23 Classification methods: Deep Neural Networks (DNN). (Plot legend: singing voice, accompaniment, unknown data.)

24 Classification methods: Deep Neural Networks (DNN). Loss function. (Plot legend: singing voice, accompaniment, unknown data.)

25 Classification methods. Further methods: Hidden Markov Models (transition probabilities between GMMs); Sparse Representation Classifier (sparse linear combination of training data); Boosting (combine many weak classifiers); Convolutional Neural Networks; Recurrent Neural Networks; Multiple Kernel Learning; others.

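26 Singing Voice Detection. Mel-scale Frequency Cepstral Coefficients: each frame is passed through a filter bank, giving a feature vector $x_t$. Gaussian Mixture Model (GMM) with $G$ components, weights $w_g$, means $\mu_g$, and covariances $\Sigma_g$:

$$p(x) = \sum_{g=1}^{G} w_g \, \mathcal{N}(x; \mu_g, \Sigma_g)$$

Segment-by-segment classification over segments of $W$ frames (frame labels in the figure: V and N). A segment is labeled singing if

$$\sum_{i=0}^{W-1} \log p(x_{tW+i} \mid S) > \sum_{i=0}^{W-1} \log p(x_{tW+i} \mid M)$$

and accompaniment otherwise.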
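The segment-wise decision can be sketched as a comparison of summed frame log-likelihoods under two GMMs. The example below assumes diagonal covariances, uses hand-picked (not trained) model parameters, and works on one-dimensional toy features instead of MFCC vectors; all names and values are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (an assumption here)."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x, gmm):
    """log p(x) for a GMM given as a list of (weight, mean, variance) triples."""
    terms = [math.log(w) + log_gauss_diag(x, mu, var) for w, mu, var in gmm]
    top = max(terms)  # log-sum-exp trick for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify_segments(frames, gmm_singing, gmm_accomp, W):
    """Label each block of W consecutive frames as 'V' (singing) or 'N'."""
    labels = []
    for start in range(0, len(frames) - W + 1, W):
        segment = frames[start:start + W]
        ll_s = sum(gmm_log_likelihood(x, gmm_singing) for x in segment)
        ll_m = sum(gmm_log_likelihood(x, gmm_accomp) for x in segment)
        labels.append("V" if ll_s > ll_m else "N")
    return labels


# Hypothetical models and 1-D features, chosen by hand for illustration.
gmm_singing = [(0.5, [0.0], [1.0]), (0.5, [1.0], [1.0])]
gmm_accomp = [(1.0, [5.0], [1.0])]
features = [[0.1], [0.5], [0.9], [0.2], [5.1], [4.8], [5.3], [4.9]]
print(classify_segments(features, gmm_singing, gmm_accomp, W=4))
```

Summing log-likelihoods over W frames smooths out isolated frame-level errors, at the cost of a coarser time resolution.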

27 Audio Mosaicing. Target signal: Beatles, "Let It Be". Source signal: bees. Mosaic signal: "Let It Bee".

28 NMF-Inspired Audio Mosaicing [Driedger et al., ISMIR 2015]. Non-negative matrix factorization (NMF): non-negative matrix (fixed) ≈ components (learned) · activations (learned). Proposed audio mosaicing approach: target's spectrogram (fixed; frequency × time target) ≈ source's spectrogram (fixed; frequency × time source) · activations (learned; time source × time target) = mosaic's spectrogram (frequency × time target).
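The central change relative to standard NMF is that the component matrix is fixed to the source spectrogram and only the activations are learned. The sketch below shows this with the widely used multiplicative update for the Kullback-Leibler divergence. It is a simplified illustration, not the procedure of the cited paper: it omits any additional update steps that encourage sparse diagonal activation structures, and the matrices, names, and iteration count are assumptions.

```python
import random


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def learn_activations(V, W, iterations=200, eps=1e-12, seed=0):
    """Learn non-negative H with W held fixed so that W @ H approximates V.

    V: target spectrogram (frequency x time target), fixed.
    W: source spectrogram (frequency x time source), fixed.
    H: activations (time source x time target), learned.
    """
    rng = random.Random(seed)
    n_source, n_target = len(W[0]), len(V[0])
    H = [[rng.random() + eps for _ in range(n_target)] for _ in range(n_source)]
    Wt = [list(col) for col in zip(*W)]
    column_sums = [sum(col) + eps for col in Wt]  # entries of W^T times all-ones
    for _ in range(iterations):
        WH = matmul(W, H)
        ratio = [[v / (wh + eps) for v, wh in zip(v_row, wh_row)]
                 for v_row, wh_row in zip(V, WH)]
        numerator = matmul(Wt, ratio)
        H = [[h * n / s for h, n in zip(h_row, n_row)]
             for h_row, n_row, s in zip(H, numerator, column_sums)]
    return H


# Hypothetical toy matrices: 3 frequency bins, 2 source frames, 3 target frames.
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
V = [[1.0, 2.0, 0.5], [0.5, 1.0, 2.0], [0.75, 1.5, 1.25]]
H = learn_activations(V, W)
mosaic = matmul(W, H)  # mosaic spectrogram = source spectrogram times activations
print([[round(v, 3) for v in row] for row in mosaic])
```

Since every column of the mosaic spectrogram is a non-negative combination of source frames, the result keeps the timbre of the source while following the spectral evolution of the target.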

29 Basic NMF-Inspired Audio Mosaicing. Spectrogram target (frequency × time target) ≈ spectrogram source (frequency × time source) · activation matrix (time source × time target) = spectrogram mosaic (frequency × time target).

30 Basic NMF-Inspired Audio Mosaicing. Spectrogram target (frequency × time target) ≈ spectrogram source (frequency × time source) · activation matrix (time source × time target) = spectrogram mosaic (frequency × time target). The activation matrix is computed with iterative updates. Preserve temporal context. Core idea: support the development of sparse diagonal activation structures.

33 Audio Mosaicing. Target signal: Chic, "Good Times". Source signal: whales. Mosaic signal.

34 Audio Mosaicing. Target signal: Adele, "Rolling in the Deep". Source signal: race car. Mosaic signal.

35 Drum Source Separation

36 Drum Source Separation. Signal model: STFT → spectrogram V and its component spectrograms → iSTFT. (Figure axes: relative amplitude, log-frequency, time in seconds.)

37 Drum Sound Separation. Decomposition via NMFD. Score-based information (drum notation): rows of H. Audio-based information (training drum sounds): lateral slices from W. (Figure axes: log-frequency, time in seconds.)
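To make the roles of W and H concrete, the sketch below implements only the forward model commonly associated with NMFD (non-negative matrix factor deconvolution): each drum has a short template spectrogram (here taken as one lateral slice of W) that is placed at every position where the corresponding row of H is active. The data layout `patterns[r][k][tau]`, the function name, and the toy values are assumptions for illustration; the learning of W and H is not shown.

```python
def nmfd_approximation(patterns, H):
    """Evaluate V[k][n] = sum over r and tau of patterns[r][k][tau] * H[r][n - tau].

    patterns: one template spectrogram per drum, indexed [drum][frequency][tau].
    H:        one activation row per drum, indexed [drum][time].
    Template parts that would extend past the last frame are truncated.
    """
    n_freq = len(patterns[0])
    n_time = len(H[0])
    V = [[0.0] * n_time for _ in range(n_freq)]
    for pattern, activation in zip(patterns, H):
        for k, template_row in enumerate(pattern):
            for tau, w in enumerate(template_row):
                for n in range(tau, n_time):
                    V[k][n] += w * activation[n - tau]
    return V


# Hypothetical example: one drum, 2 frequency bins, template of 2 frames.
patterns = [[[1.0, 0.0],    # bin 0 sounds in the first template frame
             [0.0, 2.0]]]   # bin 1 sounds in the second template frame
H = [[1.0, 0.0, 0.0, 1.0]]  # onsets at frames 0 and 3
for row in nmfd_approximation(patterns, H):
    print(row)
```

Calling the function with a single drum's template and activation row yields that drum's component spectrogram; the components of all drums add up to the model of the mixture.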

38 Drum Sound Separation. (Figure axes: relative amplitude, log-frequency, time in seconds.)
