REpeating Pattern Extraction Technique (REPET)
1 REpeating Pattern Extraction Technique (REPET) EECS 352: Machine Perception of Music & Audio. Zafar RAFII, Spring 2012
2 Repetition Repetition is a fundamental element in generating and perceiving structure (Propellerheads - History Repeating)
4 Repetition Repetitions happen in audio in general: music, repetitive noises, auditory grouping, etc.
5 Repetition Repetitions happen in art in general: painting, sculpture, architecture, etc.
6 Repetition Repetitions happen in nature in general: animals, plants, objects, etc.
7 Repetition Musical pieces are generally characterized by an underlying repeating structure over which varying elements are superimposed (Propellerheads - History Repeating)
8 Repetition This means there should be patterns that are more or less repeating in time and frequency. [Figure: mixture spectrogram, with high- and low-energy regions]
9 Repetition The (more or less) repeating patterns could be identified using a time-frequency mask. [Figure: time-frequency mask, with 1 = repeating and 0 = non-repeating]
10 Repetition The mask could be applied on the mixture to extract the (more or less) repeating patterns. [Figure: repeating spectrogram, with high- and low-energy regions]
11 Repetition REpeating Pattern Extraction Technique! 1. Identify the repeating period; 2. Model the repeating segment; 3. Extract the repeating structure. A simple music/voice separation method! Repeating structure = musical background; non-repeating structure = vocal foreground
12 REPET [Figure: overview of the three steps — mixture signal x → mixture spectrogram V; step 1: beat spectrum b → repeating period p; step 2: element-wise median → repeating segment model S; step 3: element-wise min → repeating spectrogram W → time-frequency mask M]
13 Practical Advantages Not feature-dependent; does not rely on complex frameworks; does not require prior training
14 Practical Interests Instrument/vocalist identification; pitch/melody transcription; karaoke gaming
15 Intellectual Interests Music understanding; music perception. Simply based on repetition!
16 REPET Parallel with background subtraction in vision: compare frames to estimate a background model
17 REPET Parallel with background subtraction in vision: extract the background from the foreground
18 REPET Parallel with background subtraction in vision: in audio, we also need to identify the repetitions! [Figure: mixture signal]
19 REPET Parallel with background subtraction in vision: in audio, we also need to identify the repetitions! [Figure: vocal foreground and musical background]
20 Repeating Period We compute the autocorrelations of the rows of the spectrogram to reveal periodicities. [Figure: mixture spectrogram and autocorrelation plots — spectrum and autocorrelation at 1 kHz, autocorrelation vs. lag (sec)]
21 Repeating Period We take the mean of the autocorrelations (rows) to obtain the beat spectrum. [Figure: mixture spectrogram → autocorrelation plots → mean → beat spectrum vs. lag (sec)]
22 Repeating Period The beat spectrum reveals the repeating period p of the underlying repeating structure. [Figure: mixture signal and beat spectrum with a peak at lag p]
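The beat spectrum described on the last three slides can be sketched in a few lines of NumPy (a minimal sketch, not the exact implementation from the paper; the naive argmax in `repeating_period` is a stand-in for REPET's actual period finder):

```python
import numpy as np

def beat_spectrum(V):
    """Beat spectrum of a magnitude spectrogram V (freq x time).

    Computes the linear autocorrelation of every frequency row via the
    FFT (Wiener-Khinchin), averages over the rows, and normalizes so
    that lag 0 equals 1.
    """
    n = V.shape[1]
    # Zero-pad to 2n so the circular autocorrelation equals the linear one
    S = np.fft.rfft(V, 2 * n, axis=1)
    acorr = np.fft.irfft(np.abs(S) ** 2, 2 * n, axis=1)[:, :n]
    b = acorr.mean(axis=0)
    return b / b[0]

def repeating_period(b, max_lag=None):
    """Naive period estimate: the lag (>= 1) with the largest value."""
    max_lag = max_lag or len(b)
    return int(np.argmax(b[1:max_lag]) + 1)
```

On a spectrogram tiled from a fixed pattern, the beat spectrum peaks at every multiple of the tiling period, and the first peak gives p.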
23 Repeating Segment The repeating period is then used to segment the mixture spectrogram at period rate. [Figure: mixture spectrogram cut into period-length segments]
24 Repeating Segment The repeating segment model is calculated as the element-wise median of the segments. [Figure: segmented spectrogram → element-wise median → repeating segment model]
25 Repeating Segment The median helps to derive a smooth repeating segment model, removing outliers. [Figure: mixture spectrogram vs. repeating segment model energies]
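The segmentation and element-wise median can be sketched as follows (a minimal sketch; any partial trailing segment is simply ignored here, whereas the full method also handles it):

```python
import numpy as np

def repeating_segment(V, p):
    """Element-wise median of the period-length segments of V.

    V is a magnitude spectrogram (freq x time), p the repeating period
    in frames. Only full segments are used in this sketch.
    """
    n_freq, n_time = V.shape
    r = n_time // p                       # number of full segments
    segments = V[:, :r * p].reshape(n_freq, r, p)
    return np.median(segments, axis=1)    # (freq x p) segment model
```

Because the median discards outliers, a loud non-repeating event confined to one segment does not leak into the model.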
26 Repeating Structure We take the element-wise min between the repeating segment model and the segments. [Figure: mixture spectrogram → min with segment model → repeating spectrogram]
27 Repeating Structure We obtain a repeating spectrogram model for the repeating musical background. [Figure: mixture spectrogram and repeating spectrogram]
28 Repeating Structure The repeating spectrogram model has at most the same values as the mixture spectrogram. [Figure: mixture, repeating, and non-repeating spectrograms]
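The element-wise min can be sketched like this (a minimal sketch; tiling the segment model back to the full length is one straightforward way to compare it against every segment):

```python
import numpy as np

def repeating_spectrogram(V, S):
    """Tile the segment model S along time and take the element-wise
    min with the mixture spectrogram V, so the repeating model never
    exceeds the mixture."""
    n_freq, n_time = V.shape
    p = S.shape[1]
    reps = int(np.ceil(n_time / p))
    W = np.tile(S, reps)[:, :n_time]   # segment model repeated to full length
    return np.minimum(W, V)
```

The min guarantees the property stated on the slide: the repeating spectrogram has at most the mixture's values in every time-frequency bin.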
29 Repeating Structure The repeating spectrogram model is divided by the mixture spectrogram to get a soft mask. [Figure: repeating spectrogram ÷ mixture spectrogram → time-frequency mask]
30 Repeating Structure In the mask, the more (less) a bin is repeating, the more (less) it is weighted toward 1 (0). [Figure: mixture spectrogram → median → repeating spectrogram model → division → time-frequency mask]
31 Repeating Structure A binary time-frequency mask can be further derived by fixing a threshold between 0 and 1. [Figure: mixture spectrogram → median → repeating spectrogram model → division → binary time-frequency mask]
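Both masks can be sketched directly from the definitions above (a minimal sketch; the small `eps` guard against division by zero is my addition, and the 0.5 threshold is just an illustrative choice):

```python
import numpy as np

def soft_mask(W, V, eps=1e-12):
    """Soft time-frequency mask: repeating model over mixture.
    Since W <= V element-wise, the ratio lies in [0, 1]."""
    return W / (V + eps)

def binary_mask(M, threshold=0.5):
    """Binary mask from the soft mask: 1 where mostly repeating."""
    return (M >= threshold).astype(float)
```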
32 Repeating Structure The mask is then multiplied with the mixture STFT to extract the repeating background. You actually apply the mask on the STFT! [Figure: time-frequency mask × mixture STFT → background spectrogram → iSTFT → background signal]
33 Repeating Structure The non-repeating foreground is equal to the mixture minus the repeating background. [Figure: mixture signal − background signal → foreground signal]
34 Repeating Structure Repeating background = music; non-repeating foreground = voice. REPET: 1. repeating period; 2. repeating segment; 3. repeating structure. [Figure: mixture signal → REPET → background and foreground signals]
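The last steps — mask the complex STFT, invert, and subtract — can be sketched with SciPy (a minimal sketch; `scipy.signal.stft`/`istft` stand in for whatever STFT implementation is actually used, and the mask is assumed given):

```python
import numpy as np
from scipy.signal import stft, istft

def apply_repet_mask(x, mask, nperseg=1024):
    """Apply a time-frequency mask to the STFT of the mixture x and
    return the (background, foreground) time signals."""
    _, _, X = stft(x, nperseg=nperseg)        # complex mixture STFT
    _, bg = istft(mask * X, nperseg=nperseg)  # masked STFT -> background
    bg = bg[:x.size]                          # trim iSTFT padding
    return bg, x - bg                         # foreground = mixture - background
```

Note that by construction background + foreground reconstructs the mixture exactly, whatever the mask is.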
35 State-of-the-Art Music/voice separation systems generally first identify the vocal/non-vocal segments and then use different techniques to separate the musical accompaniment and the lead vocals: Non-negative Matrix Factorization (NMF), accompaniment modeling, pitch-based inference
36 State-of-the-Art Non-negative Matrix Factorization (NMF): iterative factorization of the mixture spectrogram into non-negative additive basic components. Limitations: need to know the number of components! Need a proper initialization!
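For contrast, the NMF approach can be sketched with Lee-Seung multiplicative updates under a Euclidean cost (a generic textbook sketch, not any particular separation system; note how the two stated limitations show up directly as the parameter `k` and the random initialization):

```python
import numpy as np

def nmf(V, k, n_iter=200, seed=0):
    """Factorize a non-negative matrix V (freq x time) as W @ H, with
    W (freq x k) and H (k x time), via multiplicative updates."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, k)) + 1e-3   # limitation: needs an initialization
    H = rng.random((k, N)) + 1e-3   # limitation: k must be chosen
    eps = 1e-12
    for _ in range(n_iter):
        # Lee-Seung updates: monotonically non-increasing Euclidean cost
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```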
37 State-of-the-Art Accompaniment modeling: modeling of the musical accompaniment from the non-vocal segments in the mixture. Limitations: need an accurate vocal/non-vocal segmentation! Need a sufficient amount of non-vocal segments!
38 State-of-the-Art Pitch-based inference: separation of the vocals using the predominant pitch contour extracted from the vocal segments. Limitations: cannot extract unvoiced vocals! Harmonic structure of instruments can interfere!
39 Evaluation REPET [Rafii & Pardo, 2011]: automatic (simple) period finder; geometrical mean (instead of median); binary time-frequency masking (not soft). Competitive method [Hsu et al., 2010]: pitch-based inference technique; unvoiced vocals separation; voiced vocals enhancement
40 Evaluation Data set (MIR-1K): 1,000 song clips (karaoke Chinese pop songs); 4 to 13 seconds each, for a total of 133 minutes; 3 voice-to-music mixing ratios (-5, 0, and 5 dB)
41 Evaluation Comparative results: global separation performance for the voice using the competitive method (Hsu), REPET (Rafii), and the ideal binary mask (Ideal). [Figure: bar chart]
42 Evaluation Potential enhancements: separation performance for the voice at a voice-to-music mixing ratio of 0 dB using REPET and successive enhancements. [Figure: bar chart]
43 Evaluation Conclusions: REPET can compete with recent (more complex) state-of-the-art music/voice separation methods. There is room for improvement: optimal period, optimal tolerance, indices of the vocal frames. Average computation time: 0.26 seconds for 1 second of mixture (REPET can work in real time!)
44 Audio examples REPET vs. Ozerov (accompaniment modeling): The Prodigy - Breathe; music and voice estimates (Ozerov) vs. music and voice estimates (REPET)
45 Audio examples REPET vs. Virtanen (NMF + pitch-based): Unknown; music and voice estimates (Virtanen) vs. music and voice estimates (REPET)
46 Audio examples REPET vs. FitzGerald (multi-median-based): Wham! - Freedom; music and voice estimates (FitzGerald) vs. music and voice estimates (REPET)
47 Audio examples REPET (more examples): RJD2 - Ghostwriter (background and foreground estimates); Rebecca Black - Friday (background and foreground estimates)
48 Future REPET is very effective on short excerpts with a relatively stable repeating background: 10-20 seconds, similar repetitions, fixed period rate. [Figure: underlying repeating structure with segments at p, 2p, ..., 9p]
49 Future REPET is more likely to show limitations with full-track musical pieces: varying repeating background (e.g. verse/chorus); varying period rate (i.e. varying tempo). [Figure: underlying repeating structure with varying periods p1 and p2]
50 Future REPET for varying repeating structure! [Liutkus, Rafii, Badeau, Pardo & Richard, 2012] 1. Identify local periods using a beat spectrogram; 2. Model local models using median filtering; 3. Extract the repeating structure using a time-frequency mask. [Figure: underlying repeating structure with varying periods p1 and p2]
51 Future [Figure: adaptive pipeline — mixture signal x → mixture spectrogram V; step 1: beat spectrogram B → local periods p_i; step 2: local median filtering over frames i - p_i, i, i + p_i, ... → filtered spectrogram S; step 3: element-wise min → repeating spectrogram W → time-frequency mask M]
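The local median filtering in the adaptive pipeline above can be sketched per frame (a minimal sketch; a constant period `p` is used here for simplicity, whereas the adaptive method estimates a local period p_i per frame from the beat spectrogram):

```python
import numpy as np

def local_median_model(V, p, k=2):
    """For every frame i, take the element-wise median over the frames
    at i - k*p, ..., i - p, i, i + p, ..., i + k*p that fall inside the
    spectrogram, yielding a time-varying repeating model."""
    n_freq, n_time = V.shape
    S = np.empty_like(V)
    for i in range(n_time):
        idx = [i + m * p for m in range(-k, k + 1) if 0 <= i + m * p < n_time]
        S[:, i] = np.median(V[:, idx], axis=1)
    return S
```

Unlike the single global segment model of the original REPET, this model can follow a background that changes over the course of a full track.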
52 Conclusions REpeating Pattern Extraction Technique: 1. Identify the repeating period; 2. Model the repeating segment; 3. Extract the repeating structure. A simple music/voice separation method: can be applied for music/voice separation; can compete with state-of-the-art methods; still room for improvement
53 Thank you!
54 References
M. Piccardi, "Background Subtraction Techniques: a Review," IEEE International Conference on Systems, Man and Cybernetics, The Hague, Netherlands, October 2004.
A. Ozerov, P. Philippe, F. Bimbot, and R. Gribonval, "Adaptation of Bayesian Models for Single-Channel Source Separation and its Application to Voice/Music Separation in Popular Songs," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 5, July 2007.
T. Virtanen, A. Mesaros, and M. Ryynänen, "Combining Pitch-based Inference and Non-Negative Spectrogram Factorization in Separating Vocals from Polyphonic Music," ISCA Tutorial and Research Workshop on Statistical and Perceptual Audition, Brisbane, Australia, September 2008.
C.-L. Hsu and J.-S. R. Jang, "On the Improvement of Singing Voice Separation for Monaural Recordings Using the MIR-1K Dataset," IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 2, February 2010.
D. FitzGerald and M. Gainza, "Single Channel Vocal Separation using Median Filtering and Factorisation Techniques," ISAST Transactions on Electronic and Signal Processing, vol. 4, no. 1, 2010.
Z. Rafii and B. Pardo, "A Simple Music/Voice Separation Method based on the Extraction of the Underlying Repeating Structure," IEEE International Conference on Acoustics, Speech and Signal Processing, Prague, Czech Republic, May 2011.
A. Liutkus, Z. Rafii, R. Badeau, B. Pardo, and G. Richard, "Adaptive Filtering for Music/Voice Separation exploiting the Repeating Musical Structure," IEEE International Conference on Acoustics, Speech and Signal Processing, Kyoto, Japan, March 2012.
More informationSUB-BAND INDEPENDENT SUBSPACE ANALYSIS FOR DRUM TRANSCRIPTION. Derry FitzGerald, Eugene Coyle
SUB-BAND INDEPENDEN SUBSPACE ANALYSIS FOR DRUM RANSCRIPION Derry FitzGerald, Eugene Coyle D.I.., Rathmines Rd, Dublin, Ireland derryfitzgerald@dit.ie eugene.coyle@dit.ie Bob Lawlor Department of Electronic
More informationSinging Expression Transfer from One Voice to Another for a Given Song
Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology Sangeon Yong, Juhan Nam MACLab Music and Audio Computing Introduction Introduction
More informationarxiv: v1 [cs.sd] 3 May 2018
Single-Channel Blind Source Separation for Singing Voice Detection: A Comparative Study Dominique Fourer and Geoffroy Peeters May 4, 018 arxiv:1805.0101v1 [cs.sd] 3 May 018 Abstract We propose a novel
More informationAdaptive noise level estimation
Adaptive noise level estimation Chunghsin Yeh, Axel Roebel To cite this version: Chunghsin Yeh, Axel Roebel. Adaptive noise level estimation. Workshop on Computer Music and Audio Technology (WOCMAT 6),
More informationPerception of pitch. Importance of pitch: 2. mother hemp horse. scold. Definitions. Why is pitch important? AUDL4007: 11 Feb A. Faulkner.
Perception of pitch AUDL4007: 11 Feb 2010. A. Faulkner. See Moore, BCJ Introduction to the Psychology of Hearing, Chapter 5. Or Plack CJ The Sense of Hearing Lawrence Erlbaum, 2005 Chapter 7 1 Definitions
More informationVQ Source Models: Perceptual & Phase Issues
VQ Source Models: Perceptual & Phase Issues Dan Ellis & Ron Weiss Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA {dpwe,ronw}@ee.columbia.edu
More informationImproving reverberant speech separation with binaural cues using temporal context and convolutional neural networks
Improving reverberant speech separation with binaural cues using temporal context and convolutional neural networks Alfredo Zermini, Qiuqiang Kong, Yong Xu, Mark D. Plumbley, Wenwu Wang Centre for Vision,
More informationIN a natural environment, speech often occurs simultaneously. Monaural Speech Segregation Based on Pitch Tracking and Amplitude Modulation
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 15, NO. 5, SEPTEMBER 2004 1135 Monaural Speech Segregation Based on Pitch Tracking and Amplitude Modulation Guoning Hu and DeLiang Wang, Fellow, IEEE Abstract
More informationAnalytical Analysis of Disturbed Radio Broadcast
th International Workshop on Perceptual Quality of Systems (PQS 0) - September 0, Vienna, Austria Analysis of Disturbed Radio Broadcast Jan Reimes, Marc Lepage, Frank Kettler Jörg Zerlik, Frank Homann,
More informationVoice Activity Detection for Speech Enhancement Applications
Voice Activity Detection for Speech Enhancement Applications E. Verteletskaya, K. Sakhnov Abstract This paper describes a study of noise-robust voice activity detection (VAD) utilizing the periodicity
More informationDifferent Approaches of Spectral Subtraction Method for Speech Enhancement
ISSN 2249 5460 Available online at www.internationalejournals.com International ejournals International Journal of Mathematical Sciences, Technology and Humanities 95 (2013 1056 1062 Different Approaches
More informationInternational Journal of Modern Trends in Engineering and Research e-issn No.: , Date: 2-4 July, 2015
International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:2349-9745, Date: 2-4 July, 2015 Analysis of Speech Signal Using Graphic User Interface Solly Joy 1, Savitha
More informationEXPLORING PHASE INFORMATION IN SOUND SOURCE SEPARATION APPLICATIONS
EXPLORING PHASE INFORMATION IN SOUND SOURCE SEPARATION APPLICATIONS Estefanía Cano, Gerald Schuller and Christian Dittmar Fraunhofer Institute for Digital Media Technology Ilmenau, Germany {cano,shl,dmr}@idmt.fraunhofer.de
More informationSpatialization and Timbre for Effective Auditory Graphing
18 Proceedings o1't11e 8th WSEAS Int. Conf. on Acoustics & Music: Theory & Applications, Vancouver, Canada. June 19-21, 2007 Spatialization and Timbre for Effective Auditory Graphing HONG JUN SONG and
More informationCO-CHANNEL SPEECH DETECTION APPROACHES USING CYCLOSTATIONARITY OR WAVELET TRANSFORM
CO-CHANNEL SPEECH DETECTION APPROACHES USING CYCLOSTATIONARITY OR WAVELET TRANSFORM Arvind Raman Kizhanatham, Nishant Chandra, Robert E. Yantorno Temple University/ECE Dept. 2 th & Norris Streets, Philadelphia,
More informationPOLYPHONIC PITCH DETECTION BY MATCHING SPECTRAL AND AUTOCORRELATION PEAKS. Sebastian Kraft, Udo Zölzer
POLYPHONIC PITCH DETECTION BY MATCHING SPECTRAL AND AUTOCORRELATION PEAKS Sebastian Kraft, Udo Zölzer Department of Signal Processing and Communications Helmut-Schmidt-University, Hamburg, Germany sebastian.kraft@hsu-hh.de
More informationSGN Audio and Speech Processing
Introduction 1 Course goals Introduction 2 SGN 14006 Audio and Speech Processing Lectures, Fall 2014 Anssi Klapuri Tampere University of Technology! Learn basics of audio signal processing Basic operations
More informationA Method for Voiced/Unvoiced Classification of Noisy Speech by Analyzing Time-Domain Features of Spectrogram Image
Science Journal of Circuits, Systems and Signal Processing 2017; 6(2): 11-17 http://www.sciencepublishinggroup.com/j/cssp doi: 10.11648/j.cssp.20170602.12 ISSN: 2326-9065 (Print); ISSN: 2326-9073 (Online)
More informationSGN Audio and Speech Processing
SGN 14006 Audio and Speech Processing Introduction 1 Course goals Introduction 2! Learn basics of audio signal processing Basic operations and their underlying ideas and principles Give basic skills although
More informationMinimal-Impact Audio-Based Personal Archives
Minimal-Impact Audio-Based Personal Archives Dan Ellis and Keansub Lee Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA {dpwe,kslee}@ee.columbia.edu
More informationConvention Paper 7024 Presented at the 122th Convention 2007 May 5 8 Vienna, Austria
Audio Engineering Society Convention Paper 7024 Presented at the 122th Convention 2007 May 5 8 Vienna, Austria This convention paper has been reproduced from the author's advance manuscript, without editing,
More informationSpeech Enhancement Techniques using Wiener Filter and Subspace Filter
IJSTE - International Journal of Science Technology & Engineering Volume 3 Issue 05 November 2016 ISSN (online): 2349-784X Speech Enhancement Techniques using Wiener Filter and Subspace Filter Ankeeta
More informationSpeech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter
Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter 1 Gupteswar Sahu, 2 D. Arun Kumar, 3 M. Bala Krishna and 4 Jami Venkata Suman Assistant Professor, Department of ECE,
More informationCSC475 Music Information Retrieval
CSC475 Music Information Retrieval Sinusoids and DSP notation George Tzanetakis University of Victoria 2014 G. Tzanetakis 1 / 38 Table of Contents I 1 Time and Frequency 2 Sinusoids and Phasors G. Tzanetakis
More informationBasic Characteristics of Speech Signal Analysis
www.ijird.com March, 2016 Vol 5 Issue 4 ISSN 2278 0211 (Online) Basic Characteristics of Speech Signal Analysis S. Poornima Assistant Professor, VlbJanakiammal College of Arts and Science, Coimbatore,
More informationThe Music Retrieval Method Based on The Audio Feature Analysis Technique with The Real World Polyphonic Music
The Music Retrieval Method Based on The Audio Feature Analysis Technique with The Real World Polyphonic Music Chai-Jong Song, Seok-Pil Lee, Sung-Ju Park, Saim Shin, Dalwon Jang Digital Media Research Center,
More informationDISCRIMINATION OF SITAR AND TABLA STROKES IN INSTRUMENTAL CONCERTS USING SPECTRAL FEATURES
DISCRIMINATION OF SITAR AND TABLA STROKES IN INSTRUMENTAL CONCERTS USING SPECTRAL FEATURES Abstract Dhanvini Gudi, Vinutha T.P. and Preeti Rao Department of Electrical Engineering Indian Institute of Technology
More information