Signal Processing Algorithms for Music, Marine Mammals and Speech

1 University of Crete, Computer Science Dept., Multimedia Informatics Lab. AUTH 2008, June 23rd

3 Rhythm similarity based on Dynamic Periodicity Warping
In collaboration with: Andre Holzapfel
Presented at ICASSP 2008, Las Vegas [1]

4 What is it used for?
- Organize your huge collection of songs according to their rhythm.
- Help ethnomusicologists to categorize field recordings from a given country and reveal their musical structure.

5 Approaches to the problem
- Beat spectra, cosine measure (J. Foote et al., 2002) [2]
- Tempo-based spectra (G. Peeters, 2005) [3]
- Tactus-based patterns (J. Paulus et al., 2002) [4]
We suggest the use of continuous periodicity spectra and a warping strategy to cope with large variations in tempo.

7 Periodicity Spectra
Computation of the onset strength signal, p(t) (D. Ellis, MIREX 2006, beat tracking contest; footnote 1: Beat Tracking Results).
Modeling of p(t):
$$p(t) = \sum_{i=1}^{N} e_i(t) \sum_{k \in K_i} \delta(t - kT)$$
Periodicity spectra:
$$P(f) = \frac{1}{T} \sum_{i=1}^{N} E_i(f) \sum_{k \in K_i} \delta\left(f - \frac{k}{T}\right)$$
where f < 1000 bpm (16.7 Hz).
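
As a rough illustration of the periodicity spectrum idea, the sketch below takes the magnitude of the Fourier transform of an onset strength signal and keeps only the bins below 1000 bpm. The function name `periodicity_spectrum`, the 50 Hz onset-signal rate and the synthetic onset pattern are assumptions made for this example, not values from the talk. A plain DFT from the standard library is used to keep the block self-contained; an FFT routine would be the usual choice in practice.

```python
import cmath
import math


def periodicity_spectrum(p, rate_hz, max_bpm=1000.0):
    """Return (bpm, magnitude) pairs of the DFT of onset signal p below max_bpm.

    p       -- onset strength samples (list of floats)
    rate_hz -- sampling rate of p in Hz (an assumption of this sketch)
    """
    n = len(p)
    mean = sum(p) / n
    centered = [v - mean for v in p]  # remove the DC offset
    spectrum = []
    for k in range(1, n // 2 + 1):
        bpm = 60.0 * k * rate_hz / n  # frequency of bin k in beats per minute
        if bpm >= max_bpm:
            break
        acc = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                  for t, c in enumerate(centered))
        spectrum.append((bpm, abs(acc) / n))
    return spectrum


# Synthetic onset signal: an 8 s window at 50 Hz with a short decaying
# onset every 0.5 s, i.e. a 120 bpm pulse (hypothetical test data).
rate = 50.0
num_samples = int(8 * rate)
onsets = [0.0] * num_samples
for start in range(0, num_samples, 25):
    for offset, amplitude in enumerate((1.0, 0.5, 0.25)):
        onsets[start + offset] += amplitude

spec = periodicity_spectrum(onsets, rate)
peak_bpm = max(spec, key=lambda pair: pair[1])[0]
print(round(peak_bpm, 1))
```

The strongest peak falls at the pulse rate, with weaker peaks at its multiples, which is the harmonic structure that the delta terms in P(f) describe.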

10 Example of periodicity spectra
[Figure: two periodicity spectra, frequency axis in bpm.]
Two examples of periodicity spectra of the Siganos dance: the upper panel shows a faster performance than the lower panel. Window length is 8 s.

11 Rhythm similarity based on Dynamic Periodicity Warping (DPW)
[Block diagram; labels: P_1(f), P_2(f), NORM, SIM, S_ρ, REFLINE, DP, w_DPW, PROJ, Σ, d_DPW]
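
To give a feel for the warping idea, here is a minimal sketch that uses generic dynamic time warping as a stand-in: two normalized spectra are aligned by dynamic programming, and the distance is taken as the mean deviation of the warping path from the diagonal. This is not the authors' exact procedure. The local cost, the use of the diagonal as reference line, and the names `warping_path`, `path_deviation` and `bump` are all assumptions made for illustration, and a tempo change is mimicked here by a simple shift of the peak.

```python
def normalize(spectrum):
    """Scale a spectrum so that its largest value is 1."""
    peak = max(spectrum)
    return [v / peak for v in spectrum]


def warping_path(a, b):
    """Align two sequences with classic dynamic time warping; return the path."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j - 1],
                                     cost[i - 1][j],
                                     cost[i][j - 1])
    # Backtrack from the end, preferring the diagonal step on ties.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return path


def path_deviation(path):
    """Mean distance of the path from the diagonal (assumed reference line)."""
    return sum(abs(i - j) for i, j in path) / len(path)


def bump(center, length=30):
    """Hypothetical periodicity spectrum: one triangular peak at `center`."""
    return [max(0.0, 1.0 - abs(n - center) / 3.0) for n in range(length)]


reference = normalize(bump(10))
same_tempo = normalize(bump(10))
other_tempo = normalize(bump(13))

dev_same = path_deviation(warping_path(reference, same_tempo))
dev_other = path_deviation(warping_path(reference, other_tempo))
print(dev_same, dev_other)
```

Identical spectra give a path on the diagonal and a deviation of zero, while a displaced peak forces the path away from it.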

12 Example of DPW computation for,

13 Databases and baseline distances
Databases:
- D1: 698 songs from eight classes of ballroom dances
- D2: 90 songs from six classes of Cretan dances
Baseline distances:
- Cosine distance (inner product)
- Euclidean distance
- Cost of warping, d_Cost (J. Paulus et al., 2002) [4]
- Cosine distance after warping, d_CosPost
Our measure: d_DPW
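
The two simplest baselines can be sketched as follows. The exact form used in the experiments is not given on the slide, so defining cosine distance as one minus the normalized inner product is an assumption, as are the function names and the toy spectra.

```python
import math


def cosine_distance(p1, p2):
    """1 - normalized inner product of two spectra (assumed definition)."""
    dot = sum(a * b for a, b in zip(p1, p2))
    norm1 = math.sqrt(sum(a * a for a in p1))
    norm2 = math.sqrt(sum(b * b for b in p2))
    return 1.0 - dot / (norm1 * norm2)


def euclidean_distance(p1, p2):
    """Straight-line distance between two spectra of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))


# Hypothetical spectra: the second is the first scaled by 2.
spectrum_a = [1.0, 0.0, 2.0]
spectrum_b = [2.0, 0.0, 4.0]

d_cos = cosine_distance(spectrum_a, spectrum_b)
d_euc = euclidean_distance(spectrum_a, spectrum_b)
print(d_cos, d_euc)
```

The cosine distance ignores the overall scale of the spectra, whereas the Euclidean distance does not. Neither compensates for a peak that moves along the frequency axis when the tempo changes.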

22 More on the Cretan dances database (D2)
Table: Tempi of D2 and listeners' accuracy
Columns: Dance, Tempo range, Listeners' acc. (%)
Dances: Kalamatianos, Siganos, Maleviziotis, Pentozalis, Sousta, Chaniotis
Mean: 75.6

23 Results on D1: Ballroom dances
Table: Classification accuracies on D1
Distance     wkNN           kNN
Cosine       85.5 (k=7)     84.5 (k=3)
Euclidean    83.8 (k=6)     82.7 (k=3)
d_Cost       72.4 (k=14)    70.7 (k=7)
d_CosPost    70.7 (k=32)    69.2 (k=17)
d_DPW        82.1 (k=11)    80.9 (k=20)
10 repetitions of 10-fold stratified cross-validation

24 Results on D2: Cretan dances
Table: Classification accuracies on D2
Distance     wkNN           kNN
Cosine       53.8 (k=1)     53.8 (k=1)
Euclidean    48.9 (k=1)     48.8 (k=1)
d_Cost       51.8 (k=18)    48.5 (k=8)
d_CosPost    51.1 (k=19)    48.7 (k=12)
d_DPW        69.0 (k=4)     64.4 (k=5)
10 repetitions of 10-fold stratified cross-validation
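
The wkNN and kNN columns refer to nearest-neighbour classifiers applied to the pairwise distances. A minimal sketch of the two voting rules is shown below. The weighting scheme behind wkNN is not stated on the slides, so inverse-distance weighting is an assumption here, as are the function name `knn_predict` and the toy labels.

```python
from collections import defaultdict


def knn_predict(distances, labels, k, weighted=False):
    """Predict a class from the distances to labeled training songs.

    distances -- distance from the query song to each training song
    labels    -- class label of each training song
    weighted  -- if True, each vote counts 1/distance (assumed wkNN rule)
    """
    nearest = sorted(zip(distances, labels))[:k]
    votes = defaultdict(float)
    for dist, label in nearest:
        votes[label] += 1.0 / (dist + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)


# Hypothetical query: one very close "waltz" and several farther "tango" songs.
dists = [0.1, 0.5, 0.6, 0.7]
classes = ["waltz", "tango", "tango", "tango"]

plain = knn_predict(dists, classes, k=3)
weighted = knn_predict(dists, classes, k=3, weighted=True)
print(plain, weighted)
```

With plain voting the two farther neighbours win, while with weighting the single close neighbour dominates, which is why the two columns can prefer different values of k.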

25 Click detection using the Teager-Kaiser operator and Phase Spectra
In collaboration with: Varvara Kandia
Presented at: ECS 2008 (The Netherlands); 3rd Workshop on Detection and Classification of Mammals, Boston; 2nd Workshop on Detection and Classification of Mammals, Monaco 2006 [5][6][7]

26 Why do it?
- Localization and tracking with passive acoustics
- Study of animal behavior
- Abundance estimation
- Correlations with physiology (size of animals, sound production mechanism)

27 Examples of clicks from Sperm whales
[Figure: two panels, regular clicks and creak clicks; amplitude vs. time in ms]

28 Examples of clicks from Beaked whales

29 Approaches/Software for click detection
- Rainbow click (D. Gillespie, 1997) [8]
- Moby click (O. Jäke, 1996) [9]
- Ishmael (D. Mellinger, 2001) [10]

30 Teager-Kaiser energy operator [5][6]
Definition for a discrete-time signal:
$$\Psi[s(n)] = s^2(n) - s(n+1)\,s(n-1)$$
For a signal with three components, interference x[n], transient y[n], and noise u[n], so that s[n] = x[n] + y[n] + u[n]:
$$\Psi[s(n)] = \Psi[x(n)] + \Psi[y(n)] + \Psi[u(n)] + T[n]$$
We may show that:
$$\Psi[s(n)] \approx \Psi[y(n)] + w(n)$$
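
The operator is short enough to try directly. The sketch below applies it to a synthetic signal made of a slow sinusoidal interference plus a brief, fast transient. The signal parameters and the names `teager_kaiser`, `interference` and `click` are assumptions chosen for the example, and the noise term u[n] is left out for simplicity.

```python
import math


def teager_kaiser(s):
    """Psi[s(n)] = s(n)^2 - s(n+1) * s(n-1), for the interior samples of s.

    Output index i corresponds to input sample n = i + 1.
    """
    return [s[n] ** 2 - s[n + 1] * s[n - 1] for n in range(1, len(s) - 1)]


num_samples = 200

# Slow interference x[n]: unit amplitude, 0.005 cycles per sample.
interference = [math.sin(2 * math.pi * 0.005 * n) for n in range(num_samples)]

# Short transient y[n]: six samples of a fast oscillation starting at n = 100.
click = [0.0] * num_samples
for m in range(6):
    click[100 + m] = 0.5 * math.sin(math.pi / 2 * m)

signal = [x + y for x, y in zip(interference, click)]
energy = teager_kaiser(signal)

peak_sample = max(range(len(energy)), key=lambda i: energy[i]) + 1
background = energy[49]  # output for sample n = 50, far from the click
print(peak_sample, round(background, 5))
```

Although the interference has twice the amplitude of the transient, its slow oscillation gives a small, nearly constant output, so the transient stands out clearly after the operator.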

32 Synthetic example
[Figure: panels (a) and (b); amplitude vs. time (ms)]

33 Applied on clicks
From Sperm whales, regular clicks: (a) raw file, (b) after TK.
[Figure: panels (a) and (b); amplitude vs. time (ms)]

34 Applied on clicks
From Sperm whales, creak clicks: (a) raw file, (b) after TK.
[Figure: panels (a) and (b); amplitude vs. time (ms)]

35 Comparison with Rainbow click
$$\text{Det. Score} = \frac{\text{Correctly detected hand-labeled clicks}}{\text{Total hand-labeled clicks}} \times 100$$
Table: Percentage (%) of correctly identified clicks per file. Tolerance of 2 ms.
Columns: File name; TK: clicks, score (%); RB: clicks, score (%); clicks
F1: 266 (0)
F2: 944 (549)
F3: 689 (414)
F4: 529 (242)
F5: 435 (155)
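
The detection score can be computed from two lists of click times. In this sketch, a hand label counts as detected when an unused detection lies within the tolerance. The greedy one-to-one matching, the function name `detection_score` and the example times are assumptions, since the slides give only the formula and the 2 ms tolerance.

```python
def detection_score(detected_ms, labeled_ms, tolerance_ms=2.0):
    """Percentage of hand-labeled clicks matched by a detection within tolerance."""
    unused = sorted(detected_ms)
    hits = 0
    for label in sorted(labeled_ms):
        match = next((d for d in unused if abs(d - label) <= tolerance_ms), None)
        if match is not None:
            unused.remove(match)  # each detection may match one label only
            hits += 1
    return 100.0 * hits / len(labeled_ms)


# Hypothetical click times in milliseconds.
hand_labels = [10.0, 50.0, 90.0, 130.0]
detections = [11.5, 49.0, 95.0, 200.0]

score = detection_score(detections, hand_labels)
print(score)
```

Widening `tolerance_ms` can only keep or raise the score, which is the trade-off an approximate ROC over tolerance values would trace.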

36 In terms of ROC curves
[Figure: approximate ROC, detection rate (%) vs. tolerance (ms), for TK and RB]

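37 Phase Spectrum [7]
Group delay:
$$\tau(\omega) = -\frac{d\phi(\omega)}{d\omega}$$
or
$$\tau(\omega) = \frac{X_R(\omega)\,Y_R(\omega) + X_I(\omega)\,Y_I(\omega)}{|X(\omega)|^2}$$
where:
$$X(\omega) = \mathcal{F}(x[n]) = X_R(\omega) + jX_I(\omega)$$
$$Y(\omega) = \mathcal{F}(n\,x[n]) = Y_R(\omega) + jY_I(\omega)$$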
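
The second form of the group delay avoids phase unwrapping, since it needs only the transforms of x[n] and n·x[n]. The sketch below evaluates it on a DFT grid. The helper names `dft` and `group_delay`, the number of bins and the test signal are assumptions made for the example.

```python
import cmath
import math


def dft(x, num_bins):
    """Discrete Fourier transform of x evaluated at num_bins frequencies."""
    return [sum(v * cmath.exp(-2j * math.pi * k * t / num_bins)
                for t, v in enumerate(x))
            for k in range(num_bins)]


def group_delay(x, num_bins=64):
    """tau = (X_R * Y_R + X_I * Y_I) / |X|^2, with Y the transform of n * x[n]."""
    big_x = dft(x, num_bins)
    big_y = dft([t * v for t, v in enumerate(x)], num_bins)
    tau = []
    for xk, yk in zip(big_x, big_y):
        power = abs(xk) ** 2
        if power > 1e-12:
            tau.append((xk.real * yk.real + xk.imag * yk.imag) / power)
        else:
            tau.append(float("nan"))  # undefined where the spectrum vanishes
    return tau


# A unit impulse delayed by 5 samples has a group delay of 5 at every frequency.
delayed_impulse = [0.0] * 5 + [1.0] + [0.0] * 2
tau = group_delay(delayed_impulse)
print(tau[:3])
```

For a click-like signal the group delay therefore points at the position of the impulse within the analysis frame.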

38 Motivation

40 Application on the Beaked whales example

41 Zoom in on an area of clicks
After applying an appropriate modulation and low-pass filtering to the original recordings.
Note: Triangles denote hand labels.

42 Results on Beaked and Sperm Whales
Entries are given as raw data / with TK.
Species          clicks    Det (%)    Corr (%)    MAE (ms)
Beaked Whales              /          /           /0.9
Sperm Whales               /          /           /0.97
$$\text{Det} = \frac{\text{Number of clicks correctly detected}}{\text{Total}} \times 100$$
$$\text{Corr} = \frac{\text{Total} - \text{Deleted} - \text{Inserted}}{\text{Total}} \times 100$$
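
A small worked example shows how the two measures differ: Det counts only the correctly detected clicks, while Corr also penalizes inserted (false) detections. The counts below are hypothetical and the function name `det_and_corr` is an assumption.

```python
def det_and_corr(total, correctly_detected, deleted, inserted):
    """Return (Det, Corr) in percent for the given click counts."""
    det = 100.0 * correctly_detected / total
    corr = 100.0 * (total - deleted - inserted) / total
    return det, corr


# Hypothetical counts: 200 labeled clicks, 180 found, 20 missed, 10 false alarms.
det, corr = det_and_corr(total=200, correctly_detected=180, deleted=20, inserted=10)
print(det, corr)
```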

43 A Mathematical Model for Accurate Measurement of Jitter
In collaboration with: Miltiadis Vasilakis
Presented at MAVEBA 2007, Florence [11]

44 Jitter
Definition: Jitter is defined as perturbations of the glottal source signal that occur during vowel phonation and affect the glottal pitch period.

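45 Definitions
Let u(n) be the pitch period sequence.
Local:
$$\frac{\frac{1}{N-1} \sum_{n=1}^{N-1} |u(n+1) - u(n)|}{\frac{1}{N} \sum_{n=1}^{N} u(n)}$$
Absolute:
$$\frac{1}{N-1} \sum_{n=1}^{N-1} |u(n+1) - u(n)|$$
Relative average perturbation:
$$\frac{\frac{1}{N-2} \sum_{n=1}^{N-2} \left| \frac{2u(n+1) - u(n) - u(n+2)}{3} \right|}{\frac{1}{N} \sum_{n=1}^{N} u(n)}$$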
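
The three measures translate directly into code. The sketch below evaluates them on a short pitch period sequence that alternates between two values. The function names and the example periods are assumptions for illustration.

```python
def absolute_jitter(u):
    """Mean absolute difference between consecutive pitch periods."""
    n = len(u)
    return sum(abs(u[i + 1] - u[i]) for i in range(n - 1)) / (n - 1)


def local_jitter(u):
    """Absolute jitter divided by the mean pitch period."""
    return absolute_jitter(u) / (sum(u) / len(u))


def relative_average_perturbation(u):
    """Mean deviation of each period from its 3-point average, over the mean period."""
    n = len(u)
    numerator = sum(abs(2 * u[i + 1] - u[i] - u[i + 2]) / 3
                    for i in range(n - 2)) / (n - 2)
    return numerator / (sum(u) / n)


# Hypothetical pitch periods in samples, alternating around 100 by +/- 2.
periods = [98, 102, 98, 102, 98, 102]

jitter_abs = absolute_jitter(periods)
jitter_local = local_jitter(periods)
jitter_rap = relative_average_perturbation(periods)
print(jitter_abs, jitter_local, jitter_rap)
```

The absolute measure carries the units of the pitch period, while the local and RAP measures are ratios and are often quoted as percentages.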

48 Our approach
[Figure: impulse train of unit amplitude with alternating spacings P − ε and P + ε; amplitude vs. time (samples)]

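49 In mathematical terms
We model the glottal impulse train as:
$$p[n] = \sum_{k=-\infty}^{+\infty} \delta[n - (2k)P] + \sum_{k=-\infty}^{+\infty} \delta[n + \varepsilon - (2k+1)P]$$
We may show that its power spectrum is then:
$$|P(\omega)|^2 = H(\varepsilon, \omega) + S(\varepsilon, \omega)$$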
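
A finite version of this impulse train can be generated to watch the subharmonic part appear. The sketch below builds a frame holding a whole number of double periods and compares the power at the first harmonic (frequency 1/P) with the power at the first subharmonic (frequency 1/(2P)), with and without jitter. The frame length, the values of P and ε, and the function names are assumptions of this example. The closed forms of H and S are not reproduced here.

```python
import cmath
import math


def impulse_train(period, epsilon, num_double_periods):
    """Finite frame of the model: impulses at 2kP and at (2k+1)P - epsilon."""
    p = [0.0] * (2 * period * num_double_periods)
    for k in range(num_double_periods):
        p[2 * k * period] = 1.0
        p[(2 * k + 1) * period - epsilon] = 1.0
    return p


def power_at_bin(p, k):
    """Squared magnitude of the DFT of p at bin k."""
    n = len(p)
    acc = sum(v * cmath.exp(-2j * math.pi * k * t / n)
              for t, v in enumerate(p) if v)
    return abs(acc) ** 2


P, M = 50, 4  # pitch period in samples, number of double periods in the frame
results = {}
for eps in (0, 2):
    frame = impulse_train(P, eps, M)
    harmonic = power_at_bin(frame, 2 * M)  # bin at frequency 1/P
    subharmonic = power_at_bin(frame, M)   # bin at frequency 1/(2P)
    results[eps] = (harmonic, subharmonic)
    print(eps, round(harmonic, 3), round(subharmonic, 3))
```

Without jitter the subharmonic bin is empty. With jitter, energy moves from the harmonic to the subharmonic, which is the effect that the split into H and S captures.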

51 Examples of power spectrum
On a synthetic glottal signal.
[Figure: power (dB) vs. radian frequency (ω); curves H(0, ω), S(0, ω), H(1, ω), S(1, ω), H(2, ω), S(2, ω)]

52 Examples of power spectrum
Synthetic signal (fs = 48 kHz, ε = 5): power spectrum of a single frame.
[Figure: three panels, power (dB) vs. frequency (kHz): (1) harmonic and subharmonic parts of the power spectrum, H(ε, ω) and S(ε, ω), with |P(ω)|²; circles indicate crossings between the harmonic and subharmonic parts; (2) a closer look at the first crossing; (3) accepted crossing and rejected crossings]

53 Experiments
Goal: discriminate pathological from normal voices, based on jitter.
Database: Massachusetts Eye and Ear Infirmary (MEEI) [12]
- Sustained vowels
- 53 subjects with normal voice
- 657 subjects with a wide variety of pathological conditions
Jitter estimation methods:
- Praat, 2007 (P. Boersma and D. Weenink) [13]
- Multi-Dimensional Voice Program (MDVP) (Kay-Pentax Elemetrics, 2007) [14]
- Our approach [11]

55 Results in ROC curves
[Figure: true positive rate vs. false positive rate; curves: MDVP Jita; proposed method, fixed frame, sequence average; proposed method, variable frame, sequence average; Praat jitter (local, absolute)]

56 References
[1] A. Holzapfel and Y.. similarity of music based on dynamic periodicity warping. In IEEE ICASSP.
[2] Jonathan Foote, Matthew D. Cooper, and Unjung Nam. Audio retrieval by rhythmic similarity. In Proc. of ISMIR, International Conference on Information Retrieval.
[3] Geoffroy Peeters. Rhythm classification using spectral rhythm patterns. In Proc. of ISMIR, International Conference on Information Retrieval.
[4] Jouni Paulus and A.P. Klapuri. the similarity of rhythmic patterns. In Proc. of ISMIR, International Conference on Information Retrieval.
[5] V. Kandia and Y.. Detection of creak clicks of sperm whales in low SNR conditions. In CD Proc. IEEE Oceans, Brest, France.
[6] V. Kandia and Y.. Detection of sperm whale clicks based on the Teager-Kaiser energy operator. Applied Acoustics, 67(11-12).
[7] V. Kandia and Y.. Detection of clicks based on group delay. Accepted in Canadian Acoustics.
[8] D. Gillespie. An acoustic survey for sperm whales in the Southern Ocean sanctuary conducted from the R/V Aurora Australis. Rep. Int. Whal. Comm., 47.
[9] O. Jäke. Acoustic Censusing of sperm whales at Kaikoura, New Zealand: An inexpensive method to count clicks and whales automatically. Master Thesis, University of Otago, Dunedin, New Zealand.
[10] D. K. Mellinger. Ishmael 1.0 Users Guide. NOAA, NOAA/PMEL/OERD, 2115 SE OSU Drive, Newport, OR. Technical Memorandum OAR PMEL-120.
[11] M. Vasilakis and Y.. A mathematical model for accurate measurement of. In MAVEBA 2007, Florence, Italy.
[12] Kay Elemetrics. Disordered Voice Database (Version 1.03).
[13] Paul Boersma and David Weenink. Praat: doing phonetics by computer [Computer program].
[14] Kay Elemetrics. Multi-Dimensional Voice Program (MDVP) [Computer program], 2007.
