Time-Frequency Masking

1 Time-Frequency Masking: EECS 352: Machine Perception of Music & Audio. Zafar Rafii, Winter 2014

2 STFT: The Short-Time Fourier Transform (STFT) is a succession of local Fourier Transforms (FT). [Figure: time signal with window i; the STFT gives a real spectrogram + j * an imaginary spectrogram]

3 STFT: If we use a window of N samples, the FT has N values, from 0 to N-1; e.g., N = 8. [Figure: window i of the time signal; the FT gives a real spectrum + j * an imaginary spectrum, N values each]

4 STFT: Frequency index 0 is the DC component; it is always real (it is the sum of the time values!). [Figure: time signal (window i), real spectrum, imaginary spectrum]

5 STFT: Frequency indices from 1 to floor(N/2) are the unique complex values (a + j*b). [Figure: time signal (window i), real spectrum, imaginary spectrum]

6 STFT: Frequency indices from floor(N/2)+1 to N-1 are the mirrored complex conjugates (a - j*b). [Figure: time signal (window i), real spectrum, imaginary spectrum]

7 STFT: If N is even, there is a pivot component at index N/2; it is always real! [Figure: time signal (window i), real spectrum, imaginary spectrum]

8 STFT: Summary of the indices and values in the STFT (N values, indices 0 to N-1):
Frequency 0 = DC component (always real)
Frequency 1 to floor(N/2) = unique complex values
Frequency N/2 = pivot component (always real, N even)
Frequency floor(N/2)+1 to N-1 = mirrored complex conjugates
[Figure: real and imaginary spectrograms, color-coded by index range]
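The index summary above can be checked numerically on a single window. The following is a minimal sketch in plain Python (not the Matlab used in the course); the `dft` helper and the sample values in `frame` are illustrative assumptions, not taken from the slides.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of one windowed frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

# Hypothetical frame of N = 8 real samples (values chosen arbitrarily)
frame = [0.5, 1.0, -0.25, 0.75, 0.0, -1.0, 0.3, 0.2]
N = len(frame)
X = dft(frame)

# Index 0 (DC) equals the sum of the time values, so it is real
dc_is_real = abs(X[0].imag) < 1e-9 and abs(X[0].real - sum(frame)) < 1e-9

# For even N, the pivot component at index N/2 is real as well
pivot_is_real = abs(X[N // 2].imag) < 1e-9

# Indices N/2+1 .. N-1 are the complex conjugates of indices N/2-1 .. 1
mirrored = all(abs(X[N - k] - X[k].conjugate()) < 1e-9
               for k in range(1, N // 2))
```

All three flags come out true for any real-valued frame of even length; for complex input the symmetry no longer holds.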

9 Spectrogram: The (magnitude) spectrogram is the magnitude (absolute value) of the STFT. [Figure: abs(real spectrogram + j * imaginary spectrogram) = magnitude spectrogram]

10 Spectrogram: For a complex number a + j*b, the absolute value is $|a + jb| = \sqrt{a^2 + b^2}$. [Figure: abs(real spectrum + j * imaginary spectrum) = magnitude spectrum]

11 Spectrogram: All N values (indices from 0 to N-1) are real and non-negative (abs!). [Figure: abs(real spectrum + j * imaginary spectrum) = magnitude spectrum, N values]

12 Spectrogram: Frequency indices from 0 to floor(N/2) are the unique values (with DC and pivot). [Figure: abs(real spectrum + j * imaginary spectrum) = magnitude spectrum]

13 Spectrogram: Frequency indices from floor(N/2)+1 to N-1 are the mirrored values. [Figure: abs(real spectrum + j * imaginary spectrum) = magnitude spectrum]

14 Spectrogram: Since they are redundant, we can discard the values from floor(N/2)+1 to N-1. [Figure: magnitude spectrum reduced to its floor(N/2)+1 unique values]

15 Spectrogram: The spectrogram therefore has floor(N/2)+1 unique values (with DC and pivot). [Figure: abs(real spectrogram + j * imaginary spectrogram) = magnitude spectrogram]
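As a rough illustration of the magnitude step and of why only floor(N/2)+1 values need to be kept, here is a small plain-Python sketch for one frame. The `dft` helper and the values in `frame` are assumptions made for the example.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of one windowed frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

# Hypothetical frame of N = 8 real samples
frame = [0.5, 1.0, -0.25, 0.75, 0.0, -1.0, 0.3, 0.2]
N = len(frame)

# abs(a + j*b) = sqrt(a^2 + b^2), applied to every frequency index
magnitude = [abs(v) for v in dft(frame)]

# Keep indices 0 .. floor(N/2): DC, unique values, and the pivot
n_unique = N // 2 + 1
unique = magnitude[:n_unique]

# The discarded indices carry the same magnitudes in mirrored order
redundant = all(abs(magnitude[N - k] - magnitude[k]) < 1e-9
                for k in range(1, N // 2))
```

With N = 8 this keeps 5 of the 8 values. Stacking the `unique` lists of successive frames side by side gives the magnitude spectrogram.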

16 Spectrogram: Why the magnitude spectrogram? Easy to visualize (compare with the STFT); magnitude information is more important; the human ear is less sensitive to phase. [Figure: time signal and its magnitude spectrogram]

17 Spectrogram: When you display a spectrogram in Matlab: imagesc: data is scaled to use the full colormap; 10*log10(V): magnitude spectrogram in dB; set(gca, 'YDir', 'normal'): y-axis from bottom to top. [Figure: time signal and its magnitude spectrogram]
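The dB conversion in the display recipe can be sketched outside Matlab too. The snippet below is a plain-Python illustration of 10*log10(V) only (no plotting); the `to_db` name, the `floor` guard against log of zero, and the sample values are assumptions added for the example.

```python
import math

def to_db(v, floor=1e-12):
    """Convert one spectrogram value to dB, clamping to avoid log10(0)."""
    return 10.0 * math.log10(max(v, floor))

# Hypothetical 2 x 2 spectrogram (rows = frequency bins, columns = frames)
V = [[1.0, 10.0],
     [100.0, 0.0]]

V_db = [[to_db(v) for v in row] for row in V]
```

A value of 1 maps to 0 dB and each factor of 10 adds 10 dB; the clamped zero ends up at the floor level instead of minus infinity.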

18 Spectrogram: The signal cannot be reconstructed from the spectrogram (phase information is missing!). [Figure: from the magnitude spectrogram alone, the real and imaginary spectrograms needed by the istft to recover the time signal are unknown]
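One way to see why the phase matters: two different signals can share exactly the same magnitude spectrum. The plain-Python sketch below uses a circular delay as the example; the `dft` helper and the sample values are assumptions for illustration.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of one frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

# Hypothetical frame, and the same frame circularly delayed by 2 samples
x = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 0.0, 0.0]
shifted = x[-2:] + x[:-2]

mag_x = [abs(v) for v in dft(x)]
mag_shifted = [abs(v) for v in dft(shifted)]

# A circular delay only changes the phase, so the magnitudes coincide
same_magnitude = all(abs(a - b) < 1e-9 for a, b in zip(mag_x, mag_shifted))
different_signal = x != shifted
```

Since both signals map to the same magnitudes, the magnitude alone cannot tell which one to reconstruct.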

19 Time-Frequency Masking: Suppose we have a mixture of two sources: a music signal and a voice signal. [Figure: music, voice, and mixture signals with their spectrograms]

20 Time-Frequency Masking: We assume that the sources are sparse = most of the time-frequency bins have null energy. [Figure: music, voice, and mixture signals with their spectrograms]

21 Time-Frequency Masking: We assume that the sources are sparse = most of the time-frequency bins have null energy. [Figure: music and voice spectrograms annotated "mostly low energy bins"; mixture spectrogram]

22 Time-Frequency Masking: We assume that the sources are disjoint = their time-frequency bins do not overlap. [Figure: music, voice, and mixture signals with their spectrograms]

23 Time-Frequency Masking: We assume that the sources are disjoint = their time-frequency bins do not overlap. [Figure: music and voice spectrograms; mixture spectrogram annotated "not a lot of overlapping"]

24 Time-Frequency Masking: Assuming sparseness and disjointness, we can discriminate the bins between mixed sources. [Figure: music, voice, and mixture signals with their spectrograms]

25 Time-Frequency Masking: Assuming sparseness and disjointness, we can discriminate the bins between mixed sources. [Figure: mixture spectrogram annotated "source 1 = bright, source 2 = dark"]

26 Time-Frequency Masking: Bins that are likely to belong to one source are assigned to 1, the rest to 0 = binary masking! [Figure: binary mask (1 = source of interest, 0 = interfering source) alongside the music, voice, and mixture spectrograms]

27 Time-Frequency Masking: By multiplying the binary mask with the mixture spectrogram, we can preview the estimate. [Figure: binary mask .x mixture spectrogram = masked spectrogram]
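The binary masking and preview steps can be sketched on toy data as follows. This plain-Python example assumes the two source spectrograms are known and assigns each bin to whichever source is larger, which is only one possible way to decide which bins "likely belong" to a source; the slides do not specify the rule. Approximating the mixture spectrogram as the sum of the two is also an assumption.

```python
# Hypothetical magnitude spectrograms (rows = frequency bins, columns = frames)
music = [[0.9, 0.0, 0.1],
         [0.0, 0.8, 0.0]]
voice = [[0.0, 0.7, 0.2],
         [0.6, 0.0, 0.0]]

# Assumed mixture: bin-wise sum of the two sources
mixture = [[m + v for m, v in zip(m_row, v_row)]
           for m_row, v_row in zip(music, voice)]

# Binary mask: 1 where music (source of interest) dominates, 0 elsewhere
mask = [[1 if m > v else 0 for m, v in zip(m_row, v_row)]
        for m_row, v_row in zip(music, voice)]

# Element-wise product gives a preview of the music estimate
masked = [[b * x for b, x in zip(b_row, x_row)]
          for b_row, x_row in zip(mask, mixture)]
```

Here the masked spectrogram keeps the two bins where the music dominates and zeroes the rest, including bins where both sources are silent.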

28 Time-Frequency Masking: However, we cannot derive the estimate itself because we cannot invert a spectrogram! [Figure: binary mask .x mixture spectrogram = masked spectrogram; the music estimate cannot be obtained from it]

29 Time-Frequency Masking: We mirror the redundant frequencies from the unique frequencies (without DC and pivot). [Figure: binary mask extended to a full, mirrored binary mask]

30 Time-Frequency Masking: We then apply this full binary mask to the STFT using an element-wise multiplication. [Figure: full binary mask .x (real spectrogram + j * imaginary spectrogram)]

31 Time-Frequency Masking: The estimate signal can now be reconstructed via the inverse STFT. [Figure: masked real and masked imaginary spectrograms, inverted by the istft to give the music estimate]
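The mirror, multiply, and invert steps can be sketched for a single frame as below. This is a plain-Python illustration under several assumptions: N is even, the `dft` and `idft` helpers stand in for one column of the STFT and of its inverse, and the `half_mask` values are arbitrary. A full inverse STFT would also overlap-add the frames, which is omitted here.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of one frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse discrete Fourier transform of one frame."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

# Hypothetical frame of N = 8 real samples
frame = [0.5, 1.0, -0.25, 0.75, 0.0, -1.0, 0.3, 0.2]
N = len(frame)
X = dft(frame)

# Hypothetical binary mask over the unique bins 0 .. N/2
half_mask = [1, 1, 0, 0, 1]

# Mirror bins N/2-1 .. 1 (DC and pivot are not repeated); valid for even N
full_mask = half_mask + half_mask[-2:0:-1]

# Element-wise multiplication with the complex spectrum, then inversion
masked = [b * v for b, v in zip(full_mask, X)]
estimate = idft(masked)

# A symmetric mask keeps the conjugate symmetry, so the estimate is real
max_imag = max(abs(v.imag) for v in estimate)
```

Because the mask is mirrored, the imaginary parts of `estimate` are only rounding noise and can be dropped to get the real time-domain frame.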

32 Time-Frequency Masking: Sources are not really sparse or disjoint in time-frequency in the mixture. [Figure: music, voice, and mixture signals with their spectrograms]

33 Time-Frequency Masking: Bins that are likely to belong to one source are close to 1, the rest close to 0 = soft masking! [Figure: soft mask (1 = source of interest, 0 = interfering source) alongside the music, voice, and mixture spectrograms]
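A soft mask can be sketched in the same toy setting. The ratio used below, music / (music + voice), is an assumption chosen for illustration because it yields values between 0 and 1; the slides do not say how the soft mask is computed. The `eps` term is a hypothetical guard against division by zero in silent bins.

```python
# Hypothetical magnitude spectrograms with overlapping energy
music = [[0.9, 0.1, 0.1],
         [0.2, 0.8, 0.0]]
voice = [[0.1, 0.7, 0.3],
         [0.6, 0.2, 0.0]]

eps = 1e-12  # avoids 0/0 where both sources are silent

# Assumed soft mask: share of each bin's magnitude attributed to the music
soft_mask = [[m / (m + v + eps) for m, v in zip(m_row, v_row)]
             for m_row, v_row in zip(music, voice)]
```

Bins dominated by the music get values near 1, bins dominated by the voice get values near 0, and shared bins fall in between instead of being forced to one side.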

34 Time-Frequency Masking: Let's listen to the results! [Figure: music signal, mixture signal (mix), music estimate (demix)]

35 Question: How can we efficiently model a binary/soft time-frequency mask for source separation?... To be continued. [Figure: mixture spectrogram and an unknown soft mask]
