Monophony/Polyphony Classification System using Fourier of Fourier Transform
International Journal of Electronics Engineering, 2 (2), 2010, pp. 299-303

Kalyani Akant 1, Rajesh Pande 2, and S.S. Limaye 3
1 Department of Electronics and Communication Engineering, Manoharbhai Patel Institute of Engineering and Technology, Gondia, Maharashtra, India, kalyaniakant@gmail.com
2 Department of Electronics Engineering, Shri Ramdeobaba Kamla Nehru Engineering College, Nagpur, Maharashtra, India, panderaj@yahoo.com
3 Department of Electronics Engineering, Jhulelal Institute of Technology, Nagpur, Maharashtra, India, shyam_limaye@hotmail.com

Abstract: In this paper we propose a method for distinguishing monophonic signals from polyphonic ones using the Fourier of Fourier Transform (FFT2). Pitch estimation for monophonic signals is much simpler than for polyphonic signals, and prior knowledge of the number of notes played (in the case of polyphony) facilitates multi-pitch estimation. Since different methods may be used for pitch estimation in the monophonic and polyphonic contexts, identifying a signal as monophonic or polyphonic becomes essential. Investigating the harmonic pattern of a sound in the frequency domain yields its fundamental frequency (pitch). The periodicity of the Fourier transform is detected by again taking its Fourier transform, giving the Fourier of Fourier transform (FFT2) [7]. Classification is based on the fact that music signals are harmonic. For monophonic signals, we obtain a series of peaks in the FFT2 domain at a near-constant bin spacing related to the pitch of the single note, whereas for polyphonic signals this regularity is disturbed, since the FFT2 spectrum contains multiple series of peaks corresponding to multiple notes. We have tested our method on the database available at [15].

Keywords: Monophony, Polyphony, Fourier of Fourier Transform, Pitch.

INTRODUCTION

Many methods for the estimation of pitch have been proposed in the literature [1].
In the case of monophony, the pitch is easier to determine than in polyphony. The problem of pitch estimation for monophonic signals is considered solved, whereas multi-pitch estimation is still a challenging issue. Methods for monophonic pitch estimation include time-domain methods [2], [3], [4] and frequency-domain methods [5], [6], [7]. Methods for pitch estimation in the polyphonic context include [8], [9], and [10]. In [11], monophony/polyphony classification is done based on a confidence indicator used by de Cheveigné [12]. The short-term mean and variance of this indicator are calculated, and the bivariate repartition of these two parameters is modeled with a bivariate Weibull distribution for each class. Classification is made by computing the likelihood over one second for each class and taking the best one. The problem of singing voice detection in monophonic and polyphonic contexts is addressed in [13], where again the method of de Cheveigné [12] is used to classify the signal as monophonic or polyphonic.

Our method is based on the fact that music signals are harmonic. Harmonicity is detected in the FFT2 [7] domain. For monophonic signals, all peaks in the FFT2 spectrum obey a single harmonic relation. For polyphonic signals, the FFT2 spectrum is a mixture of multiple harmonic peak series corresponding to multiple notes, so not all peaks follow one harmonic relation. By checking whether all peaks are harmonically related, the signal is classified as monophonic or polyphonic. Our main objective is singing voice detection from mono recordings for query-by-humming applications; the present method is one component of that objective.

This paper is organized as follows. Section 1 presents the details of the Fourier of Fourier Transform. The proposed method is explained in Section 2. Results and conclusions are given in Sections 3 and 4 respectively.

1. FOURIER OF FOURIER TRANSFORM

In our analysis we apply two Fourier transforms in sequence, referred to as the Fourier of Fourier Transform (FFT2).
Our method works very well for harmonic sounds, i.e. sounds rich in harmonics; it is not suited to pure sinusoids. The Fourier transform (FT, the first Fourier transform of the signal) of a typical musical sound has a series of peaks in its magnitude spectrum corresponding to the harmonics of the sound, at frequencies close to multiples of the fundamental frequency F. The peak at the fundamental frequency is not always dominant, so a single Fourier transform is insufficient to identify the correct peak. The Fourier of Fourier Transform is of great interest in locating this peak, which helps avoid octave errors. To compute the Fourier of Fourier Transform,
we first compute the magnitude spectrum of the Fourier transform of the singing voice, and then the magnitude spectrum of the Fourier transform of that first magnitude spectrum. Note that this transform is not the same as the well-known cepstrum, which is the (inverse) Fourier transform of the logarithm of the magnitude spectrum.

Figure 1 shows the FT of piano C# of the 5th octave. This FT has a series of uniformly spaced peaks, corresponding to the harmonics of the fundamental frequency.

Fig. 1: Fourier Transform of Piano C# of 5th Octave

We can clearly see that the peak corresponding to the fundamental frequency is not dominant. If the fundamental frequency is F, the distance between two consecutive peaks corresponds to a period of Δ1 bins, where:

    Δ1 = F · N1 / Fs    ... (1)

N1: size of the first Fourier transform.
Fs: sampling frequency.

The first peak is at bin 0 and corresponds to the DC level. The difference between the second peak (shown by an arrow in Fig. 1) and the first peak is Δ1 bins.

Figure 2 shows the spectrum of the Fourier of Fourier Transform of piano C# of the 5th octave. This FFT2 spectrum also contains a series of peaks. Here too, the first peak is at bin 0 and corresponds to the DC level; the second peak is shown by an arrow in Fig. 2. The distance between two consecutive peaks corresponds to a period of Δ2 bins, where:

    Δ2 = N2 / Δ1    ... (2)

N2: size of the second Fourier transform.

From Eqs (1) and (2), we get

    Δ2 = N2 · Fs / (N1 · F)    ... (3)

Fig. 2: Fourier of Fourier Transform of Piano C# of 5th Octave

If the sizes of the first and second Fourier transforms are the same (N2 = N1), the fundamental frequency F is given by:

    F = Fs / Δ2    ... (4)

1.1 Advantage of FFT2 Over FT

The peaks in FFT2 are more widely spaced, as illustrated in Table 1. Here the 12 notes of the 4th octave are analyzed with a sampling frequency of 44100 Hz and an FFT size of 4096, and the bin indices of the second peak in the FT and FFT2 spectra are tabulated.
(Note that due to slight mistuning of the piano, A has a frequency slightly different from the nominal 440 Hz.) The frequency of a note is found by applying parabolic interpolation [14] to the peak found in FFT2.
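The FFT2 construction and the relation F = Fs / Δ2 of Eq. (4) can be sketched as follows. This is a minimal pure-Python illustration on a synthetic harmonic tone; the direct O(N²) DFT, the 8000 Hz sampling rate, the 500 Hz test tone, and the 300-700 Hz search range are choices made for this sketch, not values from the paper.

```python
import cmath
import math

def dft_mag(x):
    # Magnitude spectrum via a direct O(N^2) DFT, kept simple for clarity
    # (an FFT would be used in practice).
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
            for k in range(N)]

def fft2_spectrum(signal):
    # Fourier of Fourier Transform: the magnitude spectrum of the magnitude spectrum.
    return dft_mag(dft_mag(signal))

# Synthetic harmonic tone with a weak fundamental: F = 500 Hz, Fs = 8000 Hz, N = 512.
Fs, N, F = 8000, 512, 500.0
amps = (0.2, 1.0, 0.8)  # fundamental deliberately weaker than its harmonics
sig = [sum(a * math.sin(2 * math.pi * (h + 1) * F * n / Fs) for h, a in enumerate(amps))
       for n in range(N)]

spec2 = fft2_spectrum(sig)

# FFT2 peaks repeat every delta2 = N / delta1 bins. As in Step 4 of the proposed
# method, we search a plausible pitch range (here 300-700 Hz) for the strongest peak.
lo, hi = round(Fs / 700), round(Fs / 300)
delta2 = max(range(lo, hi + 1), key=lambda k: spec2[k])
print(delta2, round(Fs / delta2))  # Eq. (4): F = Fs / delta2, recovering 500 Hz
```

Even though the fundamental is the weakest partial, the FFT2 peak spacing recovers it, which is the octave-error robustness argued for above.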
Table 1: Indices of Harmonics in Terms of Bins in FT and FFT2 for Notes in 4th Octave
(Columns: musical note, index in FT, index in FFT2, frequency of note, MIDI note number. The rows cover the 12 notes C, C#, D, Eb, E, F, F#, G, Ab, A, Bb, B of the 4th octave, MIDI numbers 60-71; the numeric index and frequency entries are not preserved here.)

We observe from the table that in FT there is a difference of only one or two bins per semitone, while in FFT2 the difference is five to nine bins. Also, as we move to lower octaves, the index in FT decreases while the index in FFT2 increases. For some pairs of consecutive semitones in the third octave, the indices in FT are identical even though the actual frequencies differ; estimating the fundamental frequency without parabolic interpolation would give the same value for both. Hence parabolic interpolation plays an important role in finding the correct frequency of such semitones. Another feature of FFT2 is its ability to detect peaks of harmonics corresponding to multiple pitches.

1.2 Ability of FFT2 to Detect Multiple Pitches

In the FFT2 domain the spectral peaks are not as closely spaced as in the FT domain, so it becomes easier for a peak detector to locate the peaks without ambiguity. Figure 3 shows the FFT2 spectrum when A flat and C sharp of the 4th octave are played together. This ability of FFT2 is of great interest for music segregation in a polyphonic environment.

2. PROPOSED METHOD

Step 1: A signal frame of size N is selected and its FFT2 is computed. The size of the first and second FT was chosen to be 2N to improve frequency resolution. In our case, N = 2048.

Step 2: The bin numbers of all peaks in the FFT2 spectrum from 0 to N are stored in a vector V = {V1, V2, ..., Vn}.

Step 3: The bin number of the maximum amplitude in the FFT2 spectrum is detected; denote it by K. (If bin 0 is indexed as 1, as in MATLAB, K should be taken as K - 1.)
Step 4: If the singing voice lies in the frequency range f1 to f2, the FFT2 bin number of the maximum amplitude will lie between Fs/f2 and Fs/f1. All bin numbers in this range from vector V are stored in a vector X = {X1, X2, ..., Xm}.

Step 5: From X, those bin numbers whose peak values are less than 30% of the peak value at K are rejected. Let the remaining bin numbers be Y = {Y1, Y2, ..., Yi}.

Step 6: It is then checked whether each of the bin numbers Yj + K (or Yj + K - 1 in the case of MATLAB indexing), for 1 ≤ j ≤ i, falls within the vicinity V - 5 to V + 5 of some element of V. If so, the signal is monophonic; otherwise it is polyphonic.

The above condition is strict for monophonic signals, so the probability of misclassifying monophonic signals is higher than for polyphonic ones. We have therefore tested our method on a large database of monophonic signals [15].

3. RESULTS

The algorithm is explained using the following examples. For the frame in Fig. 4, K = 195, K - 1 = 194.

V = {1, 49, 97, 145, 195, 245, 292, 340, 391, 441, 487, 535, 586, 636, 682, 730, 781, 831, 877, 925, 976, 1027, 1073, 1119, 1171, 1222, 1269, 1314, 1366, 1418, 1464, 1508, 1559, 1616, 1661, 1698, 1735, 1768, 1800, 1832, 1907, 1955, 2006}

We considered f1 = 100 Hz and f2 = 800 Hz. With Fs = 44100 Hz, vector X in Step 4 contains the bin numbers from 55 to 441. So X in this example is {97, 145, 195, 245, 292, 340, 391, 441}, and Y = {145, 195, 245, 340, 391, 441}. Now Step 6 is performed.

Fig. 4: FFT2 Spectrum of one Frame of a Monophonic Note

Fig. 3: FFT2 Spectrum when A Flat and C Sharp of 4th Octave Played Together

The condition in Step 6 is satisfied, hence the signal is monophonic.
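The Step 6 test on this monophonic example can be checked directly. A minimal sketch in Python; V, Y and K are taken from the example above, while the function name and structure are ours, not the paper's.

```python
# Peak bins, retained candidate bins, and the maximum-amplitude bin from the
# monophonic example (MATLAB-style indexing, so K - 1 = 194 is used in the sums).
V = [1, 49, 97, 145, 195, 245, 292, 340, 391, 441, 487, 535, 586, 636, 682, 730,
     781, 831, 877, 925, 976, 1027, 1073, 1119, 1171, 1222, 1269, 1314, 1366,
     1418, 1464, 1508, 1559, 1616, 1661, 1698, 1735, 1768, 1800, 1832, 1907,
     1955, 2006]
Y = [145, 195, 245, 340, 391, 441]
K = 195

def is_monophonic(V, Y, K, tol=5):
    # Monophonic when every Y_j + K - 1 lands within +/- tol bins of some peak in V.
    return all(any(abs((y + K - 1) - v) <= tol for v in V) for y in Y)

print([y + K - 1 for y in Y])   # shifted bins tested against V
print(is_monophonic(V, Y, K))   # True -> monophonic
```

Each shifted bin (339, 389, 439, 534, 585, 635) lies within 5 bins of a peak in V, which is exactly the pattern tabulated in Table 2.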
Table 2: Illustration of Step 6 for Monophonic Signal

Yj + K - 1    Falls within V - 5 to V + 5 of some element of V?
339           Yes (element 340 in V)
389           Yes (element 391 in V)
439           Yes (element 441 in V)
534           Yes (element 535 in V)
585           Yes (element 586 in V)
635           Yes (element 636 in V)

For the frame in Fig. 5, K = 330, K - 1 = 329.

V = {1, 44, 83, 126, 202, 246, 285, 330, 380, 421, 468, 534, 579, 617, 663, 721, 762, 804, 847, 882, 925, 1005, 1052, 1096, 1139, 1179, 1212, 1253, 1294, 1334, 1379, 1418, 1455, 1495, 1536, 1579, 1620, 1662, 1700, 1726, 1765, 1808, 1858, 1894, 1926, 1963, 2002, 2038}

We considered f1 = 100 Hz and f2 = 800 Hz. With Fs = 44100 Hz, vector X in Step 4 contains the bin numbers from 55 to 441. So X in this example is {83, 126, 202, 246, 285, 330, 380, 421}, and Y = {83, 126, 202, 246, 285, 330, 380}. Now Step 6 is performed.

Fig. 5: FFT2 Spectrum of one Frame of Polyphonic Signals

Table 3: Illustration of Step 6 for Polyphonic Signal

Yj + K - 1    Falls within V - 5 to V + 5 of some element of V?
412           No
455           No
531           Yes (element 534 in V)
575           Yes (element 579 in V)
614           Yes (element 617 in V)
659           Yes (element 663 in V)
709           No

The condition in Step 6 is not satisfied, hence the signal is polyphonic.

The accuracy of our algorithm is tested using the global error rate: Error = number of misclassified seconds / total number of seconds. We observed that the error reduces for frames with larger signal amplitude. The signal amplitude of a frame is calculated by summing the modulus of each sample value in the frame. We run the algorithm only on those frames whose amplitude exceeds a threshold; the larger the threshold, the lower the error. In Table 4, Threshold/Maximum amplitude of signal = 0 means threshold = 0, so the algorithm is run on the entire signal. This effect is shown in Table 4. All the files are available at [15].
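The same vicinity test applied to the polyphonic example above shows why the classification flips. A self-contained sketch; V, Y and K come from the example, the helper name is our illustrative choice.

```python
# Peak bins, retained candidate bins, and the maximum-amplitude bin from the
# polyphonic example (K - 1 = 329 with MATLAB-style indexing).
V = [1, 44, 83, 126, 202, 246, 285, 330, 380, 421, 468, 534, 579, 617, 663, 721,
     762, 804, 847, 882, 925, 1005, 1052, 1096, 1139, 1179, 1212, 1253, 1294,
     1334, 1379, 1418, 1455, 1495, 1536, 1579, 1620, 1662, 1700, 1726, 1765,
     1808, 1858, 1894, 1926, 1963, 2002, 2038]
Y = [83, 126, 202, 246, 285, 330, 380]
K = 330

def near_some_peak(bin_no, peaks, tol=5):
    # True when bin_no lies within +/- tol bins of some detected FFT2 peak.
    return any(abs(bin_no - v) <= tol for v in peaks)

hits = [near_some_peak(y + K - 1, V) for y in Y]
print(hits)       # [False, False, True, True, True, True, False]
print(all(hits))  # False -> classified as polyphonic
```

The three failing candidates (shifted bins 412, 455, 709) belong to a second harmonic series, so the single-pitch regularity of Step 6 breaks, matching Table 3.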
Table 4: % Error
(Columns: name of file, threshold/maximum amplitude of signal, % error. The files tested were AltoFlute_ff_C4B, BassFlute_pp_C4B, Bassoon_pp_C4B, EbClar_pp_C4B, Flute_novib_pp_B3B, Horn_pp_C4B, and TenorTrombone_pp_C4B; the numeric threshold and error entries are not preserved here.)

4. CONCLUSION AND FUTURE WORK

Real-world signals are noisy, and our algorithm may fail for noisy signals. The signal should therefore be band-pass filtered prior to applying this algorithm, to reject peaks corresponding to noise. This algorithm will be merged with our main goal: pitch tracking of the singing voice in a polyphonic context. Monophonic pitch tracking is simple and requires less time. Once the signal is classified at each frame, a different pitch-tracking algorithm is run for each class.

REFERENCES

[1] Zhenyu Zhao, Lyndon J. Brown, Musical Pitch Tracking using Internal Model Control Based Frequency Cancellation, 42nd IEEE Conference on Decision and Control, 5, December 2003.
[2] L.R. Rabiner, et al., A Comparative Performance Study of Several Pitch Detection Algorithms, IEEE Trans. ASSP, 24 (5), October 1976.
[3] J.C. Brown and M.S. Puckette, Calculation of a Narrowed Autocorrelation Function, J. Acoust. Soc. Am., 85 (4), April 1989.
[4] J.C. Brown and B. Zhang, Musical Frequency Tracking using the Methods of Conventional and Narrowed Autocorrelation, J. Acoust. Soc. Am., 89 (5), May 1991.
[5] M. Piszczalski and B.A. Galler, Predicting Musical Pitch from Component Frequency Ratios, J. Acoust. Soc. Am., 66 (3), September 1979.
[6] J.C. Brown, Musical Fundamental Frequency Tracking using a Pattern Recognition Method, J. Acoust. Soc. Am., 92 (3), September 1992.
[7] Sylvain Marchand, An Efficient Pitch-tracking Algorithm using a Combination of Fourier Transforms, Proceedings of the COST G-6 Conference on Digital Audio Effects (DAFX-01), Limerick, Ireland, December 6-8, 2001.
[8] P.J. Walmsley, S.J. Godsill, P.J.W. Rayner, Polyphonic Pitch Tracking using Joint Bayesian Estimation of Multiple Frame Parameters, Proc. 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, New York, October 17-20, 1999.
[9] A.P. Klapuri, Multiple Fundamental Frequency Estimation Based on Harmonicity and Spectral Smoothness, IEEE Transactions on Speech and Audio Processing, 11 (6), 2003, pp. 804-816.
[10] Chunghsin Yeh, A. Röbel, X. Rodet, Multiple Fundamental Frequency Estimation of Polyphonic Music Signals, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), 3, pp. iii/225-iii/228.
[11] H. Lachambre, R. André-Obrecht, J. Pinquier, Monophony vs Polyphony: A New Method Based on Weibull Bivariate Models, Content-Based Multimedia Indexing (CBMI '09).
[12] A. de Cheveigné and H. Kawahara, YIN, A Fundamental Frequency Estimator for Speech and Music,
Journal of the Acoustical Society of America, 111 (4), April 2002.
[13] Hélène Lachambre, Régine André-Obrecht, Julien Pinquier, Singing Voice Detection in Monophonic and Polyphonic Contexts, 17th European Signal Processing Conference (EUSIPCO 2009).
[14] J.O. Smith and X. Serra, PARSHL: An Analysis/Synthesis Program for Non-Harmonic Sounds Based on a Sinusoidal Representation, Proceedings of the 1987 International Computer Music Conference, International Computer Music Association, San Francisco, 1987.
[15]
I-Hao Hsiao, Chun-Tang Chao*, and Chi-Jo Wang (2016). A HHT-Based Music Synthesizer. Intelligent Technologies and Engineering Systems, Lecture Notes in Electrical Engineering (LNEE), Vol.345, pp.523-528.
More informationVoice Activity Detection
Voice Activity Detection Speech Processing Tom Bäckström Aalto University October 2015 Introduction Voice activity detection (VAD) (or speech activity detection, or speech detection) refers to a class
More informationSingle-channel Mixture Decomposition using Bayesian Harmonic Models
Single-channel Mixture Decomposition using Bayesian Harmonic Models Emmanuel Vincent and Mark D. Plumbley Electronic Engineering Department, Queen Mary, University of London Mile End Road, London E1 4NS,
More informationDominant Voiced Speech Segregation Using Onset Offset Detection and IBM Based Segmentation
Dominant Voiced Speech Segregation Using Onset Offset Detection and IBM Based Segmentation Shibani.H 1, Lekshmi M S 2 M. Tech Student, Ilahia college of Engineering and Technology, Muvattupuzha, Kerala,
More informationMeasuring the complexity of sound
PRAMANA c Indian Academy of Sciences Vol. 77, No. 5 journal of November 2011 physics pp. 811 816 Measuring the complexity of sound NANDINI CHATTERJEE SINGH National Brain Research Centre, NH-8, Nainwal
More informationSGN Audio and Speech Processing
Introduction 1 Course goals Introduction 2 SGN 14006 Audio and Speech Processing Lectures, Fall 2014 Anssi Klapuri Tampere University of Technology! Learn basics of audio signal processing Basic operations
More informationJOURNAL OF OBJECT TECHNOLOGY
JOURNAL OF OBJECT TECHNOLOGY Online at http://www.jot.fm. Published by ETH Zurich, Chair of Software Engineering JOT, 2009 Vol. 9, No. 1, January-February 2010 The Discrete Fourier Transform, Part 5: Spectrogram
More informationSound Source Localization using HRTF database
ICCAS June -, KINTEX, Gyeonggi-Do, Korea Sound Source Localization using HRTF database Sungmok Hwang*, Youngjin Park and Younsik Park * Center for Noise and Vibration Control, Dept. of Mech. Eng., KAIST,
More informationSignal segmentation and waveform characterization. Biosignal processing, S Autumn 2012
Signal segmentation and waveform characterization Biosignal processing, 5173S Autumn 01 Short-time analysis of signals Signal statistics may vary in time: nonstationary how to compute signal characterizations?
More informationNon-stationary Analysis/Synthesis using Spectrum Peak Shape Distortion, Phase and Reassignment
Non-stationary Analysis/Synthesis using Spectrum Peak Shape Distortion, Phase Reassignment Geoffroy Peeters, Xavier Rodet Ircam - Centre Georges-Pompidou, Analysis/Synthesis Team, 1, pl. Igor Stravinsky,
More informationAudio Imputation Using the Non-negative Hidden Markov Model
Audio Imputation Using the Non-negative Hidden Markov Model Jinyu Han 1,, Gautham J. Mysore 2, and Bryan Pardo 1 1 EECS Department, Northwestern University 2 Advanced Technology Labs, Adobe Systems Inc.
More informationTIME-FREQUENCY ANALYSIS OF MUSICAL SIGNALS USING THE PHASE COHERENCE
Proc. of the 6 th Int. Conference on Digital Audio Effects (DAFx-3), Maynooth, Ireland, September 2-6, 23 TIME-FREQUENCY ANALYSIS OF MUSICAL SIGNALS USING THE PHASE COHERENCE Alessio Degani, Marco Dalai,
More informationComparison of Spectral Analysis Methods for Automatic Speech Recognition
INTERSPEECH 2013 Comparison of Spectral Analysis Methods for Automatic Speech Recognition Venkata Neelima Parinam, Chandra Vootkuri, Stephen A. Zahorian Department of Electrical and Computer Engineering
More informationTIMIT LMS LMS. NoisyNA
TIMIT NoisyNA Shi NoisyNA Shi (NoisyNA) shi A ICA PI SNIR [1]. S. V. Vaseghi, Advanced Digital Signal Processing and Noise Reduction, Second Edition, John Wiley & Sons Ltd, 2000. [2]. M. Moonen, and A.
More informationCombining Pitch-Based Inference and Non-Negative Spectrogram Factorization in Separating Vocals from Polyphonic Music
Combining Pitch-Based Inference and Non-Negative Spectrogram Factorization in Separating Vocals from Polyphonic Music Tuomas Virtanen, Annamaria Mesaros, Matti Ryynänen Department of Signal Processing,
More informationCarrier Frequency Offset Estimation in WCDMA Systems Using a Modified FFT-Based Algorithm
Carrier Frequency Offset Estimation in WCDMA Systems Using a Modified FFT-Based Algorithm Seare H. Rezenom and Anthony D. Broadhurst, Member, IEEE Abstract-- Wideband Code Division Multiple Access (WCDMA)
More informationMultiple Fundamental Frequency Estimation by Modeling Spectral Peaks and Non-peak Regions
Multiple Fundamental Frequency Estimation by Modeling Spectral Peaks and Non-peak Regions Zhiyao Duan Student Member, IEEE, Bryan Pardo Member, IEEE and Changshui Zhang Member, IEEE 1 Abstract This paper
More informationUsing RASTA in task independent TANDEM feature extraction
R E S E A R C H R E P O R T I D I A P Using RASTA in task independent TANDEM feature extraction Guillermo Aradilla a John Dines a Sunil Sivadas a b IDIAP RR 04-22 April 2004 D a l l e M o l l e I n s t
More informationSignal Analysis. Peak Detection. Envelope Follower (Amplitude detection) Music 270a: Signal Analysis
Signal Analysis Music 27a: Signal Analysis Tamara Smyth, trsmyth@ucsd.edu Department of Music, University of California, San Diego (UCSD November 23, 215 Some tools we may want to use to automate analysis
More informationApplications of Music Processing
Lecture Music Processing Applications of Music Processing Christian Dittmar International Audio Laboratories Erlangen christian.dittmar@audiolabs-erlangen.de Singing Voice Detection Important pre-requisite
More informationSONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS
SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS AKSHAY CHANDRASHEKARAN ANOOP RAMAKRISHNA akshayc@cmu.edu anoopr@andrew.cmu.edu ABHISHEK JAIN GE YANG ajain2@andrew.cmu.edu younger@cmu.edu NIDHI KOHLI R
More informationNCCF ACF. cepstrum coef. error signal > samples
ESTIMATION OF FUNDAMENTAL FREQUENCY IN SPEECH Petr Motl»cek 1 Abstract This paper presents an application of one method for improving fundamental frequency detection from the speech. The method is based
More informationADDITIVE synthesis [1] is the original spectrum modeling
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 851 Perceptual Long-Term Variable-Rate Sinusoidal Modeling of Speech Laurent Girin, Member, IEEE, Mohammad Firouzmand,
More informationSingle Channel Speaker Segregation using Sinusoidal Residual Modeling
NCC 2009, January 16-18, IIT Guwahati 294 Single Channel Speaker Segregation using Sinusoidal Residual Modeling Rajesh M Hegde and A. Srinivas Dept. of Electrical Engineering Indian Institute of Technology
More informationSurvey Paper on Music Beat Tracking
Survey Paper on Music Beat Tracking Vedshree Panchwadkar, Shravani Pande, Prof.Mr.Makarand Velankar Cummins College of Engg, Pune, India vedshreepd@gmail.com, shravni.pande@gmail.com, makarand_v@rediffmail.com
More informationIMPROVED CODING OF TONAL COMPONENTS IN MPEG-4 AAC WITH SBR
IMPROVED CODING OF TONAL COMPONENTS IN MPEG-4 AAC WITH SBR Tomasz Żernici, Mare Domańsi, Poznań University of Technology, Chair of Multimedia Telecommunications and Microelectronics, Polana 3, 6-965, Poznań,
More informationANALYSIS OF EFFECTS OF VECTOR CONTROL ON TOTAL CURRENT HARMONIC DISTORTION OF ADJUSTABLE SPEED AC DRIVE
ANALYSIS OF EFFECTS OF VECTOR CONTROL ON TOTAL CURRENT HARMONIC DISTORTION OF ADJUSTABLE SPEED AC DRIVE KARTIK TAMVADA Department of E.E.E, V.S.Lakshmi Engineering College for Women, Kakinada, Andhra Pradesh,
More informationMulti-Pitch Estimation of Audio Recordings Using a Codebook-Based Approach Hansen, Martin Weiss; Jensen, Jesper Rindom; Christensen, Mads Græsbøll
Aalborg Universitet Multi-Pitch Estimation of Audio Recordings Using a Codebook-Based Approach Hansen, Martin Weiss; Jensen, Jesper Rindom; Christensen, Mads Græsbøll Published in: Proceedings of the 4th
More informationIsolated Digit Recognition Using MFCC AND DTW
MarutiLimkar a, RamaRao b & VidyaSagvekar c a Terna collegeof Engineering, Department of Electronics Engineering, Mumbai University, India b Vidyalankar Institute of Technology, Department ofelectronics
More informationSpeech Endpoint Detection Based on Sub-band Energy and Harmonic Structure of Voice
Speech Endpoint Detection Based on Sub-band Energy and Harmonic Structure of Voice Yanmeng Guo, Qiang Fu, and Yonghong Yan ThinkIT Speech Lab, Institute of Acoustics, Chinese Academy of Sciences Beijing
More informationMultimedia Signal Processing: Theory and Applications in Speech, Music and Communications
Brochure More information from http://www.researchandmarkets.com/reports/569388/ Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications Description: Multimedia Signal
More informationThe Music Retrieval Method Based on The Audio Feature Analysis Technique with The Real World Polyphonic Music
The Music Retrieval Method Based on The Audio Feature Analysis Technique with The Real World Polyphonic Music Chai-Jong Song, Seok-Pil Lee, Sung-Ju Park, Saim Shin, Dalwon Jang Digital Media Research Center,
More information(i) Understanding the basic concepts of signal modeling, correlation, maximum likelihood estimation, least squares and iterative numerical methods
Tools and Applications Chapter Intended Learning Outcomes: (i) Understanding the basic concepts of signal modeling, correlation, maximum likelihood estimation, least squares and iterative numerical methods
More informationEnsemble Empirical Mode Decomposition: An adaptive method for noise reduction
IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735. Volume 5, Issue 5 (Mar. - Apr. 213), PP 6-65 Ensemble Empirical Mode Decomposition: An adaptive
More informationLecture 5: Sinusoidal Modeling
ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 5: Sinusoidal Modeling 1. Sinusoidal Modeling 2. Sinusoidal Analysis 3. Sinusoidal Synthesis & Modification 4. Noise Residual Dan Ellis Dept. Electrical Engineering,
More informationSinging Voice Detection. Applications of Music Processing. Singing Voice Detection. Singing Voice Detection. Singing Voice Detection
Detection Lecture usic Processing Applications of usic Processing Christian Dittmar International Audio Laboratories Erlangen christian.dittmar@audiolabs-erlangen.de Important pre-requisite for: usic segmentation
More informationSELECTIVE NOISE FILTERING OF SPEECH SIGNALS USING AN ADAPTIVE NEURO-FUZZY INFERENCE SYSTEM AS A FREQUENCY PRE-CLASSIFIER
SELECTIVE NOISE FILTERING OF SPEECH SIGNALS USING AN ADAPTIVE NEURO-FUZZY INFERENCE SYSTEM AS A FREQUENCY PRE-CLASSIFIER SACHIN LAKRA 1, T. V. PRASAD 2, G. RAMAKRISHNA 3 1 Research Scholar, Computer Sc.
More informationINFLUENCE OF FREQUENCY DISTRIBUTION ON INTENSITY FLUCTUATIONS OF NOISE
INFLUENCE OF FREQUENCY DISTRIBUTION ON INTENSITY FLUCTUATIONS OF NOISE Pierre HANNA SCRIME - LaBRI Université de Bordeaux 1 F-33405 Talence Cedex, France hanna@labriu-bordeauxfr Myriam DESAINTE-CATHERINE
More information