Research on Extracting BPM Feature Values in Music Beat Tracking Algorithm
Yan Zhao*
Hainan Tropical Ocean University, Sanya, China
*Corresponding author (yanzhao16@163.com)

Abstract

With the rapid development of the Internet and network technology, people can access large amounts of online music data, such as acoustic music signals, lyrics, style or content classifications, and user playlists. Music information retrieval is an interdisciplinary research field involving musicology, psychology, academic music research, signal processing, machine learning, and related areas. Beat tracking is one of its basic problems. The process by which people spontaneously stamp or nod along with music is called beat tracking, and a computer beat tracking algorithm simulates this human perceptual process. In this paper, building on existing beat tracking research and combining basic music theory with audio signal processing techniques, a beat tracking algorithm based on the maximum-minimum distance algorithm is proposed. The music signal is transformed with the short-time Fourier transform to obtain its spectrum. In line with the perceptual properties of the human auditory system, the spectral amplitude is processed logarithmically, and after half-wave rectification the endpoint intensity curve and the phase information of its peaks are output. BPM feature values are extracted from the autocorrelation of the endpoint intensity curve.

Keywords: beat tracking algorithm; music signal; BPM eigenvalue

1. INTRODUCTION

Music is an international language that people all over the world can share and enjoy. The form of music differs across cultures, but its essence is independent of cultural factors. Music appeared at roughly the same time as language, perhaps even earlier.
Language focuses on rational communication, while music is more about expressing emotion. Music is a combination of science and art. In its connotation and emotion, music is an art form with rich meaning: the call of the heart and the sublimation of human feeling. From a scientific standpoint, however, music is simply a kind of sound. Human beings are born with the ability to understand music, and even people who know no music theory can enjoy it. With the application of signal processing methods to music signals, and with the development of computer intelligence, the scope of music signal processing has been constantly enriched, and an extremely challenging field, music information retrieval, has come into being. Music information retrieval uses the computer to simulate and realize the human auditory system's perception and understanding of music, and beat tracking is a basic part of it. Beat tracking is the detection of a "pulse", a salient periodic musical event. In music information retrieval, beat tracking is often used in chord recognition, song detection, music segmentation, transcription, and so on. In impromptu playing or singing, suitable accompaniment must be produced, and some chord recognition algorithms also take beat tracking as their basis. Musical fountains in public squares offer visual and auditory enjoyment at the same time; at large parties, dazzling lights change color and brightness with the rhythm of the music. Dancing robots that analyze the beat of received music, professional software (such as Sonic Foundry Acid), DJ consoles, and even song similarity detection all apply beat tracking algorithms.
It can be seen that music beat tracking has broad prospects for development. However, because of the complexity and diversity of music itself, making the computer's cognition match the human auditory system exactly is difficult, so the study of BPM feature values in beat tracking algorithms is of great significance.

2. RESEARCH ON THE EXTRACTION OF BPM FEATURE VALUES IN MUSIC BEAT TRACKING ALGORITHM

Beats in a music signal are usually accompanied by changes in pitch or intensity. As a result, beats hide at the energy mutation positions of the music signal waveform. As shown in Figure 1, the positions of the beat points mostly coincide with the peaks of the music signal.
Figure 1. Music signal waveform (white marks represent beat points)

For people, beat tracking refers to the behaviour of spontaneously nodding along with the melody while listening to music; for the computer, beat tracking extracts the beat by imitating human perception. Intuitively, the beat sequence can be regarded as an equally spaced perceptual sequence corresponding to the pulse sequence produced by the nodding or clapping of a listener. A beat tracking algorithm includes two aspects: the first is detection of the starting point of each musical event in the signal, i.e. endpoint detection; the second is detection of the underlying period of the signal, i.e. calculation of the music speed. Most beat tracking algorithm frameworks follow Figure 2: after the features of the music signal are extracted, the signal's period (music speed) and phase (beat point locations) are analysed; feature extraction and period calculation are the most important parts. The features can be endpoint information, chord changes, the energy envelope, spectral features, and so on, and their selection depends mainly on the period and phase algorithms used. For period estimation, autocorrelation, comb filters, histograms, and other methods are widely used. For beat sequence extraction, the peaks of the endpoint intensity curve are usually selected based on endpoint detection, and finally the positions of the specific beat points are obtained.

Figure 2. Beat tracking algorithm diagram (music signal → feature extraction → cycle calculation → phase calculation → beat sequence)

The proposed beat tracking algorithm, based on the maximum-minimum distance method, is shown in Figure 3. Its core consists of three parts: determination of the beat starting point, BPM (beats per minute) feature value extraction, and effective peak extraction. It mainly uses energy spectrum analysis, the short-time Fourier transform, periodic signal autocorrelation, maximum-minimum distance clustering, and other signal processing and pattern recognition methods, specifically as follows:

(1) Input the music signal and perform pretreatment. If the sampling frequency differs from the preset frequency, resample; convert the signal to a single channel and normalize it.
(2) Perform time-domain analysis of the music signal and determine the starting point of the beats.
(3) Perform frequency-domain analysis of the music signal, and output the endpoint intensity curve Δ(t) from the endpoint detection function.
(4) Use the endpoint intensity curve and the delay characteristics of the signal to extract the BPM feature values.
(5) According to the relation between music speed and beat, compute the peak values by clustering with the maximum-minimum distance method.
(6) Output the music signal with its beat sequence.
Figure 3. Flow chart of the beat tracking algorithm (music signal → pretreatment → beat start detection → endpoint detection to generate the endpoint intensity curve Δ(t) → BPM eigenvalue extraction → peak clustering by the maximum-minimum distance method → output of the beat values and beat points → output music signal)

2.1 Pretreatment

The Nyquist sampling theorem states that, in analog-to-digital conversion, when the sampling frequency is more than twice the highest frequency of the signal, the sampled digital signal completely preserves the information in the original signal. At present, to ensure the quality of the music signal and preserve as much of the original information as possible, most music signals are sampled at 44100 Hz. The beat information of a music signal mainly lies in the low frequencies. Therefore, before beat extraction, the music signal is resampled and the frequency is reduced to 22050 Hz. All music signals are converted to mono. Subsequently, the signal is normalized according to formula (1), mapping the amplitude to the range [-1, 1]:

y(n) = (x(n) - x_min) / (x_max - x_min) · (D_max - D_min) + D_min    (1)

In the above formula, x(n) is the input signal, x_max and x_min are its maximum and minimum, y(n) is the output signal, and D_max and D_min are the maximum and minimum of the normalized range. In the experiment we set D_max = 1 and D_min = -1. The music signal described in the following refers to the signal after normalization.

2.2 Beat starting point detection

At the beat starting point of a music signal, the energy usually changes significantly. Finding the energy mutation point is therefore a reliable basis for determining the starting point of the beat. From the starting point of the beat and a number of beat values, all the beat point positions can be obtained.
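The pretreatment of Section 2.1 and the normalization of Eq. (1) can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the function name is invented, and the linear-interpolation resampler stands in for a proper anti-aliased resampler.

```python
import numpy as np

def preprocess(x, sr, target_sr=22050, d_min=-1.0, d_max=1.0):
    """Pretreatment sketch: mix down to mono, resample to 22050 Hz,
    and normalize the amplitude to [D_min, D_max] as in Eq. (1).
    The naive linear-interp resampler is an illustrative stand-in
    for a polyphase/FIR resampler."""
    x = np.asarray(x, dtype=float)
    if x.ndim > 1:                      # stereo -> mono
        x = x.mean(axis=1)
    if sr != target_sr:                 # crude resampling (assumption)
        n_out = int(round(len(x) * target_sr / sr))
        t_old = np.linspace(0.0, 1.0, num=len(x), endpoint=False)
        t_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
        x = np.interp(t_new, t_old, x)
    x_min, x_max = x.min(), x.max()
    # Eq. (1): map [x_min, x_max] onto [D_min, D_max]
    y = (x - x_min) / (x_max - x_min) * (d_max - d_min) + d_min
    return y, target_sr
```

After this step the signal amplitude spans exactly [-1, 1], regardless of the original recording level.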
It is therefore important to determine the starting point of the beat. Since the BPM of a music signal is usually between 60 and 240, i.e. the beat interval is 0.25 s to 1 s, a beat point can be detected from a fragment of only 1 s. In this paper, all test signals are intercepted music signals. The signal is not stable during the first second, so in the experiment we select the 1 s - 2 s fragment for detection. Because of the characteristics of the music signal itself, it can be regarded as quasi-stationary over a short range of 10 to 30 ms; that is, it has short-time features. Consequently, the short-time energy method can be used to determine the starting point of the beat. The energy spectrum of the music fragment is analysed and measured. In the experiment, the frame length is set to n = 12 ms and the frame shift to m = 4 ms, giving about 66% overlap between adjacent frames. Figure 4 shows the energy spectrum of the music segment. In the curve, the most obvious mutation point is the starting point B0 of the beat.
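The short-time energy computation described above can be sketched as follows. The frame and hop lengths (12 ms and 4 ms) come from the text; the `beat_start_frame` heuristic (largest forward energy jump) is an assumed stand-in for the paper's "most obvious mutation point".

```python
import numpy as np

def short_time_energy(y, sr, frame_ms=12, hop_ms=4):
    """Short-time energy of a signal fragment, one value per frame.
    A 12 ms frame with a 4 ms hop gives ~66% overlap, as in the text."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = 1 + max(0, (len(y) - frame) // hop)
    energy = np.empty(n_frames)
    for i in range(n_frames):
        seg = y[i * hop : i * hop + frame]
        energy[i] = np.sum(seg ** 2)
    return energy

def beat_start_frame(energy):
    """Simple stand-in for locating B0: the frame with the largest
    forward jump in short-time energy (the 'mutation point')."""
    return int(np.argmax(np.diff(energy))) + 1
```

Applied to the 1 s - 2 s fragment, the returned frame index times the hop length gives the estimated time of B0 within the fragment.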
Figure 4. Time domain waveform (a) and energy spectrum (b) of the 1 s - 2 s music audio

2.3 Endpoint detection

Before the endpoint detection algorithm is introduced, the concepts related to endpoints in a music signal are defined. As shown in Figure 5, the simplest note is taken as an example: above is the note waveform, and below are the definitions of its different stages.

Ascending region: a region of rapid increase in signal energy.
Endpoint: the beginning of the ascending region, i.e. the moment when the signal energy begins to increase.
Transient area: a region of rapid change in signal energy that cannot be delimited by precise times and does not include the attenuation area.
Attenuation area: the range where the signal energy decreases gradually from its maximum.

(a) Note waveform
(b) Diagram of the different stages

Figure 5. Waveform and endpoint definitions

Endpoint detection is the detection of the starting points of all musical events in a music signal. It is the basis of deeper analysis of music information (such as beat tracking and chord recognition) and plays an important role in music signal processing. An endpoint detection function detects mutations in one or more characteristics of the music signal. It can be regarded as an intermediary between the music signal and its musical characteristics, and it generally consists of three parts: time-frequency transform, detection function generation, and peak detection. Its significance lies in determining the position of the starting point from the change of certain parameters that occur when the signal is excited (transient), which is vital for music understanding. At the same time, endpoint detection reduces the number of sampling points and hence the computational complexity. The output of the endpoint detection algorithm is a curve with a low sampling frequency whose peaks represent energy mutations. Real music signals are not as simple in structure as Figure 5 and may contain a variety of sounds. Therefore, in practical endpoint detection, it is usually necessary to process the input signal to obtain the corresponding endpoint intensity curve, and then determine the endpoints of the original music signal from the peak values, as shown in Figure 6.

Figure 6. Schematic diagram of endpoint detection

Many kinds of endpoint detection algorithms have been proposed. Taking a note as an example, playing a note is often accompanied by a sudden increase in signal energy at the endpoint, and the endpoint location can be determined from the change points contained in the signal amplitude envelope.
One difficulty of endpoint detection is that endpoints do not always follow sudden changes. For instance, with traditional string instruments there is often a weak or inconspicuous transition between notes. At present, the main idea of endpoint detection is to detect abrupt increases in the signal energy spectrum. The endpoint detection algorithm in this paper also uses the short-time Fourier transform. By comprehensively analysing the frequency spectrum of the signal together with its phase, chords, and harmony, accurate detection of the endpoints of most types of music signals can be achieved. First, the spectrum X of the music signal is obtained by the short-time Fourier transform:

X = { X(k, t) }, k = 1, 2, ..., K; t = 1, 2, ..., T    (2)

In the formula, K is the number of sampling points per frame, T is the number of frames, and X(k, t) is the k-th sampling point of the t-th frame. The choice of window length greatly affects the results of endpoint detection. If the window is too long, the peaks are not obvious; if it is too short, the amount of computation increases and the efficiency of the algorithm drops. Most algorithms with good detection performance choose 23 ms as the frame length, so this paper does as well. After the spectrum is obtained, the amplitude |X| is processed logarithmically to obtain Y = log10(1 + C·|X|), called the compressed spectrum, where the constant C equals 1000 [8]. The purpose of the compressed spectrum is to adjust the dynamic range of the music signal and enhance the resolution of weak transients, especially in the high-frequency region. At the same time, compared with linear computation, the logarithm is more in line with the mathematical relationship between physical intensity and the subjective loudness perceived by humans. A sudden increase in the amplitude of the compressed spectrum Y marks a sudden increase in signal energy.
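The STFT and logarithmic compression step can be sketched as follows. The 23 ms frame length and C = 1000 come from the text; the Hann window and 50% hop are assumptions, since the paper does not specify them.

```python
import numpy as np

def compressed_spectrum(y, sr, frame_ms=23, C=1000.0):
    """Magnitude STFT followed by logarithmic compression
    Y = log10(1 + C*|X|), per the text. Hann window and 50% hop
    are assumed; the paper only fixes the 23 ms frame length."""
    n = int(sr * frame_ms / 1000)       # frame length in samples
    hop = n // 2                        # 50% overlap (assumption)
    win = np.hanning(n)
    frames = []
    for start in range(0, len(y) - n + 1, hop):
        seg = y[start:start + n] * win
        frames.append(np.abs(np.fft.rfft(seg)))   # magnitude spectrum
    X = np.array(frames).T              # shape: (K bins, T frames)
    return np.log10(1.0 + C * X)        # compressed spectrum Y
```

The compression boosts weak, high-frequency transients relative to loud low-frequency content, which is what the subsequent derivative-based onset detection relies on.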
By calculating the discrete derivative of the compressed spectrum with half-wave rectification, we obtain the endpoint intensity curve Δ(t):

Δ(t) = Σ_{k=1}^{K} H( Y(k, t+1) - Y(k, t) )    (3)

where H is the half-wave rectifier:

H(x) = x, x ≥ 0;  H(x) = 0, x < 0    (4)

When the signal energy suddenly increases, broadband noise appears in the spectrum. This kind of noise is difficult to detect in the low-frequency part of the signal, yet the beat information is stored mainly in the low-frequency components of the music signal.

2.4 Extracting the BPM eigenvalue

The autocorrelation function is an average measure of a signal in the time domain, used to describe the dependence between the values of the signal at one time and another. Its mathematical expression is:

R(k) = Σ_m x(m) · x(m + k)    (5)

By analysing the expression, we find that the essence of the autocorrelation function is an average of the signal x(m) against its time-shifted version x(m + k). The autocorrelation function mainly characterizes the self-similarity and periodicity of a signal and has the following properties: (1) if x(m) is a periodic signal, its autocorrelation function is also periodic, with the same period as x(m); (2) the autocorrelation function is even, i.e. R(k) = R(-k); (3) at k = 0 the autocorrelation function attains its maximum; for a deterministic signal this value is the signal energy, and for a random signal it is the average power. Autocorrelation exists in any regular periodic structure. Music, as a highly structured form of expression, exhibits periodicity mainly in its rhythmic structure. By calculating the autocorrelation function of the music signal, we can determine the periodic characteristics of a non-strictly-periodic signal. The continuity of the beat is reflected in the average speed of the music, whose unit is BPM (beats per minute).
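The endpoint intensity curve of Eqs. (3)-(4) reduces to a few lines of NumPy: a frame-to-frame difference along time, half-wave rectified so that only energy increases contribute, then summed over frequency bins. The function name is chosen for illustration.

```python
import numpy as np

def endpoint_intensity(Y):
    """Endpoint intensity curve Delta(t) per Eqs. (3)-(4): sum over
    frequency bins of the half-wave rectified frame-to-frame difference
    of the compressed spectrum Y (shape: bins x frames)."""
    diff = Y[:, 1:] - Y[:, :-1]          # discrete derivative along time
    hwr = np.maximum(diff, 0.0)          # H(x): keep only increases
    return hwr.sum(axis=0)               # Delta(t), length T-1
```

Peaks of Δ(t) mark candidate endpoints; its autocorrelation is what the BPM extraction in Section 2.4 operates on.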
The average speed of the music, i.e. the BPM eigenvalue, is extracted using the endpoint intensity curve and its delay characteristics.
The music signal is a non-stationary signal, so the short-time autocorrelation function must be used for its autocorrelation processing. The block diagram is shown in Figure 7, and the mathematical expressions can be written as:

R_n(k) = Σ_m x(m) · x(m + k) · h_k(n - m),  h_k(n) = w(n) · w(n + k)    (6)

In (6), w(n) is the window function and n indicates the point at which the window is applied.

Figure 7. Block diagram of the short-time autocorrelation function

According to the auditory characteristics of human ears, melodies with BPM = 120 are the most readily accepted or preferred. Based on this characteristic, this paper uses a perceptual weighting window to filter the original autocorrelation curve, suppressing peaks far from this preferred value and selecting peaks more in line with the human auditory system. The tempo period strength is computed as follows:

TPS(τ) = W(τ) · Σ_t Δ(t) · Δ(t - τ)    (7)

In (7), W(τ) is a Gaussian weighting function:

W(τ) = exp( -(1/2) · ( log2(τ/τ0) / σ )^2 )    (8)

Here τ is the period variable, τ0 is the center of the rhythmic period deviation (corresponding to BPM = 120), and σ determines the width of the weighting curve; in the experiment σ is set to 0.9. The τ that maximizes TPS is the unit period. Because of differing perceptions of rhythm, people perceive the same melody as having a fast or a slow rhythm. According to the prosodic structure of the music segment, the fast rhythm is generally 2 or 3 times the slow one.
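Equations (7)-(8) can be sketched as a weighted autocorrelation search over candidate lags. This is an illustrative simplification, not the paper's code: the frame rate `fps`, the mean removal, and the restriction of the search to the paper's 60-240 BPM range are assumptions; σ = 0.9 and the 120 BPM center follow the text.

```python
import numpy as np

def tempo_bpm(delta, fps, bpm_center=120.0, sigma=0.9):
    """BPM eigenvalue via the weighted autocorrelation of the endpoint
    intensity curve: TPS(tau) = W(tau) * sum_t Delta(t)*Delta(t-tau),
    with a log-Gaussian weight W (Eq. 8) centered on the preferred
    tempo (120 BPM). fps is the frame rate of delta; the 60-240 BPM
    search range follows the paper."""
    delta = np.asarray(delta, dtype=float)
    delta = delta - delta.mean()
    lag_min = int(fps * 60.0 / 240.0)    # shortest period (240 BPM)
    lag_max = int(fps * 60.0 / 60.0)     # longest period (60 BPM)
    tau0 = fps * 60.0 / bpm_center       # lag of the preferred tempo
    best_tau, best_tps = lag_min, -np.inf
    for tau in range(lag_min, min(lag_max, len(delta) - 1) + 1):
        r = np.dot(delta[tau:], delta[:-tau])                 # Eq. (7) sum
        w = np.exp(-0.5 * (np.log2(tau / tau0) / sigma) ** 2)  # Eq. (8)
        if w * r > best_tps:
            best_tps, best_tau = w * r, tau
    return 60.0 * fps / best_tau         # lag -> beats per minute
```

A pulse train with one onset every 0.5 s, for example, yields the expected 120 BPM, and the Gaussian weight breaks ties against the half- and double-tempo lags.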
Considering this phenomenon, we select multiples {0.33, 0.5, 2, 3} of the unit period to improve the music speed estimate, as shown in (9):

TPS2(τ) = TPS(τ) + 0.5·TPS(2τ) + 0.25·TPS(2τ - 1) + 0.25·TPS(2τ + 1)
TPS3(τ) = TPS(τ) + 0.33·TPS(3τ) + 0.33·TPS(3τ - 1) + 0.33·TPS(3τ + 1)    (9)

Taking the double and triple speeds of the rhythm into account in the above formula, we use 1/2 or 1/3 of the rhythm as the adjacent measurement standard and compute the relative peaks of the two estimates to obtain their relative weights. Because this algorithm intends to simulate the perceptual process of the human auditory system rather than to conduct music theory research, only the double and triple speed cases are considered; this assumption covers most music genres. The τ yielding the maximum of TPS2 or TPS3 gives the desired music speed, the BPM eigenvalue.

3. CONCLUSION

In this paper, we study the BPM feature in the music beat tracking algorithm. First, by analysing the energy spectrum of the 1 s - 2 s segment of the music signal, we determine the starting point of the beats. Secondly, through spectrum analysis and processing of the music signal, we obtain the endpoint intensity curve and the phase information of its peaks. Then, according to the autocorrelation characteristics of the endpoint intensity curve and the general rules of musical rhythm, we extract the BPM feature values. The beat tracking algorithm put forward in this paper is suitable for almost any genre or form of music, and it has certain advantages in both overall accuracy and continuity correctness.

References

Itohara, T., Otsuka, T., Mizumoto, T., Lim, A., Ogata, T., & Okuno, H. G. (2012). A multimodal tempo and beat-tracking system based on audiovisual information from live guitar performances. EURASIP Journal on Audio, Speech, and Music Processing, 2012(1), 6.

Ludick, D. J., Tonder, J. V., & Jakobus, U. (2014). A hybrid tracking algorithm for characteristic mode analysis. In International Conference on Electromagnetics in Advanced Applications.

Burger, M., Markowich, P. A., & Pietschmann, J.-F. (2014). Continuous limit of a crowd motion and herding model: analysis and numerical simulations. Kinetic & Related Models, 4(4).

Li, H., & Wei, Y. (2012). Classification and rigidity of self-shrinkers in the mean curvature flow. Journal of the Mathematical Society of Japan, 66(3).

Ohkita, M., Bando, Y., Nakamura, E., Itoyama, K., & Yoshii, K. (2017). Audio-visual beat tracking based on a state-space model for a robot dancer performing with a human dancer. Journal of Robotics and Mechatronics, 29(1), 125.

Krebs, F., Böck, S., Dorfer, M., & Widmer, G. (2016). Downbeat tracking using beat-synchronous features and recurrent neural networks. In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), New York, NY, USA.

Srinivasamurthy, A., Holzapfel, A., Cemgil, A. T., & Serra, X. (2016). A generalized Bayesian model for tracking long metrical cycles in acoustic music signals. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Shafiee, M., Feghhi, S. A. H., & Rahighi, J. (2016). Analysis of de-noising methods to improve the precision of the ILSF BPM electronic readout system. Journal of Instrumentation, 11(12).
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume - 3 Issue - 8 August, 2014 Page No. 7727-7732 Performance Analysis of MFCC and LPCC Techniques in Automatic
More informationSignal processing preliminaries
Signal processing preliminaries ISMIR Graduate School, October 4th-9th, 2004 Contents: Digital audio signals Fourier transform Spectrum estimation Filters Signal Proc. 2 1 Digital signals Advantages of
More informationHow to Use the Method of Multivariate Statistical Analysis Into the Equipment State Monitoring. Chunhua Yang
4th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering (ICMMCCE 205) How to Use the Method of Multivariate Statistical Analysis Into the Equipment State Monitoring
More informationTE 302 DISCRETE SIGNALS AND SYSTEMS. Chapter 1: INTRODUCTION
TE 302 DISCRETE SIGNALS AND SYSTEMS Study on the behavior and processing of information bearing functions as they are currently used in human communication and the systems involved. Chapter 1: INTRODUCTION
More informationCOM325 Computer Speech and Hearing
COM325 Computer Speech and Hearing Part III : Theories and Models of Pitch Perception Dr. Guy Brown Room 145 Regent Court Department of Computer Science University of Sheffield Email: g.brown@dcs.shef.ac.uk
More informationIntroduction to cochlear implants Philipos C. Loizou Figure Captions
http://www.utdallas.edu/~loizou/cimplants/tutorial/ Introduction to cochlear implants Philipos C. Loizou Figure Captions Figure 1. The top panel shows the time waveform of a 30-msec segment of the vowel
More informationDifferent Approaches of Spectral Subtraction Method for Speech Enhancement
ISSN 2249 5460 Available online at www.internationalejournals.com International ejournals International Journal of Mathematical Sciences, Technology and Humanities 95 (2013 1056 1062 Different Approaches
More informationDEEP LEARNING BASED AUTOMATIC VOLUME CONTROL AND LIMITER SYSTEM. Jun Yang (IEEE Senior Member), Philip Hilmes, Brian Adair, David W.
DEEP LEARNING BASED AUTOMATIC VOLUME CONTROL AND LIMITER SYSTEM Jun Yang (IEEE Senior Member), Philip Hilmes, Brian Adair, David W. Krueger Amazon Lab126, Sunnyvale, CA 94089, USA Email: {junyang, philmes,
More informationSound Recognition. ~ CSE 352 Team 3 ~ Jason Park Evan Glover. Kevin Lui Aman Rawat. Prof. Anita Wasilewska
Sound Recognition ~ CSE 352 Team 3 ~ Jason Park Evan Glover Kevin Lui Aman Rawat Prof. Anita Wasilewska What is Sound? Sound is a vibration that propagates as a typically audible mechanical wave of pressure
More informationMUSICAL GENRE CLASSIFICATION OF AUDIO DATA USING SOURCE SEPARATION TECHNIQUES. P.S. Lampropoulou, A.S. Lampropoulos and G.A.
MUSICAL GENRE CLASSIFICATION OF AUDIO DATA USING SOURCE SEPARATION TECHNIQUES P.S. Lampropoulou, A.S. Lampropoulos and G.A. Tsihrintzis Department of Informatics, University of Piraeus 80 Karaoli & Dimitriou
More informationSound is the human ear s perceived effect of pressure changes in the ambient air. Sound can be modeled as a function of time.
2. Physical sound 2.1 What is sound? Sound is the human ear s perceived effect of pressure changes in the ambient air. Sound can be modeled as a function of time. Figure 2.1: A 0.56-second audio clip of
More informationSo far, you ve learned a strumming pattern with all quarter notes and then one with all eighth notes. Now, it s time to mix the two.
So far, you ve learned a strumming pattern with all quarter notes and then one with all eighth notes. Now, it s time to mix the two. In this lesson, you re going to learn: a versatile strumming pattern
More informationA Correlation-Maximization Denoising Filter Used as An Enhancement Frontend for Noise Robust Bird Call Classification
A Correlation-Maximization Denoising Filter Used as An Enhancement Frontend for Noise Robust Bird Call Classification Wei Chu and Abeer Alwan Speech Processing and Auditory Perception Laboratory Department
More informationCHORD DETECTION USING CHROMAGRAM OPTIMIZED BY EXTRACTING ADDITIONAL FEATURES
CHORD DETECTION USING CHROMAGRAM OPTIMIZED BY EXTRACTING ADDITIONAL FEATURES Jean-Baptiste Rolland Steinberg Media Technologies GmbH jb.rolland@steinberg.de ABSTRACT This paper presents some concepts regarding
More informationREAL-TIME BROADBAND NOISE REDUCTION
REAL-TIME BROADBAND NOISE REDUCTION Robert Hoeldrich and Markus Lorber Institute of Electronic Music Graz Jakoministrasse 3-5, A-8010 Graz, Austria email: robert.hoeldrich@mhsg.ac.at Abstract A real-time
More informationCepstrum alanysis of speech signals
Cepstrum alanysis of speech signals ELEC-E5520 Speech and language processing methods Spring 2016 Mikko Kurimo 1 /48 Contents Literature and other material Idea and history of cepstrum Cepstrum and LP
More informationEE482: Digital Signal Processing Applications
Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 12 Speech Signal Processing 14/03/25 http://www.ee.unlv.edu/~b1morris/ee482/
More informationINFLUENCE OF FREQUENCY DISTRIBUTION ON INTENSITY FLUCTUATIONS OF NOISE
INFLUENCE OF FREQUENCY DISTRIBUTION ON INTENSITY FLUCTUATIONS OF NOISE Pierre HANNA SCRIME - LaBRI Université de Bordeaux 1 F-33405 Talence Cedex, France hanna@labriu-bordeauxfr Myriam DESAINTE-CATHERINE
More informationCOMP 546, Winter 2017 lecture 20 - sound 2
Today we will examine two types of sounds that are of great interest: music and speech. We will see how a frequency domain analysis is fundamental to both. Musical sounds Let s begin by briefly considering
More informationCHAPTER 2 FIR ARCHITECTURE FOR THE FILTER BANK OF SPEECH PROCESSOR
22 CHAPTER 2 FIR ARCHITECTURE FOR THE FILTER BANK OF SPEECH PROCESSOR 2.1 INTRODUCTION A CI is a device that can provide a sense of sound to people who are deaf or profoundly hearing-impaired. Filters
More informationAUTOMATED MUSIC TRACK GENERATION
AUTOMATED MUSIC TRACK GENERATION LOUIS EUGENE Stanford University leugene@stanford.edu GUILLAUME ROSTAING Stanford University rostaing@stanford.edu Abstract: This paper aims at presenting our method to
More informationSPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT
SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT RASHMI MAKHIJANI Department of CSE, G. H. R.C.E., Near CRPF Campus,Hingna Road, Nagpur, Maharashtra, India rashmi.makhijani2002@gmail.com
More informationPitch Detection Algorithms
OpenStax-CNX module: m11714 1 Pitch Detection Algorithms Gareth Middleton This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 1.0 Abstract Two algorithms to
More informationSpeech and Music Discrimination based on Signal Modulation Spectrum.
Speech and Music Discrimination based on Signal Modulation Spectrum. Pavel Balabko June 24, 1999 1 Introduction. This work is devoted to the problem of automatic speech and music discrimination. As we
More informationTranscription of Piano Music
Transcription of Piano Music Rudolf BRISUDA Slovak University of Technology in Bratislava Faculty of Informatics and Information Technologies Ilkovičova 2, 842 16 Bratislava, Slovakia xbrisuda@is.stuba.sk
More informationEnergy-Weighted Multi-Band Novelty Functions for Onset Detection in Piano Music
Energy-Weighted Multi-Band Novelty Functions for Onset Detection in Piano Music Krishna Subramani, Srivatsan Sridhar, Rohit M A, Preeti Rao Department of Electrical Engineering Indian Institute of Technology
More informationAdvanced Functions of Java-DSP for use in Electrical and Computer Engineering Senior Level Courses
Advanced Functions of Java-DSP for use in Electrical and Computer Engineering Senior Level Courses Andreas Spanias Robert Santucci Tushar Gupta Mohit Shah Karthikeyan Ramamurthy Topics This presentation
More informationEC 6501 DIGITAL COMMUNICATION UNIT - II PART A
EC 6501 DIGITAL COMMUNICATION 1.What is the need of prediction filtering? UNIT - II PART A [N/D-16] Prediction filtering is used mostly in audio signal processing and speech processing for representing
More informationJOURNAL OF OBJECT TECHNOLOGY
JOURNAL OF OBJECT TECHNOLOGY Online at http://www.jot.fm. Published by ETH Zurich, Chair of Software Engineering JOT, 2009 Vol. 9, No. 1, January-February 2010 The Discrete Fourier Transform, Part 5: Spectrogram
More informationAuditory modelling for speech processing in the perceptual domain
ANZIAM J. 45 (E) ppc964 C980, 2004 C964 Auditory modelling for speech processing in the perceptual domain L. Lin E. Ambikairajah W. H. Holmes (Received 8 August 2003; revised 28 January 2004) Abstract
More informationNon-stationary Analysis/Synthesis using Spectrum Peak Shape Distortion, Phase and Reassignment
Non-stationary Analysis/Synthesis using Spectrum Peak Shape Distortion, Phase Reassignment Geoffroy Peeters, Xavier Rodet Ircam - Centre Georges-Pompidou, Analysis/Synthesis Team, 1, pl. Igor Stravinsky,
More informationDrum Transcription Based on Independent Subspace Analysis
Report for EE 391 Special Studies and Reports for Electrical Engineering Drum Transcription Based on Independent Subspace Analysis Yinyi Guo Center for Computer Research in Music and Acoustics, Stanford,
More informationEstimation of Reverberation Time from Binaural Signals Without Using Controlled Excitation
Estimation of Reverberation Time from Binaural Signals Without Using Controlled Excitation Sampo Vesa Master s Thesis presentation on 22nd of September, 24 21st September 24 HUT / Laboratory of Acoustics
More informationResearch Article Autocorrelation Analysis in Time and Frequency Domains for Passive Structural Diagnostics
Advances in Acoustics and Vibration Volume 23, Article ID 24878, 8 pages http://dx.doi.org/.55/23/24878 Research Article Autocorrelation Analysis in Time and Frequency Domains for Passive Structural Diagnostics
More informationGet Rhythm. Semesterthesis. Roland Wirz. Distributed Computing Group Computer Engineering and Networks Laboratory ETH Zürich
Distributed Computing Get Rhythm Semesterthesis Roland Wirz wirzro@ethz.ch Distributed Computing Group Computer Engineering and Networks Laboratory ETH Zürich Supervisors: Philipp Brandes, Pascal Bissig
More informationHow to Strum Rhythms on Guitar. How to Strum Rhythms on Guitar
How to Strum Rhythms on Guitar How to Strum Rhythms on Guitar Learning to strum rhythms on guitar is one of the most important foundations you can build as a beginner guitarist This lesson is an extract
More informationSignal segmentation and waveform characterization. Biosignal processing, S Autumn 2012
Signal segmentation and waveform characterization Biosignal processing, 5173S Autumn 01 Short-time analysis of signals Signal statistics may vary in time: nonstationary how to compute signal characterizations?
More informationSolution to Harmonics Interference on Track Circuit Based on ZFFT Algorithm with Multiple Modulation
Solution to Harmonics Interference on Track Circuit Based on ZFFT Algorithm with Multiple Modulation Xiaochun Wu, Guanggang Ji Lanzhou Jiaotong University China lajt283239@163.com 425252655@qq.com ABSTRACT:
More informationEnhanced Waveform Interpolative Coding at 4 kbps
Enhanced Waveform Interpolative Coding at 4 kbps Oded Gottesman, and Allen Gersho Signal Compression Lab. University of California, Santa Barbara E-mail: [oded, gersho]@scl.ece.ucsb.edu Signal Compression
More informationSGN Audio and Speech Processing
SGN 14006 Audio and Speech Processing Introduction 1 Course goals Introduction 2! Learn basics of audio signal processing Basic operations and their underlying ideas and principles Give basic skills although
More informationTHE HUMANISATION OF STOCHASTIC PROCESSES FOR THE MODELLING OF F0 DRIFT IN SINGING
THE HUMANISATION OF STOCHASTIC PROCESSES FOR THE MODELLING OF F0 DRIFT IN SINGING Ryan Stables [1], Dr. Jamie Bullock [2], Dr. Cham Athwal [3] [1] Institute of Digital Experience, Birmingham City University,
More informationOnset Detection Revisited
simon.dixon@ofai.at Austrian Research Institute for Artificial Intelligence Vienna, Austria 9th International Conference on Digital Audio Effects Outline Background and Motivation 1 Background and Motivation
More informationA MULTI-MODEL APPROACH TO BEAT TRACKING CONSIDERING HETEROGENEOUS MUSIC STYLES
A MULTI-MODEL APPROACH TO BEAT TRACKING CONSIDERING HETEROGENEOUS MUSIC STYLES Sebastian Böck, Florian Krebs and Gerhard Widmer Department of Computational Perception Johannes Kepler University, Linz,
More informationGammatone Cepstral Coefficient for Speaker Identification
Gammatone Cepstral Coefficient for Speaker Identification Rahana Fathima 1, Raseena P E 2 M. Tech Student, Ilahia college of Engineering and Technology, Muvattupuzha, Kerala, India 1 Asst. Professor, Ilahia
More informationTesting of Objective Audio Quality Assessment Models on Archive Recordings Artifacts
POSTER 25, PRAGUE MAY 4 Testing of Objective Audio Quality Assessment Models on Archive Recordings Artifacts Bc. Martin Zalabák Department of Radioelectronics, Czech Technical University in Prague, Technická
More informationProject 0: Part 2 A second hands-on lab on Speech Processing Frequency-domain processing
Project : Part 2 A second hands-on lab on Speech Processing Frequency-domain processing February 24, 217 During this lab, you will have a first contact on frequency domain analysis of speech signals. You
More informationEEE 309 Communication Theory
EEE 309 Communication Theory Semester: January 2016 Dr. Md. Farhad Hossain Associate Professor Department of EEE, BUET Email: mfarhadhossain@eee.buet.ac.bd Office: ECE 331, ECE Building Part 05 Pulse Code
More informationUniversity of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005
University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 Lecture 5 Slides Jan 26 th, 2005 Outline of Today s Lecture Announcements Filter-bank analysis
More informationSOUND QUALITY EVALUATION OF FAN NOISE BASED ON HEARING-RELATED PARAMETERS SUMMARY INTRODUCTION
SOUND QUALITY EVALUATION OF FAN NOISE BASED ON HEARING-RELATED PARAMETERS Roland SOTTEK, Klaus GENUIT HEAD acoustics GmbH, Ebertstr. 30a 52134 Herzogenrath, GERMANY SUMMARY Sound quality evaluation of
More informationA Parametric Model for Spectral Sound Synthesis of Musical Sounds
A Parametric Model for Spectral Sound Synthesis of Musical Sounds Cornelia Kreutzer University of Limerick ECE Department Limerick, Ireland cornelia.kreutzer@ul.ie Jacqueline Walker University of Limerick
More informationDigitally controlled Active Noise Reduction with integrated Speech Communication
Digitally controlled Active Noise Reduction with integrated Speech Communication Herman J.M. Steeneken and Jan Verhave TNO Human Factors, Soesterberg, The Netherlands herman@steeneken.com ABSTRACT Active
More informationME scope Application Note 01 The FFT, Leakage, and Windowing
INTRODUCTION ME scope Application Note 01 The FFT, Leakage, and Windowing NOTE: The steps in this Application Note can be duplicated using any Package that includes the VES-3600 Advanced Signal Processing
More informationA sound wave is introduced into a medium by the vibration of an object. Sound is a longitudinal, mechanical
Sound Waves Dancing Liquids A sound wave is introduced into a medium by the vibration of an object. Sound is a longitudinal, mechanical wave. For example, a guitar string forces surrounding air molecules
More informationSpeech/Music Discrimination via Energy Density Analysis
Speech/Music Discrimination via Energy Density Analysis Stanis law Kacprzak and Mariusz Zió lko Department of Electronics, AGH University of Science and Technology al. Mickiewicza 30, Kraków, Poland {skacprza,
More informationFourier Methods of Spectral Estimation
Department of Electrical Engineering IIT Madras Outline Definition of Power Spectrum Deterministic signal example Power Spectrum of a Random Process The Periodogram Estimator The Averaged Periodogram Blackman-Tukey
More informationApplication of Fourier Transform in Signal Processing
1 Application of Fourier Transform in Signal Processing Lina Sun,Derong You,Daoyun Qi Information Engineering College, Yantai University of Technology, Shandong, China Abstract: Fourier transform is a
More informationA mechanical wave is a disturbance which propagates through a medium with little or no net displacement of the particles of the medium.
Waves and Sound Mechanical Wave A mechanical wave is a disturbance which propagates through a medium with little or no net displacement of the particles of the medium. Water Waves Wave Pulse People Wave
More informationKONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM
KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM Shruthi S Prabhu 1, Nayana C G 2, Ashwini B N 3, Dr. Parameshachari B D 4 Assistant Professor, Department of Telecommunication Engineering, GSSSIETW,
More informationhttp://www.diva-portal.org This is the published version of a paper presented at 17th International Society for Music Information Retrieval Conference (ISMIR 2016); New York City, USA, 7-11 August, 2016..
More informationSpeech Signal Analysis
Speech Signal Analysis Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 2&3 14,18 January 216 ASR Lectures 2&3 Speech Signal Analysis 1 Overview Speech Signal Analysis for
More information