Automatic Guitar Chord Recognition


Registration number
Automatic Guitar Chord Recognition
Supervised by Professor Stephen Cox
University of East Anglia
Faculty of Science, School of Computing Sciences

Abstract

Chord recognition is a well-explored area of music information retrieval, but guitar chord recognition remains relatively unexplored. The purpose of this project was to develop a system that could recognise chords played in real time on a guitar, and to use this as the basis of a software package that could provide musicians with a teaching or transcription tool. Initial research was carried out on single note and two note chord recognition, which achieved accuracies of 74% and 90% respectively. The final system, for recognising chords containing an arbitrary number of notes, used an implementation of Pitch Class Profiles. A comparison was made between a learnt Bayesian approach and a non-learnt Nearest Neighbour approach. For common chords, the learnt approach produced an accuracy of 99%, compared to 94% for the non-learnt approach. When extended to complex chords, the learnt approach produced an accuracy of 84%, against 80% for the non-learnt approach. A real-time demonstration was then produced in MATLAB.

Acknowledgements

I would like to thank Stephen Cox for his support and guidance throughout this project. Thank you to Mum and Hattie for proof-reading despite not understanding, and a further thank you to Hattie for helping me through the tough stages of work.

Contents

1. Introduction
   Background and Motivations
   Aims
   Objectives
2. Literature Review
   Motivations
   Background Music Theory
      General Music Theory
      Guitar Characteristics
   Approaches
      Early Work
      Pitch Class Profiles
      Template Approaches
      Probabilistic Approaches
3. Technical Background
   Fourier Transform
      Spectral Leakage
      Frequency Resolution
      Fast Fourier Transform
   Distance Measures
      Euclidean
      Hamming
      Cosine
   Naive Bayes
      Multinomial Model
      Multivariate Bernoulli Model
   Nearest Neighbour
   Musical Terms

4. Development of the System
   Data Collection
   Segmenting
   Obtaining the Frequency Spectrum
   Peak Picking
   Single Note Recognition
   Two Note Chords
      Revised Peak Picker
      Feature Extraction
      Classification
   Pitch Class Profiles
      Feature Extraction
      Template Classification
      Probabilistic Classification
5. Results
   Single Note Recognition
   Two Note Recognition
   Pitch Class Profiles
      Nearest Neighbour
      Naive Bayes
   Complex Four and Five Note Chords
   Chords on Acoustic Guitar
6. Development of a Real-Time Application
   System Structure
   User Interface
7. Discussion and Future Work
   Discussion
   Future Work
8. Summary

References
A. Single Note and Two Note Recognition Confusion Matrices
B. Nearest Neighbour Confusion Matrix
C. Naive Bayes Confusion Matrix
D. Nearest Neighbour and Naive Bayes for Acoustic Chords

1. Introduction

1.1. Background and Motivations

Current research into chord recognition from audio input focuses on extracting a sequence of chords from an audio file stored on a computer, often a piece of popular music. There is currently very little research on extracting the identity of individual chords, despite the potential of this to aid musicians. Beginner guitarists often learn by simply following guitar tablature, and are unaware of the characteristics of the chords they are playing and how these fit in with the music theory behind the instrument. It is important for a musician to understand the chord they are playing, and a tool to aid recognition could be very beneficial. This project aims to develop a system that focuses on extracting single chords in real time from an audio signal played by a guitar. The research will compare the effectiveness of template approaches with probabilistic approaches, before applying them to a real-time system.

1.2. Aims

The aim of this project is to develop a system that can label chords played in real time on an (electric) guitar. Through the use of signal processing techniques, a system will be developed that can label chords containing any number of notes in real time. The system will be extensively tested across a wide range of chord shapes, as well as on both electric and acoustic guitars. A real-time demonstration will then be developed which could serve as a tool for beginner musicians, or even be extended into a transcription tool.

1.3. Objectives

At the completion of the project, it was hoped that the following key objectives would be met:

Explore and evaluate the effectiveness of various template and probabilistic approaches.

Develop a system capable of detecting common guitar chords using both a template and a probabilistic approach, with an accuracy of over 90%.

Extend the system to recognise more complex four and five note chords.

2. Literature Review

2.1. Motivations

Automatic guitar chord recognition can be regarded as an aspect of music information retrieval (MIR). In order to begin this project, it is important to be aware of the research which has previously been conducted in the field of MIR. Despite much research into the structure of music over the last century, significant strides in MIR have only been made recently. Because music is now commonly stored as files on computers connected by the internet, MIR has become both more achievable and more useful. One particularly interesting area of music processing is chord recognition. Research in this area has focused on the labelling of continuous chords in music recordings using template and probabilistic approaches. As the core research in this project involves a comparison of the effectiveness of template and probabilistic approaches, it is important to investigate the research that has been carried out on these techniques. Therefore this literature review focuses on some template and probabilistic techniques, as well as other areas of knowledge required for the project, including background music theory.

2.2. Background Music Theory

General Music Theory

To be able to implement a guitar chord recogniser, some background knowledge of music theory is required. Firstly, a chord is formed from three or more notes, with no maximum (although this is limited by the number of strings or fingers available). The most basic form of chord is a triad, which, as its name suggests, is made up of three notes. The first note of this chord is called the root, and another two notes are added above it.

In a triad, the added notes are always the third and the fifth. For instance, in a C Major triad, the base note is C and the added notes are E and G. Chords can be classified as Major, Minor, Augmented or Diminished (Taylor, 1989). Chords can be extended further through inversions. A chord's inversion describes the relationship between its bass and the other tones in the chord. In an inverted chord, the root is not the bass. Inversions are numbered in the order their bass tones would appear in a closed root position chord. For example, in the first inversion of a C Major triad, the bass is an E, the 3rd of the triad, with the 5th and root stacked above it. In the second inversion of a C Major, the bass is now G, the 5th of the triad, with the root and 3rd above it. Triads can also be altered to be either diminished or augmented. A minor triad becomes diminished by lowering the 5th scale degree by half a step; for a Major triad to become diminished, both the 3rd and 5th scale degrees are lowered by half a step. To augment a Major chord, the 5th note is raised by a semitone (Lerdahl and Jackendoff, 1985).

Guitar Characteristics

The guitar as an instrument has many specific characteristics in the sound it produces. The fundamental frequencies of notes have overtones, which are frequency components at higher multiples of the fundamental frequency. The method of picking the string will either exaggerate or diminish the strength of these overtones, which helps to produce the guitar's unique timbre. The guitar I am using for this project is an electric guitar, so the sound is detected through magnetic coils inside pickups. As a string is strummed, it vibrates and disturbs the magnetic field around it, which is detected by the pickups. The vibrating length of the string determines the pitch of the note being played, which explains why the further down the fretboard you play, the higher the pitch of the note. This sound is then further amplified through an amplifier. Some recordings were also made on a hollow-body acoustic guitar. The tone of an acoustic guitar is heavily reliant on its shape and material, and the vibration of the strings is amplified by the wooden body (Alexander, 2014).

2.3. Approaches

Early Work

Traditionally, the task of chord recognition was treated as polyphonic transcription to identify individual notes. This approach was fundamentally flawed, as it suffered from recognition errors caused by noise and by overlapping harmonics in the spectrum of the input signal. The first alternative approach was proposed by Leman, who developed the Simple Auditory Model (SAM). This was the first approach to use an intensity map of the twelve semitone pitch classes, calculated from a spectrum, and it therefore achieved more robustness than the note names used previously (Chafe, 1986).

Pitch Class Profiles

Pitch Class Profiles (PCPs), proposed by Fujishima (1999), used SAM as a framework to produce the first robust method of chord detection. PCPs, also referred to as chroma vectors, are twelve-dimensional vectors that represent the intensities of the twelve members of the chromatic scale without regard to their octave (Fujishima, 1999). Fujishima's algorithm works by first taking the Discrete Fourier Transform (DFT) of a fragment of the input signal, then mapping its values onto spectral bins corresponding to the twelve semitones of the chromatic scale. A vector of twelve intensities is produced for each frame; these are then summed to produce the PCP of the whole note. Chords are then classified using either the Nearest Neighbour method or the weighted sum method. Fujishima also applied several heuristics to the PCP vectors, including smoothing over past PCPs using an averaging operation to reduce noise. He found that smoothing did reduce noise; however, it created over-smoothness, blurring the chord change points. He also applied chord change sensing by monitoring the change in direction of the vector, and this helped to preserve the chord change points. When using synthesised sounds, Fujishima's algorithm produced very high accuracy using both Nearest Neighbour and weighted sum pattern matching. However, when used with real musical recordings, the accuracy was too low for the results to be meaningful (Fujishima, 1999). Due to these shortcomings, extensions to PCPs have been proposed.
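As an illustration of the idea, a Pitch Class Profile in the spirit of Fujishima's chroma vectors might be computed as in the following minimal Python/NumPy sketch. This is not the project's MATLAB implementation; the reference frequency, the use of squared magnitudes as intensities, and the normalisation are illustrative choices.

```python
import numpy as np

def pitch_class_profile(signal, sample_rate, f_ref=261.63):
    """Map spectral energy onto the twelve pitch classes (a chroma vector).

    f_ref is the reference frequency for pitch class 0 (here C4) --
    an illustrative choice, not necessarily Fujishima's.
    """
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    pcp = np.zeros(12)
    for f, mag in zip(freqs[1:], spectrum[1:]):  # skip the DC bin
        # Distance in semitones from the reference, folded onto 12 classes.
        pitch_class = int(round(12 * np.log2(f / f_ref))) % 12
        pcp[pitch_class] += mag ** 2  # accumulate intensity per semitone
    return pcp / (np.linalg.norm(pcp) or 1.0)  # normalise to unit length
```

For a pure 440 Hz tone (the note A), the largest component of the resulting vector falls in pitch class 9, nine semitones above C.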

A chromagram is a collection of PCPs over time. Despite the popularity of chromagrams, they are not without their flaws. Chroma extraction algorithms often represent the chroma vector using binary values at each bin. This approach does not work for real-world recordings, as acoustic instruments produce overtones as well as fundamental notes. This means that regardless of whether the fundamental notes are extracted correctly, there will be non-zero intensities at all twelve points on the chroma scale. These noisy chroma vectors can cause problems later in the recognition process (Lee, 2006). Lee (2006) proposed an extension to PCPs called the Enhanced Pitch Class Profile (EPCP). EPCPs aim to enhance traditional chroma vectors by making them more similar to their binary form, like the templates used in pattern matching. EPCPs are calculated from the Harmonic Product Spectrum (HPS), which is derived from the DFT of the input signal. The guitar produces a sound that has harmonics at integer multiples of its fundamental frequency, so decimating the original magnitude spectrum by an integer factor will also yield a peak at the fundamental frequency. The HPS is calculated by multiplying these decimated copies of the magnitude spectrum together, and the peak in the HPS lies at the fundamental frequency. The chroma vectors are then calculated from the HPS instead of the DFT (Lee, 2006). Cremer (2004) proposed an alternative, which is to derive the chromas from a frequency-warped Fast Fourier Transform (FFT), followed by the erasing of overtones and the separation of tonal components from transients. Gómez (2006) took a different approach, finding the peaks in the spectrum using local maxima and estimating the peak magnitudes by quadratic interpolation. The chroma vector can then be calculated by weighting each peak by its contribution to each chroma vector bin (Stark and Plumbley, 2009).
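The core HPS step that the EPCP rests on might be sketched as follows (illustrative Python/NumPy; the number of harmonics to include is an assumed parameter, not a value taken from Lee's paper):

```python
import numpy as np

def harmonic_product_spectrum(signal, num_harmonics=4):
    """Harmonic Product Spectrum: multiply the magnitude spectrum by
    copies of itself decimated by 2, 3, ..., num_harmonics.

    Harmonics at integer multiples of the fundamental line up after
    decimation, so the product peaks at the fundamental frequency bin.
    """
    spectrum = np.abs(np.fft.rfft(signal))
    hps = spectrum.copy()
    for h in range(2, num_harmonics + 1):
        decimated = spectrum[::h]          # keep every h-th bin
        hps[:len(decimated)] *= decimated  # product aligns the harmonics
    return hps
```

For a signal with components at 110 Hz and its harmonics, the product suppresses the harmonic peaks (whose own multiples carry little energy) and leaves the fundamental dominant.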
The use of an EPCP was found to outperform the conventional PCP vector, both at the original frame rate and when the signal was smoothed. The difference in performance becomes more apparent when there is a greater degree of confusion between harmonically related chords, as EPCPs are much less sensitive to this confusion. Other extensions to traditional PCPs have been proposed, including one by Oudre et al. (2009), which states that the twelve-dimensional chroma vector should be made up of amplitudes present in the chord that are larger than those of the non-played chromas. By

introducing chord templates for different chord types and roots, the chord present should be the one that is closest to the chroma vector according to a specific measure of fit.

Template Approaches

Template-based chord recognition methods are built on the premise that only the chord template is needed for recognition (Oudre et al., 2011a). A chord template is a twelve-dimensional vector representing the twelve semitones of the chromatic scale (Fujishima, 1999). The simplest chord templates have a binary structure, with values of 1 at chromas present in a chord definition, and 0 for other chromas. In common approaches, each chord is modelled by a binary Chord Type Template (CTT). Detection is then performed by first calculating scores for every root and chord type. These scores are computed from both the chroma vectors and hand-tuned variations of the original CTT (Oudre et al., 2011a), and the best score is then selected. Harte and Sandler (2005) further improved this method by applying a frequency tuning algorithm. They first define CTTs for only four chord types and then calculate the dot product between the chroma vectors and chord templates. Recognition is then conducted by applying low-pass filtering to the chromagram and median filtering to the detected chord sequence (Harte and Sandler, 2005). Lee (2006) also used a binary chord template, but carried out recognition on the EPCP by maximising the correlation between the chroma vectors and chord templates (Lee, 2006). The approaches described above all use derivatives of either a Nearest Neighbour method or a weighted sum method. The Nearest Neighbour method involves finding the Euclidean distance between chord templates and chroma vectors (Stark and Plumbley, 2009). The weighted sum method works by manually tuning each chord template to give it a different weight, so as to reflect the number of notes in the chord type and the probability of the chord type occurring. Negative weights can also be applied for better separation among similar chord types (Fujishima, 1999).
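A minimal sketch of binary-template matching in this style is shown below, assuming standard Major and Minor triad intervals and nearest-template classification by Euclidean distance (Python/NumPy; the label scheme and normalisation choices are illustrative, not taken from any of the cited systems):

```python
import numpy as np

# Binary chord-type templates over the 12 chroma bins (C, C#, ..., B).
# Intervals follow standard theory: major = root+4+7, minor = root+3+7.
NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F',
              'F#', 'G', 'G#', 'A', 'A#', 'B']

def make_templates():
    """Build the 24 binary templates for Major and Minor triads."""
    templates = {}
    for root in range(12):
        for name, intervals in (('maj', (0, 4, 7)), ('min', (0, 3, 7))):
            t = np.zeros(12)
            t[[(root + i) % 12 for i in intervals]] = 1.0
            templates[f"{NOTE_NAMES[root]}{name}"] = t
    return templates

def classify_chord(chroma, templates):
    """Nearest-template classification by Euclidean distance,
    after normalising both chroma and template to unit length."""
    chroma = np.asarray(chroma, dtype=float)
    chroma = chroma / (np.linalg.norm(chroma) or 1.0)
    best, best_dist = None, np.inf
    for label, t in templates.items():
        dist = np.linalg.norm(chroma - t / np.linalg.norm(t))
        if dist < best_dist:
            best, best_dist = label, dist
    return best
```

A chroma vector with energy only at C, E and G matches the C Major template exactly, while one at A, C and E matches A Minor.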

Probabilistic Approaches

The most common probabilistic approach is the use of Hidden Markov Models (HMMs), which are most helpful for recognising sequences of chords. An HMM consists of a number of states, an initial state distribution, a state transition probability matrix which gives the probability of moving from one state to another, and observation probabilities which give the likelihood of an observation being generated from a particular state (Rabiner, 1989). In typical chord recognition systems every chord is represented by a hidden state and the chromagram frames are the observations. Chord recognition then involves finding the most likely sequence of hidden states that could have generated the observation sequence. The HMM parameters are either based on music theory, learned from real data, or a combination of the two (Oudre et al., 2011b). The first HMM used in chord recognition was proposed by Sheh and Ellis (2003). Their system comprises 147 hidden states, each representing a chord, covering seven chord types, including Major, Minor, Dominant Seventh, Augmented and Diminished, over twenty-one root notes. The HMM parameters are then trained with an EM algorithm (Sheh and Ellis, 2003). This model was improved by Bello and Pickens (2005), who proposed a complete rebuilding of the HMM, reducing the hidden states to twenty-four by considering the Major and Minor chords only. The HMM initialisation is then inspired by music theory, which naturally introduces musical knowledge into the model. The state transition and initial state probabilities remain the same; however, the observation probabilities are fixed, giving each chord a clear predetermined structure (Bello and Pickens, 2005). Others have experimented with twenty-four states but with different sets of input features (Ryynänen and Klapuri, 2008).

3. Technical Background

3.1. Fourier Transform

The Fourier Transform is a method for extracting the magnitude and phase spectrum from a sound signal. For the purposes of note recognition, only the magnitude spectrum

is considered, as the human ear is relatively insensitive to phase. The magnitude spectrum can be thought of as a decomposition of a signal into its frequency components. When applied to sampled sound signals it is known as the Discrete Fourier Transform (DFT), which is the equivalent of the continuous Fourier Transform for sampled signals: it acts on discrete samples of the signal as opposed to treating it as continuous (Harris, 1978). Equation 1 is the function for computing the DFT of a signal, where X(m) is the result, x(n) is the input signal, N is the length of the signal, n indexes the nth time-domain sample and m indexes the mth frequency bin:

X(m) = \sum_{n=0}^{N-1} x(n) \, e^{-j 2\pi n m / N}, \quad m = 0, 1, \ldots, N-1    (1)

The discrete nature of the DFT means that the result is only ever an approximation of the signal's frequency spectrum. The resulting values are complex numbers that contain real and imaginary parts. From these complex numbers the phase angle and the magnitude |X(m)| can be retrieved. The magnitude spectrum of the result is equivalent to the frequency spectrum of the signal, and it is the only part of the DFT result that will be used in this project.

Spectral Leakage

The DFT computation assumes that a signal is periodic in N, the length of the signal being analysed. When the DFT of a non-periodic signal is computed, the resulting frequency spectrum suffers from leakage, in which the signal energy is smeared out over a wide frequency range. The dispersed nature of the resulting DFT makes it harder to determine the frequency content of the signal. The most effective way of tackling spectral leakage is to apply a windowing function to the signal. By default, all discrete signals have a rectangular window applied to them, which multiplies every point by 1. Windowing functions, such as the Hamming window, taper the amplitudes at the start and end of the signal to a smaller value than at the peak in the centre.
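The effect of a Hamming window on leakage can be illustrated numerically. The sketch below compares the out-of-band energy of a non-periodic tone with and without the window; the tone frequency, guard band and leakage measure are illustrative choices, not values from this project.

```python
import numpy as np

def leakage_energy(signal, peak_bin, guard=5):
    """Fraction of spectral energy lying outside a small band around the
    expected peak bin -- a crude, illustrative measure of leakage."""
    mag2 = np.abs(np.fft.rfft(signal)) ** 2
    band = mag2[max(0, peak_bin - guard):peak_bin + guard + 1].sum()
    return 1.0 - band / mag2.sum()

fs, n = 1000, 1000
t = np.arange(n) / fs
# A 10.5 Hz tone is not periodic in a 1 s window, so the default
# rectangular window smears its energy across many bins.
tone = np.sin(2 * np.pi * 10.5 * t)
rect_leak = leakage_energy(tone, 10)
# The Hamming window's lower side lobes keep the energy concentrated.
hamming_leak = leakage_energy(tone * np.hamming(n), 10)
```

For this tone, the windowed version leaves markedly less energy outside the main lobe than the rectangular version.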
The DFT of a window has a peak at the applied frequency, and smaller side lobe peaks on either side. The height of these side lobes indicates the effect that the windowing

function will have on frequencies around the applied frequency. Generally, the lower the side lobes, the more the window will reduce leakage in the DFT.

Frequency Resolution

The DFT returns a discrete spectrum, as opposed to a continuous one, so the frequency content of the signal is resolved into a finite number of bins. The resolution of an N-point DFT is the frequency spacing between each of these bins, calculated using equation 2:

f_{resolution} = \frac{f_s}{N}    (2)

where f_s is the sample rate and N is the number of bins. For example, a DFT of a signal sampled at 16 kHz can have a frequency resolution of 0.5 Hz, in which case the frequency spacing between each DFT bin is 0.5 Hz. The value of N varies depending on the length of the signal the DFT is taken over, and the sampling rate.

Fast Fourier Transform

Although the DFT has a simple implementation and produces correct results, it is an inefficient algorithm and therefore not suited to the real-time needs of this project. The first efficient implementation of the DFT, called the Fast Fourier Transform (FFT), was introduced by Cooley and Tukey (1965). It produces the same output as the DFT, but with many of the redundant calculations removed: the direct DFT often performs the same calculations several times, which the FFT avoids. In MATLAB, the fft function is based on the Cooley-Tukey algorithm. The execution time of the FFT depends on the length of the transform, and is fastest for transforms whose lengths are powers of 2. The FFT is slower for lengths that are prime or have large prime factors, although it remains fast when lengths have only small prime factors.

3.2. Distance Measures

Distance measures, sometimes referred to as metrics, are functions that define the distance between two vectors. The distance metrics used throughout this project are defined below.

Euclidean

The Euclidean distance between two vectors is the length of the line segment connecting them in N-dimensional space. The distance between vectors p and q is given by equation 3:

d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}    (3)

where p_i and q_i are the values of p and q in the ith dimension.

Hamming

The Hamming distance between two vectors of equal length is the number of positions at which the corresponding symbols differ. It measures the minimum number of substitutions that could have transformed one vector into the other, as shown in equation 4:

D_H(x, y) = \sum_{i=1}^{m} D_i, \quad D_i = \begin{cases} 0 & x_i = y_i \\ 1 & x_i \neq y_i \end{cases}    (4)

Cosine

The Cosine distance is a measure of the angle between two vectors. If both vectors are equal then the angle between them is 0, and since cos(0) = 1, the cosine similarity of identical vectors is 1; the subtraction from 1 in equation 5 therefore gives identical vectors a distance of 0:

d(p, q) = 1 - \frac{\sum_{i=1}^{n} p_i q_i}{\sqrt{\sum_{i=1}^{n} p_i^2} \sqrt{\sum_{i=1}^{n} q_i^2}}    (5)

3.3. Naive Bayes

Bayesian methods of classification are based on probability theory and play a critical role in probabilistic learning and classification. Bayesian models build a generative

model that approximates how the data is produced. This is done using the prior probability of each class given no information about an item. Categorisation produces a posterior probability distribution over the possible classes given a description of an item (Manning et al., 2008). The probability of chord d being in class c is computed using equation 6:

P(c \mid d) \propto P(c) \prod_{1 \le k \le n_d} P(f_k \mid c)    (6)

where P(f_k | c) is the conditional probability of note f_k occurring in a chord of class c. We interpret P(f_k | c) as a measure of how much evidence there is that c is the correct class, and P(c) is the prior probability of a chord occurring in class c.

Multinomial Model

Multinomial Naive Bayes only considers notes that are in the query chord, so only the presence of a note is taken into account. The goal is to find the best class for the chord. The best class in multinomial classification is the most likely, given by equation 7:

c_{map} = \arg\max_c P(c \mid d) = \arg\max_c P(c) \prod_{1 \le k \le n_d} P(f_k \mid c)    (7)

In equation 7, the conditional probabilities are multiplied together. As the values of the probabilities are often very small, many multiplications can result in floating point underflow. It is therefore more common to perform the calculation by adding the logarithms of the probabilities instead of multiplying them; the class with the highest log probability score is still the most probable. Hence, equation 7 can be rewritten as:

c_{map} = \arg\max_c \left[ \log P(c) + \sum_{1 \le k \le n_d} \log P(f_k \mid c) \right]    (8)

In equation 8, log P(f_k | c) is a weight that indicates how good an indicator f_k is for class c. Its sum with the log prior, log P(c), is then a measure of how much evidence there is for the chord being in the class, and equation 8 selects the class for which there is the most evidence. The parameter P(c) is found using the maximum likelihood estimate, which for prior

probabilities is found using equation 9:

P(c) = \frac{N_c}{N}    (9)

where N_c is the number of chords in class c and N is the total number of chords. The maximum likelihood estimate for each note, P(f | c), is the relative frequency of note f in chords belonging to class c, as shown by equation 10:

P(f \mid c) = \frac{F_{cf}}{\sum_{f' \in V} F_{cf'}}    (10)

where F_{cf} is the number of occurrences of f in the training chords from class c, including multiple occurrences.

Multivariate Bernoulli Model

An alternative to the multinomial model is the multivariate Bernoulli model. The Bernoulli model estimates P(f_k | c) as the fraction of chords of class c that contain note f_k. Bernoulli uses all the notes in the vocabulary, and so takes into account the absence of a note from the query as well as its presence. The parameter P(f | c) is calculated using equation 11:

P(f \mid c) = \frac{\sum_{k=1}^{D} \delta(f, d_k) + 1}{D + 2}    (11)

where D is the number of chords in class c, d_k is the kth such chord, and:

\delta(f, d) = \begin{cases} 1 & \text{if note } f \text{ occurs in chord } d \\ 0 & \text{otherwise} \end{cases}    (12)

3.4. Nearest Neighbour

Nearest Neighbour is a very simple but powerful classification technique, based on the premise that vectors which are close to each other in a vector space belong to the same class. The simplest form of Nearest Neighbour is 1NN classification, where each chord is assigned the class of its nearest neighbour. The more common form is kNN classification, a more powerful technique than 1NN. For kNN, we assign each chord to the majority

class of its k closest neighbours, where k is a parameter. kNN is more robust than 1NN as it does not rely on single examples in the training data. Nearest Neighbour has some advantages over other classification techniques. Firstly, it does not require any feature selection, which is often necessary for Naive Bayes classification. Secondly, it scales well to large numbers of classes, as there is no need to train n classifiers for n classes. It is also possible to run a 1NN classifier without any training, although some sort of library of labelled examples is still needed.

3.5. Musical Terms

Throughout this project, terms relating to the theory of western music will be used. So that they can be fully understood and referred to, they are defined below:

Chromatic Scale: A musical scale with twelve pitches, each a semitone above or below its neighbours.

Chord: Any harmonic set of three or more notes that is heard as if sounding simultaneously. The most common types of chords are called triads, and contain three distinct notes. Further notes may be added to triads to produce seventh or ninth chords.

Power Chords: A type of chord specific to electric guitar music; power chords are dyads that contain a root note and the fifth.

Note: A single tone from the chromatic scale, without any regard to the octave, for example C or C♯.

Pitch: A note written with its octave information included; for example, A4 is an A note in the fourth octave.
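The multivariate Bernoulli classifier of Section 3.3, with the add-one smoothing of equation 11 and log-space scoring in the spirit of equation 8, might be sketched as follows (Python/NumPy; the toy training data and class labels are hypothetical):

```python
import numpy as np

def train_bernoulli_nb(X, y):
    """X: (num_chords, 12) binary note-presence matrix; y: class labels.
    Returns priors P(c) (equation 9) and per-note probabilities P(f|c)
    with the add-one smoothing of equation 11."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    priors, note_probs = {}, {}
    for c in sorted(set(y)):
        Xc = X[y == c]
        D = len(Xc)
        priors[c] = D / len(X)                          # equation 9
        note_probs[c] = (Xc.sum(axis=0) + 1) / (D + 2)  # equation 11
    return priors, note_probs

def classify_bernoulli_nb(x, priors, note_probs):
    """Score in log space (equation 8 style), using both the presence
    and the absence of each of the twelve notes."""
    x = np.asarray(x, dtype=float)
    best, best_score = None, -np.inf
    for c, p in note_probs.items():
        score = np.log(priors[c])
        score += np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
        if score > best_score:
            best, best_score = c, score
    return best
```

Trained on a few binary chroma vectors per class, the classifier assigns a query vector to the class whose note-presence profile it best matches.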

4. Development of the System

4.1. Data Collection

In order to properly evaluate the effectiveness of the methods investigated, a large library of notes and chords had to be recorded. Chords played on the electric guitar were recorded at Hz through a USB interface using the software package Logic Pro X. This sample rate was chosen as it is good practice to keep a library at a high sample rate: recordings can be resampled to lower rates when required, but cannot be upsampled effectively from a lower sample rate. Chords played on an acoustic guitar were recorded through a microphone at Hz using the software package Audacity; this rate was chosen as it was the highest sample rate the microphone could record at. Table 1 shows the chords recorded on the electric guitar and Table 2 shows the chords recorded on the acoustic.

Table 1: Chords recorded on the electric guitar

  Chord type     Range           Sets
  Single Notes   E2 - E5         5
  Fifths         E5(2) - E5(3)   25
  Major          C - B           25
  Minor          C - B           25
  Major 7th      C - B           25
  Major 9th      C - B           25

Table 2: Chords recorded on the acoustic guitar

  Chord type     Range           Sets
  Major          C - B           15
  Minor          C - B           15

Before classification, each of the recordings was down-sampled to Hz. This figure was chosen in order to remove as much of the unnecessary high-frequency information as possible. As the highest frequency representable at this sample rate is 5000 Hz, the chosen rate sufficiently covers the frequency range produced by a guitar.

4.2. Segmenting

In order to speed up the process of separating the recordings into their individual chords, an automatic segmenter function was written. It takes in a recording and produces an array containing the start and end points of each chord in the recording. It works by splitting the recording into a set of small frames and taking the energy of each frame: frames with a high energy content are chords, and frames with low energy are gaps between chords. Although not formally tested, the segmentation process appeared to work reliably, albeit after occasional changes to the frame size, as all sets of segments produced were of the correct size. Faulty segments would have been obvious, as the frequency spectra produced would have been unusual.

4.3. Obtaining the Frequency Spectrum

The method used for obtaining the frequency spectrum was identical throughout this project and can therefore be explained at this point. The method was applied to a segment of a recording. Whilst still in the time domain, a Hamming window was applied to the signal in order to reduce spectral leakage. The resulting signal was then passed into the MATLAB fft function. Since the output is an array of complex numbers, the magnitude of each value was taken. The frequency resolution was calculated as the sample rate divided by the length of the input signal. This ensured that the frequency resolution was fine enough to avoid the frequencies of neighbouring notes overlapping: the smallest gap between notes of interest was 7 Hz, which is the gap between an E2 and an F2. As the smallest input length was likely to be around two seconds, the frequency resolution was 0.5 Hz, which was sufficient. The frequency spectrum produced could then be analysed by the peak picker.
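The energy-based segmentation and windowed-spectrum steps described above might look like the following sketch (Python/NumPy rather than the project's MATLAB; the frame length and energy threshold are illustrative values, not the project's):

```python
import numpy as np

def segment_by_energy(recording, frame_len=1024, threshold=0.01):
    """Label frames as chord/silence by short-time energy, then return
    (start, end) sample indices of each high-energy run.
    frame_len and threshold are illustrative, not the project's values."""
    n_frames = len(recording) // frame_len
    frames = recording[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)
    active = energy > threshold
    segments, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i * frame_len          # run of chord frames begins
        elif not a and start is not None:
            segments.append((start, i * frame_len))
            start = None
    if start is not None:                  # recording ends mid-chord
        segments.append((start, n_frames * frame_len))
    return segments

def magnitude_spectrum(segment, sample_rate):
    """Hamming-windowed magnitude spectrum; bin spacing = fs / len."""
    windowed = segment * np.hamming(len(segment))
    mag = np.abs(np.fft.rfft(windowed))
    resolution = sample_rate / len(segment)
    return mag, resolution
```

Applied to a silence-chord-silence recording, the segmenter returns one (start, end) pair, and the spectrum of that segment peaks near the played frequency.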

4.4. Peak Picking

The performance of the peak picking algorithm is vital, as it underpins the recognition methods used throughout this project. The purpose of the peak picker is to extract the frequencies that are dominant in the frequency spectrum. High-energy frequencies are identifiable as very high peaks in an otherwise flat spectrum, and a peak picker needs to correctly identify these peaks and their exact locations. After initial research into peak picking, it was decided that the built-in MATLAB function findpeaks would be acceptable. The findpeaks function provides a comprehensive peak finding algorithm with several configuration parameters. The parameters relevant to this project are:

MinPeakHeight - The minimum height of a peak.

Threshold - The minimum height difference between a peak and its neighbours.

MinPeakDistance - The minimum peak separation.

NPeaks - The maximum number of peaks to return.

In order to properly test the effectiveness of findpeaks under different configurations, two sets of single notes had their peaks manually labelled. For each note, the FFT was taken and the number of peaks, as well as their locations, was recorded. For each configuration, the precision, recall and F-measure were taken across a range of values. Precision was calculated as the proportion of correctly labelled peaks out of the total number of peaks detected. Recall is the proportion of peaks detected out of the total number of expected peaks. The F-measure is the weighted harmonic mean of precision and recall, calculated using equation 13:

F = \frac{2 \times precision \times recall}{precision + recall}    (13)

The first configuration tested was the threshold. The input value for threshold was tested from 1 to 50. Figure 1 shows the change in precision, recall and F-measure across this range. In terms of precision, the score starts low, but increases very quickly as the

22 threshold is increased. The score continues to rise until threshold reaches 22, where the precision starts to fall again. However, although the also recall rises very quickly, it s score begins to decrease after threshold reaches 8, and this is the same for the F-measure. As a result, 8 was chosen to be the optimum value for threshold. Figure 1: Effect of changing threshold The second configuration tested was the minimum distance. Again, this was carried out on an input value range of 1 to 50. However, the threshold was set to the optimum from the previous experiment of value 8. Figure 2 shows the change in precision, recall and F-measure. The scores for all three measures starts very low, rises quickly for low input values, and levels off after 14. Although a small increase was seen in precision and F-measure after 40, this was not chosen as the optimum as the increase was too small to be useful. Therefore the optimum was chosen to be 14. Figure 2: Effect of changing minimum distance The final configuration tested was the minimum height. This test was carried out again with threshold at 8, and now minimum distance at 14 across the range of 1 to Reg:

50. Figure 3 shows the change in precision, recall and F-measure. With the threshold and minimum distance parameters at their optimum values, changing the minimum height had little effect on accuracy scores; however, when the input value was 32, there was a small increase in precision and F-measure, whilst the recall stayed the same. Therefore the optimum was chosen to be 32.

Figure 3: Effect of changing minimum height

With the optimum configurations now set for findpeaks, the peaks could be successfully extracted from the frequency spectrum. Figure 4 shows the frequency spectrum of a C Major chord with the peaks clearly marked. The findpeaks function returns two arrays, containing the locations of the peaks and their amplitudes. The peak locations were scaled to the frequency resolution of the frequency spectrum before being used for classification.

Figure 4: Frequency Spectrum of C Major with Peaks
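The tuning experiments above can be reproduced in outline with SciPy, whose find_peaks exposes parameters analogous to findpeaks (height ≈ MinPeakHeight, threshold ≈ Threshold, distance ≈ MinPeakDistance). A sketch, using the optimum values found above as defaults:

```python
import numpy as np
from scipy.signal import find_peaks

def detect_peaks(spectrum, min_height=32, threshold=8, min_distance=14):
    """Return peak locations (bin indices) and amplitudes."""
    locs, props = find_peaks(spectrum, height=min_height,
                             threshold=threshold, distance=min_distance)
    return list(locs), list(props["peak_heights"])

def precision_recall_f(detected, expected):
    """Precision, recall and F-measure (equation 13) for a set of
    detected peak locations against manually labelled ones."""
    correct = len(set(detected) & set(expected))
    precision = correct / len(detected) if detected else 0.0
    recall = correct / len(expected) if expected else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```

Sweeping one parameter while holding the others fixed, as in Figures 1-3, then reduces to calling precision_recall_f for each setting.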

4.2. Single Note Recognition

The first classification task approached was simple single note recognition. This was chosen in order to test the effectiveness of the peak picking algorithm and the application of the fft, as these techniques were the basis for the remainder of the project. A very simple approach would be to take the fft and label the note based on the value of the strongest peak, assuming this to be the fundamental harmonic. However, this is prone to error, as the peak with the highest amplitude is not guaranteed to be the fundamental, and therefore the note can be recognised incorrectly. In order to classify notes correctly, a method was developed that took into account all the harmonics present in the frequency domain, called the mean distance method.

The mean distance method classifies notes by the similarity between the mean distance of the detected harmonics and the fundamental frequencies of the possible notes. Every harmonic after the fundamental is a multiple of the fundamental frequency, so the distance between consecutive harmonics in the frequency spectrum is equal to the fundamental frequency itself. In order to classify the notes, a note library containing the names of the notes ranging from E2 - E5, as well as their fundamental frequencies, was used. The first step is to obtain the frequency spectrum from the recorded note and obtain the detected harmonics using findpeaks. The peaks returned are inserted into an npeaks × 1 array in the order they were detected, and the absolute distance between each pair of adjacent peaks is then taken. The mean of these distances is subsequently compared to the fundamentals of the notes in the note library, and the recording is classified as the closest note. Table 3 shows an E2 note with all harmonics detected correctly. Despite some slight variation in the distances between peaks, the mean distance is still clearly an E2 (82 Hz).

Table 3: Mean distances of an E2 note (columns: Frequencies, Distances; Mean Distance: 82.6)

A requirement of the mean distance method is that all the harmonics have to be identified correctly. However, in practice, it is common for the peak picker to either wrongly detect two peaks that are very close together, or miss a harmonic entirely. If these erroneous peaks were included in the calculation of the mean distance, then they would alter an otherwise well defined note.

To remove the erroneous peaks, the list of detected peak differences was sorted in ascending order. If a difference was below 77 Hz then it was removed, as this is below the acceptable threshold for the distances between the harmonics of an E2, which is the lowest note on a guitar. The differences between the differences were then taken. If this value was greater than 7 Hz then the second difference was discounted, and if it was less than 7 Hz, then the first difference was removed. 7 Hz was chosen as this is the smallest difference between semitones at the lowest frequency of interest, and a difference greater than this implies no relation between harmonics. Table 4 shows an E2 note with a missing peak, the harmonic at 245 Hz. As a result, the mean distance is skewed upwards, so the note would be incorrectly labelled. Taking the distances between the distances shows that distance 2 is too different from the other differences, so it can be discounted. The new mean distance is then calculated as 82.6, which is correct.
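The mean distance method above can be sketched in Python (the project itself was implemented in MATLAB). The note library here is a small illustrative subset, and the difference filter is a simplified variant of the rule described in the text:

```python
import numpy as np

# Illustrative subset of the note library (name, fundamental frequency in Hz).
NOTE_LIBRARY = [("E2", 82.41), ("A2", 110.00), ("D3", 146.83),
                ("G3", 196.00), ("B3", 246.94), ("E4", 329.63)]

def clean_differences(diffs, floor=77.0, tol=7.0):
    """Simplified variant of the erroneous-difference filter: drop
    differences below 77 Hz, then drop any difference more than 7 Hz
    away from the previously retained one (sorted ascending)."""
    diffs = sorted(d for d in diffs if d >= floor)
    kept = []
    for d in diffs:
        if not kept or abs(d - kept[-1]) <= tol:
            kept.append(d)
    return kept

def classify_note(peak_freqs):
    """Mean distance method: the mean spacing between detected harmonics
    approximates the fundamental, which is matched to the note library."""
    diffs = np.abs(np.diff(np.sort(peak_freqs)))
    kept = clean_differences(diffs)
    mean_d = float(np.mean(kept)) if kept else float(min(peak_freqs))
    return min(NOTE_LIBRARY, key=lambda nf: abs(nf[1] - mean_d))[0]
```

For example, an E2 with the third harmonic missed (peaks at 82.4, 164.8, 329.6 and 412.0 Hz) is still classified as E2, because the doubled 164.8 Hz spacing is filtered out.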

Table 4: Mean distances of an E2 note (columns: Peak Frequencies (Hz); Distances [between the frequencies]; 2nd Distances [between the distances]; Final Distances; Mean Distance)

4.3. Two Note Chords

Classification by the mean distance method works well for single notes. However, when applied to two note chords it fails, because the frequency spectrum now contains two sets of harmonics. The distances between peaks are therefore no longer of any use, so a new method of classification had to be developed.

Revised Peak Picker

After initial research into the characteristics of the frequency spectrum produced by two note chords, it became clear that the findpeaks function was no longer suitable. Despite its large number of configuration parameters, the algorithm itself was simply not designed for the type of peaks produced by the fft. The gradient of these peaks increases so quickly that findpeaks is unable to recognise that there is an up-slope. Therefore a new algorithm was developed.

The revised algorithm worked by first zeroing all points in the spectrum that were below a certain threshold. This removed all the redundant information at very low amplitudes and also created well defined segments, which held the harmonic peaks. Within each of these segments, the maximum points were found using the second differential, and the highest maximum point was taken to be the peak.

Similar tests to the ones performed on findpeaks were carried out on this new algorithm. However, the only parameter that could be changed was the threshold below which values in the frequency spectrum were zeroed. Figure 5 shows the effect on the precision, recall and F-measure. The experiment showed the optimum input value to be 25: although the precision reaches higher values beyond this point, the recall and F-measure both start to fall after 25. This optimum configuration was used for the remainder of the project.

Figure 5: Effect of changing threshold

Feature Extraction

For feature extraction, a new algorithm was developed in order to extract the notes present in the frequency spectrum. The algorithm used a library of 48 10-dimensional vectors that represented the frequencies of each harmonic of a note. The first step of the algorithm involved collecting all the notes present in the frequency spectrum. Every peak was compared with the fundamental frequencies from the note library and assigned a number between 1 and 12 (1 being C and 12 being B). Each note was assigned a number simply to make later comparisons easier. The number of the harmonic was also recorded. These values were put into an npeaks × 2 array in the order they appeared in the frequency spectrum.

The second step was to normalise the harmonic number assigned to each note. The assumption being made here was that the first few notes found were the fundamentals of the remaining notes. Therefore, in order to retain the information regarding a note's harmonic, each remaining note's harmonic number was adjusted to match the harmonic of the first note found of its type.

The final step was to count the occurrences of each note. The feature vector was passed over and any note found was put into a new feature vector along with a count of its occurrences. If a note was already present in the new feature vector then the count value was simply incremented. The result of this stage was an nnotes × 3 feature vector.

As an example, Table 5 shows the process of extracting an E5 chord. An E5 contains an E2 and a B2, which both appear in the chord, along with an erroneous F3. However, this erroneous peak does not affect the labelling of the chord, as there is only 1 occurrence of it, compared to 4 and 3 occurrences of E2 and B2 respectively.

Table 5: Feature extraction process of an E5 (columns: Frequencies; Notes and Harmonics [Note, Harmonic]; After Normalising [Note, Harmonic]; Note Count [Note, Harmonic, Occurrence]; Final Chord)
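The collection and counting steps can be sketched in Python (the project's implementation was in MATLAB; the note library below is a small illustrative subset of the 48-note library, and the harmonic normalisation step is omitted for brevity):

```python
from collections import Counter

# Illustrative subset of the note library: fundamentals in Hz.
NOTE_LIBRARY = {"E2": 82.41, "A2": 110.00, "B2": 123.47,
                "D3": 146.83, "G3": 196.00}

def match_peak(freq, max_harmonic=10):
    """Label a peak with the (note, harmonic) whose predicted frequency
    k * f0 lies closest to it."""
    return min(((name, k) for name in NOTE_LIBRARY
                for k in range(1, max_harmonic + 1)),
               key=lambda nk: abs(freq - nk[1] * NOTE_LIBRARY[nk[0]]))

def two_note_chord(peak_freqs):
    """Count how often each note's harmonics appear, and keep the two
    most common notes as the chord (sorted here for determinism)."""
    counts = Counter(match_peak(f)[0] for f in peak_freqs)
    return sorted(name for name, _ in counts.most_common(2))
```

As in the text, an erroneous extra peak is harmless as long as the correct notes occur more often than it does.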

Classification

From the resulting feature vector, only the two most common notes were put forward for classification. This was based on the assumption that any erroneous note detected would not appear more often than any correct note. The resulting 2 × 2 feature vector was compared to its correct label. When the classification was carried out, a matrix containing the correct pairs of notes, in the order they appeared in the recordings, was used to evaluate the accuracy of the method.

Pitch Class Profiles

The method described for two note recognition is perfectly adequate for two note chords, but cannot realistically be applied to larger chords. This is because it requires prior knowledge of the size of the chord being recognised, and therefore cannot be applied to a real time application where this knowledge is not available. There is also a problem with simply selecting the most commonly occurring notes, because there is a greater possibility of interference from erroneous notes as chords get larger. As a result, a more versatile approach had to be taken.

Pitch class profiles were chosen for three note recognition and beyond, as they require no knowledge of the chord size before recognition and so are more suitable for a real time application. Pitch class profiles are 12 dimensional binary vectors, where each position represents one semitone on the chromatic scale. In the vector, a 1 represents the presence of a note, and 0 the absence of a note. For example, Table 6 contains the pitch class profiles for the Major chords.

Feature Extraction

The purpose of feature extraction was to translate the detected peaks into the chord's pitch class profile. The process was very similar to the two note feature extraction process. For each peak detected, its frequency was compared with the fundamentals in the note library and the closest match was taken to be the note of the peak. However, for pitch class profiles, when a note was detected the corresponding position in the chord's pitch class profile was changed to 1. If the position was already set to 1 then nothing
was changed, as pitch class profiles do not contain information about the occurrence of a note.

Table 6: Pitch class profiles of the Major chords

          C  C# D  D# E  F  F# G  G# A  A# B
C Major   1  0  0  0  1  0  0  1  0  0  0  0
C# Major  0  1  0  0  0  1  0  0  1  0  0  0
D Major   0  0  1  0  0  0  1  0  0  1  0  0
D# Major  0  0  0  1  0  0  0  1  0  0  1  0
E Major   0  0  0  0  1  0  0  0  1  0  0  1
F Major   1  0  0  0  0  1  0  0  0  1  0  0
F# Major  0  1  0  0  0  0  1  0  0  0  1  0
G Major   0  0  1  0  0  0  0  1  0  0  0  1
G# Major  1  0  0  1  0  0  0  0  1  0  0  0
A Major   0  1  0  0  1  0  0  0  0  1  0  0
A# Major  0  0  1  0  0  1  0  0  0  0  1  0
B Major   0  0  0  1  0  0  1  0  0  0  0  1

To evaluate the performance of pitch class profiles, features were extracted from the entire chord library, as shown in tables 1 and 2, except for the single notes. For classification, a template approach using Nearest Neighbour, as well as a probabilistic approach using Naive Bayes, were investigated.

Template Classification

The first pitch class profile classification technique investigated was a template based approach, using Nearest Neighbour. The rationale behind using Nearest Neighbour was to provide a non-learnt method of classification, in which any chord played can be recognised without the need for training data.

In order to apply Nearest Neighbour, a set of training examples is normally needed. However, for pitch class profiles all that is needed for recognition is a library containing the correct representation of each chord. Usually, 1NN is not as effective as kNN, as 1NN does not
protect against errors in the training data. However, as the chord library contains perfect information about each chord, 1NN will still be successful. The chord library used for testing contained the pitch class profiles for 23 different chord types, for all 12 notes on the chromatic scale. This was represented as a 276 × 12 matrix.

One of the key components in Nearest Neighbour classification is the similarity measure used, and three different similarity measures were used in this project: cosine distance, Hamming distance and Euclidean distance. Cosine distance was used as it is generally considered to be the most effective distance measure. Hamming distance was chosen because the pitch class profiles are binary vectors, so Hamming distance should produce very similar results to the cosine distance. Finally, Euclidean distance was used as it is the similarity metric used by Oudre et al. (2011b), who also used pitch class profiles for classification.

To test the effectiveness of Nearest Neighbour, the technique was tested on every chord example in the chord library, excluding single notes. The classification took the closest vector in the training set to the test vector to be the correct chord; therefore, the distance between each test pitch class profile and every training vector was taken. Tables 7, 8 and 9 show the top 5 cosine, Hamming and Euclidean distances between a C Major chord and the chords in the chord library.

Table 7: Cosine distances between a C Major and chords in the chord library (top 5, in order: C Major, C Major7, C5, C Minor, E Minor)

Table 8: Hamming distances between a C Major and chords in the chord library (top 5, in order: C Major, C5, C Major7, C Minor, E Minor)
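A minimal Python sketch of the template approach, assuming ideal binary PCP templates (the library below is a tiny illustrative subset of the full chord library; the project itself was implemented in MATLAB):

```python
import numpy as np

def pcp_from_pitch_classes(pcs):
    """12-dimensional binary pitch class profile (index 0 = C ... 11 = B)."""
    v = np.zeros(12)
    v[list(pcs)] = 1.0
    return v

def cosine_dist(a, b):
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def hamming_dist(a, b):
    return float(np.sum(a != b))

def euclidean_dist(a, b):
    return float(np.linalg.norm(a - b))

def nearest_neighbour(test_pcp, library, dist=cosine_dist):
    """1NN against a library of ideal chord templates: {name: pcp}."""
    return min(library, key=lambda name: dist(test_pcp, library[name]))
```

With an ideal C Major template in the library, the PCP containing C, E and G is returned as C Major under all three measures.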

Table 9: Euclidean distances between a C Major and chords in the chord library (top 5, in order: C Major, C5, C Major7, C Minor, E Minor)

The final accuracy of Nearest Neighbour was calculated as the number of correctly identified chords divided by the total number of chords classified. The classification function produced an nchords × nexamples matrix, where nchords was the number of different chord types tested and nexamples was the number of examples of each. Every position in the matrix was a number representing the name of the chord recognised. This matrix was compared to a matrix of the same size containing the correct results, and a comparison between these matrices produced a confusion matrix.

Probabilistic Classification

The second pitch class profile classification technique investigated was a probabilistic approach using Naive Bayes. Where Nearest Neighbour is a non-learnt classification technique, Naive Bayes is a learnt approach, as training data is required for classification. The two types of Naive Bayes models used for classification were a Multinomial and a Multivariate Bernoulli model.

In order to train the models, each set of chords was split into 15 training examples and 10 testing examples. This was done because it is very easy to obtain erroneously high accuracy from a model that has been trained and tested on the same data. In order to achieve a better idea of real world performance, the data had to be split.

For the Multinomial model, the conditional probabilities were calculated using equation 10. For example, Table 10 shows the calculated conditional probabilities for a C Major. A C Major contains the notes C, E and G, so their corresponding conditional probabilities are much larger than those of the notes that do not appear. The conditional probability of D is also slightly more highly weighted than the others, which could be due to one of the chords in the training set containing an erroneous occurrence of D.

Table 10: Conditional probabilities of a C Major using a Multinomial model (columns: C, C#, D, D#, E, F, F#, G, G#, A, A#, B)

For the Bernoulli model, the conditional probabilities were calculated using equation 11. For example, Table 11 shows the calculated conditional probabilities for a C Major. The spread of weights is very similar to the Multinomial model, with occurring notes being weighted much higher than non-occurring ones. As the PCP is a binary vector, the count of occurrences of a note and the count of chords containing that note will be identical, so the only difference in performance between Multinomial and Bernoulli will be observed at test time.

Table 11: Conditional probabilities of a C Major using a Bernoulli model (columns: C, C#, D, D#, E, F, F#, G, G#, A, A#, B)

In order to classify a chord using Naive Bayes, the class with the most evidence for being the chord has to be found. For the Multinomial model, equation 8 is used. The logarithmic prior probabilities of all the classes are set equal, using equation 9. Then, for each occurring note in the chord, the logarithmic conditional probability is added to the prior. For example, Table 13 shows the 5 classes with the highest evidence for a C Major chord.

For the Bernoulli model, the prior probabilities of all the classes are again set equal using equation 9. Then, for each occurring note in the chord, the probability is added to the prior. However, unlike the Multinomial model, for every absent note, 1 − P(f|c) is also added to the prior. For example, Table 12 shows the 5 classes with the highest evidence for a C Major chord.
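The two models can be sketched as follows (a Python illustration, not the project's MATLAB code; equal priors are assumed, as in the text, and add-one Laplace smoothing stands in for the smoothing implied by equations 10 and 11):

```python
import numpy as np

def train_multinomial(train_pcps):
    """train_pcps: {chord_name: (n_examples, 12) array of binary PCPs}.
    Note counts normalised over the 12 pitch classes, with add-one smoothing."""
    probs = {}
    for name, X in train_pcps.items():
        counts = X.sum(axis=0)
        probs[name] = (counts + 1.0) / (counts.sum() + 12.0)
    return probs

def train_bernoulli(train_pcps):
    """Per-note probability that the note is present in a chord example."""
    probs = {}
    for name, X in train_pcps.items():
        probs[name] = (X.sum(axis=0) + 1.0) / (X.shape[0] + 2.0)
    return probs

def classify_multinomial(pcp, probs):
    # Sum the log conditional probabilities of the notes that occur (equal priors).
    return max(probs, key=lambda name: np.sum(pcp * np.log(probs[name])))

def classify_bernoulli(pcp, probs):
    # Present notes contribute log P(f|c); absent notes contribute log(1 - P(f|c)).
    def evidence(p):
        return np.sum(pcp * np.log(p) + (1 - pcp) * np.log(1 - p))
    return max(probs, key=lambda name: evidence(probs[name]))
```

Note that the Bernoulli model also penalises the notes that are absent, mirroring the 1 − P(f|c) term described above.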

Table 12: Classes with the most evidence for a C Major using the Multivariate model (top 5, in order: C Major, C5, C Major7, E Minor, A Minor)

Table 13: Classes with the most evidence for a C Major using the Multinomial model (top 5, in order: C Major, C Major7, C5, A5, C Minor)

5. Results

5.1. Single Note Recognition

Classification using the mean distance method for single notes produced an accuracy of 77%. This was a disappointing result, because for the relatively simple task of picking one frequency value from the spectrum, a higher accuracy would be expected. However, on inspection of the confusion matrix, shown in appendix A, many of the notes were recognised correctly 4 out of 5 times, which suggests that one poor set of recordings skewed the final accuracy. There were also many instances of notes being incorrectly recognised as E2s, which suggests possible interference from the E string. This interference could have originated from the fact that many of the notes were played on the A string whilst the E string was being muted: if the E string was not muted properly, an E2 would appear in the frequency spectrum.

The mean distance method could have been improved by refining the method for discounting erroneous peaks. In its current form, a few bad peaks at the start of the note can throw the entire classification, as the distances used for comparison would be wrong. One possible solution could be to use the lowest common multiple of the peaks when discounting erroneous ones. An entirely different and more sophisticated approach could be to use K-means clustering for recognition, where each peak would be matched to a cluster, and the most frequently occurring cluster would be the note.

5.2. Two Note Recognition

The tests for classifying two note chords produced an accuracy of 90%. This was a reasonable result; however, there is still room for improvement. The improvement in performance over single note recognition could be due to the superior method of removing erroneous notes, as the position of the error did not affect the final classification. The improved peak picker also helped to boost performance, as fewer erroneous peaks were detected.

One observation from the confusion matrix, shown in appendix A, was that no chords were misclassified as other chords: every chord was either recognised correctly or not recognised at all. This suggests that any failed chords either contained only one detected note, or one of the notes was a semitone out, so that no chord existed for it to be classified as. There were three chords that consistently failed to be recognised: A3 E4, A 3 D 3 and G 2 D3. This could simply be related to poor playing during the recordings.

This method could be improved with a more sophisticated approach to picking the two notes in the chord. Currently, the two most frequently occurring notes are chosen, but this is susceptible to interference from erroneous notes, especially when few notes are recognised. In addition, in the situation where three or more notes all occur equally often, the decision as to which two notes are selected is left to the MATLAB sort function. One solution would be some form of pattern matching algorithm with prior knowledge of the relationships between the notes in chords; the note selection would then be made based on the relationships between the notes detected.

Pitch Class Profiles

Nearest Neighbour

Using Nearest Neighbour, an accuracy of 93.3% was achieved with cosine similarity, and 94% using both Hamming and Euclidean distance, as shown in Table 14. This was a good result, as it exceeded the 90% target. The confusion matrix produced is in appendix B.

It is interesting to note that the chords that were most commonly incorrectly recognised generally all contained an E. This may be due to the low E string producing lower
energy than other strings, especially higher up the fretboard, and therefore the frequency data for an E may have been consistently missed. This may help to explain why the chords that failed consistently during two note recognition also included the low E string.

There is also a small group of 7 chords, ranging from D Major to A Major, that were all incorrectly recognised on several occasions. These chords are all at the extreme ends of the fretboard, with D Major being played at the 10th fret and A Major on the 5th. At the higher end of the fretboard, chords were incorrectly recognised as fifths, which suggests that in some of the recordings the 3rd string was not being fretted properly, due to the small fret size. At the lower end of the fretboard, the chords were labelled as chords outside the scope of the confusion matrix. This may be because interference from the open strings caused greater interaction between peaks, which would have distorted the frequency spectrum and created errors during peak picking.

When the tests were first run on a PCP library that contained only the chords being tested, the accuracy was recorded at 98%. When the PCP library was subsequently extended to contain over 300 chords, the accuracy dropped to 94%. This demonstrates the disadvantage of a non-learnt approach: chords with extra or missing notes are recognised incorrectly, and as the number of possible chord combinations is so large, a slightly incorrect chord can easily appear to be a completely different chord.

Naive Bayes

Classification using Naive Bayes achieved accuracies of 99% with a Multinomial model and 98% with a Multivariate model, as shown in Table 14. This was an excellent result, as the accuracies were close to perfect. It can be observed from the confusion matrix, in appendix C, that both models failed on the same chords. Both incorrectly recognised a C as a C Major7 on two occasions, as well as a G Minor as a G 5.

The performance of Naive Bayes could have been improved by applying some feature selection to the conditional probabilities. This would help refine the training data, as erroneous notes in the training data would be ignored. The feature selection could come in the form of mutual information or the Chi square statistic. Another method of improving performance would be to extend the training set. Currently, the training is performed
on 15 examples of each chord. Increasing this value may well increase performance; however, as the accuracy is already at 99%, it may make very little difference.

Table 14: Classification Accuracy for Common Chords

Method                          Accuracy
Nearest Neighbour - Cosine      93.3%
Nearest Neighbour - Hamming     94%
Nearest Neighbour - Euclidean   94%
Multinomial                     98.75%
Multivariate                    99.25%

Complex Four and Five Note Chords

In order to test the robustness of the system, tests had to be carried out on more complex chords. Up to this point, the only four note chords tested were the Major 7ths. Although these are four note chords, the shape played produces a very clear and distinctive sound, so feature extraction was very simple. The complex chords introduced were the four note 6th chords and the five note 6/9th chords. These chords were chosen as their shapes are relatively hard to play, which should introduce errors relating to missing notes. These chords are also played very close to the chords currently in the library, so some confusion is possible.

The addition of the 6th and 6/9th chords had a detrimental effect on the accuracy scores. The accuracy of Nearest Neighbour fell to 80.75% for cosine distance and 79.11% for Hamming and Euclidean distance, as shown in Table 15. This was a disappointing result, as the overall accuracy dropped by a significant 15%. On studying the confusion matrix, it is clear that the 6th chords were recorded poorly, as many only achieved around 12 out of 25 correct examples. This split suggests that either the testing set, which contained 10 examples, or the training set, which contained 15 examples, was of a poorer quality than the other. The performance of the 6th chords is also disappointing when compared to the performance of the 6/9th chords,
which generally scored 18 out of 25. This result showed that when the recordings were of a sufficient quality, the method could still achieve acceptable levels of accuracy.

The accuracy of Naive Bayes also fell, with Multinomial scoring 83.47% and Multivariate scoring 84.5%, as shown in Table 15. Again, as with Nearest Neighbour, this was a significant drop in accuracy. Having studied the confusion matrix, it was also interesting that no real pattern could be found in the incorrect recognitions. The 6th chords were again those that failed most often, although the disparity was not as profound as with Nearest Neighbour: the 6ths were successful on average 5 out of 10 times, compared to 7 out of 10 times for the 6/9ths. The fact that Naive Bayes remained around 5% more accurate than Nearest Neighbour reinforced the advantage that a learnt approach has over a non-learnt approach.

Table 15: Classification Accuracy for Complex Chords

Method                          Accuracy
Nearest Neighbour - Cosine      80.7%
Nearest Neighbour - Hamming     79.11%
Nearest Neighbour - Euclidean   79.11%
Multinomial                     83.47%
Multivariate                    84.5%

Chords on Acoustic Guitar

The pitch class profile classification methods were also tested on chords recorded on an acoustic guitar. The performance of these methods was expected to be lower, as the system had been developed from the beginning to perform on an electric guitar. However, it is still a reasonable test of the robustness of the system. Using Nearest Neighbour, an accuracy of 60% was achieved for all distance measures, whereas using Naive Bayes, an accuracy of 77% was achieved with both the Multinomial and Multivariate models, as shown in Table 16. The confusion matrix produced is in appendix D.

With Nearest Neighbour, it can be observed that where Major chords are recognised
almost all the time, the Minor chords are very poorly recognised, which could be due to one of the higher strings being out of tune when the recordings were made. There were many occasions where Minor chords were recognised as chords a tone away, which suggests that one or two strings were out of tune. There were also some chords, namely F Major, F Major and C Major, which were recognised as chords outside the scope of the confusion matrix. This implies that there were either additional peaks that suggested a four or five note chord, or a missing peak that suggested a fifth chord. These results help to show the advantages of using a learnt approach, as the chords with missing or extra peaks were still classified correctly.

Table 16: Classification Accuracy for Chords on the Acoustic Guitar

Method                          Accuracy
Nearest Neighbour - Cosine      60%
Nearest Neighbour - Hamming     60%
Nearest Neighbour - Euclidean   60%
Multinomial                     77.5%
Multivariate                    77.5%

6. Development of a Real-Time Application

After testing had been completed, a real time system was developed in order to demonstrate the methods implemented in a real world situation. A Graphical User Interface (GUI) was implemented in MATLAB, which could take a live recording of a chord and process it in order to display the chord played back to the user. The GUI was developed in MATLAB because all the implementations of the classification techniques had already been written in MATLAB, therefore requiring little modification.

6.1. System Structure

The system requires several parameters at launch. Firstly, a PCP library, which is used for Nearest Neighbour classification; this is a matrix that contains all the possible chord examples. Secondly, a set of terms; this is a cell array that contains the names of all the chords in the PCP library. Lastly, the training PCPs, which contain all the training sets for the chords.

The first stage in the system is the initialisation process. Here, all input parameters are assigned to handles, which means they can be accessed by the rest of the functions in the GUI. The conditional probabilities for the Multinomial and Multivariate models are also calculated using the training PCPs. These are returned to handles, and the system is ready to be used.

The next stage relies on the user pressing the start recording button. When this is pressed, the getrecording function is called. This initialises an audiorecorder object and begins recording 7 seconds of data. The audio is recorded in stereo, as this is required when recording through a USB interface, and sampled at the same rate used throughout the testing stage. The data is subsequently extracted from the audiorecorder object, converted to mono, and returned to the system.

The audio data is then segmented in order to extract the chord from the recorded audio. The segmenter is the same as the one used during testing and returns the positions of the start and end of the chord. At this point, the segmented chord is plotted in the time domain. The chord is then sent to the getpcp function. The getpcp function first applies a hamming window to the chord, and then applies the fft in order to extract the frequency spectrum. The peaks are then extracted, and from these the PCP is formed. This function also returns the frequency spectrum in addition to the peaks.

The next stage is matching. The matching method used depends on the user's choice. The matching function takes the PCP and returns the name of the chord match, the harmonic of the chord, as well as the PCPs of the top 5 matches. The chord name and harmonic are sent to the chord text box, and the PCPs are sent to a table. The final stage is to play the audio back to the user. This process repeats when the start
recording button is pressed again.

User Interface

Figure 6 shows the layout of the user interface after an A Major chord has been played.

Figure 6: User interface of application

The following are the elements contained in the UI:

1. Classification method - Allows the user to select the type of classification they wish to use.
2. Nearest neighbour similarity measure - A drop down menu that contains the different similarity measures available for Nearest Neighbour classification.
3. Start recording - The user presses this to start the recording. The button turns red whilst the recording is in progress.
4. Play recording - Plays back the chord currently being held by the system. Plays nothing if there is no chord.
5. Peak threshold - Allows the user to alter the peak threshold, which is the amplitude at which values in the frequency spectrum are zeroed. If it is obvious from the frequency plot that low peaks are being missed then the user should lower this value. The default is 30.

6. Onset frame size - Changes the frame size used in the segmenter. This should be changed if a "chord not found" error is being returned.
7. Chord name - The name of the chord recognised, as well as its harmonic.
8. Time plot - The time domain signal of the segmented chord.
9. Frequency plot - The frequency plot of the recognised chord, with the detected peaks indicated.
10. PCP table - Contains the PCP of the chord together with the top 5 matches and their names.

7. Discussion and Future Work

7.1. Discussion

Overall, the methods investigated during the project all worked well. The accuracy achieved for single notes using the mean distance method was rather disappointing. However, the real purpose of this part of the project was to learn how to extract peaks from a frequency spectrum, which would be needed to continue with the project. This was achieved, so the time was well spent. Although two note recognition managed to achieve 90% accuracy, the result was ultimately disappointing, as two note recognition with PCPs produced near 100% accuracy. Nevertheless, the techniques applied during the feature extraction process provided a solid base for the PCP feature extraction, and with hindsight, the time spent on this method may have been better spent improving the PCP methods.

The PCP methods were the greatest success of the project. Despite their relative simplicity, the methods were able to achieve extremely high accuracy for the standard chords, and a decent accuracy for complex chords. It was interesting to note that during the Nearest Neighbour tests, Hamming distance and Euclidean distance gave identical results. It was expected that Hamming distance and cosine distance would be the same throughout,

and that Euclidean distance would be the poor performer, but it proved to be the opposite. However, given the low number of test items, this difference is almost certainly not significant. It was pleasing that the accuracy of Nearest Neighbour remained very high even when the chord library was extended. This meant that a reasonable comparison could be made between the learnt and non-learnt approaches. Classification with Naive Bayes was consistently the highest performer, always outperforming Nearest Neighbour by around 5%. The Multinomial and Multivariate models were almost identical in performance. An interesting extension of Naive Bayes would have been to apply some feature selection during classification, as this is generally a requirement of a Multivariate model.

One of the main objectives of this project was to compare the effectiveness of learnt and non-learnt approaches to recognition. A non-learnt approach has the capacity to recognise any chord without the need for training, whereas a learnt approach requires training. In terms of pure accuracy, the learnt Bayesian approach was the strongest, at an almost perfect 99% for common chords. However, a large amount of training data would need to be acquired to apply this approach to real-time recognition, and it would be hard to apply to other instruments, as a whole other set of training examples would be needed. Therefore it could be argued that Nearest Neighbour would be the correct method choice, as its accuracy was still high enough for use in a real-time system. It also has the advantage of being more flexible to the expansion of the chord library, as only an example PCP is needed for a chord to be recognised. In addition, it can easily be applied to other instruments, as the alterations required to accommodate a new instrument are needed only in the feature extraction process.
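To make the learnt approach concrete, a Multinomial Naive Bayes classifier over PCP bins can be sketched as below. This is a minimal illustrative Python sketch, not the thesis's MATLAB implementation: it treats the 12 PCP bin energies as soft counts and applies Laplace smoothing, and all function names here are assumptions.

```python
import numpy as np

def train_multinomial_nb(train_pcps, labels, n_classes):
    """Minimal Multinomial Naive Bayes over 12-bin PCPs.
    train_pcps: (n_examples, 12) array of non-negative PCP vectors.
    Returns log priors and per-class log bin probabilities."""
    log_prior = np.zeros(n_classes)
    log_cond = np.zeros((n_classes, 12))
    for c in range(n_classes):
        X = train_pcps[labels == c]
        log_prior[c] = np.log(len(X) / len(train_pcps))
        counts = X.sum(axis=0) + 1.0  # Laplace smoothing
        log_cond[c] = np.log(counts / counts.sum())
    return log_prior, log_cond

def classify_nb(pcp, log_prior, log_cond):
    # Score each class: log P(c) + sum_b pcp[b] * log P(bin b | c),
    # then pick the highest-scoring class.
    return int(np.argmax(log_prior + log_cond @ pcp))
```

With a handful of C major (bins 0, 4, 7) and A major (bins 9, 1, 4) training examples, the classifier separates the two templates.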
However, a large chord library presents the problem of slow recognition in real time, as hundreds of distances have to be calculated before being sorted. This problem could be mitigated by writing more efficient code, for example using vectorisation in MATLAB. The demonstration developed was a success, as it was easy to use and performed very quickly in real time. It was, however, a MATLAB interface, and therefore not portable.
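The vectorisation suggested above replaces the per-chord distance loop with whole-library matrix operations. Here is a hedged NumPy sketch of the idea (MATLAB's matrix arithmetic is analogous); the names `nearest_chords` and `cosine_distances` are illustrative, not the project's actual functions.

```python
import numpy as np

def nearest_chords(pcp, library, names, k=5):
    """Rank a chord library against one PCP without an explicit loop.
    library: (n_chords, 12) matrix of example PCPs."""
    # Euclidean distance from the query PCP to every library row at once.
    dists = np.linalg.norm(library - pcp, axis=1)
    order = np.argsort(dists)[:k]
    return [names[i] for i in order]

def cosine_distances(pcp, library):
    # 1 - cosine similarity, computed for all library rows in one shot.
    num = library @ pcp
    den = np.linalg.norm(library, axis=1) * np.linalg.norm(pcp)
    return 1.0 - num / den
```

For a small library of disjoint templates, the query's own template comes back first under either measure.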

7.2. Future Work

In the future, it would be desirable to produce some real-world applications for the system. One such application would be a teaching tool, which would communicate to the user which chord to play and inform them whether the chord was played correctly. This could be extended to instruments beyond the guitar and could prove a very useful tool for self-teaching musicians. It could also prove useful to implement the system on mobile devices, giving musicians portable access to chord recognition, as there are currently almost no applications providing this kind of service. A mobile application was investigated during this project, but the tools for recording pulse code modulated audio on an Android phone are poorly documented, and the lack of an Apple device meant that an iPhone application was not possible.

7.3. Summary

In terms of the project proposal, the aims were to explore the possibility of recognising three note chords and to develop a MATLAB interface that would demonstrate this recognition. In reality, these aims were not only met but exceeded, as the system developed is able to recognise chords with any number of notes, using both a template and a probabilistic approach. Another aim, to test the methods on both electric and acoustic guitars, was also successfully achieved. In terms of the project roadmap, everything was completed in the correct time period; however, with hindsight, less time spent working on the findpeaks function would have been beneficial, as it was replaced by the peak picking function halfway through the project. Overall this was a good, enjoyable and successful project which produced many interesting results, and given time it would be fun to develop the ideas explored further.


A. Single Note and Two Note Recognition Confusion Matrices

B. Nearest Neighbour Confusion Matrix

C. Naive Bayes Confusion Matrix

D. Nearest Neighbour and Naive Bayes for Acoustic Chords

COMPUTATIONAL RHYTHM AND BEAT ANALYSIS Nicholas Berkner. University of Rochester COMPUTATIONAL RHYTHM AND BEAT ANALYSIS Nicholas Berkner University of Rochester ABSTRACT One of the most important applications in the field of music information processing is beat finding. Humans have

More information

Advanced audio analysis. Martin Gasser

Advanced audio analysis. Martin Gasser Advanced audio analysis Martin Gasser Motivation Which methods are common in MIR research? How can we parameterize audio signals? Interesting dimensions of audio: Spectral/ time/melody structure, high

More information

Structure of Speech. Physical acoustics Time-domain representation Frequency domain representation Sound shaping

Structure of Speech. Physical acoustics Time-domain representation Frequency domain representation Sound shaping Structure of Speech Physical acoustics Time-domain representation Frequency domain representation Sound shaping Speech acoustics Source-Filter Theory Speech Source characteristics Speech Filter characteristics

More information

DSP First. Laboratory Exercise #11. Extracting Frequencies of Musical Tones

DSP First. Laboratory Exercise #11. Extracting Frequencies of Musical Tones DSP First Laboratory Exercise #11 Extracting Frequencies of Musical Tones This lab is built around a single project that involves the implementation of a system for automatically writing a musical score

More information

DIGITAL FILTERS. !! Finite Impulse Response (FIR) !! Infinite Impulse Response (IIR) !! Background. !! Matlab functions AGC DSP AGC DSP

DIGITAL FILTERS. !! Finite Impulse Response (FIR) !! Infinite Impulse Response (IIR) !! Background. !! Matlab functions AGC DSP AGC DSP DIGITAL FILTERS!! Finite Impulse Response (FIR)!! Infinite Impulse Response (IIR)!! Background!! Matlab functions 1!! Only the magnitude approximation problem!! Four basic types of ideal filters with magnitude

More information

Fourier Signal Analysis

Fourier Signal Analysis Part 1B Experimental Engineering Integrated Coursework Location: Baker Building South Wing Mechanics Lab Experiment A4 Signal Processing Fourier Signal Analysis Please bring the lab sheet from 1A experiment

More information

INTERNATIONAL BACCALAUREATE PHYSICS EXTENDED ESSAY

INTERNATIONAL BACCALAUREATE PHYSICS EXTENDED ESSAY INTERNATIONAL BACCALAUREATE PHYSICS EXTENDED ESSAY Investigation of sounds produced by stringed instruments Word count: 2922 Abstract This extended essay is about sound produced by stringed instruments,

More information

MUSIC THEORY GLOSSARY

MUSIC THEORY GLOSSARY MUSIC THEORY GLOSSARY Accelerando Is a term used for gradually accelerating or getting faster as you play a piece of music. Allegro Is a term used to describe a tempo that is at a lively speed. Andante

More information

BEAT DETECTION BY DYNAMIC PROGRAMMING. Racquel Ivy Awuor

BEAT DETECTION BY DYNAMIC PROGRAMMING. Racquel Ivy Awuor BEAT DETECTION BY DYNAMIC PROGRAMMING Racquel Ivy Awuor University of Rochester Department of Electrical and Computer Engineering Rochester, NY 14627 rawuor@ur.rochester.edu ABSTRACT A beat is a salient

More information

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE - @ Ramon E Prieto et al Robust Pitch Tracking ROUST PITCH TRACKIN USIN LINEAR RERESSION OF THE PHASE Ramon E Prieto, Sora Kim 2 Electrical Engineering Department, Stanford University, rprieto@stanfordedu

More information

RAM Analytical Skills Introductory Theory Primer Part 1: Intervals Part 2: Scales and Keys Part 3: Forming Chords Within Keys Part 4: Voice-leading

RAM Analytical Skills Introductory Theory Primer Part 1: Intervals Part 2: Scales and Keys Part 3: Forming Chords Within Keys Part 4: Voice-leading RAM Analytical Skills Introductory Theory Primer Part 1: Intervals Part 2: Scales and Keys Part 3: Forming Chords Within Keys Part 4: Voice-leading This is intended to support you in checking you have

More information

How to Utilize a Windowing Technique for Accurate DFT

How to Utilize a Windowing Technique for Accurate DFT How to Utilize a Windowing Technique for Accurate DFT Product Version IC 6.1.5 and MMSIM 12.1 December 6, 2013 By Michael Womac Copyright Statement 2013 Cadence Design Systems, Inc. All rights reserved

More information

Extraction of tacho information from a vibration signal for improved synchronous averaging

Extraction of tacho information from a vibration signal for improved synchronous averaging Proceedings of ACOUSTICS 2009 23-25 November 2009, Adelaide, Australia Extraction of tacho information from a vibration signal for improved synchronous averaging Michael D Coats, Nader Sawalhi and R.B.

More information

Statistical Pulse Measurements using USB Power Sensors

Statistical Pulse Measurements using USB Power Sensors Statistical Pulse Measurements using USB Power Sensors Today s modern USB Power Sensors are capable of many advanced power measurements. These Power Sensors are capable of demodulating the signal and processing

More information

Beginner Guitar Theory: The Essentials

Beginner Guitar Theory: The Essentials Beginner Guitar Theory: The Essentials By: Kevin Depew For: RLG Members Beginner Guitar Theory - The Essentials Relax and Learn Guitar s theory of learning guitar: There are 2 sets of skills: Physical

More information

Connections Power Jack This piano can be powered by current from a standard household wall outlet by using the specified AC adaptor. The power jack is located on the rear panel of the piano body. Make

More information

Synthesis Techniques. Juan P Bello

Synthesis Techniques. Juan P Bello Synthesis Techniques Juan P Bello Synthesis It implies the artificial construction of a complex body by combining its elements. Complex body: acoustic signal (sound) Elements: parameters and/or basic signals

More information

Main Screen Description

Main Screen Description Dear User: Thank you for purchasing the istrobosoft tuning app for your mobile device. We hope you enjoy this software and its feature-set as we are constantly expanding its capability and stability. With

More information

PART I: The questions in Part I refer to the aliasing portion of the procedure as outlined in the lab manual.

PART I: The questions in Part I refer to the aliasing portion of the procedure as outlined in the lab manual. Lab. #1 Signal Processing & Spectral Analysis Name: Date: Section / Group: NOTE: To help you correctly answer many of the following questions, it may be useful to actually run the cases outlined in the

More information

Music I. Marking Period 1. Marking Period 3

Music I. Marking Period 1. Marking Period 3 Week Marking Period 1 Week Marking Period 3 1 Intro. Piano, Guitar, Theory 11 Intervals Major & Minor 2 Intro. Piano, Guitar, Theory 12 Intervals Major, Minor, & Augmented 3 Music Theory meter, dots, mapping,

More information

Discrete Fourier Transform

Discrete Fourier Transform 6 The Discrete Fourier Transform Lab Objective: The analysis of periodic functions has many applications in pure and applied mathematics, especially in settings dealing with sound waves. The Fourier transform

More information

Design of FIR Filters

Design of FIR Filters Design of FIR Filters Elena Punskaya www-sigproc.eng.cam.ac.uk/~op205 Some material adapted from courses by Prof. Simon Godsill, Dr. Arnaud Doucet, Dr. Malcolm Macleod and Prof. Peter Rayner 1 FIR as a

More information

Definition of Basic Terms:

Definition of Basic Terms: Definition of Basic Terms: Temperament: A system of tuning where intervals are altered from those that are acoustically pure (Harnsberger, 1996, p. 130) A temperament is any plan that describes the adjustments

More information

Automatic Amplitude Estimation Strategies for CBM Applications

Automatic Amplitude Estimation Strategies for CBM Applications 18th World Conference on Nondestructive Testing, 16-20 April 2012, Durban, South Africa Automatic Amplitude Estimation Strategies for CBM Applications Thomas L LAGÖ Tech Fuzion, P.O. Box 971, Fayetteville,

More information

The Discrete Fourier Transform. Claudia Feregrino-Uribe, Alicia Morales-Reyes Original material: Dr. René Cumplido

The Discrete Fourier Transform. Claudia Feregrino-Uribe, Alicia Morales-Reyes Original material: Dr. René Cumplido The Discrete Fourier Transform Claudia Feregrino-Uribe, Alicia Morales-Reyes Original material: Dr. René Cumplido CCC-INAOE Autumn 2015 The Discrete Fourier Transform Fourier analysis is a family of mathematical

More information

Speech and Music Discrimination based on Signal Modulation Spectrum.

Speech and Music Discrimination based on Signal Modulation Spectrum. Speech and Music Discrimination based on Signal Modulation Spectrum. Pavel Balabko June 24, 1999 1 Introduction. This work is devoted to the problem of automatic speech and music discrimination. As we

More information

Topic 6. The Digital Fourier Transform. (Based, in part, on The Scientist and Engineer's Guide to Digital Signal Processing by Steven Smith)

Topic 6. The Digital Fourier Transform. (Based, in part, on The Scientist and Engineer's Guide to Digital Signal Processing by Steven Smith) Topic 6 The Digital Fourier Transform (Based, in part, on The Scientist and Engineer's Guide to Digital Signal Processing by Steven Smith) 10 20 30 40 50 60 70 80 90 100 0-1 -0.8-0.6-0.4-0.2 0 0.2 0.4

More information

PHYSICS AND THE GUITAR JORDY NETZEL LAKEHEAD UNIVERSITY

PHYSICS AND THE GUITAR JORDY NETZEL LAKEHEAD UNIVERSITY PHYSICS AND THE GUITAR JORDY NETZEL LAKEHEAD UNIVERSITY 2 PHYSICS & THE GUITAR TYPE THE DOCUMENT TITLE Wave Mechanics Starting with wave mechanics, or more specifically standing waves, it follows then

More information

Measurement of RMS values of non-coherently sampled signals. Martin Novotny 1, Milos Sedlacek 2

Measurement of RMS values of non-coherently sampled signals. Martin Novotny 1, Milos Sedlacek 2 Measurement of values of non-coherently sampled signals Martin ovotny, Milos Sedlacek, Czech Technical University in Prague, Faculty of Electrical Engineering, Dept. of Measurement Technicka, CZ-667 Prague,

More information

The Fundamentals of Mixed Signal Testing

The Fundamentals of Mixed Signal Testing The Fundamentals of Mixed Signal Testing Course Information The Fundamentals of Mixed Signal Testing course is designed to provide the foundation of knowledge that is required for testing modern mixed

More information

FFT 1 /n octave analysis wavelet

FFT 1 /n octave analysis wavelet 06/16 For most acoustic examinations, a simple sound level analysis is insufficient, as not only the overall sound pressure level, but also the frequency-dependent distribution of the level has a significant

More information

CSC475 Music Information Retrieval

CSC475 Music Information Retrieval CSC475 Music Information Retrieval Sinusoids and DSP notation George Tzanetakis University of Victoria 2014 G. Tzanetakis 1 / 38 Table of Contents I 1 Time and Frequency 2 Sinusoids and Phasors G. Tzanetakis

More information