Electric Guitar Pickups Recognition

Warren Jonhow Lee (warrenjo@stanford.edu)
Yi-Chun Chen (yichunc@stanford.edu)

Abstract

Electric guitar pickups convert the vibration of strings into electric signals and thus directly affect the quality of the sound. In this project, two machine learning methods, support vector machines (SVM) and Bayesian networks, are applied to classify pickups from sixteen audio features. The results show that an SVM with a linear kernel and a low penalty term is a good classifier, reaching 85% accuracy on both training and test data. Bayesian networks, although slightly weaker at classification, can easily incorporate more variables and could lead to a price prediction model for guitars.

1 Introduction

Pickups are electric transducers that capture the vibrations of guitar strings and convert them to electric signals. There are two commonly used types of pickup, single coil and humbucker, shown in Figure 1. Since pickups directly affect the sound of a guitar, it should in principle be possible to classify them by extracting features from audio recordings and learning from those features. On the other hand, guitar pedals, such as an overdrive effect, distort the sound and thus decrease the classification accuracy. Therefore, the guitar sound used in this project should be clean and recorded directly from the amplifier or line in.

Figure 1: Two guitar pickups: single coil (left) and humbuckers (right)

2 Data Extraction

In this project, data extraction consists of two stages: preprocessing and feature extraction. In the first stage, silence and noise are removed from the original audio recordings, since they contribute nothing to the later learning process. This removal is done with an audio segmentation algorithm [1], which is demonstrated in Figure 2. The top plot shows the original audio recording. The bottom plot shows how the audio segmentation algorithm uses an SVM to distinguish high-energy from low-energy short-term frames. The high-energy frames correspond to the desired learning samples. The low-energy frames are considered noise or silence and are therefore discarded.

Figure 2: Demonstration of the audio segmentation algorithm. The low-energy frames, such as the rightmost one in the figure, are classified as noise/silence and thus discarded. The high-energy frames are retained for the later learning process.

After preprocessing, sixteen features are extracted from the audio signals: thirteen Mel-frequency cepstral coefficients (MFCCs), spectral spread, spectral centroid and spectral flatness. MFCCs are commonly used in speech recognition systems as a compact representation of the short-term power spectrum of a sound. Spectral centroid is associated with the brightness of a sound. Spectral spread measures the bandwidth of the spectrum. Spectral flatness represents the noisiness of the power spectrum. The MFCCs and the other three spectral features of a sound are shown in Figure 3 and Figure 4.

Figure 3: Variation of thirteen Mel-frequency cepstral coefficients with respect to time frames.

Figure 4: Variation of three spectral features with respect to time frames

3 Support Vector Machines

After the features are obtained, an SVM is applied to classify the two pickups. Note that the training data is arranged chronologically, since the temporal structure of music cannot be ignored. In our tests, this arrangement improved the learning curves.

The SVM is applied with several kernels and various values of the penalty C. The following four plots show the learning curves of the SVM with a linear kernel. In each plot, the green curve is the training score (accuracy) versus the size of the training data. The blue curve is the cross-validation score versus the size of the training data, which can be considered the test accuracy. The desired result is that the green curve and the blue curve converge to the same value. As shown in the figures, the SVM with a linear kernel and a low penalty, C = 0.001, achieves such convergence.

Figure 5: SVM with linear kernel and penalty C = 1

Figure 6: SVM with linear kernel and penalty C = 0.1

Figure 7: SVM with linear kernel and penalty C = 0.01

Figure 8: SVM with linear kernel and penalty C = 0.001

An SVM with a polynomial kernel, whose basis is $\{1, x, x^2, x^3\}$, is also tested. The result is shown in the following two plots. They illustrate that the penalty does not affect the learning curves under the polynomial kernel. In addition, the learning curves indicate that the SVM with a polynomial kernel is over-fitting, since the gap between training accuracy and test accuracy is large.

Figure 9: SVM with polynomial kernel and penalty C = 1

Figure 10: SVM with polynomial kernel and penalty C = 0.001
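The learning-curve procedure described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the project: the linear SVM is a hand-written batch subgradient descent on the objective 0.5*||w||^2 + C * sum(hinge loss), the data are synthetic two-dimensional points standing in for the sixteen audio features, and the helper names (train_linear_svm, learning_curve, make_toy_data) are hypothetical.

```python
import random

def train_linear_svm(X, y, C, epochs=200, lr=0.01):
    """Batch subgradient descent on 0.5*||w||^2 + C * sum(hinge loss)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)          # gradient of the regulariser 0.5*||w||^2
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:    # margin violation: the hinge term is active
                grad_w = [g - C * yi * xj for g, xj in zip(grad_w, xi)]
                grad_b -= C * yi
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def accuracy(w, b, X, y):
    """Fraction of samples whose predicted sign matches the label."""
    hits = 0
    for xi, yi in zip(X, y):
        pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1
        hits += pred == yi
    return hits / len(y)

def learning_curve(X, y, sizes, C, n_val):
    """Train on growing prefixes; score on the prefix and on a held-out tail."""
    X_val, y_val = X[-n_val:], y[-n_val:]
    curve = []
    for m in sizes:
        w, b = train_linear_svm(X[:m], y[:m], C)
        train_acc = accuracy(w, b, X[:m], y[:m])
        val_acc = accuracy(w, b, X_val, y_val)
        curve.append((m, train_acc, val_acc))
    return curve

def make_toy_data(n, seed=0):
    """Synthetic stand-in for the audio features: two Gaussian clusters."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.choice([-1, 1])
        X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0)])
        y.append(label)
    return X, y

X, y = make_toy_data(200)
curve = learning_curve(X, y, sizes=[20, 40, 80, 160], C=0.001, n_val=40)
for m, train_acc, val_acc in curve:
    print(f"n={m:3d}  train={train_acc:.2f}  validation={val_acc:.2f}")
```

Training on prefixes of the data mirrors the chronological arrangement mentioned above. A large gap between the two printed columns would correspond to the over-fitting seen with the polynomial kernel, while similar values correspond to the convergence reported for the linear kernel with C = 0.001.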

Table 1: Applying the learned SVM to audio files that come from different players on different guitars.

Testing File Name            Accuracy   Sample Size
SingleCoil1 (single note)    41.57%     777
SingleCoil2 (mixture)        96.09%     179
SingleCoil3 (mixture)        87.26%     377
Humbucker1 (mixture)         72.77%     459
Humbucker2 (mixture)         91.5%      459

After learning the desired SVM (linear kernel and penalty 0.001), the next step is to test it on new audio files [2], which contain different pitches, different playing techniques, and different tones. Table 1 shows the accuracy of the learned SVM on five test audio files. SingleCoil1 is composed of only one note, and the other four are mixtures of chords and notes. Table 1 indicates that the SVM performed poorly on the single-note audio file. This matches our expectation, since the SVM was learned from audio files containing several chords and notes. On the other four audio files, the learned SVM reaches accuracies between 72.77% and 96.09%. This demonstrates that an SVM is a good classifier for guitar pickups, even when the recordings come from different players on different guitars.

4 Bayesian Networks

A Bayesian network is a probabilistic graphical model that represents random variables and their conditional dependencies via a directed acyclic graph. It has been widely applied in artificial intelligence, medical diagnosis, and other fields. However, this pickup classification problem poses two challenges. First, the structure of the Bayesian network is not known in advance. Second, the feature data are continuous-valued. To address these problems, recent research by one team member at the Stanford Intelligent System Lab is used. The research applies Bayesian statistics with the proposed priors to find the most probable discretization policy for each continuous variable according to the data of the variables in its Markov blanket. In addition, the discretization procedure is combined with the K2 structure learning algorithm to learn a discrete Bayesian network.
For more detail, please refer to [3].

Once the discrete Bayesian network is learned from the continuous data, the prediction on testing data is done as follows: assume $X_n$ is the categorical variable and $(x_1, x_2, \ldots, x_n)$ is the testing data; then the prediction is made by calculating $P(X_n \mid x_1, x_2, \ldots, x_{n-1}) \propto P(X_n, x_1, x_2, \ldots, x_{n-1})$ and choosing the value of $X_n$ with the higher probability. Notice that the joint probability on the right-hand side can be factorized as $P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{parent}_{x_i})$.

Figure 11 shows the learned discrete Bayesian network. To reduce the runtime, only seven important features (MFCC2 to MFCC6, spectral spread, spectral flatness) are used in the learning process, and the number of parents for each node is limited to two. This network has 93% accuracy on the training data and 70% accuracy over all testing data in Table 1 except SingleCoil1. The performance is slightly weaker than that of the SVM.

Although Bayesian networks performed worse than the SVM in the classification problem, they have an advantage that an SVM cannot easily match: other features and variables can be incorporated into the network. For example, in future work, guitar brands and wood materials obtained from image processing of videos might be introduced to determine the price of guitars along with pickup information. Figure 12 shows a possible network, where price is assumed to be directly affected by pickups, wood materials, and brands.
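The prediction rule described earlier in this section (score each candidate pickup value by the factorized joint probability, then normalize) can be illustrated with a small Python sketch. Everything concrete here is an assumption made for illustration: the network structure (SS -> M2, {SS, M2} -> PU, PU -> SF), the "low"/"high" bins, the probability tables, and the function names are hypothetical and are not the learned network of Figure 11.

```python
def joint_probability(assignment, parents, cpts):
    """P(x_1, ..., x_n) as the product of P(x_i | parents of x_i)."""
    p = 1.0
    for var, table in cpts.items():
        parent_values = tuple(assignment[q] for q in parents[var])
        p *= table[parent_values][assignment[var]]
    return p

def predict(evidence, target, target_values, parents, cpts):
    """Score each target value by the joint, then normalise to a posterior."""
    scores = {}
    for value in target_values:
        full_assignment = dict(evidence)
        full_assignment[target] = value
        scores[value] = joint_probability(full_assignment, parents, cpts)
    total = sum(scores.values())
    posterior = {value: s / total for value, s in scores.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior

# Hypothetical structure: each node has at most two parents.
parents = {"SS": (), "M2": ("SS",), "PU": ("SS", "M2"), "SF": ("PU",)}

# Hypothetical conditional probability tables over discretised values.
cpts = {
    "SS": {(): {"low": 0.5, "high": 0.5}},
    "M2": {
        ("low",): {"low": 0.7, "high": 0.3},
        ("high",): {"low": 0.2, "high": 0.8},
    },
    "PU": {
        ("low", "low"): {"single": 0.9, "humbucker": 0.1},
        ("low", "high"): {"single": 0.6, "humbucker": 0.4},
        ("high", "low"): {"single": 0.4, "humbucker": 0.6},
        ("high", "high"): {"single": 0.2, "humbucker": 0.8},
    },
    "SF": {
        ("single",): {"low": 0.3, "high": 0.7},
        ("humbucker",): {"low": 0.6, "high": 0.4},
    },
}

evidence = {"SS": "high", "M2": "high", "SF": "low"}
label, posterior = predict(evidence, "PU", ["single", "humbucker"], parents, cpts)
print(label, posterior)
```

Because only the ratio of the two joint probabilities matters, the normalization step is optional for classification; it is kept here so that the output can be read as a posterior over the pickup type.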

Figure 11: The learned Bayesian network from seven selected features and the pickup. M stands for MFCC, SS stands for spectral spread, SF stands for spectral flatness, and PU stands for pickups.

Figure 12: A possible Bayesian network to predict the price of guitars from videos. AU stands for audio features, PU stands for pickups, WD stands for wood materials, BD stands for brands, and PZ stands for price. The audio process box corresponds to the network in Figure 11. Wood materials and brands can be learned by image processing. Pickups, wood materials, and brands can be used to predict the prices of guitars.

5 Conclusion

In this project, an SVM with a linear kernel is shown to be a good classifier for electric guitar pickups. For audio files with clean sound recorded directly from the amplifier or line in, the SVM has 85% accuracy. The Bayesian network, which performs worse than the SVM, has 70% accuracy but supports a wider variety of models. These results are promising, since the audio data come from different players on different guitars of different brands. However, for more general applications, such as learning from arbitrary audio files, the method proposed in this project is not feasible. A mixture of guitar sounds and other audio sources would significantly reduce the prediction accuracy. Therefore, in the future, a method to separate guitar sounds from other sources might be introduced as a step before pickup classification.

References

[1] Theodoros Giannakopoulos and Aggelos Pikrakis, Introduction to Audio Analysis: A MATLAB Approach. Academic Press, 2014.
[2] Ted Drozdowski, http://www.gibson.com/news-lifestyle/features/en-us/tone-hunting-0309-2011.aspx.
[3] Yi-Chun Chen, Tim Wheeler, and Mykel Kochenderfer, Learn Discrete Bayesian Network from Continuous Data, http://arxiv.org/abs/1512.02406, submitted to Machine Learning.