Drum Transcription Based on Independent Subspace Analysis

Report for EE 391: Special Studies and Reports for Electrical Engineering

Yinyi Guo, Center for Computer Research in Music and Acoustics, Stanford, CA
hugo@ccrma.stanford.edu

ABSTRACT

In automatic music transcription, metadata extraction from recorded audio, and speaker separation in video conferencing, a significant prerequisite task is to analyze and separate the audio signal into its original source components. In this report, I study and analyze a set of methods for extracting percussive-instrument metadata from polyphonic music. The report focuses mainly on the audio source separation stage, built on Principal Component Analysis, Independent Component Analysis, and Non-negative Matrix Factorization. With these spectrogram decomposition methods, several music samples have been analyzed. The results are very encouraging when the goal is the extraction of non-pitch information rather than perfect note-to-note transcription.

1. MOTIVATION

Rhythm is an essential concept in musical structure, and because percussive instruments contribute strongly to the rhythmical impression, drum scores are a prerequisite for any further high-level description of rhythmical content. A drum transcription is described by symbolic metadata comprising onset times and the types of drums. This rhythmical-pattern information enables further categorization of musical content, such as genre classification and mood analysis. The measurement of less subjective musical elements, like tempo and meter, also benefits significantly from the availability of a drum score. In addition, techniques such as drum replacement in recorded audio and automatic generation of drum scores from recorded music have become increasingly popular in today's musical entertainment industry, for example in video games and iPhone applications. Thus, automated drum-score transcription can contribute immensely to today's music retrieval algorithms and stimulate the development of a variety of applications in the audio industry.

2. SYSTEM OVERVIEW

Figure 1: Drum Transcription System Overview (processing chain: PCM audio signals -> time-frequency transformation -> peak picking and onset detection -> Principal Component Analysis (PCA) -> Non-negative Independent Component Analysis (ICA) / Non-negative Matrix Factorization (NMF) -> feature extraction -> source classification and onset acceptance, using trained source spectral profiles -> symbolic data transcription and MIDI synthesis -> drum scores)

An overview of the drum transcription system is presented in Figure 1. The digital audio signals fed into the signal processing chain are mono files with 16 bits per sample at a sampling frequency of 44.1 kHz. A spectral representation of the pre-processed time signal is computed using the Short-Time Fourier Transform (STFT). After differentiation and half-wave rectification of the magnitude spectrogram, a non-negative difference spectrogram is obtained for further processing. The detection of multiple local maxima associated with transient onset events in the musical signal is then carried out with a basic peak-picking method. The main idea of the further processing is to store a short excerpt of the difference spectrogram at the time of each onset t. From these onset frames, the significant spectral profiles are gathered in the next stages: PCA for dimensionality reduction and ICA for audio source separation. The subsequent sections give a more in-depth account of the source separation stage embedded in the whole signal processing chain.

3. AUDIO SOURCE SEPARATION

3.1 Principal Component Analysis (PCA)

Principal Component Analysis is a way of identifying patterns in data and expressing the data so as to highlight their similarities and differences. Since patterns can be difficult to find in high-dimensional data such as graphical, audio, or video representations, PCA is a powerful method for analyzing them. A further advantage is that PCA can reduce the dimensionality of the original data without losing much information. The preceding steps yield the times of occurrence t and the spectral composition of the onsets. In order to find only a limited number of significant pattern subspaces in the high-dimensional onset spectra, PCA is applied to reduce the dimensionality of the percussive sources. The whole set of collected spectra can thereby be broken down into a relatively small number of decorrelated principal components, resulting in a good representation of the original spectra with small reconstruction error. First, we calculate the covariance matrix of the mean-subtracted spectral vectors. Next, we compute the covariance matrix's eigenvectors, which are orthogonal to each other, and their corresponding eigenvalues. From the set of eigenvectors, the ones belonging to the d largest eigenvalues are chosen to provide the coefficients for a linear combination of the original spectral vectors, according to X' = T X, where T denotes the transformation matrix for the dimensionality reduction whose rows are the chosen eigenvectors; the eigenvectors with the highest eigenvalues are the principal components of the data set. The components in X' are decorrelated from each other and variance-normalized, and can subsequently be passed to the ICA stage described in the next section.
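As a rough illustration of this PCA stage (a minimal NumPy sketch, not the author's implementation; the matrix orientation and the variance-normalization step are my assumptions), the reduction can be written in a few lines:

```python
import numpy as np

def pca_reduce(X, d):
    """Reduce a set of onset spectra to d decorrelated components.

    X: array of shape (n_frames, n_bins), one onset spectrum per row.
    Returns the d-dimensional, variance-normalized projections.
    """
    Xc = X - X.mean(axis=0)               # mean-subtract each frequency bin
    C = np.cov(Xc, rowvar=False)          # covariance matrix of the spectra
    evals, evecs = np.linalg.eigh(C)      # symmetric eigendecomposition (ascending)
    order = np.argsort(evals)[::-1][:d]   # indices of the d largest eigenvalues
    T = evecs[:, order]                   # transformation matrix (principal axes)
    Y = Xc @ T                            # project onto the principal components
    return Y / (Y.std(axis=0) + 1e-12)    # variance-normalize for the ICA stage

# Example: 100 synthetic "onset spectra" with 513 bins, reduced to 5 components
rng = np.random.default_rng(0)
spectra = np.abs(rng.standard_normal((100, 513)))
comps = pca_reduce(spectra, 5)
print(comps.shape)  # (100, 5)
```

Because the retained axes are eigenvectors of the covariance matrix, the resulting components are decorrelated, which is exactly the property the next stage relies on.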
3.2 Non-negative Independent Component Analysis (ICA)

Non-negative Independent Component Analysis is one approach within Independent Component Analysis for separating a set of linearly mixed signals into their original sources. A requirement for optimum performance of the algorithm is the statistical independence of the sources, which is prepared for by the preceding PCA stage. Non-negative ICA uses the intuitive concept of optimizing a cost function that describes the non-negativity of the components. This cost function is related to the reconstruction error introduced by axis-pair rotations of two or more variables in the positive quadrant of the joint probability density function [2]. The algorithm rests on two assumptions: the original source signals are non-negative and, to some extent, linearly independent. The first constraint is always fulfilled, because the vectors processed by ICA come from the amplitude spectrogram X, which was differentiated and half-wave rectified in an earlier stage and therefore contains no negative values. As to the second constraint, the spectra collected at onset times can be regarded as superpositions of a small set of original source spectra representing the involved percussive instruments. It may safely be assumed that there are characteristic properties inherent to the spectral profiles of drum sounds [4][1] that allow us to separate the whitened components into their potential percussive sources F according to F = A X', where X' denotes the whitened components delivered by the PCA stage, and A denotes the d x d un-mixing matrix iteratively estimated by the ICA optimization process [2], which separates the individual components. The source vectors F are named spectral profiles [4]. The original spectral profiles of the involved percussive instruments, used as the training reference, are shown in Figure 2, and the spectral profiles for one particular input sample are shown in Figure 3.

Figure 2: Spectral profiles reference
Figure 3: Spectral profiles of input sample

3.3 Extraction of Amplitude Bases

Once a certain number of spectral profiles has been computed, they can be used to extract the spectrogram's amplitude bases, from here on referred to as amplitude envelopes, according to E = F X; no further ICA computation is applied in this extraction. The extracted amplitude envelopes E nevertheless provide decent detection functions, with peaks and plateaus, for the following detection and classification stage.
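The envelope extraction E = F X is a single matrix product. A minimal sketch (the matrix shapes, with profiles as rows, are my assumption):

```python
import numpy as np

def amplitude_envelopes(F, X):
    """Extract per-source amplitude envelopes E = F X.

    F: spectral profiles, shape (n_sources, n_bins)
    X: non-negative difference spectrogram, shape (n_bins, n_frames)
    Returns E with shape (n_sources, n_frames): one detection
    function per percussive source, with peaks at onset events.
    """
    return F @ X

# Toy example: two "profiles" picking out low and high frequency bins
F = np.array([[1.0, 1.0, 0.0, 0.0],    # low-frequency profile (e.g. bass drum)
              [0.0, 0.0, 1.0, 1.0]])   # high-frequency profile (e.g. hi-hat)
X = np.array([[2.0, 0.0],
              [2.0, 0.0],
              [0.0, 3.0],
              [0.0, 3.0]])             # 4 bins x 2 frames
E = amplitude_envelopes(F, X)
print(E)  # [[4. 0.]
          #  [0. 6.]]
```

Each row of E then reads as a time series of how strongly that source's spectrum is active in each frame.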

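The onset acceptance applied to such envelopes in Section 3.5 (peak picking against a threshold that adapts to the surrounding energy) might be sketched like this; the window length and threshold factor are assumed values, not taken from the paper:

```python
import numpy as np

def pick_onsets(env, win=7, factor=1.5):
    """Accept onset candidates from an amplitude envelope.

    A sample is accepted as an onset if it is a local maximum and
    exceeds an adaptive threshold: `factor` times the mean envelope
    value in a surrounding window of `win` samples (both assumptions).
    """
    onsets = []
    half = win // 2
    for t in range(1, len(env) - 1):
        if env[t] >= env[t - 1] and env[t] > env[t + 1]:   # local maximum
            lo, hi = max(0, t - half), min(len(env), t + half + 1)
            threshold = factor * np.mean(env[lo:hi])       # adaptive threshold
            if env[t] > threshold:
                onsets.append(t)
    return onsets

env = np.array([0.1, 0.2, 3.0, 0.4, 0.1, 0.2, 2.5, 0.3, 0.1])
print(pick_onsets(env))  # [2, 6]
```

Tying the threshold to the local mean lets the same factor work for loud and quiet passages, which is the point of making it adaptive.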
Figure 4: Extracted Amplitude Envelopes

3.4 Non-negative Matrix Factorization (NMF)

Non-negative Matrix Factorization (NMF) [3] is another approach related to Independent Component Analysis that has been used successfully in several unsupervised learning tasks and also in the analysis of music signals. It provides an alternative way of computing the sources' spectral profiles and amplitude envelopes. For music signals, NMF has been used to separate the input signal into a sum of sources, each of which has a fixed spectrum and a time-varying gain. This model suits drum signals quite well. The signal model for the spectrum X_t(f) in frame t can be written as a weighted sum of N source spectra S_n(f):

X_t(f) = sum_{n=1}^{N} a_{n,t} S_n(f),

where the S_n(f) are the spectral profiles of the involved drum sounds, as deduced from the ICA computation, and the gains a_{n,t} correspond to the extracted amplitude envelopes discussed in the last section, i.e. percussive-event detection functions with peaks and plateaus. In NMF, both the spectra S_n(f) and the gains a_{n,t} are restricted to be non-negative; for audio source separation this means the spectrograms are purely additive. It has turned out that the non-negativity constraint alone is sufficient for separating sources [5]. The NMF method resembles the ICA computation in that a cost function between the observed spectrum and the model is minimized until convergence. The divergence is minimized by an iterative algorithm that uses multiplicative updates [3]. The main difference between ICA and NMF is that in NMF both the spectral profiles and the gain envelopes are obtained by the iterative estimation algorithm.

3.5 Components Classification and Onset Acceptance

The assignment of spectral profiles to the pre-trained profiles of drum instruments is carried out by a simple k-nearest-neighbor classifier, with spectral profiles of single percussive instruments as the training database.
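A minimal sketch of this assignment step (a 1-nearest-neighbor version of the classifier, with toy reference vectors rather than the paper's trained drum profiles; following the paper, the similarity is the correlation coefficient between incoming and reference profile):

```python
import numpy as np

def classify_profile(profile, references):
    """Assign a separated spectral profile to the drum class whose
    reference profile has the highest correlation coefficient.

    references: dict mapping class name -> reference spectral profile.
    """
    best, best_r = None, -np.inf
    for name, ref in references.items():
        r = np.corrcoef(profile, ref)[0, 1]   # correlation coefficient
        if r > best_r:
            best, best_r = name, r
    return best

# Toy reference database (hypothetical profiles over 4 frequency bins)
refs = {
    "bass drum": np.array([1.0, 0.8, 0.1, 0.0]),  # energy in low bins
    "hi-hat":    np.array([0.0, 0.1, 0.9, 1.0]),  # energy in high bins
}
incoming = np.array([0.9, 0.7, 0.2, 0.1])
print(classify_profile(incoming, refs))  # bass drum
```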
The distance function is derived from the correlation coefficient between the reference profile and the incoming profile. Drum-like onset events are detected in the amplitude envelopes using a traditional peak-picking approach. The magnitude of the amplitude envelope at each onset candidate's position is assigned to that candidate; if this value exceeds a predetermined adaptive threshold, the onset is accepted. The threshold varies over time according to the amount of energy in a larger range surrounding the onset.

4. RESULTS

To quantify the abilities of the algorithm, ground-truth drum scores for 20 excerpts were extracted from the corresponding MIDI files as a reference. Each excerpt is 40 seconds long, at a 44.1 kHz sampling rate and 16-bit quantization resolution. The examples span several musical genres, featuring rock, pop, Latin, and rap. They were chosen for their distinct musical characteristics and with the intention of confronting the system with a significant variety

of possible percussive instruments and sounds. In this research, we only consider a limited number of percussive instruments: bass drum, snare drum, hi-hat, cymbal, and tom. The results shown in Table 1 are based on the ICA algorithm (NMF is still under implementation) and are evaluated with the standard statistical measures of precision rate, recall rate, and F-score.

Class        Precision Rate    Recall Rate    F-Score
Bass Drum    90%               93%            91%
Snare Drum   82%               88%            85%
Hi-Hat       74%               85%            79%
Cymbal       68%               73%            70%
Tom          73%               84%            78%

Table 1: Drum Transcription Results

From the results above, we find that the recall rate is generally better than the precision rate, which means the detection system is over-sensitive and reports additional non-drum events. In addition, the results for high-frequency instruments like hi-hat and cymbal are less accurate than those for the lower-sounding drums. This is because the presence of very prominent and dynamic sustained harmonic sources, such as an expressive singing voice or guitar solos, tends to degrade the purity of the separated sources and thereby increases the number of detected onsets.

5. CONCLUSIONS

This report presented a source separation method, independent subspace analysis, for the automatic detection and classification of percussive instruments in recorded audio signals. The results are promising when the goal is the extraction of non-pitch information rather than perfect note-to-note transcription. Further improvements will address the classification and onset-acceptance stages by seeking more adaptive methods. In addition, the training data for the spectral profiles should be improved by using larger and more standard datasets.

6. REFERENCES

[1] C. Uhle, C. Dittmar, and T. Sporer, "Extraction of Drum Tracks from Polyphonic Music Using Independent Subspace Analysis," in Proc. of the Fourth International Symposium on Independent Component Analysis, Nara, Japan, 2003.

[2] M. Plumbley, "Algorithms for Non-Negative Independent Component Analysis," IEEE Transactions on Neural Networks, 14(3), pp. 534-543, May 2003.

[3] D. D. Lee and H. S. Seung, "Algorithms for Non-negative Matrix Factorization," in T. Leen, T. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems, volume 13, pages 556-562. MIT Press, 2001.

[4] C. Dittmar and C. Uhle, "Further Steps towards Drum Transcription of Polyphonic Music," in Proc. of the AES 116th Convention, Berlin, 2004.

[5] J. Paulus and T. Virtanen, "Drum Transcription with Non-negative Spectrogram Factorisation," submitted to EUSIPCO 2005, Antalya, Turkey, Sept. 4-8, 2005.