SUB-BAND INDEPENDENT SUBSPACE ANALYSIS FOR DRUM TRANSCRIPTION. Derry FitzGerald, Eugene Coyle


Derry FitzGerald, Eugene Coyle
D.I.T., Rathmines Rd, Dublin, Ireland
derryfitzgerald@dit.ie, eugene.coyle@dit.ie

Bob Lawlor
Department of Electronic Engineering, National University of Ireland, Maynooth
rlawlor@eeng.may.ie

ABSTRACT

While Independent Subspace Analysis provides a means of separating sound sources from a single channel signal, making it an effective tool for drum transcription, it does have a number of problems. Not least of these is that the amount of information required to allow separation of sound sources varies from signal to signal. To overcome this indeterminacy and improve the robustness of transcription, an extension of Independent Subspace Analysis to include sub-band processing is proposed. The use of this approach is demonstrated by its application in a simple drum transcription algorithm.

1. INTRODUCTION

1.1. Independent Subspace Analysis

Independent Subspace Analysis (ISA) was first proposed by Casey and Westner as a means of sound source separation from single channel mixtures of sounds [1]. ISA is based on the concept of reducing redundancy in time-frequency representations of signals, and represents sound sources as low dimensional subspaces in the time-frequency plane. ISA makes a number of assumptions about the nature of the signal and the sound sources present in the signal. The first of these is that the single channel sound mixture signal is a sum of $p$ unknown independent sources,

$$s(t) = \sum_{q=1}^{p} s_q(t) \tag{1}$$

Carrying out a Short-Time Fourier Transform (STFT) on the signal and using the magnitudes of the coefficients obtained yields a spectrogram of the signal, $Y$, of dimension $n \times m$, where $n$ is the number of frequency channels and $m$ is the number of time slices. From this it can be seen that each column of $Y$ contains a vector which represents the frequency spectrum at time $j$, with $1 \le j \le m$. Similarly each row can be seen as the evolution of frequency channel $k$ over time, with $1 \le k \le n$.
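As an illustration of the spectrogram $Y$ described above, the following is a minimal sketch in Python with NumPy. The function name `magnitude_spectrogram`, the Hann window, the frame length of 512 samples and the hop of 256 samples are all assumptions made for this example; the paper does not state its STFT parameters.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=512, hop=256):
    """Return Y (n x m): magnitudes of a Hann-windowed STFT.

    frame_len and hop are illustrative choices, not values from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Each column holds one windowed time slice of the signal.
    frames = np.stack(
        [signal[i * hop:i * hop + frame_len] * window for i in range(n_frames)],
        axis=1,
    )
    # Rows are frequency channels (n = frame_len // 2 + 1), columns are time slices.
    return np.abs(np.fft.rfft(frames, axis=0))

# Toy input: one second of a 1 kHz sine sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)
Y = magnitude_spectrogram(x)
print(Y.shape)  # (n frequency channels, m time slices)
```

Each column of `Y` is the magnitude spectrum of one time slice, and each row traces one frequency channel over time, matching the row and column interpretation given in the text.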
It is assumed that the overall spectrogram $Y$ results from the superposition of $l$ unknown independent spectrograms $Y_j$. As the superposition of spectrograms is a linear operation in the time-frequency plane this yields:

$$Y = \sum_{j=1}^{l} Y_j \tag{2}$$

It is then assumed that each of the $Y_j$ can be uniquely represented by the outer product of an invariant frequency basis function $f_j$, and a corresponding invariant amplitude envelope or weighting function $t_j$ which describes the variations in amplitude of the frequency basis function over time. This yields

$$Y_j = f_j t_j^T \tag{3}$$

Summing the $Y_j$ yields

$$Y = \sum_{j=1}^{l} f_j t_j^T \tag{4}$$

In practice the assumption that the frequency basis functions are stationary means that no change in pitch can occur within the spectrogram. Casey and Westner overcome this limitation by breaking the signal into smaller blocks, inside of which the pitch can be considered stationary. However when dealing with sources that can be assumed to be stationary in pitch, such as most drum sounds, this step can be removed.

The independent basis functions correspond to features of the independent sources, and each source is composed of a number of these independent basis functions. The basis functions that compose a sound source form a low-dimensional subspace that represents the source. The basis functions that compose a source are then grouped together using a mean-field clustering algorithm. Once the low-dimensional subspaces have been identified the independent sources can be resynthesised if required.

There remains the problem of estimating the underlying basis functions to allow decomposition of the spectrogram in the manner described above. One method of doing this is Principal Component Analysis (PCA). PCA linearly transforms a set of correlated variables into a number of uncorrelated variables that are termed principal components.
The first principal component contains as much of the total variance as possible, and each successive principal component contains as much of the total remaining variance as possible. As a result of this property one of the uses of PCA is as a method of dimensional reduction, by discarding components that contribute minimal variance to the overall data. One method of carrying out PCA is singular value decomposition (SVD), which decomposes $Y$, an $n \times m$ matrix, into

$$Y = U S V^T \tag{5}$$

DAFX-1

where $U$ is an $n \times n$ orthogonal matrix, $V$ is an $m \times m$ orthogonal matrix and $S$ is an $n \times m$ diagonal matrix of singular values. The columns of $U$ contain the principal components of $Y$ based on frequency, while the columns of $V$ contain the principal components of $Y$ based on time. As the number of sources $p$ is very much smaller than $n$ or $m$, we keep only the first few principal components and take these to contain our independent basis functions that describe the sources.

However PCA does not return a set of statistically independent basis functions. To obtain independent basis functions a further procedure, known as Independent Component Analysis (ICA), must be carried out [2]. Independent Component Analysis attempts to separate a set of observed signals that are composed of mixtures of a number of independent non-Gaussian sources into a set of signals that contain the independent sources. The independent sources are assumed to have been mixed linearly. Using vector-matrix notation this can be stated as:

$$x = As \tag{6}$$

where $x$ contains the observed mixture signals, $s$ contains the independent non-Gaussian sources, and $A$ is the mixing matrix.

To recover the independent sources ICA makes use of a corollary of the central limit theorem. The central limit theorem states that mixtures of independent non-Gaussian signals will tend towards a Gaussian distribution as the number of signals increases. As a result the mixture signals in $x$ will have probability density functions that are closer to Gaussian than the source signals in $s$. From this it can be seen that the original sources will have probability density functions that are more non-Gaussian than any mixture of the sources. Therefore finding an unmixing matrix that gives a set of signals that are as non-Gaussian as possible given the data in the mixtures will in most cases result in the recovery of the independent sources.
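The non-Gaussianity argument above can be checked numerically. The sketch below is not part of the paper's method: it uses excess kurtosis (zero for a Gaussian) as one possible measure of non-Gaussianity, two uniformly distributed toy sources, and a hypothetical mixing matrix `A`, all chosen only for illustration.

```python
import numpy as np

def excess_kurtosis(x):
    """Excess kurtosis: 0 for a Gaussian, about -1.2 for a uniform variable."""
    x = x - x.mean()
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2 - 3.0

rng = np.random.default_rng(0)

# Two independent, clearly non-Gaussian (uniform) toy sources, as rows of s.
s = rng.uniform(-1.0, 1.0, size=(2, 100_000))

# Hypothetical mixing matrix, so that x = A s as in equation (6).
A = np.array([[1.0, 1.0],
              [1.0, -1.0]])
x = A @ s

k_sources = [excess_kurtosis(row) for row in s]
k_mixtures = [excess_kurtosis(row) for row in x]
print("sources: ", k_sources)
print("mixtures:", k_mixtures)
```

With these toy sources the mixtures have excess kurtosis closer to zero than the sources, which is the property ICA algorithms exploit when searching for an unmixing matrix.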
It should be noted that ICA cannot recover the signals at their original amplitudes or in the order in which the signals are presented. However in practice these restrictions do not affect the usefulness of ICA methods. There are numerous algorithms publicly available for performing ICA, such as FastICA and Jade [3,4]. Good reviews of ICA methods can be found in [2,5].

ICA is performed on the basis functions that have been retained from the PCA step to yield a set of independent basis functions. It should be noted that the basis functions retained can be taken from either $U$ or $V$. If taken from $U$ the basis functions obtained after ICA will be independent in frequency. Similarly if taken from $V$ the basis functions obtained will be independent in time. Once the independent basis functions have been obtained the corresponding amplitude envelopes or frequency basis functions can be obtained from matrix multiplication of the pseudo-inverse of the independent basis functions with the original overall spectrogram. Once these have been obtained a spectrogram of an independent subspace can be obtained as shown in equation (3).

As ISA works on the magnitudes of the STFT coefficients there is no phase information available to allow resynthesis. A fast but crude way of obtaining phase information is to reuse the phase information from the original STFT. However the quality of the resynthesis using this method varies widely from signal to signal.

1.2. Optimal Information for Source Separation

Estimating the optimal amount of information to keep remains a problem. The amount of information contained in a given number of basis functions can be estimated from the normalised cumulative sum of the singular values.
A threshold can then be set for the amount of information to be retained, and the following inequality can be used to solve for the number of basis functions required:

$$\frac{\sum_{i=1}^{\rho} \sigma_i}{\sum_{i=1}^{n} \sigma_i} \geq \phi \tag{7}$$

where $\sigma_i$ is the singular value of the $i$th basis function, $\phi$ is the threshold and $\rho$ is the required number of basis functions. There is a trade-off between the amount of information to retain and the recognisability of the resulting features. Setting $\phi = 1$ results in a set of basis functions which support a small region in the frequency range. When $\phi \ll 1$, the basis functions are recognisable spectral features with support across the entire frequency range. It is this case which is of interest in determining independent subspaces which represent features of the source signals.

Figure 1. ISA of drum loop (4 basis functions)

Figure 2. ISA of drum loop (5 basis functions)
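The threshold rule of equation (7) can be sketched in Python with NumPy as follows. The helper name `num_basis_functions`, the toy spectrogram built from three random outer products (in the spirit of equation (4)) and the value $\phi = 0.999$ are assumptions for illustration only; the inequality is read here as selecting the smallest $\rho$ whose normalised cumulative sum of singular values reaches $\phi$.

```python
import numpy as np

def num_basis_functions(singular_values, phi):
    """Smallest rho whose normalised cumulative singular-value sum reaches phi."""
    ratio = np.cumsum(singular_values) / np.sum(singular_values)
    rho = int(np.searchsorted(ratio, phi)) + 1
    # Guard against rounding when phi is very close to 1.
    return min(rho, len(singular_values))

rng = np.random.default_rng(1)
n, m, l = 64, 40, 3          # toy sizes: frequency channels, time slices, components
F = rng.random((n, l))       # columns play the role of frequency basis functions f_j
T = rng.random((m, l))       # columns play the role of amplitude envelopes t_j
Y = F @ T.T                  # sum of l outer products f_j t_j^T

# Singular value decomposition, Y = U S V^T as in equation (5).
U, S, Vt = np.linalg.svd(Y, full_matrices=False)

rho = num_basis_functions(S, phi=0.999)
# Reduced-rank approximation that keeps only the first rho components.
Y_rho = (U[:, :rho] * S[:rho]) @ Vt[:rho, :]
print("basis functions kept:", rho)
```

Because the toy spectrogram is built from three components, no more than three basis functions are ever needed here; on real drum loops the number varies with the signal, which is the indeterminacy discussed in the text.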

1.3. Limitations of Independent Subspace Analysis

While ISA does provide an effective means of separating sound mixtures it should be noted that there are a number of problems with ISA. These are discussed below from the point of view of separating and transcribing drums.

The first problem is that the amount of information that needs to be retained following the PCA step for successful separation varies depending on the frequency characteristics of the sounds and their relative amplitudes. In testing the ISA method using input signals containing mixtures of three drums, the number of basis functions required to effectively separate the drums was found to vary from 3 to 6. Using the threshold method described previously did not always result in the correct separation of the test signals. Too low a threshold resulted in missing sources, while too high a threshold resulted in the recovery of spectral features which were not usable for the purposes of drum transcription.

The problem of estimating the required information is illustrated in Figures 1 & 2. The figures show the amplitude envelopes obtained from performing ISA on a drum loop containing snare, kick drum and hi-hats. Figure 1 shows the result obtained from keeping 4 basis functions, and Figure 2 shows the result obtained from keeping 5 basis functions. As can be seen above, retaining an extra basis function allows the separation of the hi-hats. The indeterminacy in the number of basis functions required for a given separation affects the robustness of any drum transcription system using ISA, and means that the presence of an observer is required to identify the correct number of basis functions required for separation of the drums.

Secondly, as drums are broadband noise based instruments there are regions of overlap between the sounds, and as a result sometimes other drums show up as small peaks in the amplitude envelopes of the separated drums.
However when good separation is obtained a simple thresholding operation is usually sufficient to identify the required events.

The quality of separation also depends on the length of the signal input. For instance a signal containing just one hi-hat and snare played simultaneously will not separate correctly. For the hi-hat/snare separation 2-4 events are typically required, depending on the frequency and amplitude characteristics of the drums used.

The method also has limitations on the number of sources it can recover, working best on signals with fewer than five sources. This is a result of the trade-off between the need to keep more information to allow recovery of the sources, and the loss of recognisability of the features recovered as the amount of information retained increases. However in most cases the number of drums occurring in the segment analysed will be fewer than five.

As can be seen from the above there are a number of limitations in the ISA method. However once these limitations are taken into account ISA provides an effective means of overcoming the masking problem encountered by Sillanpää et al. when trying to identify mixtures of drums [6].

Figure 3. STFT of a section of a drum loop

2. SUB-BAND INDEPENDENT SUBSPACE ANALYSIS

2.1. Motivation

As noted previously the number of basis functions required to separate the sources varies depending on the frequency characteristics and relative amplitudes of the sources present. To overcome this problem it is proposed to add a sub-band processing step to the ISA method. The addition of sub-band processing to the ISA method is motivated by observing some general properties of drums as used in popular music. The drums in a standard rock kit can be divided into two types: drums where a skin is struck, including snares, toms, and kick drums, and drums where metal is struck, including hi-hats and cymbals.
The skinned drums have most of their energy in the low end of the frequency range, below 1 kHz, and the metal drums have most of their energy spread out over the spectrum above 2 kHz. This is illustrated in Figure 3, where the intense regions below 1 kHz correspond to the occurrence of skinned drums. Also in most popular music the skinned drums are mixed louder in the recordings than the metal drums. This means that the skinned drums dominate in ISA analysis of the input signals.

It is proposed to make use of the frequency characteristics of the drums to improve the robustness of the ISA method for transcription purposes by using sub-band processing. The signal is split into two bands, a low pass band for transcribing the skinned drums, and a high pass band for the metal drums. The low pass filter has a cutoff frequency of 1 kHz, and the high pass filter has a cutoff frequency of 2 kHz. The high pass filter has the effect of removing a large amount of the energy of the skinned drums, thus allowing the metal drums to be identified with greater ease.
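The band split described above can be sketched as follows. The paper gives only the cutoff frequencies (1 kHz and 2 kHz), not the filter design, so the ideal FFT-domain ("brick-wall") filters used here, the function name `split_bands` and the toy test tones are assumptions for illustration; a practical system would more likely use FIR or IIR filters.

```python
import numpy as np

def split_bands(signal, fs, low_cutoff=1000.0, high_cutoff=2000.0):
    """Split a signal into a low band (<= low_cutoff) and a high band (>= high_cutoff).

    Uses ideal FFT-domain filtering, which is an assumption of this sketch.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    # Zero every bin outside the wanted band, then transform back.
    low = np.fft.irfft(np.where(freqs <= low_cutoff, spectrum, 0), n=len(signal))
    high = np.fft.irfft(np.where(freqs >= high_cutoff, spectrum, 0), n=len(signal))
    return low, high

# Toy mixture: a loud 100 Hz tone standing in for a skinned drum and a
# quieter 6 kHz tone standing in for a metal drum.
fs = 16000
t = np.arange(fs) / fs
kick_like = np.sin(2 * np.pi * 100 * t)
hat_like = 0.25 * np.sin(2 * np.pi * 6000 * t)
low, high = split_bands(kick_like + hat_like, fs)
```

In this toy case the low band contains only the louder low-frequency tone and the high band only the quieter high-frequency tone, so an analysis of the high band is no longer dominated by the louder component.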

2.2. Drum Transcription using Sub-band ISA

To demonstrate the robustness of sub-band ISA a simple drum transcription system was implemented in Matlab. The system is limited, but effective within the confines of its limitations. It contains no explicit models of the drum types and contains no rhythmic models, but does make a number of assumptions. Firstly it is assumed that only three drums are present in the test signals: snare drums, kick drums and hi-hats. The basis for this assumption is that the basic drum patterns found in popular music consist largely of these three drums. Secondly it is assumed that the hi-hat occurs more frequently than the snare drum. Again this assumption holds for most drum patterns in popular music. Thirdly it is assumed that the kick drum has a lower spectral centroid than the snare drum. This assumption is justified in that snare drums are perceptually brighter than kick drums, and the brightness of sounds has been found to correlate well with the spectral centroid [7]. The use of sub-band processing ensures that only two basis functions are required in each band to separate the components.

Analysis starts with the signal being filtered into two bands as described previously. The low-pass signal is then passed to the ISA algorithm with only two basis functions kept from the PCA step. The spectral centroids of the separated components are calculated, and the component with the lowest centroid is identified as the kick drum. The other component is then identified as the snare. As separation of the sounds is not perfect the amplitude envelopes are normalised and all peaks above a threshold are taken as an occurrence of a given drum. Onset times were calculated using a variation of the onset detection algorithm proposed by Klapuri [8]. The high-pass signal is processed in a similar manner, with the hi-hat determined as the basis function that has the most peaks in amplitude over the threshold. The remaining basis function contains the high frequency energy from the snare drum that has not been removed in filtering.

Figure 4 shows the performance of sub-band ISA on the same drum loop used in Figures 2 & 3. As can be seen sub-band ISA gives the required separation using only 4 basis functions, and results in much clearer separation of the hi-hats than ISA using 5 basis functions.

Figure 4. Sub-band ISA of drum loop

3. RESULTS

The system was tested on 15 drum loops containing snares, hi-hats and kick drums. The drums were taken from various sample CDs and were chosen to cover the wide variations in sound within each type of drum. The drum patterns used are examples of commonly found patterns in rock music, as well as variations on these patterns. The tempos used ranged from 80 bpm to 150 bpm and different meters were used, including 4/4, 3/4 and 12/8. Relative amplitudes between the drums were varied from 0 dB to 24 dB to cover a wide range of situations and to make the tests as realistic as possible. The same set of analysis parameters was used on all the test signals. The results of the tests are summarized in Table 1.

Type      Total   Undetected   Incorrect   %
Snare     21      0            2           90.5
Kick      33      0            0           100
Hats      79      6            6           84.8
Overall   133     6            8           89.5

Table 1. Drum Transcription Results.

All the kick drums and snare drums were correctly identified, but two of the kicks were also categorized as snares. The undetected hi-hats were in fact separated correctly but were just below the threshold for identification. Six snare hits were also identified as hi-hats due to imperfect separation. It is observed that there is a trade-off in setting the threshold level between detecting low amplitude occurrences of a drum and incorrectly detecting drums due to imperfect separation. The threshold used was found to represent a good balance between the two. It should be noted that this level of success was achieved without the use of rhythmic models of basic drum patterns.
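The threshold trade-off noted above, and the rule that the hi-hat is the high-band component with the most peaks over the threshold, can be illustrated with a small sketch. The function `pick_peaks`, the toy envelopes and the threshold values are hypothetical; the paper does not give its threshold or its peak-picking details.

```python
import numpy as np

def pick_peaks(envelope, threshold):
    """Indices of local maxima of the normalised envelope that exceed threshold."""
    env = np.asarray(envelope, dtype=float)
    env = env / env.max()                      # normalise so the largest peak is 1
    mid = env[1:-1]
    is_peak = (mid > env[:-2]) & (mid >= env[2:]) & (mid > threshold)
    return np.flatnonzero(is_peak) + 1         # +1 restores the original indexing

# Toy envelope: a loud hit, a quiet genuine hit, and a small peak leaking
# in from another drum because separation is imperfect.
env = np.zeros(64)
env[10], env[30], env[50] = 1.0, 0.3, 0.15

low_thr = pick_peaks(env, 0.1)    # also picks up the leaked peak at index 50
mid_thr = pick_peaks(env, 0.2)    # keeps both genuine hits only
high_thr = pick_peaks(env, 0.5)   # misses the quiet hit at index 30

# Hi-hat rule for the high band: the component with the most peaks wins.
hats = np.zeros(64)
hats[[8, 24, 40, 56]] = [1.0, 0.8, 0.9, 0.7]
components = [env, hats]
hi_hat_index = int(np.argmax([len(pick_peaks(c, 0.2)) for c in components]))
```

In this toy case a threshold of 0.1 produces a false detection, a threshold of 0.5 misses the quiet hit, and 0.2 recovers exactly the genuine events, which mirrors the balance described in the text.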
Due to the limitations in the time resolution of the STFT, and also due to smearing in time from overlapping windows, the detection of onset times had an average error of 10 ms. It should be noted that this error tended to be consistent across all the drums in a given loop, so that inter-onset intervals remained consistent within a given loop. However it is still desirable to improve the accuracy of onset detection in sub-band ISA.

4. CONCLUSIONS AND FUTURE WORK

This paper has introduced the concept of sub-band ISA as a means of overcoming the problem of estimating the optimal amount of information to retain in ISA for the purposes of drum transcription. The effectiveness of this approach was demonstrated using a limited drum transcription system. It is proposed to extend this work by incorporating drum models to generalise the drum transcription system and remove the limitations currently imposed. It is also proposed to extend the system to allow drum transcription in the presence of pitched instruments, and to improve the accuracy of the onset detection in sub-band ISA.

5. REFERENCES

[1] Casey, M.A. and Westner, A., "Separation of Mixed Audio Sources By Independent Subspace Analysis", in Proc. of ICMC 2000, pp. 154-161, Berlin, Germany.
[2] Hyvärinen, A. and Oja, E., "Independent Component Analysis: Algorithms and Applications", Neural Networks, 13(4-5), pp. 411-430, 2000.
[3] FastICA package for Matlab, http://www.cis.hut.fi/projects/ica/fastica/index.shtml
[4] Jade algorithm for ICA, http://www.tsi.enst.fr/icacentral/algos.html
[5] Cardoso, J.F., "Blind Signal Separation: Statistical Principles", Proceedings of the IEEE, Vol. 86, No. 10, pp. 2009-2025, Oct 1998.
[6] Sillanpää, Klapuri, Seppänen, Virtanen, "Recognition of acoustic noise mixtures by combining bottom-up and top-down processing", in Proc. European Signal Processing Conference, EUSIPCO 2000.
[7] Gordon, J. and Grey, J. M., "Perceptual Effects of Spectral Modifications on Orchestral Instrument Tones", Computer Music Journal, Vol. 2, No. 1, pp. 24-31, 1978.
[8] Klapuri, "Sound Onset Detection by Applying Psychoacoustic Knowledge", IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 1999.