ICA for Musical Signal Separation


Alex Favaro, Aaron Lewis, Garrett Schlesinger

1 Introduction

When recording large musical groups it is often desirable to record the entire group at once, with a separate microphone for each instrument. This technique allows the group to record a piece as they would perform it while still producing multiple tracks for later balancing and tweaking. The mixing process is made more difficult in this scenario, however, by the sound of each instrument bleeding into the other microphones, so that the recorded instruments are not truly isolated. Ideally we would like to remove this effect entirely by separating the signals generated by each instrument into individual tracks.

In general we do not know the factors that contribute to the bleeding effect, so the problem is an example of blind signal separation (BSS). One of the more common solutions to BSS is Independent Component Analysis (ICA). Most ICA algorithms use a generative model that assumes the observed signal is generated from a linear combination, i.e., an instantaneous mixture, of statistically independent sources. Formally, at each time sample i we observe

    x^{(i)} = A s^{(i)}    (1)

where s^{(i)} ∈ R^n are our n source signals at time i and A is an unknown square matrix called the mixing matrix. Given this assumption, the demixing matrix W ≈ A^{-1} is obtained by maximizing the statistical independence of the source signals that we wish to isolate.

In practice, instantaneous mixtures of audio signals are quite rare. Microphones in a real recording scenario will pick up not only the direct sound from each source but also its reflections from walls and other objects. Even when such reflections are minimal (as might be the case in a well-equipped recording studio), the sounds will reach each microphone at different times due to propagation delay. A more accurate model therefore describes each observed signal as a linear combination of delayed source signals.
Concretely, observed signal j at time sample i is given by

    x_j^{(i)} = Σ_{k=1}^{n} a_{jk} s_k^{(i - t_{jk})}    (2)

where n is the number of signals, a_{jk} is the (j, k)-th element of A, and t_{jk} is the amount of delay from source k to microphone j. Given this formulation of the problem, we attempt to extend ICA to handle the real-world problem of signal separation in musical recordings. In Section 2 we discuss the data that we used to test our methods. Section 3 describes each method and its results in turn. We conclude in Section 4 with a discussion of room for improvement and future work.
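To make the gap between the two models concrete, the sketch below (synthetic signals, not the recordings used in this report) builds an instantaneous mixture per Eq. (1) and recovers the sources with scikit-learn's FastICA; the signal choices and mixing matrix are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import FastICA

n_samples = 4000
t = np.linspace(0, 8, n_samples)

# Two stand-in "instruments": a sine and a sawtooth, which are
# statistically independent of each other.
s1 = np.sin(2 * np.pi * 3 * t)
s2 = 2 * (2 * t - np.floor(2 * t)) - 1  # sawtooth in [-1, 1]
S = np.c_[s1, s2]

# Instantaneous mixture x = A s, Eq. (1): each "microphone" hears a fixed
# linear combination of the sources with no delay.
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
X = S @ A.T

# FastICA estimates a demixing matrix W ~ A^{-1} by maximizing the
# statistical independence of the recovered components.
ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)  # recovered sources, up to permutation and scale

# Each estimated component should correlate strongly with exactly one source.
corr = np.abs(np.corrcoef(S.T, S_hat.T))[:2, 2:]
best = corr.max(axis=1)
```

Shifting each source by even a few milliseconds before mixing, as in Eq. (2), breaks the instantaneous assumption and degrades this recovery, which is the difficulty the remainder of the report addresses.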

2 Data

We tested our approach on a number of different data sets. Each of our recordings includes four instruments: an electric guitar, a piano, a tenor saxophone, and a snare drum. With a separate microphone for each instrument we recorded three scenarios: each instrument playing independently (i.e., not the same piece of music), a B major scale played in unison and with various rhythmic patterns, and a simplified arrangement of Tower of Power's "Ain't Nothin' Stoppin' Us Now". In each case we recorded all of the instruments together to create the bleeding effect, and also separately with no bleeding. As a sanity check we also artificially mixed our separately recorded tracks to recreate the bleeding effect, both with and without propagation delay.

Our data is stored in a lossless audio format that allows us to easily operate on the time domain (time vs. amplitude) of each signal. We also generate spectra for the signals, which allow us to operate on the frequency domain (frequency vs. amplitude), as well as spectrograms that represent the spectra at different time windows. Figure 1 shows the data generated from the guitar's microphone while playing a B major scale.

Figure 1: Guitar data. (a) Time domain. (b) Frequency domain. (c) Time-frequency domain.

3 Methods and Results

Although ICA performed well in the time domain on our artificially created instantaneous mixtures, the algorithm's performance degraded rapidly when propagation delays were introduced. The recovered signals from our real-world recordings were less isolated than the observed signals. To account for these propagation delays, we subsequently focused our efforts on separation in the frequency domain.
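The frequency-domain methods below rely on the representations described in Section 2. A minimal sketch of computing them, assuming a 44.1 kHz track already loaded as a NumPy array (here synthesized rather than read from disk):

```python
import numpy as np
from scipy.fft import rfft, rfftfreq
from scipy.signal import stft

fs = 44100                    # assumed sample rate of the recordings
t = np.arange(fs) / fs        # one second of audio
# Stand-in for one microphone track; a real run would load the lossless
# file (e.g. with soundfile.read) instead of synthesizing a tone.
x = np.sin(2 * np.pi * 246.94 * t)   # B3, the tonic of a B major scale

# Frequency domain: magnitude spectrum of the whole track.
spectrum = np.abs(rfft(x))
freqs = rfftfreq(len(x), d=1 / fs)
peak = freqs[np.argmax(spectrum)]    # should land near 247 Hz here

# Time-frequency domain: spectrogram via the short-time Fourier transform.
f, frames, Z = stft(x, fs=fs, nperseg=2048)
spectrogram = np.abs(Z)              # rows: frequencies, columns: time windows
```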

3.1 ICA in the Frequency Domain

Note that after applying the Fourier transform to our signals, Equation 2 becomes

    x̂_j^{(i)} = Σ_{k=1}^{n} a_{jk} exp(-i t_{jk} ω^{(i)}) ŝ_k^{(i)}    (3)

where x̂_j^{(i)} and ŝ_k^{(i)} are the Fourier transforms of observed signal j and source signal k, respectively, and ω^{(i)} is the frequency at sample i. Thus propagation delay in the time domain becomes complex rotation in the frequency domain, so the observed signals are now instantaneous mixtures of the source signals. Our mixing matrix, however, is now a function of signal frequency.

Initially we ignored the frequency dependency in the mixing matrix by running a version of FastICA for complex-valued data (CFastICA [2]) over the Fourier transforms of our observed signals. We recovered the source signals by applying the inverse Fourier transform to the resulting independent components. Our hope was that the propagation delays (≈ 3 ms) would be small enough that the frequency-dependent components of the mixing matrix would be negligible.

We had only limited success in separating tracks using CFastICA. In the artificially mixed B scale, the drums were entirely separated out of one track, though the melodic instruments were all mixed to a greater extent than in the source tracks. In all tracks with propagation delay, both natural and artificial, the output signals were more mixed than the source files. This mixing occurred because the source tracks are co-dependent in the frequency domain.

We also tried running FastICA on the magnitudes of our frequency responses as a heuristic to generate the mixing matrix. This greatly simplifies the signal by removing the phase information, which in turn ignores any propagation delays. To recover our signals, we take the resulting demixing matrix and apply it to the frequency response of our observed signals. We then apply an inverse Fourier transform to the results to get our estimated independent components.
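A minimal sketch of this magnitude heuristic on synthetic signals (the test signals and mixing matrix are illustrative assumptions, not our recordings):

```python
import numpy as np
from sklearn.decomposition import FastICA

fs = 8000
n = 8000
t = np.arange(n) / fs
# Synthetic stand-ins for two instrument tracks.
S = np.c_[np.sin(2 * np.pi * 440 * t),         # steady tone
          np.sign(np.sin(2 * np.pi * 3 * t))]  # low square wave
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])
X = S @ A.T                                    # observed (mixed) tracks

# Heuristic: estimate the demixing matrix from the magnitude spectra,
# discarding phase and therefore ignoring any propagation delay.
Xf = np.fft.rfft(X, axis=0)
ica = FastICA(n_components=2, random_state=0)
ica.fit(np.abs(Xf))
W = ica.components_

# Apply the magnitude-derived demixing matrix to the full complex spectra,
# then invert the transform to get time-domain estimates.
Sf_hat = Xf @ W.T
S_hat = np.fft.irfft(Sf_hat, n=n, axis=0)
```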
Figure 2: Frequency domain results for the snare drum. (a) Observed. (b) Recovered.

We had success in isolating artificially mixed tracks by running FastICA on the magnitude of the fast Fourier transform. In both the artificially mixed scale and jam tracks, the piano and snare drum separated well. The snare drum in particular was isolated with effectively no audible interference from the other sources. Figure 2 shows the observed and recovered frequency-domain signals for the snare drum on the B scale.

We hypothesize that the snare drum isolates particularly well in the frequency domain because its frequencies are the most independent. The guitar, piano, and saxophone play many of the same notes over the course of a track (and in the case of the B scale, all of the same notes). This means that their frequencies are heavily dependent, leading ICA to perform poorly. The snare drum, however, does not vary in frequency over the course of a track and is in this sense the most distinct and independent instrument, so ICA is able to recover it.

3.2 Frequency Banded ICA

We can rewrite Equation 3 in a more familiar form as

    x̂^{(i)} = A(ω^{(i)}) ŝ^{(i)}    (4)

where A(ω^{(i)}) is our mixing matrix as a function of frequency. Thus the problem in the frequency domain is a set of instantaneous mixtures as in Equation 1. Since the frequency dependencies in the mixing matrix are similar for close values of ω, we can run ICA on a number of relatively small frequency bins. The source signals are recovered by appending the resulting independent components and applying the inverse Fourier transform.

One issue that arises with this approach is known as the permutation problem. Given only the observed signals, the permutation of the recovered sources is arbitrary, so we must ensure that the permutation of sources recovered by ICA is the same for each frequency bin. A number of approaches have been suggested to overcome the permutation problem [3, 4]. We implemented the simplest of these, which calculates the demixing matrix for the frequency bins one at a time, using the matrix calculated for the previous bin as the initial guess for the next bin.
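The binning-plus-warm-start procedure can be sketched as follows. Since scikit-learn's FastICA is real-valued, this sketch uses magnitude spectra as a stand-in for the complex STFT data a real run would feed to a complex-capable ICA such as CFastICA; the data, shapes, and reuse of `components_` as the next bin's `w_init` are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n_freq, n_src, n_bins = 4096, 2, 8

# Stand-in magnitude spectra for two sources, mixed per frequency.
S = np.abs(rng.laplace(size=(n_freq, n_src)))
A = np.array([[1.0, 0.6],
              [0.3, 1.0]])
X = S @ A.T  # observed spectra: one instantaneous mixture per frequency

W_prev = None
pieces = []
for b in range(n_bins):
    lo, hi = b * n_freq // n_bins, (b + 1) * n_freq // n_bins
    # Warm-start this bin's demixing estimate with the previous bin's
    # result -- the simple permutation-alignment strategy described above.
    ica = FastICA(n_components=n_src, w_init=W_prev, random_state=0,
                  max_iter=1000)
    pieces.append(ica.fit_transform(X[lo:hi]))
    W_prev = ica.components_  # square here, since n_features == n_src

S_hat = np.vstack(pieces)  # reassembled full-band estimate
```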
Since neighboring frequency values should be somewhat close to one another, this helps ensure that the permutation does not change from bin to bin. Unfortunately, this approach to the permutation problem was insufficient to overcome the complexity of our data. Although we believe that Equation 4 was a good way to view the problem (and the literature would seem to agree [3, 4]), the results we obtained from this method were unsatisfactory. Many of the recovered signals were washed out and clearly contained sounds generated by all of the sources. For our data at least, a more sophisticated solution to the permutation problem is necessary.

3.3 ICA with Linear Regression

Our third approach to the propagation delay problem was to directly modify how the mixing matrix is computed in ICA. Inverting the problem, we define the (j, k)-th element of our demixing matrix as follows:

    w_{jk}(ω) = c_{jk} exp(i t_{jk} ω)    (5)

FastICA uses a deflation method that solves for the source signals one at a time [1]. In the iteration that computes source signal k, w_{jk} is updated to be the mean over all w_{jk}^{(i)}, where w_{jk}^{(i)} is the estimate for w_{jk} computed from sample i. To remove the frequency dependency from our model, we modify this update step to instead calculate c_{jk} and t_{jk} from the w_{jk}^{(i)}'s.
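A sketch of this modified update in isolation, fitting c and t from per-sample demixing estimates by a log-linear least-squares fit; the gain, delay, frequency grid, and noise model below are synthetic stand-ins for values FastICA would produce:

```python
import numpy as np

rng = np.random.default_rng(0)
c_true, t_true = 0.8, 0.003              # hypothetical gain and 3 ms delay
omega = np.linspace(100.0, 2000.0, 200)  # angular frequencies (rad/s)

# Per-frequency demixing estimates w^(i) = c * exp(i t w), per Eq. (5),
# with small multiplicative noise standing in for estimation error.
w = c_true * np.exp(1j * t_true * omega)
w = w * np.exp(0.01 * rng.normal(size=omega.size))

# log w = log c + i t * omega: the real part estimates log c, and the
# unwrapped imaginary part is linear in omega with slope t.
log_w = np.log(w)
c_hat = np.exp(log_w.real.mean())
t_hat = np.polyfit(omega, np.unwrap(log_w.imag), 1)[0]
```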

Taking the natural logarithm of Equation 5, we obtain

    log w_{jk} = log c_{jk} + i t_{jk} ω    (6)

which is linear in ω. We can therefore use linear regression on log w_{jk}^{(i)} and ω^{(i)} to obtain estimates for log c_{jk} and i t_{jk}, from which we calculate c_{jk} and t_{jk}. Once all the c_{jk}'s and t_{jk}'s are computed, we can use Equation 5 once more to write our demixing matrix as a function of ω and recover the source signals. This change essentially modifies our estimate of w_{jk} by fitting a one-dimensional polynomial to the w_{jk}^{(i)}'s across frequency instead of a zero-dimensional one.

While this change seemed promising and logical, its results were not satisfactory. Introducing the new degrees of freedom caused FastICA's gradient descent to fail to converge in any reasonable amount of time. The values of c_{jk} and t_{jk} produced at each iteration appeared at times to oscillate and at other times to shift randomly. This method may yet prove successful with further work and investigation, but at the time of writing it was not.

4 Conclusion

In conclusion, signal separation on real-world data is difficult. We primarily focused our separation methods on accounting for the volume decay and propagation delay present when recording multiple instruments in one room. However, solving for these variables given the different mixes and the knowledge that the sources are independent pieces of music was a tougher task than we expected. We started with an algorithm capable of separating the observed signals when no propagation delay was present, and across our various methods the best results separated out only one or two instruments. We attribute this to the difficulty of simultaneously solving for volume decay and propagation delay, as well as to a difficulty inherent in musical data sets: the sources are not entirely independent. While our results do not amount to an optimal solution to the problem, we do feel that we have made progress.
We were able to separate some of the signals and successfully isolate some of the instruments in our data set. In addition, we investigated novel methods for solving for both the volume decay and the propagation delay which, given more time and effort, may produce better results.

References

[1] Hyvärinen A., "Fast and Robust Fixed-Point Algorithms for Independent Component Analysis", IEEE Trans. on Neural Networks, 10(3):626-634, 1999.

[2] Bingham E. and Hyvärinen A., "A fast fixed-point algorithm for independent component analysis of complex valued signals", Helsinki University of Technology, 2000.

[3] Smaragdis P., "Information Theoretic Approaches to Source Separation", MAS Department, Massachusetts Institute of Technology, 1997.

[4] Mitianoudis N. and Davies M., "Audio Source Separation of Convolutive Mixtures", IEEE Trans. on Speech and Audio Processing, 11(5), 2003.