Lecture 14: Source Separation


ELEN E896 Music Signal Processing
Lecture 14: Source Separation

1. Sources, Mixtures, & Perception
2. Spatial Filtering
3. Time-Frequency Masking
4. Model-Based Separation

Dan Ellis, Dept. Electrical Engineering, Columbia University
dpwe@ee.columbia.edu
http://www.ee.columbia.edu/~dpwe/e896/

1. Sources, Mixtures, & Perception

Sound is a linear process (superposition): no opacity (unlike vision)
- sources → auditory scenes (polyphony)
[Figure: spectrogram of a mixture ("evil" voice + rumble + stab + pleasant voice + strings + choir); frequency in Hz vs. time in s, level in dB]
Humans perceive discrete sources... a subjective construct

Spatial Hearing

People perceive sources based on cues
- spatial (binaural): ITD, ILD (Blauert '96)
- head shadow (high freq); source path-length difference
[Figure: left- and right-ear waveforms of a speech sample showing interaural time and level differences; time in s]

Auditory Scene Analysis

Spatial cues may not be enough, or not available (single-channel signal)
Brain uses signal-intrinsic cues to form sources: onset, harmonicity (Bregman '90)
[Figure: spectrogram of the Reynolds-McAdams oboe example; level in dB, time in s]

Auditory Scene Analysis

"Imagine two narrow channels dug up from the edge of a lake, with handkerchiefs stretched across each one. Looking only at the motion of the handkerchiefs, you are to answer questions such as: How many boats are there on the lake and where are they?" (after Bregman '90)
Quite a challenge!

Audio Mixing

Studio recording combines separate tracks into, e.g., 2 channels (stereo)
- different levels
- panning
- other effects
Stereo intensity panning: manipulating ILD only, at constant power (see the sketch below)
More channels: use just the nearest pair?
[Figure: constant-power left/right panning gains vs. pan position]
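Not from the slides, but a minimal sketch of constant-power intensity panning, assuming a pan parameter p in [-1, 1] (the helper name is hypothetical):

```python
import numpy as np

def constant_power_pan(x, p):
    """Pan a mono signal x to stereo with constant total power.

    p = -1 is hard left, 0 is center, +1 is hard right.
    Gains follow cos/sin of the pan angle, so gL**2 + gR**2 == 1 for any p.
    """
    theta = (p + 1.0) * np.pi / 4.0            # map [-1, 1] -> [0, pi/2]
    g_left, g_right = np.cos(theta), np.sin(theta)
    return np.stack([g_left * x, g_right * x])

# a centered source gets gains (~0.707, ~0.707): equal level in both channels
stereo = constant_power_pan(np.random.randn(1000), p=0.0)
```

Moving p changes only the inter-channel level difference while the summed power stays constant, which is the intensity panning the slide refers to.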

2. Spatial Filtering

N sources detected by M sensors: with M ≥ N there are enough degrees of freedom (else need other constraints)
Consider the 2x2 case with directional mics and mixing matrix A:
    m1 = a11·s1 + a12·s2
    m2 = a21·s1 + a22·s2
i.e. m = A·s, and the sources are recovered as ŝ = Â⁻¹·m

Source Cancelation

Simple 2x2 example:
    m1(t) = s1(t) + 0.5·s2(t)
    m2(t) = 0.8·s1(t) + s2(t)
so
    m1(t) − 0.5·m2(t) = 0.6·s1(t)
If there is no delay and the mixtures are linearly independent, one source can be canceled per combination.
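A minimal numpy sketch of this 2x2 instantaneous case; the mixing coefficients are the ones from the slide, and the matrix A is assumed known (in practice it must be estimated):

```python
import numpy as np

t = np.linspace(0, 1, 8000)
s1 = np.sin(2 * np.pi * 5 * t)                 # toy source 1
s2 = np.sign(np.sin(2 * np.pi * 3 * t))        # toy source 2
S = np.stack([s1, s2])

A = np.array([[1.0, 0.5],                      # m1 = s1 + 0.5*s2
              [0.8, 1.0]])                     # m2 = 0.8*s1 + s2
M = A @ S                                      # observed mixtures

# cancel s2 with one combination: m1 - 0.5*m2 = 0.6*s1
s1_scaled = M[0] - 0.5 * M[1]

# or invert the whole (known, non-singular) mixing matrix
S_hat = np.linalg.inv(A) @ M
assert np.allclose(S_hat, S)                   # exact recovery in the noiseless case
```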

Independent Component Analysis

Can separate "blind" combinations by maximizing the independence of the outputs (Bell & Sejnowski '95)
- adapt the unmixing weights a_kl toward independence, e.g. by gradient on mutual information (∂MutInfo/∂a_kl) or by maximizing non-Gaussianity measured by kurtosis:
    kurt(y) = E[((y − µ)/σ)⁴] − 3
[Figure: scatter plot of the two mixture channels, and kurtosis as a function of the unmixing weights]
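The slide does not tie the idea to one algorithm; as an illustration (an assumption on my part, not the Bell & Sejnowski infomax rule itself), scikit-learn's FastICA can unmix the same kind of instantaneous 2x2 blend:

```python
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 1, 8000)
S = np.stack([np.sin(2 * np.pi * 5 * t),               # source 1
              np.sign(np.sin(2 * np.pi * 3 * t))],      # source 2
             axis=1)                                     # shape (n_samples, 2)

A = np.array([[1.0, 0.8],
              [0.5, 1.0]])
X = S @ A.T                                              # "blind" mixtures

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)     # estimated sources, up to permutation and scaling
```

Because independence says nothing about order or gain, the recovered sources come back permuted and rescaled relative to the originals.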

Microphone Arrays

If interference is diffuse, can simply boost energy from the target direction
- e.g. shotgun mic; delay-and-sum beamforming (Benesty, Chen & Huang '08)
- off-axis spectral coloration
- many variants: filter-and-sum, sidelobe cancellation, ...
[Figure: delay-and-sum array geometry and directivity at different wavelengths relative to the spacing D]
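A rough delay-and-sum sketch for a uniform linear array; the spacing d, sampling rate fs, and steering angle are assumed known, and the steering delays are applied as linear phase shifts in the frequency domain:

```python
import numpy as np

def delay_and_sum(x, d=0.05, fs=16000, c=343.0, angle_deg=0.0):
    """Steer a uniform linear array toward angle_deg and average the channels.

    x: array of shape (n_mics, n_samples), one row per microphone.
    Signals arriving from the steering direction are time-aligned and add
    coherently; off-axis energy is attenuated (and spectrally colored).
    """
    n_mics, n_samples = x.shape
    # per-mic steering delays for a plane wave from angle_deg (0 = broadside)
    delays = np.arange(n_mics) * d * np.sin(np.deg2rad(angle_deg)) / c
    freqs = np.fft.rfftfreq(n_samples, 1.0 / fs)
    X = np.fft.rfft(x, axis=1)
    X *= np.exp(2j * np.pi * freqs[None, :] * delays[:, None])   # advance each channel
    return np.fft.irfft(X, n=n_samples, axis=1).mean(axis=0)
```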

3. Time-Frequency Masking

What if there is only one channel?
- cannot have fixed cancellation
- but could have fast time-varying filtering (Brown & Cooke '94, Roweis '01)
[Figure: spectrogram of a mixture before and after masking; time in s]
The trick is finding the right mask...

Time-Frequency Masking: Original Mix + Oracle Labels

Works well for overlapping voices
[Figure: spectrograms of the male + female mixture with oracle labels, and the oracle-based resyntheses; level in dB, time in s]
- time-frequency resolution?
Audio examples: cooke-v3n7.wav, cooke-v3msk-ideal.wav, cooke-n7msk-ideal.wav
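A sketch of the oracle ("ideal binary") mask, assuming the clean target and interferer are available for comparison; scipy's STFT/ISTFT stand in for whatever time-frequency analysis the original used:

```python
import numpy as np
from scipy.signal import stft, istft

def oracle_mask_separation(mix, target, interferer, fs=16000, nperseg=512):
    """Keep the mixture cells where the target is the stronger source."""
    _, _, M = stft(mix, fs=fs, nperseg=nperseg)
    _, _, T = stft(target, fs=fs, nperseg=nperseg)
    _, _, I = stft(interferer, fs=fs, nperseg=nperseg)
    mask = (np.abs(T) > np.abs(I)).astype(float)    # 1 where the target dominates
    _, y = istft(M * mask, fs=fs, nperseg=nperseg)
    return y                                        # resynthesis of the masked mixture
```

The choice of nperseg is exactly the time-frequency resolution trade-off the slide flags.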

Pan-Based Filtering

Can use time-frequency masking even for stereo
- e.g. calculate a panning index (ILD) per cell, then mask the cells matching that ILD (Avendano '03)
[Figure: stereo spectrogram, ILD panning index, and the resulting mask; level in dB, ILD in dB, time in s]
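A sketch of an ILD/panning-index mask for a stereo recording; the target ILD and the ±3 dB tolerance are arbitrary illustration values, not figures from Avendano's paper:

```python
import numpy as np
from scipy.signal import stft, istft

def ild_mask_separation(left, right, target_ild_db=0.0, tol_db=3.0,
                        fs=44100, nperseg=1024):
    """Keep time-frequency cells whose inter-channel level difference matches a target."""
    _, _, L = stft(left, fs=fs, nperseg=nperseg)
    _, _, R = stft(right, fs=fs, nperseg=nperseg)
    eps = 1e-12
    ild_db = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))
    mask = (np.abs(ild_db - target_ild_db) < tol_db).astype(float)
    _, out_l = istft(L * mask, fs=fs, nperseg=nperseg)
    _, out_r = istft(R * mask, fs=fs, nperseg=nperseg)
    return out_l, out_r
```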

Harmonic-Based Masking

Time-frequency masking can be used to pick out harmonics
- given a pitch track, we know where to expect the harmonics (Denbigh & Zhao '92)
[Figure: spectrogram with the harmonic mask overlaid; time in s]

Harmonic Filtering

Given a pitch track, could use a time-varying comb filter to get the harmonics
or: isolate each harmonic by heterodyning (Avery Wang 1995):
    x̂(t) = Σ_k â_k(t)·cos(k·ω̂(t)·t)
    â_k(t) = LPF{ x(t)·e^(−j·k·ω̂(t)·t) }
[Figure: spectrograms before and after harmonic extraction; time in s]
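A crude single-harmonic heterodyne sketch, assuming a constant known fundamental f0 (a real pitch track would make the modulation frequency time-varying):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def extract_harmonic(x, k, f0, fs, bw_hz=30.0):
    """Isolate the k-th harmonic of a signal with (assumed constant) fundamental f0.

    Multiplying by exp(-j*2*pi*k*f0*t) shifts harmonic k down to DC; a low-pass
    filter then keeps its slowly varying complex amplitude a_k(t), which is
    re-modulated to resynthesize just that harmonic.
    """
    t = np.arange(len(x)) / fs
    shifted = x * np.exp(-2j * np.pi * k * f0 * t)
    b, a = butter(4, bw_hz / (fs / 2.0))                 # low-pass at bw_hz
    a_k = filtfilt(b, a, shifted.real) + 1j * filtfilt(b, a, shifted.imag)
    # factor 2 restores the amplitude split between the +f and -f components
    return 2.0 * np.real(a_k * np.exp(2j * np.pi * k * f0 * t))
```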

Nonnegative Matrix Factorization

Decomposition of spectrograms into templates + activations: X = W·H
(Lee & Seung '99; Abdallah & Plumbley '04; Smaragdis & Brown '03; Virtanen '07)
- fast & forgiving gradient-descent algorithm
- fits neatly with time-frequency masking
[Figure (after Smaragdis): spectrogram of 3 sounds, the basis spectra from the columns of W, and the rows of H over time; frequency as DFT index, time in DFT slices]
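A small NMF-on-a-spectrogram sketch using scikit-learn (one of many possible implementations; the component count, STFT settings, and KL objective are illustrative choices, not those of any cited paper):

```python
import numpy as np
from scipy.signal import stft
from sklearn.decomposition import NMF

def spectrogram_nmf(x, fs=16000, n_components=4, nperseg=1024):
    """Factor a magnitude spectrogram as X ~ W @ H (spectral templates x activations)."""
    _, _, X = stft(x, fs=fs, nperseg=nperseg)
    mag = np.abs(X)                                    # NMF needs nonnegative data
    model = NMF(n_components=n_components, init="nndsvda",
                beta_loss="kullback-leibler", solver="mu", max_iter=500)
    W = model.fit_transform(mag)                       # templates   (freq x components)
    H = model.components_                              # activations (components x time)
    recon = W @ H + 1e-12
    # per-component soft masks: this is how NMF fits neatly with T-F masking
    masks = [np.outer(W[:, j], H[j]) / recon for j in range(n_components)]
    return W, H, masks
```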

4. Model-Based Separation

When N (sources) > M (sensors), need additional constraints to solve the problem
- e.g. assumption of a single dominant pitch
Can assemble these into a model M of source s_i that defines the set of possible waveforms... probabilistically: Pr(s_i | M)
Source separation from a mixture becomes inference:
    ŝ = {ŝ_i} = argmax_s Pr(x | s, A)·Pr(A)·Π_i Pr(s_i | M)
where Pr(x | s, A) = N(x; A·s, Σ)

Source Models

Can constrain:
- source spectra (e.g. harmonic, noisy, smooth)
- temporal evolution (piecewise-continuous)
- spatial arrangements (point-source, diffuse)
Factored decomposition: Ozerov, Vincent & Bimbot '10, http://bass-db.gforge.inria.fr/fasst/
[Figure: spectrograms of a stereo instantaneous mix and the separated sources; frequency vs. time. Music: Shannon Hurley / Mix: Michel Desnoues & Alexey Ozerov / Separations: Alexey Ozerov]

Summary

Acoustic source mixtures: the normal situation in real-world sounds
Spatial filtering: canceling sources by subtracting channels
Time-frequency masking: selecting spectrogram cells
Model-based separation: exploiting regularities in source signals

References

S. Abdallah & M. Plumbley, "Polyphonic transcription by non-negative sparse coding of power spectra", Proc. Int. Symp. Music Info. Retrieval (ISMIR), 2004.
C. Avendano, "Frequency-domain source identification and manipulation in stereo mixes for enhancement, suppression and re-panning applications", Proc. IEEE WASPAA, Mohonk, pp. 55-58, 2003.
A. Bell & T. Sejnowski, "An information-maximization approach to blind separation and blind deconvolution", Neural Computation, vol. 7, no. 6, pp. 1129-1159, 1995.
J. Benesty, J. Chen & Y. Huang, Microphone Array Signal Processing, Springer, 2008.
J. Blauert, Spatial Hearing, MIT Press, 1996.
A. Bregman, Auditory Scene Analysis, MIT Press, 1990.
G. Brown & M. Cooke, "Computational auditory scene analysis", Computer Speech and Language, vol. 8, no. 4, pp. 297-336, 1994.
P. Denbigh & J. Zhao, "Pitch extraction and separation of overlapping speech", Speech Communication, vol. 11, no. 2-3, pp. 119-125, 1992.
D. Lee & S. Seung, "Learning the parts of objects by non-negative matrix factorization", Nature, vol. 401, pp. 788-791, 1999.
A. Ozerov, E. Vincent & F. Bimbot, "A general flexible framework for the handling of prior information in audio source separation", INRIA Tech. Rep. 7453, Nov. 2010.
S. Roweis, "One microphone source separation", Adv. Neural Info. Proc. Sys. (NIPS), pp. 793-799, 2001.
P. Smaragdis & J. Brown, "Non-negative matrix factorization for polyphonic music transcription", Proc. IEEE WASPAA, pp. 177-180, Oct. 2003.
T. Virtanen, "Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria", IEEE Trans. Audio, Speech, & Lang. Proc., vol. 15, no. 3, pp. 1066-1074, 2007.
A. Wang, "Instantaneous and frequency-warped signal processing techniques for auditory source separation", Ph.D. dissertation, Stanford CCRMA, 1995.