Bag-of-Features Acoustic Event Detection for Sensor Networks

Julian Kürby, René Grzeszick, Axel Plinge, and Gernot A. Fink
Pattern Recognition, Computer Science XII, TU Dortmund University
DCASE Workshop, Budapest, Hungary, September 3, 2016

Axel Plinge, BoF AED in Sensor Networks (slide 1/14)

Motivation

- Acoustic Sensor Networks (ASNs)
  - are increasingly available: smartphones, laptops, hearing aids, ...
  - offer the possibility of collaborative processing
- Acoustic Event Detection (AED)
  - useful for ASN applications [1]
  - distributed sensors can improve performance [2]
  - can we do better than heuristics? [3]

[1] A. Plinge, F. Jacob, R. Haeb-Umbach, and G. A. Fink. Acoustic microphone geometry calibration: An overview and experimental evaluation of state-of-the-art algorithms. IEEE Signal Process. Mag., 33(4):14-29, July 2016.
[2] H. Phan, M. Maass, L. Hertel, R. Mazur, and A. Mertins. A multi-channel fusion framework for audio event detection. In IEEE Workshop App. Signal Process. to Audio & Acoustics, 2015.
[3] P. Giannoulis, G. Potamianos, A. Katsamanis, and P. Maragos. Multi-microphone fusion for detection of speech and acoustic events in smart spaces. In European Signal Process. Conf., pages 2375-2379, Lisbon, Portugal, Sept. 2014.

Method Overview

- Bag-of-Features approach
  - originating in text retrieval
  - successful in AED [1]
  - fast and online
- Multi-channel fusion
  - individual microphones or arrays as sensor nodes
  - heuristic fusion: vote, max, product, ...
  - learning-based fusion: classifier stacking
- Processing pipeline: Acoustic Sensor Node -> Features -> Quantization -> Histogram -> Classification -> Fusion

[1] A. Plinge, R. Grzeszick, and G. A. Fink. A bag-of-features approach to acoustic event detection. In IEEE Int. Conf. Acoustics Speech & Signal Process., Florence, Italy, May 2014.

Method (1/5): Features

- sliding window
- for each frame $k$, compute the feature vector $y_k$
- perceptual loudness, MFCCs, and GFCCs [1]

[Figure: feature extraction diagram. Sampling and quantization, sliding window, FFT spectrum; loudness filter and sum give the loudness L; Mel filterbank, log, and DCT give the MFCCs; Gammatone filterbank, log, and DCT give the GFCCs. Example feature plots (GFCCs, MFCCs, L) for silence, speech, chairs, door, and steps.]

[1] X. Zhao, Y. Shao, and D. Wang. CASA-based robust speaker identification. IEEE Trans. Audio, Speech, Language Process., 20(5):1608-1616, 2012.
[2] A. Plinge, R. Grzeszick, and G. A. Fink. A bag-of-features approach to acoustic event detection. In IEEE Int. Conf. Acoustics Speech & Signal Process., Florence, Italy, May 2014.
[3] code at http://patrec.cs.tu-dortmund.de/resources
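The log and DCT stages named on this slide, which are shared by the MFCC and GFCC branches, can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `cepstral_coefficients`, the unnormalised DCT-II, and the toy energies are assumptions, and the filterbank (Mel or Gammatone) is assumed to have been applied to the frame spectrum already.

```python
import math

def cepstral_coefficients(filterbank_energies, n_coeffs):
    """Unnormalised DCT-II of the log filterbank energies of one frame.

    filterbank_energies: positive band energies (Mel bands for MFCCs,
    Gammatone bands for GFCCs); computing them is outside this sketch.
    """
    log_e = [math.log(e) for e in filterbank_energies]
    n = len(log_e)
    return [
        sum(log_e[j] * math.cos(math.pi * i * (j + 0.5) / n) for j in range(n))
        for i in range(n_coeffs)
    ]

# Hypothetical band energies for a single frame (8 bands, 4 coefficients kept).
coeffs = cepstral_coefficients([1.0, 2.0, 4.0, 3.0, 2.0, 1.5, 1.0, 0.5], 4)
```

A flat spectrum is a quick sanity check: all coefficients except the first vanish, because the DCT basis functions above index 0 sum to zero.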

Method (2/5): Quantization

- compute class-wise GMM by EM
- concatenate to super-codebook, with $I$ components per class, class index $c$, and component index $i$:
  $v_{l = Ic + i} = (\mu_{i,c}, \sigma_{i,c})$
- quantize each frame $k$ by the super-codebook:
  $q_{k,l}(y_k, v_l) = \mathcal{N}(y_k \mid \mu_l, \sigma_l)$
- histogram over a window of $K$ frames:
  $b_l(Y_n, v_l) = \frac{1}{K} \sum_{k=1}^{K} q_{k,l}(y_k, v_l)$

[Figure: example histograms $q_l$ for silence, speech, chairs, door, and steps.]

[1] A. Plinge, R. Grzeszick, and G. A. Fink. A bag-of-features approach to acoustic event detection. In IEEE Int. Conf. Acoustics Speech & Signal Process., Florence, Italy, May 2014.
[2] code at http://patrec.cs.tu-dortmund.de/resources
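The soft quantization and histogram step can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the GMM components are taken as already trained, $\sigma$ is treated as a per-dimension standard deviation (diagonal covariance), and all names and toy values are hypothetical.

```python
import math

def gaussian_pdf(y, mean, std):
    """Density of a diagonal-covariance Gaussian at feature vector y."""
    log_p = 0.0
    for y_d, mu_d, sd_d in zip(y, mean, std):
        log_p += -0.5 * math.log(2.0 * math.pi * sd_d ** 2)
        log_p += -0.5 * ((y_d - mu_d) / sd_d) ** 2
    return math.exp(log_p)

def build_super_codebook(class_codebooks):
    """Concatenate the per-class (mean, std) lists; with I components
    per class, component i of class c lands at index l = I*c + i."""
    return [comp for codebook in class_codebooks for comp in codebook]

def soft_histogram(frames, super_codebook):
    """b_l = (1/K) * sum_k N(y_k | mu_l, sigma_l) over a window of K frames."""
    K = len(frames)
    return [
        sum(gaussian_pdf(y, mu, sd) for y in frames) / K
        for mu, sd in super_codebook
    ]

# Toy codebooks: two classes with two 2-D components each (made-up values).
class_codebooks = [
    [([0.0, 0.0], [1.0, 1.0]), ([1.0, 1.0], [1.0, 1.0])],  # class 0
    [([5.0, 5.0], [1.0, 1.0]), ([6.0, 6.0], [1.0, 1.0])],  # class 1
]
frames = [[0.1, -0.1], [0.0, 0.2], [0.2, 0.0]]  # window of K = 3 frames
histogram = soft_histogram(frames, build_super_codebook(class_codebooks))
```

Because every frame contributes to every codebook entry, the histogram entries are averaged densities rather than counts; the frames above lie close to the first component of class 0, so that entry dominates.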

Method (3/5): Classification

- multinomial Bayes classification
- train with Lidstone smoothing:
  $P(v_l \mid \Omega_c) = \frac{\alpha + \sum_{Y_n \in \Omega_c} b_l(Y_n, v_l)}{\alpha L + \sum_{m=1}^{L} \sum_{Y_n \in \Omega_c} b_m(Y_n, v_m)}$
- all classes equally likely, i.e., they have the same prior
- maximum likelihood classification:
  $P(Y_n \mid \Omega_c) = \prod_{v_l \in v} P(v_l \mid \Omega_c)^{b_l(Y_n, v_l)}$

[Figure: $\log P(Y \mid \Omega_c)$ plotted over the class index $c$ for the examples silence, speech, chairs, door, and steps.]

[1] A. Plinge, R. Grzeszick, and G. A. Fink. A bag-of-features approach to acoustic event detection. In IEEE Int. Conf. Acoustics Speech & Signal Process., Florence, Italy, May 2014.
[2] code at http://patrec.cs.tu-dortmund.de/resources
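A small sketch of the Lidstone-smoothed training and the maximum likelihood decision may help here. It is not the authors' code: the function names, the value `alpha=0.5`, and the toy histograms are assumptions, and the likelihood is evaluated in the log domain, which leaves the argmax unchanged.

```python
import math

def train_multinomial(histograms_by_class, alpha=0.5):
    """P(v_l | Omega_c) = (alpha + sum_n b_l) / (alpha * L + sum_m sum_n b_m)."""
    model = {}
    for label, histograms in histograms_by_class.items():
        L = len(histograms[0])
        totals = [sum(h[l] for h in histograms) for l in range(L)]
        denom = alpha * L + sum(totals)
        model[label] = [(alpha + t) / denom for t in totals]
    return model

def log_likelihood(histogram, word_probs):
    """log P(Y_n | Omega_c) = sum_l b_l(Y_n, v_l) * log P(v_l | Omega_c)."""
    return sum(b * math.log(p) for b, p in zip(histogram, word_probs))

def classify(histogram, model):
    """Equal priors, so the class with the highest likelihood wins."""
    return max(model, key=lambda label: log_likelihood(histogram, model[label]))

# Made-up training histograms over a codebook of L = 3 entries.
training = {
    "speech": [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1]],
    "steps": [[0.1, 0.1, 0.8], [0.2, 0.1, 0.7]],
}
model = train_multinomial(training)
label = classify([0.9, 0.05, 0.05], model)
```

The smoothing term keeps every $P(v_l \mid \Omega_c)$ strictly positive, so the logarithm is defined even for codebook entries that never respond to a class during training.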

Method (4/5): Fusion

- BoF models per channel, per array, or global
- Heuristic fusion [1]
  - majority voting:
    $\hat{c}^{(m)} = \arg\max_c P_m(Y_{m,n} \mid \Omega_c)$, $\hat{c} = \arg\max_c \left|\{ m : \hat{c}^{(m)} = c \}\right|$
  - maximum rule:
    $\hat{c} = \arg\max_c \max_m P_m(Y_{m,n} \mid \Omega_c)$
  - product rule:
    $\hat{c} = \arg\max_c \prod_m P_m(Y_{m,n} \mid \Omega_c)$

[Illustration: the class-channel matrix of likelihoods $P_m(Y_{m,n} \mid \Omega_c)$, with one row per class $c = 1, \dots, C$ and one column per channel $m = 1, \dots, M$. Voting takes the argmax over $c$ in each column and then the most frequent class; the maximum rule takes the maximum over $m$ in each row; the product rule multiplies over $m$ in each row. A final argmax over $c$ gives the decision.]

[1] P. Giannoulis, G. Potamianos, A. Katsamanis, and P. Maragos. Multi-microphone fusion for detection of speech and acoustic events in smart spaces. In European Signal Process. Conf., pages 2375-2379, Lisbon, Portugal, Sept. 2014.
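The three heuristic rules can be compared on a toy example. The sketch below is illustrative only: `scores[m][c]` stands for $P_m(Y_{m,n} \mid \Omega_c)$, the numbers are invented so that the rules disagree, ties in the vote go to the lowest class index (an assumption, the slide does not specify tie-breaking), and the product is evaluated as a sum of logarithms, which requires strictly positive scores.

```python
import math
from collections import Counter

def vote_fusion(scores):
    """Each channel votes for its best class; the most frequent class wins."""
    votes = Counter(max(range(len(ch)), key=ch.__getitem__) for ch in scores)
    return max(sorted(votes), key=votes.__getitem__)

def max_fusion(scores):
    """Pick the class with the highest single-channel score."""
    n_classes = len(scores[0])
    return max(range(n_classes), key=lambda c: max(ch[c] for ch in scores))

def product_fusion(scores):
    """Pick the class with the highest product of scores over channels."""
    n_classes = len(scores[0])
    return max(
        range(n_classes),
        key=lambda c: sum(math.log(ch[c]) for ch in scores),
    )

# Three channels, three classes (invented values).
scores = [
    [0.45, 0.40, 0.15],  # channel 1 prefers class 0
    [0.45, 0.40, 0.15],  # channel 2 prefers class 0
    [0.02, 0.38, 0.60],  # channel 3 prefers class 2
]
decisions = (vote_fusion(scores), max_fusion(scores), product_fusion(scores))
# decisions == (0, 2, 1)
```

In this example voting follows the two agreeing channels, the maximum rule follows the single most confident channel, and the product rule settles on the class that no channel rejects strongly.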

Method (5/5): Fusion

- Learned fusion [1]
  - classifier stacking: use a meta-learner instead of heuristics
  - classification of the class-channel matrix:
    $\hat{c} = F \begin{pmatrix} P_1(Y_{1,n} \mid \Omega_1) & \cdots & P_M(Y_{M,n} \mid \Omega_1) \\ P_1(Y_{1,n} \mid \Omega_2) & \cdots & P_M(Y_{M,n} \mid \Omega_2) \\ \vdots & & \vdots \\ P_1(Y_{1,n} \mid \Omega_C) & \cdots & P_M(Y_{M,n} \mid \Omega_C) \end{pmatrix}$
  - train a random forest classifier $F$ using data not used for training the models
  - invariance through channel-sorting:
    $\operatorname{argsort}_m \max_c P_m(Y_{m,n} \mid \Omega_c)$

[1] J. Kürby, R. Grzeszick, A. Plinge, and G. A. Fink. Bag-of-features acoustic event detection for sensor networks. In Detection and Classification of Acoustic Scenes and Events (DCASE) Workshop, Budapest, Hungary, Sept. 2016.
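The channel sorting and the input to the meta-learner can be sketched as follows. This is a hypothetical illustration: the function names and the descending sort order are assumptions (the slide only states the argsort criterion), and the random forest itself is left out; any library implementation, for example scikit-learn's `RandomForestClassifier`, could be trained on the resulting vectors.

```python
def sort_channels(scores):
    """Reorder channels by their best class score, most confident first.

    scores[m][c] stands for P_m(Y_{m,n} | Omega_c).
    """
    order = sorted(range(len(scores)), key=lambda m: max(scores[m]), reverse=True)
    return [scores[m] for m in order]

def stacking_features(scores, sort=True):
    """Flatten the class-channel matrix into one input vector for F."""
    channels = sort_channels(scores) if sort else scores
    return [p for channel in channels for p in channel]

# The same three channel outputs, once in the original order and once permuted,
# as if the source had moved to another position in the room.
original = [[0.2, 0.8], [0.6, 0.4], [0.9, 0.1]]
permuted = [original[2], original[0], original[1]]
features_a = stacking_features(original)
features_b = stacking_features(permuted)
```

After sorting, both inputs map to the same vector, so the meta-learner no longer depends on which physical channel produced which column. Without sorting, the two vectors differ.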

Evaluation ITC: Dataset

- ITC-Irst dataset [1]
  - smart conference room
  - seven T-shaped arrays at the walls
  - four microphones on the table
- classes: door knock, door slam, steps, chair moving, spoon (cup jingle), paper wrapping, key jingle, keyboard typing, phone ring, applause, cough, laugh, door open, phone vibration, mimo pen buzz, falling object, and unknown/background

[1] A. Temko, R. Malkin, C. Zieger, D. Macho, C. Nadeu, and M. Omologo. CLEAR evaluation of acoustic event detection and classification systems. In R. Stiefelhagen and J. Garofolo, editors, Multimodal Technologies for Perception of Humans, volume 4122 of Lecture Notes in Computer Science, pages 311-322. Springer Berlin Heidelberg, 2007.

Evaluation ITC: Literature Comparison

- three training session days with events occurring at different positions
- third session used for training the stacking classifier
- fourth session for test
- first 12 classes as foreground [1]

[Figure: frame-wise evaluation. Axes: F-score [%] and AFER [%]. Compared systems: fusion (4) [2], single channel, and stacking (32) [3].]

[1] A. Temko, R. Malkin, C. Zieger, D. Macho, C. Nadeu, and M. Omologo. CLEAR evaluation of acoustic event detection and classification systems. In R. Stiefelhagen and J. Garofolo, editors, Multimodal Technologies for Perception of Humans, volume 4122 of Lecture Notes in Computer Science, pages 311-322. Springer Berlin Heidelberg, 2007.
[2] H. Phan, M. Maass, L. Hertel, R. Mazur, and A. Mertins. A multi-channel fusion framework for audio event detection. In IEEE Workshop App. Signal Process. to Audio & Acoustics, 2015.
[3] J. Kürby, R. Grzeszick, A. Plinge, and G. A. Fink. Bag-of-features acoustic event detection for sensor networks. In Detection and Classification of Acoustic Scenes and Events (DCASE) Workshop, Budapest, Hungary, Sept. 2016.
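One common way to compute a frame-wise F-score over foreground classes is sketched below. The slides do not spell out the exact definition used in the evaluation, so this is an assumption: precision and recall are pooled over all foreground frames, and the label names are invented.

```python
def framewise_f_score(reference, predicted, background="background"):
    """Harmonic mean of frame-wise precision and recall on foreground frames."""
    correct = sum(
        1 for r, p in zip(reference, predicted) if r == p and r != background
    )
    if correct == 0:
        return 0.0
    n_predicted = sum(1 for p in predicted if p != background)
    n_reference = sum(1 for r in reference if r != background)
    precision = correct / n_predicted
    recall = correct / n_reference
    return 2.0 * precision * recall / (precision + recall)

# One label per frame (hypothetical annotation and system output).
reference = ["background", "steps", "steps", "door", "background", "door"]
predicted = ["background", "steps", "door", "door", "steps", "background"]
score = framewise_f_score(reference, predicted)
# score == 0.5: 2 of 4 predicted and 2 of 4 reference foreground frames match
```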

Evaluation ITC: Fusion Strategies

- three training session days with events occurring at different positions
- third session used for training the stacking classifier
- fourth session for test

[Figure: frame-wise evaluation, F-score [%], for global and channel-specific models. Compared strategies: single channel, max, product, vote, and stacking.]

- channel-specific models perform better
- stacking performs better than heuristics

[1] J. Kürby, R. Grzeszick, A. Plinge, and G. A. Fink. Bag-of-features acoustic event detection for sensor networks. In Detection and Classification of Acoustic Scenes and Events (DCASE) Workshop, Budapest, Hungary, Sept. 2016.

Evaluation: FINCA Dataset

- FINCA dataset [1]
  - new real-world recordings
  - smart conference room
  - two microphone arrays at the ceiling and two in the table
  - circular arrays with 8 microphones and 10 cm diameter
- classes: applause, chairs, cups, door, doorbell, doorknock, keyboard, knock, music, paper, phonering, phonevibration, pouring, screen, speech, steps, streetnoise, touching, ventilator, and silence

[1] dataset available at http://patrec.cs.tu-dortmund.de/resources

Evaluation FINCA: Fusion Strategies

- five splits with 2/3 of the data for training and 1/3 for test
- 1/3 of the training data used for the stacking classifier
- silence as background

[Figure: frame-wise evaluation, F-score [%], for global, array, and channel-specific models. Compared strategies: single channel, max, product, vote, and stacking.]

- channel-specific models perform better
- stacking performs better than heuristics

[1] J. Kürby, R. Grzeszick, A. Plinge, and G. A. Fink. Bag-of-features acoustic event detection for sensor networks. In Detection and Classification of Acoustic Scenes and Events (DCASE) Workshop, Budapest, Hungary, Sept. 2016.
[2] dataset available at http://patrec.cs.tu-dortmund.de/resources
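The data partitioning described on this slide could look like the sketch below. How the splits were actually drawn is not stated, so random shuffling per split, the seed handling, and the recording names are assumptions made for illustration.

```python
import random

def split_recordings(items, seed):
    """Return (model_training, stacking_training, test) for one split.

    1/3 of the items is held out for test; 1/3 of the remaining
    training items is reserved for the stacking classifier.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = len(shuffled) // 3
    test, train = shuffled[:n_test], shuffled[n_test:]
    n_stacking = len(train) // 3
    stacking, models = train[:n_stacking], train[n_stacking:]
    return models, stacking, test

# Five splits over 18 hypothetical recordings.
recordings = [f"recording_{i:02d}" for i in range(18)]
splits = [split_recordings(recordings, seed) for seed in range(5)]
```

Keeping the stacking data separate from the data used for the BoF models matters because the meta-learner should see the kind of model outputs it will encounter on unseen recordings.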

Evaluation FINCA: Position Invariance

- classification of nine classes occurring at different positions in the room

[Figure: error [%] in two panels, "mixed positions in training and test" and "separate positions in training and test", each for global, array, and channel-specific models. Compared strategies: single channel, max, product, vote, stacking, sorted (32), and sorted (5).]

- stacking performs best
- sorting mitigates the effect of unseen positions
- global models are better for unseen positions

[1] J. Kürby, R. Grzeszick, A. Plinge, and G. A. Fink. Bag-of-features acoustic event detection for sensor networks. In Detection and Classification of Acoustic Scenes and Events (DCASE) Workshop, Budapest, Hungary, Sept. 2016.
[2] dataset available at http://patrec.cs.tu-dortmund.de/resources

Conclusion

- acoustic sensor networks allow multi-channel AED
- extension [1] of Bag-of-Features online AED [2]
- multi-channel fusion improves the results
- classifier stacking outperforms heuristic strategies
- channel re-ordering by sorting can improve position invariance

[1] J. Kürby, R. Grzeszick, A. Plinge, and G. A. Fink. Bag-of-features acoustic event detection for sensor networks. In Detection and Classification of Acoustic Scenes and Events (DCASE) Workshop, Budapest, Hungary, Sept. 2016.
[2] R. Grzeszick, A. Plinge, and G. A. Fink. Temporal acoustic words for online acoustic event detection. In Proc. 37th German Conf. Pattern Recognition, Aachen, Germany, 2015.
[3] http://patrec.cs.tu-dortmund.de/resources

References

P. Giannoulis, G. Potamianos, A. Katsamanis, and P. Maragos. Multi-microphone fusion for detection of speech and acoustic events in smart spaces. In European Signal Process. Conf., pages 2375-2379, Lisbon, Portugal, Sept. 2014.

R. Grzeszick, A. Plinge, and G. A. Fink. Temporal acoustic words for online acoustic event detection. In Proc. 37th German Conf. Pattern Recognition, Aachen, Germany, 2015.

J. Kürby, R. Grzeszick, A. Plinge, and G. A. Fink. Bag-of-features acoustic event detection for sensor networks. In Detection and Classification of Acoustic Scenes and Events (DCASE) Workshop, Budapest, Hungary, Sept. 2016.

H. Phan, M. Maass, L. Hertel, R. Mazur, and A. Mertins. A multi-channel fusion framework for audio event detection. In IEEE Workshop App. Signal Process. to Audio & Acoustics, 2015.

A. Plinge and G. A. Fink. Multi-speaker tracking using multiple distributed microphone arrays. In IEEE Int. Conf. Acoustics Speech & Signal Process., Florence, Italy, May 2014.

A. Plinge and S. Gannot. Multi-microphone speech enhancement informed by auditory scene analysis. In Sensor Array and Multichannel Signal Process. Workshop, Rio de Janeiro, Brazil, July 2016.

A. Plinge, R. Grzeszick, and G. A. Fink. A bag-of-features approach to acoustic event detection. In IEEE Int. Conf. Acoustics Speech & Signal Process., Florence, Italy, May 2014.

A. Plinge, F. Jacob, R. Haeb-Umbach, and G. A. Fink. Acoustic microphone geometry calibration: An overview and experimental evaluation of state-of-the-art algorithms. IEEE Signal Process. Mag., 33(4):14-29, July 2016.

A. Temko, R. Malkin, C. Zieger, D. Macho, C. Nadeu, and M. Omologo. CLEAR evaluation of acoustic event detection and classification systems. In R. Stiefelhagen and J. Garofolo, editors, Multimodal Technologies for Perception of Humans, volume 4122 of Lecture Notes in Computer Science, pages 311-322. Springer Berlin Heidelberg, 2007.

X. Zhao, Y. Shao, and D. Wang. CASA-based robust speaker identification. IEEE Trans. Audio, Speech, Language Process., 20(5):1608-1616, 2012.