Blind Source Separation for a Robust Audio Recognition Scheme in Multiple Sound-Sources Environment


International Conference on Mechatronics, Electronic, Industrial and Control Engineering (MEIC 2015)

Wei Han 1,2,3, Songbin Zhou 1,2,3, Chang Li 1,2,3, Yisen Liu 1,2,3, Zhe Liu 1,2,3
1 Guangdong Institute of Automation
2 Key Laboratory of Modern Control Technology of Guangdong Province
3 Public Laboratory of Modern Control and Manufacture Technology of Guangdong Province
Guangzhou, China
e-mail: w.han@gia.ac.cn

Abstract
In a multiple sound-sources environment, robustness is a major challenge for audio recognition systems based on audio fingerprinting, because mixing audio signals can significantly lower the recognition rate. This paper proposes a novel audio fingerprinting method that uses blind source separation to divide mixed audio signals into independent components, each close to its original sound source, so that the classical scheme can then identify them accurately. Experimental results show that the novel scheme is quite robust in noisy conditions where the audio signals are mixed from varying numbers of sound sources, even though the features of each original sound source and the mixing model are unknown.

Keywords: audio recognition; audio fingerprinting; multiple sound-sources environment; blind source separation

I. INTRODUCTION

Audio fingerprinting is now a very common way to recognize an unknown audio clip. Services are already available not only for music search, such as Shazam [1], but also for broadcast monitoring for advertisement tracking [2] and integrity checking of audio content [3]. As audio fingerprinting applications on mobile devices become more widespread, they urgently need to become more robust in multiple sound-sources environments, especially when working in various public places.

Some excellent audio fingerprinting schemes have been proposed for audio recognition.
The scheme proposed by Haitsma and Kalker [4] has been shown to be the most accurate audio fingerprinting scheme in a relatively noise-free environment. Wooram Son et al. [5] use predominant pitch extraction to devise a sub-fingerprint masking approach, which improves the robustness of that scheme. The system developed by Wang [6] has become a successful commercial application. Based on the idea of Wang's method, Jun-Yong Lee [7] proposes an adaptive audio fingerprint extraction method based on the constant Q transform (CQT) to enhance the robustness of audio fingerprinting in a real noisy environment for real-time TV advertising identification. In practice, however, these schemes still need further improvement before they can be used in a multiple sound-sources environment.

In this paper, blind source separation (BSS) is used to segregate unknown mixed audio signals into independent components that are close to their original sound sources, so that the classical scheme can then identify them accurately.

II. BLIND SOURCE SEPARATION BASED ON FASTICA ALGORITHM

In practical applications, the recorded signals are often polluted by other sound sources. Worse still, both the original sound sources and the way they are mixed are unknown; only indistinct mixed audio signals can be observed. BSS, which can divide mixed signals into independent components, is an efficient way to restore the original signals from their mixtures. The FastICA algorithm [9] is the most common implementation method for BSS.

A. Background of the FastICA Algorithm

Assume that the matrix of mixed audio signals X is defined as

$$X = AS \quad (1)$$

where $X = (x_1, x_2, \ldots, x_n)^T$ contains the n observed acoustic signals, which are mixtures of the n unknown independent original sound sources $S = (s_1, s_2, \ldots, s_n)^T$, and A is a full-rank n-by-n mixing matrix. The goal of the FastICA algorithm is to recover the independent original audio signals from their mixtures by finding a linear transformation matrix W that maximizes the mutual independence of the separated components. The decomposition model is shown in equation (2).
$$Y = WX = WAS = GS \quad (2)$$

Separation is thus achieved when repeated learning yields G = E, where E is the n-th-order identity matrix. FastICA finds independent components in their mixtures by measuring non-Gaussianity, using kurtosis or negentropy. The FastICA algorithm based on the fixed-point iteration scheme finds the maximum of the non-Gaussianity of $W^T X$ as measured by negentropy: the unit vector W is chosen such that the negentropy of the projection $W^T X$ is maximized. The fixed-point iteration operations of the FastICA algorithm using an approximate negentropy and Newton iteration are described in [10].

2015. The authors - Published by Atlantis Press

B. The Effectiveness of BSS

Almost all schemes [4-8] extract audio fingerprints from spectral features of the audio signal. Therefore, the difficulty of identifying mixed audio signals by audio fingerprinting can be deduced by analyzing the discrepancy between the spectra of the mixed signals and the spectra of the original signals. Three arbitrary audio clips A1, A2 and A3 are randomly selected and mixed. Their mixed signals are A4, A5 and A6, as shown in Fig. 1; the spectra of the original signals and of the mixed signals are shown in Fig. 2; the independent components separated from the mixed signals are AA1, AA2 and AA3, whose spectra are shown in Fig. 3.

Figure 1. Original audio signals and their mixed signals

Figure 2. Spectrums of original audio signals and their mixed signals

In practice, however, we cannot in general confirm the correspondence between the separated independent components and the original sound sources, i.e., AA1, AA2 and AA3 do not necessarily correspond to A1, A2 and A3 respectively. Therefore, in Fig. 3, the spectrum of A1 is shown alongside the spectra of AA1, AA2 and AA3, and likewise for A2 and A3.

Fig. 1 and Fig. 2 demonstrate obvious differences between the spectra of the original signals and the spectra of the mixed signals. These differences result in a mismatch between the audio fingerprints of the mixed signals and those of the original signals, even though the mixed signals are composed of the original signals. Fig. 3 shows the spectra of the separated independent signals.
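The separation step and the ordering ambiguity discussed above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a symmetric fixed-point FastICA with a tanh nonlinearity (one common negentropy approximation), three synthetic stand-in sources instead of audio clips, and a hypothetical mixing matrix `A`. All function and variable names are illustrative.

```python
import numpy as np

def whiten(X):
    """Center each row of X and whiten so the result has identity covariance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T @ Xc

def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    eigvals, eigvecs = np.linalg.eigh(W @ W.T)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T @ W

def fastica(X, max_iter=200, tol=1e-6, seed=0):
    """Symmetric fixed-point FastICA with g = tanh (an assumed contrast)."""
    rng = np.random.default_rng(seed)
    Z = whiten(X)
    n, T = Z.shape
    W = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(max_iter):
        Y = W @ Z
        g = np.tanh(Y)
        g_prime = 1.0 - g ** 2
        # Newton-style fixed-point update for all rows of W at once.
        W_new = sym_decorrelate(g @ Z.T / T - np.diag(g_prime.mean(axis=1)) @ W)
        # Converged when every row keeps its direction (up to sign).
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return W @ Z

# Synthetic stand-ins for three sound sources (assumption: not real audio).
t = np.arange(20000) / 20000.0
sources = np.vstack([
    np.sin(2 * np.pi * 13 * t),            # sine
    np.sign(np.sin(2 * np.pi * 29 * t)),   # square wave
    2.0 * ((47 * t) % 1.0) - 1.0,          # sawtooth
])
A = np.array([[1.0, 1.0, 1.0],             # hypothetical full-rank mixing matrix
              [0.5, 2.0, 1.0],
              [1.5, 1.0, 2.0]])
mixed = A @ sources                         # X = AS, equation (1)
recovered = fastica(mixed)                  # Y = WX, equation (2)

# Rows: original sources; columns: recovered components.
corr = np.abs(np.corrcoef(np.vstack([sources, recovered]))[:3, 3:])
print(corr.round(2))
```

Each row of `corr` should contain one entry close to 1, but the column in which it appears is arbitrary, and the sign and scale of each component are arbitrary as well. This is why the recovered components cannot be matched to the originals by position alone.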
It is easy to see that the spectra of the independent signals closely approximate the spectra of their original signals, from which a high degree of closeness between their audio fingerprints can be concluded. In fact, it can be clearly seen that, at the least, the spectra of A2 and A3 are similar to the spectra of AA1 and AA2 respectively.

Figure 3. Spectrums of original audio signals and separated independent signals

III. AUDIO FINGERPRINTING SCHEME

The proposed audio fingerprinting system (BFP scheme) is based on the hashing algorithm of [4]. This section describes it in two parts.

A. Hashing Algorithm

The particulars of the hashing algorithm are given in [4]. The audio signal is sampled at 44100 Hz and segmented into overlapping frames, each of which contains 512 non-overlapped samples and 15872 overlapped samples. Each frame of 16384 samples is then transformed with the fast Fourier transform. By dividing the resulting spectrum logarithmically, 33 non-overlapping frequency bands from 300 Hz to 2000 Hz are obtained. A total of 32 hash bits is then assigned to each frame to form a single sub-fingerprint. The sub-fingerprint of the n-th frame is defined as the bit sequence F(n,m) for $0 \le m \le 31$, where E(n,m) denotes the energy of band m in frame n, as in [4], and F(n,m) is defined in equation (3).

$$F(n,m) = \begin{cases} 1 & \text{if } \big(E(n,m) - E(n,m+1)\big) - \big(E(n-1,m) - E(n-1,m+1)\big) > 0 \\ 0 & \text{if } \big(E(n,m) - E(n,m+1)\big) - \big(E(n-1,m) - E(n-1,m+1)\big) \le 0 \end{cases} \quad (3)$$

B. BFP

Fig. 4 gives an overview of the BFP scheme. For robust fingerprint extraction in a multiple sound-sources environment, we propose to use N microphones (in general, N should be greater than the number of original sound sources [11]) to collect the mixed audio signals, and then to divide the mixed signals into independent components by BSS. Each independent component closely approximates its original sound source.
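The sub-fingerprint computation of Section III.A, which is applied to each separated component, can be sketched as follows. This is a minimal illustration rather than the authors' code: the frame and band parameters follow the text, while the Hann window and all function names are assumptions, since the text defers those details to [4].

```python
import numpy as np

FS = 44100        # sampling rate in Hz
FRAME = 16384     # samples per frame
HOP = 512         # new samples per frame; the overlap is FRAME - HOP = 15872
N_BANDS = 33      # logarithmically spaced bands between 300 Hz and 2000 Hz

def band_energies(signal):
    """Return E with E[n, m] = energy of frame n in frequency band m."""
    n_frames = 1 + (len(signal) - FRAME) // HOP
    window = np.hanning(FRAME)  # assumption: the window is not stated in the text
    freqs = np.fft.rfftfreq(FRAME, d=1.0 / FS)
    edges = np.geomspace(300.0, 2000.0, N_BANDS + 1)
    band = np.digitize(freqs, edges) - 1          # band index of every FFT bin
    inside = (band >= 0) & (band < N_BANDS)       # bins within 300..2000 Hz
    E = np.zeros((n_frames, N_BANDS))
    for n in range(n_frames):
        frame = signal[n * HOP : n * HOP + FRAME] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        E[n] = np.bincount(band[inside], weights=power[inside],
                           minlength=N_BANDS)
    return E

def sub_fingerprints(signal):
    """Return one row of 32 bits per frame (from the second frame on), eq. (3)."""
    E = band_energies(signal)
    band_diff = E[:, :-1] - E[:, 1:]              # E(n,m) - E(n,m+1)
    frame_diff = band_diff[1:] - band_diff[:-1]   # minus the same term at n-1
    return (frame_diff > 0).astype(np.uint8)

# Example on a 3-second synthetic clip (white noise as a stand-in for audio).
rng = np.random.default_rng(0)
clip = rng.standard_normal(3 * FS)
fp = sub_fingerprints(clip)
print(fp.shape)
```

Because each bit depends only on the sign of an energy difference, rescaling the whole clip should leave the fingerprint essentially unchanged, which is convenient here because independent components are recovered only up to an unknown scale and sign.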
Because it is hard to confirm the order of the independent components and their correspondence with the original signals, i.e., it is uncertain which independent component is the needed one, every independent component has to be queried against the fingerprint database.

Figure 4. The overview of BFP scheme (Mixed Clip 1, Mixed Clip 2, ..., Mixed Clip N; Blind Source Separation; Audio Fingerprinting Extracting; Fingerprint Matching against the Fingerprint Database; Retrieval Result)

IV. EXPERIMENTS

To evaluate the performance of the BFP scheme, we implement the following three schemes for comparison: 1) our fingerprinting scheme (BFP); 2) Wooram's fingerprinting scheme (MBM) [5]; 3) the scheme of Haitsma and Kalker [4].

A. Experimental Data

Experiments were performed using a music database containing songs randomly selected from worldwide popular songs of various genres such as DJ, electronic, classic, blues, jazz, folk, light music, hip-hop, country, rock and so on. All the audio data are stored in PCM format, mono, with 16-bit depth and a 44.1 kHz sampling rate. The fingerprint database is composed of the audio fingerprints of these songs. From the selected songs, audio query clips of three, six and nine seconds were randomly created. In the following experiments, a mixture of M sound sources (M = 2, 3, 4) refers to an unknown audio clip mixed from M arbitrary audio clips among these fragments.

B. Experimental Results

Tab. I and Fig. 5 show the results of the audio retrieval experiments performed on the database with three different schemes: BFP, MBM and the scheme of [4]. In the experiment, the length of the audio query clips is 6 s, and the MBM scheme uses a 7-bit mask. These results clearly show that the BFP scheme outperforms the other two schemes in retrieval accuracy for sound mixtures composed of various numbers of sound sources, including the most common cases.

TABLE I. THE ACCURACY OF THREE SCHEMES

Process approach                         BFP      MBM 7-Bit   Scheme of [4]
two sound sources                        95.%     65.7%       63.2%
three sound sources                      84.3%    3.8%        29.5%
four sound sources                       68.4%    7.%         6.5%
three sound sources and white noise      67.6%    5.4%        5.3%

Figure 5. Recognition performance evaluation of BFP, MBM and the scheme of [4] (vertical axis: Recognition Accuracy (%))

Tab. II and Fig. 6 show the recognition performance of the BFP scheme when the query length is changed. The results indicate that accuracy increases as the query gets longer. The proposed scheme also shows satisfactory performance with a query only three seconds long.

TABLE II. ACCURACY EVALUATION ACCORDING TO QUERY LENGTH

                                         Query Length
Process approach                         3s       6s       9s
two sound sources                        9.3%     95.%     95.8%
three sound sources                      8.8%     84.3%    85.%
four sound sources                       6.9%     68.4%    68.7%
three sound sources and white noise      6.5%     67.6%    68.%

Figure 6. Accuracy evaluation according to query length (vertical axis: Recognition Accuracy (%); series: 3 sec, 6 sec, 9 sec)

V. CONCLUSIONS

This paper proposes a novel modified audio fingerprinting algorithm, based on the scheme of [4], to recognize mixed audio signals in a multiple sound-sources environment. The proposed algorithm enhances the fingerprinting algorithm by dividing mixed audio signals into independent components that are close to their original sound sources, which guarantees great similarity between the audio fingerprints of the separated independent components and those of the original signals.
It clearly outperforms the original algorithm in recognizing audio signals in a multiple sound-sources environment. However, the correspondence between the independent signals separated from the mixed audio signals and the original signals is unknown; that is, it is uncertain which independent signal is the needed one. We therefore have to query the fingerprint database with the audio fingerprint of every separated independent signal, which undoubtedly increases retrieval time. Although some BSS algorithms with restrictive conditions already achieve accurate separation, their effectiveness should be improved. Therefore, improving the accuracy of separation in order to reduce retrieval time is considered for future work.

ACKNOWLEDGMENT

This work was supported by the Science and Technology Project of Guangdong Province (Grant no. 23B93, Grant no. 23B63), the Science and Technology Project of Guangzhou City (Grant no. 23J2262), the Science and Technology Project of Guangdong Institute of Automation (Grant no. A246), and the Scientific Research Foundation of Guangdong Academy of Science for Young (Grant no. qnjj236).

REFERENCES

[1] http://www.doreso.com/
[2] J. R. Cerquides. A Real Time Audio Fingerprinting System for Advertisement Tracking and Reporting in FM Radio, Radioelektronika 2007, 17th International Conference, Apr. 24-25, 2007, Brno, The Czech Republic, pp. 23-26.
[3] E. Gómez, P. Cano, C. T. Gomes, et al. Mixed Watermarking-fingerprinting Approach for Integrity Verification of Audio Recordings, Proceedings of International Telecommunications Symposium ITS2002, Sept. 2002, Natal, Brazil, pp. 27-284.
[4] J. Haitsma and T. Kalker. A highly robust audio fingerprinting system, Proceedings of the 3rd International Conference on Music Information Retrieval, Oct. 13-17, 2002, Paris, France, pp. 107-115.
[5] Wooram Son, Hyun-Tae Cho, Kyoungro Yoon and Seok-pil Lee. Sub-fingerprint Masking for a Robust Audio Fingerprinting System in a Real-noise Environment for Portable Consumer Devices, IEEE Transactions on Consumer Electronics, vol. 56, pp. 156-160, Feb. 2010.
[6] Avery Wang. The Shazam Music Recognition Service, Communications of the ACM, vol. 49, pp. 44-48, Aug. 2006.
[7] Jun-Yong Lee, Hyoung-Gook Kim. Audio Fingerprinting to Identify TV Commercial Advertisement in Real-Noisy Environment, 2014 International Symposium on Communications and Information Technology (ISCIT), Sep. 24-26, 2014, Incheon, Korea, pp. 527-53.
[8] Chahid Ouali, Pierre Dumouchel and Vishwa Gupta. A Robust Audio Fingerprinting Method for Content-Based Copy Detection, 2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI), Jun. 18-20, 2014, Klagenfurt, Austria, pp. 1-6.
[9] Kuo-Kai Shyu, Ming-Huan Lee, Yu-Te Wu, and Po-Lei Lee. Implementation of Pipelined FastICA on FPGA for Real-Time Blind Source Separation, IEEE Transactions on Neural Networks, vol. 19, pp. 958-970, June 2008.
[10] Lan-Da Van, Di-You Wu and Chien-Shiun Chen. Energy-Efficient FastICA Implementation for Biomedical Signal Separation, IEEE Transactions on Neural Networks, vol. 22, pp. 1809-1822, Nov. 2011.
[11] Da-Peng Guo, Qiu-Hua Lin. Fast decryption utilizing correlation calculation for BSS-based speech encryption system, 2010 Sixth International Conference on Natural Computation (ICNC), Aug. 10-12, 2010, Yantai, Shandong, pp. 428-432.