An Audio Fingerprint Algorithm Based on Statistical Characteristics of db4 Wavelet

Journal of Information & Computational Science 8: 14 (2011) 3027-3034
Available at http://www.joics.com

Jianguo JIANG a,*, Kaige MA a, Mingxing WEN a, Yongqing LIU a, Shuangji WANG a,b

a School of Computer Science and Technology, Xidian Univ., Xi'an 710071, China
b No. 91388 Troops of PLA, Zhanjiang 524022, China

Abstract

General audio fingerprinting algorithms have two problems: weak robustness against linear speed change attacks and fingerprints that take up too much memory. To address them, an audio fingerprint algorithm based on the db4 wavelet transform combined with statistical characteristics of the wavelet domain is proposed. First, the audio signal is decomposed by a 5-level wavelet transform. Then the sign changes of the low-frequency sub-band's wavelet coefficients, the energy distribution center, the sub-band energy in the wavelet domain, and the variance of the wavelet coefficients are calculated. Finally, these results are used as the parameters of the audio fingerprint to generate an 8-bit fingerprint block per frame. Simulation results suggest that the algorithm is highly robust against common signal-processing attacks on the content, additive white Gaussian noise, and linear speed change attacks. Moreover, its fingerprints take up less memory.

Keywords: Audio Fingerprints; Wavelet Transform; Linear Attack; Additive White Gaussian Noise; Robustness

1 Introduction

Digital audio fingerprinting, a technology for automatic music recognition, emerged to solve the difficulty of finding the desired songs among massive amounts of audio information. An audio fingerprint is a compact, content-based digital signature that represents the important acoustic characteristics of a piece of music. Its main purpose is to establish an effective mechanism for comparing two pieces of audio data in terms of human auditory perception [1,2].
The wavelet transform is a localized transformation of a signal in the time and frequency domains. Through scaling and translation it can effectively extract information from a signal and perform detailed multi-scale analysis of a function or signal, and it can therefore solve many problems that the Fourier transform cannot. C. S. Lu et al. proposed a method that extracts audio features with the one-dimensional continuous wavelet transform, and on that basis constructed audio fingerprint generation methods for identification and for authentication, respectively. L. Ghouti et al. extracted coefficient features with balanced multiwavelets (BMW) to propose an audio hashing algorithm [3]. Other authors have proposed audio fingerprint algorithms that draw on computer vision: Y. Ke et al. treated the audio spectrogram as a two-dimensional image [4], and S. Baluja et al. applied computer vision techniques to data stream processing, generated audio fingerprints with the Haar wavelet transform and Min-Hash, and used Locality Sensitive Hashing (LSH) [5,6] for audio fingerprint retrieval.

Focusing on audio fingerprint algorithms in the time and frequency domains, this paper considers how to improve robustness against linear speed change attacks and reduce the memory required for fingerprints, and proposes an audio fingerprint algorithm based on the db4 wavelet transform that combines the wavelet transform with audio fingerprinting. The audio signal is decomposed by a 5-level wavelet transform, and then the sign changes of the low-frequency sub-band's wavelet coefficients, the energy distribution center, the sub-band energy in the wavelet domain, and the variance of the wavelet coefficients are calculated. Finally, these results are used as the parameters of the audio fingerprint to generate an 8-bit fingerprint block per frame. Comparative simulation results suggest that the algorithm shows excellent robustness and identification performance, and that its fingerprints are smaller.

* Supported by the Fundamental Research Funds for the Defense of China (No. D1120060967).
Corresponding author. Email address: jjg3306@126.com (Jianguo JIANG).
1548-7741 / Copyright © 2011 Binary Information Press, December 2011
By building an index between the audio fingerprints and the audio information, real-time search of audio information can be realized, which greatly improves the efficiency of audio search [7,8].

2 Algorithm Process

The main steps of the algorithm are as follows:

(1) Preprocessing: convert the input audio signal to mono and downsample it to 5 kHz.
(2) Framing, windowing and overlapping: the frame length is 0.37 s, a Hanning window is used, and the overlap factor is P = 28/32. The Hanning window is defined as

$$w(n) = \begin{cases} 0.5\left[1 - \cos\left(\dfrac{2\pi n}{N-1}\right)\right], & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$

(3) Decompose each frame of the audio signal with a 5-level db4 wavelet transform. This yields six components: one approximation component ca5 and five detail components cd1, ..., cd5.
(4) For each component, calculate the variance of the wavelet coefficients, the zero-crossing rate of the wavelet coefficients, the centroid of the wavelet domain and the sub-band energy in the wavelet domain.
(5) Extract hash bit values from each set of parameters to obtain an 8-bit audio fingerprint for each frame.

The overall framework of the algorithm is shown in Fig. 1.
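Steps (2) and (3) can be sketched in plain Python as below, assuming step (1) has already produced a mono signal sampled at 5 kHz (resampling is not shown). The names `frame_signal`, `dwt_step` and `wavedec5` are hypothetical. The paper does not say how it handles frame boundaries in the wavelet transform, so this sketch assumes a periodized orthogonal db4 filter bank and zero-pads each frame to a multiple of 2^5 = 32 samples. With PyWavelets, `pywt.wavedec(frame, 'db4', mode='periodization', level=5)` returns the six components in the same order; coefficient values and lengths may differ slightly because of boundary handling.

```python
import math

# db4 scaling (low-pass) filter, 8 taps: sum = sqrt(2), sum of squares = 1.
DB4_H = [0.23037781330885523, 0.7148465705525415, 0.6308807679295904,
         -0.02798376941698385, -0.18703481171888114, 0.030841381835986965,
         0.032883011666982945, -0.010597401784997278]
# Quadrature-mirror high-pass filter: g[i] = (-1)^i * h[L-1-i].
DB4_G = [(-1) ** i * DB4_H[len(DB4_H) - 1 - i] for i in range(len(DB4_H))]


def hanning(N):
    """w(n) = 0.5 * [1 - cos(2*pi*n / (N-1))] for 0 <= n <= N-1."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / (N - 1))) for n in range(N)]


def frame_signal(x, fs=5000, frame_sec=0.37, overlap=28 / 32):
    """Cut a mono signal sampled at fs Hz into overlapping Hanning-windowed frames."""
    N = int(round(frame_sec * fs))                 # 0.37 s at 5 kHz -> 1850 samples
    hop = max(1, int(round(N * (1.0 - overlap))))  # 1850 * 4/32 = 231.25 -> 231
    w = hanning(N)
    frames = [[x[s + n] * w[n] for n in range(N)]
              for s in range(0, len(x) - N + 1, hop)]
    return frames, N, hop


def dwt_step(x):
    """One level of a periodized orthogonal DWT; len(x) must be even and >= 8."""
    n = len(x)
    approx, detail = [], []
    for k in range(n // 2):
        a = d = 0.0
        for i in range(len(DB4_H)):
            v = x[(2 * k + i) % n]   # circular (periodic) extension
            a += DB4_H[i] * v
            d += DB4_G[i] * v
        approx.append(a)
        detail.append(d)
    return approx, detail


def wavedec5(frame, levels=5):
    """Return [ca5, cd5, cd4, cd3, cd2, cd1] for one frame.

    The frame is zero-padded to a multiple of 2**levels (an assumption)."""
    block = 2 ** levels
    x = list(frame) + [0.0] * (-len(frame) % block)
    details = []
    for _ in range(levels):
        x, d = dwt_step(x)
        details.append(d)
    return [x] + details[::-1]
```

For a 3.3 s test clip (16,500 samples at 5 kHz), this gives 64 frames of 1850 samples with a hop of 231 samples. Each frame is padded to 1856 samples and decomposes into components of lengths 58, 58, 116, 232, 464 and 928. Because the filter bank is orthogonal, the six components together carry the same energy as the windowed frame.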

3 Generation of Fingerprint

Fig. 1: Principle framework of the algorithm

3.1 The variance of the wavelet coefficients

The variance of the wavelet coefficients [9] is

$$\sigma(i,j) = \frac{1}{N}\sum_{k=1}^{N}\left(cd_k - \overline{cd}\right)^2, \qquad \overline{cd} = \frac{1}{N}\sum_{k=1}^{N} cd_k$$

where σ(i, j) is the variance of the coefficients of the j-th wavelet component in the i-th frame, cd_k is the k-th coefficient of that component, and N is the total number of wavelet coefficients (the same definitions apply below).

3.2 The zero-crossing rate of the wavelet domain

After the wavelet transform, the zero-crossing rate of the wavelet domain reflects the sign changes of the low-frequency sub-band's wavelet coefficients [9]. It is calculated as

$$zcr_m = \frac{1}{2}\sum_{n}\left|\operatorname{sign}[x(n)] - \operatorname{sign}[x(n-1)]\right| w(n-m)$$

where x(n) is the n-th wavelet coefficient in the m-th frame, taken from ca5 and cd5 respectively, and w(n) is a window function of length N. If x(n) ≥ 0, then sign[x(n)] = 1; otherwise sign[x(n)] = 0.

3.3 The centroid of the wavelet domain

The centroid of the wavelet domain is the center of the energy distribution. In the wavelet domain, the centroid of an audio signal changes over time, so it can serve as a feature that reflects the non-stationarity of the audio signal.

The centroid [9] is computed as

$$\text{centroid} = \frac{\sum_{i=1}^{N} i\, x(i)^2}{\sum_{i=1}^{N} x(i)^2}$$

where x(i) is the i-th wavelet coefficient.

3.4 The energy of sub-band in wavelet domain

The change in amplitude is an important dynamic characteristic of an audio signal, and it reflects the change in energy. The wavelet coefficients can be used to measure the energy characteristics of the audio because db4 is an orthogonal wavelet: the average power of the wavelet coefficients corresponds to the average power of the signal in the time domain. The sub-band energy [9] is calculated as

$$\text{energy} = \frac{1}{N}\sum_{i=1}^{N} x(i)^2$$

3.5 Generation of fingerprint

The hash bit sequence of the variance of the wavelet coefficients is

$$F_1(n,m) = \begin{cases} 1, & \sigma(n,m) - \sigma(n,m+1) - \left(\sigma(n+1,m) - \sigma(n+1,m+1)\right) > 0 \\ 0, & \sigma(n,m) - \sigma(n,m+1) - \left(\sigma(n+1,m) - \sigma(n+1,m+1)\right) \le 0 \end{cases} \tag{1}$$

where F_1(n, m) is the m-th bit in the n-th frame. The hash bits of the zero-crossing rate of the wavelet coefficients, the centroid of the wavelet domain and the energy in the wavelet domain are

$$F_2(n,c) = \begin{cases} 1, & S_c(n) - \dfrac{1}{N}\sum_{i=1}^{N} S_c(i) > 0 \\ 0, & S_c(n) - \dfrac{1}{N}\sum_{i=1}^{N} S_c(i) \le 0 \end{cases} \tag{2}$$

where F_2(n, c) is the hash bit of the zero-crossing rate, the centroid or the energy, and S_c(n) is the corresponding feature value of the n-th frame; that is, the bit is 1 when the feature exceeds its mean. Here c = 1 denotes the zero-crossing rate of the wavelet coefficients, c = 2 the centroid of the wavelet domain, and c = 3 the energy in the wavelet domain.
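The features of Sections 3.1-3.4 and the bit rules of Eqs. (1) and (2) can be sketched as follows. All function names are hypothetical. Two readings are assumptions rather than statements from the paper: the zero-crossing rate uses a rectangular window spanning the whole component, and Eq. (2) compares each frame's feature with its mean over all frames. Which component feeds each S_c (for example ca5 for the low-frequency zero-crossing rate) is left to the caller.

```python
def wavelet_variance(c):
    """Section 3.1: sigma = (1/N) * sum_k (c_k - mean)^2 over one component."""
    N = len(c)
    mean = sum(c) / N
    return sum((v - mean) ** 2 for v in c) / N


def _sign01(v):
    # The paper's sign convention: 1 if the coefficient is >= 0, else 0.
    return 1 if v >= 0 else 0


def wavelet_zcr(c):
    """Section 3.2 with a rectangular window over the whole component (assumption)."""
    return 0.5 * sum(abs(_sign01(c[n]) - _sign01(c[n - 1]))
                     for n in range(1, len(c)))


def wavelet_centroid(c):
    """Section 3.3: sum_i i*x(i)^2 / sum_i x(i)^2 with 1-based index i."""
    den = sum(v * v for v in c)
    if den == 0:
        return 0.0
    return sum(i * v * v for i, v in enumerate(c, start=1)) / den


def subband_energy(c):
    """Section 3.4: (1/N) * sum_i x(i)^2."""
    return sum(v * v for v in c) / len(c)


def f1_bits(sigma, n):
    """Eq. (1): five bits for frame n from the variances of frames n and n+1.

    sigma[n] holds the six component variances of frame n (0-based lists)."""
    bits = []
    for m in range(5):
        d = (sigma[n][m] - sigma[n][m + 1]) - (sigma[n + 1][m] - sigma[n + 1][m + 1])
        bits.append(1 if d > 0 else 0)
    return bits


def f2_bits(S):
    """Eq. (2), read as: 1 when S_c(n) exceeds its mean over all frames."""
    mean = sum(S) / len(S)
    return [1 if s - mean > 0 else 0 for s in S]
```

Because F_2 only compares a feature with its own mean, the constant factor 1/2 in the zero-crossing rate does not change any hash bit. Note also that F_1 needs frame n+1, so the last frame of a clip produces no F_1 bits under this reading.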
The final audio fingerprint bit for each frame is therefore

$$F(n,m) = \begin{cases} F_1(n,m), & 0 < m \le 5 \\ F_2(n,1), & m = 6 \\ F_2(n,2), & m = 7 \\ F_2(n,3), & m = 8 \end{cases} \tag{3}$$
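Eq. (3) packs naturally into one byte per frame, which is what keeps the fingerprint small. The sketch below packs the eight bits and, for the evaluation in Section 4, computes the bit error rate between two fingerprint sequences and adds white Gaussian noise at a target SNR. The function names, the MSB-first bit order, and the use of the usual "fraction of differing bits" definition of BER are assumptions; the paper does not spell these out.

```python
import math
import random


def pack_frame_bits(f1, zcr_bit, centroid_bit, energy_bit):
    """Eq. (3): bits m = 1..5 from F1, bits m = 6, 7, 8 from F2(n,1), F2(n,2), F2(n,3).

    Packed MSB-first into one byte (the bit order is an assumption)."""
    if len(f1) != 5:
        raise ValueError("F1 must contribute exactly five bits")
    value = 0
    for b in list(f1) + [zcr_bit, centroid_bit, energy_bit]:
        value = (value << 1) | b
    return value


def bit_error_rate(fp_a, fp_b):
    """Fraction of differing bits between two equal-length sequences of 8-bit blocks."""
    if len(fp_a) != len(fp_b) or not fp_a:
        raise ValueError("fingerprints must be non-empty and equally long")
    errors = sum(bin(a ^ b).count("1") for a, b in zip(fp_a, fp_b))
    return errors / (8 * len(fp_a))


def add_awgn(x, snr_db, seed=0):
    """Add white Gaussian noise so that 10*log10(P_signal / P_noise) = snr_db."""
    rng = random.Random(seed)
    p_signal = sum(v * v for v in x) / len(x)
    noise_std = math.sqrt(p_signal / 10 ** (snr_db / 10))
    return [v + rng.gauss(0.0, noise_std) for v in x]


block = pack_frame_bits([1, 0, 1, 1, 0], 1, 0, 1)
print(block)                                                 # 181 (0b10110101)
print(bit_error_rate([block, 0x00], [block ^ 0x01, 0xFF]))   # (1 + 8) / 16 = 0.5625
```

Under the framing parameters of Section 2 (a hop of 231 samples at 5 kHz, about 46 ms), this amounts to roughly 21.6 one-byte blocks per second of audio.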

4 Simulation Results and Comparison

The simulation uses 100 randomly selected popular songs as test audio. For each song, four starting points are chosen at random and a 3.3 s clip is cut at each, giving 400 audio clips in total as experimental samples. After the attacks are applied, the algorithm proposed in this paper and the traditional Mel-frequency cepstral coefficient (MFCC) algorithm are compared in simulation. The results show that the proposed algorithm is more robust against general content attacks, especially linear speed change attacks. Robustness is measured by the Bit Error Rate (BER), the Correct Identification Rate (CIR) and the Best Recognition Rate (BRR) [10,11]. The experimental environment is Windows XP with a 1.61 GHz CPU and 512 MB of memory; the tools used include MATLAB 6.5 and Adobe Audition 3.0.

4.1 Robustness against common signal-processing attacks

For common signal-processing attacks on the content, the average BER between the fingerprints of the attacked audio clips and those of the source audio is shown in Fig. 2. (In all figures in this paper, the dotted line represents the algorithm based on db4 wavelet characteristics and the solid line represents the MFCC algorithm.)

Fig. 2: Comparison of the average bit error rate (BER) of the algorithms under different attacks

In Fig. 2, attack types 1-20 are, respectively: 32 kbps MP3 compression, 128 kbps MP3 compression, band-pass filtering (bpk), amplitude compression, equalization, echo, time scale modification (TSM of ±2%, ±4% and ±5%; for each magnitude the positive value is plotted before the negative one) and linear speed change (LSC of ±1%, ±2%, ±3%, ±4% and ±5%). Fig. 2 shows that the average BER of the proposed algorithm under attack is more stable than that of the MFCC algorithm, especially for linear speed change attacks [12] and time scale modification attacks, while MFCC has a relative advantage for 32 kbps MP3 compression, 128 kbps MP3 compression, band-pass filtering (bpk), amplitude compression and equalization.

The correct identification rate and the best recognition rate of the algorithm under different attacks are shown in Fig. 3. Fig. 3 shows that the algorithm based on db4 wavelet statistical characteristics is more robust against linear speed change attacks, thereby overcoming the weak robustness of most algorithms against this attack.

Fig. 3: Performance of the algorithm under different attacks. (a) Comparison of correct identification rate (CIR); (b) Comparison of best recognition rate (BRR)

4.2 Robustness against additive white Gaussian noise

The robustness of the algorithm against additive white Gaussian noise of different levels is shown in Fig. 4. In this experiment, the signal-to-noise ratio (SNR) of the additive white Gaussian noise is set to 20 dB, 15 dB, 10 dB, 5 dB, 3 dB and 2 dB. Fig. 4 shows that under additive white Gaussian noise the average BER of the algorithm based on db4 wavelet statistical characteristics is more stable than that of the MFCC algorithm. In terms of the correct identification rate, the proposed algorithm performs better at low SNR, but it does not improve much as the SNR increases. In terms of the best recognition rate, the proposed algorithm is better than the MFCC algorithm at low SNR.

5 Conclusion

This paper proposes an audio fingerprinting algorithm based on db4 wavelet statistical characteristics.
The sign changes of the low-frequency sub-band's wavelet coefficients after the wavelet transform, the energy distribution center in the wavelet domain, the sub-band energy in the wavelet domain, and the variance of the wavelet coefficients are used as the parameters for extracting the audio fingerprint. Simulation results and the comparison with the MFCC algorithm show that the algorithm has better robustness and a higher recognition rate. However, it is relatively weak against band-pass filtering, amplitude compression and equalization attacks. Future research should further optimize the fingerprint size and improve robustness against these three attacks, so as to increase the efficiency and the correct recognition rate of the algorithm.

Fig. 4: Performance of the algorithm under additive white Gaussian noise attack. (a) Comparison of average bit error rate (BER); (b) Comparison of correct identification rate (CIR); (c) Comparison of the best recognition rate (BRR)

Acknowledgement

This work is supported by the Fundamental Research Funds for the Defense of China (No. D1120060967).

References

[1] Yaduo Liu, Wei Li, Xiaoqiang Li, A Robust Compressed-Domain Music Fingerprinting Technique Based on MDCT Spectral Entropy, ACTA ELECTRONICA SINICA, 38: 1172-1176, 2010.

[2] C. S. Lu, Audio Fingerprinting Based on Analyzing Time-Frequency Localization of Signals, Multimedia Signal Processing, pp. 174-177, 2002.
[3] L. Ghouti and A. Bouridane, A Robust Perceptual Audio Hashing Using Balanced Multiwavelets, In International Conference on Acoustics, Speech and Signal Processing, 5: 209-212, 2006.
[4] Y. Ke, D. Hoiem, R. Sukthankar, Computer Vision for Music Identification, Proceedings of Computer Vision and Pattern Recognition, pp. 597-604, 2005.
[5] S. Baluja and M. Covell, Content Fingerprinting Using Wavelets, In Conference on Visual Media Production, pp. 198-207, 2006.
[6] S. Baluja and M. Covell, Audio Fingerprinting Combining Computer Vision and Data Stream Processing, In International Conference on Acoustics, Speech and Signal Processing, 2: 213-216, 2007.
[7] G. H. Li, D. F. Wu, J. Zhang, Concept framework for audio information retrieval: ARF, Journal of Computer Science and Technology, 18: 667-673, 2003.
[8] J. Haitsma and T. Kalker, A Highly Robust Audio Fingerprinting System, In Proceedings of International Conference on Music Information Retrieval, 2002.
[9] Jiming Zheng, Guohua Wei, Yu Wu, New effective method on content based audio feature extraction, COMPUTER ENGINEERING AND APPLICATIONS, 45: 131-137, 2009.
[10] S. Baluja and M. Covell, Waveprint: Efficient Wavelet-based Audio Fingerprinting, Pattern Recognition, 41: 3467-3480, November 2008.
[11] Y. Jiao, B. Yang, M. Li and X. Niu, MDCT-Based Perceptual Hashing for Compressed Audio Content Identification, In IEEE Workshop on Multimedia Signal Processing, pp. 381-384, 2007.
[12] Jingbing LI, Yingbin WEI, A Novel Watermarking Algorithm Robust to Local Nonlinear Geometrical Attacks, Journal of Computational Information Systems, Vol. 3 (5): 2181-2186, 2007.