PoS(CENet2015)037
Recording Device Identification Based on Cepstral Mixed Features

1,2 School of Information and Communication Engineering, Dalian University of Technology, Dalian, 116024, Liaoning, P.R. China
E-mail: zww110221@163.com

Xiangwei Kong, Xingang You
School of Information and Communication Engineering, Dalian University of Technology, Dalian, 116024, Liaoning, P.R. China
E-mail: kongxw@dlut.edu.cn, youxg@dlut.edu.cn

Bo Wang 3
School of Information and Communication Engineering, Dalian University of Technology, Dalian, 116024, Liaoning, P.R. China
E-mail: bowang@dlut.edu.cn

The authenticity of recording evidence is the foundation of its legitimacy and relevance, and is therefore the primary condition that recording evidence must meet. With the rapid growth of privately made recordings offered as evidence, there is an urgent need to verify the authenticity of recordings. That the evidence comes from an accurate and legitimate source is a prerequisite for all three elements, and identifying the recording equipment is the core of establishing the source of evidence. This article studies the characteristic parameters of recording devices and proposes three characteristic parameters of recording equipment, including the time-domain low proportion roughness. Combined with improved Mel Frequency Cepstrum Coefficient (MFCC) feature parameters, they constitute a 92-dimensional hybrid feature. Experiments were carried out with 10 different brands and models of recording device (five commonly used voice recorders and five commonly used mobile phones), two units of each model, and 60 young men and women each speaking 10 different utterances. The results show that the mixed characteristic parameters effectively characterize the recording equipment: the recognition rate increases by more than 6% compared with the ordinary cepstrum.
CENet2015
12-13 September 2015
Shanghai, China

1 Speaker
2 This work is supported by the Research Fund for the Doctoral Program of Liaoning Province (Grant No. 20131014), the Open Fund of Artificial Intelligence Key Laboratory of Sichuan Province (Grant No. 2012RZJ01), and also the Fundamental Research Funds for the Central Universities (Grant No. DUT13RC201).
3 Corresponding Author

Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0). http://pos.sissa.it/

1. Introduction

Recording equipment classification is a recent research hotspot in audio forensics [1]. When audio evidence is submitted, a party may claim that it was recorded with a particular device, yet there is no effective way to verify the claim; this has motivated research in the area [2-3]. In 2006, Lukas et al. [4] studied the effect of sensor output noise on camera identification. From 2007, Dirik et al. [5-6] studied the effect of sensor dust characteristics on camera identification and achieved valuable results. Tsai et al. [7] and Celiktutan et al. [8] carried out in-depth studies of cell phone identification. Hanilci and Ertas [9] extracted cell phone characteristics from recorded speech signals; using MFCC parameters as features and an SVM as the recognition model, they achieved a recognition rate of 96% for 14 different phones. That work analyzed the characteristic parameters and recognition model of recording equipment, but both were taken from existing speaker recognition features and models: neither Fourier transform parameters nor MFCCs were designed specifically for recording device identification [10]. Characteristic parameters designed specifically for recording equipment are still very few.

For the MFCC, the low-dimensional parameters generally reflect the semantic features of the speech, and the high-dimensional parameters generally reflect the speaker's individual traits. Using the MFCC as the characteristic parameter of recording equipment therefore affects the accuracy of device identification. We must find or construct characteristic parameters that match the characteristics of the recording device.
Considering the recording equipment itself, and taking copyright and other reasons into account, devices may differ in recording circuit and chip, sampling rate, number of quantization bits and compression algorithm; these differences are where the individual characteristics of a device can be found. Moreover, the characteristics of the recording equipment are mixed into both the bands that carry semantic features and the parameters that carry speaker features. Hence, given the lack of characteristic parameters designed for recording equipment, we first propose several parameters that characterize recording equipment, and then combine them with existing audio features to form a mixed feature of recording equipment.

2. Two Proposed Time-Frequency Domain Characteristic Parameters

Much of the recording equipment and phone recording software currently on the market applies compression. Different compression and filtering algorithms give the audio signal different time-frequency domain features, and at present there is no research on this aspect. It is therefore necessary to analyze new characteristic parameters of recording devices.

2.1 Amplitude Proportion

Because of patents and other reasons, recording devices differ from one another in their circuits, and these differences constitute the individual characteristics of each device. The minimum amplitude proportion is a parameter that reflects the number of quantization bits of the device. In a recorded signal, sampling points with small amplitudes account for a certain proportion of all samples. After the signal is normalized and quantized, the minimum amplitudes and the number of quantization bits are related by

$$x_{\min K} = K \cdot 2^{-M} \tag{2.1}$$

where $M$ is the number of quantization bits and $x_{\min K}$ is the $K$-th minimum amplitude. Every amplitude is an integer multiple of the minimum quantization value. The amplitude proportion is

$$A_{\mathrm{ratio}\,K} = \frac{\mathrm{num}_{\min K}}{\mathrm{num}_{\mathrm{total}}} \tag{2.2}$$

Since the statistics of the speech signal follow a Laplace distribution, the amplitude distribution of the speech signal satisfies

$$p_L(x) = 0.5\,\alpha\,e^{-\alpha |x|}, \qquad \alpha = \frac{\sqrt{2}}{\sigma_x} \tag{2.3}$$

2.2 Time-domain Low Proportion Roughness

In speech signal processing, in order to improve perceived quality or the speaker recognition rate, attention is usually paid to vowels and to the larger-amplitude parts of the signal, and optimization is carried out in the spectrum. The low-amplitude sampling points, to which hearing is insensitive, are often overlooked. Yet these points often carry the characteristics of the non-linear region of the amplifier circuit and of the compression algorithm, such as information about the quantization bits, which reflect the characteristics of the recording equipment. According to the probability distribution of speech, the low amplitudes should be evenly distributed. In practice, however, the proportion of low amplitudes is not uniformly distributed, and each device presents a unique trait in the low-amplitude range.

The time-domain low proportion roughness is defined as follows. Let

$$x_i = i \cdot 2^{-M} \tag{2.4}$$

The proportion $a_i$ in each frame is defined as

$$a_i = \frac{\mathrm{count}(x_i)}{\mathrm{count\_total}} \tag{2.5}$$

where $\mathrm{count}(x_i)$ denotes the number of samples in the frame whose amplitude is $x_i$, and $\mathrm{count\_total}$ is the frame length. Let

$$b_i = a_i - a_{i-1} \tag{2.6}$$

and, when $i = 1$,

$$b_1 = 0 \tag{2.7}$$

Define the vector

$$b_{i \sim j} = \{ b_i, b_{i+1}, \ldots, b_{i+j} \} \tag{2.8}$$

Then we can make the following definition:

$$c_{ij} = \frac{b_{i \sim j}^{H}\, b_{i \sim j}}{j} \tag{2.9}$$

$c_{ij}$ gives the roughness of a total of $j$ points starting from the $i$-th minimum amplitude. If the low amplitudes were evenly distributed, $a_i$ would satisfy $a_i = a,\ \forall i = 1, 2, 3, \ldots$, where $a$ is a constant. By Equation (2.6), $b_i = 0,\ \forall i = 1, 2, 3, \ldots$, and therefore

$$c_{ij} = \frac{b_{i \sim j}^{H}\, b_{i \sim j}}{j} = 0 \tag{2.10}$$
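The definitions in Equations (2.5) to (2.9) can be illustrated with a minimal sketch. It assumes a frame of normalized samples that already sit on the quantization grid, treats the amplitude as the absolute sample value, and starts the level index at 1; these choices, and the function names `amplitude_proportions` and `roughness`, are illustrative assumptions rather than details given in the paper.

```python
from collections import Counter


def amplitude_proportions(frame, bits, n_levels):
    """Return [a_1, ..., a_n]: share of samples at amplitude i * 2**-bits (Eq. 2.5).

    Assumption: amplitude means the absolute value of a normalized sample.
    """
    step = 2.0 ** -bits
    # Map every sample to its integer quantization level index.
    levels = Counter(round(abs(x) / step) for x in frame)
    total = len(frame)
    return [levels.get(i, 0) / total for i in range(1, n_levels + 1)]


def roughness(a, i, j):
    """Return c_ij (Eq. 2.9) for 1-based start index i and span j."""
    # b_1 = 0 (Eq. 2.7); b_k = a_k - a_{k-1} for k > 1 (Eq. 2.6).
    b = [0.0] + [a[k] - a[k - 1] for k in range(1, len(a))]
    window = b[i - 1 : i + j]  # b_i, ..., b_{i+j} as in Eq. (2.8)
    return sum(v * v for v in window) / j


# Toy 3-bit frame: levels 1 to 3 are equally frequent, level 4 is rarer.
frame = [0.125, -0.125, 0.25, 0.25, 0.375, 0.375, 0.0, 0.5]
a = amplitude_proportions(frame, bits=3, n_levels=4)
flat = roughness(a, 1, 2)   # evenly distributed part, so roughness is zero
rough = roughness(a, 2, 2)  # window reaches the drop at level 4
```

In this toy frame the first three levels have equal proportions, so the roughness over them is zero, which is the behaviour Equation (2.10) describes; the window that includes the rarer fourth level gives a non-zero value.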

2.3 Characteristic Mixing Parameters of Recording Device

Based on the above analysis, this section adopts the following 92-dimensional mixed feature as the characteristic parameter of recording equipment. Table 1 shows the details. The frequency-domain MFCC and DCT minimum amplitude proportion features have been proved effective in prior work. In the last two subsections, we have shown the effect of the quantization step of different devices, to which feature vectors in the time domain are sensitive. Combining time-domain and frequency-domain features is therefore a reasonable way to construct a better classifier. On this basis, the feature vector mixes 42-dimensional MFCC features (coefficients 1-10 and 33-64), 10-dimensional DCT minimum amplitude proportion features, 20-dimensional time-domain minimum amplitude proportion features and 20-dimensional time-domain low proportion roughness features.

| Mixed characteristic parameters | Description |
|---|---|
| MFCC 1-10, 33-64 | The low-dimensional and high-dimensional parts of the 64-dimensional MFCC parameters. |
| 10-dimensional DCT minimum amplitude proportion | Frequency-domain feature: after the DCT, the proportion of all point values taken by each of the 10 minimum values. |
| 20-dimensional minimum amplitude proportion | Time-domain feature: the proportion of all point values taken by each of the 20 minimum values. Definition is shown in Equation (2.2). |
| 20-dimensional time-domain low proportion roughness | Definition is shown in Equation (2.9). |

Table 1: Time-frequency mixing characteristic parameters of recording equipment

3. Experimental Results and Analysis

The experiment used five models of voice recorder, with two units of each model. The recording subjects were 60 people, 30 young men and 30 young women. Each subject spoke 10 different Mandarin utterances of about 10 seconds each, generating 6000 WAV audio files.
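As a bookkeeping check, the size of the mixed feature vector in Table 1 and the size of the data set can be recomputed from the numbers stated in the text. MFCC coefficients 1-10 and 33-64 give 10 + 32 = 42 values, which together with the other three groups yields the 92 dimensions. The variable names below are illustrative only.

```python
# MFCC coefficients kept from the 64-coefficient analysis (1-based, per Table 1).
mfcc_indices = list(range(1, 11)) + list(range(33, 65))

feature_dims = {
    "mfcc": len(mfcc_indices),
    "dct_min_amplitude_proportion": 10,
    "time_min_amplitude_proportion": 20,
    "time_low_proportion_roughness": 20,
}
total_dims = sum(feature_dims.values())

# 60 speakers, 10 utterances each, recorded on 5 recorder models with 2 units per model.
speakers, utterances, models, units_per_model = 60, 10, 5, 2
total_recordings = speakers * utterances * models * units_per_model
```

Both totals agree with the paper: a 92-dimensional feature vector and 6000 recordings.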
The sampling frequency is 44.1 kHz, all recordings use 16-bit quantization, the frame length is 2048 points, and the frame shift is 50%. For each speaker and each device, one utterance is used as training audio and the others as test audio. The five voice recorders are as follows:

(1) Sony PCM-M10: records MP3 at a sampling frequency of 44.1 kHz (bit rates of 64 kbps, 128 kbps and 320 kbps) and PCM at a selectable sampling frequency of 22.05 kHz, 44.1 kHz, 48 kHz or 96 kHz with 16-bit or 24-bit quantization; hereinafter referred to as Sony.
(2) Tong Fang TF-A20: MP3 (sampling frequency of 32 kHz, bit rate of 192 kbps); hereinafter referred to as Tong Fang.
(3) Jing Hua DVR-818: MP3 (sampling frequency of 32 kHz, bit rate of 128 kbps); hereinafter referred to as Jing Hua.
(4) Modern HYM-3698: MP3 (sampling frequency of 44.1 kHz, bit rate of 128 kbps); hereinafter referred to as Modern.
(5) Sanyo ICR-PS004M: MP3 (sampling frequency of 44.1 kHz, bit rate of 192 kbps); hereinafter referred to as Sanyo.

The baseline system uses the 12-dimensional MFCC parameters proposed by Hanilci and Ertas in 2012 [9] as its characteristic parameters. Among voice signal characteristic parameters, including speaker characteristic parameters, MFCC parameters have the best noise immunity and characterize individual traits most effectively.
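The framing settings above (2048-point frames with a 50% shift) can be sketched as follows. The sketch assumes that "frame shift of 50%" means a hop of half a frame and that a trailing partial frame is discarded; the paper states neither explicitly, and `split_frames` is a hypothetical helper name.

```python
def split_frames(samples, frame_len=2048, overlap=0.5):
    """Split a sample sequence into overlapping frames.

    Assumptions: the hop is frame_len * (1 - overlap), and a trailing
    partial frame is dropped.
    """
    hop = int(frame_len * (1 - overlap))
    last_start = len(samples) - frame_len
    return [samples[s : s + frame_len] for s in range(0, last_start + 1, hop)]


# Small example: 10 samples, 4-point frames, 50% shift.
demo = split_frames(list(range(10)), frame_len=4, overlap=0.5)

# A 10-second utterance at 44.1 kHz has 441000 samples.
n_frames = len(split_frames([0] * 441000))
```

Under these assumptions each frame lasts 2048 / 44100 s, about 46 ms, and a 10-second utterance yields 429 frames.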

The recognition model is an SVM classifier. The proposed method with hybrid characteristic parameters is compared with the baseline proposed in [9]. Experimental results are listed in Table 2.

| Method | Sony | Sanyo | Modern | Tong Fang | Jing Hua | Average |
|---|---|---|---|---|---|---|
| Baseline | 82.2% | 74.6% | 76.9% | 65.1% | 68.4% | 73.4% |
| Proposed method | 91.7% | 78.5% | 81.4% | 73.0% | 75.5% | 80.0% |

Table 2: Identification performance of the hybrid feature parameters without projection

Table 2 compares the recognition rate of the hybrid characteristic parameters with that of the baseline system. Recognition is text-independent. The table shows that the recognition rate of the hybrid characteristic parameters is more than 6% higher than that of the baseline system. The most obvious improvement is for Sony, which improves by 9.5%. Across devices, the recognition rate is highest for Sony, followed by Sanyo and Modern at between 75% and 83%. Tong Fang and Jing Hua are poorer, at 73.0% and 75.5%. An average accuracy of 80.0% is achieved, compared with 73.4% for the baseline. The results show that combining the time-domain low proportion roughness with the MFCC improves the performance of device identification.

Table 3 shows the results of extracting features from the baseline and mixed characteristic parameters by means of the orthogonal projection operator. The table shows that adopting the orthogonal projection operator improves the recognition rate of the system. For example, Sony, Sanyo and Modern gain a significant improvement of approximately 3% to 5%. For Tong Fang and Jing Hua, however, the improvement is less obvious, at approximately 1% or below.
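The figures quoted from Table 2 can be re-derived with a short arithmetic sketch. It only recomputes averages and per-device differences from the published percentages; the variable names are illustrative and nothing here reproduces the classifier itself.

```python
devices = ["Sony", "Sanyo", "Modern", "Tong Fang", "Jing Hua"]
baseline = [82.2, 74.6, 76.9, 65.1, 68.4]  # Table 2, baseline row (%)
proposed = [91.7, 78.5, 81.4, 73.0, 75.5]  # Table 2, proposed method row (%)


def mean(values):
    return sum(values) / len(values)


# Per-device gain in percentage points, rounded to one decimal place.
gains = {d: round(p - b, 1) for d, b, p in zip(devices, baseline, proposed)}

avg_baseline = round(mean(baseline), 1)
avg_proposed = round(mean(proposed), 1)
avg_gain = round(mean(proposed) - mean(baseline), 1)
best_device = max(gains, key=gains.get)
```

The recomputed averages are 73.4% and 80.0%, the average gain is 6.6 percentage points, and the largest gain is 9.5 points for Sony, in line with the text.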
| Method | Sony | Sanyo | Modern | Tong Fang | Jing Hua | Average |
|---|---|---|---|---|---|---|
| Baseline with orthogonal projection operator | 86.3% | 77.9% | 80.6% | 66.7% | 69.2% | 76.1% |
| Proposed method with orthogonal projection operator | 93.1% | 83.2% | 84.0% | 74.4% | 75.9% | 82.1% |

Table 3: Comparison of identification performance with orthogonal projection of the mixing characteristic parameters

4. Conclusion

Research on the originality of evidence mainly covers identifying the recording equipment used to obtain the evidence, the time and place of recording, and so on. Little progress has been made, at home or abroad, on identifying the recording time and place; in practice, the judgment relies mainly on corroboration by other evidence. Identification of recording equipment, by contrast, remains a hot issue at home and abroad in speech signal processing, but it is still at the technology trigger stage, and characteristic parameters specific to recording equipment have not previously been proposed or analyzed. This article studies the characteristic parameters of recording equipment in depth and proposes the time-domain low proportion roughness and two other characteristic parameters, which, combined with the modified MFCC characteristic parameters, constitute a 92-dimensional mixed feature. The experiment uses sixty young speakers, each speaking ten different utterances, recorded on two units of each of five different brands of recording equipment. It demonstrates that the mixed characteristic parameters effectively represent the characteristics of recording equipment; the recognition rate rises by 10.4 percent compared with ordinary cepstrum parameters.

References

[1] Y. Panagakis, C. Kotropoulos. Automatic telephone handset identification by sparse representation of random spectral features[C]. MM and Sec'12 - Proceedings of the 14th ACM Multimedia and Security Workshop, ACM, USA. pp. 91-95 (2012).
[2] O. Farooq, S. Datta, J. Blackledge. Blind tamper detection in audio using chirp based robust watermarking[J]. WSEAS Transactions on Signal Processing, 4(4): 190-200 (2008).
[3] M. Unoki, R. Miyauchi. Detection of tampering in speech signals with inaudible watermarking technique[C]. Proceedings of the 2012 8th International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP), IEEE, USA. pp. 118-121 (2012).
[4] J. Lukas, J. Fridrich, M. Goljan. Digital camera identification from sensor pattern noise[J]. IEEE Transactions on Information Forensics and Security, 1(2): 205-214 (2006).
[5] A. E. Dirik, H. T. Sencar, N. Memon. Source camera identification based on sensor dust characteristics[C]. Proceedings IEEE Workshop Signal Processing Applications Public Security Forensics, IEEE, USA. pp. 1-6 (2007).
[6] A. E. Dirik, H. T. Sencar, N. Memon. Digital single lens reflex camera identification from traces of sensor dust[J]. IEEE Transactions on Information Forensics and Security, 3(3): 539-552 (2008).
[7] M. J. Tsai, C. L. Lai, J. Liu. Camera/mobile phone source identification for digital forensics[C]. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, USA. pp. II-221 - II-224 (2007).
[8] O. Celiktutan, B. Sankur, I. Avcibas. Blind identification of source cell phone model[J]. IEEE Transactions on Information Forensics and Security, 3(3): 553-566 (2008).
[9] C. Hanilci, F. Ertas. Recognition of Brand and Models of Cell-Phones From Recorded Speech Signals[J]. IEEE Transactions on Information Forensics and Security, 7(2): 625-634 (2012).
[10] S. Gupta, S. Cho, C.-C. J. Kuo. Current Developments and Future Trends in Audio Authentication[J]. IEEE MultiMedia, 19(1): 50-59 (2012).