Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise

Noha KORANY 1
Alexandria University, Egypt

ABSTRACT

The paper applies spectral analysis to the underwater acoustic noise radiated by ships. A frequency band is specified to characterize the sound produced by different platforms. The paper proposes a new technique for feature extraction that applies the autocorrelation technique and the discrete cosine transform. The extracted features are employed by a recognition engine that uses a Gaussian mixture model to classify the underwater acoustic noise radiated by ships. The recognition performance is investigated for various settings of the proposed technique.

Keywords: ship noise, underwater signals, identification
I-INCE Classification of Subjects Number(s): 54.3

1. INTRODUCTION

Automatic classification of underwater acoustic signals has received increasing attention in recent decades. Ships radiate noise whose spectral features are related to machinery, propellers, generators, etc. These signals propagate through the water and form the tonal signature of the platform (1, 2). The paper aims to recognize ships according to their unique sound signature. Underwater acoustic signals are recorded, preprocessed, and analyzed. Acoustic features are then extracted and employed by a recognition algorithm for platform identification.

Ships radiate noise in all frequency bands. The key is to develop methods for extracting features that suit the recognizer. Various techniques have been developed for extracting speech features, such as linear predictive coefficients (LPC), Mel-frequency cepstral coefficients (MFCC), perceptual linear predictive cepstral coefficients (PLPCC) and relative spectral perceptual linear predictive cepstral coefficients (RASTA-PLPCC) (3, 4, 5). These techniques have been used for feature extraction to characterize the noise radiated by ships, and MFCC shows the best performance when it is employed by a Gaussian mixture model in clean environments (6, 7).

The paper proposes a method to extract important features at various amplitudes and frequencies. These features form the acoustic signature of the platform. The method is based on applying the autocorrelation technique and the discrete cosine transform (DCT) to the signals produced by various platforms. The objective is to specify the frequency bands and/or energy levels that provide the best performance for the classification problem.

The paper is organized as follows. The next section describes the proposed method for feature extraction. Section 3 presents the database, whereas section 4 describes the experiments and discusses the results. Finally, section 5 summarizes the paper.

2. FEATURE EXTRACTION METHOD

The first operation in the automatic classification of underwater acoustic signals is extracting features that describe their spectral characteristics. The noise produced by various platforms is periodic, fairly continuous, and extends over a long time span. To characterize these signals, tonal signatures are extracted. These tonal signatures contain many discrete frequencies at various amplitudes. Figure 1 shows the noise radiated from a ship.

1 Noha.korany@alexu.edu.eg

Figure 1 - Noise radiated from a ship over 1 s.

The autocorrelation technique is a useful tool for detecting periodicity. For a discrete signal x, the autocorrelation function, R, is defined in equation (1), where N is the number of samples in x, and $M_o$ is the number of autocorrelation points to be computed. The autocorrelation function for 1 s of ship's noise is shown in figure 2.

$$R(m) = \frac{1}{N} \sum_{n=0}^{N-m-1} x(n)\, x(n+m), \qquad 0 \le m < M_o \qquad (1)$$

Figure 2 - The autocorrelation function for 1 s of ship's noise.

The objective is to represent the signal energy at various frequencies. This is accomplished by applying the DCT to the autocorrelation function. The DCT has a higher degree of spectral compaction than the discrete Fourier transform (DFT). This makes it a good choice, as the signal is represented by a relatively small set of DCT coefficients that contain a significant amount of energy (8).

Figure 3 shows the block diagram of the proposed feature extraction method. First, the noise radiated from ships is recorded and stored in digital format using an analog-to-digital converter card. The next step is to compute the autocorrelation function. The autocorrelation signal is blocked into short time segments, normally of 20 to 30 ms, and each segment is multiplied by a window function. A Hamming window is often used. Then, the DCT is applied to the windowed signal, and the autocorrelation-based DCT feature vector is computed for each segment. Finally, the compression step reduces the number of coefficients within the feature vector. The reduction takes place according to the amplitude and the frequency of the coefficients, as will be discussed in section 4. The main goal of the reduction is to extract the most relevant features using a minimum number of coefficients.
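The chain described above (autocorrelation, segmentation, Hamming window, DCT) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the DCT type (DCT-II, unnormalised), the frame length, the hop size and the toy test tone are all choices made here for the example, and the function names are hypothetical.

```python
import math

def autocorrelation(x, num_lags):
    """Equation (1): R(m) = (1/N) * sum_{n=0}^{N-m-1} x(n) x(n+m), 0 <= m < num_lags."""
    n_samples = len(x)
    return [
        sum(x[n] * x[n + m] for n in range(n_samples - m)) / n_samples
        for m in range(num_lags)
    ]

def hamming(length):
    """Standard Hamming window of the given length."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
            for i in range(length)]

def dct2(frame):
    """Unnormalised DCT-II computed directly (O(N^2), for illustration only)."""
    n = len(frame)
    return [
        sum(frame[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]

def acf_dct_features(x, frame_len, hop):
    """Autocorrelation -> framing -> Hamming window -> DCT, one vector per frame."""
    r = autocorrelation(x, len(x))
    window = hamming(frame_len)
    features = []
    for start in range(0, len(r) - frame_len + 1, hop):
        frame = [r[start + i] * window[i] for i in range(frame_len)]
        features.append(dct2(frame))
    return features

# Toy input (hypothetical): a 100 Hz tone sampled at 1600 Hz, far shorter
# than the recordings described in the paper.
fs = 1600
signal = [math.sin(2 * math.pi * 100 * n / fs) for n in range(256)]
vectors = acf_dct_features(signal, frame_len=64, hop=32)
print(len(vectors), len(vectors[0]))  # number of frames, coefficients per frame
```

The compression step is deliberately left out here, since it depends on the selection rules studied in section 4.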

Figure 3 - The block diagram of the proposed feature extraction method: ship noise -> preprocessing -> x(n) -> autocorrelation -> R(m) -> windowing -> DCT -> autocorrelation-based DCT feature vectors -> compression -> reduced-dimension autocorrelation-based DCT feature vectors.

3. DATABASE

Sounds produced by ships are simulated using predefined characteristics such as the number of shafts, blades per shaft, speed, direction and distance for each ship. The database consists of three surface and three subsurface ships with different numbers of blades and shafts. Every ship was recorded on course 000 at four different speeds, at four different ranges, and in four different directions. Every range and direction was measured relative to own ship on course 000. The data set therefore contains 6 x 3 x 4 = 72 audio files (six ships, three varied parameters, four values each) from six different types of simulated ships. Each file was recorded in mono format with the same microphone and the same sound card, and has a duration of approximately 9 seconds. Each file was sampled at 44100 Hz with 16-bit quantization. The data were then segmented into frames of approximately 23.2 ms with 50% overlap between consecutive frames. A Hamming window was then applied to each frame.

4. EXPERIMENTS AND RESULTS

4.1 Experiments description

In this part, the autocorrelation-based DCT feature vector is first extracted, and then it is employed by the recognition engine. A Gaussian mixture model is employed for the identification problem (9). The recognition engine employs a training signal and a test signal, each of 3 s duration. Two Gaussian components are used (7). Three experiments are conducted to investigate the performance of target identification using various numbers of autocorrelation-based DCT coefficients. The identification rate is determined for each case.

The aim of the first experiment is to determine the effect of the number of autocorrelation-based DCT coefficients on the identification rate.
The second experiment aims to specify the frequency band that contains the most important features for the identification problem. The dimension of the autocorrelation-based DCT feature vector is reduced according to the frequency band chosen. Octave bands are used. The feature vector is limited to those coefficients that belong to a certain octave band centered at frequency $f_c$, whereas the remaining coefficients are discarded. The reduced feature vector is employed by the recognition engine, targets are identified and the identification rate is calculated. The recognition process is repeated for the octave bands whose center frequencies range from 125 Hz to 4 kHz. Moreover, the coefficients that belong to several frequency bands are combined and employed for target identification, and the identification rate is calculated. Equation (2) relates the coefficient number, k, to its frequency, f, where $f_s$ is the sampling frequency and N is the number of samples per frame.

$$k = \frac{2 (N - 1)\, f}{f_s} \qquad (2)$$

The third experiment is conducted to determine whether the most important DCT coefficients are those having high energy, or whether the coefficients with low amplitudes within a certain frequency band are the relevant features for the identification problem. The feature vector is reduced by selecting those coefficients having the highest energy within a specified range of frequencies. The reduced feature vector is employed by the recognition engine, and the identification rate is calculated.
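The decision step of the recognition engine used in all three experiments can be illustrated with a short sketch. It shows only how a set of test feature vectors is scored against one Gaussian mixture model per ship and assigned to the best-scoring model. Training (typically done with the EM algorithm) is omitted. The diagonal covariance, the model parameters, the two-dimensional features and the labels `ship_A` and `ship_B` are assumptions made for this example and do not come from the paper.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (assumed form)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Sum over frames of log( sum_k w_k * N(x | mean_k, var_k) )."""
    total = 0.0
    for x in frames:
        terms = [
            math.log(w) + log_gaussian_diag(x, mean, var)
            for w, mean, var in gmm
        ]
        peak = max(terms)  # log-sum-exp trick for numerical stability
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total

def identify(frames, models):
    """Return the label whose GMM gives the highest total log-likelihood."""
    return max(models, key=lambda label: gmm_log_likelihood(frames, models[label]))

# Hypothetical two-component models (weight, mean, variance) for two classes.
models = {
    "ship_A": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "ship_B": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}
test_frames = [[0.2, -0.1], [0.9, 1.2], [0.4, 0.6]]
print(identify(test_frames, models))
```

In the experiments, the feature vectors passed to such a scorer would be the reduced autocorrelation-based DCT vectors, and the identification rate is the fraction of test signals assigned to the correct ship.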

4.2 Results and discussion

Figure 4 shows the results of the first experiment. The number of coefficients varies from 10 to 1024, discarding the zero-order coefficient. The maximum identification rate is 93%, and it is reached when 512 coefficients are employed for the identification problem.

Figure 4 - The identification rate for various numbers of autocorrelation-based DCT coefficients.

The results of the second experiment are presented in tables 1 and 2. Table 1 shows the identification rate when coefficients belonging to a single octave band are employed by the recognition engine, whereas table 2 shows the identification rate when coefficients belonging to multiple bands are used. Table 1 shows that the maximum identification rate is obtained when the coefficients that belong to the frequency band centered at 1 kHz are employed by the recognition engine. The identification rate reaches 86% when employing these 34 coefficients. Moreover, table 2 shows that combining the coefficients of the band centered at 500 Hz with those of the band centered at 1 kHz increases the identification rate to 90%. It is concluded that the most important features are those within the frequency band centered at 1 kHz.

Hence, the identification rate is also calculated employing only part of the coefficients that belong to the band centered at 1 kHz. The number of coefficients varies from 15 to 30. Table 3 shows the identification rate for various numbers of coefficients within that frequency band. Table 3 shows that the identification rate within that band depends strongly on which coefficients are selected.
It is also concluded that the coefficients lying in the highest frequency range within the 1 kHz band are the most relevant for target identification: the identification rate reaches 84.72% for the 15 highest-order coefficients, whereas only 66.67% of the targets are identified when the 15 lowest-order coefficients are employed by the recognition engine.

Table 1 - The identification rate versus number of coefficients within a single octave band

Center frequency (Hz)   Band limits (Hz)   Number of coefficients   Identification rate (%)
125                     88-177             5                        15.28
250                     177-354            9                        23.61
500                     354-707            18                       62.5
1000                    707-1414           34                       86.11
2000                    1414-2828          66                       83.33
4000                    2828-5657          132                      81.94
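The coefficient counts in tables 1 and 2 can be checked with a short calculation. The sketch below reads equation (2) as k = 2(N - 1)f / f_s and makes three assumptions that are not stated in the paper: a frame length of N = 1024 samples (1024 / 44100 s is about 23.2 ms), octave band edges at f_c / sqrt(2) and f_c * sqrt(2), and rounding of k to the nearest integer with both edge coefficients included. Under these assumptions the counts of table 1 are reproduced, and adjacent bands share one edge coefficient, which is consistent with the combined counts in table 2.

```python
import math

FS = 44100   # sampling frequency from section 3 (Hz)
N = 1024     # assumed samples per frame: 1024 / 44100 s is about 23.2 ms

def coeff_index(f):
    """Equation (2), k = 2 (N - 1) f / f_s, rounded to the nearest integer (assumed)."""
    return round(2 * (N - 1) * f / FS)

def octave_band_indices(fc):
    """Coefficient indices between the band edges fc / sqrt(2) and fc * sqrt(2)."""
    k_low = coeff_index(fc / math.sqrt(2))
    k_high = coeff_index(fc * math.sqrt(2))
    return set(range(k_low, k_high + 1))

def combined_count(centres):
    """Number of distinct coefficients in the union of several octave bands."""
    return len(set().union(*(octave_band_indices(fc) for fc in centres)))

for fc in (125, 250, 500, 1000, 2000, 4000):
    print(fc, len(octave_band_indices(fc)))   # counts 5, 9, 18, 34, 66, 132

print(combined_count([500, 1000]))            # 51, as in table 2
```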

Table 2 - The identification rate versus number of coefficients within combined octave bands

Center frequencies of combined octave bands (Hz)   Number of coefficients   Identification rate (%)
1000, 2000, 4000                                   230                      87.5
500, 1000                                          51                       90.28
250, 500, 1000                                     59                       86.11
125, 250, 500, 1000                                63                       88.89
125, 500, 1000                                     56                       87.5

Table 3 - The identification rate versus number of coefficients within the 1 kHz octave band

Frequency limits (Hz)   Number of coefficients   Identification rate (%)
707-1336                30                       91.67
707-1228                25                       80.56
707-1121                20                       72.22
707-1013                15                       66.67
797-1414                30                       90.28
905-1414                25                       86.11
1013-1414               20                       83.33
1121-1414               15                       84.72

The results of the third experiment are given in tables 4 and 5, which show the identification rate for each setting of the reduced feature vector. Both tables show the identification rate for various numbers of coefficients within the 1 kHz frequency band. In table 4 the coefficients that have the highest energy are selected, whereas the coefficients having the lowest energy are employed in table 5. Tables 4 and 5 show that a maximum identification rate of 47% is reached when the highest-energy coefficients within the 1 kHz band are selected and employed for target identification. Comparing table 4 to table 5, the highest-energy coefficients are more significant than the lowest-energy ones for the identification problem. Comparing table 3 to table 4, the identification rate reaches 92% for 30 consecutive coefficients within the 1 kHz band, whereas the rate decreases to 47% for the 30 coefficients that have the highest energy within the same band. Important features are therefore also found at low amplitudes within the frequency band. It is concluded that the spectral distribution of the feature vector affects the identification rate more than the energy level of discrete frequencies.
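The two selection rules compared in tables 4 and 5 can be sketched as a ranking of the coefficients inside a band by squared amplitude. The paper does not state how energy is ranked across frames, so this example ranks within a single feature vector. The function name, the toy coefficient values and the band limits are hypothetical.

```python
def select_by_energy(coeffs, k_low, k_high, count, highest=True):
    """Return `count` coefficient indices in [k_low, k_high], ranked by squared amplitude.

    highest=True keeps the largest-energy coefficients, highest=False the smallest.
    """
    band = range(k_low, k_high + 1)
    ranked = sorted(band, key=lambda k: coeffs[k] ** 2, reverse=highest)
    return sorted(ranked[:count])

# Toy feature vector (hypothetical values); index 0 is the zero-order coefficient.
coeffs = [0.0, 0.3, -2.0, 0.1, 1.5, -0.2, 0.05, 0.9]

print(select_by_energy(coeffs, 1, 7, 3))                  # [2, 4, 7]
print(select_by_energy(coeffs, 1, 7, 3, highest=False))   # [3, 5, 6]
```

Selecting consecutive coefficients, as in table 3, corresponds instead to a plain slice of the band, which keeps the spectral ordering of the retained features intact.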
Table 4 - The identification rate versus number of coefficients within the 1 kHz octave band; coefficients with the highest energy are selected

Number of coefficients   Identification rate (%)
30                       47.22
25                       29.17
20                       26.39
15                       19.44

Table 5 - The identification rate versus number of coefficients within the 1 kHz octave band; coefficients with the lowest energy are selected

Number of coefficients   Identification rate (%)
30                       33.33
25                       26.4
20                       20.83
15                       13.89

5. CONCLUSIONS

The paper proposes the extraction of autocorrelation-based DCT coefficients to characterize the noise radiated by ships. These coefficients are employed by a Gaussian mixture model to identify the targets. The performance of the recognition system is investigated. The number of coefficients employed by the recognition engine affects the identification rate. Coefficients at various amplitudes and frequencies are selected and employed for the identification problem. The goal is to find the coefficients that best suit the recognizer. It is concluded that the most important features are those within the frequency band centered at 1 kHz, and they yield the highest identification rate. Moreover, relevant features are found at low amplitudes within that frequency band, and it is concluded that the spectral distribution of the feature vector affects the identification rate more than the energy level of discrete frequencies. A high identification rate is obtained when consecutive coefficients within the 1 kHz band are employed by the recognition engine. On the other hand, this rate decreases significantly when only the highest-energy coefficients within the same band are employed.

ACKNOWLEDGEMENTS

The author gratefully acknowledges Eng. Mohammed Abd Elzaher and Dr. Hatem Khater for the construction of the database.

REFERENCES

1. McKenna M.F., Ross D., Wiggins S.M., Hildebrand J.A. Underwater radiated noise from modern commercial ships. J Acoust Soc Am. 2012; 131(1):92-103.
2. Wang L.S., Robinson S.P., Theobald P., Lepper P.A., Hayman G., Humphrey V.F. Measurements of radiated ship noise. Proceedings of Meetings on Acoustics; 2-6 July 2012; Edinburgh, Scotland 2012. p. 1-10.
3. Davis S., Mermelstein P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoustics, Speech, and Signal Processing 1980; 28(4):357-366.
4. Hermansky H. Perceptual linear predictive (PLP) analysis of speech. J Acoust Soc Am. 1990; 87(4):1738-1752.
5. Hermansky H., Morgan N. RASTA processing of speech. IEEE Trans. Speech and Audio Processing 1994; 2(4):578-589.
6. Korany N., Abd Elzaher M., Khater H. Classification of underwater acoustic signals using various extraction methods. Fortschritte der Akustik, Deutsche Gesellschaft fuer Akustik, DAGA 2012; March 2012; Darmstadt, Germany 2012. p 655-654.
7. Korany N., Abd Elzaher M., Khater H. Investigation about the performance of GMM for recognition of underwater acoustic signals. Fortschritte der Akustik, Deutsche Gesellschaft fuer Akustik, DAGA 2012; March 2012; Darmstadt, Germany 2012. p 653-656.
8. Burger W., Burge M.J. Digital image processing: an algorithmic introduction using Java. Springer London; 2008.
9. Reynolds D.A., Rose R.C. Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Trans. Speech and Audio Processing 1995; 3(1):72-83.