Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise

Noha KORANY 1
Alexandria University, Egypt

ABSTRACT
The paper applies spectral analysis to the underwater acoustic noise radiated by ships. A frequency band is specified to characterize the sound produced by different platforms. The paper then proposes a new feature extraction technique that applies the autocorrelation technique and the discrete cosine transform. The extracted features are used by a recognition engine that applies a Gaussian mixture model to classify the underwater acoustic noise radiated by ships. The recognition performance is investigated for various settings of the proposed technique.

Keywords: ships noise, underwater signals, identification
I-INCE Classification of Subjects Number(s): 54.3

1. INTRODUCTION
Automatic classification of underwater acoustic signals has received increasing attention in recent decades. Ships radiate noise whose spectral features are related to machinery, propellers, generators, etc. These signals propagate through the water and form the tonal signature of the platform (1, 2). The paper aims to recognize ships by their unique sound signature. Underwater acoustic signals are recorded, preprocessed, and analyzed; acoustic features are then extracted and used by a recognition algorithm for platform identification. Because ships radiate noise in all frequency bands, the key is to develop feature extraction methods that fit well with the recognizer. Various techniques have been developed for extracting speech features, such as linear predictive coefficients (LPC), Mel-frequency cepstral coefficients (MFCC), perceptual linear predictive cepstral coefficients (PLPCC) and relative spectral perceptual linear predictive cepstral coefficients (RASTA-PLPCC) (3, 4, 5).
These techniques have been used to characterize the noise radiated by ships, and MFCC showed the best performance when employed by a Gaussian mixture model (GMM) in clean environments (6, 7). This paper proposes a method to extract important features at various amplitudes and frequencies; these features form the acoustic signature of the platform. The method applies the autocorrelation technique and the discrete cosine transform (DCT) to the signals produced by various platforms. The objective is to specify the frequency bands and/or energy levels that give the best classification performance.

The paper is organized as follows. The next section describes the proposed feature extraction method. Section 3 presents the database, section 4 describes the experiments and discusses the results, and section 5 summarizes the paper.

2. FEATURE EXTRACTION METHOD
The first step in the automatic classification of underwater acoustic signals is extracting features that describe their spectral characteristics. The noise produced by various platforms is periodic, fairly continuous, and persists over long time intervals. To characterize these signals, tonal signatures are extracted; these signatures contain many discrete frequency components at various amplitudes. Figure 1 shows the noise radiated from a ship.

1 Noha.korany@alexu.edu.eg
Figure 1 - Noise radiated from a ship over 1 s.

The autocorrelation technique is a useful tool for detecting periodicity. For a discrete signal x, the autocorrelation function R is defined in equation (1), where N is the number of samples in x and M_o is the number of autocorrelation points to be computed. The autocorrelation function for 1 s of ship noise is shown in Figure 2.

$$R(m) = \frac{1}{N} \sum_{n=0}^{N-m-1} x(n)\, x(n+m), \qquad 0 \le m < M_o \tag{1}$$

Figure 2 - The autocorrelation function for 1 s of ship noise.

The objective is to represent the signal energy at various frequencies, which is accomplished by applying the DCT to the autocorrelation function. The DCT has a higher degree of energy compaction than the discrete Fourier transform (DFT). This makes it a good choice, as the signal can be represented by a relatively small set of DCT coefficients that contain a significant amount of its energy (8).

Figure 3 shows the block diagram of the proposed feature extraction method. First, the noise radiated from ships is recorded and stored in digital format using an analog-to-digital converter card. The next step is to compute the autocorrelation function. The autocorrelation signal is blocked into short time segments, typically 20 to 30 ms long, and each segment is multiplied by a window function; a Hamming window is often used. The DCT is then applied to the windowed signal, and the autocorrelation-based DCT feature vector is computed for each segment. Finally, a compression step reduces the number of coefficients in the feature vector. The reduction is based on the amplitude and the frequency of the coefficients, as discussed in section 4. The main goal of the reduction is to extract the most relevant features using a minimum number of coefficients.
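The processing chain described above (autocorrelation, blocking into overlapping segments, Hamming windowing, DCT) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the author's implementation: the frame length of 1024 samples and the 50% hop follow the 23.2 ms frames of section 3; the number of lags `n_lags` (M_o in equation (1)) is a free parameter that the text does not fix; the autocorrelation is evaluated with a zero-padded FFT, which gives the same values as the direct sum in equation (1); and an orthonormal DCT-II is assumed because the DCT type is not stated. Following experiment 1, the zero-order coefficient is dropped. All function names are hypothetical.

```python
import numpy as np

def autocorrelation(x, n_lags):
    """Equation (1): R(m) = (1/N) * sum_{n=0}^{N-m-1} x(n) x(n+m), 0 <= m < n_lags.

    Evaluated with a zero-padded FFT, which equals the direct sum.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    if not 0 < n_lags <= N:
        raise ValueError("n_lags must be in 1..len(x)")
    nfft = 1 << (2 * N - 1).bit_length()    # >= 2N-1 avoids circular wrap-around
    spectrum = np.fft.rfft(x, nfft)
    r = np.fft.irfft(np.abs(spectrum) ** 2, nfft)[:n_lags]
    return r / N

def dct2(v):
    """Orthonormal DCT-II (assumed DCT type; same convention as norm='ortho')."""
    v = np.asarray(v, dtype=float)
    N = len(v)
    n = np.arange(N)
    basis = np.cos(np.pi * np.outer(n, 2 * n + 1) / (2 * N))   # basis[k, n]
    scale = np.full(N, np.sqrt(2.0 / N))
    scale[0] = np.sqrt(1.0 / N)
    return scale * (basis @ v)

def autocorr_dct_features(x, n_lags, frame_len=1024, hop=512, n_coeffs=None):
    """Autocorrelation -> overlapping frames -> Hamming window -> DCT.

    Returns one row per frame. The zero-order coefficient is discarded and,
    if n_coeffs is given, only coefficients 1..n_coeffs are kept.
    """
    R = autocorrelation(x, n_lags)
    window = np.hamming(frame_len)
    rows = []
    for start in range(0, len(R) - frame_len + 1, hop):
        c = dct2(R[start:start + frame_len] * window)
        rows.append(c[1:] if n_coeffs is None else c[1:n_coeffs + 1])
    return np.array(rows)
```

For a 3 s signal sampled at 44100 Hz, `autocorr_dct_features(x, n_lags=len(x), n_coeffs=512)` would return one 512-dimensional vector per frame. Computing the autocorrelation over the whole signal before framing follows the order of the block diagram in Figure 3; the text does not say whether it is computed over the whole recording or per segment.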
Figure 3 - The block diagram of the proposed feature extraction method: ship noise, preprocessing, x(n), autocorrelation, R(m), windowing, DCT, autocorrelation-based DCT feature vectors, compression, reduced-dimension autocorrelation-based DCT feature vectors.

3. DATABASE
Sounds produced by ships are simulated using predefined characteristics, such as the number of shafts, blades per shaft, speed, direction and distance, for each ship. The database consists of three surface and three subsurface ships with different numbers of blades and shafts. Every ship was recorded on course 000 at four different speeds, at four different ranges, and at four different directions. Every range and direction was measured relative to own ship on course 000. The data set therefore contains 6 x 3 x 4 = 72 audio files from six different types of simulated ships (six ships, three varied parameters, four values each). Each file was recorded in mono format with the same microphone and the same sound card, and is approximately 9 s long. Each file was sampled at 44100 Hz with 16-bit quantization. The data were then segmented into frames of approximately 23.2 ms with 50% overlap, and a Hamming window was applied to each frame.

4. EXPERIMENTS AND RESULTS
4.1 Experiments description
In this part, the autocorrelation-based DCT feature vector is first extracted and then used by the recognition engine. A Gaussian mixture model is employed for the identification problem (9). The recognition engine uses a training signal and a test signal, each 3 s long. Two Gaussian components are used (7). Three experiments are conducted to investigate target identification performance with various numbers of autocorrelation-based DCT coefficients, and the identification rate is determined for each case. The aim of the first experiment is to determine the effect of the number of autocorrelation-based DCT coefficients on the identification rate.
The second experiment aims to specify the frequency band that contains the most important features for the identification problem. The dimension of the autocorrelation-based DCT feature vector is reduced according to the chosen frequency band, using octave bands: the feature vector is limited to the coefficients that belong to the octave band centered at frequency f_c, and the remaining coefficients are discarded. The reduced feature vector is used by the recognition engine, the targets are identified, and the identification rate is calculated. The recognition process is repeated for the octave bands with center frequencies from 125 Hz to 4 kHz. In addition, the coefficients belonging to multiple octave bands are combined and used for target identification, and the identification rate is calculated. Equation (2) relates the coefficient number k to its frequency f, where f_s is the sampling frequency and N is the number of samples per frame.

$$k = \frac{2(N-1)\,f}{f_s} \tag{2}$$

The third experiment determines whether the most important DCT coefficients within a certain frequency band are those with high energy or those with low amplitudes. The feature vector is reduced by selecting the coefficients with the highest (or, for comparison, the lowest) energy within a specified frequency range. The reduced feature vector is used by the recognition engine, and the identification rate is calculated.
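The recognition engine of section 4.1 can be illustrated with the usual GMM identification scheme (9): one mixture model is trained per ship on the feature vectors of its training signal, and a test signal is assigned to the model with the highest average log-likelihood over its frames. The sketch below implements that scheme with two components using plain expectation-maximization in NumPy. Diagonal covariances, the random initialization, the iteration count and all function names are assumptions, not details given in the paper; scikit-learn's `sklearn.mixture.GaussianMixture(n_components=2, covariance_type="diag")` would be an off-the-shelf alternative.

```python
import numpy as np

def _log_weighted_pdf(X, weights, means, variances):
    """log(w_k) + log N(x | mu_k, diag(var_k)) for every frame x and component k."""
    diff = X[:, None, :] - means[None, :, :]                     # (frames, K, dims)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)    # (frames, K)
    log_det = np.sum(np.log(2 * np.pi * variances), axis=1)      # (K,)
    return np.log(weights)[None, :] - 0.5 * (log_det[None, :] + mahal)

def fit_gmm(X, n_components=2, n_iter=100, seed=0, reg=1e-6):
    """Fit a diagonal-covariance GMM to the rows (frames) of X with EM."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Initialise means from random frames and variances from the whole data set.
    means = X[rng.choice(n, n_components, replace=False)]
    variances = np.tile(X.var(axis=0) + reg, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each frame.
        log_p = _log_weighted_pdf(X, weights, means, variances)
        resp = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
        # M-step: re-estimate weights, means and diagonal variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, 0.0) + reg
    return weights, means, variances

def mean_log_likelihood(X, model):
    """Average per-frame log-likelihood of X under a fitted model."""
    log_p = _log_weighted_pdf(np.asarray(X, dtype=float), *model)
    return float(np.mean(np.logaddexp.reduce(log_p, axis=1)))

def identify(X_test, models):
    """Return the label of the ship model with the highest average log-likelihood."""
    return max(models, key=lambda name: mean_log_likelihood(X_test, models[name]))
```

In the experiments, each input matrix would hold the (reduced) autocorrelation-based DCT feature vectors of a 3 s signal, one row per frame; the reduced feature sets of experiments 1 to 3 only change which columns are passed to `fit_gmm` and `identify`.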
4.2 Results and discussion
Figure 4 shows the results of experiment 1. The number of coefficients varies from 10 to 1024, discarding the zero-order coefficient. The maximum identification rate, 93%, is reached when 512 coefficients are used.

Figure 4 - The identification rate for various numbers of autocorrelation-based DCT coefficients.

The results of the second experiment are presented in Tables 1 and 2. Table 1 shows the identification rate when the coefficients belonging to single octave bands are used by the recognition engine, whereas Table 2 shows the identification rate when coefficients belonging to multiple bands are combined. Table 1 shows that the maximum identification rate for a single band is obtained with the coefficients of the band centered at 1 kHz, reaching 86% with 34 coefficients. Moreover, Table 2 shows that combining the coefficients of the bands centered at 500 Hz and 1 kHz increases the identification rate to 90%. It is concluded that the most important features lie within the band centered at 1 kHz. Hence, the identification rate is calculated using subsets of the coefficients in that band, with the number of coefficients varied from 15 to 30. Table 3 shows the identification rate for these subsets; the maximum rate, 91.67%, is reached with the 30 coefficients spanning 707 to 1336 Hz.
It is also concluded that the coefficients at the upper end of the 1 kHz band are the most relevant for target identification: the identification rate reaches 84% with the 15 highest-order coefficients, whereas only 67% of the targets are identified with the 15 lowest-order coefficients.

Table 1 - The identification rate versus the number of coefficients within a single octave band

Center frequency (Hz) | Band limits (Hz) | Number of coefficients | Identification rate (%)
125  | 88-177    | 5   | 15.28
250  | 177-354   | 9   | 23.61
500  | 354-707   | 18  | 62.5
1000 | 707-1414  | 34  | 86.11
2000 | 1414-2828 | 66  | 83.33
4000 | 2828-5657 | 132 | 81.94
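Equation (2), together with octave band edges at f_c/√2 and f_c·√2, determines which coefficients belong to each band. The following sketch assumes N = 1024 samples per frame (about 23.2 ms at 44100 Hz) and that each band edge is rounded to the nearest coefficient index with both edges kept. Under these assumptions it reproduces the coefficient counts in Table 1 and, for combined bands, in Table 2, where adjacent bands share their edge coefficient and it is counted once. The rounding rule is an inference from those counts, not a statement from the paper, and the function names are hypothetical.

```python
import math

FS = 44100   # sampling frequency in Hz (section 3)
N = 1024     # samples per frame (assumed from the ~23.2 ms frames)

def coeff_index(f, n=N, fs=FS):
    """Equation (2): DCT coefficient index k corresponding to frequency f (Hz)."""
    return 2 * (n - 1) * f / fs

def octave_band_indices(fc, n=N, fs=FS):
    """Inclusive coefficient indices of the octave band centred at fc.

    Edges fc/sqrt(2) and fc*sqrt(2) are rounded to the nearest index
    (assumed rounding rule).
    """
    lo = round(coeff_index(fc / math.sqrt(2), n, fs))
    hi = round(coeff_index(fc * math.sqrt(2), n, fs))
    return list(range(lo, hi + 1))

def combined_band_indices(centres, n=N, fs=FS):
    """Union of the indices of several octave bands; shared edges counted once."""
    return sorted({k for fc in centres for k in octave_band_indices(fc, n, fs)})

print(round(1000 * N / FS, 1))   # frame duration in ms: 23.2
for fc in (125, 250, 500, 1000, 2000, 4000):
    print(fc, len(octave_band_indices(fc)))
```

For example, `octave_band_indices(1000)` runs from index 33 to 66, and the 15 highest-order coefficients of that band are its last 15 entries.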
Table 2 - The identification rate versus the number of coefficients within combined octave bands

Center frequencies of combined octave bands (Hz) | Number of coefficients | Identification rate (%)
1000, 2000, 4000     | 230 | 87.5
500, 1000            | 51  | 90.28
250, 500, 1000       | 59  | 86.11
125, 250, 500, 1000  | 63  | 88.89
125, 500, 1000       | 56  | 87.5

Table 3 - The identification rate versus the number of coefficients within the 1 kHz octave band

Frequency range (Hz) | Number of coefficients | Identification rate (%)
707-1336  | 30 | 91.67
707-1228  | 25 | 80.56
707-1121  | 20 | 72.22
707-1013  | 15 | 66.67
797-1414  | 30 | 90.28
905-1414  | 25 | 86.11
1013-1414 | 20 | 83.33
1121-1414 | 15 | 84.72

The results of the third experiment are presented in Tables 4 and 5, which give the identification rate for various numbers of coefficients within the 1 kHz band. In Table 4 the coefficients with the highest energy are selected, whereas in Table 5 the coefficients with the lowest energy are used. The maximum identification rate in these tables, 47%, is reached when the highest-energy coefficients within the 1 kHz band are used for target identification. Comparing Table 4 with Table 5, the highest-energy coefficients are more significant than the lowest-energy ones for the identification problem. Comparing Table 3 with Table 4, the identification rate reaches 92% for 30 consecutive coefficients within the 1 kHz band, whereas it drops to 47% for the 30 highest-energy coefficients within the same band. Hence, important features are found at low amplitudes within the frequency band. It is concluded that the spectral distribution of the feature vector affects the identification rate more than the energy level of discrete frequencies.
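The energy-based reduction of experiment 3 can be sketched as ranking the coefficients of a band by energy and keeping the top n (as in Table 4) or the bottom n (as in Table 5). The paper does not state how energy is measured; this sketch assumes the mean squared coefficient value over all training frames, so that the same coefficient indices are used for every frame, and assumes feature columns indexed by coefficient number k (column 0 being the zero-order coefficient). `select_by_energy` and `reduce_features` are hypothetical names.

```python
import numpy as np

def select_by_energy(features, band, n_select, highest=True):
    """Pick n_select coefficient indices from `band`, ranked by energy.

    features : (n_frames, n_coeffs) array; column k holds DCT coefficient k.
    band     : coefficient indices belonging to the frequency band.
    highest  : True keeps the highest-energy coefficients, False the lowest.
    Energy is assumed to be the mean squared value over all frames.
    """
    band = np.asarray(band)
    energy = np.mean(features[:, band] ** 2, axis=0)   # one value per band coefficient
    order = np.argsort(energy)                          # ascending energy
    chosen = order[-n_select:] if highest else order[:n_select]
    return np.sort(band[chosen])                        # restore frequency order

def reduce_features(features, indices):
    """Restrict every frame's feature vector to the selected coefficients."""
    return features[:, indices]
```

For comparison, the consecutive subsets of Table 3 correspond to slicing the band directly, for example the first or last 15 indices of the band, rather than ranking by energy.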
Table 4 - The identification rate versus the number of coefficients within the 1 kHz octave band, with the highest-energy coefficients selected

Number of coefficients | Identification rate (%)
30 | 47.22
25 | 29.17
20 | 26.39
15 | 19.44
Table 5 - The identification rate versus the number of coefficients within the 1 kHz octave band, with the lowest-energy coefficients selected

Number of coefficients | Identification rate (%)
30 | 33.33
25 | 26.4
20 | 20.83
15 | 13.89

5. CONCLUSIONS
The paper proposes extracting autocorrelation-based DCT coefficients to characterize the noise radiated by ships. These coefficients are used by a Gaussian mixture model to identify the targets, and the performance of the recognition system is investigated. The number of coefficients used by the recognition engine affects the identification rate. Coefficients at various amplitudes and frequencies are selected and used for the identification problem, with the goal of finding the coefficients that fit best with the recognizer. It is concluded that the most important features lie within the frequency band centered at 1 kHz, and they yield the highest identification rate. Moreover, relevant features are found at low amplitudes within that frequency band, and it is concluded that the spectral distribution of the feature vector affects the identification rate more than the energy level of discrete frequencies. A high identification rate is obtained when consecutive coefficients within the 1 kHz band are used by the recognition engine, whereas the rate decreases significantly when the highest-energy coefficients within the same band are used.

ACKNOWLEDGEMENTS
The author gratefully acknowledges Eng. Mohammed Abd Elzaher and Dr. Hatem Khater for constructing the database.

REFERENCES
1. McKenna M.F., Ross D., Wiggins S.M., Hildebrand J.A. Underwater radiated noise from modern commercial ships. J Acoust Soc Am. 2012; 131(1):92-103.
2. Wang L.S., Robinson S.P., Theobald P., Lepper P.A., Hayman G., Humphrey V.F. Measurements of radiated ship noise. Proceedings of Meetings on Acoustics; 2-6 July 2012; Edinburgh, Scotland 2012. p. 1-10.
3. Davis S., Mermelstein P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoust., Speech, Signal Process. 1980; 28(4):357-366.
4. Hermansky H. Perceptual linear predictive (PLP) analysis of speech. J Acoust Soc Am. 1990; 87(4):1738-1752.
5. Hermansky H., Morgan N. RASTA processing of speech. IEEE Trans. Speech Audio Process. 1994; 2(4):578-589.
6. Korany N., Abd Elzaher M., Khater H. Classification of underwater acoustic signals using various extraction methods. Fortschritte der Akustik, Deutsche Gesellschaft fuer Akustik DAGA 2012; March 2012; Darmstadt, Germany 2012. p. 655-654.
7. Korany N., Abd Elzaher M., Khater H. Investigation about the performance of GMM for recognition of underwater acoustic signals. Fortschritte der Akustik, Deutsche Gesellschaft fuer Akustik DAGA 2012; March 2012; Darmstadt, Germany 2012. p. 653-656.
8. Burger W., Burge M.J. Digital image processing: an algorithmic introduction using Java. Springer London; 2008.
9. Reynolds D.A., Rose R.C. Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Trans. Speech Audio Process. 1995; 3(1):72-83.