High capacity robust audio watermarking scheme based on DWT transform

Davod Zangene * (Sama technical and vocational training college, Islamic Azad University, Mahshahr Branch, Mahshahr, Iran) davodzangene@mail.com
Ali Nayeri Rad (Sama technical and vocational training college, Islamic Azad University, Ahvaz Branch, Ahvaz, Iran)

Abstract
This paper proposes a digital audio watermarking algorithm based on the discrete wavelet transform (DWT) that exploits a human auditory model and the masking effect of the human ear. The algorithm embeds a binary image watermark into the audio signal while improving the imperceptibility of the watermark. Experimental results show that the algorithm is robust against common signal processing operations such as noise addition, filtering, resampling and lossy compression.

Keywords: Audio Watermarking, wavelet transform, high capacity.

1. Introduction
With the recent growth of computer networks, and more specifically the World Wide Web, copyright protection of digital audio has become increasingly important. Digital audio watermarking has drawn extensive attention as a means of protecting the copyright of audio data. Digital audio watermarking is the process of embedding watermarks into an audio signal to show authenticity and ownership. An audio watermarking scheme should meet the following requirements: (a) Imperceptibility: the digital watermark should not affect the quality of the original audio signal after it is watermarked; (b) Robustness: the embedded watermark data should not be removed or eliminated by unauthorized distributors using common signal processing operations and attacks; (c) Capacity: the number of bits that can be embedded into the audio signal within a unit of time. These requirements often conflict with one another. Furthermore, there are two kinds of detection in a watermarking system.
Blind detection has no knowledge of the original signal, whereas informed (non-blind) detection uses the original signal to extract the watermark. Blind detection is beneficial when the original signal is not easily accessible.

According to their implementation, audio watermarking algorithms can be divided into time domain methods and transform domain methods. In time domain schemes, the hidden bits are embedded directly into the time signal samples. Time domain watermarking systems are usually weaker against signal processing attacks than their transform domain counterparts, and transform domain methods are commonly considered to perform better against attacks [1]. Phase modulation [2] and echo hiding [3] are well known methods in the time domain. In frequency domain watermarking, one of the usual transforms, such as the Discrete Fourier Transform (DFT), the Discrete Cosine Transform (DCT), the Modified Discrete Cosine Transform (MDCT) or the Wavelet Transform (WT) [4, 5, 6, 7, 8, 9], is applied to the signal, and the hidden bits are embedded into the resulting transform coefficients.

In this paper, we discuss audio watermark embedding based on human auditory characteristics using the wavelet transform. The wavelet transform has many advantages in audio signal processing. Its inherent frequency multi-resolution and logarithmic decomposition of frequency bands resemble the human perception of frequencies, since the decomposition mimics the critical band structure of the human auditory system (HAS). In the proposed scheme, the last low frequency band of the 4th level wavelet decomposition (CA4) is used for embedding. This band of wavelet samples is divided into frames, and the average of the absolute values of each frame's samples is computed [7].
After that, in the embedding process, all wavelet coefficients are scanned, and whenever a coefficient satisfies a given condition, secret bits are embedded into it. Two secret bits are embedded into a single wavelet coefficient, and the next two secret bits are embedded into the next suitable coefficient. The experimental results show that this scheme achieves an excellent capacity together with transparency and robustness against common signal processing attacks.

In section two, we give a brief description of the discrete wavelet transform. In section three, we describe in detail the watermark embedding and extraction procedures of the proposed algorithm. In section four, we evaluate the performance of the algorithm and present simulation results with respect to inaudibility and robustness. We conclude in section five with some remarks.

2. The Discrete Wavelets Transform
Suppose the original audio signal is f(n) and the filters corresponding to the scale function and the wavelet function are H and G respectively. The discrete wavelet transform (DWT) of the audio signal f(n) is shown in Fig. 1. CA is the approximation sub-band, which mainly represents the low frequency components of the audio signal, and CD is the detail sub-band, which mainly represents the high frequency components. If the approximation component CA is decomposed repeatedly over J levels, we obtain the wavelet components at the different decomposition levels. This process is called the J-level wavelet multi-resolution analysis of the audio signal f(n). The detail components and the approximation component obtained are CDi and CAJ respectively, where CDi (i = 1, 2, …, J) represents the i-th level detail component. This multi-resolution decomposition approximates the human auditory system. Depending on the application and the length of the signal, the low frequency part may be further decomposed into high and low frequency parts. Figure 1 shows a 4-level DWT decomposition of the signal f(n).
The original signal f(n) can be reconstructed using the inverse DWT process.

Fig 1: Four-level DWT decomposition

Since human hearing is not very sensitive to minute changes in the wavelet high frequency components, and the coefficients of the high frequency components are small, embedding the watermark into the high frequency sub-band makes it effectively inaudible. However, the high frequency components are easily destroyed or removed by common signal processing, so the robustness of the watermark cannot be ensured. The low frequency components, in contrast, are the main components of the signal: the coefficients in the low frequency sub-band are larger and carry most of the energy of the audio signal. Embedding the watermark into the low frequency sub-band can therefore achieve excellent robustness. The wavelet approximation components are little affected by external disturbances and are stable, which is what makes a watermark embedded there robust. At the same time, because they carry the main energy of the audio signal, changes to the approximation components can easily degrade the quality of the original audio signal. Owing to its excellent time-frequency localization properties, the DWT is well suited to identifying areas of an audio signal where a watermark can be embedded effectively. Many DWT-based audio watermarking techniques can be found in the literature [4, 5, 6, 7, 8, 9].
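The multi-level decomposition and reconstruction described in this section can be illustrated with a minimal Python sketch. It uses the Haar wavelet only to stay self-contained, which is an assumption made for illustration and not the wavelet used in the experiments; the function names (`haar_dwt`, `wavedec`, `waverec`) are hypothetical, and the signal length is assumed to be divisible by 2^J.

```python
import math

def haar_dwt(signal):
    """One analysis level: split an even-length signal into (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """One synthesis level: invert haar_dwt."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * s)
        out.append((a - d) * s)
    return out

def wavedec(signal, levels):
    """J-level analysis; returns [CA_J, CD_J, ..., CD_1]."""
    coeffs = []
    approx = list(signal)
    for _ in range(levels):
        # Only the approximation branch is decomposed again at each level.
        approx, detail = haar_dwt(approx)
        coeffs.append(detail)
    coeffs.append(approx)
    coeffs.reverse()
    return coeffs

def waverec(coeffs):
    """Inverse of wavedec: rebuild the signal from [CA_J, CD_J, ..., CD_1]."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        approx = haar_idwt(approx, detail)
    return approx
```

For a 32-sample signal and J = 4, `wavedec` returns sub-bands of lengths 2, 2, 4, 8 and 16 (CA4, CD4, CD3, CD2, CD1), and `waverec` recovers the input up to floating-point rounding.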
3. Proposed scheme
The proposed algorithm applies the DWT to the digital audio signal in which a watermark is to be embedded, and the embedding is performed in the wavelet domain. A binary image is embedded into selected significant coefficients of the approximation sub-band CA4. The algorithm consists of two procedures: watermark embedding and watermark extraction.

3.1. Watermark Embedding Procedure
The embedding procedure performs three major operations: watermark pre-processing, DWT-based frequency decomposition of the audio signal, and watermark embedding in the DWT-transformed audio signal. The operations are described in the following steps:

1. Convert the two-dimensional image matrix Img of size $M_1 \times M_2$ into a one-dimensional vector W of length $M_1 M_2$:
\[ W = \{\, w(i) = \mathrm{Img}(k, j),\; i = k M_2 + j,\; 0 \le k < M_1,\; 0 \le j < M_2 \,\} \tag{1} \]

2. Decompose the audio signal with a four-level wavelet transform.

3. Divide the CA4 samples into frames of a given length and compute, for each frame, the average $m_i$ of the absolute values of its samples:
\[ m_i = \frac{1}{s} \sum_{j=(i-1)s+1}^{is} |c_j| \tag{2} \]
where $c_j$ are the wavelet coefficients of the CA4 sub-band, $s$ is the frame size and $m_i$ is the average of the i-th frame.

4. The marked wavelet coefficients $c'_j$ are obtained using equation (3):
\[ c'_j = \begin{cases}
-m_i & \text{if } w(l) = 0,\ w(l+1) = 0 \\
-3\,m_i/{} & \text{if } w(l) = 0,\ w(l+1) = 1 \\
3\,m_i/{} & \text{if } w(l) = 1,\ w(l+1) = 0 \\
m_i & \text{if } w(l) = 1,\ w(l+1) = 1 \\
c_j & \text{if } c_j / m_i > k
\end{cases} \tag{3} \]
The first four cases apply to coefficients that satisfy $c_j / m_i < k$, where $k$ is the embedding interval (k > ). Two secret bits are embedded into a single suitable coefficient; after embedding, the index $l$ is incremented by two, and the next two secret bits are embedded into the next suitable coefficient. Increasing $k$ extends the interval, so that more coefficients satisfy the condition $c_j / m_i < k$ and are modified; thus both capacity and distortion become greater.
To manage robustness and transparency, we use a scale factor, α, which defines the strength of the watermark.

5. Finally, the inverse DWT is applied to the modified wavelet coefficients to obtain the marked audio signal.

The basic procedure of embedding the watermark is shown in Figure 2.
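The pre-processing and frame-averaging steps of the embedding procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, indices are 0-based, a trailing partial frame is ignored, and the embedding condition is read as a test on the magnitude |c_j| / m_i < k. The embedding levels themselves are deliberately left out.

```python
def flatten_image(img):
    """Equation (1): row-major flattening, w(i) = Img(k, j) with i = k * M2 + j."""
    return [bit for row in img for bit in row]

def frame_means(coeffs, frame_size):
    """Equation (2): mean absolute value of each full frame of coefficients."""
    n_frames = len(coeffs) // frame_size  # assumption: a trailing partial frame is dropped
    return [
        sum(abs(c) for c in coeffs[i * frame_size:(i + 1) * frame_size]) / frame_size
        for i in range(n_frames)
    ]

def eligible_indices(coeffs, frame_size, k):
    """Indices of coefficients whose ratio to their frame mean lies inside the interval k.

    Assumption: the condition is applied to the magnitude, |c_j| / m_i < k.
    """
    means = frame_means(coeffs, frame_size)
    selected = []
    for j in range(len(means) * frame_size):
        m = means[j // frame_size]
        if m > 0 and abs(coeffs[j]) / m < k:
            selected.append(j)
    return selected

def capacity_bits(coeffs, frame_size, k):
    """Two secret bits are carried by each eligible coefficient."""
    return 2 * len(eligible_indices(coeffs, frame_size, k))
```

With these helpers, raising `k` can only enlarge the set returned by `eligible_indices`, which mirrors the capacity and distortion trade-off described in step 4.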
Fig 2: The watermark embedding procedure.

3.2. Watermark Extraction Procedure
Watermark detection is performed using the DWT and the embedding parameters. Since the host audio signal is not required in the detection process, the detector is blind. The detection process can be summarized in the following steps:

1. Decompose the watermarked audio with a four-level wavelet transform.

2. Divide the CA4 samples into frames of a given length and compute, for each frame, the average $m'_i$ of the absolute values of its samples:
\[ m'_i = \frac{1}{s} \sum_{j=(i-1)s+1}^{is} |c'_j| \tag{4} \]

3. The secret bit stream is obtained using equation (5):
\[ \bigl(w'(l),\, w'(l+1)\bigr) = \begin{cases}
(0, 0) & \text{if } -\frac{k+\alpha}{2} \le \frac{c'_j}{m'_i} < -\frac{k+\alpha}{6} \\
(0, 1) & \text{if } -\frac{k+\alpha}{6} \le \frac{c'_j}{m'_i} < 0 \\
(1, 0) & \text{if } 0 \le \frac{c'_j}{m'_i} < \frac{k+\alpha}{6} \\
(1, 1) & \text{if } \frac{k+\alpha}{6} \le \frac{c'_j}{m'_i} < \frac{k+\alpha}{2}
\end{cases} \tag{5} \]
where $c'_j$ is a sample of the CA4 sub-band of the 4th level wavelet decomposition of the marked signal, α is the strength of the watermark and $w'(l)$ is the l-th bit of the extracted secret stream. The suggested algorithm is blind, since the original signal values are not required at the receiver.

4. Experiment Results and Analysis
To evaluate the imperceptibility and robustness of the proposed embedding algorithm, we carried out a number of simulation experiments; some results are given below. Figure 6(a) shows the original audio and the watermark. The original audio is a segment of a song whose length is 1 seconds, single-channel, with sample rate $f_s$ = 44.1 kHz and a resolution of 16 bits. The original watermark is a grayscale image of size 64 × 64. The wavelet is Db-8, and the level of the wavelet decomposition is J = 4. To remove the effect of subjective factors, we adopted the SNR and the normalized correlation (NC) to measure the performance of the embedding algorithm.
Signal to Noise Ratio (SNR) is a statistical difference metric used to measure the similarity between the undistorted original audio signal and the distorted watermarked audio signal. The SNR is computed according to Equation (6), where x(n) is the original signal, x'(n) is the watermarked signal and N is the number of samples:
\[ \mathrm{SNR} = 10 \log_{10} \frac{\sum_{n=0}^{N-1} x^2(n)}{\sum_{n=0}^{N-1} \left[ x(n) - x'(n) \right]^2} \tag{6} \]
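As an illustration of the SNR measure, the following minimal Python sketch computes the ratio of the energy of the original signal to the energy of the embedding distortion, in decibels. The function name `snr_db` is hypothetical, and returning infinity for identical signals is a convention assumed here rather than something stated in the paper.

```python
import math

def snr_db(original, marked):
    """SNR in dB between an original signal and its watermarked version."""
    signal_energy = sum(x * x for x in original)
    noise_energy = sum((x - y) ** 2 for x, y in zip(original, marked))
    if noise_energy == 0:
        # Assumed convention: no distortion at all means an unbounded SNR.
        return math.inf
    return 10.0 * math.log10(signal_energy / noise_energy)
```

For example, a constant signal of amplitude 10 whose samples are all shifted by 1 has an energy ratio of 100, which corresponds to 20 dB.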
After embedding the watermark, the SNR of all selected audio signals using the proposed method is above 6dB [8], which ensures the imperceptibility of the proposed system. The normalized correlation (NC) is defined as follows:
\[ NC = \frac{\sum_{i=1}^{M_{length}} \sum_{j=1}^{M_{width}} M(i,j)\, M'(i,j)}{\sum_{i=1}^{M_{length}} \sum_{j=1}^{M_{width}} M(i,j)^2} \]
where M(i, j) is the original binary image watermark, M'(i, j) is the recovered binary image watermark, and $M_{length} \times M_{width}$ is the size of the binary image watermark.

Table 1 shows the perceptual distortion and the payload obtained for the three songs with BER equal to zero (or near zero) under the attacks detailed in Table 2, for k = 4, α = 1.1 and the frame size equal to 1.

TABLE I
RESULTS OF 3 SIGNALS (ROBUST AGAINST TABLE II ATTACKS)
Audio File  Time (m:sec)  SNR (dB)  Payload (bps)
Beginning of the End 3:16 7.1 489
Citizen, Go Back to Sleep 1:7.3 489
Loopy Music 1.7 489
Average 1:47 6.3 489

Note that all the results have an average SNR of 6dB and a capacity of around 14 bps for all the experiments. The proposed method is thus able to provide large capacity and robustness whilst keeping imperceptibility. Table 2 illustrates the effect of various attacks provided in the Stirmark Benchmark for Audio v1. on the audio signal Loopy Music.

TABLE II
ROBUSTNESS TEST RESULTS FOR FIVE SELECTED FILES AND COMPARISON WITH SCHEMES IN THIS
Attack NC proposed [7] [1 ] BER % [9 ] [11 ] [1 ]
Addnoise1.883 3. 1-4 -1
Add FFT noise1.8848 3 1-1-
FFTHLPassQuick.99. 9-1-4
MP3 18.8763 7.1-9 1.3 7-
Resampling 44//44 Echo.998.914. 8 1.8 38-47 -3 1 63
RCHighPass 1 to k.9998-1 -1
RCLowPass to k.781 9.1-4 7.1

A few attacks, such as FFTStat1 and Echo in Table 2, remove the hidden data (BER > 1%). Using frames of wavelet samples increases robustness against attacks, since the average of the samples is more robust than the value of each individual sample. Thus, a larger frame size yields better robustness. However, a larger frame size also enforces the same value on a greater number of samples, which decreases audio quality and transparency.

The method proposed in this paper has been compared with several recent audio watermarking strategies. Almost all audio data hiding schemes that offer very high capacity are fragile against signal processing attacks. For this reason, it is not possible to make a fair comparison between the proposed scheme and fragile audio watermarking schemes of similar capacity. Hence, we have chosen a few recent and relevant robust audio watermarking schemes in the literature.

5. Conclusion
In this paper, a high-capacity watermarking algorithm for digital audio is presented, which is robust against common audio signal processing attacks and the Stirmark Benchmark for Audio. The scaling factor, the frame size and the selected frequency band are the three adjustable parameters of the method, which regulate the trade-off between capacity, perceptual distortion and robustness. Furthermore, the suggested scheme is blind, since it does not need the original signal to extract the hidden bits. The experimental results show that this scheme has an excellent capacity (about 4 kbps) with an SNR of about 6 dB and provides robustness against common signal processing attacks such as highpass filtering, lowpass filtering, resampling and requantization.
A comparison with other schemes in the audio watermarking literature is also provided, which illustrates that the suggested scheme is more robust than other approaches while keeping transparency and capacity in acceptable ranges.

Acknowledgements
Funding support for this research was provided by Sama Technical and Vocational Training College, Islamic Azad University, Mahshahr Branch, Mahshahr, Iran.

References
[1] J. Cox, J. Kilian, and T. Shamoon, Secure Spread Spectrum Watermarking for Image, Audio and Video, IEEE Trans. on Image Processing, vol. 6, pp. 1673-1687, 1997.
[2] W. N. Lie, L. C. Chang, Multiple Watermarks for Stereo Audio Signals Using Phase-Modulation Techniques, IEEE Trans. Signal Processing, Vol. 3, No., pp. 86 81,
[3] H. J. Kim, Y. H. Choi, A novel echo hiding scheme with backward and forward kernels, IEEE Trans. Circuit and Systems, pp. 88-889, Aug. 3.
[4] S. Wu, J. Huang, D. Huang, Y. Q. Shi, Efficiently Self-Synchronized Audio Watermarking for Assured Audio Data Transmission, IEEE Trans. Broadcasting, Vol. 1, No. 1, pp. 69-76, Mar..
[5] Z. Xu, W. Wang, Digital Audio Watermarking Algorithm Based on Quantizing Coefficients, International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 6), Pasadena, CA USA, pp. 41-46, 6.
[6] M. A. Akhaee, S. GhaemMaghami, and N. Khademi, A Novel Technique for Audio Signals Watermarking in the Wavelet and Walsh Transform Domains, IEEE International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), Tottori, Japan, p. 171-174, 6.
[7] M. Fallahpour, and D. Megias, High capacity audio watermarking using the high frequency band of the wavelet domain, Multimedia Tools and Applications, 1.
[8] N. Khademi Kalantari, S. M. Ahadi, A Robust Audio Watermarking Scheme Using Mean Quantization in the Wavelet Transform Domain, Submitted to IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), Cairo, Egypt, 7.
[9] M. Pooyan, A. Delforouzi, Adaptive and robust audio watermarking in wavelet domain, Third International Conference on Intelligent Information Hiding and Multimedia Signal Processing, V, Pages 78-9, 7.
[10] J. J. Garcia-Hernandez, M. Nakano-Miyatake, and H. Perez-Meana, Data hiding in audio signal using Rational Dither Modulation, IEICE Electron. Express, Vol., No. 7, pp. 17-, 8.
[11] M. Fallahpour, and D. Megias, High capacity audio watermarking using FFT amplitude interpolation, IEICE Electronics Express, vol. 6, no. 14, pp. 17-163, 9.
[12] W. Lanxun, Y. Chao, P. Jiao, An audio watermark embedding algorithm based on mean-quantization in