Adaptive Selection of Embedding Locations for Spread Spectrum Watermarking of Compressed Audio

Adaptive Selection of Embedding Locations for Spread Spectrum Watermarking of Compressed Audio
Alper Koz and Claude Delpha
Laboratory of Signals and Systems, Univ. Paris Sud-CNRS-SUPELEC

Outline
Introduction: Audio Watermarking Overview
  Spread Spectrum (SS) Watermarking of Audio Content
Existing Work
  Selection of Watermarking Locations in the Literature
Our Proposal
  Adaptive Selection of Watermarking Locations
Results and Discussion
  Testing the Invariance of Embedding Locations
  Comparison of the Adaptive Selection to Fixed Selection
    Imperceptibility Tests
    Robustness Tests
Conclusions

Audio Watermarking: Overview of the Existing Methods
[Diagram: taxonomy of audio watermarking methods, split into time domain methods and frequency domain methods. Categories shown: additive embedding, delay based embedding, energy based embedding, spread spectrum based methods, QIM based methods.]
Works cited in the diagram: [Swanson 98], [Lemma 2003], [Xu 2001], [Foote 2001], [Lie 2001], [Neubaer 98], [Neubaer 2000], [Kirovski 2003], [Li 2006], [Wang 2006], [Zeng 2008], [Baum 2011].
The main trend is toward spread spectrum and QIM watermarking, as in image and video watermarking.

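Audio Watermarking: Spread Spectrum Watermarking
[Diagram, watermark embedding procedure: x_i(n) -> Frequency Transform -> X_i(k) -> Fixed Selection -> X_i(1)..X_i(M) -> Watermark Embedding -> Q -> C -> Coded Stream. The HAS Model supplies H_i(k); the Key drives the PN Sequence Generator, which supplies W(.); b is the message bit.]
Watermark embedding:
$X_i^w(k) = X_i(k) + b\, r\, H_i(k)\, W(M(i-1)+k), \quad k = 1..M,\; i = 1..N$
[Diagram, watermark detection procedure: Coded Stream -> C^{-1} -> Q^{-1} -> X_i^*(k) -> Fixed Selection -> X_i^*(1)..X_i^*(M) -> Watermark Detection -> b. The Key drives the PN Sequence Generator, which supplies W(.).]
Watermark detection:
$b = \mathrm{Sign}\left( \sum_{i=1}^{N} \sum_{k=1}^{M} X_i^*(k)\, W(M(i-1)+k) \right)$, with M = 16 on the slide.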
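The embedding and detection rules above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the PN generator is replaced by a seeded NumPy random generator, r is treated as a scalar strength factor (the slide does not define it), and the transform, HAS model, quantization (Q) and coding (C) stages are left out.

```python
import numpy as np

def pn_sequence(key, length):
    """Pseudo-random +/-1 sequence seeded by a secret key.
    Stand-in for the slide's PN Sequence Generator (assumption)."""
    rng = np.random.default_rng(key)
    return rng.choice(np.array([-1.0, 1.0]), size=length)

def embed_bit(X, H, W, b, r=1.0):
    """Embed one bit b (+1 or -1) into N frames of M selected coefficients.
    X, H: arrays of shape (N, M) holding X_i(k) and the HAS thresholds H_i(k).
    Row-major reshaping maps W(M(i-1)+k) onto frame i, coefficient k."""
    N, M = X.shape
    return X + b * r * H * W[: N * M].reshape(N, M)

def detect_bit(X_star, W):
    """Correlate the received coefficients with the PN sequence and take the sign."""
    N, M = X_star.shape
    corr = np.sum(X_star * W[: N * M].reshape(N, M))
    return 1 if corr >= 0 else -1
```

In this sketch the host signal acts as interference in the correlation sum, so detection only becomes reliable when the watermark term (scaled by r and H) accumulated over N x M coefficients dominates it.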

Audio Watermarking: Selection of Embedding Locations

Method                 Utilized transform    Watermarked coefficients   Frequency range (~Hz)   Payload (bits/sec)
Li et al. 2006         1024 point int-MDCT   1-40                       0-1700                  2-13 bits
Kirovski et al. 2003   2048 point MCLT       8-83                       200-2000                0.5-1 bit

Works well for low frequency content, but is not suitable for high frequency audio files or audio frames.
[Figure: signal energy vs. frequency (Hz, x 10^4). Audio track: Sugarland - Stay.]

Proposed Adaptive Selection: General Scheme
[Diagram, watermark embedding procedure: the spread spectrum embedding chain (x_i(n) -> Frequency Transform -> X_i(k) -> selection -> Watermark Embedding -> Q -> C -> Coded Stream), with an Adaptive Selection block producing X_i(a_1)..X_i(a_M) shown alongside the Fixed Selection block producing X_i(1)..X_i(M). The HAS Model supplies H_i(k); the Key drives the PN Sequence Generator, which supplies W(.); b is the message bit.]
Watermark embedding:
$X_i^w(k) = X_i(k) + b\, r\, H_i(k)\, W(M(i-1)+k), \quad k = 1..M,\; i = 1..N$
[Diagram, watermark detection procedure: Coded Stream -> C^{-1} -> Q^{-1} -> X_i^*(k) -> selection -> Watermark Detection -> b, with an Adaptive Selection block producing X_i^*(a_1)..X_i^*(a_M) shown alongside the Fixed Selection block producing X_i^*(1)..X_i^*(M).]
Watermark detection:
$b = \mathrm{Sign}\left( \sum_{i=1}^{N} \sum_{k=1}^{M} X_i^*(k)\, W(M(i-1)+k) \right)$, with M = 16 on the slide.
Problem statement: determine the same M spectrum coefficients both during the embedding and during the detection after the audio coding, while still watermarking (protecting) the significant part of the content.

Proposed Adaptive Selection: Initial Approach
Use the M highest spectrum coefficients of each audio frame for watermarking.

Bitrate (kbps)   128    96     64     32
Ratio (%)        44.8   39.2   34.8   29.2

Ratio of the correctly found coefficients among the M highest coefficients after bitrate coding (File: Madonna - Music, M = 16).
The order of the coefficients in such an approach is sensitive to bitrate coding. => An error in the location of one coefficient in the order changes all the locations after that coefficient.

Proposed Adaptive Selection: Grouping Approach
Strategy: select the M coefficients not individually but in groups, to provide robustness against coding.
[Figure: magnitude spectrum |F(jw)|^2 in dB vs. frequency w, with the group size G and the shift S marked.]
Algorithm:
For each frame of the audio file, group every G consecutive coefficients, with a shift of S between the groups.
Calculate the signal energy of each group and order the groups with respect to their energy.
Select the (M/G) groups with the highest energy for watermarking.
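The grouping algorithm can be sketched as follows. This is an illustrative reading of the three steps, with hypothetical function names. The slides do not say how overlapping groups (S < G) are resolved when picking the best M/G groups; the sketch greedily skips any group that overlaps one already chosen, which is an assumption.

```python
import numpy as np

def select_groups(X, M=16, G=4, S=4):
    """Return the sorted start indices of the M // G highest-energy groups
    of one frame spectrum X (1-D array of coefficients)."""
    X = np.asarray(X, dtype=float)
    starts = np.arange(0, len(X) - G + 1, S)           # group k covers [start, start + G)
    energy = np.array([np.sum(np.abs(X[s:s + G]) ** 2) for s in starts])
    ranked = starts[np.argsort(-energy, kind="stable")]  # highest energy first
    chosen = []
    for s in ranked:
        # Assumption: skip groups overlapping an already selected group.
        if all(abs(int(s) - c) >= G for c in chosen):
            chosen.append(int(s))
        if len(chosen) == M // G:
            break
    return sorted(chosen)

def group_indices(starts, G):
    """Expand group start indices into the individual coefficient indices."""
    return np.concatenate([np.arange(s, s + G) for s in starts])
```

Because a whole group has to lose its energy rank for a location to change, small coding distortions on single coefficients are less likely to alter the selection than in the coefficient-by-coefficient ordering of the initial approach.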

Testing the Invariance of Selected Coefficients: Dataset
Selected mostly from pop music tracks, which dominate exchange networks.

Name                                 Duration (min:sec)   Smp. Rate (kHz)   Bitrate (kbps)   # of Frames
1. Madonna - Music                   3:47                 44.1              128              26070
2. L. Fabian - I am who I am         3:47                 44.1              96               26070
3. Ace of Base - Unspeakable         3:14                 22.05             192              11140
4. M. Jackson - Speechless           3:18                 44.1              128              22739
5. Sugarland - Stay                  4:43                 44.1              224              32501
6. C. Aguilera - Hurt                4:03                 44.1              181              27907
7. Dj mhd - Ya Zina Club             4:30                 44.1              256              31008
8. Greg Cerrone - Pilling me         5:51                 44.1              320              40310
9. Gwen Stefani - The sweet escape   4:06                 44.1              128              28252
10. The Police - Roxanne             3:12                 44.1              192              22050

Testing the Invariance of Selected Coefficients: Results for Grouping Size (G)
Compare the average detection rate of embedding locations (%) vs. grouping size (G).
Compare the average signal energy in embedding locations (%) vs. grouping size (G).
[Figure: "Avg. Detection Rate and Avg. Signal Energy vs. Grouping Number (G)". Axes: Percentage (%) vs. Grouping Number (G). Curves: Average Detection Rate (%), Average Signal Energy (%). Settings: M=16, S=G, Bitrate: 64 kbps.]
Very rapid increase in detection rate for increasing values of G. After G=8, detection rate > 90 % (with a saturation behavior).
Trade-off: decreased signal energy for higher values of G. However, the signal energy stays above 76 % even in the worst case (e.g. G=16).
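The two quantities compared here can be computed per frame roughly as below. The slides do not give exact definitions, so both are assumptions: the detection rate is taken as the share of embedder-side coefficient locations that the detector selects again after coding, and the signal energy is taken relative to the energy of the M largest individual coefficients of the frame. The function names are hypothetical.

```python
import numpy as np

def detection_rate(idx_embed, idx_detect):
    """Percentage of embedding locations found again at the detector
    (assumed per-coefficient definition)."""
    embed = set(int(i) for i in idx_embed)
    detect = set(int(i) for i in idx_detect)
    return 100.0 * len(embed & detect) / len(embed)

def energy_share(X, idx, M):
    """Energy in the selected locations as a percentage of the energy held by
    the M largest-magnitude coefficients (assumed reference for the 100 % level)."""
    e = np.abs(np.asarray(X, dtype=float)) ** 2
    best = np.sort(e)[-M:].sum()
    return 100.0 * e[np.asarray(idx)].sum() / best
```

Averaging both values over all frames of a file, for several values of G, would give curves of the kind described on this slide.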

Testing the Invariance of Selected Coefficients: Results for Shift Size (S)
Compare the average detection rate of embedding locations (%) vs. shift size (S).
Compare the average signal energy in embedding locations (%) vs. shift size (S).
[Figure: "Avg. Detection Rate and Avg. Signal Energy vs. Shift (S)". Axes: Percentage (%) vs. Shift (S) during the grouping. Curves: Average Detection Rate (%), Average Signal Energy (%). Settings: M=16, G=16, Bitrate: 64 kbps.]
For larger values of S, the distance between consecutive groups is larger. => The probability of detecting the neighbor group instead of the original is lower.
Trade-off: decreased signal energy for higher values of S due to the lower number of groups. However, the difference between the min. and max. signal energy is only 2 %.

Testing the Invariance of Selected Coefficients: Results for Different Coding Bitrate
Compare the average detection rate of embedding locations (%) vs. coding bitrate.
[Figure: "Average Detection Rate (%) vs. Coding Bitrate (kbps), G=16, S=16". Axes: Average Detection Rate (%) vs. Coding Bitrate (kbps), 32 to 128 kbps. Settings: M=16, G=16, S=16.]
Down to 32 kbps, 95 % of the embedding locations are found correctly.

Comparison of the Adaptive and Fixed Embedding: Procedure
A compressed domain SS method ([Li et al. 2006 IEEE TSP]) is taken as a base for comparison.
An m-sequence of length 511 is generated to embed one bit of watermark.
This sequence is added to the fixed range of the 512 point frequency spectrum of 32 audio frames as in [Li 2006], as follows:
$X_i^w(k) = X_i(k) + b\, H_i(k)\, W(16(i-1)+k)$, for $i = 1..31,\; k = 1..16$; for $i = 32,\; k = 1..15$.
X_i(k): k-th spectrum coefficient of the i-th audio frame
H_i(k): corresponding HAS threshold for that spectrum coefficient
W: generated m-sequence
b: watermark bit (i.e. +1 or -1)
Psychoacoustic model I in the ISO standard is utilized for the calculation of the HAS thresholds.
1 bit is embedded into 32 audio frames => 3.2 bits/sec (similar levels to previous methods).
The same procedure is followed for the adaptive approach by selecting the embedding regions in each audio frame as described.
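As a rough illustration of the sequence and its layout over the frames, the sketch below generates a length-511 m-sequence with a 9-stage linear feedback shift register and splits its chips over 32 frames (31 frames of 16 chips plus one frame of 15, since 31 x 16 + 15 = 511). The slides do not state which generator polynomial was used; x^9 + x^4 + 1 is picked here as one known primitive polynomial of degree 9, and the function names are hypothetical.

```python
import numpy as np

def m_sequence(seed=1):
    """Length-511 m-sequence in +/-1 form from a 9-stage LFSR.
    Recurrence a[n+9] = a[n] XOR a[n+4], i.e. polynomial x^9 + x^4 + 1
    (assumed; the source does not name the polynomial)."""
    if not 0 < seed < 512:
        raise ValueError("seed must be a non-zero 9-bit value")
    bits = [(seed >> j) & 1 for j in range(9)]   # initial register contents
    while len(bits) < 511:
        bits.append(bits[-9] ^ bits[-5])
    return 2 * np.array(bits) - 1                # map {0, 1} to {-1, +1}

def chips_per_frame(n_frames=32, M=16, length=511):
    """Number of sequence chips carried by each frame: 16 for frames 1..31, 15 for frame 32."""
    return [min(M, length - M * i) for i in range(n_frames)]
```

An m-sequence is nearly balanced and its periodic autocorrelation equals -1 at every non-zero shift, which is what makes the correlation detector sharply peaked when the locations are found correctly.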

Comparison of the Adaptive and Fixed Embedding: SNR Results

Song                                 SNR (dB) Fixed   SNR (dB) Adaptive   Embedded Bits   HF Blocks
1. Madonna - Music                   18.94            18.67               800             85
2. L. Fabian - I am who I am         21.56            20.58               341             35
3. Ace of Base - Unspeakable         19.40            19.30               695             27
4. M. Jackson - Speechless           19.80            19.67               798             27
5. Sugarland - Stay                  24.17            22.47               1006            108
6. C. Aguilera - Hurt                21.82            21.37               864             29
7. Dj mhd - Ya Zina Club             18.86            18.49               968             212
8. Greg Cerrone - Pilling me         18.81            18.55               1253            206
9. Gwen Stefani - The sweet escape   19.52            19.23               880             66
10. The Police - Roxanne             20.79            20.56               664             48

The resulting SNR values, number of embedded bits and number of High Frequency (HF) watermark blocks.
The resulting SNR values are smaller for the adaptive approach.
The HAS threshold values are greater when the selected embedding region is in high frequency regions.

Comparison of the Adaptive and Fixed Embedding: Imperceptibility Tests
Two Alternative Two Choice Test [Neubaer 2000]:
Generate a set of 10 pairs, each of which is selected randomly from the pairs {(O,O), (O,W), (W,O), (W,W)}.
Ask the subject whether both items are equal or not.
The level of significance is taken as 0.05.
Number of hits in {8, 9, 10} => non-transparent with 95 % probability.
Legend: H: number of hits; D: decision; T: transparent; NT: not transparent.
Both approaches adjust the watermark strength with respect to the HAS thresholds. => No significant difference in imperceptibility between the fixed and adaptive approaches.
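The decision threshold of 8 hits can be checked with a binomial tail probability. The sketch assumes that a subject who cannot hear the watermark answers each of the 10 pairs correctly with probability 0.5 (pure guessing); that model and the function name are assumptions, not stated on the slide.

```python
from math import comb

def tail_probability(n_pairs=10, min_hits=8, p_guess=0.5):
    """Probability of at least min_hits correct answers out of n_pairs
    when every answer is an independent guess with success probability p_guess."""
    return sum(
        comb(n_pairs, h) * p_guess ** h * (1 - p_guess) ** (n_pairs - h)
        for h in range(min_hits, n_pairs + 1)
    )
```

Under this guessing model, 8 or more hits out of 10 occur with probability 56/1024, about 0.055, which is close to the 0.05 significance level quoted above.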

Comparison of the Adaptive and Fixed Embedding: Robustness Tests against Coding
[Figure, two panels: "Average BER vs. Bitrate" (all files) and "Average BER vs. Bitrate for High Frequency Blocks" (only high frequency blocks). Axes: Average BER (%) vs. Coding Bitrate (kbps), 32 to 128 kbps. Curves: Fixed Embedding, Adaptive Embedding.]
More than 1.5 % gain in BER for the same bitrate.
More than 64 kbps gain in coding bitrate for the same BER.
More than 8 % gain in BER for the same bitrate for the high frequency blocks.

Comparison of the Adaptive and Fixed Embedding: Robustness Tests against Additive Noise
[Figure, two panels: "Average BER vs. Noise SNR (dB)" (all files) and "Average BER vs. Noise SNR (dB) for High Frequency Blocks" (only high frequency blocks). Axes: Average BER (%) vs. Noise SNR (dB), 10 to 40 dB. Curves: Fixed Embedding, Adaptive Embedding.]
More than 1.5 % gain in BER for the same noise SNR.
More than 10 dB gain in noise SNR for the same BER.
More than 4-5 % gain in BER for the same noise SNR for the high frequency blocks.

Comparison of the Adaptive and Fixed Embedding: Robustness Tests against Low Pass Filtering
[Figure, two panels: "Average BER vs. Cutoff Frequency (kHz)" (all files) and "Average BER vs. Cutoff Frequency (kHz) for High Frequency Blocks" (only high frequency blocks). Axes: Average BER (%) vs. Cutoff Frequency (kHz), 0 to 14 kHz. Curves: Fixed Embedding, Adaptive Embedding.]
The adaptive approach is better than the fixed approach.
The gain is higher for the high frequency blocks.
Sudden increase in BER after 2.8 kHz.

Comparison of the Adaptive and Fixed Embedding: Distribution of the Highest Energy Subbands
[Figure: "Distribution of the index of highest energy subbands". Axes: (%) of all the frames vs. index of the subband, 0 to 30.]
After low pass filtering with a cutoff frequency of 1.4 kHz, about 10 % of the highest energy subbands that include the watermark are blocked. => This causes a sudden increase in BER.

Summary
The weaknesses of always fixing the embedding locations in the low frequency band are identified.
An adaptive method that solves the problem by embedding the watermark into the highest energy region is developed.
The method
  is well suited in particular for exchange networks consisting of various audio files with different characteristics,
  enables the automation of the watermarking process for large databases,
  also forms a base for the selection of watermarking locations for other schemes such as QIM.