Digital Image Watermarking Yun Q. Shi Electrical and Computer Engineering New Jersey Institute of Technology shi@njit.edu 19th November 2004 shi 1
Outline
- Introduction: What is image data hiding? Fundamentals
- Robust Image Data Hiding: JPEG compression, geometric distortion
- Lossless Image Data Hiding: reversibility, image authentication
- Steganography and Steganalysis: status, and some current work
Introduction to Data Hiding
- Data hiding is a process that imperceptibly hides one set of data in a cover medium.
- The set of data being hidden is referred to as the hidden data (mark signal).
- The cover medium is a second set of data.
- The cover medium with the hidden data inside is referred to as the marked medium (stego-medium).
[Figure: Embedding: cover media + mark -> data hiding -> marked media. Extraction: marked media -> data retrieval -> media after data retrieval + extracted data.]
Applications
- Copyright protection (the original driving-force application; now Hollywood movie fingerprinting)
- Authentication (tamper detection, monitoring)
- Covert communications (confidentiality)
- Many more, including:
  - (Multi-level) secure data systems in military, medical and law enforcement fields
  - Digital notarization
  - On-line identity verification
Example: Tamper Detection (could be much more advanced)
[Figure panels: watermarked image, altered image, watermark, altered watermark]
Three Major Types of Data Hiding Algorithms
- Least Significant Bit-plane (LSB): data are hidden in the least significant bit-plane. Vulnerable to attacks.
- Spread Spectrum (SS): inspired by spread-spectrum RF communication systems, which were invented during WWII for covert RF communication. Most robust to attack; most difficult for steganalysis.
- Quantization Index Modulation (QIM): uses different quantization strategies to hide data.
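As a rough illustration of the LSB type, the following Python sketch replaces the least significant bit of the first few pixels with message bits. The function names and the choice to embed in raster order are assumptions made for illustration; the talk does not specify a particular LSB scheme.

```python
import numpy as np

def lsb_embed(cover, bits):
    """Replace the LSB of the first len(bits) pixels (raster order) with the message bits."""
    flat = cover.flatten()                       # flatten() returns a copy
    bits = np.asarray(bits, dtype=np.uint8)
    if bits.size > flat.size:
        raise ValueError("message longer than cover")
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits
    return flat.reshape(cover.shape)

def lsb_extract(marked, n_bits):
    """Read the LSB of the first n_bits pixels."""
    return marked.flatten()[:n_bits] & 1

cover = np.array([[10, 11], [12, 13]], dtype=np.uint8)
marked = lsb_embed(cover, [1, 0, 1])
recovered = lsb_extract(marked, 3)
```

Each pixel changes by at most one gray level, which is why the mark is imperceptible, but the original LSBs are overwritten and any re-quantization of the image destroys the message.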
Fundamental Requirements
- Imperceptibility, for invisible watermarking
- Robustness, for robust watermarking (not required for fragile watermarking)
  - Common signal processing procedures, in particular JPEG compression
  - Geometric distortions (e.g. rotation)
  - Malicious attacks such as collusion
- Security aspect
Imperceptibility
[Figure: original and watermarked versions of the Lena image and the Baboon image]
JPEG Compression
[Figure: original image (8 bpp); the image after JPEG compression with quality factor 50 (1 bpp)]
Geometric Distortion
- The scheme embeds more than 100 bits in the image.
- After a geometric attack (rotation, scaling, translation), the information can still be extracted successfully.
[Figure: geometrically distorted image; geometrically corrected image]
I. Robust Image Data Hiding
- Various techniques developed in image processing to correct geometric distortion have been used to tackle this issue.
- One successful technique:
  - Template hidden in the Fourier transform domain to detect affine transforms
  - Training sequence in the wavelet transform domain to detect translation
  - Watermark in the wavelet transform domain to achieve robustness against compression
  - X. Kang, J. Huang, Y. Q. Shi, Y. Lin, "A DWT-DFT composite watermarking scheme robust to both affine transform and JPEG compression," IEEE CSVT, August 2003.
- Problem still open: random small geometric distortions
II. Lossless Data Hiding
- What is lossless data hiding?
- Lossless data hiding refers to data hiding techniques in which the original cover media can be recovered without any distortion after the hidden data have been extracted (see the figure on the next slide).
- Also referred to as distortion-free, invertible, or reversible data hiding.
[Figure: Embedding: cover media + mark -> data hiding -> stego-media. Extraction: stego-media -> data retrieval -> media after data retrieval (= original cover media) + extracted data.]
Two images marked with a reversible marking method. (For scale, the American Declaration of Independence contains 6,760 characters.)
[Figure: 7000 bytes hidden in a medical image (512x512); 2000 bytes hidden in a second image (512x512)]
Most Existing Watermarking Schemes Are Not Reversible
- The spread spectrum method is not invertible because of truncation error (introduced to prevent overflow/underflow) and round-off error.
- The LSB scheme is not invertible because bits are replaced without any record of the original bits.
- Quantization Index Modulation (QIM) is not invertible because of quantization error.
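The QIM case can be made concrete with a small Python sketch. It assumes the common textbook form of scalar QIM, in which the message bit selects one of two interleaved uniform quantizers of step `delta`; the names and the value of `delta` are illustrative and not taken from the talk.

```python
import math

def qim_embed(x, bit, delta=8):
    """Quantize x with the quantizer selected by bit (lattices offset by delta/2)."""
    offset = bit * delta / 2
    return delta * math.floor((x - offset) / delta + 0.5) + offset

def qim_extract(y, delta=8):
    """Return the bit whose quantizer has a reconstruction point closest to y."""
    d0 = abs(y - qim_embed(y, 0, delta))
    d1 = abs(y - qim_embed(y, 1, delta))
    return 0 if d0 <= d1 else 1

# Two different cover values collapse onto the same marked value,
# so the cover cannot be recovered from the marked value alone.
same_marked = (qim_embed(13, 0), qim_embed(14, 0))
```

The hidden bit is recovered reliably, but many cover values map to one marked value, which is the quantization error that makes the scheme non-invertible.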
Applications
- The driving application of reversible data hiding is authentication.
- It can be used in special applications, such as law enforcement and medical fields, where the original cover media is required for legal reasons.
- It can be used in military, remote sensing and high-energy physics experiments, where high accuracy is required or data acquisition is expensive.
- Embedding data into cover media while keeping the media reversible opens a new door for linking some data with the original media, for instance in a medical data system.
State of the Art
1. Barton's U.S. Patent 5,646,997 (97) (1st)
2. Honsinger et al.'s US Patent 6,278,791 B1 (01) (1st)
3. Fridrich et al.'s method (SPIE01) (1st)
4. de Vleeschouwer et al.'s method (MMSP01) (3rd)
5. Goljan et al.'s method (IHW01) (2nd)
6. Xuan et al.'s method (MMSP02) (2nd)
7. Celik et al.'s method (ICIP02) (2nd)
8. Ni et al.'s method (ISCAS03) (2nd)
9. Tian's method (CSVT03) (2nd)
10. Yang et al.'s method (SPIE04) (2nd)
11. Thodi & Rodríguez's method (SWSIAI04) (2nd)
12. Ni et al.'s method (ICME04) (3rd)
13. Zou et al.'s method (MMSP04) (3rd)
14. Xuan et al.'s method (MMSP04) (2nd)
15. Xuan et al.'s method (IWDW04) (2nd)
State of the Art: 1st Category
1. Barton's U.S. Patent 5,646,997 (97)
2. Honsinger et al.'s US Patent 6,278,791 B1 (01) (modulo-256 addition)
3. Fridrich et al.'s method (SPIE01) (bit-plane compression in spatial domain)
Honsinger et al.'s Modulo-256 Addition Method (the representative of the 1st category)
- Spatial domain
- Modulo-256 addition (reversible; the key idea)
- Non-adaptive
- Data embedding: Iw = (I + W) mod 256
  - Iw: watermarked image
  - I: original image
  - W = W(H(I), K): watermark
  - H(I): hash function of the original image I
  - K: secret key
Eastman Kodak's Method (cont.)
- Reversibility is obvious: Iw = (I + W) mod 256, hence I = (Iw - W) mod 256.
- Comments:
  - If the image is authentic, the original image data can be recovered without any distortion.
  - The marked image may suffer from salt-and-pepper noise, because gray values can wrap around between 0 and 255 in either direction.
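Both the reversibility and the wrap-around problem can be seen in a few lines of Python. This is only a sketch of the modulo-256 arithmetic: the watermark values below are made up for illustration, whereas in the method described above W is derived from the image hash H(I) and the key K.

```python
import numpy as np

def mod256_embed(image, watermark):
    """Iw = (I + W) mod 256, computed in a wide integer type to avoid uint8 overflow."""
    return ((image.astype(np.int64) + watermark) % 256).astype(np.uint8)

def mod256_recover(marked, watermark):
    """I = (Iw - W) mod 256."""
    return ((marked.astype(np.int64) - watermark) % 256).astype(np.uint8)

image = np.array([250, 3, 128], dtype=np.uint8)
watermark = np.array([10, -5, 4])          # hypothetical watermark values
marked = mod256_embed(image, watermark)    # 250 wraps to 4, 3 wraps to 254
restored = mod256_recover(marked, watermark)
```

The near-white pixel (250) becomes near-black (4) and the near-black pixel (3) becomes near-white (254), which is the salt-and-pepper effect, yet the original values are recovered exactly.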
State of the Art: 2nd Category
1. Goljan et al.'s R-S method (IHW01)
2. Celik et al.'s G-LSB method (ICIP02)
3. Xuan et al.'s Bit-plane Compression in IWT method (MMSP02)
4. Ni et al.'s Histogram Manipulation method (ISCAS03)
5. Tian's Difference Expansion (DE) method (CSVT03)
6. Yang et al.'s Companding in IDCT method (SPIE04)
7. Thodi & Rodríguez's Prediction-error Expansion method (SWSIAI04)
8. Xuan et al.'s Spread Spectrum (SS) in IWT method (MMSP04)
9. Xuan et al.'s Companding in IWT method (IWDW04)
Xuan et al.'s IWT-based 3rd Method: Companding
- Based on IWT and histogram modification
- Data are embedded using a companding scheme
- Performance, in terms of data embedding capacity versus visual quality of the marked image, is clearly better than that of Tian's difference expansion method and of Xuan et al.'s first two methods.
Integer Wavelet Transform
- Efficient in coefficient de-correlation
- Features consistent with those of the human visual system
- Integer-to-integer transform (reversible)
- Efficient to compute: lifting scheme (2nd generation)
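To show what "integer-to-integer (reversible)" means in practice, here is a minimal Python sketch of the simplest lifting-based integer wavelet, the S-transform (integer Haar), applied to a 1-D signal of even length. It is offered only as an illustration of the lifting idea; it is an assumption that this particular wavelet is representative, and the talk does not say which integer wavelet was used.

```python
import numpy as np

def s_transform_forward(x):
    """One lifting level: d = a - b (detail), s = b + floor(d / 2) (approximation)."""
    x = np.asarray(x, dtype=np.int64)
    a, b = x[0::2], x[1::2]
    d = a - b
    s = b + (d >> 1)            # >> on signed integers is a floor division by 2
    return s, d

def s_transform_inverse(s, d):
    """Undo the lifting steps in reverse order; exact for all integers."""
    b = s - (d >> 1)
    a = d + b
    x = np.empty(s.size * 2, dtype=np.int64)
    x[0::2], x[1::2] = a, b
    return x

signal = np.array([12, 15, 200, 3, 0, 255])
s, d = s_transform_forward(signal)
reconstructed = s_transform_inverse(s, d)
```

Because every lifting step adds or subtracts an integer that the inverse can recompute, no rounding error accumulates and the original samples come back exactly.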
Histogram Modification
[Figure: (a) original histogram over the range 0 to 255, with the parts to be merged marked at the ends; (b) modified histogram, showing the parts after the merge, with axis marks at 0, G/2, G/2 and 255]
Companding: Compression and Expansion
- Used for non-uniform quantization of speech signals to improve the signal-to-quantization-noise ratio
- [Figures from a communications text by Lathi]
Laplacian-like distribution of wavelet high-frequency coefficients
[Figure: (a) wavelet subbands LL1, HL1, LH1, HH1; (b) Laplacian distribution function $f(x) = \frac{\lambda}{2} e^{-\lambda |x|}$, with $\lambda = 2$; (c) wavelet coefficient distribution for the Lena image in the high-frequency subbands HH1, HL1, LH1]
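Compression function
$$C_Q(x) = \begin{cases} x, & |x| < T \\ \operatorname{sign}(x)\left(\dfrac{|x| - T}{2} + T\right), & |x| \ge T \end{cases}$$
[Figure: plot of C(x) versus x, with marks T and 3T on the x axis and T and 2T on the C(x) axis]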
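Companding
- $E(C(x)) = x$, where $C$ is compression and $E$ is expansion
- $y = C(x) = p_1 p_2 \cdots p_n$ in binary, $p_i \in \{0, 1\}$
- $y' = p_1 p_2 \cdots p_n b$, where $b \in \{0, 1\}$ is the bit to embed
- If $y' \approx x$, then the distortion will be imperceptible
- Data extraction: $b = \mathrm{LSB}(y')$
- Original cover medium recovery: $y = \lfloor y'/2 \rfloor$, $x = E(y)$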
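The embed and extract steps can be sketched in Python for a single integer coefficient. The sketch assumes a compression function that halves the part of |x| above a threshold T, rounded down to an integer, with T >= 1. Because that rounding loses one bit, the lost remainder `r` is returned as side information so that expansion is exact. Treating `r` as separate side information, and all function names, are assumptions made for illustration rather than details given in the talk.

```python
def sign(v):
    return (v > 0) - (v < 0)

def compress(x, T):
    """Return (y, r): compressed value and the remainder lost by integer halving."""
    if abs(x) < T:
        return x, 0
    excess = abs(x) - T
    return sign(x) * (excess // 2 + T), excess % 2

def expand(y, r, T):
    """Inverse of compress, given the remainder r."""
    if abs(y) < T:
        return y
    return sign(y) * (2 * (abs(y) - T) + T + r)

def embed(x, bit, T):
    """y' = 2y + b: append the bit b after the binary digits of y."""
    y, r = compress(x, T)
    return 2 * y + bit, r

def extract(y_marked, r, T):
    """b = LSB(y'), y = floor(y'/2), x = E(y)."""
    bit = y_marked & 1
    y = y_marked >> 1
    return bit, expand(y, r, T)

y_marked, r = embed(-13, 1, T=4)
bit, x = extract(y_marked, r, T=4)    # recovers the bit 1 and the coefficient -13
```

In Python, `&` and `>>` behave as floor operations on negative integers, so the same two lines handle coefficients of either sign.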
[Figure: Lena image test results (PSNR versus payload) for companding, spread spectrum, and difference expansion]
[Figure: Barbara image test results (PSNR versus payload) for companding, spread spectrum, and difference expansion]
Comments on Algorithms in the 2nd Category
- Payload versus PSNR of the marked image is normally the major performance criterion.
- Two factors heavily influence this performance:
  - The amount of change in pixel value or transform coefficient magnitude required to embed one bit.
  - The number of pixels or coefficients required to embed one bit.
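For reference, the two quantities in this criterion can be computed as follows. The sketch uses the standard PSNR definition for 8-bit images, PSNR = 10 log10(255^2 / MSE); the function names are illustrative.

```python
import numpy as np

def psnr(original, marked, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    diff = original.astype(np.float64) - marked.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10 * np.log10(peak ** 2 / mse))

def payload_bpp(n_bits, shape):
    """Embedded payload in bits per pixel."""
    return n_bits / (shape[0] * shape[1])

original = np.zeros((4, 4), dtype=np.uint8)
marked = np.ones((4, 4), dtype=np.uint8)     # every pixel changed by one gray level
quality_db = psnr(original, marked)          # 20 * log10(255), about 48.13 dB
rate = payload_bpp(100, (512, 512))
```

Changing every pixel by a single gray level already caps the PSNR near 48 dB, which shows how directly the per-bit change in pixel or coefficient value drives the payload versus PSNR trade-off.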
State of the Art: 3rd Category
1. de Vleeschouwer et al.'s method (MMSP01) (patchwork, modulo-256 addition)
2. Ni et al.'s method (ICME04) (patchwork, average difference of pixel pairs)
3. Zou et al.'s method (MMSP04) (patchwork, integer wavelet transform)
de Vleeschouwer et al.'s Patchwork, Modulo-256 Addition Based Robust Reversible Data Hiding (1st in the 3rd category)
- Most reversible data hiding techniques are fragile, in the sense that once the marked media go through any alteration, the hidden data can no longer be extracted without error.
- For several years, de Vleeschouwer et al.'s method was the first and only reversible data hiding algorithm with some robustness against high-quality JPEG compression.
Patchwork and Modulo-256 Based Algorithm
- The host image is divided into non-overlapping blocks.
- In each block, pixels are randomly grouped into two sets, A and B.
- Each pixel value in a set is projected onto a circle as a ball of unit weight, and the mass center of each set is calculated.
- According to patchwork theory, the mass centers of the two sets should be close to each other, so the angle between the two vectors (each from the origin of the circle to a mass center) will be very small.
(Cont.)
- To embed one bit of information into a block, the gray-level values C of the pixels in this block are modified according to the following rule:
  - To embed 1: C' = C + P for pixels in group A; C' = C - P for pixels in group B
  - To embed 0: C' = C - P for pixels in group A; C' = C + P for pixels in group B
- In some blocks, the mass centers of the two sets are far away from each other, which causes problems when embedding. We call such blocks problem blocks.
[Figure: Bit 1 is embedded into a block]
Data Extraction and Restoration of Original Image
- A test image is divided into blocks, and each block is grouped into two sets in the same way as at the embedding stage.
- The angle between the two mass center vectors of each block is calculated in the same way as before.
- If the angle is greater than 0, we extract a 1 from this block. Otherwise, we retrieve a 0.
- After all blocks have been scanned and the hidden bits retrieved, the changes made to the pixel values of each block at the embedding stage are reversed.
- If no errors occur in extraction, the exact original image can be obtained.
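The embedding, extraction and restoration steps for one block can be sketched in Python as follows. This is a simplified reading of the slides, not the authors' implementation: gray values are mapped to angles on a circle of 256 positions, the grouping into sets A and B is passed in as a boolean mask (the hypothetical `mask_a`) rather than generated from a key, shifts are applied modulo 256, and problem blocks are not handled.

```python
import numpy as np

def mass_center_angle(values):
    """Angle of the mass center of unit-weight points placed on a 256-position circle."""
    theta = 2 * np.pi * values.astype(np.float64) / 256
    return np.arctan2(np.sin(theta).mean(), np.cos(theta).mean())

def signed_angle(a_values, b_values):
    """Angle from B's mass center vector to A's, wrapped into [-pi, pi)."""
    d = mass_center_angle(a_values) - mass_center_angle(b_values)
    return (d + np.pi) % (2 * np.pi) - np.pi

def embed_bit(block, mask_a, bit, P):
    """Shift group A by +P and group B by -P for bit 1 (opposite for bit 0), modulo 256."""
    delta = P if bit == 1 else -P
    out = block.astype(np.int64)
    out[mask_a] = (out[mask_a] + delta) % 256
    out[~mask_a] = (out[~mask_a] - delta) % 256
    return out.astype(np.uint8)

def extract_and_restore(block, mask_a, P):
    """Read the bit from the sign of the angle, then undo the shift."""
    bit = 1 if signed_angle(block[mask_a], block[~mask_a]) > 0 else 0
    delta = P if bit == 1 else -P
    out = block.astype(np.int64)
    out[mask_a] = (out[mask_a] - delta) % 256
    out[~mask_a] = (out[~mask_a] + delta) % 256
    return bit, out.astype(np.uint8)

rows, cols = np.indices((8, 8))
mask_a = (rows + cols) % 2 == 0                  # stand-in for the random grouping
bright = np.full((8, 8), 253, dtype=np.uint8)    # a block of near-white pixels
marked = embed_bit(bright, mask_a, 1, P=6)       # group A wraps around to gray value 3
bit, restored = extract_and_restore(marked, mask_a, P=6)
```

In the near-white block, the pixels of group A jump from 253 to 3, which is the wrap-around behaviour behind the salt-and-pepper noise, while the circular interpretation still recovers the bit and the original block.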
Drawbacks
- Some marked images suffer from salt-and-pepper noise because the algorithm uses modulo-256 addition. That is, under modulo-256 addition, a very bright pixel with a gray value close to 255 may be changed to a very dark pixel with a gray value close to 0, and vice versa.
- The marked image does not have a high enough PSNR.
- Our extensive investigation shows that the subjective visual quality of marked images can be substantially lowered.
[Figures: six marked medical images, five with severe salt-and-pepper noise and one with some salt-and-pepper noise]
[Figures: eight marked JPEG2000 test images, five with severe salt-and-pepper noise and three with some salt-and-pepper noise]
Table 1. Test results for eight medical images (512x512) with block size 8 and embedding level 6.
Image    PSNR of marked image (dB)    Data embedding capacity (bits)    Robustness (bpp)
Mpic1    9.93                         100                               0.8
Mpic2    4.94                         100                               1.6
Mpic3    28.10                        100                               0.8
Mpic4    28.21                        100                               0.4
Mpic5    28.21                        100                               0.8
Mpic6    5.86                         100                               2.0
Mpic7    10.52                        100                               0.8
Mpic8    6.26                         100                               1.6
Table 2. Test results for eight JPEG2000 test images (1536x1920) with block size 20 and embedding level 8.
Image          PSNR of marked image (dB)    Data embedding capacity (bits)    Robustness (bpp)
N1A (Woman)    17.73                        1410                              0.8
N2A            17.73                        1410                              2.2
N3A            23.73                        1410                              0.6
N4A            19.67                        1410                              1.2
N5A            17.28                        1410                              1.2
N6A            23.99                        805                               0.6
N7A            20.66                        1410                              1.4
N8A            14.32                        805                               1.4
Comments
- The 1st robust reversible data hiding algorithm.
- Modulo-256 addition generates annoying salt-and-pepper noise. Our extensive experiments show that it is in fact unacceptable in many cases; in particular, it is bad for medical images.
- We hence conclude that all reversible data hiding algorithms that use modulo-256 addition have this problem.
Ni et al.'s Patchwork-Based Method
- A block statistical quantity is used as a robust parameter to embed information.
- A given image block is split into two subsets, A and B, as shown below.
+ - + - + - + -
- + - + - + - +
+ - + - + - + -
- + - + - + - +
+ - + - + - + -
- + - + - + - +
+ - + - + - + -
- + - + - + - +
The difference value α is defined as the arithmetic average of the differences of pixel pairs:
$$\alpha = \frac{1}{n} \sum_{i=1}^{n} (a_i - b_i)$$
[Figure: distribution of the difference value α] Most values of α are very close to zero.
Since the difference value α is based on the statistics of all pixels in the block, it has a certain robustness against compression attacks. Hence the difference value α is used as a robust quantity for data embedding.
Bit Embedding Strategy
- If 1 is to be embedded, we shift the average difference value to the right or to the left, beyond a threshold, by adding a fixed number to, or subtracting it from, each pixel value within one subset.
- If 0 is to be embedded, the block is left unchanged.
[Figure: the original difference value lies between the thresholds -T and T around 0; the value is shifted toward the left or toward the right to embed 1]
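A minimal Python sketch of this strategy for a single block is given below. It covers only the simplest situation and makes several assumptions that are not in the slides: subset A is the set of checkerboard positions where row + column is even, only the shift to the right is implemented, the original |α| is below the threshold, `shift` is at least twice the threshold, and pixel values are far enough from 0 and 255 that no overflow occurs. The four block types, the ECC and the permutation used by the actual method are not modelled.

```python
import numpy as np

def subset_mask(shape):
    """True at positions assumed to belong to subset A (checkerboard pattern)."""
    rows, cols = np.indices(shape)
    return (rows + cols) % 2 == 0

def difference_value(block):
    """alpha: mean of A minus mean of B, equal to the average pair difference for equal-size subsets."""
    mask = subset_mask(block.shape)
    values = block.astype(np.float64)
    return values[mask].mean() - values[~mask].mean()

def embed_bit(block, bit, shift):
    """For bit 1, add shift to every pixel of subset A; for bit 0, leave the block unchanged."""
    out = block.astype(np.int64)
    if bit == 1:
        out[subset_mask(block.shape)] += shift
    return out

def extract_and_restore(marked, threshold, shift):
    """Bit is 1 if alpha exceeds the threshold; the shift is then removed."""
    restored = marked.astype(np.int64)
    if difference_value(marked) > threshold:
        restored[subset_mask(marked.shape)] -= shift
        return 1, restored
    return 0, restored

block = 100 + np.arange(64).reshape(8, 8) // 8    # each row constant, so alpha is 0
marked_one = embed_bit(block, 1, shift=10)
bit_one, restored_one = extract_and_restore(marked_one, threshold=5, shift=10)
bit_zero, restored_zero = extract_and_restore(embed_bit(block, 0, shift=10), threshold=5, shift=10)
```

Because α averages over the whole block, small per-pixel perturbations such as compression noise move it only slightly, which is where the robustness comes from.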
- Bit-embedding schemes are differentiated according to the content of the block.
- Four different patterns of pixel value distribution in a block are identified.
- For each type, a different bit-embedding scheme is developed and applied.
- ECC and permutation techniques are used to achieve reversibility and enhance robustness.
Differentiating bit-embedding schemes based on block content
[Figure: four block types (Type 1 to Type 4), each drawn on the grayscale axis from 0 to 255 with marks at d and 255-d; the Type 1 panel also shows marks d1 and 255-d2]
Bit Extraction Strategy
- Data extraction is the reverse of the data embedding process.
Experimental Results
- This novel algorithm has been applied to:
  - Commonly used grayscale images such as Lena, Baboon, etc.
  - Eight medical images
  - 1096 images in the CorelDRAW image database
  - Eight JPEG2000 color test images
- Our new algorithm can be applied to all of these test images successfully.
[Figures: five pairs of images, each showing the original image and the marked image]
[Figure: JPEG2000 test image, (a) original, (b) marked]
Table 3. Test results for eight medical images (512x512) with block size 8 and embedding level 6.
Image    PSNR of marked image (dB)    Data embedding capacity (bits)    Robustness (bpp)
Mpic1    37.6                         100                               0.4
Mpic2    37.7                         100                               0.8
Mpic3    37.6                         100                               0.4
Mpic4    37.6                         100                               0.8
Mpic5    37.6                         100                               0.4
Mpic6    37.6                         100                               1.2
Mpic7    37.6                         100                               0.4
Mpic8    37.6                         100                               0.8
Table 4. Test results for eight JPEG2000 test images (1536x1920) with block size 20 and embedding level 8.
Image          PSNR of marked image (dB)    Data embedding capacity (bits)    Robustness (bpp)
N1A (Woman)    41.5                         1410                              0.8
N2A            41.3                         1410                              1.2
N3A            41.3                         1410                              0.8
N4A            41.4                         1410                              0.8
N5A            41.3                         1410                              0.8
N6A            41.2                         805                               0.4
N7A            41.2                         1410                              0.8
N8A            41.2                         805                               1.2
Superior Performance
- We have tested different combinations of block size (hence the same embedding capacity) and embedding level.
- For each combination, the average PSNR of Ni et al.'s method is much higher than that of the modulo-256 based method.
- The average robustness is also stronger.
- No salt-and-pepper noise.
Application Scenarios
- Authentication
- On-line identity verification
- Secure medical data system
Application 1: Authentication of JPEG2000 images
- Traditional authentication schemes fail in some JPEG2000 application scenarios (for example, compression with different implementations, different transcoding schemes, or multiple compression cycles) because of the incidental alterations these introduce.
- An authentication framework for JPEG2000 images is needed, for both integrity and non-repudiation purposes.
- It should include security solutions for JPEG2000 at the content level that achieve both security and robustness.
[Figure: Pixel difference between original and decoded image]
A Unified Authentication Framework for JPEG2000 Images
- Has cryptographic strength
- Feature signature is embedded into images
- Fragile and semi-fragile authentication
- Lossy and lossless compression, hence:
  - Lossy module for semi-fragile authentication
  - Lossless module for semi-fragile authentication
- Has been included in the JPEG2000 Security Part, JPSEC, Commission Draft 2.0, September 2004.
[Figure: System overview of the unified authentication system]
Application 2: On-line Identity Verification
- A paper published in IWDW04
- Sender's fingerprint image
- Features of the fingerprint and sender information are reversibly embedded into the fingerprint image.
- Verification is conducted at the receiver side with accessing central database.
Conclusion
- Reversible data hiding has opened a new door in data hiding:
  - Methodology
  - Linking original media with headers
III. Steganography & Steganalysis
- Covert communications
- Classification of stego-images from the original cover images
- Urgently needed for homeland security and cyber security
State of the Art
- Approach I: specific steganalysis algorithms aimed at detecting a specific data hiding algorithm.
  - Fridrich's steganalysis for the F5 method [1].
- Approach II: general steganalysis algorithms.
  - Farid's wavelet-based statistical approach [2,3].
  - Avcibas et al.'s image quality metric based statistical approach [4].
Status Quo of General Steganalysis Systems
- Performance of Farid's approach:
  - LSB: 42.2% - 90.2%
  - QIM: not reported
  - SS: not reported
- Performance of Avcibas et al.'s approach:
  - LSB: around 70%
  - QIM: not reported
  - SS: 80%
- The image databases used are small.
Our Investigation
- Classifier: feed-forward neural network with backpropagation training algorithm
- Image database used: all 1096 images in the CorelDRAW image database are used in the investigation.
Our Investigation (cont.)
- Applying a data hiding algorithm to an image generates a stego-image. Thus, we obtain 1096 image pairs.
- Randomly select 100 image pairs for training. The remaining 996 image pairs are used for testing.
- Experiments are conducted 10 times.
- The correct detection rates in these 10 experiments are averaged and reported.
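The evaluation protocol above can be sketched in Python as follows. Only the split-and-average procedure is taken from the slides. The feature arrays are synthetic placeholders, and a nearest-centroid rule stands in for the feed-forward neural network purely to keep the sketch self-contained; all function names are hypothetical.

```python
import numpy as np

def split_pairs(n_pairs, n_train, rng):
    """Randomly choose n_train image pairs for training; the rest are for testing."""
    order = rng.permutation(n_pairs)
    return order[:n_train], order[n_train:]

def fit_centroids(X, y):
    """Stand-in classifier (not the neural network): one centroid per class."""
    return X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)

def predict_centroids(model, X):
    """Label 1 (stego) if closer to the stego centroid, else 0 (cover)."""
    c0, c1 = model
    d0 = np.linalg.norm(X - c0, axis=1)
    d1 = np.linalg.norm(X - c1, axis=1)
    return (d1 < d0).astype(np.float64)

def averaged_detection_rate(cover, stego, fit, predict, n_train=100, runs=10, seed=0):
    """Repeat the random split `runs` times and average the correct detection rate."""
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(runs):
        train, test = split_pairs(len(cover), n_train, rng)
        X = np.vstack([cover[train], stego[train]])
        y = np.r_[np.zeros(len(train)), np.ones(len(train))]
        model = fit(X, y)
        X_test = np.vstack([cover[test], stego[test]])
        y_test = np.r_[np.zeros(len(test)), np.ones(len(test))]
        rates.append(np.mean(predict(model, X_test) == y_test))
    return float(np.mean(rates))

# Synthetic, perfectly separable features for 1096 cover/stego pairs (placeholder data).
cover_features = np.zeros((1096, 2))
stego_features = np.ones((1096, 2))
rate = averaged_detection_rate(cover_features, stego_features, fit_centroids, predict_centroids)
```

Keeping each cover image and its stego version on the same side of the split, as done here, prevents the classifier from being tested on images whose counterparts it saw during training.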
Performance of Farid's Approach
Data hiding method                   Farid's method, reported in [2]    Farid's method, implemented by my group
Cox et al.'s SS (alpha = 0.1) [5]    No mention                         57%
Piva et al.'s SS [6]                 No mention                         66%
LSB (0.3 bpp)                        43%-90%                            63%
Conclusion & Future Work
- Our investigation indicates that the current work on steganalysis is not sufficient.
- Further theoretical research and a large number of experiments are necessary in order to have a blind steganalysis system that can detect most data embedding algorithms with a high success rate.
Acknowledgements
- Contributions from:
  - Professors G. Xuan, J. Huang
  - Ph.D. candidates: J. Zhu, C. Yang, G. Kang, H. Liu, Z. Ni, D. Zou, C. Chen, W. Chen
- Support from:
  - New Jersey Commission of Science and Technology via New Jersey Center of Wireless Networking and Internet Security (NJWINS)
  - US Air Force Research Laboratory
References (Lossless Data Hiding)
[barton 97] J. M. Barton, "Method and apparatus for embedding authentication information within digital data," U.S. Patent 5,646,997, 1997.
[honsinger et al. 99] C. W. Honsinger, P. Jones, M. Rabbani, and J. C. Stoffel, "Lossless recovery of an original image containing embedded data," US Patent 6,278,791 B1, 2001.
[macq and deweyand 99] B. Macq and F. Deweyand, "Trusted headers for medical images," DFG VIII-D II Watermarking Workshop, Erlangen, Germany, October 1999.
[fridrich et al. 01] J. Fridrich, M. Goljan and R. Du, "Invertible authentication," Proc. SPIE, Security and Watermarking of Multimedia Contents, pp. 197-208, San Jose, CA, January 2001.
[goljan et al. 01] M. Goljan, J. Fridrich, and R. Du, "Distortion-free data embedding," Proceedings of 4th Information Hiding Workshop, pp. 27-41, Pittsburgh, PA, April 2001.
[de vleeschouwer et al. 01] C. de Vleeschouwer, J. F. Delaigle and B. Macq, "Circular interpretation on histogram for reversible watermarking," IEEE International Multimedia Signal Processing Workshop, Cannes, France, October 2001.
[domingo-ferrer and sebé 02] J. Domingo-Ferrer and F. Sebé, "Invertible spread-spectrum watermarking for image authentication and multilevel access to precision-critical watermarked images," Proceedings of the International Conference on Information Technology: Coding and Computing, pp. 152-157, April 2002.
[celik et al. 02] M. Celik, G. Sharma, A. M. Tekalp, E. Saber, "Reversible data hiding," Proceedings of the International Conference on Image Processing 2002, Rochester, NY, September 2002.
[xuan et al. 02] G. Xuan, J. Zhu, J. Chen, Y. Q. Shi, Z. Ni and W. Su, "Distortionless data hiding based on integer wavelet transform," IEE Electronics Letters, vol. 38, no. 25, pp. 1646-1648, December 2002.
[xuan et al. 04] G. Xuan, Y. Q. Shi, Z. C. Ni, J. Chen, C. Yang, Y. Zhen and J. Zheng, "High capacity lossless data hiding based on integer wavelet transform," Proceedings of IEEE ISCAS, Vancouver, Canada, May 2004.
[ni et al. 03] Z. Ni, Y. Q. Shi, N. Ansari and W. Su, "Reversible data hiding in spatial domain," IEEE ISCAS03.
[tian 03] J. Tian, "Reversible data embedding using a difference expansion," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 8, August 2003.
[yang et al. 04] B. Yang, M. Schmucker, W. Funk, C. Busch, and S. Sun, "Integer DCT-based reversible watermarking for images using companding technique," Proceedings of SPIE Vol. 5306, 5306-41, January 2004.
[thodi and rodriguez 04] D. M. Thodi and J. J. Rodriguez, "Reversible watermarking by prediction-error expansion," 6th IEEE Southwest Symposium on Image Analysis and Interpretation, pp. 21-25, Lake Tahoe, CA, March 28-30, 2004.
[ni 04] Z. Ni, Y. Q. Shi, N. Ansari, W. Su, Q. Sun and X. Lin, "Robust image lossless data hiding," IEEE ICME, Taipei, Taiwan, June 2004.
[xuan 04] G. Xuan and Y. Q. Shi, "Integer wavelet transform based lossless data hiding using spread spectrum," IEEE MMSP04, Siena, Italy, September 2004.
[zou et al. 04] D. Zou, Y. Q. Shi and Z. Ni, "A semi-fragile lossless data hiding scheme based on integer wavelet transform," IEEE MMSP04, Siena, Italy, September 2004.
[calderbank et al. 98] A. R. Calderbank, I. Daubechies, W. Sweldens and B. Yeo, "Wavelet transforms that map integers to integers," Applied and Computational Harmonic Analysis, vol. 5, no. 3, pp. 332-369, 1998.
JPSEC Commission Draft Version 2.0, ISO/IEC JTC 1/SC29/WG1 N3397.
References (Steganalysis)
[1] J. Fridrich, R. Du, and L. Meng, "Steganalysis of LSB Encoding in Color Images," Proceedings IEEE International Conference on Multimedia and Expo, July 30 - August 2, 2000, New York City, NY.
[2] H. Farid, "Detecting Hidden Messages Using Higher-Order Statistical Models," International Conference on Image Processing (ICIP), Rochester, NY, 2002.
[3] H. Farid and S. Lyu, "Detecting Hidden Messages Using Higher-Order Statistics and Support Vector Machines," Proceedings 5th Information Hiding Workshop, Noordwijkerhout, Netherlands, Oct. 2002.
[4] I. Avcibas, N. Memon, B. Sankur, "Steganalysis using image quality metrics," IEEE Transactions on Image Processing, vol. 12, pp. 221-229, Feb. 2003.
[5] I. J. Cox, J. Kilian, T. Leighton and T. Shamoon, "Secure Spread Spectrum Watermarking for Multimedia," IEEE Trans. on Image Processing, vol. 6, no. 12, pp. 1673-1687, 1997.
[6] A. Piva, M. Barni, F. Bartolini, V. Cappellini, "DCT-based watermark recovering without resorting to the uncorrupted original image," International Conference on Image Processing, vol. 1, pp. 520-523, Oct. 1997.