Information Hiding Phil Regalia Department of Electrical Engineering and Computer Science Catholic University of America Washington, DC 20064 regalia@cua.edu Baltimore IEEE Signal Processing Society Chapter, Sept 2006
Outline 1 Background 2 Dirty Paper Coding 3 Codeword Binning 4 Code construction 5 Remaining problems
Information Hiding Insert a message secretly/clandestinely in some other cover signal, and extract it later. An early application was watermarking: attempts to assert ownership of digital content by hiding some signature; the basic zeal has been extended to fingerprinting codes and other snare technologies. More constructive applications may be identified: inserting medical information in patient records; inclusion within secure authentication systems; confidential messaging using conventional media; reaching capacity in multi-user communications.
Information Hiding: Simplest case With no constraints on storage or transmission, information hiding is trivial: replace select bits imperceptibly. (Block diagram: message "TOP SECRET" → hiding algorithm applied to the cover signal → modified signal → clear channel → extraction algorithm → recovered message "TOP SECRET".) If the cover signal undergoes compression and/or transmission artifacts, however, the embedded message is readily lost.
Information Hiding: Simplest case With no constraints on storage or transmission, information hiding is trivial: replace select bits imperceptibly. (Block diagram: message "TOP SECRET" → hiding algorithm applied to the cover signal → modified signal → nasty channel → extraction algorithm → recovered message?) If the cover signal undergoes compression and/or transmission artifacts, the embedded message is readily lost.
Communications viewpoint: (Block diagram: message, cover signal, and key → hiding algorithm → modified signal → channel → extraction algorithm → estimated message.) We want the modified signal to resemble the cover signal (low distortion or imperceptibility), and the estimated message to equal the original message (with high probability). Fundamental question: what is the maximum message length that can be hidden, subject to these constraints?
Distortion: Measured vs. Perceived (Figure: original image alongside a modified version.) Mathematical distortion measures may not correlate perfectly with perceptual distortion measures.
Resilience to Channel Errors Use some form of coding. (Block diagram: the message passes through a coding stage before the hiding algorithm; the modified cover signal then traverses the channel to the extraction algorithm, which produces the estimated message.) Lock box choices: spreading sequences (noise-like qualities; easy to separate users); error correction codes (turbo codes, LDPC, ...), with excellent robustness to channel errors; cryptographic locks.
Embedding Capacity Channel randomly flips 10% of the bits. (Plot: embedding capacity C versus Hamming distortion d, comparing three curves.) (a) Spreading sequences; (b) Shannon-limit error-correction code (e.g., turbo, LDPC, ...); (c) Theoretical embedding capacity.
First Analytic Result: Dirty paper coding (Costa, 1983) Continuous-amplitude signals; interference known to the encoder, but not the decoder. (Block diagram: message → encoder → decoder → estimated message, with interference known to the encoder, additive noise b with E(b^2) = N, and encoder output w constrained to E(w^2) ≤ P.) Information-theoretic capacity: C = (1/2) log(1 + P/N) bits/symbol. This result holds independent of the interference power.
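As a quick numerical check of Costa's formula (a minimal sketch; the function and parameter names are mine, not from the talk), note that the capacity depends only on the power constraint P and the noise power N:

```python
import math

def costa_capacity(P, N):
    """Dirty-paper capacity in bits/symbol: C = (1/2) log2(1 + P/N).
    The interference power does not appear, matching Costa's result."""
    return 0.5 * math.log2(1 + P / N)

print(costa_capacity(P=1.0, N=0.25))   # about 1.16 bits/symbol, for any interference power
```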
Adaptation to binary sources (Barron et al., IT 2003) Binary {0,1} sequences; interference known to the encoder, but not the decoder. (Block diagram: message → encoder → decoder → estimated message, with interference known to the encoder, Bernoulli(p) channel noise, and embedding perturbation w constrained to Hamming rate ≤ D.) Information-theoretic capacity: the concave envelope of the function g(D) = h(D) - h(p), with h(p) = -p log p - (1 - p) log(1 - p). When D ≥ 0.5, we recover g(0.5) = 1 - h(p), the capacity of the BSC.
Illustration for p = 0.1 (Plot: embedding capacity C versus Hamming distortion D, showing g(D) = h(D) - h(p) and its concave envelope.)
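A small numerical sketch of this curve (my own code, not from the talk): it evaluates g(D) = h(D) - h(p) on a grid and takes its upper concave envelope; clipping g at zero before taking the envelope is my assumption, reflecting that the embedding capacity cannot be negative.

```python
import numpy as np

def h(q):
    """Binary entropy in bits, with the 0 log 0 = 0 convention."""
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

def upper_concave_envelope(x, y):
    """Upper concave envelope of the points (x_i, y_i), via an upper convex hull scan."""
    hull = []
    for pt in sorted(zip(x, y)):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # drop the middle point if it lies on or below the chord from (x1, y1) to pt
            if (y2 - y1) * (pt[0] - x1) <= (pt[1] - y1) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append(pt)
    hx, hy = zip(*hull)
    return np.interp(x, hx, hy)

p = 0.1
D = np.linspace(0.0, 0.5, 501)
g = np.maximum(h(D) - h(p), 0.0)     # clipped at zero: capacity cannot be negative (my assumption)
C = upper_concave_envelope(D, g)
print(np.interp(0.2, D, C))          # embedding capacity at Hamming distortion D = 0.2
```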
Codeword binning Message length: M. Cover signal length: N. Encoder operation: start with a code book having 2^K code words, partitioned into 2^M bins (M < K < N), i.e., 2^(K-M) code words per bin, one bin per message. A good error correction code (turbo or LDPC) gives a suitable code book; the optimal distribution invokes nested codes.
Encoder operation (cont.) The message to hide determines which bin to use; suppose it selects the bin containing 001011100100101, 011000100010001, 100010010000110, 001000001111001. For the cover signal 001010100100101, the minimum-distance code word in that bin is 001011100100101, which becomes the output. The (portion of the) cover signal is replaced with the closest code word from the bin.
Decoder operation The decoder associates the closest code word (from the overall code book) to the received signal; after the channel, the chosen code word here is 001011100100101. The bin containing it is located, which identifies the message.
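A toy illustration of the binning encoder and decoder (sizes, seed, and the random code book are purely illustrative; a real scheme would use a structured turbo/LDPC code book, as the slides note):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy code book: 2^K random code words of length N_BITS, split into 2^M bins (one per message).
N_BITS, K, M = 15, 6, 2
codebook = rng.integers(0, 2, size=(2**K, N_BITS))
bins = codebook.reshape(2**M, 2**(K - M), N_BITS)

def embed(cover, message):
    """Replace the cover block by the closest code word in the message's bin."""
    bin_words = bins[message]
    return bin_words[(bin_words != cover).sum(axis=1).argmin()]

def extract(received):
    """Find the closest code word in the whole code book; its bin index is the message."""
    idx = (codebook != received).sum(axis=1).argmin()
    return idx // 2**(K - M)

cover = rng.integers(0, 2, N_BITS)
stego = embed(cover, message=3)
noisy = stego.copy()
noisy[0] ^= 1                       # channel flips one bit
print(extract(noisy))               # 3, provided the flip stays inside the decision region
```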
Lossy source codes A compression code takes N bits x in; the output y = Q(x) is the closest word from a dictionary of size 2^L (L < N). Average Hamming distortion: D = (1/N) Σ_{x ∈ {0,1}^N} Pr(x) d_H(x, Q(x)). From rate-distortion theory, if Pr(x) is uniform, L/N ≥ 1 - h(D), with h(D) = -D log D - (1 - D) log(1 - D).
Lossy source codes (Plot: minimum rate L/N versus distortion D, tracing the curve 1 - h(D).) From rate-distortion theory, if Pr(x) is uniform, L/N ≥ 1 - h(D) with h(D) = -D log D - (1 - D) log(1 - D).
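For concreteness, a few values of this bound (a minimal sketch; nothing assumed beyond the formula on the slide):

```python
import math

def h(q):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

for D in (0.05, 0.1, 0.2, 0.3):
    print(f"D = {D}: minimum rate L/N >= {1 - h(D):.3f}")
```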
Duality with Error Correction Codes A source compression code maps N bits into L < N bits (with some distortion): x (N bits) → lossy compression → y = Q(x) (N bits) → index assignment → z (L bits). An error correction code maps L bits into N bits: z (L bits) → inverse map → error correction code → y (N bits). Duality: a source code that attains the rate-distortion curve gives an error correction code that attains Shannon capacity.
Parity check matrices An error correction code is often described by its parity check matrix H. Example: Hamming (7,4) code, with H = [1 0 1 1 1 0 0; 1 1 0 1 0 1 0; 1 1 1 0 0 0 1]. The condition 0 = H y, for y = (y1, ..., y7)^T, gives 0 = y1 ⊕ y3 ⊕ y4 ⊕ y5, 0 = y1 ⊕ y2 ⊕ y4 ⊕ y6, 0 = y1 ⊕ y2 ⊕ y3 ⊕ y7 as the parity check constraints applicable to each code word y.
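A quick check of these constraints (my own snippet; the example code word is one valid choice, not taken from the talk):

```python
import numpy as np

# Parity check matrix of the (7,4) Hamming code, rows as on the slide.
H = np.array([[1, 0, 1, 1, 1, 0, 0],
              [1, 1, 0, 1, 0, 1, 0],
              [1, 1, 1, 0, 0, 0, 1]])

y = np.array([1, 0, 1, 1, 1, 0, 0])   # satisfies all three parity checks
print((H @ y) % 2)                     # [0 0 0]  =>  y is a code word

y_err = y.copy()
y_err[2] ^= 1                          # flip y3
print((H @ y_err) % 2)                 # nonzero syndrome  =>  not a code word
```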
Nested codes An error correction code is described in terms of its parity check matrix H ∈ {0,1}^((N-L)×N): H y = 0 ⟺ y is a code word. Split H row-wise as H = [H1; H2], where H1 has N - K rows and H2 has K - L rows (N - L rows in total). Observe: if H y = 0 then H1 y = 0 as well; in nesting terminology, CodeBook(H) ⊂ CodeBook(H1). The null space of H is called the coarse code; the null space of H1 is called the fine code.
Recipe for an Information Hiding code: Given cover signal x and message m, transmit y where y ≈ x subject to [H1; H2] y = [0; m]. At the decoder: from the received signal η = Channel(y), find the nearest code word ŷ ≈ η subject to H1 ŷ = 0, and estimate the message as m̂ = H2 ŷ.
Rate Constraints: Upper bound on K/N (fine code rate): K/N ≤ 1 - h(p) (channel resilience). Lower bound on L/N (coarse code rate): L/N ≥ 1 - h(D) (distortion constraint). Available message rate: M/N = (K - L)/N ≤ h(D) - h(p), provided K > L. The concave envelope of this message rate gives the cited bound.
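The following brute-force sketch puts the recipe and the rate bookkeeping together for the (7,4) Hamming matrix above, split as H1 = first row and H2 = last two rows, so K = 6, L = 4, and the message rate is (K - L)/N = 2/7. The split, the exhaustive search, and all names are mine for illustration; a practical system would use long turbo/LDPC codes with efficient quantizers instead.

```python
import numpy as np
from itertools import product

# (7,4) Hamming parity check matrix, split row-wise into H1 (fine code) and H2.
H = np.array([[1, 0, 1, 1, 1, 0, 0],
              [1, 1, 0, 1, 0, 1, 0],
              [1, 1, 1, 0, 0, 0, 1]])
H1, H2 = H[:1], H[1:]                             # fine code: H1 y = 0; coarse code: H y = 0
ALL = np.array(list(product((0, 1), repeat=7)))   # all 2^7 binary words (brute force)

def embed(x, m):
    """Transmit the word closest to the cover x with H1 y = 0 and H2 y = m (the message's bin)."""
    in_bin = np.all((ALL @ H1.T) % 2 == 0, axis=1) & np.all((ALL @ H2.T) % 2 == m, axis=1)
    candidates = ALL[in_bin]
    return candidates[(candidates != x).sum(axis=1).argmin()]

def extract(eta):
    """Map the received eta to the nearest fine-code word y_hat, then read the message as H2 y_hat."""
    fine = ALL[np.all((ALL @ H1.T) % 2 == 0, axis=1)]
    y_hat = fine[(fine != eta).sum(axis=1).argmin()]
    return (H2 @ y_hat) % 2

x = np.random.randint(0, 2, 7)        # cover block
m = np.array([1, 0])                  # 2-bit hidden message (rate 2/7)
y = embed(x, m)
print("distortion:", int((x != y).sum()), "bits; recovered message:", extract(y))
```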
Code word quantization Finding y = Q(x) is a well-known problem: vector quantization, generally considered hard (exponential complexity in the length N). BUT: y is drawn from an error correction code, so vector quantization = error correction decoding! TRUE, and equivalent in fact to maximum likelihood decoding, which is computationally efficient only for certain code classes.
Some code classes Trellis codes: maximum likelihood decoding = Viterbi algorithm. (Diagram: shift-register encoder and the trellis of state transitions over time.) Convolutional codes fall short of the Shannon limit. Turbo and LDPC codes approach the Shannon limit, but their trellis descriptions have very high complexity; why not use belief propagation decoding?
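To make the "quantization = Viterbi decoding" point concrete, here is a minimal Viterbi quantizer for a rate-1/2, memory-2 convolutional code (the (7,5) octal generators are my choice for illustration): it returns the code word closest in Hamming distance to a given binary vector.

```python
import numpy as np

def conv_step(state, u):
    """One step of a rate-1/2, memory-2 feedforward encoder (generators 7,5 octal)."""
    s1, s0 = (state >> 1) & 1, state & 1
    o1 = u ^ s1 ^ s0                  # generator 111
    o2 = u ^ s0                       # generator 101
    return ((u << 1) | s1) & 3, (o1, o2)

def viterbi_quantize(x):
    """Return the convolutional code word closest (in Hamming distance) to the binary vector x."""
    T = len(x) // 2
    INF = 10**9
    metric = [0, INF, INF, INF]       # start in the all-zero state
    back = []                         # per step: (previous state, input bit) chosen for each state
    for t in range(T):
        r = x[2*t:2*t + 2]
        new_metric, choice = [INF] * 4, [None] * 4
        for s in range(4):
            if metric[s] >= INF:
                continue
            for u in (0, 1):
                nxt, (o1, o2) = conv_step(s, u)
                cost = metric[s] + (o1 != r[0]) + (o2 != r[1])
                if cost < new_metric[nxt]:
                    new_metric[nxt], choice[nxt] = cost, (s, u)
        metric = new_metric
        back.append(choice)
    # trace back the best path, then re-encode it to obtain the quantized word y = Q(x)
    s = int(np.argmin(metric))
    bits = []
    for t in range(T - 1, -1, -1):
        s, u = back[t][s]
        bits.append(u)
    bits.reverse()
    y, s = [], 0
    for u in bits:
        s, (o1, o2) = conv_step(s, u)
        y += [o1, o2]
    return np.array(y)

x = np.random.randint(0, 2, 60)       # a random binary vector to quantize
y = viterbi_quantize(x)
print("Hamming distortion:", int((x != y).sum()), "out of", len(x), "bits")
```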
On belief propagation decoding Not the same as maximum likelihood decoding, even though its error correction performance is almost as good; it does not work well for code word quantization. (Plot: symbol estimate versus symbol index for a 1000-symbol block.)
Nonetheless... A 115 × 115 image, quantizing the upper-left 51 × 80 block: original image; LDPC code (4 pixels change); trellis code (319 pixels change). (Obtained using a modified belief propagation algorithm; work in progress...)
Concluding remarks Information hiding can be seen as an application of dirty paper coding, which is also of interest in multi-user communications. Code construction involves nested codes and code word binning. The practical obstacle (at present) is code word quantization, akin to the classical problems of vector quantization and maximum likelihood decoding. Various applications would benefit nicely from overcoming this obstacle.