Image Data Compression. Introduction to watermarking and steganography


Examples and common terminology
Example: watermarks (WMs) embedded in paper money
- Watermarking: hidden from view (or non-obstructing) in normal use; recovered via a special process (holding the bill up to light); contains information related to the object (authenticity of the bill); the presence of a watermark may be known to the user.
Example: a secret message written with milk on top of a non-secret (cover, or decoy) letter written with ink
- Steganographic communication (from Greek steganos = covered): the hidden message is unrelated to the decoy letter; the presence of the hidden message itself is secret (if it is known, the scheme is called overt embedded communication).
Watermarking of electronic signals:
- Work: a specific signal, such as an image, an audio or a video recording (e.g., a song in MP3)
- Content: the set of all Works (e.g., all audio music) that may be watermarked
- Medium: means of representing, transmitting or recording the content (e.g., a CD)
- Cover Work: the original signal before modification (called embedding)
- Watermarking: imperceptibly altering a Work to embed a message about that Work
- Steganography: undetectably altering a Work to embed a secret message
- Steganalysis: detecting whether secret steganographic communication is taking place
Book on the subject: [Cox et al, 08]

Categorization of information hiding
Existence of message is hidden:
- Cover Work-dependent: covert watermarking. Example: in 1981, M. Thatcher marked copies of secret documents with unique word-spacing patterns to find a cabinet minister who leaked information.
- Cover Work-independent: steganography. Example: the SALT-II nuclear treaty described sensors on missile silos, reporting whether a silo is occupied. Both the USSR and the USA investigated whether the sensors could transmit other information.
Existence of message is known:
- Cover Work-dependent: overt watermarking. Example: websites of some museums provide high-quality digital images of their collections, with a warning that each image is watermarked to protect it against piracy or reproduction.
- Cover Work-independent: overt embedded communication. Example: in the late 1940s, a time code at 800 Hz was embedded in radio broadcasts. The code was inaudible and only communicated the current time to various automatic devices.
[Figure: Cover Work + Payload → Embedder → Watermarked Work / Stego Work → Detector → (maybe) detected payload]

Possible uses of digital watermarking and steganography
Watermarking:
- The Internet and high-capacity digital recording devices facilitate unauthorized copying.
- Cryptography provides protection in transit, but not after delivery.
- Watermarking complements cryptographic protection: the watermark is never removed during use and may be designed to survive content transformations (re-encoding, format changes, etc.).
Steganography:
- Electronic communications are susceptible to eavesdropping and interventions.
- Security and privacy can be addressed by cryptographic tools or by anonymous remailers.
- However, encryption is not hidden: the presence of the communication is obvious.
- Steganography can also be used in cases where encryption is prohibited.
- Steganalysis aims to establish whether communication is taking place (e.g., to prevent criminal activities or to identify members of an organization).
[Figure: Alice sends a covert communication to Bob; Eve tries to detect the communication and attack the channel.]

Some specific applications of digital watermarking
- Broadcast monitoring: airtime verification, re-broadcasting control, ads control; active vs. passive monitoring (reduction of indexing complexity)
- Owner identification: legal copyright notice, owner contact information
- Proof of ownership: key in a central repository, digital negative, asymmetric identification of original / derived work
- Transaction tracking (fingerprinting of copies): e.g. identification of the leaking party for Oscar Award previews
- Content authentication: detection of tampering, localized; [semi-]fragile digital signature
- Copy / record / playback control: e.g. DVD copy protection
- Device control: copy prevention marks, ads indicator, traffic info on FM radio
- Legacy enhancement: digital signals transmitted over analog networks, lyrics in MP3
Desired WM properties and characteristics:
- Imperceptibility: the WM must not ruin the aesthetics of the Cover Work
- Inseparability: the WM cannot be removed by converting, re-formatting, etc.
[Figure: example of (relatively poor) watermarking as owner identification: the complete Lena image and its (usually omitted) copyright notice.]

Quantitative metrics of watermarking systems
- Embedding effectiveness: probability that the output is identified as watermarked immediately after embedding (may be < 100%); a compromise between effectiveness and fidelity; determined analytically or with a large DB of Works
- Fidelity: how imperceptible the WM is in the Work; perceptual similarity between the original and watermarked Works (possibly after additional degradation of both due to delivery); based on some perceptual model
- Robustness: how well the WM survives common signal processing operations: spatial / temporal filtering, lossy compression, printing / scanning, geometric distortions, etc.
- Data payload: number of bits a watermark encodes per unit of time or per Work; an N-bit WM has $2^N + 1$ possible detector outputs ($2^N$ messages plus "no watermark present")
- Blind [public] or informed [private] detection: whether the original Work is needed for successful detection
- False positive rate: probability of erroneously detecting a WM that is not present, per detector run (fixed Work, random WMs)
- Security: ability to resist hostile attacks, such as unauthorized removal (elimination, masking, collusion) or embedding (forgery) = active attacks, and unauthorized detection = passive attack; use of a secret watermark key (similar to a cipher key)
- Cost: deployment of embedders / detectors, computational load, real-time requirements, etc.

Properties of steganographic and steganalytic systems
Recall the primary goal of steganography: conceal the fact that covert communication is present within innocuous communication.
Properties of a WM system irrelevant for steganography:
- Embedding effectiveness: N/A, due to the freedom to choose a suitable Cover Work
- Fidelity: N/A, since the steganalyzing party normally has no access to the original Work
- Blind / informed extraction: N/A, one usually assumes that the original Work is not available
- Robustness: N/A, noise etc. is not a big issue for modern digital communication
Properties important for steganography:
- Embedding capacity: maximum theoretically possible number of embedded bits
- Steganographic capacity: maximum payload that can be hidden without artifacts, so that detection is improbable
- Embedding efficiency: number of embedded bits per unit of distortion
- Robustness against system / blind / targeted steganalysis, based respectively on a method weakness (implementation fault, insufficient stego keyspace), on statistical properties common to all methods, or on the detectability of messages embedded with a specific method
- Statistical undetectability: is it hard to notice the presence of a message? Usually (loosely) quantified in terms of statistical anomalies, based on some statistical model of relevant Works. False alarm rate: tradeoffs characterized with Receiver Operating Characteristic (ROC) curves
- Security: resistance to passive (observation), active (obstruction) and malicious (impersonation) attacks
- Use of stego keys: the algorithm is assumed known, and embedding is controlled by a secret key. Schemes can be symmetric or asymmetric; as a rule, key length does not influence security as much as it does for crypto keys
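
To make "embedded bits per unit of distortion" concrete, here is a minimal Python sketch. It uses LSB replacement, a simple embedding method that these slides do not discuss and that appears here only as an illustration. Assumptions: the cover is a list of synthetic 8-bit sample values, distortion is counted as the number of changed samples (each change alters a value by exactly 1), and the function names are hypothetical.

```python
import random


def lsb_embed(cover, bits):
    """LSB replacement (illustrative only): overwrite the least significant bit of
    the first len(bits) samples. Returns the stego samples and the number of changes."""
    if len(bits) > len(cover):
        raise ValueError("message exceeds embedding capacity (1 bit per sample)")
    stego = list(cover)
    changes = 0
    for i, bit in enumerate(bits):
        new_value = (stego[i] & ~1) | bit  # clear the LSB, then set it to the message bit
        if new_value != stego[i]:
            changes += 1  # each change alters the sample by exactly +/-1
        stego[i] = new_value
    return stego, changes


def lsb_extract(stego, n_bits):
    # the receiver, who knows which samples carry the message, reads the LSBs back
    return [value & 1 for value in stego[:n_bits]]


rng = random.Random(0)
cover = [rng.randrange(256) for _ in range(10000)]   # synthetic 8-bit "Cover Work"
message = [rng.randrange(2) for _ in range(10000)]   # random message bits
stego, changes = lsb_embed(cover, message)
efficiency = len(message) / changes                  # embedded bits per changed sample
print(f"capacity: {len(cover)} bits, efficiency: {efficiency:.2f} bits per change")
```

About half of the samples already carry the correct LSB, so the efficiency comes out close to 2 bits per change. Schemes with higher embedding efficiency (e.g. matrix embedding) change fewer samples for the same payload.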

Small digression: communication systems
Reminder: generic image compression + communication system
[Figure: Source → Transform → Remove irrelevance → Remove redundancy (lossless coding) → Channel coding (inject channel-adjusted redundant information, i.e. error protection) → Modulation → Channel → Demodulation → Error correction (exploit and remove the injected redundancy) → Decoding → Inverse transform → Inject missing irrelevant information (based on assumptions/models) → Sink]
Simple communication system:
[Figure: input message m → channel encoder → transmitted signal x → + noise n → received signal y → channel decoder → output message m_n]
- Transmitted signal: $x = \{x_1, x_2, \dots, x_N\}$, subject to a power constraint $\sum_i x_i^2 \le p$
- Simplest channel: additive white Gaussian noise, $y = x + n$, $n \sim \mathcal{N}(0, \sigma^2)$
- Goal: maximize the likelihood that the detected message is identical to the original
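
The simple communication model can be simulated in a few lines. This is a minimal sketch built around a hypothetical channel encoder chosen only for illustration. Each bit is mapped to -1 or +1, so every transmitted sample has $x_i^2 = 1$, and the bit is repeated `reps` times as a crude form of injected redundancy. The decoder sums each group of samples and decides by sign, which is the maximum-likelihood decision for this code on an AWGN channel.

```python
import random


def encode(bits, reps):
    # channel encoder: 0 -> -1, 1 -> +1 (so every x_i^2 = 1), each bit repeated reps times
    return [float(2 * b - 1) for b in bits for _ in range(reps)]


def awgn(x, sigma, rng):
    # simplest channel: y = x + n with i.i.d. n ~ N(0, sigma^2)
    return [xi + rng.gauss(0.0, sigma) for xi in x]


def decode(y, reps):
    # channel decoder: sum each group of reps samples (exploiting the injected
    # redundancy) and decide by sign
    return [1 if sum(y[i:i + reps]) > 0 else 0 for i in range(0, len(y), reps)]


def bit_error_rate(n_bits, reps, sigma, seed=1):
    rng = random.Random(seed)
    m = [rng.randrange(2) for _ in range(n_bits)]
    m_n = decode(awgn(encode(m, reps), sigma, rng), reps)
    return sum(a != b for a, b in zip(m, m_n)) / n_bits


for reps in (1, 3, 9):
    print(f"reps={reps}: bit error rate {bit_error_rate(20000, reps, sigma=1.0):.4f}")
```

With unit amplitude the error rate should be close to $Q(\sqrt{reps}/\sigma)$. At σ = 1 that is about 0.16 for reps = 1 and about 0.001 for reps = 9. The price is a lower rate, since every message bit uses `reps` channel samples.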

Secure communication systems
Communication with encryption:
[Figure: input m → encryptor (encryption key) → channel encoder → x → + noise n → y → channel decoder → decryptor (decryption key) → output m_n]
- Goal: secrecy of messages, messaging layer. Example: RSA encryption.
Communication with key-based channel coding:
[Figure: input m → channel encoder (encoding key) → x → + noise n → y → channel decoder (decoding key) → output m_n]
- Goal: guaranteed delivery of signals, transport layer. Example: spread-spectrum radio.
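
Key-based channel coding can be illustrated with a direct-sequence spread-spectrum sketch. Assumptions: the key seeds Python's `random.Random` to produce a ±1 chip sequence. This generator is not cryptographically secure and only stands in for a keyed generator. The function names are hypothetical. Each message bit is spread over its own block of chips, and the decoder correlates each block with the chips regenerated from its key.

```python
import random


def chip_sequence(key, length):
    # key-dependent pseudo-random +/-1 chips; only holders of the key can regenerate them
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(length)]


def spread(bits, key, chips_per_bit):
    # channel encoder: bit b is sent as (2b - 1) times its own block of chips
    chips = chip_sequence(key, chips_per_bit * len(bits))
    return [(2 * bits[i // chips_per_bit] - 1) * c for i, c in enumerate(chips)]


def despread(y, key, chips_per_bit):
    # channel decoder: correlate each block with the same chips, decide by sign
    chips = chip_sequence(key, len(y))
    decoded = []
    for start in range(0, len(y), chips_per_bit):
        corr = sum(y[j] * chips[j] for j in range(start, start + chips_per_bit))
        decoded.append(1 if corr > 0 else 0)
    return decoded


def count_errors(a, b):
    return sum(u != v for u, v in zip(a, b))


rng = random.Random(7)
message = [rng.randrange(2) for _ in range(200)]
x = spread(message, key=1234, chips_per_bit=64)
y = [xi + rng.gauss(0.0, 2.0) for xi in x]  # noise std is twice the chip amplitude
right_key = despread(y, key=1234, chips_per_bit=64)
wrong_key = despread(y, key=9999, chips_per_bit=64)
print("errors with right key:", count_errors(message, right_key))
print("errors with wrong key:", count_errors(message, wrong_key))
```

With the correct key the 64 chips add coherently: the correlation is ±64 against a noise standard deviation of 16, so errors are rare. With a different key the decisions are essentially coin flips, so about half of the bits come out wrong.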

Communication-based models of watermarking
Watermarking with a simple informed detector:
[Figure: input message m → watermark encoder (watermark key) → added pattern w_a; w_a + original cover work c_o → watermarked work c_w; c_w + noise n → c_wn; c_wn − c_o (original cover work) → watermark decoder (watermark key) → output message m_n. The encoder and the adder form the watermark embedder; the subtraction and the decoder form the watermark detector.]
- The message is mapped to an added pattern w_a (possibly via an intermediate message pattern w_m).
- w_a is added to the cover work c_o to produce the watermarked work c_w (a blind embedder: it ignores the properties of the cover work).
- Further processing adds noise n (it could be more complex: compression, attacks, etc.).
- If the original work is subtracted, the communication process is identical to the simple communication model with additive noise.
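
The difference between informed and blind detection can be sketched numerically. Assumptions: works are flattened to 1-D lists, the cover work is synthetic i.i.d. Gaussian noise (real images are not), the reference pattern is key-seeded and normalized to zero mean and unit variance, and detection uses linear correlation. The informed detector subtracts c_o before correlating; the blind one cannot.

```python
import random
import statistics


def z_lc(a, b):
    # linear correlation (1/N) * sum_i a[i] * b[i]
    return sum(x * y for x, y in zip(a, b)) / len(a)


def reference_pattern(n, key):
    # hypothetical key-seeded pattern, normalized to zero mean and unit variance
    rng = random.Random(key)
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mu, sd = statistics.fmean(w), statistics.pstdev(w)
    return [(v - mu) / sd for v in w]


rng = random.Random(3)
N = 4096
w_r = reference_pattern(N, key=42)
c_o = [rng.gauss(128.0, 40.0) for _ in range(N)]  # synthetic stand-in for a cover work
alpha, m = 1.0, 1
w_a = [alpha * (2 * m - 1) * w for w in w_r]       # added pattern
c_w = [co + wa for co, wa in zip(c_o, w_a)]        # blind embedding
c_wn = [cw + rng.gauss(0.0, 5.0) for cw in c_w]    # channel noise n

blind_value = z_lc(c_wn, w_r)                                   # c_o acts as extra noise
informed_value = z_lc([a - b for a, b in zip(c_wn, c_o)], w_r)  # c_o subtracted first
print(f"blind: {blind_value:.3f}, informed: {informed_value:.3f}")
```

Both values estimate α(2m − 1) = 1. The blind value also contains $z_{lc}(c_o, w_r)$, whose spread here (about 40/√4096 ≈ 0.6) is much larger than that of the noise term (about 5/√4096 ≈ 0.08).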

Communication-based models of watermarking
Watermarking with a blind detector:
[Figure: input message m → watermark encoder (watermark key) → w_a; w_a + original cover work c_o → c_w; c_w + noise n → c_wn → watermark decoder (watermark key) → output message m_n; the original cover work is not available to the detector.]
- In a blind detector, the cover work is just another kind of noise.
- One goal: maximize the similarity between the input and output messages.
- Another possible goal: learn how exactly the watermarked work was processed.

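Simple WM system: blind embedding, correlation detector
Embed a 1-bit message m: $c = c_o + \alpha(2m - 1)\, w_r$, i.e. the cover work $c_o$ plus $+\alpha w_r$ for m = 1 or $-\alpha w_r$ for m = 0, where $w_r$ is the reference pattern.
Detect via linear correlation (scalar product): $z_{lc}(c, w_r) = \frac{1}{N}\, c \cdot w_r = \frac{1}{N}\sum_{x,y} c[x, y]\, w_r[x, y]$
Effect of noise: $c = c_o \pm \alpha w_r + n$, so $z_{lc}(c, w_r) = z_{lc}(c_o, w_r) + z_{lc}(n, w_r) \pm \alpha\, z_{lc}(w_r, w_r)$
If the reference pattern has zero mean and unit variance, then $z_{lc}(w_r, w_r) = 1$; the first two terms are expected to be close to zero when $c_o$ and $n$ are uncorrelated with $w_r$. The detector decides:
$m_n = 1$ if $z_{lc}(c, w_r) > \tau$; no watermark if $-\tau \le z_{lc}(c, w_r) \le \tau$; $m_n = 0$ if $z_{lc}(c, w_r) < -\tau$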
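
A minimal sketch of this blind embedder and the three-way correlation detector follows. Assumptions: works are flattened to 1-D lists, the cover work is synthetic i.i.d. Gaussian data, and the reference pattern is derived from a hypothetical integer key and normalized to zero mean and unit variance, so that $z_{lc}(w_r, w_r) = 1$.

```python
import random
import statistics


def reference_pattern(n, key):
    # hypothetical key-seeded w_r with zero mean and unit variance, so z_lc(w_r, w_r) = 1
    rng = random.Random(key)
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mu, sd = statistics.fmean(w), statistics.pstdev(w)
    return [(v - mu) / sd for v in w]


def z_lc(c, w_r):
    # linear correlation: (1/N) * sum over all samples of c * w_r
    return sum(a * b for a, b in zip(c, w_r)) / len(c)


def embed_blind(c_o, m, w_r, alpha):
    # c = c_o + alpha * (2m - 1) * w_r, independent of the content of c_o
    s = alpha * (2 * m - 1)
    return [a + s * b for a, b in zip(c_o, w_r)]


def detect(c, w_r, tau):
    # three-way decision: 1, 0, or None ("no watermark")
    z = z_lc(c, w_r)
    if z > tau:
        return 1
    if z < -tau:
        return 0
    return None


rng = random.Random(11)
N = 16384
w_r = reference_pattern(N, key=2024)
c_o = [rng.gauss(128.0, 10.0) for _ in range(N)]  # synthetic cover, flattened to 1-D
c1 = embed_blind(c_o, 1, w_r, alpha=1.0)
c0 = embed_blind(c_o, 0, w_r, alpha=1.0)
print(detect(c_o, w_r, 0.7), detect(c1, w_r, 0.7), detect(c0, w_r, 0.7))
```

Because $w_r$ is normalized, embedding shifts the detection value by exactly ±α. Whether a given work is classified correctly depends on how large $z_{lc}(c_o, w_r)$ is compared with the margin α − τ.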

Testing the simple WM system with a DB of images [Cox et al, 08]
[Figure: histograms of detection values (linear correlation) as a percentage of images, for m = 0, no WM and m = 1, over a DB of 4000 images, with a plausible threshold value marked.]
- Strength α = 1; threshold τ = 0.7 results in a false positive probability of ~$10^{-4}$.
- Performance is highly dependent on the reference pattern; e.g. a low-pass filtered pattern results in much poorer separation and a higher FP probability.
- A working and useful WM system; however, it is only optimal if the cover work and the noise are drawn from a Gaussian distribution, and it is susceptible to certain attacks.
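
The ~$10^{-4}$ figure comes from an experiment on real images. The sketch below shows one way such a rate can be estimated. It fixes one synthetic cover work and draws many random reference patterns (the "fixed Work, random WMs" setting). It then compares the empirical rate of $|z_{lc}| > \tau$ with a Gaussian model, $z_{lc} \sim \mathcal{N}(0, \mathrm{var}(c_o)/N)$. Assumptions: i.i.d. Gaussian pixels, plus a deliberately small N and high pixel variance so that the rate is large enough to measure with 1000 trials. Real images are spatially correlated, so this model is not meant to reproduce the slide's number.

```python
import math
import random
import statistics


def z_lc(c, w):
    return sum(a * b for a, b in zip(c, w)) / len(c)


def random_reference(n, rng):
    # a fresh random reference pattern, normalized to zero mean and unit variance
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mu, sd = statistics.fmean(w), statistics.pstdev(w)
    return [(v - mu) / sd for v in w]


def empirical_fp_rate(c_o, tau, trials, seed=0):
    # fraction of detector runs on the unwatermarked work c_o (one random
    # reference pattern per run) in which |z_lc| > tau, i.e. a bit is "detected"
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if abs(z_lc(c_o, random_reference(len(c_o), rng))) > tau:
            hits += 1
    return hits / trials


def gaussian_fp_rate(c_o, tau):
    # model: z_lc ~ N(0, var(c_o) / N)  =>  P(|z_lc| > tau) = erfc(tau / (sigma_z * sqrt(2)))
    sigma_z = statistics.pstdev(c_o) / math.sqrt(len(c_o))
    return math.erfc(tau / (sigma_z * math.sqrt(2.0)))


rng = random.Random(5)
c_o = [rng.gauss(128.0, 8.0) for _ in range(256)]  # small, noisy synthetic "work"
empirical = empirical_fp_rate(c_o, tau=0.7, trials=1000)
model = gaussian_fp_rate(c_o, tau=0.7)
print(f"empirical FP rate: {empirical:.3f}, Gaussian model: {model:.3f}")
```

Increasing N or τ makes the rate drop quickly. Estimating rates as small as $10^{-4}$ empirically therefore needs far more trials (or a large DB of Works, as in the experiment above).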

Watermarking as communication with side information
Idea: allow the embedder to examine the cover work before generating the watermark.
[Figure: the same model as for the blind detector, except that the original cover work c_o is also an input to the watermark encoder; the encoder can even subtract the cover work completely.]
[Shannon, 58]: communication with side information at the transmitter.
The embedder can be modified to guarantee 100% embedding effectiveness (at the expense of fidelity).

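Fixed correlation embedding
$c = c_o + \alpha(2m - 1)\, w_r$, with $\alpha = \frac{\tau_{goal} - (2m - 1)\, z_{lc}(c_o, w_r)}{z_{lc}(w_r, w_r)}$, so that $z_{lc}(c, w_r) = (2m - 1)\,\tau_{goal}$ for every cover work (for m = 1: $\alpha = \frac{\tau_{goal} - z_{lc}(c_o, w_r)}{z_{lc}(w_r, w_r)}$).
[Figure: histograms of detection values (linear correlation) as a percentage of images, for m = 0, no WM and m = 1.]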
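
A minimal sketch of this informed embedder, under the same assumptions as before: flattened synthetic cover works and a hypothetical key-seeded reference pattern with zero mean and unit variance. Unlike the blind embedder, it reads $c_o$ to choose α, so every marked work lands exactly on $\pm\tau_{goal}$.

```python
import random
import statistics


def z_lc(a, b):
    return sum(x * y for x, y in zip(a, b)) / len(a)


def reference_pattern(n, key):
    # hypothetical key-seeded w_r, zero mean and unit variance
    rng = random.Random(key)
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mu, sd = statistics.fmean(w), statistics.pstdev(w)
    return [(v - mu) / sd for v in w]


def embed_fixed_lc(c_o, m, w_r, tau_goal):
    # informed embedder: pick alpha from c_o so that z_lc(c, w_r) = (2m - 1) * tau_goal
    s = 2 * m - 1
    alpha = (tau_goal - s * z_lc(c_o, w_r)) / z_lc(w_r, w_r)
    c = [a + alpha * s * b for a, b in zip(c_o, w_r)]
    return c, alpha


rng = random.Random(5)
N = 4096
w_r = reference_pattern(N, key=7)
for trial in range(3):
    c_o = [rng.gauss(128.0, 40.0) for _ in range(N)]  # synthetic cover works
    c1, alpha1 = embed_fixed_lc(c_o, 1, w_r, tau_goal=1.2)
    print(f"cover z_lc={z_lc(c_o, w_r):+.3f}  alpha={alpha1:.3f}  marked z_lc={z_lc(c1, w_r):.3f}")
```

α now varies from work to work. Covers that already correlate with $w_r$ get a small α, which can even be negative; others get a larger one. This is where fidelity is traded for guaranteed effectiveness.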

Geometric models of watermarking
Media space: multi-dimensional space of all works.
- Region of acceptable fidelity around $c_o$: $D(c, c_o) < \delta$
- Detection region: $z_{lc}(c, w_r) > \tau$
- Successfully watermarked versions of $c_o$: works in both regions
The fidelity distance function is extremely difficult to formalize; it depends on human perception. Simplest case, MSE: $D_{MSE}(c_1, c_2) = \frac{1}{N}\|c_1 - c_2\|^2 = \frac{1}{N}\sum_{x,y}\left(c_1[x, y] - c_2[x, y]\right)^2$
Some perceptual distance functions are asymmetric and give results in units of JND (just noticeable difference).
The true distribution of works in media space is usually unknown, and the effect of transmission / noise / attacks is also unknown (but modeled anyway).
[Figure: media space with c_o, its region of acceptable fidelity, the detection region along w_r, and their intersection.]
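
For blind embedding with a zero-mean, unit-variance $w_r$, the MSE distance has a closed form: $D_{MSE}(c_o + \alpha w_r, c_o) = \frac{1}{N}\sum(\alpha w_r)^2 = \alpha^2 z_{lc}(w_r, w_r) = \alpha^2$. The fidelity constraint $D < \delta$ therefore caps α at $\sqrt{\delta}$. The sketch below checks membership in both regions for a synthetic, flattened cover work. The function `successfully_watermarked` is a hypothetical helper, and MSE stands in for a perceptual distance.

```python
import random
import statistics


def d_mse(c1, c2):
    # D_MSE(c1, c2) = (1/N) * sum (c1 - c2)^2
    return sum((a - b) ** 2 for a, b in zip(c1, c2)) / len(c1)


def z_lc(c, w):
    return sum(a * b for a, b in zip(c, w)) / len(c)


def successfully_watermarked(c, c_o, w_r, delta, tau):
    # c must lie in both the region of acceptable fidelity and the detection region
    return d_mse(c, c_o) < delta and z_lc(c, w_r) > tau


rng = random.Random(9)
N = 4096
w = [rng.gauss(0.0, 1.0) for _ in range(N)]
mu, sd = statistics.fmean(w), statistics.pstdev(w)
w_r = [(v - mu) / sd for v in w]                  # zero mean, unit variance
c_o = [rng.gauss(128.0, 10.0) for _ in range(N)]  # synthetic cover work
for alpha in (0.5, 1.5, 3.0):
    c = [a + alpha * b for a, b in zip(c_o, w_r)]  # blind embedding, m = 1
    print(alpha, round(d_mse(c, c_o), 3), successfully_watermarked(c, c_o, w_r, delta=4.0, tau=0.7))
```

With δ = 4 and τ = 0.7, a strength well inside the band from about τ to √δ = 2 (such as α = 1.5) lands in the intersection. α = 3 is detected but leaves the fidelity region.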

Effects of blind and fixed-correlation embedding
[Figure: distribution of watermarked works in media space relative to w_r, for the blind embedder and for the fixed-lc embedder.]
Alternative common detection region definitions:
- Normalized correlation: angle in N-dim space, $z_{nc}(c, w_r) = \frac{c \cdot w_r}{\|c\|\,\|w_r\|} = \cos\angle(c, w_r)$
- Correlation coefficient: subtract the mean first (= angle between the projections onto the (N−1)-dimensional plane $\sum_i x[i] = 0$), $z_{cc}(c, w_r) = z_{nc}(c - \bar{c}, w_r - \bar{w}_r)$
[Figure: detection regions for z_nc, and for z_cc shown via projections onto the plane $\sum_i x[i] = 0$.]
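
The practical difference between the three detection measures shows up under simple amplitude changes. The sketch below applies a constant brightness offset and a contrast scaling to a synthetic, flattened watermarked work (an assumption for illustration). It then compares $z_{lc}$, $z_{nc}$ and $z_{cc}$ as defined above.

```python
import math
import random


def z_lc(c, w):
    return sum(a * b for a, b in zip(c, w)) / len(c)


def z_nc(c, w):
    # normalized correlation: cosine of the angle between c and w in N-dim space
    dot = sum(a * b for a, b in zip(c, w))
    return dot / (math.sqrt(sum(a * a for a in c)) * math.sqrt(sum(b * b for b in w)))


def z_cc(c, w):
    # correlation coefficient: normalized correlation after removing each vector's mean,
    # i.e. the angle between the projections onto the plane sum_i x[i] = 0
    mean_c, mean_w = sum(c) / len(c), sum(w) / len(w)
    return z_nc([a - mean_c for a in c], [b - mean_w for b in w])


rng = random.Random(1)
N = 1024
w_r = [rng.gauss(0.0, 1.0) for _ in range(N)]
c = [rng.gauss(128.0, 20.0) + 2.0 * w for w in w_r]  # synthetic watermarked work
brighter = [v + 50.0 for v in c]                       # add a constant offset
higher_contrast = [1.5 * v for v in c]                 # scale all values

for name, work in (("original", c), ("brighter", brighter), ("contrast x1.5", higher_contrast)):
    print(f"{name:14s} z_lc={z_lc(work, w_r):+.3f} z_nc={z_nc(work, w_r):+.3f} z_cc={z_cc(work, w_r):+.3f}")
```

$z_{lc}$ scales with contrast. $z_{nc}$ is unaffected by scaling but not by the offset. $z_{cc}$ is unaffected by both, which is the motivation for subtracting the mean first.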

Marking space (cf. transform-based compression)
- Media space (e.g. pixel values) is not always convenient for watermarking.
- An extractor transforms the work into a more convenient representation (e.g. frequencies, wavelets).
- Unlike image compression, we are not looking for a more compact representation.
Generic watermark embedder: original cover work (vector in media space) → extractor → vector in marking space → simple WM embedder (with the message) → vector in marking space → inverse extractor → watermarked work (vector in media space)
Generic watermark detector: input work (vector in media space) → extractor → vector in marking space → simple WM detector → detected message
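
A minimal sketch of the generic embedder/detector structure, built around one hypothetical extractor: averaging all 8×8 blocks of an image into a single 64-dimensional vector. The slides mention frequency or wavelet transforms as typical choices, so this is only a stand-in. Further assumptions: the inverse extractor also receives the original work and spreads the marking-space change equally over every block, image dimensions are multiples of 8, and pixel values are neither rounded nor clipped.

```python
import random
import statistics

B = 8  # hypothetical block size; marking space has B*B dimensions


def extract(img):
    # extractor: average all B x B blocks of an H x W image into one B*B vector
    h, w = len(img), len(img[0])
    if h % B or w % B:
        raise ValueError("sketch assumes image dimensions divisible by B")
    v = [0.0] * (B * B)
    for y in range(h):
        for x in range(w):
            v[(y % B) * B + (x % B)] += img[y][x]
    n_blocks = (h // B) * (w // B)
    return [s / n_blocks for s in v]


def inverse_extract(img, v_new):
    # add the marking-space change to every block, so that extract(result) == v_new
    delta = [a - b for a, b in zip(v_new, extract(img))]
    return [[img[y][x] + delta[(y % B) * B + (x % B)] for x in range(len(img[0]))]
            for y in range(len(img))]


def embed(img, m, w_r, alpha):
    # simple blind embedder applied to the marking-space vector
    s = alpha * (2 * m - 1)
    v = extract(img)
    return inverse_extract(img, [a + s * b for a, b in zip(v, w_r)])


def detect_value(img, w_r):
    # simple linear-correlation detector applied to the marking-space vector
    v = extract(img)
    return sum(a * b for a, b in zip(v, w_r)) / len(v)


rng = random.Random(4)
img = [[rng.uniform(0.0, 255.0) for _ in range(64)] for _ in range(64)]  # synthetic image
w = [rng.gauss(0.0, 1.0) for _ in range(B * B)]
mu, sd = statistics.fmean(w), statistics.pstdev(w)
w_r = [(v - mu) / sd for v in w]  # 64-dim reference pattern in marking space
marked = embed(img, 1, w_r, alpha=1.0)
print(detect_value(img, w_r), detect_value(marked, w_r))
```

The simple embedder and detector from the earlier slides are reused unchanged; only the extractor and inverse extractor wrapped around them are new. This is the point of separating marking space from media space.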