Information Theory

Mohamed Hamada
Software Engineering Lab
The University of Aizu
Email: hamada@u-aizu.ac.jp
URL: http://www.u-aizu.ac.jp/~hamada

Today's Topics

- Entropy review
- Entropy and Data Compression
- Uniquely decodable codes
- Prefix Code
- Average Code Length
- Shannon's First Theorem
- Kraft-McMillan Inequality
- Code Efficiency
- Code Extension

Entropy H(S)

Entropy is the average information content of a source S with K symbols, where symbol $s_k$ has probability $p_k$:

$$H(S) = E[I(s_k)] = \sum_{k=0}^{K-1} p_k \log_2 \frac{1}{p_k}$$

Conditional Entropy H(Y|X)

The amount of information contained in Y given that X is known:

$$H(Y \mid X) = \sum_j P(X = v_j)\, H(Y \mid X = v_j)$$

Joint Entropy

The amount of information contained in both events X and Y:

$$H(X, Y) = -\sum_{x,y} p(x,y) \log_2 p(x,y)$$

Chain Rule

Relationship between conditional and joint entropy:

$$H(X, Y) = H(X) + H(Y \mid X)$$
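The entropy, conditional entropy, joint entropy and chain rule definitions on this page can be checked numerically. The following is a minimal sketch, not part of the lecture: the joint distribution `joint` and all function names are hypothetical and chosen only for illustration.

```python
from math import log2, isclose

# Hypothetical joint distribution p(x, y); the values are illustrative only.
joint = {
    ("a", 0): 0.25, ("a", 1): 0.25,
    ("b", 0): 0.40, ("b", 1): 0.10,
}

def entropy(probs):
    """Entropy in bits of an iterable of probabilities (zero terms are skipped)."""
    return sum(p * log2(1.0 / p) for p in probs if p > 0)

def marginal_x(joint):
    """Marginal distribution P(X = x) obtained by summing over y."""
    px = {}
    for (x, _y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
    return px

def conditional_entropy(joint):
    """H(Y|X) = sum_j P(X = v_j) * H(Y | X = v_j)."""
    total = 0.0
    for x, p_x in marginal_x(joint).items():
        conditional = [p / p_x for (xx, _y), p in joint.items() if xx == x]
        total += p_x * entropy(conditional)
    return total

h_x = entropy(marginal_x(joint).values())
h_y_given_x = conditional_entropy(joint)
h_xy = entropy(joint.values())

# Chain rule: H(X, Y) = H(X) + H(Y|X)
chain_rule_holds = isclose(h_xy, h_x + h_y_given_x, abs_tol=1e-12)
print(h_x, h_y_given_x, h_xy, chain_rule_holds)
```

Any other valid joint distribution can be substituted for `joint`; the chain rule check should still hold up to floating-point tolerance.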
Entropy, Coding and Data Compression

Data vs. Information

- yes, not, yes, yes, not, not
- In ASCII, each item is 3 × 8 = 24 bits of data
- But if the only possible answers are yes and not, there is only one bit of information per item

Compression = Squeezing out the Air

- Suppose you want to ship pillows in boxes and are charged by the size of the box
- To use as few boxes as possible, squeeze out all the air, pack into boxes, fluff them up at the other end
- Lossless data compression = pillows are perfectly restored
- Lossy data compression = some damage to the pillows is OK (MP3 is a lossy compression standard for music)
- Loss may be OK if it is below the human perceptual threshold
- Entropy is a measure of the limit of lossless compression

Example: Morse Code

[Figure: Morse code table for the letters A to Z, labelled "Fixed length code" on the slide]

Example: Telegraphy

English letters -> Morse Code

[Figure: a sender in Hokkaido transmits a Morse-coded word to a receiver in Tokyo]

Coding Messages with Fixed Length Codes

- Example: 4 symbols, A, B, C, D: A=00, B=01, C=10, D=11
- In general, with n symbols, codes need to be of length lg n, rounded up
- For English text, 26 letters + space = 27 symbols, length = 5 since 2^4 < 27 < 2^5 (replace all punctuation marks by space)
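The rule "length lg n, rounded up" for fixed length codes can be sketched in a few lines. This is an illustrative sketch only: the helper names `fixed_code_length` and `fixed_length_code` are hypothetical, and the codeword assignment (counting upward in binary) is one arbitrary choice among many.

```python
def fixed_code_length(n_symbols: int) -> int:
    """Smallest number of bits b with 2**b >= n_symbols (lg n, rounded up)."""
    if n_symbols < 1:
        raise ValueError("need at least one symbol")
    # Integer arithmetic avoids floating-point rounding in log2.
    return (n_symbols - 1).bit_length()

def fixed_length_code(symbols):
    """Assign each symbol a distinct binary codeword of equal length."""
    width = max(1, fixed_code_length(len(symbols)))
    return {s: format(i, f"0{width}b") for i, s in enumerate(symbols)}

print(fixed_code_length(4))       # 2 bits for 4 symbols
print(fixed_code_length(27))      # 5 bits for 26 letters + space
print(fixed_length_code("ABCD"))
```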
Uniquely decodable codes

If any encoded string has only one possible source string producing it, then we have unique decodability. An example of a uniquely decodable code is the prefix code.

Prefix Coding (Instantaneous code)

- A prefix code is defined as a code in which no codeword is the prefix of some other code word.
- A prefix code is uniquely decodable.

[Table: codewords of Code A, Code B and Code C for the source symbols s0, s1, s2, s3, with the labels "Prefix Code" and "Uniquely Decodable Codes"; the codeword entries are not recoverable]

Decoding of a Prefix Code

[Figure: decision tree for Code B, from the initial state to the leaves s0, s1, s2, s3]

- Only one way to decode, left to right, when the message is received
- Example: a received message is decoded into a sequence of source symbols [the bit string and the decoded sequence are not recoverable]

Prefix Codes

[Table: a code for the symbols A, B, C, D; the codewords are not recoverable]

IS THE CODE A PREFIX CODE? NO. WHY? The code of D is a prefix of the code of C.

Average Code Length

- The information source has K symbols
- Each symbol $s_k$ has probability $p_k$
- Each symbol is represented by a codeword of length $l_k$ bits

Average codeword length:

$$\bar{L} = \sum_{k=0}^{K-1} p_k l_k$$
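The prefix condition and left-to-right decoding described on this page can be sketched as follows. The codewords below are assumptions made for illustration (the codeword tables of the slides are not available here), and the function names are hypothetical.

```python
def is_prefix_code(code):
    """True if the codewords are distinct and none is a prefix of another."""
    words = list(code.values())
    if len(set(words)) != len(words):
        return False
    return not any(a != b and b.startswith(a) for a in words for b in words)

def decode(code, bits):
    """Decode a bit string left to right; only meaningful for a prefix code."""
    inverse = {word: symbol for symbol, word in code.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:        # a complete codeword has been read
            decoded.append(inverse[current])
            current = ""
    if current:
        raise ValueError("leftover bits do not form a codeword")
    return decoded

# Hypothetical codewords, chosen for illustration only.
prefix_code = {"s0": "0", "s1": "10", "s2": "110", "s3": "111"}
not_prefix = {"A": "0", "B": "100", "C": "110", "D": "11"}  # D is a prefix of C

print(is_prefix_code(prefix_code))
print(is_prefix_code(not_prefix))
print(decode(prefix_code, "1011111000"))
```

Because no codeword is a prefix of another, the decoder can emit a symbol as soon as the accumulated bits match a codeword, which is why prefix codes are also called instantaneous codes.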
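Example: Morse Code

[Figure: Morse code table for the letters A to Z]

Average codeword length:

$$\bar{L} = \sum_{k=0}^{K-1} p_k l_k$$

[The numeric evaluation shown on the slide is not recoverable]

Shannon's First Theorem: The Source Coding Theorem

$$\bar{L} \ge H(S)$$

The outputs of an information source cannot be represented by a source code whose average length is less than the source entropy.

Average Code Length

[Table: four symbols with probabilities .7, .1, .1, .1 and their codewords; the codewords are not recoverable]

- Average bits per symbol: L = .7×1 + .1×3 + .1×3 + .1×3 = 1.6 bits/symbol (down from 2)
- Another prefix code that is better: L = .7×1 + .1×2 + .1×3 + .1×3 = 1.5

Robot Example

4-way random walk with direction probabilities 1/2, 1/4, 1/8, 1/8:

$$H(X) = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{4}\log_2\tfrac{1}{4} + \tfrac{1}{8}\log_2\tfrac{1}{8} + \tfrac{1}{8}\log_2\tfrac{1}{8}\right) = 1.75\ \text{bps}$$

[Table: for each of the four symbols, its probability $p_k$ (0.5, 0.25, 0.125, 0.125), a fixed-length codeword and a variable-length (prefix code) codeword; the codewords are not recoverable]

- symbol stream: [not recoverable]
- fixed length: 32 bits
- variable length: 28 bits
- 4 bits savings achieved by VLC (redundancy eliminated)

Entropy, Compressibility, Redundancy

- Lower entropy <=> More redundant <=> More compressible
- Higher entropy <=> Less redundant <=> Less compressible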
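The comparison between average code length and entropy on this page can be reproduced numerically. This is a small sketch using only the probabilities and codeword lengths quoted above; the function names are hypothetical.

```python
from math import log2

def entropy(probs):
    """Source entropy in bits per symbol."""
    return sum(p * log2(1.0 / p) for p in probs if p > 0)

def average_length(probs, lengths):
    """Average codeword length: sum of p_k * l_k."""
    return sum(p * l for p, l in zip(probs, lengths))

probs = [0.7, 0.1, 0.1, 0.1]
first = average_length(probs, [1, 3, 3, 3])    # first prefix code
better = average_length(probs, [1, 2, 3, 3])   # the better prefix code
h = entropy(probs)
print(round(first, 3), round(better, 3), round(h, 3))  # 1.6 1.5 1.357

# Robot example: probabilities 1/2, 1/4, 1/8, 1/8
robot_entropy = entropy([0.5, 0.25, 0.125, 0.125])
print(robot_entropy)  # 1.75
```

Both average lengths stay above the entropy, as the source coding theorem requires.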
Entropy and Compression

- First-order entropy is the theoretical minimum on code length when only frequencies are taken into account
- For the probabilities .7, .1, .1, .1: L = .7×1 + .1×2 + .1×3 + .1×3 = 1.5, while the first-order entropy ≈ 1.357
- First-order entropy of English is about 4 bits/character, based on typical English texts

Bits

You are watching a set of independent random samples of X. You see that X has four possible values:

P(X=A) = 1/4, P(X=B) = 1/4, P(X=C) = 1/4, P(X=D) = 1/4

So you might see output: BAACBADCDADDDA...

You transmit data over a binary serial link. You can encode each reading with two bits (e.g. A = 00, B = 01, C = 10, D = 11): 2 bits on average per symbol.

Fewer Bits

Someone tells you that the probabilities are not equal:

P(X=A) = 1/2, P(X=B) = 1/4, P(X=C) = 1/8, P(X=D) = 1/8

Is it possible to invent a coding for your transmission that only uses 1.75 bits on average per symbol? How?

It is possible:

[Table: codewords for A, B, C, D with lengths 1, 2, 3, 3 bits; the codewords are not recoverable]

(This is just one of several ways)

Fewer Bits

Suppose there are three equally likely values:

P(X=A) = 1/3, P(X=B) = 1/3, P(X=C) = 1/3

Here's a naïve coding, costing 2 bits per symbol: each of A, B, C gets its own 2-bit codeword.

Can you think of a coding that would need only 1.6 bits per symbol on average? In theory, it can in fact be done with 1.58496 bits per symbol.

Kraft-McMillan Inequality

$$\sum_{k=0}^{K-1} 2^{-l_k} \le 1$$

If the codeword lengths of a code satisfy the Kraft-McMillan inequality, then a prefix code with these codeword lengths can be constructed.

[Table: Code D for the symbols s0, s1, s2, s3, with code lengths 1, 2, 3, 2; the codewords are not recoverable]

For code D: $2^{-1} + 2^{-2} + 2^{-3} + 2^{-2} = 9/8 > 1$

This means that Code D IS NOT A PREFIX CODE.
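The two "Fewer Bits" questions on this page can be explored with a short sketch. Both answers shown here are assumptions for illustration, not necessarily the ones on the slides: the codewords `0, 10, 110, 111` are one of several codes with lengths 1, 2, 3, 3, and the 1.6 bits figure is reached here by giving each block of five three-valued symbols one 8-bit codeword (3^5 = 243 ≤ 256). The function name is hypothetical.

```python
from math import log2

# Assumed codewords with lengths 1, 2, 3, 3 (one of several valid choices).
code = {"A": "0", "B": "10", "C": "110", "D": "111"}
probs = {"A": 1 / 2, "B": 1 / 4, "C": 1 / 8, "D": 1 / 8}

# Average bits per symbol: sum of p_k * l_k
average_bits = sum(probs[s] * len(code[s]) for s in code)

def block_bits_per_symbol(n_values, block_size):
    """Bits per symbol when block_size symbols share one fixed-length codeword."""
    n_blocks = n_values ** block_size
    bits_per_block = (n_blocks - 1).bit_length()   # lg(n_blocks), rounded up
    return bits_per_block / block_size

print(average_bits)                   # 1.75
print(block_bits_per_symbol(3, 1))    # 2.0, the naive coding
print(block_bits_per_symbol(3, 5))    # 1.6, five symbols in 8 bits
print(log2(3))                        # theoretical limit, about 1.58496
```

Larger blocks move the cost closer to log2(3), at the price of larger code tables.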
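Use of Kraft-McMillan Inequality

We may use it if the number of symbols is so large that we cannot judge simply by inspection whether a given code is a prefix code or not.

WHAT the Kraft-McMillan Inequality Can Do:

- It can determine that a given code IS NOT A PREFIX CODE
- It can identify that a prefix code could be constructed from a set of codeword lengths

WHAT the Kraft-McMillan Inequality Cannot Do:

- It cannot guarantee that a given code is indeed a prefix code

Example

[Table: Code E for the symbols s0, s1, s2, s3, with codewords and code lengths; the codewords are not recoverable]

For code E: $2^{-1} + 2^{-2} + 2^{-3} + 2^{-3} = 1$, and hence it satisfies the Kraft-McMillan inequality.

IS CODE E A PREFIX CODE? NO. WHY? The codeword of s3 is a prefix of another codeword.

Code Efficiency η

$$\eta = \frac{H(S)}{\bar{L}}$$

An efficient code means η → 1.

Example

Symbol  Probability p_k  Code I length  Code II length
s0      1/2              2              1
s1      1/4              2              2
s2      1/8              2              3
s3      1/8              2              3

[The codewords of Code I and Code II are not recoverable]

Entropy: H(S) = 1/2 log2(2) + 1/4 log2(4) + 1/8 log2(8) + 1/8 log2(8) = 1¾ bits/symbol

- Code I: L = 2, η = (7/4)/2 = 7/8 = 0.875
- Code II: L = 1/2×1 + 1/4×2 + 1/8×3 + 1/8×3 = 7/4, η = (7/4)/(7/4) = 1

For a Prefix Code

Shannon's First Theorem:

$$H(S) \le \bar{L} < H(S) + 1$$

$$\bar{L} = H(S) \quad \text{if } p_k = 2^{-l_k} \text{ for all } k$$

What is the efficiency η?

- η = 1 if $p_k = 2^{-l_k}$ for all k
- η < 1 if $p_k \ne 2^{-l_k}$ for some k

However, we may increase efficiency by extending the source.

Increasing Efficiency by Extension

By extending the source we may potentially increase efficiency. The drawback is increased decoding complexity.

$$H(S^n) \le \bar{L}_n < H(S^n) + 1$$

$$n H(S) \le \bar{L}_n < n H(S) + 1$$

$$H(S) \le \frac{\bar{L}_n}{n} < H(S) + \frac{1}{n}$$

$$\frac{\bar{L}_n}{n} \to H(S), \quad \eta \to 1 \quad \text{when } n \to \infty$$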
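What the Kraft-McMillan inequality can and cannot decide, together with the code efficiency η, can be sketched as below. The codewords in `candidate` are hypothetical, picked so that the lengths satisfy the inequality while the prefix condition fails; the function names are also assumptions.

```python
from fractions import Fraction
from math import log2

def kraft_sum(lengths):
    """Left-hand side of the Kraft-McMillan inequality, as an exact fraction."""
    return sum(Fraction(1, 2 ** l) for l in lengths)

def is_prefix_code(codewords):
    """True if the codewords are distinct and none is a prefix of another."""
    if len(set(codewords)) != len(codewords):
        return False
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

def efficiency(probs, lengths):
    """eta = H(S) / average codeword length."""
    h = sum(p * log2(1 / p) for p in probs if p > 0)
    avg = sum(p * l for p, l in zip(probs, lengths))
    return h / avg

# Hypothetical codewords: the lengths pass the inequality, yet "11" is a prefix of "110".
candidate = ["0", "100", "110", "11"]
print(kraft_sum([len(w) for w in candidate]))   # 1, inequality satisfied
print(is_prefix_code(candidate))                # False, still not a prefix code
print(kraft_sum([1, 2, 3, 2]))                  # 9/8, no prefix code has these lengths

probs = [1 / 2, 1 / 4, 1 / 8, 1 / 8]
print(efficiency(probs, [2, 2, 2, 2]))          # 0.875
print(efficiency(probs, [1, 2, 3, 3]))          # 1.0
```

A sum above 1 rules a code out, but a sum of at most 1 only says that some prefix code with those lengths exists, so the codewords themselves still have to be inspected.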
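Extension of a Discrete Memoryless Source

Treats blocks of n successive symbols.

Information source:

$$S = \{s_0, s_1, \ldots, s_{K-1}\}, \quad \Pr\{s_k\} = p_k,\ k = 0, 1, \ldots, K-1, \quad \sum_{k=0}^{K-1} p_k = 1$$

Extended information source:

$$S^n = \{\sigma_0, \sigma_1, \ldots, \sigma_{K^n-1}\}, \quad \Pr\{\sigma_i\} = q_i,\ i = 0, 1, \ldots, K^n-1, \quad \sum_{i=0}^{K^n-1} q_i = 1$$

where $q_i$ is the product of the probabilities of the n source symbols that make up $\sigma_i$.

Example

S = {s0, s1, s2}, p0 = 1/4, p1 = 1/4, p2 = 1/2

H(S) = (1/4) log2(4) + (1/4) log2(4) + (1/2) log2(2) = 3/2 bits

Second-order extended source:

Symbols of S^2               σ0    σ1    σ2    σ3    σ4    σ5    σ6    σ7    σ8
Sequence of symbols from S   s0s0  s0s1  s0s2  s1s0  s1s1  s1s2  s2s0  s2s1  s2s2
P{σi}, i = 0, 1, ..., 8      1/16  1/16  1/8   1/16  1/16  1/8   1/8   1/8   1/4

By computing: H(S^2) = 3 bits

Summary

- Source coding: efficient representation of information sources
- Source coding requirements: uniquely decodable codes
- Prefix codes: no codeword is a prefix to some other code word
- Code efficiency: $\eta = H(S) / \bar{L}$
- Kraft's inequality: $\sum_{k=0}^{K-1} 2^{-l_k} \le 1$
- Source coding theorem: $H(S) \le \bar{L} < H(S) + 1$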
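The second-order extension example above can be reproduced with a short sketch. The function names are hypothetical; the block probabilities are computed as products of symbol probabilities, which is what memorylessness implies.

```python
from fractions import Fraction
from itertools import product
from math import log2, prod

def entropy(probs):
    """Entropy in bits of an iterable of probabilities."""
    return sum(float(p) * log2(1 / float(p)) for p in probs if p > 0)

def extend(probs, n):
    """Block probabilities of the n-th extension of a memoryless source."""
    return [prod(block) for block in product(probs, repeat=n)]

source = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]   # p0, p1, p2
second = extend(source, 2)                                   # sigma_0 .. sigma_8

print([str(q) for q in second])
print(entropy(source), entropy(second))   # 1.5 3.0
```

The result illustrates H(S^n) = n H(S) for n = 2; other values of `n` can be passed to `extend` to check the same relation.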