CSC310 Information Theory
Sam Roweis
Lecture 3: Proving the Kraft-McMillan Inequalities
September 18, 2006

Review: Our Approach

The study of both compression and transmission requires that we abstract data and messages as sequences of symbols from a finite alphabet (ignoring the semantics of the content). Both problems involve two distinct tasks:
1) Modeling. We have to represent the stochastic behaviour of the source and the channel using probabilistic models.
2) Encoding/Decoding. Given our source and channel models, we want to algorithmically design schemes for compression and transmission that have certain good properties (correct/efficient/optimal).
For now, we are assuming the models are known and are focusing on the codes. But later in the course we will study adaptive/dictionary methods (e.g. Lempel-Ziv, gzip, PPM) which combine modeling and coding together.

Review: Course Content

Lossless Data Compression
  Shannon's Noiseless Coding Theorem: the lower limit on lossless compression is the source entropy.
  Algorithms: Huffman Coding, Arithmetic Coding
Transmission over Noisy Channels
  Shannon's Noisy Coding Theorem: the upper limit on the error-free transmission rate is the channel capacity.
  Algorithms: Linear Codes, Low Density Parity Check Codes
(*) Lossy Compression
  Shannon's Rate-Distortion Theorem
  Algorithms: mp3, jpeg, mpeg

Review: Mathematical Setup

A stochastic source emits a sequence of symbols (from alphabet A), X = X_1, X_2, ..., X_N, with probability p(X). Our encoder (code) C converts this into a (bitstring) encoding Z. We assume (for now) that the decoder can see Z exactly (noiseless channel), that we are required to reconstruct X exactly (lossless compression), and that we are using a symbol code, i.e. we encode each symbol X_n independently and concatenate their encodings. We require the code to be uniquely decodable (UD), and we saw that for any UD code there is always an instantaneously decodable (ID) code with the same codeword lengths. These lengths must satisfy the Kraft-McMillan inequality:

    ∑_i 2^{-l_i} ≤ 1
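The Kraft-McMillan sum is easy to evaluate directly. A minimal sketch (the function name is mine, not from the notes) that computes ∑_i 2^{-l_i} exactly for a set of candidate codeword lengths:

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Return sum of 2^(-l) over the given codeword lengths, exactly."""
    return sum(Fraction(1, 2 ** l) for l in lengths)

# Lengths {1, 2, 3, 3} meet the inequality with equality:
print(kraft_sum([1, 2, 3, 3]))        # 1
# Lengths {1, 1, 2} violate it, so no UD binary code exists for them:
print(kraft_sum([1, 1, 2]) <= 1)      # False
```

Using exact fractions avoids any floating-point doubt when the sum lands exactly on 1.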
We will measure the quality of our code by the average length (under p(X)) of the encoding Z, compared to the length of X.
Proving the Two Inequalities

We can prove both Kraft's and McMillan's inequality by proving that, for any set of lengths l_1, ..., l_I for binary codewords:
A) If ∑_{i=1}^I 2^{-l_i} ≤ 1, we can construct an instantaneous code with codewords having these lengths.
B) If ∑_{i=1}^I 2^{-l_i} > 1, there is no uniquely decodable code with codewords having these lengths.
(A) is half of Kraft's inequality. (B) is half of McMillan's inequality. Using the fact that instantaneous codes are uniquely decodable, (A) gives the other half of McMillan's inequality, and (B) gives the other half of Kraft's inequality. To do this, we'll introduce a helpful way of thinking about codes as... trees!

Visualizing Prefix-Free Codes as Trees

We can view the codewords of an instantaneous (prefix-free) code as leaves of a tree. The root represents the null string; each level corresponds to adding another code symbol. [Tree diagram for an example code with four codewords.]

Extending the Tree to Maximum Depth

We can extend the tree by filling in the subtree underneath every actual codeword, down to the depth of the longest codeword. Each codeword then corresponds to either a leaf or a subtree. [Previous tree extended, with each codeword's leaf or subtree circled.] Short codewords occupy more of the tree: for a binary code, the fraction of the deepest level's leaves taken by a codeword of length l is 2^{-l}.

Constructing Instantaneous Codes

Suppose that Kraft's inequality holds: ∑_{i=1}^I 2^{-l_i} ≤ 1. Order the lengths so l_1 ≤ ... ≤ l_I.
Q: In the binary tree with depth l_I, how can we allocate subtrees to codewords with these lengths?
A: We go from shortest to longest, i = 1, ..., I:
1) Pick a node at depth l_i that isn't in a subtree previously used, and let the code for codeword i be the one at that node.
2) Mark all nodes in the subtree headed by the node just picked as being used, and not available to be picked later.
Let's look at an example...
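The prefix-free property that makes a code instantaneous can be checked mechanically. A small sketch (the function name and example codes are mine, for illustration):

```python
from itertools import combinations

def is_prefix_free(codewords):
    """True iff no codeword is a prefix of another, i.e. every codeword
    sits at a leaf of the code tree with no codeword above or below it."""
    return not any(a.startswith(b) or b.startswith(a)
                   for a, b in combinations(codewords, 2))

print(is_prefix_free(["00", "01", "100", "101"]))  # True
print(is_prefix_free(["0", "01", "11"]))           # False: "0" prefixes "01"
```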
Building an Instantaneous Code (0)

Let the lengths of the codewords be {2, 2, 3, 3}. First check: 2^{-2} + 2^{-2} + 2^{-3} + 2^{-3} = 3/4 ≤ 1. Initialize the tree (level 0).

Building an Instantaneous Code (1)

Pick (arbitrarily) an unmarked node at level 2 to use for a codeword of length 2; mark the subtree below it.

Building an Instantaneous Code (2)

Pick (arbitrarily) another unmarked node at level 2 to use for the second codeword of length 2; mark the subtree below it.

Building an Instantaneous Code (3)

Pick two unmarked nodes at level 3 as codewords of length 3.
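The allocation procedure stepped through above can be sketched in code. This is one possible implementation of the shortest-first node-picking idea; the function name and the "always pick the leftmost free node" tie-break are my choices, not the notes':

```python
def instantaneous_code(lengths):
    """Build binary codewords with the given lengths by assigning, shortest
    first, the leftmost node not inside a previously used subtree.
    Returns None if the lengths violate the Kraft inequality."""
    if sum(2.0 ** -l for l in lengths) > 1:
        return None
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    codes = [None] * len(lengths)
    node = 0  # index of the next free node at the current depth
    prev = 0  # depth of the previously assigned codeword
    for i in order:
        node <<= lengths[i] - prev          # descend to this codeword's depth
        codes[i] = format(node, "b").zfill(lengths[i])
        node += 1                           # everything below `node` is now used
        prev = lengths[i]
    return codes
```

For lengths {2, 2, 3, 3} this returns ['00', '01', '100', '101'], one of the many valid instantaneous codes with those lengths; for lengths violating the inequality, such as {1, 1, 2}, it returns None.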
Building an Instantaneous Code (4)

Our final code can be read from the leaf nodes; one valid choice is {00, 01, 100, 101}.

Construction Will Always Be Possible

Q: Will there always be a node available in step (1) above?
If Kraft's inequality holds, we will always be able to do this. To begin, there are 2^{l_b} nodes at depth l_b. When we pick a node at depth l_a, the number of nodes that become unavailable at depth l_b (assumed not less than l_a) is 2^{l_b - l_a}. When we need to pick a node at depth l_j, after having picked earlier nodes at depths l_i (with i < j and l_i ≤ l_j), the number of nodes left to pick from will be an integer equal to

    2^{l_j} - ∑_{i=1}^{j-1} 2^{l_j - l_i} = 2^{l_j} (1 - ∑_{i=1}^{j-1} 2^{-l_i}) > 0

since ∑_{i=1}^{j-1} 2^{-l_i} < ∑_{i=1}^{I} 2^{-l_i} ≤ 1, by assumption. This proves we can always construct an ID code if ∑_i 2^{-l_i} ≤ 1.

UD Codes Must Obey the Inequality

Let l_1 ≤ ... ≤ l_I be the codeword lengths. Define K = ∑_{i=1}^I 2^{-l_i}. For any positive integer n, we can sum over all possible combinations of values for i_1, ..., i_n in {1, ..., I} to get K^n:

    K^n = ∑_{i_1,...,i_n} 2^{-(l_{i_1} + ... + l_{i_n})}

We rewrite this in terms of the possible values for j = l_{i_1} + ... + l_{i_n}:

    K^n = ∑_{j=1}^{n·l_I} N_{j,n} 2^{-j}

where N_{j,n} is the number of sequences of n codewords with total length j. If the code is uniquely decodable, N_{j,n} ≤ 2^j, so K^n ≤ n·l_I, which for big enough n is possible only if K ≤ 1. This proves that any UD code must satisfy ∑_i 2^{-l_i} ≤ 1.

Tradeoffs in Choosing Codeword Lengths

The Kraft-McMillan inequalities imply that to make some codewords shorter, we will have to make others longer. Example: the obvious binary encoding for eight symbols uses codewords that are all three bits long. This code is instantaneous, and satisfies the Kraft inequality, since:

    2^{-3} + 2^{-3} + 2^{-3} + 2^{-3} + 2^{-3} + 2^{-3} + 2^{-3} + 2^{-3} = 1

Suppose we want to encode the first symbol using only two bits. We'll have to make some other codewords longer; e.g., we can encode two of the other symbols in four bits, and the remaining five symbols in three bits, since:

    2^{-2} + 2^{-4} + 2^{-4} + 2^{-3} + 2^{-3} + 2^{-3} + 2^{-3} + 2^{-3} = 1

How should we choose among the possible codes?
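The counting argument for uniquely decodable codes can be illustrated numerically. A sketch assuming lengths with K > 1: K^n grows exponentially, while unique decodability would cap it at n·l_I, which grows only linearly, so the bound must eventually fail.

```python
# Lengths that violate Kraft-McMillan: K = 1/2 + 1/2 + 1/4 = 5/4 > 1.
lengths = [1, 1, 2]
K = sum(2.0 ** -l for l in lengths)
l_max = max(lengths)

# Unique decodability would force K**n <= n * l_max for every n,
# but an exponential cannot stay below a linear bound forever.
for n in (1, 10, 100):
    print(n, K ** n, n * l_max)

# By n = 100 the supposed bound has failed by orders of magnitude,
# confirming no UD code exists with these lengths.
assert K ** 100 > 100 * l_max
```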
Formalizing Which Codes are Best: Probabilities for Source Symbols

We'd like to choose a code that uses short codewords for common symbols and long ones for rare symbols. To formalize this, we need to assign each symbol in the source alphabet a probability. Symbols a_1, ..., a_I will have probabilities written as p_1, ..., p_I. We assume that these probabilities don't change with time. We also assume that the symbols in the source sequence, X_1, X_2, ..., X_N, are independent:

    P(X_1 = a_{i_1}, X_2 = a_{i_2}, ..., X_N = a_{i_N}) = ∏_{n=1}^N P(X_n = a_{i_n}) = p_{i_1} p_{i_2} ... p_{i_N}

These assumptions are really too restrictive in practice, but we'll ignore that for now.

Expected Codeword Length

Consider a code whose codewords for symbols a_1, ..., a_I have lengths l_1, ..., l_I. Let the probabilities of these symbols be p_1, ..., p_I. We define the expected codeword length for this code to be

    L = L(C, X) = ∑_{i=1}^I p_i l_i

This is the average length of the codeword encoding a single source symbol. But since averaging is a linear operation, the average length of a coded message with N source symbols is just N·L. We aim to choose a code for which L is small. Basically, we want to assign short codeword lengths to the more probable symbols, but we also need to satisfy the KM inequality, so we will be forced to assign longer lengths to the less probable symbols.

Optimal Codes

We say a code is optimal for a given source (with given symbol probabilities) if its average length is at least as small as that of any other code. (There can be many optimal codes for the same source, all with the same average length.) The Kraft-McMillan inequalities imply that if there is an optimal code, there is also an optimal instantaneous code. More generally, for any uniquely decodable code with average length L, there is an instantaneous code with the same average length.
Questions: Can we figure out the codeword lengths of an optimal code starting from the symbol probabilities? I.e., can we solve:

    min_{l_i} ∑_i p_i l_i   subject to   ∑_i 2^{-l_i} ≤ 1

Can we find such an optimal code, and use it in practice?
Answers: next class!
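The expected codeword length L = ∑_i p_i l_i is straightforward to evaluate. A small sketch; the source probabilities below are made-up values for illustration only:

```python
def expected_length(probs, lengths):
    """Average codeword length L = sum_i p_i * l_i for a symbol code."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * l for p, l in zip(probs, lengths))

# Hypothetical four-symbol source: giving the common symbol a short
# codeword beats the flat two-bit code on average.
p = [0.5, 0.25, 0.125, 0.125]
print(expected_length(p, [1, 2, 3, 3]))   # 1.75 bits/symbol
print(expected_length(p, [2, 2, 2, 2]))   # 2.0 bits/symbol
```

Both length assignments satisfy the Kraft inequality; the comparison shows why matching short codewords to probable symbols lowers L.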