Modified from Maria's lectures
CPSC 335 Intermediate Information Structures
LECTURE 11: Compression and Huffman Coding
Jon Rokne, Computer Science, University of Calgary, Canada
Lecture Overview
Codes and Optimal Codes
Huffman Coding
Non-determinism of the algorithm
Implementations:
  Singly-linked list
  Doubly-linked list
  Recursive top-down
  Using heap
Adaptive Huffman coding
CODES

A 000    C 010    E 100    G 110
B 001    D 011    F 101    H 111

With this code, the message

BACADAEAFABBAAAGAH

is encoded as the string of 54 bits

001000010000011000100000101000001001000000000110000111

It is sometimes advantageous to use variable-length codes, in which different symbols may be represented by different numbers of bits. For example, Morse code does not use the same number of dots and dashes for each letter of the alphabet. In particular, E, the most frequent letter, is represented by a single dot. In general, if our messages are such that some symbols appear very frequently and some very rarely, we can encode data more efficiently (i.e., using fewer bits per message) if we assign shorter codes to the frequent symbols. Consider the following alternative code for the letters A through H:

A 0      C 1010   E 1100   G 1110
B 100    D 1011   F 1101   H 1111

With this code, the same message as above is encoded as the string

100010100101101100011010100100000111001111

This string contains 42 bits, so it saves more than 20% in space in comparison with the fixed-length code shown above.
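The bit counts above can be checked with a short sketch. The two code tables are copied from the slide; the helper names `encode` and `decode` are illustrative and not part of the lecture material.

```python
# Code tables copied from the slide.
FIXED = {"A": "000", "B": "001", "C": "010", "D": "011",
         "E": "100", "F": "101", "G": "110", "H": "111"}
VARIABLE = {"A": "0", "B": "100", "C": "1010", "D": "1011",
            "E": "1100", "F": "1101", "G": "1110", "H": "1111"}


def encode(message, code):
    """Concatenate the codeword of every symbol in the message."""
    return "".join(code[ch] for ch in message)


def decode(bits, code):
    """Decode a prefix code: grow a buffer until it equals a codeword."""
    inverse = {word: ch for ch, word in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)


MESSAGE = "BACADAEAFABBAAAGAH"
fixed_bits = encode(MESSAGE, FIXED)
variable_bits = encode(MESSAGE, VARIABLE)
print(len(fixed_bits), len(variable_bits))  # 54 42
```

Decoding works without separators because no codeword in either table is a prefix of another.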
Optimal codes
Huffman Coding
The algorithm assigns a codeword to each character in the text according to its frequency. The codeword is usually represented as a bitstring. The algorithm starts with a set of individual trees, each consisting of a single node, sorted in order of increasing character probability. The two trees with the smallest probabilities are selected and become the left and the right subtree of a new parent node whose probability is the sum of theirs. This step is repeated until a single tree remains. In the end, 0 is assigned to every left branch of the tree and 1 to every right branch, and the codewords for all leaves (characters) are read off along the path from the root.
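The procedure described above can be sketched as follows. This is a minimal illustration, not the lecture's own code: the tree is represented as nested tuples, symbols are assumed to be strings, and the function names are hypothetical. The forest is simply re-sorted after every merge, which is slow but mirrors the description directly.

```python
from collections import Counter


def build_huffman_tree(freqs):
    """freqs maps symbol -> weight; returns nested (left, right) tuples."""
    # Forest of (weight, tree) pairs; a leaf tree is just the symbol itself.
    forest = [(w, sym) for sym, w in freqs.items()]
    while len(forest) > 1:
        # Keep the forest in increasing order of weight (stable sort).
        forest.sort(key=lambda pair: pair[0])
        (w1, t1), (w2, t2) = forest[0], forest[1]
        # The two smallest trees become children of a new parent.
        forest = forest[2:] + [(w1 + w2, (t1, t2))]
    return forest[0][1]


def assign_codes(tree, prefix="", table=None):
    """Label left branches 0 and right branches 1; collect leaf codewords."""
    if table is None:
        table = {}
    if isinstance(tree, tuple):
        assign_codes(tree[0], prefix + "0", table)
        assign_codes(tree[1], prefix + "1", table)
    else:
        table[tree] = prefix or "0"  # a lone symbol still needs one bit
    return table


message = "BACADAEAFABBAAAGAH"
codes = assign_codes(build_huffman_tree(Counter(message)))
encoded = "".join(codes[ch] for ch in message)
print(len(encoded))  # 42
```

The individual codewords depend on how ties are broken, but the total encoded length does not.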
6 pages copied from Cormen et al.
Huffman tree building exercise
Huffman Code Construction (example from Uwisc)

Character count in text:

Char  E    T   A   O   I   N   S   R   H   L   D   C   U
Freq  125  93  80  76  73  71  65  61  55  41  40  31  27

At each step the two trees with the smallest frequencies are merged, and the new tree is inserted into the sorted list:

Step  Merged trees      New tree  Sorted list after the merge
1     C (31) + U (27)   58        125 93 80 76 73 71 65 61 58 55 41 40
2     D (40) + L (41)   81        125 93 81 80 76 73 71 65 61 58 55
3     H (55) + 58       113       125 113 93 81 80 76 73 71 65 61
4     R (61) + S (65)   126       126 125 113 93 81 80 76 73 71
5     N (71) + I (73)   144       144 126 125 113 93 81 80 76
6     A (80) + O (76)   156       156 144 126 125 113 93 81
7     81 + T (93)       174       174 156 144 126 125 113
8     E (125) + 113     238       238 174 156 144 126
9     126 + 144         270       270 238 174 156
10    156 + 174         330       330 270 238
11    270 + 238         508       508 330
12    330 + 508         838       838
Huffman Code Construction

Char  Freq  Fixed  Huff
E     125   0000   110
T     93    0001   011
A     80    0010   000
O     76    0011   001
I     73    0100   1011
N     71    0101   1010
S     65    0110   1001
R     61    0111   1000
H     55    1000   1111
L     41    1001   0101
D     40    1010   0100
C     31    1011   11100
U     27    1100   11101

Total frequency: 838. Average bits per character: 4.00 (fixed), 3.62 (Huffman).
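The 3.62 figure is the frequency-weighted average codeword length. A small sketch that recomputes it from the frequencies and Huffman codewords in the table (the variable names are illustrative):

```python
# Frequencies and Huffman codewords copied from the table.
FREQ = {"E": 125, "T": 93, "A": 80, "O": 76, "I": 73, "N": 71, "S": 65,
        "R": 61, "H": 55, "L": 41, "D": 40, "C": 31, "U": 27}
HUFF = {"E": "110", "T": "011", "A": "000", "O": "001", "I": "1011",
        "N": "1010", "S": "1001", "R": "1000", "H": "1111", "L": "0101",
        "D": "0100", "C": "11100", "U": "11101"}

total = sum(FREQ.values())
# Total number of bits needed to encode the whole text.
total_bits = sum(FREQ[ch] * len(HUFF[ch]) for ch in FREQ)
average = total_bits / total
print(total, total_bits, round(average, 2))  # 838 3036 3.62
```

With 13 symbols a fixed-length code needs 4 bits per character, so the Huffman code saves a little under 10% on this text.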
Non-determinism of the Huffman Coding
For another example, let's encode an excerpt from Michael Jackson's song Bad.

Because I'm bad, I'm bad-- come on
Bad, bad-- really, really bad
You know I'm bad, I'm bad-- you know it
Bad, bad-- really, really bad
You know I'm bad, I'm bad-- come on, you know
Bad, bad-- really, really bad

Thanks to Jeff Boyd who pointed me to the paper PVRG-MPEG CODEC 1.1 by Andy C. Hung, from which 4 slides have been taken.
The frequency of words in the song Bad.
The Huffman tree for the lyrics to Bad.
The Huffman codes for the words in Bad.
Implementation: Linked List
The implementation depends on how the priority queue is represented. The queue must support removing the two smallest probabilities and inserting the new probability in its proper position. The first way to implement the priority queue is a singly linked list of references to trees, which resembles the algorithm presented in the previous slides. The tree with the smallest probability is replaced by the newly created tree. Among trees with the same probability, the first ones encountered are chosen.
Doubly Linked List
All probability nodes are first ordered, and the first two trees are always removed. The new tree is inserted in sorted order, searching from the end of the list. A doubly-linked list of references to trees, with immediate access to both the beginning and the end of the list, is used.
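A rough sketch of this scheme, using Python's `collections.deque` as a stand-in for the doubly-linked list (an assumption; the lecture does not prescribe a container). To keep it short, the sketch tracks only the weights, not the trees, and returns the merged weights in the order they are created. The function name is hypothetical.

```python
from collections import deque


def merge_order(weights):
    """Return the weights of the internal nodes in creation order."""
    queue = deque(sorted(weights))  # increasing order, both ends accessible
    created = []
    while len(queue) > 1:
        # The two smallest weights are always at the front.
        a = queue.popleft()
        b = queue.popleft()
        s = a + b
        created.append(s)
        # Search backwards from the end for the insertion point,
        # since the new weight tends to be among the largest.
        i = len(queue)
        while i > 0 and queue[i - 1] > s:
            i -= 1
        queue.insert(i, s)
    return created


freqs = [125, 93, 80, 76, 73, 71, 65, 61, 55, 41, 40, 31, 27]
print(merge_order(freqs))
# [58, 81, 113, 126, 144, 156, 174, 238, 270, 330, 508, 838]
```

The printed sequence matches the new trees created in the construction example with the letter frequencies E through U.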
Doubly Linked-List implementation
Recursive Implementation
A top-down approach builds the tree starting from the highest probability. The root probability is known once the lower probabilities in the root's children have been determined; those are known once the probabilities below them have been computed, and so on. Thus, a recursive algorithm can be used.
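One possible reading of this description, as a hedged sketch: the two smallest probabilities are set aside and replaced by a placeholder leaf carrying their sum, the tree for the smaller problem is built recursively, and the placeholder is expanded into a parent of the two original nodes as the recursion unwinds, so the tree takes shape from the root downward. The `Node` class and function names are assumptions made for illustration.

```python
class Node:
    def __init__(self, weight, symbol=None):
        self.weight = weight
        self.symbol = symbol
        self.left = None
        self.right = None


def create_huffman_tree(nodes):
    """Recursively build a Huffman tree from a non-empty list of leaves."""
    if len(nodes) == 1:
        return nodes[0]
    if len(nodes) == 2:
        root = Node(nodes[0].weight + nodes[1].weight)
        root.left, root.right = nodes
        return root
    ordered = sorted(nodes, key=lambda n: n.weight)
    p1, p2 = ordered[0], ordered[1]
    # The sum stands in as a single leaf in the smaller problem.
    placeholder = Node(p1.weight + p2.weight)
    root = create_huffman_tree(ordered[2:] + [placeholder])
    # On the way back up, the placeholder leaf gets its two children.
    placeholder.left, placeholder.right = p1, p2
    return root


def code_lengths(node, depth=0, out=None):
    """Map each leaf symbol to its depth, i.e. its codeword length."""
    if out is None:
        out = {}
    if node.left is None:
        out[node.symbol] = depth
    else:
        code_lengths(node.left, depth + 1, out)
        code_lengths(node.right, depth + 1, out)
    return out


FREQ = {"E": 125, "T": 93, "A": 80, "O": 76, "I": 73, "N": 71, "S": 65,
        "R": 61, "H": 55, "L": 41, "D": 40, "C": 31, "U": 27}
root = create_huffman_tree([Node(w, s) for s, w in FREQ.items()])
lengths = code_lengths(root)
total_bits = sum(FREQ[s] * lengths[s] for s in FREQ)
print(root.weight, total_bits)  # 838 3036
```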
HEAP
A binary tree has the heap property iff it is empty, or the key in the root is larger than that in either child and both subtrees have the heap property.
A binary tree is complete if all the leaves are on the same level or on two adjacent ones, and all nodes at the lowest level are as far to the left as possible.
If we number the nodes from 1 at the root and place:
--the left child of node k at position 2k
--the right child of node k at position 2k+1
then the 'fill from the left' nature of the complete tree ensures that the heap can be stored in consecutive locations in an array.
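The index arithmetic can be sketched as below. To keep the 1-based numbering, slot 0 of the list is left unused; that convention and the function names are assumptions. The check uses >= rather than a strict comparison so that equal keys are tolerated.

```python
def left(k):
    return 2 * k


def right(k):
    return 2 * k + 1


def parent(k):
    return k // 2


def has_heap_property(a):
    """a[0] is unused; a[1:] stores the tree level by level.

    Checks the max-heap ordering: every parent key >= its child's key.
    """
    n = len(a) - 1
    return all(a[parent(k)] >= a[k] for k in range(2, n + 1))


example = [None, 9, 7, 8, 3, 5]
print(has_heap_property(example))                 # True
print(has_heap_property([None, 3, 7, 8]))         # False
print(example[left(2)], example[right(2)])        # 3 5
```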
INSERT into HEAP
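Insertion into an array-stored heap is commonly done by appending the new key at the next free position and moving it up while it is larger than its parent. The sketch below assumes the 1-based layout (slot 0 unused) and the max-heap ordering; the function name is illustrative.

```python
def heap_insert(heap, key):
    """Insert key into a max-heap stored in heap[1:]; heap[0] is unused."""
    heap.append(key)          # next free slot keeps the tree complete
    k = len(heap) - 1
    # Sift up: swap with the parent while the parent is smaller.
    while k > 1 and heap[k // 2] < heap[k]:
        heap[k // 2], heap[k] = heap[k], heap[k // 2]
        k //= 2


heap = [None]
for key in [3, 9, 5, 7, 8]:
    heap_insert(heap, key)
print(heap)  # [None, 9, 8, 5, 3, 7]
```

Each insertion moves the key along a single root-to-leaf path, so it takes O(log n) swaps in the worst case.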
Implementation using Heap
A min-heap of probabilities is built, so the smallest probability is in the root. The smallest probability is removed, the last element of the heap is put in the root, and the heap property is restored. The root now holds the second-smallest probability; it is set to the sum of the two smallest probabilities, and the heap property is restored again. The processing is complete when there is only one node left in the heap.
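A compact sketch of the heap-based approach using Python's standard `heapq` module, which provides a min-heap over a list. Instead of overwriting the root in place, it pops the two smallest entries and pushes their sum, which has the same effect. The tuple tree representation, the tie-breaking counter, and the function name are assumptions for illustration; symbols are assumed to be strings and the input non-empty.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """freqs maps symbol -> weight; returns symbol -> codeword."""
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)   # smallest
        w2, _, t2 = heapq.heappop(heap)   # second smallest
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (t1, t2)))
    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")   # left branch
            walk(tree[1], prefix + "1")   # right branch
        else:
            codes[tree] = prefix or "0"

    walk(heap[0][2], "")
    return codes


FREQ = {"E": 125, "T": 93, "A": 80, "O": 76, "I": 73, "N": 71, "S": 65,
        "R": 61, "H": 55, "L": 41, "D": 40, "C": 31, "U": 27}
codes = huffman_codes(FREQ)
total_bits = sum(FREQ[s] * len(codes[s]) for s in FREQ)
print(round(total_bits / sum(FREQ.values()), 2))  # 3.62
```

Each of the n - 1 merges costs O(log n) heap operations, so the whole construction runs in O(n log n).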
Huffman implementation with a heap
Huffman Coding for pairs of characters
Adaptive Huffman Coding
Devised by Robert Gallager and improved by Donald Knuth. The algorithm is based on the sibling property: if each node (except the root) has a sibling, and the breadth-first right-to-left tree traversal generates a list of nodes with nonincreasing frequency counters, then the tree is a Huffman tree. In adaptive Huffman coding, the tree includes a counter for each symbol, updated every time the corresponding symbol is coded. Checking whether the sibling property holds ensures that the tree under construction is a Huffman tree. If the sibling property is violated, the tree is restructured to restore it.
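The sibling property as stated above can be checked with a short sketch. It only tests the property; it does not perform the restructuring step of the adaptive algorithm. The `Node` class and function name are assumptions for illustration.

```python
from collections import deque


class Node:
    def __init__(self, count, left=None, right=None):
        self.count = count
        self.left = left
        self.right = right


def has_sibling_property(root):
    """Breadth-first, right-to-left traversal must give nonincreasing counts,
    and every non-root node must have a sibling."""
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node.count)
        # Visit the right child before the left one.
        children = [c for c in (node.right, node.left) if c is not None]
        if len(children) == 1:
            return False  # an only child has no sibling
        queue.extend(children)
    return all(a >= b for a, b in zip(order, order[1:]))


good = Node(3, left=Node(1), right=Node(2))   # traversal: 3, 2, 1
bad = Node(3, left=Node(2), right=Node(1))    # traversal: 3, 1, 2
print(has_sibling_property(good), has_sibling_property(bad))  # True False
```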
Sources
Web links:
MP3 Converter: http://www.mp3-converter.com/mp3codec/huffman_coding.htm
Practical Huffman Coding: http://www.compressconsult.com/huffman/
Drozdek Textbook - Chapter 11
Shannon-Fano
In the field of data compression, Shannon-Fano coding, named after Claude Shannon and Robert Fano, is a technique for constructing a prefix code based on a set of symbols and their probabilities (estimated or measured). It is suboptimal in the sense that it does not achieve the lowest possible expected code word length like Huffman coding; however, unlike Huffman coding, it does guarantee that all code word lengths are within one bit of their theoretical ideal entropy.
Shannon-Fano Coding
1. For a given list of symbols, develop a corresponding list of probabilities or frequency counts so that each symbol's relative frequency of occurrence is known.
2. Sort the list of symbols according to frequency, with the most frequently occurring symbols at the left and the least common at the right.
3. Divide the list into two parts, with the total frequency count of the left part being as close to the total of the right as possible.
4. The left part of the list is assigned the binary digit 0, and the right part is assigned the digit 1. This means that the codes for the symbols in the first part will all start with 0, and the codes in the second part will all start with 1.
5. Recursively apply steps 3 and 4 to each of the two halves, subdividing groups and adding bits to the codes until each symbol has become a corresponding code leaf on the tree.
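The five steps can be sketched as a short recursive function. The function name and the sample frequencies are hypothetical, chosen only to illustrate the splitting rule; when two split points are equally balanced, this sketch takes the leftmost one, which is an arbitrary choice.

```python
def shannon_fano(freqs):
    """freqs maps symbol -> count; returns symbol -> codeword."""
    # Step 2: most frequent symbols first.
    symbols = sorted(freqs, key=lambda s: freqs[s], reverse=True)
    codes = {s: "" for s in symbols}

    def split(group):
        if len(group) < 2:
            return
        total = sum(freqs[s] for s in group)
        # Step 3: find the split that best balances the two totals.
        running, best_i, best_diff = 0, 1, None
        for i in range(1, len(group)):
            running += freqs[group[i - 1]]
            diff = abs(total - 2 * running)  # |right total - left total|
            if best_diff is None or diff < best_diff:
                best_i, best_diff = i, diff
        left_part, right_part = group[:best_i], group[best_i:]
        # Step 4: left part gets 0, right part gets 1.
        for s in left_part:
            codes[s] += "0"
        for s in right_part:
            codes[s] += "1"
        # Step 5: recurse on both halves.
        split(left_part)
        split(right_part)

    split(symbols)
    return codes


sample = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}  # hypothetical counts
print(shannon_fano(sample))
# {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}
```

For these counts the Shannon-Fano code uses 89 bits in total, whereas a Huffman code for the same counts uses 87, which illustrates the suboptimality mentioned earlier.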
Shannon-Fano example
References
Shannon, C.E. (July 1948). "A Mathematical Theory of Communication". Bell System Technical Journal 27: 379-423. http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf
Fano, R.M. (1949). "The transmission of information". Technical Report No. 65 (Cambridge (Mass.), USA: Research Laboratory of Electronics at MIT).