Topic 0: Huffman Coding
  "The author should gaze at Noah, and ... learn, as they did in the Ark, to crowd a great deal of matter into a very small compass."
  Sydney Smith, Edinburgh Review
Agenda
  - Encoding
  - Compression
  - Huffman Coding
Encoding
  - UT CS
  - 85 84 32 67 83
  - 01010101 01010100 00100000 01000011 01010011
  - what is a file?
  - open a bitmap in a text editor
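The mapping on this slide can be checked with a minimal Python sketch (Python is an assumption here; the slides themselves show no code). It converts each character of "UT CS" to its ASCII code and then to 8 binary digits.

```python
# Convert text to ASCII codes and 8-bit binary strings.
text = "UT CS"

# ord() gives the numeric code of each character.
codes = [ord(ch) for ch in text]

# format(value, "08b") renders a number as 8 binary digits.
bits = [format(code, "08b") for code in codes]

print(codes)
print(" ".join(bits))
```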
ASCII - UNICODE
Text File
Text File???
Bitmap File
Bitmap File????
JPG File
JPG File (JPG vs. bitmap)
Encoding Schemes
  - "It's all 1s and 0s"
  - What do the 1s and 0s mean?
  - 50 121 109
  - ASCII -> 2ym
  - Red Green Blue -> dark teal?
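A small Python sketch of the same idea: one sequence of bytes read under two different encoding schemes. The byte values 50, 121, 109 are used for illustration, and the hex color string is just one convenient way to display an RGB triple, not something the slide prescribes.

```python
# The same three bytes read under two different encoding schemes.
data = bytes([50, 121, 109])

# Scheme 1: each byte is an ASCII character code.
as_text = data.decode("ascii")

# Scheme 2: the bytes are red, green, and blue intensities (0-255).
red, green, blue = data
as_color = "#{:02x}{:02x}{:02x}".format(red, green, blue)

print(as_text)
print(as_color)
```

The bits never change; only the interpretation does.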
Altering files
  - Tower bit map. Alter the first 300 characters of line 67 to all 0s
  000000000000000000000000000000000000000 000000000000000000000000000000000000000 000000000000000000000000000000000000000 000000000000000000000000000000000000000 000000000000000000000000000000000000000 000000000000000000000000000000000000000 000000000000000000000000000000000000000 000000000000000000000000000
Agenda
  - Encoding
  - Compression
  - Huffman Coding
Compression
  - Compression: storing the same information but in a form that takes less memory
  - lossless and lossy compression
  - Recall:
Lossy Artifacts
Why Bother?
  - Is compression really necessary?
  - Terabytes
  - 500 HD, hour movies or 500,000 songs
  - Price? About $00.00
Little Pipes and Big Pumps
  - Home Internet Access
    - 40 Mbps roughly $0 per month
    - months * 3 years * $0 = $,0
    - 40,000,000 bits / second = 5.0 * 10^6 bytes / sec
  - CPU Capability
    - $,500 for laptop or desktop
    - Intel i7 processor
    - Assume it lasts 3 years.
    - Memory bandwidth 5.6 GB / sec =.6 * 0 0 bytes / sec
    - on the order of 5.0 * 0 0 instructions / second
Mobile Devices?
  - Cellular Network
    - Your mileage may vary
    - Mega bits per second
    - AT&T 7 mbps download, 7 mbps upload
    - T-Mobile & Verizon mbps download, 7 mbps upload
    - 7,000,000 bits per second =.5 x 0 6 bytes per second
    - http://tinyurl.com/q6o7wn
  - iPhone CPU
    - Apple A6 System on a Chip
    - Coy about IPS
    - cores
    - Rough estimates: x 0 0 instructions per second
Little Pipes and Big Pumps
  - CPU
  - Data In From Network
Compression - Why Bother?
  - Apostolos "Toli" Lerios
  - Facebook Engineer
  - Heads image storage group
  - jpeg images already compressed
  - look for ways to compress even more
  - % less space = millions of dollars in savings
Agenda
  - Encoding
  - Compression
  - Huffman Coding
Purpose of Huffman Coding
  - Proposed by Dr. David A. Huffman
    - A Method for the Construction of Minimum Redundancy Codes
    - Written in 1952
  - Applicable to many forms of data transmission
    - Our example: text files
    - still used in fax machines, mp3 encoding, others
The Basic Algorithm
  - Huffman coding is a form of statistical coding
  - Not all characters occur with the same frequency!
  - Yet in ASCII all characters are allocated the same amount of space
    - 1 char = 1 byte, be it e or x
The Basic Algorithm
  - Any savings in tailoring codes to frequency of character?
  - Code word lengths are no longer fixed like ASCII or Unicode
  - Code word lengths vary and will be shorter for the more frequently used characters
The Basic Algorithm
  1. Scan file to be compressed and tally occurrence of all values.
  2. Sort or prioritize values based on frequency in file.
  3. Build Huffman code tree based on prioritized values.
  4. Perform a traversal of tree to determine new codes for values.
  5. Scan file again to create new file using the new Huffman codes
Building a Tree
  - Scan the original text
  - Consider the following short text
    Eerie eyes seen near lake.
  - Count up the occurrences of all characters in the text
Building a Tree
  - Scan the original text
    Eerie eyes seen near lake.
  - What characters are present?
    E e r i space y s n a l k .
Building a Tree
  - Scan the original text
    Eerie eyes seen near lake.
  - What is the frequency of each character in the text?
  Char   Freq.   Char  Freq.   Char  Freq.
  E      1       y     1       k     1
  e      8       s     2       .     1
  r      2       n     2
  i      1       a     2
  space  4       l     1
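The tally can be reproduced with a short Python sketch. Using `collections.Counter` is a choice made here for brevity; the slides do not say how the counts are stored.

```python
from collections import Counter

text = "Eerie eyes seen near lake."

# Counter maps each distinct character to its number of occurrences.
freq = Counter(text)

for ch, count in sorted(freq.items()):
    label = "space" if ch == " " else ch
    print(label, count)
```

Note that 'E' and 'e' are different values and are counted separately.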
Building a Tree
  - Prioritize characters
  - Create binary tree nodes with a value and the frequency for each value
  - Place nodes in a priority queue
    - The lower the frequency, the higher the priority in the queue
Building a Tree
  - The queue after inserting all nodes
  front  E:1  i:1  k:1  l:1  y:1  .:1  a:2  n:2  r:2  s:2  sp:4  e:8  back
  - Null Pointers are not shown
Building a Tree
  - While priority queue contains two or more nodes
    - Create new node
    - Dequeue node and make it left subtree
    - Dequeue next node and make it right subtree
    - Frequency of new node equals sum of frequency of left and right children
    - Enqueue new node back into queue
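The loop above can be sketched in Python with `heapq`. This is an illustration under stated assumptions, not the course's implementation: the tree is represented as nested `(left, right)` tuples, the leaves are listed in the front-to-back queue order for the example text, and a running counter breaks frequency ties so that a newly enqueued node goes behind existing nodes of equal frequency.

```python
import heapq
from itertools import count

# (character, frequency) pairs, front of the queue first.
leaves = [("E", 1), ("i", 1), ("k", 1), ("l", 1), ("y", 1), (".", 1),
          ("a", 2), ("n", 2), ("r", 2), ("s", 2), (" ", 4), ("e", 8)]

def build_tree(leaves):
    """Return (frequency, tree); a tree is a character or a (left, right) pair."""
    order = count()  # tie-breaker: nodes enqueued earlier come out first
    queue = [(freq, next(order), ch) for ch, freq in leaves]
    heapq.heapify(queue)
    while len(queue) >= 2:
        left_freq, _, left = heapq.heappop(queue)     # becomes the left subtree
        right_freq, _, right = heapq.heappop(queue)   # becomes the right subtree
        new_node = (left_freq + right_freq, next(order), (left, right))
        heapq.heappush(queue, new_node)               # enqueue the new node
    freq, _, tree = queue[0]
    return freq, tree

root_freq, tree = build_tree(leaves)
print(root_freq)
print(tree)
```

A different tie-breaking rule still gives a valid Huffman tree, but the individual code words may differ.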
Building a Tree
  [Figures: the priority queue after each step. The two nodes at the front are dequeued and joined under a new node whose frequency is the sum of theirs, and the new node is enqueued again, until a single tree remains.]
  - What is happening to the characters with low frequency?
  - After enqueueing this node there is only one node left in the priority queue.
Building a Tree
  - Dequeue the single node left in the queue.
  - This tree contains the new code words for each character.
  - Frequency of root node should equal number of characters in text.
    Eerie eyes seen near lake.
    4 spaces, 26 characters total
  [Figure: the completed Huffman tree, root frequency 26]
Encoding the File: Traverse Tree for Codes
  - Perform a traversal of the tree to obtain new code words
  - left, append a 0 to code word
  - right, append a 1 to code word
  - code word is only completed when a leaf node is reached
  [Figure: the completed Huffman tree]
Encoding the File: Traverse Tree for Codes
  Char   Code
  E      0000
  i      0001
  k      0010
  l      0011
  y      0100
  .      0101
  space  011
  e      10
  a      1100
  n      1101
  r      1110
  s      1111
  [Figure: the completed Huffman tree]
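The traversal can be sketched in Python as a recursive walk. The tree is written out by hand here as nested `(left, right)` tuples, which is an assumed representation for illustration; the function name `assign_codes` is likewise hypothetical.

```python
# The finished tree as nested (left, right) pairs; leaves are characters.
tree = (
    ((("E", "i"), ("k", "l")), (("y", "."), " ")),
    ("e", (("a", "n"), ("r", "s"))),
)

def assign_codes(node, prefix="", table=None):
    """Walk the tree: append 0 going left and 1 going right."""
    if table is None:
        table = {}
    if isinstance(node, str):
        # Leaf reached: the code word for this character is complete.
        table[node] = prefix
    else:
        left, right = node
        assign_codes(left, prefix + "0", table)
        assign_codes(right, prefix + "1", table)
    return table

codes = assign_codes(tree)
for ch, code in codes.items():
    print(repr(ch), code)
```

Because characters appear only at leaves, no code word is a prefix of another.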
Encoding the File
  - Rescan text and encode file using new code words
    Eerie eyes seen near lake.
  000010111000011001110 010010111101111111010 110101111011011001110 011001111000010100101
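Rescanning and encoding amounts to a table lookup per character. A minimal Python sketch, with the code table typed in as a dictionary (an assumed representation) and the output kept as a string of '0' and '1' characters rather than packed into real bits:

```python
# Code words for the example text.
codes = {
    "E": "0000", "i": "0001", "k": "0010", "l": "0011",
    "y": "0100", ".": "0101", " ": "011", "e": "10",
    "a": "1100", "n": "1101", "r": "1110", "s": "1111",
}

text = "Eerie eyes seen near lake."

# Replace every character with its code word and join the results.
encoded = "".join(codes[ch] for ch in text)

print(encoded)
print(len(encoded), "bits")
```

A real compressor would pack these bits eight to a byte before writing them to the file.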
Encoding the File: Results
  - Have we made things any better?
  - 84 bits to encode the text
  - ASCII would take 8 * 26 = 208 bits
  - If a modified fixed-length code is used, 4 bits per character are needed (there are 12 distinct characters). Total bits 4 * 26 = 104. Savings not as great.
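The three totals can be recomputed from the frequencies and code lengths. A minimal Python sketch; the variable names are illustrative only.

```python
import math
from collections import Counter

text = "Eerie eyes seen near lake."
freq = Counter(text)

# Length in bits of each character's Huffman code word.
code_lengths = {"E": 4, "i": 4, "k": 4, "l": 4, "y": 4, ".": 4,
                " ": 3, "e": 2, "a": 4, "n": 4, "r": 4, "s": 4}

# Huffman: each character costs (occurrences * code length) bits.
huffman_bits = sum(freq[ch] * code_lengths[ch] for ch in freq)

# ASCII: 8 bits for every character.
ascii_bits = 8 * len(text)

# Fixed-length code sized to the alphabet: enough bits to tell 12 symbols apart.
fixed_width = math.ceil(math.log2(len(freq)))
fixed_bits = fixed_width * len(text)

print(huffman_bits, ascii_bits, fixed_bits)
```

None of these totals includes the cost of telling the receiver what the codes are.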
Decoding the File
  - How does receiver know what the codes are?
  - Tree constructed for each text file.
    - Considers frequency for each file
    - Big hit on compression, especially for smaller files
  - Tree predetermined
    - based on statistical analysis of text files or file types
Decoding the File
  - Once receiver has tree it scans incoming bit stream
  - 0 go left
  - 1 go right
  101000100111100011 111111011100001010
  A. elk nay sir
  B. eek a snake
  C. eek kin sly
  D. eek snarl nil
  E. eel a snarl
  [Figure: the completed Huffman tree]
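Decoding can be sketched as a walk down the tree, one bit at a time, restarting at the root after each leaf. The Python below assumes the tree is held as nested `(left, right)` tuples and the bit stream as a string; the sample input is the encoding of the word "seen" from the example text, chosen so the exercise on the slide is left to the reader.

```python
# The example tree as nested (left, right) pairs; leaves are characters.
tree = (
    ((("E", "i"), ("k", "l")), (("y", "."), " ")),
    ("e", (("a", "n"), ("r", "s"))),
)

def decode(bits, tree):
    """Follow 0 = left, 1 = right; emit a character at each leaf."""
    out = []
    node = tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):   # reached a leaf
            out.append(node)
            node = tree             # start again at the root
    return "".join(out)

print(decode("111110101101", tree))
```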
Assignment Hints
  - reading chunks not chars
  - header format
  - the pseudo eof character
  - the GUI
Assignment Example
  - "Eerie eyes seen near lake." will result in different codes than those shown in slides due to:
    - adding elements in order to PriorityQueue
    - required pseudo eof character (PEOF)
Assignment Example
  Char   Freq.   Char  Freq.   Char  Freq.
  E      1       y     1       k     1
  e      8       s     2       .     1
  r      2       n     2       PEOF  1
  i      1       a     2
  space  4       l     1
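A minimal Python sketch of the same tally with the pseudo-EOF added. The constant name `PSEUDO_EOF` and the use of `Counter` are assumptions made for illustration; the value 256 is taken from the code listing for this example, where the pseudo-EOF appears as value 256.

```python
from collections import Counter

# One past the largest byte value, so it can never collide with real data.
PSEUDO_EOF = 256

data = "Eerie eyes seen near lake.".encode("ascii")

# Counting a bytes object tallies the integer byte values 0-255.
freq = Counter(data)

# The pseudo-EOF is not in the file; it is added with a frequency of 1.
freq[PSEUDO_EOF] = 1

for value in sorted(freq):
    print(value, freq[value])
```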
Assignment Example
  [Figures: the priority queue after each step, starting from . E i k l y PEOF a n r s SP e (front to back) and ending with a single tree whose root frequency is 27.]
Codes
  value: 32, equivalent char:  , frequency: 4, new code 011
  value: 46, equivalent char: ., frequency: 1, new code 11110
  value: 69, equivalent char: E, frequency: 1, new code 11111
  value: 97, equivalent char: a, frequency: 2, new code 0101
  value: 101, equivalent char: e, frequency: 8, new code 10
  value: 105, equivalent char: i, frequency: 1, new code 0000
  value: 107, equivalent char: k, frequency: 1, new code 0001
  value: 108, equivalent char: l, frequency: 1, new code 0010
  value: 110, equivalent char: n, frequency: 2, new code 1100
  value: 114, equivalent char: r, frequency: 2, new code 1101
  value: 115, equivalent char: s, frequency: 2, new code 1110
  value: 121, equivalent char: y, frequency: 1, new code 0011
  value: 256, equivalent char: ?, frequency: 1, new code 0100
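These codes can be reproduced with a short Python sketch, under two assumptions that are not spelled out in the slides: values are enqueued in ascending numeric order, and the priority queue breaks frequency ties in insertion order (modeled here with a counter in a `heapq` entry). The function name `huffman_codes` is hypothetical, and this is not the assignment's required design.

```python
import heapq
from collections import Counter
from itertools import count

PSEUDO_EOF = 256  # assumed value for the pseudo-EOF marker

def huffman_codes(data):
    """Map each byte value (and PSEUDO_EOF) to its code word as a bit string."""
    freq = Counter(data)
    freq[PSEUDO_EOF] = 1
    order = count()  # tie-breaker: earlier-enqueued nodes come out first
    # Enqueue in ascending value order.
    queue = [(freq[value], next(order), value) for value in sorted(freq)]
    heapq.heapify(queue)
    while len(queue) >= 2:
        f1, _, left = heapq.heappop(queue)
        f2, _, right = heapq.heappop(queue)
        heapq.heappush(queue, (f1 + f2, next(order), (left, right)))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, int):       # leaf: a byte value or PSEUDO_EOF
            codes[node] = prefix
        else:
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")

    walk(queue[0][2], "")
    return codes

codes = huffman_codes("Eerie eyes seen near lake.".encode("ascii"))
for value in sorted(codes):
    print(value, codes[value])
```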