Coding for Efficiency

Let's suppose that, over some channel, we want to transmit text containing only 4 symbols: a, b, c, and d. Further, let's suppose they occur in any block of text we send with the following probabilities:

a = .4    b = .3    c = .175    d = .125

We will also assume that these are encoded in binary using a fixed-length code similar to ASCII, although in this case 4 symbols require only 2 bits to encode any symbol sent across the channel. For a block of text containing n characters in total, the total length of the message is (applying formula [C1] below)

.4 × n × 2 + .3 × n × 2 + .175 × n × 2 + .125 × n × 2 = 2n

Clearly, this is independent of the letter frequencies, because no use was made of the frequency characteristics of the symbols in the encoding. It should be clear that if the more frequently occurring symbols were transmitted using shorter binary strings, the total length of the message would be shorter. Suppose that we encoded the symbols a, b, c, and d with non-ASCII codes such that a and b are each encoded using 1 binary digit, while c and d are each encoded using 2 binary digits; for instance, a = 0, b = 1, c = 10, d = 11. Now the total message length is

.4 × n × 1 + .3 × n × 1 + .175 × n × 2 + .125 × n × 2 = 1.3n

for a performance improvement of .7n/2n = 35% in information rate.

A common code using this philosophy is the well-known Morse code. An examination of this code shows that letters such as E, T, and A are encoded using far fewer dots and dashes than letters such as Q, J, and Z. (See the figure.)

[Figure: International Morse Code - the dot/dash codewords for the letters A through Z]
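The comparison above is easy to check numerically. The following is a small Python sketch (our own illustration, not part of the original notes) that computes the expected number of bits per symbol for both the fixed-length and the variable-length encodings of this example.

# Frequencies and codeword lengths for the 4-symbol example above.
freqs = {"a": .4, "b": .3, "c": .175, "d": .125}

fixed_lengths = {s: 2 for s in freqs}                 # every symbol uses 2 bits
variable_lengths = {"a": 1, "b": 1, "c": 2, "d": 2}   # a = 0, b = 1, c = 10, d = 11

def bits_per_symbol(freqs, lengths):
    # Expected bits per transmitted symbol; multiply by n for a message of n symbols.
    return sum(freqs[s] * lengths[s] for s in freqs)

print(round(bits_per_symbol(freqs, fixed_lengths), 3))      # 2.0 -> total length 2n
print(round(bits_per_symbol(freqs, variable_lengths), 3))   # 1.3 -> total length 1.3n, a 35% improvement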

One measure of the efficiency of a code is the Average Symbol Length (ASL): on average, how many binary bits it takes to encode a single symbol. For the example above the result was 1.3 binary bits/symbol (recall that the total number of bits was 1.3n for n symbols in the message). The general formula for this calculation is the well-known formula for a weighted average:

ASL = Σ f_i l_i    [C1]

where the f_i are the frequency ratios of the symbols and the l_i are the binary bit lengths of the corresponding codewords.

Huffman code

The Huffman algorithm is a procedure which produces optimum codes of the sort we have been describing, in which the more frequently a symbol occurs, the fewer binary digits are used to encode it. It is a tree encoding scheme, and can be carried out as follows:

1. Order all of the symbols in decreasing order of frequency of occurrence. If more than one symbol has the same frequency, their order with respect to each other is irrelevant.

2. Draw a tree, starting with the leaves, as follows:

   a. The two symbols with the smallest frequencies are written next to each other and joined to a node drawn above them with straight lines. The sum of the two frequencies is written next to this node.

   b. The new node created above is added to the set of frequencies, and the two symbols just used are removed.

3. Repeat step 2, including the frequency for the new node and ignoring the frequencies of the two symbols used to create it. Note that the frequency of this node might be greater than the frequencies of the next-to-lowest-frequency symbols, in which case it would not be included in the next branch of the tree.

4. Repeat step 3 until all symbols have been incorporated into the tree and there is only one node at the top of the tree.

5. Label the branches of the tree by labeling each of the two branches leaving a node, one with a 1 and the other with a 0. It is common to label all branches descending to the left with a 0 and those descending to the right with a 1, but this is not necessary.

6. Each symbol is now assigned a code by tracing along the branches from the symbol's leaf to the topmost node and writing the 0's and 1's in order, from right to left, as you travel up the tree. Equivalently, trace a path from the top single node of the tree to the symbol of interest, writing the 0's and 1's in order from left to right.

Example. Consider 5 symbols with the following frequencies:

a = .3    b = .25    c = .15    d = .15    e = .15

1. The symbols are already ordered correctly.

2. d and e are the first two leaves (although we could have chosen c and d, or c and e), creating a new node (let's call it x) with frequency .3. We now have a new frequency table:

a = .3    x = .3    b = .25    c = .15

3. Repeating step 2 creates a new node (y) from b and c with a frequency of .4, resulting in the following table:

y = .4    a = .3    x = .3

4. Continuing in this way, a and x combine to form w with a frequency of .6, giving

w = .6    y = .4

Combining w and y gives us our last node. The tree looks as follows:

[Figure: the Huffman tree. The root (1.0) branches to w (.6) and y (.4); w branches to a (.3) and x (.3); x branches to d (.15) and e (.15); y branches to b (.25) and c (.15).]

The coding of the five symbols, scanning up the tree and writing the binary digits from right to left, is

a = 01    b = 10    c = 11    d = 000    e = 001

Equivalently, start at the top node of the tree and trace a path to each leaf (symbol), writing down the bits from left to right as you go.
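The construction above is easy to mechanize. Below is a minimal Python sketch (our own illustration, not part of the original notes) that builds the tree with a priority queue of (frequency, node) entries. Because Huffman codes are not unique, the codewords it prints may differ from those above in how the 0's and 1's are assigned and in which of the equal-frequency symbols receive the longer codewords, but the average symbol length (2.3 bits/symbol for this example) comes out the same.

import heapq
import itertools

def huffman_codes(freqs):
    # freqs: dict mapping symbol -> frequency. Returns dict mapping symbol -> codeword.
    counter = itertools.count()   # tie-breaker so the heap never compares trees directly
    heap = [(f, next(counter), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)   # the two lowest-frequency entries...
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(counter), (t1, t2)))   # ...joined under a new node

    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):       # internal node: label the two branches 0 and 1
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                             # leaf: the accumulated prefix is the codeword
            codes[tree] = prefix or "0"
    walk(heap[0][2], "")
    return codes

freqs = {"a": .3, "b": .25, "c": .15, "d": .15, "e": .15}
codes = huffman_codes(freqs)
print(codes)
print(sum(freqs[s] * len(codes[s]) for s in freqs))   # average symbol length, approximately 2.3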

Any code generated by the Huffman algorithm has the following properties:

1. It has the Prefix Property. A code has the prefix property if no codeword is the prefix of any other codeword. A code having this property is called instantly decodable, or an instantaneous code, since we do not have to wait the maximum codeword length to decide which symbol was transmitted. Suppose that, in the code just generated, a were encoded as just 0 and b as just 1. This would give us a more efficient code in terms of message length, but it would take longer to decode, since we would need to wait for enough of the message to be received to decide whether a 0 was an a or just the beginning of d or e. Furthermore, without additional cues, it might be difficult to tell whether 000 was a d or three a's.

2. It is not unique. Since we have some discretion regarding which symbols to combine at each stage of the process, and how to label the branches, there are a number of equally efficient codes that could be generated by this process.

3. It gives the most efficient code possible within the constraints of property 1, and it has the property previously discussed regarding the relationship between the frequencies of the symbols and the number of bits in their encodings.

Practice Problems - Huffman Codes

1. Each of the following is the set of codewords for a certain code (four different codes). Which of them have the prefix property? Indicate yes or no as appropriate.

a. 01, 101, 000, 111
b. 00, 011, 0111, 001
c. 0, 01, 001
d. 1, 01, 001
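The prefix property can be checked mechanically by comparing every pair of codewords. The short Python sketch below (our own illustration, not part of the original notes) does exactly that; it is run here on the two codes discussed earlier rather than on the practice problems.

def has_prefix_property(codewords):
    # True if no codeword is a prefix of any other codeword.
    for i, c1 in enumerate(codewords):
        for j, c2 in enumerate(codewords):
            if i != j and c2.startswith(c1):
                return False
    return True

print(has_prefix_property(["0", "1", "10", "11"]))            # False: 1 is a prefix of 10 and 11
print(has_prefix_property(["01", "10", "11", "000", "001"]))  # True: the Huffman code built above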

Practice Problems - Huffman Codes (cont'd)

2. Construct a binary Huffman code for the following set of symbols and their frequencies. Show the tree, labelling the branches with 1's and 0's as appropriate, label every node with its frequency, and label every leaf with its symbol. Write the codeword for each symbol in the space provided.

Symbol    Frequency    Code
a         .25          ______
b         .22          ______
c         .12          ______
d         .12          ______
e         .10          ______
f         .09          ______
g         .07          ______
h         .03          ______

3. What is the average codeword length for the code generated in problem 2?

4. What is the theoretical minimum average codeword length for the symbol frequencies in problem 2, assuming the code does not need to have the prefix property?

5. What would be the minimum codeword length if the symbols in problem 2 were encoded using a fixed-length code?
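For checking answers to problems 3 through 5, the relevant quantities can be computed directly. The sketch below (our own addition) interprets the "theoretical minimum average codeword length" of problem 4 as the Shannon entropy of the symbol frequencies, which is the usual reading; it is run on the worked example from the notes rather than on problem 2, so the practice answers are left to the reader.

import math

def average_symbol_length(freqs, codes):
    # Formula [C1]: ASL = sum of f_i * l_i.
    return sum(freqs[s] * len(codes[s]) for s in freqs)

def entropy(freqs):
    # Shannon entropy in bits/symbol: a lower bound on the average codeword length.
    return -sum(f * math.log2(f) for f in freqs.values() if f > 0)

def fixed_length_bits(freqs):
    # Bits per symbol needed by a fixed-length code for this many symbols.
    return math.ceil(math.log2(len(freqs)))

freqs = {"a": .3, "b": .25, "c": .15, "d": .15, "e": .15}
codes = {"a": "01", "b": "10", "c": "11", "d": "000", "e": "001"}
print(average_symbol_length(freqs, codes))   # 2.3 bits/symbol
print(entropy(freqs))                        # about 2.25 bits/symbol
print(fixed_length_bits(freqs))              # 3 bits/symbol for a fixed-length code over 5 symbols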