Information Theory and Communication: Optimal Codes

Information Theory and Communication: Optimal Codes
Ritwik Banerjee
rbanerjee@cs.stonybrook.edu

Roadmap
- Examples and Types of Codes
- Kraft Inequality
- McMillan Inequality
- Entropy bound on data compression
- Shannon Code
- Huffman Code
- Wrong Code
- Stochastic and Stationary Processes

Wrong Code

We have seen that the minimum expected codeword length per symbol can be made arbitrarily close to the entropy. For stationary stochastic processes, this means that the entropy rate is the expected number of bits per symbol required to describe the process.

But what happens if the code is designed for the wrong distribution? The true distribution may be unknown to us, and the wrong distribution may simply be the best available approximation of the unknown distribution.

Let p(x) and q(x) denote the true and the wrong distributions, respectively. Consider the Shannon code assignment designed for q(x), whose expected codeword length is within 1 bit of the theoretical optimum for q(x):

    l(x) = ⌈log(1/q(x))⌉

Using these lengths, we will not achieve expected length L ≈ H(p).

Wrong Code

What is the increase in the expected description length due to the wrong distribution? This is given by the KL divergence D(p‖q).

Theorem. The expected length under p(x) of the code assignment l(x) = ⌈log(1/q(x))⌉ satisfies the inequalities

    H(p) + D(p‖q) ≤ E[l(X)] < H(p) + D(p‖q) + 1

where the expectation is taken with respect to the distribution p. That is, the divergence is a measure of the penalty of using an approximation for the coding process.
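
To make the penalty concrete, here is a minimal numerical sketch in Python (the distributions p and q below are illustrative assumptions, not taken from the slides): it assigns Shannon code lengths for the wrong distribution q and checks that the expected length under the true distribution p lands within the bounds of the theorem.

```python
import math

# Assumed example distributions (not from the slides): p is the true source,
# q is the wrong distribution the code is designed for.
p = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}
q = {'a': 0.25, 'b': 0.25, 'c': 0.25, 'd': 0.25}

H_p = -sum(px * math.log2(px) for px in p.values())           # entropy H(p)
D_pq = sum(px * math.log2(px / q[x]) for x, px in p.items())  # divergence D(p||q)

# Shannon code lengths designed for q: l(x) = ceil(log2(1/q(x)))
lengths = {x: math.ceil(math.log2(1 / qx)) for x, qx in q.items()}
E_l = sum(px * lengths[x] for x, px in p.items())             # expected length under p

print(f"H(p) = {H_p:.3f}, D(p||q) = {D_pq:.3f}, E_p[l(X)] = {E_l:.3f}")
assert H_p + D_pq <= E_l < H_p + D_pq + 1                     # the theorem's bounds
```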

Code Classes and McMillan Inequality

We have shown that instantaneous codes satisfy the Kraft inequality. Six years after Kraft's proof, McMillan showed that the same "if and only if" result also holds for the larger class of uniquely decodable codes. We will skip the proof of this version of the inequality:

The codeword lengths l_i of a uniquely decodable D-ary code satisfy

    Σᵢ D^(−lᵢ) ≤ 1

Conversely, given a set of integers l_i satisfying this inequality, there exists a uniquely decodable code with these integers as its codeword lengths.
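
As a quick illustration, the following small Python sketch (an assumed helper, not part of the lecture) checks whether a given set of codeword lengths can belong to a uniquely decodable D-ary code by testing the Kraft–McMillan sum.

```python
# Check the Kraft-McMillan inequality: sum_i D^(-l_i) <= 1.
def satisfies_kraft_mcmillan(lengths, D=2):
    return sum(D ** (-l) for l in lengths) <= 1

print(satisfies_kraft_mcmillan([1, 2, 3, 3]))      # True: 1/2 + 1/4 + 1/8 + 1/8 = 1
print(satisfies_kraft_mcmillan([1, 1, 2]))         # False: 1/2 + 1/2 + 1/4 > 1
print(satisfies_kraft_mcmillan([1, 1, 1], D=3))    # True for a ternary code alphabet
```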

Huffman Codes

Given a distribution, a prefix code with the shortest expected length (i.e., optimal compression) can be constructed by an algorithm discovered by Huffman in 1952. It is commonly used for lossless data compression.

Let the alphabet be X = {1, 2, 3, 4, 5}, and let X be a random variable taking values in this alphabet with the following distribution:

    p(1) = 0.25, p(2) = 0.25, p(3) = 0.2, p(4) = 0.15, p(5) = 0.15

The least frequent values should have the longest codewords, and the codeword lengths for 4 and 5 must be equal. Otherwise, we could delete a bit from the longer codeword and still retain a prefix-free code, which would imply that the code we had was not optimal.

So we construct a code in which the two longest codewords differ only in the last bit:
- Combine 4 and 5 into a single source symbol with probability 0.3.
- Keep combining the two least probable items into a single source symbol until only one item is left.
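
The merging procedure described above can be sketched in a few lines of Python (an illustrative implementation, not the slides' own code); applied to the example distribution it yields codeword lengths 2, 2, 2, 3, 3 and expected length 2.3 bits.

```python
import heapq

def huffman_code(probs):
    """Return a dict symbol -> codeword for a binary Huffman code."""
    # Each heap entry: (probability, tie-breaker, list of (symbol, partial codeword))
    heap = [(p, i, [(sym, "")]) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)   # least probable item
        p2, _, group2 = heapq.heappop(heap)   # second least probable item
        # Prepend a bit: the two merged groups differ in this leading bit.
        merged = [(s, "0" + c) for s, c in group1] + [(s, "1" + c) for s, c in group2]
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return dict(heap[0][2])

probs = {1: 0.25, 2: 0.25, 3: 0.2, 4: 0.15, 5: 0.15}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)
print(code)      # symbols 4 and 5 get codewords of equal (maximal) length
print(avg_len)   # expected length 2.3 bits for this distribution
```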

Huffman Codes

There is a binary tree structure corresponding to this process! What about other alphabets? In general, for a D-ary code alphabet, the algorithm combines the D least probable items into a single source symbol at each step. For example, if the codewords are over the ternary alphabet D = {1, 2, 3}, we get a ternary tree.

Huffman Codes

Insufficient number of symbols: not enough symbols to combine D items at a time? Create dummy variables with 0 probability! At each stage the number of symbols is reduced by D − 1. Therefore, for k merges, we want the total number of symbols to be 1 + k(D − 1).

Optimality of Binary Huffman Codes

Theorem. For any distribution, there exists an instantaneous code with minimum expected length such that the following properties hold:
1. The codeword lengths are ordered inversely to the probabilities.
2. The two longest codewords have the same length.
3. The two longest codewords, which correspond to the least likely symbols, differ only in the last bit.
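
A small Python sketch (an assumed helper, not from the slides) of the dummy-symbol count: it pads the alphabet so that the total number of symbols has the form 1 + k(D − 1), allowing every merge step to combine exactly D items.

```python
# How many zero-probability dummy symbols are needed for a D-ary Huffman code?
def dummy_symbols_needed(num_symbols, D):
    if num_symbols <= 1:
        return 0
    remainder = (num_symbols - 1) % (D - 1)
    return 0 if remainder == 0 else (D - 1) - remainder

print(dummy_symbols_needed(6, 3))   # 1 dummy: 6 + 1 = 7 = 1 + 3*(3 - 1)
print(dummy_symbols_needed(5, 2))   # 0: no dummies are ever needed in the binary case
```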

Huffman Code is not unique

At each split, we have two choices for labeling the branches: 0/1 or 1/0. And when multiple items are tied with identical probabilities, there are multiple ways of ordering their codewords.

Source Coding and 20 Questions

We want the optimal sequence of yes/no questions to determine an object from a set of objects, assuming that we know the probability distribution on these objects.
- A sequence of such questions is equivalent to a code for the object.
- A question depends only on the answers to the questions asked previously.
- The sequence of answers uniquely determines the object. Therefore, if we model the yes/no answers by 0s and 1s, we have a unique binary encoding for each object in the set.
- The average length of this code is simply the average number of questions asked.

This can be optimized by the Huffman code. That is, the Huffman code determines the optimal sequence of questions that will identify the object. The expected number of questions E(Q) in this process satisfies

    H(X) ≤ E(Q) < H(X) + 1
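
This bound can be checked numerically on the running example (a sketch under the assumption that the Huffman lengths 2, 2, 2, 3, 3 from the earlier construction are used as the number of questions per object).

```python
import math

probs = [0.25, 0.25, 0.2, 0.15, 0.15]
huffman_lengths = [2, 2, 2, 3, 3]          # lengths from the earlier construction

H = -sum(p * math.log2(p) for p in probs)                  # entropy H(X), about 2.285 bits
E_Q = sum(p * l for p, l in zip(probs, huffman_lengths))   # expected number of questions, 2.3
print(H, E_Q)
assert H <= E_Q < H + 1
```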

Alphabetic Codes

Consider a special case of the 20-questions game:
- the elements of X = {1, 2, ..., m} are in decreasing order of probability (i.e., p_1 ≥ p_2 ≥ ... ≥ p_m), and
- the only questions allowed are of the form "Is X > a?" (for some a).

The code constructed by the Huffman algorithm may not correspond to slice sets of the form {x : x ≤ a}. But we can do the following:
- Take the optimal code lengths found by the Huffman algorithm.
- Use these lengths to assign symbols to the tree by taking the first available node at the current level.

This is not the Huffman code, but it is another optimal code. At each non-leaf node, it splits the set into two subsets {x : x ≤ a} and {x : x > a}. These are also called alphabetic codes because the construction leads to an alphabetical (lexicographic) ordering of the codewords.
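
The "first available node at the current level" assignment can be sketched as follows (an illustrative Python helper, not the slides' construction): given code lengths sorted for decreasing probabilities, it assigns lexicographically increasing codewords.

```python
def alphabetic_code(lengths):
    """lengths must be non-decreasing (symbols ordered by decreasing probability)."""
    codewords, code = [], 0
    for i, l in enumerate(lengths):
        codewords.append(format(code, f"0{l}b"))            # current code, padded to l bits
        if i + 1 < len(lengths):
            code = (code + 1) << (lengths[i + 1] - l)        # next free node at the next length
    return codewords

# Using the Huffman lengths from the running example:
print(alphabetic_code([2, 2, 2, 3, 3]))   # ['00', '01', '10', '110', '111']
```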

Code Redundancy

The Huffman code has the shortest average codeword length, i.e., L_Huffman ≤ L for any prefix code. The redundancy of a random variable X is defined as the difference between the average Huffman codeword length and the entropy H(X). The redundancy of Huffman coding is bounded above by p + 0.086, where p is the probability of the most common symbol.
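
A quick numeric check of this bound on the running example (a sketch; p + 0.086 is Gallager's upper bound on Huffman redundancy).

```python
import math

probs = [0.25, 0.25, 0.2, 0.15, 0.15]
huffman_lengths = [2, 2, 2, 3, 3]          # from the earlier example

H = -sum(p * math.log2(p) for p in probs)
L = sum(p * l for p, l in zip(probs, huffman_lengths))
redundancy = L - H
print(redundancy, max(probs) + 0.086)      # about 0.015 <= 0.336
assert redundancy <= max(probs) + 0.086
```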

Huffman coding is optimal

Theorem. If C is a Huffman code and C′ is any other uniquely decodable code, then L(C) ≤ L(C′).

The proof for binary alphabets can be extended to general D-ary alphabets. Huffman coding is a greedy algorithm: it works by combining the least likely symbols at each step. In general, greedy approaches do not lead to globally optimal solutions, but in the case of Huffman coding, the greedy strategy does yield a global optimum.