Introduction to Source Coding

Comm. 502: Communication Theory, Lecture 7: Introduction to Source Coding - Requirements of source codes - Huffman Code

Source Code Properties
- Length: fixed length or variable length.
- Uniquely decodable: allows the user to invert the mapping back to the original source sequence.
- Prefix: no codeword can be the beginning of any other codeword. A prefix-free encoding is useful because it is self-defining: given any encoded string of symbols, there cannot be two ways to decode it.

Examples: Codes that are not uniquely decodable
Code 1: symbols A and C are assigned the same binary sequence. Thus, the first requirement of a useful code is that each symbol be assigned a unique binary sequence.
Code 2: is also not uniquely decodable, because different symbol sequences can produce the same bit stream, so decoding is ambiguous.
(Table: Code 1 and Code 2 codewords for the symbols A, B, C, D.)

Prefix Coding
A prefix code is defined as a code in which no codeword is the beginning of another codeword. A prefix code is uniquely decodable, but the converse is not true: Code C is uniquely decodable, since a fixed bit value indicates the beginning of each codeword, yet it is not a prefix code.
(Table: codewords of Code A, Code B, and Code C for the source symbols s0, s1, s2, s3; Code A is neither uniquely decodable nor a prefix code, Code B is a prefix code, and Codes B and C are uniquely decodable codes.)
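To make the prefix condition concrete, here is a minimal sketch (Python, not from the lecture) that tests whether a set of codewords is prefix-free; the example codewords are illustrative and are not the slide's Code A/B/C.

```python
def is_prefix_free(codewords):
    """Return True if no codeword is the beginning (prefix) of another codeword."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

# Illustrative codewords only (the slide's actual Code A/B/C were not preserved):
print(is_prefix_free(["0", "10", "110", "111"]))   # True: a prefix code
print(is_prefix_free(["0", "01", "011", "0111"]))  # False: not a prefix code,
                                                   # though still uniquely decodable
```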

Example: Prefix vs. uniquely decodable codes
Symbol probabilities: P[A] = 1/2, P[B] = 1/4, P[C] = 1/8, P[D] = 1/16, P[E] = 1/16.
(Table: Codes 1, 2, 3, and 4 for the symbols A through E; average lengths 30/16, 30/16, 30/16, and 33/16 respectively.)
HW: Determine the prefix codes and the uniquely decodable codes from the above codes. *** What about Code 4?
- Uniquely decodable: Codes 1, 2, 3 (no confusion in decoding).
- Instantaneous (prefix) decodable: Codes 1, 3 (no need to look ahead).
- Prefix codes are uniquely decodable, but the converse is not true.

Example: Decoding of a Prefix Code
Decision tree for Code B: the decoder starts at the initial state (the root); each received bit selects a branch, and reaching a leaf emits the corresponding source symbol s_k, after which the decoder returns to the initial state.
(Figure: decision tree for Code B with leaves s0, s1, s2, s3, and a worked example decoding a received bit sequence into a string of source symbols.)
A prefix-free encoding is useful because it is self-defining: given any encoded bit string, there cannot be two ways to decode it.

Example
Using the prefix decoding tree, decode the encoded form of "abracadabra".
(Figure: decoding tree and code table for the symbols a, b, c, d, r, together with the bit string for "abracadabra".)
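The decoding-tree idea can be sketched as a table walk: accumulate bits until they match a codeword, emit the symbol, and start over. The code below is an assumed prefix code for a, b, c, d, r (the slide's actual codewords did not survive transcription); it only illustrates the procedure.

```python
# Assumed prefix code for illustration; not necessarily the slide's exact codewords.
code = {"a": "0", "b": "100", "c": "1010", "d": "1011", "r": "11"}
decode_map = {bits: sym for sym, bits in code.items()}

def encode(text):
    return "".join(code[ch] for ch in text)

def decode(bitstring):
    out, buf = [], ""
    for bit in bitstring:
        buf += bit
        if buf in decode_map:        # a complete codeword: emit symbol, reset buffer
            out.append(decode_map[buf])
            buf = ""
    return "".join(out)

bits = encode("abracadabra")
print(bits, "->", decode(bits))      # round-trips back to "abracadabra"
```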

Kraft-McMillan Inequality
$\sum_{k=1}^{K} 2^{-v_k} \le 1$
A prefix code must satisfy the Kraft-McMillan inequality. However, if a code satisfies this inequality, that does not mean the code is a prefix code.
For Code D: $2^{-1} + 2^{-2} + 2^{-3} + 2^{-2} = 9/8 > 1$. This means that Code D IS NOT A PREFIX CODE.
(Table: Code D codewords c_k for the source symbols s0, s1, s2, s3, with codeword lengths v_k = 1, 2, 3, 2.)
Proof in the book (not required).

Use of the Kraft-McMillan Inequality
We may use it when the number of symbols is so large that we cannot judge by inspection whether a given code is a prefix code or not.
What the Kraft-McMillan inequality can do: it can determine that a given code IS NOT A PREFIX CODE.
What the Kraft-McMillan inequality cannot do: it cannot guarantee that a given code is indeed a prefix code. A quick numerical check is sketched below.
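A minimal sketch of that check, using only codeword lengths (the lengths below are taken from the Code D example above and the Code E example on the next slide):

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Sum of 2^(-v_k) over all codeword lengths v_k."""
    return sum(Fraction(1, 2 ** v) for v in lengths)

print(kraft_sum([1, 2, 3, 2]))  # Code D lengths: 9/8 > 1, so it cannot be a prefix code
print(kraft_sum([1, 3, 3, 2]))  # Code E lengths: 1 <= 1, the inequality holds, but this
                                # alone does not guarantee a prefix code
```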

Example
For Code E: $2^{-1} + 2^{-2} + 2^{-3} + 2^{-3} = 1$, so the Kraft-McMillan inequality is satisfied.
Is Code E a PREFIX code? NO. WHY? The codeword of s3 is the beginning of the codeword of s2.
(Table: Code E codewords c_k for the source symbols s0, s1, s2, s3, with codeword lengths v_k = 1, 3, 3, 2.)

Shannon's First Theorem
For a prefix code, the average codeword length satisfies $H(S) \le \bar{L} < H(S) + 1$, with $\bar{L} = H(S)$ if $p_k = 2^{-v_k}$ for every $k$.
What is the efficiency $\eta$? $\eta = H(S)/\bar{L}$, so $\eta = 1$ if $p_k = 2^{-v_k}$ for every $k$, and $\eta < 1$ if $p_k \ne 2^{-v_k}$ for some $k$.
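As a short worked instance of the theorem, take the dyadic probabilities from the earlier example (1/2, 1/4, 1/8, 1/16, 1/16) and assume codeword lengths 1, 2, 3, 4, 4 (these lengths are an assumption here, but they satisfy the Kraft inequality with equality):

```latex
% Dyadic source with assumed codeword lengths v_k = 1, 2, 3, 4, 4
\begin{align*}
H(S) &= \tfrac{1}{2}\log_2 2 + \tfrac{1}{4}\log_2 4 + \tfrac{1}{8}\log_2 8
        + \tfrac{1}{16}\log_2 16 + \tfrac{1}{16}\log_2 16 = \tfrac{30}{16}\ \text{bits}, \\
\bar{L} &= \tfrac{1}{2}(1) + \tfrac{1}{4}(2) + \tfrac{1}{8}(3) + \tfrac{1}{16}(4) + \tfrac{1}{16}(4)
         = \tfrac{30}{16}\ \text{bits}, \\
\eta &= H(S)/\bar{L} = 1, \quad \text{since } p_k = 2^{-v_k} \text{ for every } k.
\end{align*}
```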

Huffman Coding
The Huffman code is a prefix, variable-length code that achieves the shortest average code length for a given input alphabet. Huffman coding constructs a binary tree starting from the probabilities of each symbol in the alphabet; the tree is built in a bottom-up manner and is then used to find the codeword for each symbol. An algorithm for finding the Huffman code for a given alphabet with associated probabilities is given on the following slide.

Huffman Encoding Algorithm
1. The source symbols are listed in order of decreasing probability. The two source symbols of lowest probability are assigned a 0 and a 1. This part of the step is referred to as a splitting stage.
2. These two source symbols are regarded as being combined into a new source symbol with probability equal to the sum of the two original probabilities. (The list of source symbols, and therefore of source statistics, is thereby reduced in size by one.) The probability of the new symbol is placed in the list in accordance with its value.
3. These steps are repeated until we are left with a final list of only two source statistics, to which a 0 and a 1 are assigned.
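A minimal Python sketch of this bottom-up procedure, using a heap instead of an explicitly re-sorted list (an equivalent formulation; its tie-breaking does not follow the "as high/low as possible" conventions discussed later, so the codewords, though optimal, may differ):

```python
import heapq
from itertools import count

def huffman_code(probs):
    """probs: dict mapping symbol -> probability. Returns dict symbol -> codeword."""
    tiebreak = count()  # keeps heap entries comparable when probabilities are equal
    # Each heap entry is (probability, tiebreak, {symbol: partial codeword}).
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)   # lowest probability
        p1, _, group1 = heapq.heappop(heap)   # second lowest
        # Splitting stage: prepend 0 to one group's codewords and 1 to the other's,
        # then treat the merged group as one new symbol with the summed probability.
        merged = {s: "0" + c for s, c in group0.items()}
        merged.update({s: "1" + c for s, c in group1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

code = huffman_code({"s0": 0.1, "s1": 0.2, "s2": 0.4, "s3": 0.2, "s4": 0.1})
print(code)  # codeword lengths 2 for s1, s2, s3 and 3 for s0, s4 -> average 2.2 bits
```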

Huffman encoding: Example
Use the probabilities to order the coding priorities of the letters: low-probability letters get their code bits assigned first (and end up with more bits). This smooths out the information per bit.
Letter: X1, X2, X3, X4, X5, X6, X7
Probability: 0.35, 0.30, 0.20, 0.10, 0.04, 0.005, 0.005

Huffman encoding
- Use a code tree to make the code.
- Combine the two symbols with the lowest probability to make a new block symbol.
- Assign a 0 to one of the old symbols' code words and a 1 to the other.
- Now reorder, and combine the two lowest-probability symbols of the new set.
- Each time a symbol is among the lowest-probability pair it picks up another code bit, so the rarest symbols end up with the longest code words and the most probable symbols with the shortest ones.
(Figure: code tree combining x1 through x7 via intermediate block symbols.)

Result: Huffman encoding
Entropy: $H(X) \approx 2.11$ bits (the best possible average number of bits).
Average code word length: $\bar{L} = \sum_{k=1}^{7} v_k P(x_k) \approx 2.21$ bits per symbol, where $v_k$ is the number of bits assigned to symbol $x_k$.
Compression ratio: CR = 3/2.21 $\approx$ 1.357 (relative to a 3-bit fixed-length code for the 7 letters).
Note: the average code word length approaches the entropy (the fundamental limit) and satisfies $H(X) \le \bar{L} < H(X) + 1$.
So the efficiency $\eta = H(X)/\bar{L} \approx 95.5\%$.
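These numbers can be checked directly; the sketch below recomputes H(X), the average length, the compression ratio, and the efficiency from the example's probabilities, using one possible set of Huffman codeword lengths (assumed to be 1, 2, 3, 4, 5, 6, 6 bits for X1 through X7; another valid tree gives 2, 2, 2, 3, 4, 5, 5 with the same average):

```python
import math

p = {"X1": 0.35, "X2": 0.30, "X3": 0.20, "X4": 0.10,
     "X5": 0.04, "X6": 0.005, "X7": 0.005}

# Entropy H(X) = sum over symbols of p * log2(1/p)
H = sum(pk * math.log2(1 / pk) for pk in p.values())

# Assumed Huffman codeword lengths for these probabilities (one valid tree).
v = {"X1": 1, "X2": 2, "X3": 3, "X4": 4, "X5": 5, "X6": 6, "X7": 6}
L = sum(p[s] * v[s] for s in p)

print(f"H(X) = {H:.3f} bits")          # about 2.110 bits
print(f"Lbar = {L:.3f} bits/symbol")   # 2.210 bits
print(f"CR   = {3 / L:.3f}")           # about 1.357 versus a 3-bit fixed-length code
print(f"eta  = {H / L:.1%}")           # about 95.5 % efficiency
```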

Huffman Coding: Example 2
Compute the Huffman code for the source shown.
$H(S) = 0.4\log_2\frac{1}{0.4} + 2\left(0.2\log_2\frac{1}{0.2}\right) + 2\left(0.1\log_2\frac{1}{0.1}\right) \approx 2.122$ bits
Source s_k: s0, s1, s2, s3, s4
Probability p_k: 0.1, 0.2, 0.4, 0.2, 0.1

Solution A (each new combined probability is placed as high as possible in the reordered list)
Source symbols listed in order of decreasing probability: s2 (0.4), s1 (0.2), s3 (0.2), s0 (0.1), s4 (0.1).
Stage I:   0.4, 0.2, 0.2, 0.1, 0.1
Stage II:  0.4, 0.2, 0.2, 0.2      (0.1 + 0.1 combined)
Stage III: 0.4, 0.4, 0.2           (0.2 + 0.2 combined)
Stage IV:  0.6, 0.4                (0.4 + 0.2 combined; the final pair is assigned 0 and 1)
The codeword of each source symbol is then read off by tracing back through the stages; the resulting code is summarized on the next slide.

Solution A (cont'd)
Symbol s_k: s0, s1, s2, s3, s4
Probability p_k: 0.1, 0.2, 0.4, 0.2, 0.1
Code word lengths: 3, 2, 2, 2, 3 respectively (the code words c_k are read from the tree)
$H(S) \approx 2.122$ bits
$\bar{L} = 0.4(2) + 0.2(2) + 0.2(2) + 0.1(3) + 0.1(3) = 2.2$ bits per symbol
$H(S) \le \bar{L} < H(S) + 1$
CR = 3/2.2 $\approx$ 1.364
THIS IS NOT THE ONLY SOLUTION!

Alternate Solution B (each new combined probability is placed as low as possible in the reordered list)
Source symbols listed in order of decreasing probability: s2 (0.4), s1 (0.2), s3 (0.2), s0 (0.1), s4 (0.1).
Stage I:   0.4, 0.2, 0.2, 0.1, 0.1
Stage II:  0.4, 0.2, 0.2, 0.2
Stage III: 0.4, 0.4, 0.2
Stage IV:  0.6, 0.4
The stage probabilities are the same as in Solution A; only the placement of the combined symbols, and therefore the resulting codewords, differs.

Alternative Solution B (cont'd)
Symbol s_k: s0, s1, s2, s3, s4
Probability p_k: 0.1, 0.2, 0.4, 0.2, 0.1
Code word lengths: 1 bit for s2, 2 and 3 bits for the two 0.2-probability symbols, and 4 bits each for s0 and s4 (the code words c_k are read from the tree)
$H(S) \approx 2.122$ bits
$\bar{L} = 0.4(1) + 0.2(2) + 0.2(3) + 0.1(4) + 0.1(4) = 2.2$ bits per symbol
$H(S) \le \bar{L} < H(S) + 1$
CR = 3/2.2 $\approx$ 1.364

What is the difference between the two solutions?
They have the same average code length, but they differ in the variance of the code word length:
$\sigma^2 = \sum_{k=1}^{K} p_k (v_k - \bar{L})^2$
Solution A: $\sigma^2 = 0.16$. Solution B: $\sigma^2 = 1.36$.
Both have the same compression ratio: CR = 3/2.2 $\approx$ 1.364. A numeric check is sketched below.
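A quick numeric check of the two variances, using the codeword lengths implied by the average-length computations above (2, 2, 2, 3, 3 for Solution A and 1, 2, 3, 4, 4 for Solution B):

```python
p = [0.4, 0.2, 0.2, 0.1, 0.1]    # probabilities of s2, s1, s3, s0, s4
v_A = [2, 2, 2, 3, 3]            # Solution A ("as high as possible")
v_B = [1, 2, 3, 4, 4]            # Solution B ("as low as possible")

def avg_and_var(p, v):
    L = sum(pk * vk for pk, vk in zip(p, v))
    var = sum(pk * (vk - L) ** 2 for pk, vk in zip(p, v))
    return round(L, 3), round(var, 3)

print(avg_and_var(p, v_A))  # (2.2, 0.16)
print(avg_and_var(p, v_B))  # (2.2, 1.36)
```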

Exercise
Compute the entropy H, build the Huffman tree, compute the average code length, and compute the compression ratio CR.
Symbol S: A, B, C, D, E
P(S): 0.1, 0.2, 0.4, 0.2, 0.1
Encode "BCCADE".

Solution
Compute the entropy: $H \approx 2.12$ bits.
Build the Huffman tree and compute the average code length: $\bar{L} = 2.2$ bits.
CR = 3/2.2 $\approx$ 1.364.
(Table: P(S) and the resulting Huffman code words for A (0.1), B (0.2), C (0.4), D (0.2), E (0.1), followed by the encoded form of "BCCADE".)
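As a sketch of the last step, the snippet below encodes "BCCADE" with one valid Huffman code for these probabilities; the particular codeword assignment is assumed, since several equivalent assignments exist:

```python
# One valid Huffman code for P(A)=0.1, P(B)=0.2, P(C)=0.4, P(D)=0.2, P(E)=0.1
# (assumed assignment; any prefix code with these lengths and average 2.2 bits works).
code = {"C": "1", "B": "01", "D": "000", "A": "0010", "E": "0011"}

message = "BCCADE"
bits = "".join(code[ch] for ch in message)
print(bits, f"({len(bits)} bits)")                                 # 15 bits
print(f"{3 * len(message)} bits with a 3-bit fixed-length code")   # 18 bits
```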

Properties of Huffman Codes
The Huffman coding technique is optimal because:
- It satisfies the prefix condition (so decoding is unambiguous).
- It achieves the smallest average code length, which approaches the minimum set by the entropy.
In Huffman encoding, symbols that occur more frequently have shorter Huffman code words (but we must know the probability of each symbol for this to be true).
The Huffman encoding process (the Huffman tree) is not unique; the code with the lowest code-length variance is the better one.