Lecture 5: Lossless Compression Techniques


Fixed-to-fixed mapping: we encoded source symbols of fixed length into fixed-length code sequences. Fixed-to-variable mapping: we encoded source symbols of fixed length into variable-length code sequences.

Source symbol s_k   Probability P_k   Code I word c_k / length l_k   Code II word c_k / length l_k
s_0                 1/2               00 / 2                         0 / 1
s_1                 1/4               01 / 2                         10 / 2
s_2                 1/8               10 / 2                         110 / 3
s_3                 1/8               11 / 2                         1110 / 4

Variable-to-fixed mapping: encode variable-length source sequences into fixed-length code words. Variable-to-variable mapping: encode variable-length source sequences into variable-length code words.
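As a quick check, the average codeword length L̄ = Σ_k P_k l_k of the two codes in the table can be computed and compared with the source entropy. The following is a minimal Python sketch; the probabilities and codewords are simply transcribed from the table above.

import math

# Symbol probabilities and the two codes from the table above
probs   = {"s0": 1/2, "s1": 1/4, "s2": 1/8, "s3": 1/8}
code_I  = {"s0": "00", "s1": "01", "s2": "10", "s3": "11"}     # fixed-to-fixed
code_II = {"s0": "0",  "s1": "10", "s2": "110", "s3": "1110"}  # fixed-to-variable

def average_length(code, probs):
    """Average codeword length L = sum_k P_k * l_k."""
    return sum(probs[s] * len(w) for s, w in code.items())

entropy = -sum(p * math.log2(p) for p in probs.values())

print("H(S)       =", entropy)                          # 1.75 bits/symbol
print("L(Code I)  =", average_length(code_I, probs))    # 2.0 bits/symbol
print("L(Code II) =", average_length(code_II, probs))   # 1.875 bits/symbol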

Shannon Fano Code
It is a technique for constructing a prefix code based on a set of symbols and their probabilities (estimated or measured). It is suboptimal in the sense that it does not achieve the lowest possible expected codeword length. The technique was proposed by Claude Shannon and is attributed to Robert Fano.
Algorithm
1- Source symbols are listed in order of decreasing probability from left to right.
2- Divide the list into two parts, with the total probability (or frequency of occurrence) of the left part being as close to the total of the right part as possible.
3- The left part of the list is assigned the binary digit 0, and the right part is assigned the digit 1. This means that the codes for the symbols in the first part will all start with 0, and the codes in the second part will all start with 1.
4- Recursively apply steps 2 and 3 to each of the two halves, subdividing groups and adding bits to the codes until each symbol has become a corresponding leaf on the code tree.

Shannon Fano Code
Ex: Assume an alphabet A = {a, b, c, d, e, f} with the occurrence weights {9, 8, 6, 5, 4, 2}, respectively. Apply Shannon Fano coding and discuss its suboptimality.
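One way to carry out the recursive splitting is sketched below: a small illustrative Python implementation of the algorithm on the previous slide, applied to the weights of this example. The exact codewords depend on how ties in the split are broken; the choice here (preferring the later split position) reproduces the code obtained in the following slides.

def shannon_fano(items):
    """items: list of (symbol, weight) pairs sorted by decreasing weight.
    Returns a dict mapping each symbol to its binary codeword."""
    if len(items) == 1:
        return {items[0][0]: ""}              # single symbol: empty suffix
    total = sum(w for _, w in items)
    # Find the split that makes the two halves as close in total weight as possible
    best_i, best_diff, running = 1, float("inf"), 0
    for i in range(1, len(items)):
        running += items[i - 1][1]
        diff = abs(2 * running - total)       # |left total - right total|
        if diff <= best_diff:                 # ties broken toward the later split
            best_i, best_diff = i, diff
    left, right = items[:best_i], items[best_i:]
    codes = {}
    for half, bit in ((left, "0"), (right, "1")):
        for sym, suffix in shannon_fano(half).items():
            codes[sym] = bit + suffix
    return codes

weights = [("a", 9), ("b", 8), ("c", 6), ("d", 5), ("e", 4), ("f", 2)]
print(shannon_fano(weights))
# {'a': '00', 'b': '01', 'c': '100', 'd': '101', 'e': '110', 'f': '111'}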

Shannon Fano Code
(Slides 4 through 8 show the recursive splitting step by step: the sorted list {a: 9, b: 8, c: 6, d: 5, e: 4, f: 2} is first divided into {a, b} and {c, d, e, f}, each with total weight 17; 0 is assigned to the left part and 1 to the right, and the splitting is repeated until the final step separates e (4) and f (2).)

Shannon Fano Code
Symbol   Frequency   P_k     Code
a        9           0.265   00
b        8           0.235   01
c        6           0.176   100
d        5           0.147   101
e        4           0.118   110
f        2           0.059   111

L̄ = Σ_k P_k l_k = 2×(0.235 + 0.265) + 3×(0.176 + 0.147 + 0.118 + 0.059) = 2.5 bits/symbol
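To discuss the suboptimality, the average length can be compared with the source entropy; a small illustrative computation using the frequencies above:

import math

freq    = {"a": 9, "b": 8, "c": 6, "d": 5, "e": 4, "f": 2}
lengths = {"a": 2, "b": 2, "c": 3, "d": 3, "e": 3, "f": 3}   # Shannon Fano code above

total = sum(freq.values())                                   # 34
probs = {s: n / total for s, n in freq.items()}

avg_len = sum(probs[s] * lengths[s] for s in freq)           # 2.5 bits/symbol
entropy = -sum(p * math.log2(p) for p in probs.values())     # about 2.45 bits/symbol

print(f"L = {avg_len:.3f}, H(S) = {entropy:.3f} bits/symbol")
# The code spends roughly 0.05 bit/symbol more than the entropy lower bound.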

The Huffman procedure is based on two observations regarding optimum prefix codes:
1- Symbols that occur more frequently (have a higher probability of occurrence) will have shorter codewords than symbols that occur less frequently.
2- The two symbols that occur least frequently will have codewords of the same length.
It is commonly used for lossless compression. The Huffman code is a prefix, variable-length code that achieves the shortest average code length for a given input alphabet with a given pmf. In general, Huffman coding is a form of statistical coding, as not all characters occur with the same frequency (probability). The process of finding the optimal code was algorithmized by Huffman. A code is an optimal prefix code if it has the same average codeword length as a Huffman code for the given pmf.

Ex: Consider the following short text: "Eerie eyes seen near lake". Build the Huffman tree to encode the text.
- Get the count of each letter.
- Sort from lowest to highest count.

- Create a new node as the sum of the two lowest counts.
- Locate the new node according to its precedence in the sorted list and create the next one.
These two steps are repeated until all counts have been merged into a single tree.

The frequency of the root node should equal the number of characters in the text.
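The counts and the root-frequency check can be reproduced with a short sketch like the one below (whether spaces and letter case are counted separately is a modelling choice; here every character of the string is counted as-is):

from collections import Counter

text = "Eerie eyes seen near lake"

# Step 1: get the count of each character
counts = Counter(text)

# Step 2: sort from lowest to highest count
print(sorted(counts.items(), key=lambda kv: kv[1]))

# The root of the Huffman tree accumulates all the counts,
# so its frequency must equal the total number of characters.
assert sum(counts.values()) == len(text)
print("total characters:", len(text))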

Least frequent symbols: same (longest) codeword length. More frequent symbols: shorter codeword length.

Huffman Algorithm
1- Source symbols are listed in order of decreasing probability (frequency).
2- The two source symbols of lowest probability are assigned a 0 and a 1 (splitting stage).
3- These two source symbols are combined into a new source symbol with probability equal to the sum of the two original probabilities. (The list of source symbols, and therefore the source statistics, is thereby reduced in size by one.)
4- The probability of the new symbol is placed in the list in accordance with its value.
The steps are repeated until we are left with a final list of only two source statistics, to which a 0 and a 1 are assigned.
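A compact way to implement these steps is with a priority queue (min-heap). The sketch below is one illustrative implementation, not the only possible one; ties between equal probabilities are broken by insertion order.

import heapq
from itertools import count

def huffman_code(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> codeword."""
    tiebreak = count()   # keeps heap comparisons well defined for equal probabilities
    # Each heap entry: (probability, tiebreak, {symbol: partial codeword})
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)      # lowest probability  -> bit 0
        p1, _, c1 = heapq.heappop(heap)      # next lowest         -> bit 1
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

print(huffman_code({"s0": 0.1, "s1": 0.2, "s2": 0.4, "s3": 0.2, "s4": 0.1}))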

Ex: Find the Huffman code for the following source, given the corresponding probabilities (marginal pmf):
s_0 = 0.1, s_1 = 0.2, s_2 = 0.4, s_3 = 0.2, s_4 = 0.1

Listing the symbols in order of decreasing probability and repeatedly combining the two least probable entries (placing each combined symbol as low in the list as possible) gives the reduction
(s_0, s_4) → 0.2;  (s_3, 0.2) → 0.4;  (s_1, 0.4) → 0.6;  (s_2, 0.6) → 1.0
and the code
s_2 = 1 (l = 1), s_1 = 01 (l = 2), s_3 = 001 (l = 3), s_0 = 0000 (l = 4), s_4 = 0001 (l = 4)

H(S) = 0.4 log2(1/0.4) + 2×0.2 log2(1/0.2) + 2×0.1 log2(1/0.1) ≈ 2.122 bits/symbol
L̄ = 0.4×1 + 0.2×2 + 0.2×3 + 0.1×4 + 0.1×4 = 2.2 bits/symbol
CR = 3/2.2 ≈ 1.364 (compared with a 3-bit fixed-length code for the five symbols)
σ² = Σ_k P_k (l_k − L̄)² = 1.36
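The figures above can be verified in a few lines (illustrative; the codeword lengths are the ones just derived):

import math

probs   = [0.4, 0.2, 0.2, 0.1, 0.1]   # s2, s1, s3, s0, s4
lengths = [1,   2,   3,   4,   4  ]   # codeword lengths of this solution

H   = -sum(p * math.log2(p) for p in probs)                  # about 2.122
L   = sum(p * l for p, l in zip(probs, lengths))             # 2.2
CR  = 3 / L                                                  # about 1.364
var = sum(p * (l - L) ** 2 for p, l in zip(probs, lengths))  # 1.36

print(H, L, CR, var)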

Alternative solution
Placing the combined symbol as high in the list as possible instead gives the reduction
(s_0, s_4) → 0.2;  (s_1, s_3) → 0.4;  (s_2, 0.2) → 0.6;  (0.6, 0.4) → 1.0
and the code
s_2 = 00 (l = 2), s_1 = 10 (l = 2), s_3 = 11 (l = 2), s_0 = 010 (l = 3), s_4 = 011 (l = 3)

H(S) and CR are unchanged (≈ 2.122 bits/symbol and ≈ 1.364).
L̄ = 0.4×2 + 0.2×2 + 0.2×2 + 0.1×3 + 0.1×3 = 2.2 bits/symbol
σ² = Σ_k P_k (l_k − L̄)² = 0.16

Hence, to obtain a minimum-variance Huffman code, we always put the combined symbol as high in the list as possible.
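Repeating the variance computation for both codes shows why the second one is preferred: the average length is the same, but the spread of the codeword lengths is much smaller (illustrative check):

probs = [0.4, 0.2, 0.2, 0.1, 0.1]      # s2, s1, s3, s0, s4

def length_stats(lengths):
    L = sum(p * l for p, l in zip(probs, lengths))
    var = sum(p * (l - L) ** 2 for p, l in zip(probs, lengths))
    return L, var

print(length_stats([1, 2, 3, 4, 4]))   # first solution:   L = 2.2, variance = 1.36
print(length_stats([2, 2, 2, 3, 3]))   # minimum variance: L = 2.2, variance = 0.16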

Conclusions
The Huffman encoding process (Huffman tree) is not unique. The code with the lowest codeword-length variance is the best, because it ensures the least amount of variability in the encoded stream. An encoding scheme that minimizes the average length is called optimal coding. The Huffman code is optimal and its average length satisfies
H(S) ≤ L̄ < H(S) + 1   (looser bound).
In fact (from the literature), it can be shown that if P_max is the largest probability in the probability model, then
for P_max ≥ 0.5:   H(S) ≤ L̄ < H(S) + P_max,
while for P_max < 0.5:   H(S) ≤ L̄ < H(S) + P_max + 0.086   (tighter bound).
The average codeword length is equal to the marginal entropy if all marginal probabilities are negative integer powers of 2 (i.e., of the form 2^(−k)).
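These bounds are easy to check numerically for the example above (a small illustrative check; here P_max = 0.4 < 0.5, so the tighter bound applies):

import math

probs = [0.4, 0.2, 0.2, 0.1, 0.1]
L     = 2.2                                    # Huffman average length from the example
H     = -sum(p * math.log2(p) for p in probs)  # about 2.122
p_max = max(probs)

assert H <= L < H + 1                          # looser bound
assert L < H + p_max + 0.086                   # tighter bound for P_max < 0.5
print(H, L, H + p_max + 0.086)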

Huffman vs. Shannon Fano
Symbol   Frequency   P_k     Huffman code
a        9           0.265   01
b        8           0.235   10
c        6           0.176   11
d        5           0.147   001
e        4           0.118   0000
f        2           0.059   0001

Huffman reduction: (e, f) → 0.177;  (d, 0.177) → 0.324;  (b, c) → 0.411;  (a, 0.324) → 0.589;  root → 1.0

L̄ = Σ_k P_k l_k = 2×(0.235 + 0.265 + 0.176) + 3×0.147 + 4×(0.118 + 0.059) = 1.352 + 0.441 + 0.708 ≈ 2.5 bits/symbol

For this particular source the Huffman average length equals that of the Shannon Fano code (2.5 bits/symbol); in general, however, the Huffman code is optimum, while the Shannon Fano code is suboptimal and may yield a strictly larger average length.
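Working with the exact frequencies avoids the rounding in the table and confirms the comparison (illustrative check using the codeword lengths of the two codes):

freq  = {"a": 9, "b": 8, "c": 6, "d": 5, "e": 4, "f": 2}
total = sum(freq.values())                           # 34

shannon_fano_len = {"a": 2, "b": 2, "c": 3, "d": 3, "e": 3, "f": 3}
huffman_len      = {"a": 2, "b": 2, "c": 2, "d": 3, "e": 4, "f": 4}

for name, lengths in (("Shannon Fano", shannon_fano_len), ("Huffman", huffman_len)):
    bits = sum(freq[s] * lengths[s] for s in freq)   # total bits to encode the source
    print(name, bits / total, "bits/symbol")         # both give 85/34 = 2.5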