Multimedia Systems: Entropy Coding. Mahdi Amiri, February 2011, Sharif University of Technology


Data Compression, Motivation
Data storage and transmission cost money, so we use the fewest number of bits that still represent the information source.
Pros: less memory, less transmission time.
Cons: extra processing is required; distortion may be introduced (if lossy compression is used); the data has to be decompressed before it can be presented, which may cause delay.

Data Compression, Lossless and Lossy
Lossless: exact reconstruction is possible; applied to general data; lower compression rates. Examples: run-length, Huffman, Lempel-Ziv.
Lossy: higher compression rates; applied to audio, image and video. Examples: CELP, JPEG, MPEG-2.

Run-Length Encoding
Character example: BBBBHHDDXXXXKKKKWWZZZZ is encoded as 4B2H2D4X4K2W4Z.
Binary image of a rectangle, encoded row by row as (pixel value, run length) pairs:
0,40
0,40
0,10 1,20 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,20 0,10
0,40
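
As a concrete illustration, here is a minimal run-length encoder sketch in Python (the function name and interface are ours, not from the slides); it reproduces the character example above.

```python
def rle_encode(text):
    """Collapse runs of repeated characters into count+character pairs."""
    if not text:
        return ""
    out = []
    run_char, run_len = text[0], 1
    for ch in text[1:]:
        if ch == run_char:
            run_len += 1              # extend the current run
        else:
            out.append(f"{run_len}{run_char}")
            run_char, run_len = ch, 1  # start a new run
    out.append(f"{run_len}{run_char}")
    return "".join(out)

print(rle_encode("BBBBHHDDXXXXKKKKWWZZZZ"))  # -> 4B2H2D4X4K2W4Z
```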

Fixed-Length Coding (FLC)
A simple example. The message to code is 10 symbols long and uses 5 different symbols, so at least 3 bits per symbol are needed (codeword table shown on the slide).
Total bits required to code the message: 10 * 3 = 30 bits.
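
A one-line sanity check of the bit count above (plain Python, for illustration only):

```python
from math import ceil, log2

symbols, message_length = 5, 10
bits_per_symbol = ceil(log2(symbols))          # 5 distinct symbols need ceil(log2(5)) = 3 bits each
print(bits_per_symbol, bits_per_symbol * message_length)  # -> 3 30
```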

Variable-Length Coding (VLC)
Intuition: symbols that occur more frequently should get shorter codes. Since the codewords no longer have the same length, there must be a way of telling where each one ends: to identify the end of a codeword as soon as it arrives, no codeword may be a prefix of another codeword (see the decoding sketch below).
With the codeword table on the slide, the total bits required to code the same message are 3*2 + 3*2 + 2*2 + 3 + 3 = 24 bits.
How do we find the optimal codeword table?
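
A minimal sketch of prefix-free decoding, using a small hypothetical codeword table (not the table from the slide): because no codeword is a prefix of another, the decoder can emit a symbol the moment a complete codeword has arrived.

```python
# Hypothetical prefix-free codeword table, for illustration only.
code = {"10": "t", "00": "e", "01": "a", "110": "s", "111": "n"}

def decode_prefix(bits, table):
    """Decode a bit string symbol by symbol; each match is unambiguous."""
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in table:          # a complete codeword has arrived
            out.append(table[buf])
            buf = ""
    return "".join(out)

print(decode_prefix("100001", code))  # -> "tea"
```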

VLC, Example Application: Morse Code
Morse code is a variable-length code that is not prefix-free, so it needs a separator symbol (the pause between letters) for unique decodability.

Huffman Coding Algorithm
Step 1: Take the two least probable symbols in the alphabet; they receive the longest codewords, of equal length, differing only in the last digit.
Step 2: Combine these two symbols into a single symbol, and repeat.
P(n): probability of symbol number n. In the slide's example there are 9 symbols, e.g. the letters a, b, c, d, e, f, g, h, i. A small sketch of the construction follows.
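
A compact Python sketch of the two steps above; the probabilities are illustrative, not the nine-symbol example from the slide, and the helper name is ours.

```python
import heapq

def huffman_codes(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> codeword."""
    # Each heap entry: (probability, tiebreak, {symbol: partial codeword}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # Step 1: the two least probable groups
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}       # extend codewords, differing in one digit
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))      # Step 2: combine and repeat
        count += 1
    return heap[0][2]

print(huffman_codes({"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}))
# -> e.g. {'a': '0', 'b': '10', 'd': '110', 'c': '111'} (the exact 0/1 assignment may vary)
```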

Huffman Coding Algorithm
Paper: "A Method for the Construction of Minimum-Redundancy Codes", 1952 (David A. Huffman, 1925-1999). Results in prefix-free codes.
Most efficient: no other symbol-to-codeword mapping produces a smaller average output size, provided the actual symbol frequencies agree with those used to create the code.
Cons:
- The entire data has to be scanned in advance to find the symbol frequencies.
- Minimum redundancy is not favorable for error correction techniques (bits are not predictable if, e.g., one is missing).
- Does not support blocks of symbols: Huffman is designed to code single characters only, so at least one bit is required per character; e.g. a word of 8 characters requires at least an 8-bit code.

Entropy Coding: Entropy, Definition
The entropy H of a discrete random variable X is a measure of the amount of uncertainty associated with the value of X (Claude E. Shannon, 1916-2001):

    H(X) = Σ_{x∈X} P(x) log2(1 / P(x))

where P(x) is the probability that symbol x of X will occur.
Information theory point of view: entropy is a measure of information content (in bits), a quantitative measure of the disorder of a system. It is impossible to compress data such that the average number of bits per symbol is less than the Shannon entropy of the source (over a noiseless channel).
The intuition behind the formula: 1/P(x) expresses the amount of uncertainty of symbol x; taking log2 brings it to the world of bits, giving the information content I(x) = log2(1/P(x)); H(X) is then the weighted average number of bits required to encode each possible value.
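
A small sketch that evaluates the formula above for an illustrative distribution (not one from the slides):

```python
from math import log2

def entropy(probs):
    """Average number of bits needed per symbol: sum of P(x) * log2(1/P(x))."""
    return sum(p * log2(1.0 / p) for p in probs if p > 0)

print(entropy([0.5, 0.25, 0.125, 0.125]))  # -> 1.75 bits/symbol
```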

Lempel-Ziv (LZ77)
Algorithm for compression of character sequences. Assumption: sequences of characters are repeated. Idea: replace a character sequence by a reference to an earlier occurrence.
1. Define a search buffer (a portion of the recently encoded data) and a look-ahead buffer (the not-yet-encoded data).
2. Find the longest match between the first characters of the look-ahead buffer and an arbitrary character sequence in the search buffer.
3. Produce the output <offset, length, next_character>, where offset and length form the reference to the earlier occurrence and next_character is the first character following the match in the look-ahead buffer.
A minimal encoder sketch follows.
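
A minimal LZ77 encoder sketch following steps 1-3 above; the window sizes, function name and test string are illustrative assumptions, not values from the slides.

```python
def lz77_encode(data, search_size=16, lookahead_size=8):
    """Emit (offset, length, next_character) triples over a sliding window."""
    i, out = 0, []
    while i < len(data):
        start = max(0, i - search_size)      # left edge of the search buffer
        best_off, best_len = 0, 0
        # Find the longest match between the look-ahead buffer and the search buffer.
        for j in range(start, i):
            length = 0
            while (length < lookahead_size and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        next_char = data[i + best_len]       # first character after the match
        out.append((best_off, best_len, next_char))
        i += best_len + 1
    return out

print(lz77_encode("abcabcabcd"))
# -> [(0, 0, 'a'), (0, 0, 'b'), (0, 0, 'c'), (3, 6, 'd')]
```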

Lempel-Ziv-Welch (LZW)
Drops the search buffer and keeps an explicit dictionary; produces only the output <index>. Used by Unix "compress", GIF, V.42bis and TIFF.
Example: wabba␢wabba␢wabba␢wabba␢woo␢woo␢woo (␢ denotes the separator symbol).
Progress clip at the 12th dictionary entry; encoder output sequence so far: 5 2 3 3 2 1

Lempel-Ziv-Welch (LZW)
Example: wabba␢wabba␢wabba␢wabba␢woo␢woo␢woo, progress clip at the end of the example.
Encoder output sequence: 5 2 3 3 2 1 6 8 10 12 9 11 7 16 5 4 4 11 21 23 4
A small encoder sketch that reproduces this sequence follows.
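
A minimal LZW encoder sketch. Assuming the initial dictionary {␢:1, a:2, b:3, o:4, w:5} implied by the slide's output (␢ being the separator symbol), it reproduces the encoder output sequence above.

```python
def lzw_encode(data, initial_dictionary):
    """Return the list of dictionary indices produced by LZW encoding."""
    dictionary = dict(initial_dictionary)   # string -> index
    next_index = len(dictionary) + 1
    w, out = "", []
    for c in data:
        if w + c in dictionary:
            w = w + c                       # keep extending the current match
        else:
            out.append(dictionary[w])       # emit index of longest known string
            dictionary[w + c] = next_index  # add the new string to the dictionary
            next_index += 1
            w = c
    if w:
        out.append(dictionary[w])
    return out

initial = {"␢": 1, "a": 2, "b": 3, "o": 4, "w": 5}
message = "wabba␢" * 4 + "woo␢" * 2 + "woo"
print(lzw_encode(message, initial))
# -> [5, 2, 3, 3, 2, 1, 6, 8, 10, 12, 9, 11, 7, 16, 5, 4, 4, 11, 21, 23, 4]
```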

Arithmetic Coding
Encodes a block of symbols into a single number, a fraction n where 0.0 <= n < 1.0.
Step 1: Divide the interval [0,1) into subintervals based on the probabilities of the symbols in the current context (the dividing model).
Step 2: Divide the interval corresponding to the current symbol into subintervals according to the dividing model of Step 1.
Step 3: Repeat Step 2 for all symbols in the block.
Step 4: Encode the block of symbols with a single number in the final resulting range; use the binary number in this range with the smallest number of bits.
See the encoding and decoding examples on the following slides; a small encoder sketch is given below.
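
A minimal sketch of Steps 1-4. Since the SQUEEZE dividing model's symbol ordering is not shown in this transcription, the sketch uses the model of the decoding example two slides ahead (P(A)=0.6, P(B)=0.2, P(C)=0.1, P(space)=0.1); the function name and interface are assumptions.

```python
def arithmetic_encode(message, intervals):
    """intervals: dict symbol -> (low, high) covering [0, 1). Returns the final range."""
    low, high = 0.0, 1.0
    for symbol in message:
        span = high - low
        s_low, s_high = intervals[symbol]
        # Step 2: zoom into the sub-interval of the current symbol.
        low, high = low + span * s_low, low + span * s_high
    return low, high   # Step 4: any number in [low, high) encodes the message

model = {"A": (0.0, 0.6), "B": (0.6, 0.8), "C": (0.8, 0.9), " ": (0.9, 1.0)}
print(arithmetic_encode("AC ", model))  # -> approximately (0.534, 0.54); 0.538 lies in this range
```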

Arithmetic Coding, Encoding
Example: SQUEEZE. Using FLC, 3 bits per symbol would give 7 * 3 = 21 bits.
Dividing model: P(E) = 3/7; P(S) = P(Q) = P(U) = P(Z) = 1/7.
We can encode the word SQUEEZE with a single number in the range [0.64769, 0.64772). The binary number in this range with the smallest number of bits is 0.101001011101, which corresponds to 0.647705 in decimal. The '0.' prefix does not have to be transmitted, because every arithmetic-coded message starts with it, so we only need to transmit the sequence 101001011101, which is only 12 bits.

Arithmetic Coding, Decoding
Input probabilities: P(A) = 60%, P(B) = 20%, P(C) = 10%, P(<space>) = 10%; the dividing model is built from these probabilities. Decoding the input value 0.538:
The fraction 0.538 falls into the sub-interval [0, 0.6), so the first decoded symbol is 'A'. The subregion containing the point is then successively subdivided in the same way as the dividing model. Since 0.538 is within the interval [0.48, 0.54), the second symbol of the message must have been 'C'. Since 0.538 falls within the interval [0.534, 0.54), the third symbol of the message must have been '<space>'. The protocol in this example designates <space> as the termination symbol, so this is the end of the decoding process. A decoder sketch that replays this example follows.
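
A small decoder sketch that replays the example above by rescaling the coded value into each decoded symbol's sub-interval until the terminator appears (function name and interface are ours).

```python
def arithmetic_decode(value, intervals, terminator):
    """intervals: list of (symbol, low, high) covering [0, 1)."""
    out = []
    while True:
        for symbol, low, high in intervals:
            if low <= value < high:
                out.append(symbol)
                if symbol == terminator:
                    return out
                value = (value - low) / (high - low)  # zoom into the chosen sub-interval
                break

model = [("A", 0.0, 0.6), ("B", 0.6, 0.8), ("C", 0.8, 0.9), (" ", 0.9, 1.0)]
print(arithmetic_decode(0.538, model, " "))  # -> ['A', 'C', ' ']
```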

Arithmetic Coding
Pros: typically achieves a better compression ratio than Huffman coding.
Cons: high computational complexity; the patent situation had a crucial influence on decisions about implementing arithmetic coding (many of those patents have now expired).

Multimedia Systems, Entropy Coding
Thank you. Next session: Color Space.
Find out more at:
1. http://ce.sharif.edu/~m_amiri/
2. http://www.dml.ir/