Comm. 502: Communication Theory. Lecture 6: Introduction to Source Coding


Digital Communication Systems
Block diagram: Source of Information → Source Encoder → Channel Encoder → Modulator → Channel → De-Modulator → Channel Decoder → Source Decoder → User of Information.
Communication systems are designed to transmit the information generated by a source to some destination.

Types of Information Sources
Analog: the output is an analog signal (examples: TV and radio broadcasting).
Discrete: the output is discrete, a sequence of letters or symbols (examples: computers, storage devices, ...).

Source Encoder
Whether a source is analog or discrete, a digital communication system is designed to transmit information in digital form. Consequently, the output of the source must be converted to a format that can be transmitted digitally. This conversion of the source output to digital form is generally performed by the source encoder, whose output is a sequence of binary digits. Example: the ASCII code, which converts characters to binary bits.
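As a small illustration of this step (my own sketch, not part of the original slides), the following Python snippet converts characters into a binary sequence using the standard 7-bit ASCII table:

```python
# Minimal sketch (assumed 7-bit ASCII): map each character to its binary
# codeword and concatenate the codewords into one bit stream.
def ascii_source_encode(text: str) -> str:
    return "".join(format(ord(ch), "07b") for ch in text)

print(ascii_source_encode("Hi"))  # 'H' -> 1001000, 'i' -> 1101001
```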

Why Source Coding Is Important
It enables us to determine:
- The amount of information produced by a given source.
- The minimum storage and bandwidth needed to transfer data from a given source.
- The limit on the transmission rate of information over a noisy channel.
- How much the data from a given source can be compressed.

Fixed-Length Codes
The standard character codes are of fixed length, such as 5, 6, or 7 bits. The length is usually chosen so that there are enough binary characters to assign a unique binary sequence to each input alphabet character. Fixed-length codes have the property that character boundaries are separated by a fixed bit count. This allows the conversion of a serial data stream to a parallel data stream by a simple bit counter (see the sketch below).
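To illustrate that last point, here is a short Python sketch (my addition, assuming 7-bit characters) showing how a serial bit stream is split back into characters just by counting bits:

```python
# Minimal sketch: with a fixed codeword length, character boundaries are
# recovered simply by counting bits (here 7 bits per character).
def split_fixed_length(bitstream: str, n: int = 7) -> list:
    return [bitstream[i:i + n] for i in range(0, len(bitstream), n)]

bits = "10010001101001"                                     # two 7-bit ASCII characters
print([chr(int(b, 2)) for b in split_fixed_length(bits)])   # ['H', 'i']
```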

Variable-Length Codes
Data compression codes are often variable-length codes. We expect the length of the binary sequence assigned to each alphabet symbol to be inversely related to the probability of that symbol. A significant amount of data compression can be realized when there are wide differences in the probabilities of the symbols. To achieve this compression, there must also be a sufficiently large number of symbols.

Discrete Information Source
The information source generates a group of symbols from a given alphabet S = {s_0, s_1, ..., s_{K-1}}.
Each symbol has a probability Pr(s_k) = p_k, k = 0, 1, ..., K-1, with Σ_{k=0}^{K-1} p_k = 1.
The symbols are assumed to be independent.

Measure of Information
If p_k = 1: the occurrence of the event does not correspond to any gain of information (i.e., there is no uncertainty). In this case, there is no need for communication because the receiver already knows everything.
As p_k decreases, the uncertainty increases, and the reception of s_k corresponds to some gain in information. BUT HOW MUCH?

Measure of Information (Cont.)
Information is measured by: (1) self-information, (2) entropy. We use probability theory to quantify and measure information.

(1) Self-Information
The amount of information in bits about a symbol is closely related to its probability of occurrence. A function which measures the amount of information after observing the symbol s_k is the self-information:
I(s_k) = log2(1/p_k) = -log2(p_k)  [bits]
(Figure: I(s_k) plotted versus P(s_k) over [0, 1].)
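A minimal numerical sketch of this definition in Python (my addition; the function name self_information is arbitrary):

```python
import math

def self_information(p: float) -> float:
    # I(s) = log2(1/p) = -log2(p), in bits
    return -math.log2(p)

print(self_information(1/2))   # 1.0 bit
print(self_information(1/8))   # 3.0 bits
```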

Properties of Self-Information
1) I(s_k) ≥ 0 (a real, nonnegative measure).
2) I(s_k) > I(s_i) if p_k < p_i.
3) I(s_k) is a continuous function of p_k.

(2) Entropy
Entropy is the average amount of information of a finite discrete source. More precisely, it is the average number of bits per symbol required to describe that source. For a source containing N independent symbols, its entropy is defined as
H = E(I(s_i)) = Σ_{i=1}^{N} p_i I(s_i)
Since I(s_i) = -log2(p_i), then
H = -Σ_{i=1}^{N} p_i log2(p_i)
Unit of entropy: bits/symbol (information bits per symbol).
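A minimal Python sketch of this formula (my addition), which also reproduces the 1.75 bits/symbol value used later in the lecture:

```python
import math

def entropy(probs) -> float:
    # H = -sum p_i * log2(p_i), in bits/symbol (terms with p_i = 0 contribute 0)
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1/2, 1/4, 1/8, 1/8]))   # 1.75 bits/symbol
```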

Properties of Entropy
H is a nonnegative quantity: H ≥ 0.
The unit "bit" here is a measure of information content and is not to be confused with the term "bit" meaning binary digit.
If all a priori probabilities are equally likely (p_i = 1/N for all N symbols), then the entropy is maximum and given by H_max = log2(N).
Therefore 0 ≤ H ≤ log2(N).

Proof that H = log2(N) when all a priori probabilities are equally likely:
If p_i = 1/N for all symbols, then
H = -Σ_{i=1}^{N} p_i log2(p_i) = -Σ_{i=1}^{N} (1/N) log2(1/N) = -N (1/N) log2(1/N) = log2(N).

Example
A source puts out one of five possible messages during each message interval. The probabilities of these messages are
p_1 = 1/2, p_2 = 1/4, p_3 = 1/8, p_4 = 1/16, p_5 = 1/16.
What is the information content of these messages?
I(m_1) = -log2(1/2) = 1 bit
I(m_2) = -log2(1/4) = 2 bits
I(m_3) = -log2(1/8) = 3 bits
I(m_4) = -log2(1/16) = 4 bits
I(m_5) = -log2(1/16) = 4 bits
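These values can be checked with a few lines of Python (my addition, simply applying I(m_k) = -log2(p_k)):

```python
import math

probs = [1/2, 1/4, 1/8, 1/16, 1/16]             # p_1 ... p_5 from the example
for k, p in enumerate(probs, start=1):
    print(f"I(m{k}) = {-math.log2(p):g} bits")  # 1, 2, 3, 4, 4 bits
```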

Entropy Example
Find and plot the entropy of the binary source in which the probability of occurrence of symbol 1 is p and of symbol 0 is 1 - p:
H = -Σ_{i=1}^{2} p_i log2(p_i) = -p log2(p) - (1 - p) log2(1 - p)
(using the convention v log2(v) → 0 as v → 0).
p = 1/2: H = -(1/2) log2(1/2) - (1/2) log2(1/2) = 1 bit/symbol
p = 1/4: H = -(1/4) log2(1/4) - (3/4) log2(3/4) = 0.8113 bits/symbol
p = 0: H = 0 bits/symbol; p = 1: H = 0 bits/symbol
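A short Python sketch (my addition) that evaluates the binary entropy function at the points worked out above; the same function could be sampled over [0, 1] to produce the requested plot:

```python
import math

def binary_entropy(p: float) -> float:
    # H(p) = -p*log2(p) - (1-p)*log2(1-p), with v*log2(v) -> 0 as v -> 0
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 1/4, 1/2, 1.0):
    print(p, round(binary_entropy(p), 4))   # 0.0, 0.8113, 1.0, 0.0
```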

Average Information Content in the English Language
Calculate the average information in bits/character in English assuming each letter is equally likely:
H = -Σ_{i=1}^{26} (1/26) log2(1/26) = log2(26) ≈ 4.7 bits/char
Since characters do not appear with the same frequency in English, repeat the calculation using the probabilities
P = 0.70 for {a, e, o, t}
P = 0.2 for {h, i, n, r, s}
P = 0.08 for {c, d, f, l, m, p, u, y}
P = 0.02 for {b, g, j, k, q, v, w, x, z}
Solve this problem.

Objective of Source Coding
Efficient representation of the data generated by an information source. What does the word EFFICIENT mean? Efficient source coding means:
- Minimum average number of bits per source symbol.

How Can We Be EFFICIENT in Source Coding?
By using knowledge of the statistics of the source. Clearly:
- Frequent source symbols should be assigned SHORT codewords.
- Rare source symbols should be assigned LONGER codewords.
Example: Morse code. E is represented by "." while Q is represented by "--.-".

Morse Code
Morse code is a method of transmitting text information as a series of on-off tones, lights, or clicks that can be directly understood by a skilled listener or observer without special equipment. Letters are transmitted as standardized sequences of short and long signals called "dots" and "dashes". The duration of a dash is three times the duration of a dot. Each dot or dash is followed by a short silence, equal to the dot duration.

Morse Code The letters of a word are separated by a space equal to three dots (one dash), and the words are separated by a space equal to seven dots. The dot duration is the basic unit of time measurement in code transmission. For efficiency, the length of each character in Morse is approximately inversely proportional to its frequency of occurrence in English. Thus, the most common letter in English, the letter "E," has the shortest code, a single dot.

Morse Code (figure: the Morse code chart)

Average Code Length
Information source → s_k → Source encoder → c_k
The source has K symbols. Each symbol s_k has probability p_k and is represented by a codeword c_k of length v_k bits.
Average codeword length: L = Σ_{k=1}^{K} p_k v_k
Variance of the code length: σ^2 = Σ_{k=1}^{K} p_k (v_k - L)^2

Example: Average Codeword Length

Symbol   p(s)    Codeword
A        0.25    11
B        0.30    00
C        0.12    010
D        0.15    011
E        0.18    10

L = Σ_{k=1}^{K} p_k v_k = 0.25(2) + 0.30(2) + 0.12(3) + 0.15(3) + 0.18(2) = 2.27 bits
This does not mean that we have to find a way to transmit a noninteger number of bits. Rather, it means that on average the length of the code is 2.27 bits.
Exercise: calculate the variance of the code length (see the check below).
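A small Python check of this example (my addition); the variance value printed at the end is my own answer to the exercise, computed from the variance formula on the previous slide:

```python
# Probability and codeword for each symbol, as in the table above
code = {"A": (0.25, "11"), "B": (0.30, "00"), "C": (0.12, "010"),
        "D": (0.15, "011"), "E": (0.18, "10")}

L = sum(p * len(cw) for p, cw in code.values())            # average codeword length
var = sum(p * (len(cw) - L) ** 2 for p, cw in code.values())
print(f"average length = {L:.2f} bits")   # 2.27 bits
print(f"variance       = {var:.4f}")      # 0.1971 (my computed answer to the exercise)
```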

Code Efficiency
L represents the average number of bits per source symbol used in the source encoding process. If L_min denotes the minimum possible average codeword length, the coding efficiency of the source encoder is defined as
η = L_min / L
An efficient code means η → 1. What is L_min?

Shannon's First Theorem: The Source Coding Theorem
L ≥ H(S), i.e., L_min = H(S).
The outputs of an information source cannot be represented by a source code whose average length is less than the source entropy.

Compression Ratio
We define the compression ratio as:
CR = (number of bits of the fixed-length code that represents the symbols) / (average code length of the variable-length code)
We define the code efficiency as:
η = entropy / average code length = H(S) / L = L_min / L
It measures how much of the possible compression the code achieves.

Example

Symbol  Probability   Code I codeword   Code I length   Code II codeword   Code II length
s_k     p_k           c_k               v_k             c_k                v_k
s_0     1/2           00                2               0                  1
s_1     1/4           01                2               10                 2
s_2     1/8           10                2               110                3
s_3     1/8           11                2               1111               4

Source entropy:
H(S) = Σ_i p_i log2(1/p_i) = (1/2)log2(2) + (1/4)log2(4) + (1/8)log2(8) + (1/8)log2(8) = 1.75 bits/symbol = L_min

Code I:  L = Σ_{k=1}^{K} p_k v_k = (1/2)(2) + (1/4)(2) + (1/8)(2) + (1/8)(2) = 2 bits
         CR = 2/2 = 1,          η = H(S)/L = 1.75/2 = 0.875
Code II: L = (1/2)(1) + (1/4)(2) + (1/8)(3) + (1/8)(4) = 1.875 bits
         CR = 2/1.875 = 1.067,  η = H(S)/L = 1.75/1.875 = 0.9333
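A short Python sketch (my addition) that reproduces the entropy, average lengths, compression ratios, and efficiencies of Code I and Code II:

```python
import math

probs   = [1/2, 1/4, 1/8, 1/8]
code_I  = ["00", "01", "10", "11"]     # fixed-length code (2 bits/symbol)
code_II = ["0", "10", "110", "1111"]   # variable-length code

H = -sum(p * math.log2(p) for p in probs)               # 1.75 bits/symbol
for name, code in (("Code I", code_I), ("Code II", code_II)):
    L = sum(p * len(cw) for p, cw in zip(probs, code))  # average codeword length
    print(f"{name}: L = {L:.3f}, CR = {2/L:.3f}, efficiency = {H/L:.4f}")
# Code I : L = 2.000, CR = 1.000, efficiency = 0.8750
# Code II: L = 1.875, CR = 1.067, efficiency = 0.9333
```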