SOME EXAMPLES FROM INFORMATION THEORY (AFTER C. SHANNON).

1. Some easy problems.

1.1. Guessing a number. Someone chose a number x between 1 and N. You are allowed to ask questions of the form "Is this number larger than ...?". What is the smallest number of questions that will always identify x? Try it for N = 2, 32, 1024.

1.2. Finding a fake coin. You are given N coins, one of which is fake (it is lighter than the rest). You are allowed to compare the weights of any two groups of coins (each such comparison gives you three possible answers: the first group is lighter than, of the same weight as, or heavier than the second). How many weighings do you need to determine which coin is fake? Try it for N = 3 and N = 81.

1.3. n-ary notation. How many digits do you need to write the number N in binary (base 2), ternary (base 3), or decimal (base 10) notation? What about n-ary notation?

1.4. Guessing a letter. Someone chooses a word at random from Webster's dictionary (all words are equally likely to be chosen), and takes its first letter (we'll denote it by x). You are allowed to present that person with any group of letters of your choosing and ask him whether x is in this group or not. The game is played many times. Can you devise a strategy with which you will be able to guess the letter asking the smallest number of questions on average? What is that number?

You might be tempted to answer log_2 26 ≈ 4.7. We'll show later that it is actually possible to do this (though it involves combining several games into one; for now let us assume that we can guess the value of 1 ≤ x ≤ N using log_2 N questions on average). Moreover, this would be the optimal answer if all letters were equally likely to occur. However, this is not the case (see Figure 1): in reality not all values of x are equally likely. For example, in my dictionary of 22,890 words, very few words start with X (154) or Z (564), and many more start with N (941). In my electronic dictionary, most letters occurred as first letters of about 920 words, with the exception of X and Z. The frequency with which X occurs is thus only about 20% of the frequency of a typical letter; the frequency of Z is about 60%. This means that the probability of seeing X or Z (together about 80% of a typical letter's frequency) is about the same as the probability of seeing any other letter.

Let's put on some special glasses that make us see all letters fine, except that when we are shown X or Z we see the same letter, Ξ. Now we effectively have a new alphabet of 25 letters: A, B, C, ..., W, Y, Ξ, all of which are roughly equally likely (the frequency of Ξ is 718 words). Thus we'll need log_2 25 ≈ 4.64 questions to narrow it down to one of these 25 letters. In most cases, this tells us which letter was chosen, except when we get the letter Ξ (which happens 718/22,890, or about 3% of the time). So 3% of the time we need to ask an extra question. Thus on average, we use 4.64 + 0.03, or about 4.67 questions.
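To make the arithmetic at the end of 1.4 concrete, here is a minimal Python sketch that recomputes the average number of questions, using only the word counts quoted above (154 words starting with X, 564 with Z, 22,890 in total); the variable names are arbitrary choices.

```python
import math

# Counts quoted in the text: total words, and words starting with X and Z.
TOTAL_WORDS = 22890
COUNT_X, COUNT_Z = 154, 564

# Merge X and Z into a single symbol "Xi"; the merged alphabet has 25 letters.
merged_alphabet_size = 25
count_xi = COUNT_X + COUNT_Z                         # 718 words
p_xi = count_xi / TOTAL_WORDS                        # ~3% of the time we see Xi

# log2(25) questions narrow x down to one of the 25 merged letters;
# when the answer is Xi we spend one extra question to tell X from Z.
questions_always = math.log2(merged_alphabet_size)   # ~4.64
average_questions = questions_always + p_xi * 1      # ~4.64 + 0.03

print(f"p(Xi)             = {p_xi:.3f}")
print(f"log2(25)          = {questions_always:.2f}")
print(f"average questions = {average_questions:.2f}")   # ~4.67
```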

Figure 1. Frequencies of first letters in a dictionary of 22,890 English words.

2. Data transmission.

We wish to transmit data via a telegraph. The telegraph is a device that can transmit, every second, one of two symbols: · (dot) and – (dash). In other words, every second we can transmit a binary digit, or bit.

2.1. Transmitting text. We now want to transmit English text by telegraph. One way to do this is to enumerate each symbol and transmit its binary representation. For example, if English had only 4 letters, we would need two bits per symbol, which results in a transmission rate of 0.5 characters per second. We'll see, however, that this is not the most efficient way to go.

We first make a connection between the transmission problem and the games we discussed earlier. Let us suppose that each symbol in our language is assigned a certain codeword, i.e., a combination of dots and dashes that is transmitted whenever we want to send that symbol. For example, we may agree that the codeword for the symbol A is some particular sequence of dots and dashes; if we wish to transmit the symbol A, we send this codeword.

On the receiving end, the receiver does not know which letter we had in mind to send; to him it is an unknown symbol x. He knows, however, the list of our codewords. At first, he knows nothing about x. The moment the first bit arrives, the receiver can eliminate all codewords that do not start with that bit. Each new bit narrows the possibilities for x even further. This is exactly parallel to the game we played in problem 1.4. If we have received the bits B_1 ... B_k so far, we know that x must be one of the letters whose codewords start with B_1 ... B_k. Thus you can think of the next bit as the yes/no answer to the question: does x belong to the set of letters whose codewords start with B_1 ... B_k followed by a dot?
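To see the receiver's side of this game in action, here is a small Python sketch that decodes a stream of dots and dashes by repeatedly narrowing the set of candidate codewords, one received symbol at a time; the three-letter code table in it is an illustrative choice, not one prescribed by the text.

```python
# A toy prefix code: each symbol gets a codeword made of dots and dashes.
# The table itself is an illustrative choice, not fixed by the notes.
CODEWORDS = {"A": ".", "B": "-.", "C": "--"}

def decode(stream: str) -> str:
    """Decode a dot/dash stream by playing the guess-the-symbol game."""
    decoded = []
    received = ""                      # bits B_1 ... B_k received so far
    for bit in stream:
        received += bit
        # Keep only the symbols whose codewords still agree with what we heard.
        candidates = {s for s, w in CODEWORDS.items() if w.startswith(received)}
        # Once exactly one codeword matches completely, the symbol is determined.
        if len(candidates) == 1:
            (symbol,) = candidates
            if CODEWORDS[symbol] == received:
                decoded.append(symbol)
                received = ""          # start a fresh game for the next symbol
    return "".join(decoded)

print(decode(".-.--."))   # A, then B, then C, then A -> prints "ABCA"
```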

So you can think of the set of codewords as a strategy for playing the game, say with the convention that a dot stands for the answer "yes" and a dash for "no". The first question you ask is: "Is x among the letters whose codewords start with · ?" If you get a yes to the first question, the second question is: "Is x among the letters whose codewords start with ·· ?" If the answer to the first question is a no, the second question is: "Is x among the letters whose codewords start with –· ?" Note that the strategy (that is, the list of questions, i.e., the set of codewords) does not depend on x at all; what depends on x are the specific answers that we get to each question. Thus if you know the strategy and the answers, you know x! Hence our transmission problem amounts to: (a) pick a strategy (i.e., the codewords); (b) to transmit a symbol x, send its codeword, i.e., the unique sequence of answers to the game questions that characterizes x. The receiver then just plays the guess-the-x game.

2.2. Optimizing transmission rate. The question now is: how does one choose the strategy so as to maximize the average transmission rate (equivalently, minimize the average number of questions that need to be asked to identify x)?

2.2.1. Two symbols. If we had only two symbols, Ψ_1 and Ψ_2, then it would be clear what to do: our strategy would be to ask "is x = Ψ_1?" and to transmit · or – accordingly.

2.2.2. Three symbols. If we had three symbols, Φ_1, Φ_2, Φ_3, we proceed as in problem 1.4. We choose the two lowest-frequency symbols (let's say Φ_2, Φ_3) and invent a new symbol, Φ_23 (which stands for "Φ_2 or Φ_3"). We then choose the most efficient way to transmit Φ_1 and Φ_23: we send a · for Φ_1 and a – for Φ_23. However, in the latter case we have not supplied enough information for the receiver to determine whether we intended Φ_2 or Φ_3 (we only told him that we didn't send Φ_1). So if we send a –, we must follow it by a transmission that determines whether we intended Φ_2 or Φ_3, i.e., a · if we intended to send Φ_2, and a – for Φ_3. To summarize, here is our code table:

Symbol            Φ_1   Φ_2   Φ_3
Transmitted code  ·     –·    ––

Let us see what happens in the specific case that Φ_1 occurs 50% of the time and Φ_2, Φ_3 each occur 25% of the time. In this case, we need to transmit 1 bit 50% of the time, and 2 bits the remaining 50% of the time. Thus on average, each character costs us 1.5 bits, so that we transmit at 1/1.5 ≈ 0.67 characters per second.

Note that this is clearly the most efficient way to go. To beat 1.5 bits per symbol, we would need to assign a 1-bit code to at least one symbol. The other symbols must then get two bits, or else the transmission could be ambiguous: if we, e.g., assign · to Φ_1, – to Φ_2, and some two-bit codeword (say –·) to Φ_3, then one cannot unambiguously decode: the string –· may indicate either a transmission of Φ_2 Φ_1 or a single Φ_3. But clearly we achieve the best transmission speed if we assign the shortest codeword to the most frequent symbol.

2.2.3. Four symbols. Let us now analyze the situation with 4 symbols, Φ_1, Φ_2, Φ_3, Φ_4. We proceed as before, inventing a new symbol for the two least frequent characters, e.g., Φ_3 and Φ_4. Let us call this new symbol Φ_34. Next, we choose a code for Φ_1, Φ_2, Φ_34 as we did before. (Here one should again take care to combine the two least frequent symbols and replace them by a new symbol. It could be, e.g., that the two least frequent symbols are now Φ_2 and Φ_34, in which case our two new symbols are Φ_1 and Φ_234. Or it could happen that the two least frequent symbols are Φ_1 and Φ_2, whence we would take Φ_12 and Φ_34, and so on.)
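The repeated step of merging the two least frequent symbols in 2.2.2 and 2.2.3 is exactly the greedy construction known as Huffman coding. Here is a short Python sketch of it, using 0 and 1 in place of dot and dash; the function name, the heap, and the tie-breaking counter are implementation choices, not something fixed by the notes.

```python
import heapq

def build_code(frequencies):
    """Greedy construction from 2.2.2-2.2.3: repeatedly merge the two least
    frequent symbols, then prepend one bit ('0' or '1', i.e. dot or dash)
    to the codewords of the two halves of each merged symbol."""
    # Each heap entry: (frequency, tie-breaker, {symbol: codeword-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, code1 = heapq.heappop(heap)   # least frequent group
        f2, _, code2 = heapq.heappop(heap)   # second least frequent group
        merged = {s: "0" + w for s, w in code1.items()}
        merged.update({s: "1" + w for s, w in code2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

# The three-symbol example: Phi_1 occurs 50% of the time, Phi_2 and Phi_3 25% each.
freqs = {"Phi1": 0.50, "Phi2": 0.25, "Phi3": 0.25}
code = build_code(freqs)
avg_bits = sum(freqs[s] * len(w) for s, w in code.items())
print(code)                                     # {'Phi1': '0', 'Phi2': '10', 'Phi3': '11'}
print(f"{avg_bits} bits per symbol on average") # 1.5
```

Running the same sketch on the four-symbol frequencies discussed next (81%, 9%, 9%, 1%) reproduces codeword lengths of 1, 2, 3 and 3 bits, and hence an average of 1.29 bits per symbol.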

We claim that this is the most efficient way to go. Indeed, our scheme assigns the two longest codewords to the two least frequent symbols, which is clearly necessary for any optimal coding scheme. Here are two examples, with their respective transmission codes (one possible assignment of dots and dashes is shown):

First example:
Symbol            Φ_1   Φ_2   Φ_3   Φ_4
Frequency         81%   9%    9%    1%
Transmission code ·     –·    ––·   –––

Second example:
Symbol            Φ_1   Φ_2   Φ_3   Φ_4
Frequency         30%   25%   25%   20%
Transmission code ··    ·–    –·    ––

These encodings require 0.81 + 2 · 0.09 + 3 · 0.10 = 1.29 bits per symbol in the first case, and 2 bits per symbol in the second case. The corresponding transmission rates are about 0.78 and 0.5 symbols per second. Note that in the first case our encoding gives roughly a 56% improvement over the obvious encoding in which every symbol receives 2 bits.

2.2.4. Problem. (a) Explain why in the first encoding above you could not encode Φ_3 or Φ_4 by two bits. (b) Work out a code for transmitting the English language; see Table 1 and Figure 2 for the relative frequencies of letters in English.

Figure 2. Relative frequencies of English letters, %.

Letter     A    B    C    D    E     F    G    H    I    J    K    L    M
Frequency  8.2  1.5  2.8  4.3  12.7  2.2  2.0  6.1  7.0  0.2  0.8  4.0  2.4

Letter     N    O    P    Q    R    S    T    U    V    W    X    Y    Z
Frequency  6.7  7.5  1.9  0.1  6.0  6.3  9.1  2.8  1.0  2.4  0.2  2.0  0.1

Table 1. Frequency of occurrence of English letters (relative frequency, %).
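As a quick check of the arithmetic in the two examples above, here is a small Python sketch that computes the average number of bits per symbol and the resulting transmission rate from a frequency/codeword table; the dot/dash strings are the illustrative codes tabulated above, written with "." and "-".

```python
# Average cost and transmission rate for a frequency/codeword table.
def average_bits(table):
    """table maps symbol -> (frequency, codeword); returns E[codeword length]."""
    return sum(freq * len(code) for freq, code in table.values())

example1 = {"Phi1": (0.81, "."),   "Phi2": (0.09, "-."),
            "Phi3": (0.09, "--."), "Phi4": (0.01, "---")}
example2 = {"Phi1": (0.30, ".."),  "Phi2": (0.25, ".-"),
            "Phi3": (0.25, "-."),  "Phi4": (0.20, "--")}

for name, table in [("first", example1), ("second", example2)]:
    bits = average_bits(table)        # 1.29 and 2.0 bits per symbol
    rate = 1 / bits                   # ~0.78 and 0.5 symbols per second
    print(f"{name} example: {bits:.2f} bits/symbol, {rate:.2f} symbols/s")

# Improvement of the first code over the naive 2-bit-per-symbol encoding:
print(f"improvement: {2 / average_bits(example1) - 1:.0%}")   # ~55%
```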

3. Data compression.

3.1. Grouping. You may have noticed that the transmission codes we have invented do not always seem to be the most efficient. Consider for example the situation where we have two symbols, Ψ_1 and Ψ_2. Our scheme will produce a rate of 1 symbol per second, no matter what the probabilities of Ψ_1 and Ψ_2 are. This looks wasteful, however, in the case that one of the symbols is very infrequent. Imagine, for example, that Ψ_1 occurs 90% of the time and Ψ_2 occurs only 10% of the time. Why should we allocate each of them a whole bit? Clearly, Ψ_1 should get a shorter codeword than Ψ_2. Of course, the problem is that we don't know how to make codewords shorter than 1 bit.

It is clear that this happens because we have too few symbols. The solution is to accumulate symbols and encode blocks of several symbols at once. In the case above, let us consider the frequencies of the various pairs of symbols (we'll assume that symbols occur independently; more on this later):

Pair of symbols  Ψ_1 Ψ_1   Ψ_1 Ψ_2   Ψ_2 Ψ_1   Ψ_2 Ψ_2
Frequency        81%       9%        9%        1%

Let us denote by Φ_1 the pair Ψ_1 Ψ_1, by Φ_2 the pair Ψ_1 Ψ_2, by Φ_3 the pair Ψ_2 Ψ_1, and by Φ_4 the pair Ψ_2 Ψ_2. We have seen that we can transmit 4 symbols with such frequencies at a rate of 0.78 symbols per second. Since transmitting a single symbol Φ_i amounts to transmitting a pair of symbols Ψ_j Ψ_k, the transmission rate for the original symbols Ψ_1, Ψ_2 is 2 · 0.78 = 1.56 characters per second, a 56% improvement over what we could do before.

One can further improve the situation by accumulating more symbols. How well can you do? The answer is given by the following theorem of Shannon:

Theorem 1. Assume that symbols Σ_1, ..., Σ_n occur with frequencies f_1, ..., f_n. Let H = -(f_1 log_2 f_1 + ... + f_n log_2 f_n). Then:
(a) It is not possible to transmit these symbols at an average rate of more than 1/H symbols per second.
(b) For any ε > 0, there is a collection of codewords that permits you to transmit at a rate of at least 1/H - ε symbols per second.

In our example, H = -(0.1 log_2 0.1 + 0.9 log_2 0.9) ≈ 0.47, so the optimal transmission rate predicted by Shannon's theorem is approximately 2.1 characters per second.

3.2. Problem. (a) What rate do you get if you accumulate 3 characters? (b) Compute the maximal transmission rate for the English alphabet.
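Here is a minimal Python sketch that puts numbers to Theorem 1 for the 90%/10% source above: it computes the entropy H, the bound 1/H on the rate, and compares it with the rates achieved by coding one symbol at a time (1 bit per symbol) and by coding pairs with the four-symbol code of 2.2.3 (1.29 bits per pair); the helper name entropy is an arbitrary choice.

```python
import math

def entropy(frequencies):
    """H = -(f_1 log2 f_1 + ... + f_n log2 f_n), in bits per symbol."""
    return -sum(f * math.log2(f) for f in frequencies if f > 0)

freqs = [0.9, 0.1]                 # Psi_1 occurs 90% of the time, Psi_2 10%
H = entropy(freqs)                 # ~0.47 bits per symbol
print(f"H = {H:.3f} bits/symbol, Shannon bound = {1/H:.2f} symbols/second")

# Coding each symbol separately costs 1 bit per symbol: rate 1 symbol/second.
print(f"one symbol at a time: {1/1.0:.2f} symbols/second")

# Coding pairs with the four-symbol code of 2.2.3 costs 1.29 bits per pair,
# i.e. 0.645 bits per original symbol: rate ~1.55 symbols/second.
bits_per_pair = 0.81 * 1 + 0.09 * 2 + 0.09 * 3 + 0.01 * 3
print(f"pairs of symbols:     {2/bits_per_pair:.2f} symbols/second")
```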

3.3. The letters are not independent. Once we decide to aggregate symbols, we might as well note that the frequency of a given pair need not be the product of the frequencies of the constituent symbols. For example, the frequency of the pair QU in English is much higher than the product of the frequencies of Q and U. This is due to the fact that the various letters in English are correlated. Thus, in making the table of frequencies of pairs, we might as well take the real frequencies, as they occur in English.

3.4. Application: Data compression. The procedure we described is actually quite close to how many lossless compression programs (such as ZIP) work. Computer data is naturally stored in bits; however, not all bit patterns are equally likely to occur in a given file (for example, the bit patterns 0000000000000000 and 1000000010000000 never occur in plain English text files). Thus our situation is akin to that of 3.1. We group bits into blocks of some length (let's say 16 bits, which results in 65,536 possible blocks) and think of each block as a separate symbol. We then analyze the frequencies with which the various blocks occur in the original file. Next, we find an encoding scheme as in 2.2; in other words, we determine the codewords for each of our 65,536 symbols.

We now write the compressed file. First, we write all the codewords at the beginning of the file. Then we read data from the original file and encode it using our encoding. If the transmission rate of our encoding is sufficiently high, the resulting file will be smaller than the original one (even counting the extra space we need to save the table of codewords). (Why?) Unzipping the file amounts to decoding the contents of the file using the symbol table.

4. Notes and further reading.

The remarkable quantity H = -(f_1 log_2 f_1 + ... + f_n log_2 f_n) is called entropy. The notion of entropy arose in physics, in Boltzmann's treatment of thermodynamics. Amazingly enough, ideas from thermodynamics can be applied in information theory (and elsewhere in mathematics); this is in essence the basis of Shannon's work. For further reading on this topic, consider the following books:

R. Ash, Information Theory, Dover Publications, New York (reprint of the 1965 Interscience Publishers edition). The first several chapters of this book require very little background, other than perhaps some basic understanding of elementary probability theory.

C. Shannon and W. Weaver, The Mathematical Theory of Communication, University of Illinois Press, 1963. This book contains both an expository introduction by W. Weaver and the book form of Shannon's original paper from 1948.

Dimitri Shlyakhtenko, Department of Mathematics, UCLA.