Chapter 1

INTRODUCTION TO SOURCE CODING AND CHANNEL CODING

1.1 SOURCE CODING

Whether a source is analog or digital, a digital communication system is designed to transmit information in digital form. Consequently, the output of the source must be converted to a format that can be transmitted digitally. This conversion of the source output to digital form is generally performed by the source encoder, whose output may be assumed to be a sequence of binary digits. Optimum coding aims to match the source and the channel for maximum reliable information transfer. The coding process involves two distinct operations, namely encoding and decoding. The source encoder/decoder units match the source to the equivalent noiseless channel, provided that the source information rate falls within the channel capacity.

The discrete information source of entropy H(x) and source symbol rate r is connected to the source encoder. The binary encoder converts incoming source symbols to code words consisting of binary digits produced at some fixed rate rb. The encoder output looks like a binary source with entropy Ω(p) and information rate rb Ω(p) ≤ rb log2 2 = rb. Coding neither generates additional information nor destroys information, provided that the code is uniquely decipherable. Thus, equating the input and output information rates of the encoder,
R = r H(x) = rb Ω(p) ≤ rb                  (1.1)

or

rb/r ≥ H(x)                                (1.2)

The quantity rb/r = N is an important parameter called the average code length. Physically, N corresponds to the average number of binary digits per source symbol. The average code length is given by

N = Σ (i=1 to m) pi Ni                     (1.3)

where pi is the probability of the i-th symbol and Ni is the number of code elements in the code word of the i-th symbol.

A common characteristic of information-bearing signals generated by a physical source (e.g. speech signals) is that, in their natural form, they contain a certain amount of redundant information, the transmission of which results in inefficient, non-optimal use of transmitted power and channel bandwidth, the primary resources in any communication system. For efficient signal transmission, the redundant information should be removed from the information-bearing signal prior to transmission. Source coding tries to remove this redundancy present in the symbols.

Source Coding Theorem: The source coding theorem states that, given a discrete memoryless source with m symbols of probabilities p1, p2, ..., pm, characterized by a certain amount of entropy H(x), it is possible to construct a code that satisfies the prefix condition and has an average code length N satisfying the inequalities

H(x) ≤ N < H(x) + 1.
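The entropy, the average code length of Eq. (1.3) and the source coding theorem bound can be checked numerically. The sketch below uses an illustrative six-symbol source and an illustrative set of code lengths (not figures taken from the text):

```python
import math

# Hypothetical six-symbol source and a candidate set of code lengths N_i.
# Both lists are illustrative, chosen only to exercise Eqs. (1.3) and the
# source coding theorem bound.
p = [0.4, 0.2, 0.15, 0.1, 0.1, 0.05]
N_i = [1, 3, 3, 3, 4, 4]              # binary digits per code word

# Entropy H(x) = -sum p_i log2 p_i   (bits per source symbol)
H = -sum(pi * math.log2(pi) for pi in p)

# Average code length, Eq. (1.3): N = sum p_i N_i
N = sum(pi * ni for pi, ni in zip(p, N_i))

print(f"H(x) = {H:.4f} bits/symbol, N = {N:.2f}")   # H(x) = 2.2842, N = 2.35
print(H <= N < H + 1)                               # True: bound is met
```

These particular lengths also satisfy the Kraft inequality (2^-1 + 3·2^-3 + 2·2^-4 = 1), so a prefix code with them does exist.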
In theory, optimum source coding achieves the lower bound N = H(x). In practice, sub-optimum coding results in N > H(x) for a reasonably efficient code. The ratio

R/rb = H(x)/N ≤ 1                          (1.4)

is called the efficiency of source coding.

Source coding deals with the task of forming efficient descriptions of information sources. Efficient descriptions permit a reduction in the memory or bandwidth resources required to store or to transport sample realizations of the source data. The goal of source coding is to form a good-fidelity description of the source for a given available bit rate. The system advantage of source coding is the reduced need for the system resources of bandwidth and/or energy per bit required to deliver a description of the source. This advantage is obtained in exchange for two other system resources: computational complexity and memory. In the past decades, as the cost of these latter resources has continued to fall, the technique of source coding promises to play an ever-increasing role in future communication and storage systems.

As a result of source coding, a significant amount of data compression can be realized when there is a wide difference in the probabilities of the source symbols. To achieve this compression, there must also be a sufficiently large number of symbols. In order to have a large set of symbols, sometimes a new set of symbols is formed which
can be derived from the original set. The source codes generated for this set are called extension codes.

Source coding has been an area of intense research activity since the publication of Shannon's classic paper in 1948 and the paper by Huffman published in 1952. Over the years, major advances have been made in the development of highly efficient source data compression algorithms. Of particular significance is the research on universal source coding and universal quantization published by Ziv (1985), Ziv and Lempel (1977, 1978), Gray (1975) and Davisson et al. (1981).

A code is a set of vectors called code words. A fixed length code (FLC) is a code in which all the code words have the same length. A variable length code (VLC) is a code in which the code words are not all of the same length. A prefix code is one in which no code word forms the prefix of any other code word. Such codes are also called uniquely decipherable or instantaneous codes.

Kraft inequality: A necessary and sufficient condition for the existence of a binary code with code words having lengths n1 ≤ n2 ≤ ... ≤ nm that satisfy the prefix condition is

Σ (i=1 to m) 2^(-ni) ≤ 1                   (1.5)
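Eq. (1.5) is straightforward to evaluate for any proposed set of code-word lengths. The sketch below (the two length sets are illustrative) computes the Kraft sum and checks it against the bound:

```python
# Kraft inequality, Eq. (1.5): a binary prefix code with code-word lengths
# n_1, ..., n_m exists if and only if sum of 2^(-n_i) <= 1.
def kraft_sum(lengths):
    """Return the Kraft sum for a proposed list of code-word lengths."""
    return sum(2 ** -n for n in lengths)

print(kraft_sum([1, 2, 3, 3]))   # 1.0  -> a prefix code exists
print(kraft_sum([1, 1, 2]))      # 1.25 -> no prefix code is possible
```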
1.2 CHANNEL CODING

Channel coding refers to the class of signal transformations designed to improve communications performance by enabling the transmitted signals to better withstand the effects of various channel impairments such as noise, interference and fading. In this information age, there is an ever-increasing necessity not only for speed but also for accuracy in the storage, retrieval and transmission of data. Imperfect channels or media through which messages are transmitted cause errors in the received messages. Channel coding is a technique by which these errors can be detected or even corrected. Error correcting codes offer a kind of safety net: the mathematical insurance against the vagaries of an imperfect communication channel. Various types of noise corrupt the data being transmitted. Error correcting codes are used for correcting errors that occur when messages are transmitted over a noisy channel or when stored data is retrieved. Since these error correcting codes try to overcome the harmful effects of noise in the channel, the encoding procedure is called channel coding.

Channel coding is of two types: 1. waveform coding and 2. structured sequences (or structured redundancy). Waveform coding deals with transforming waveforms into better waveforms to make the detection process less subject to errors. Duobinary signaling is an
example of the waveform coding type of channel coding. Structured sequences deal with transforming data sequences into better sequences having structured redundancy (redundant bits). In this type of channel coding, some redundancy is added in the form of extra (parity) bits to the data bits in a controlled manner prior to transmission through a noisy channel. The redundant bits can then be used for the detection and correction of errors. Linear block codes, BCH codes and Reed-Solomon codes are a few examples of this second category of channel codes.

The following are the important aspects of channel coding, also known as error control coding, making use of structured sequences. It is possible to detect and correct errors by adding extra bits, called error check bits or parity check bits, to the message bit stream. Because of the additional bits, not all bit sequences will constitute bona fide messages. It is not possible to detect and correct all errors. Addition of extra bits reduces the effective data rate through the channel. Quantitatively, the rate efficiency of a coding scheme is defined as rb/rc.
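As a minimal illustration of structured redundancy, the sketch below appends a single even-parity bit to a block of data bits. The function names are hypothetical, and a single parity bit can only detect (not correct or locate) an odd number of bit errors:

```python
# Single even-parity check bit: the simplest structured-sequence code.
def encode(data_bits):
    """Append one parity bit so that the code word has even weight."""
    parity = sum(data_bits) % 2
    return data_bits + [parity]

def has_error(code_word):
    """Report True if the received word fails the even-parity check."""
    return sum(code_word) % 2 != 0

word = encode([1, 0, 1, 1])      # -> [1, 0, 1, 1, 1]
print(has_error(word))           # False: word arrived intact
word[2] ^= 1                     # flip one bit "in the channel"
print(has_error(word))           # True: the single error is detected
```

Here 4 data bits become a 5-bit code word, so the rate efficiency rb/rc of this toy scheme is 4/5.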
1.3 CLASSIFICATION OF DISCRETE SOURCES ACCORDING TO THE PROBABILITIES OF THEIR SYMBOLS

Discrete information sources can be classified into two types depending on the relative values of the probabilities of their discrete symbols. A discrete source of m symbols is described by its probabilities p1, p2, p3, ..., pm arranged in descending order. If

Pi = Σ (j=i+1 to m) pj < p(i-1)            (1.6)

for every i in the range 2 ≤ i ≤ (m-2), then the source is defined as a Type I source. If

Pi = Σ (j=i+1 to m) pj ≥ p(i-1)            (1.7)

for at least one value of i in the range 2 ≤ i ≤ (m-2), then the source is defined as a Type II source.

Example 1.1: A six-symbol source with the probabilities p1 = 0.42, p2 = 0.30, p3 = 0.13, p4 = 0.07, p5 = 0.05, p6 = 0.03 is to be tested to determine whether it is Type I or Type II.

pi      Pi
0.42    0.58
0.30    0.28
0.13    0.15
0.07    0.08
0.05    0.03
0.03    0.00

Since P2 < p1, P3 < p2 and P4 < p3, the source is a Type I source.
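The test of Eqs. (1.6) and (1.7) can be sketched directly; applied to the probabilities of Example 1.1, it reports Type I. The helper name is illustrative:

```python
# Classify a discrete source as Type I or Type II per Eqs. (1.6)-(1.7).
# p must already be sorted in descending order.
def source_type(p):
    m = len(p)
    for i in range(2, m - 1):        # 1-based i in the range 2 .. m-2
        P_i = sum(p[i:])             # tail sum p_{i+1} + ... + p_m
        if P_i >= p[i - 2]:          # compare against p_{i-1}
            return "Type II"         # Eq. (1.7) holds for this i
    return "Type I"                  # Eq. (1.6) held for every i

# Example 1.1 source:
print(source_type([0.42, 0.30, 0.13, 0.07, 0.05, 0.03]))   # Type I
```

A near-uniform source, e.g. [0.2, 0.2, 0.2, 0.2, 0.1, 0.1], fails Eq. (1.6) already at i = 2 and is classified Type II.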
Example 1.2: Consider the case of transmitting the following English text message consisting of several letters and special characters like space and dot.

Source coding helps to describe the information sources more efficiently by removing the redundancy between the messages. The requirements of memory or bandwidth resources needed to store or to transmit the messages are reduced by source coding.

This source is to be tested to determine whether it is Type I or Type II.

Number of distinct symbols in the message: 24
Total number of characters in the string: 245

The number of times each symbol occurs in the text is counted, from which the probability pi of each symbol si is calculated. From these values of pi, the values of Pi are calculated. All these details are shown in Table 1.1 below.
Table 1.1: DISCRETE SYMBOLS AND THEIR PROBABILITIES OF AN ENGLISH TEXTUAL MESSAGE

Sl. No   Symbol    Frequency   Probability pi   Pi
1        E         36          0.14693          0.85307
2        (space)   36          0.14693          0.70614
3        R         19          0.07755          0.62859
4        O         18          0.07346          0.55513
5        S         17          0.06940          0.48573
6        T         16          0.06530          0.42043
7        N         13          0.05306          0.36737
8        D         11          0.04490          0.32247
9        I         11          0.04490          0.27757
10       C         10          0.04081          0.23676
11       M         9           0.03673          0.20003
12       A         7           0.02858          0.17145
13       H         7           0.02858          0.14287
14       U         7           0.02857          0.11430
15       B         5           0.02040          0.09390
16       G         5           0.02040          0.07350
17       Y         5           0.02040          0.05310
18       F         4           0.01632          0.03678
19       L         2           0.00816          0.02862
20       W         2           0.00816          0.02046
21       . (dot)   2           0.00816          0.01230
22       P         1           0.00410          0.00820
23       Q         1           0.00410          0.00410
24       V         1           0.00410          0.00000

It is observed from Table 1.1 that
P22 = 0.00820 > p21 = 0.00816 and
P21 = 0.01230 > p20 = 0.00816.
Hence this is a Type II source.
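The statistics behind Table 1.1 and the Type II verdict can be reproduced approximately from the quoted message. Note that the exact counts depend on how case, spaces and punctuation are normalized, so they may differ slightly from the table:

```python
from collections import Counter

# The message of Example 1.2, with letter case folded together as in
# Table 1.1. Whitespace/punctuation normalization here is an assumption.
text = ("Source coding helps to describe the information sources more "
        "efficiently by removing the redundancy between the messages. "
        "The requirements of memory or bandwidth resources needed to "
        "store or to transmit the messages are reduced by source coding.")

freq = Counter(text.upper())
total = sum(freq.values())

# Symbol probabilities in descending order, as required by Section 1.3.
p = sorted((n / total for n in freq.values()), reverse=True)

# Tail sums: P[k] corresponds to P_{k+1} in the 1-based notation.
P = [sum(p[k + 1:]) for k in range(len(p))]

# Type II test, Eq. (1.7): P_i >= p_{i-1} for some i in 2 .. m-2.
m = len(p)
is_type2 = any(P[i - 1] >= p[i - 2] for i in range(2, m - 1))
print(is_type2)                  # True: a long text source is Type II
```

With this many symbols the condition already holds at i = 2, since P2 = 1 - p1 - p2 far exceeds p1, consistent with the Type II conclusion above.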