Communications Overhead as the Cost of Constraints


J. Nicholas Laneman and Brian P. Dunn
Department of Electrical Engineering, University of Notre Dame
Email: {jnl,bdunn}@nd.edu

Abstract—This paper speculates on a perspective for studying overhead in communication systems that contrasts with the traditional viewpoint that overhead is the non-data portion of transmissions. By viewing overhead as the cost of constraints imposed on a system, information-theoretic techniques can be used to obtain fundamental limits on overhead information, and multiple constraints lead to an intriguing chain rule for overhead. In principle, protocol overhead in practical implementations can then be benchmarked against these fundamental limits in order to identify opportunities for improvement. Several examples are discussed within this developing framework.

I. INTRODUCTION

Because overhead can reduce the efficiency of a protocol, it is often considered a cost on the system. But it is rarely the case that the efficiency of a protocol can be improved simply by replacing overhead bits with data bits. Consider a protocol that encodes a user's messages onto n-bit packets for transmission over a noisy channel. Some of the drawbacks associated with considering overhead to be the non-data portion of a packet are:

- Non-data bits may be explicitly required to decode data bits. For example, a portion of the n-bit packet may be used to specify the rate of a forward error correction (FEC) code used to encode the user's message onto the remaining bits. Without knowledge of the FEC code's rate, the decoder does not know the size of the message that was sent, and the message cannot be decoded.
- Replacing non-data bits with additional data bits may increase the probability of error. For example, if a systematic FEC code is used to transmit k data bits, replacing the error-control bits with additional data bits will increase the probability of error for the original k data bits.
- It can be meaningless to explicitly distinguish data and non-data bits. For example, many systematic FEC codes have non-systematic equivalents that provide identical error-control performance; from the non-data viewpoint, the parity bits of the systematic code would be considered overhead, but it is less natural to define overhead from this perspective for the non-systematic code.

The conclusion we draw from these observations is that defining what portion of a protocol is data versus what portion is overhead may not be the most relevant distinction. If the purpose of distinguishing overhead from data is to understand what gains may be realized by an improved protocol with a lower overhead cost, it seems preferable to define overhead explicitly as such. Accordingly, we consider overhead cost to be the reduction in system performance that results from a constraint on the design of a protocol or system.

This paper establishes an operational definition for the overhead cost of a system constraint as the difference between baseline system performance and constrained system performance. This perspective was initially developed in [1], and was inspired by Gallager's treatment of protocol information in [2]. Loosely speaking, Gallager's approach in [2] involves applying a constraint to a source coding problem and identifying the resulting increase in rate as the protocol overhead for the system.
In a channel coding problem, we might expect that additional constraints would decrease the rate of communication, and it would be natural to identify this decrease in rate as the protocol overhead. These observations suggest a broader theme for communications overhead that we attempt to develop in the sequel.

The remainder of this paper is organized as follows. Section II defines block codes for a communication channel and exemplifies a number of constraints on encoders and decoders. Section III recalls the definition of channel capacity and defines the overhead cost of a constraint in terms of channel capacity. Section IV provides a few computed examples and illustrations.

II. CODES AND CONSTRAINTS

In this section, we establish notation for channels, codes, and a variety of constraints that we explore in later sections. Consider a channel modeled by the sequence of conditional distributions $P_{Y^n|X^n}(y^n|x^n)$ on the inputs $X^n \in \mathcal{X}^n$ and outputs $Y^n \in \mathcal{Y}^n$, $n = 1, 2, \ldots$ For integers $M, N > 1$, an $(M, N)$ code consists of

- a message set $\mathcal{W} := \{1, 2, \ldots, M\}$
- an encoder $f : \mathcal{W} \to \mathcal{X}^N$
- a decoder $g : \mathcal{Y}^N \to \mathcal{W}$

The rate of an $(M, N)$ code is $R := \log_2(M)/N$ bits per channel use. The average probability of error for an $(M, N)$ code over channel $P_{Y^N|X^N}(y^N|x^N)$ with message $W \in \mathcal{W}$ distributed according to $P_W(w)$ is $P[g(Y^N) \neq W]$, computed over the joint distribution $P_W(w)\, P_{X^N|W}(x^N|w)\, P_{Y^N|X^N}(y^N|x^N)$ with $P_{X^N|W}(x^N|w) = 1$ if $x^N = f(w)$.
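To make these definitions concrete, here is a minimal Python sketch (the function names and the toy codebook are our own illustrations, not from the paper) that instantiates a small $(M, N)$ code, computes its rate, and estimates the average probability of error over a binary symmetric channel by Monte Carlo simulation.

```python
import math
import random

# A toy (M, N) block code over the binary alphabet: M = 4 messages, N = 6
# channel uses. This codebook is an illustrative choice, not from the paper.
CODEBOOK = {
    1: (0, 0, 0, 0, 0, 0),
    2: (0, 0, 0, 1, 1, 1),
    3: (1, 1, 1, 0, 0, 0),
    4: (1, 1, 1, 1, 1, 1),
}
M, N = len(CODEBOOK), 6
RATE = math.log2(M) / N  # R = log2(M)/N bits per channel use

def bsc(x, p, rng):
    """N independent uses of a BSC with crossover probability p."""
    return tuple(b ^ (rng.random() < p) for b in x)

def decode_min_distance(y):
    """For the BSC with p < 1/2, ML decoding is minimum Hamming distance."""
    return min(CODEBOOK, key=lambda w: sum(a != b for a, b in zip(CODEBOOK[w], y)))

def avg_error_probability(p, trials=20000, seed=1):
    """Monte Carlo estimate of P[g(Y^N) != W] with W uniform on the message set."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        w = rng.randrange(1, M + 1)   # W uniform on {1, ..., M}
        y = bsc(CODEBOOK[w], p, rng)  # N uses of the BSC
        errors += decode_min_distance(y) != w
    return errors / trials

print(f"R = {RATE:.3f} bits/channel use, Pe(p=0.05) ~ {avg_error_probability(0.05):.4f}")
```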

For an $(M, N)$ code with $w \in \mathcal{W}$, let $x^N(w) = f(w)$ denote the vector of outputs of the encoder, and let $x_k(w)$ denote the $k$-th element of $x^N(w)$, $k = 1, 2, \ldots, N$. We will use $f(w)$, $x^N(w)$, and $x_k(w)$ in different contexts to refer to the encoder.

Let $\Gamma(M, N)$ denote the set of all $(M, N)$ codes. In the sequel, we will constrain this set of codes in various ways. If the indexes are clear from the context, we drop them and simply denote this set by $\Gamma$.

A. Encoding Constraints

Encoding constraints are common in communication systems. Several examples that we discuss in some detail include:

- An input constraint for a subset $S \subseteq \mathcal{X}$, denoted $\Gamma_S$, restricts the encoder so that $f : \mathcal{W} \to S^N$.
- An average power constraint $P$, denoted $\Gamma_P$, restricts the encoder so that $\frac{1}{N}\sum_{k=1}^{N} |x_k(w)|^2 \leq P$ for each $w \in \mathcal{W}$.
- A repetition coding constraint of order $L$, where $L > 1$ is an integer, denoted $\Gamma_{RE,L}$, restricts an encoder so that $x_{kL+l+1}(w) = x_{kL+1}(w)$ for $l = 1, \ldots, L-1$, $k = 0, 1, 2, \ldots$, and all $w \in \mathcal{W}$. That is, symbols occur in runs of length $L$ in the output of the encoder.

More generally, we can consider linear block codes, convolutional codes, and concatenated codes as imposing constraints on the encoder of an $(M, N)$ code. Such involved examples are of course important, but beyond the scope of this paper.

B. Decoding Constraints

Decoding constraints are also common, though we tend to emphasize them less explicitly than encoding constraints. Several examples that we discuss in detail include:

- Maximum a posteriori (MAP) decoding, denoted $\Gamma_{MAP}$, restricts the decoder to the form $g_{MAP}(y^N) = \arg\max_{w \in \mathcal{W}} P_{W|Y^N}(w|y^N)$, which depends upon an a priori distribution $P_W(w)$ on the encoded message, the encoder $f$, and the channel law $P_{Y^N|X^N}(y^N|x^N)$. It is well known that MAP decoding minimizes the average probability of error.
- Maximum likelihood (ML) decoding, denoted $\Gamma_{ML}$, restricts the decoder to the form $g_{ML}(y^N) = \arg\max_{w \in \mathcal{W}} P_{Y^N|X^N}(y^N|f(w))$, which depends upon the encoder $f$ and the channel law $P_{Y^N|X^N}(y^N|x^N)$. It is well known that ML decoding corresponds to MAP decoding if $P_W(w)$ is a uniform distribution; therefore, ML decoding minimizes the average probability of error in this case.
- Joint typicality (JT) decoding, denoted $\Gamma_{JT}$, restricts the decoder to the form $g_{JT}(y^N) = \min\{ w \in \mathcal{W} : (f(w), y^N) \in T(X^N, Y^N) \}$, where $T(X^N, Y^N)$ is the jointly typical set for the joint distribution $P_{X^N}(x^N)\, P_{Y^N|X^N}(y^N|x^N)$ [3]. Here the distribution $P_{X^N}(x^N)$ is arbitrary, at least in principle, and the decoder depends upon it, the encoder $f$, and the channel law $P_{Y^N|X^N}(y^N|x^N)$.
- Hard-decision decoding (HDD), denoted $\Gamma_{HDD}$, corresponds to marginally estimating $X_k$ as $\hat{X}_k$ from $Y_k$, i.e., symbol-by-symbol demodulation, and then applying some form of decoding to $\hat{X}^N$ to detect $W$.

C. Compatible Constraints

From our discussion of encoding and decoding constraints above, it should be clear that we can consider multiple constraints to restrict the class of $(M, N)$ codes. However, it makes sense to ensure that multiple constraints are compatible. This motivates the following definition.

Definition 1: Two constraints $\Gamma_1$ and $\Gamma_2$ are compatible if the set of $(M, N)$ codes satisfying both constraints is nonempty, i.e., $\Gamma_1 \cap \Gamma_2 \neq \emptyset$.

Unless we state otherwise, two or more constraints imposed on the same code are assumed to be compatible.

III. CAPACITY AND OVERHEAD COST

In this section, we define a notion of overhead cost for a code constraint $\Gamma_1 \subseteq \Gamma$.
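As an illustration of the decoding constraints, the following sketch (again with a toy code of our own; nothing here is prescribed by the paper) implements $g_{MAP}$ and $g_{ML}$ for a short code over a memoryless BSC and confirms that the two rules coincide under a uniform prior, as noted above.

```python
def bsc_likelihood(y, x, p):
    """P_{Y^N|X^N}(y|x) for N independent uses of a BSC with crossover p."""
    d = sum(a != b for a, b in zip(x, y))  # Hamming distance
    return (p ** d) * ((1 - p) ** (len(y) - d))

def g_map(y, codebook, prior, p):
    """MAP decoder: maximize P_{W|Y^N}(w|y), proportional to prior(w) * likelihood."""
    return max(codebook, key=lambda w: prior[w] * bsc_likelihood(y, codebook[w], p))

def g_ml(y, codebook, p):
    """ML decoder: maximize the channel likelihood alone."""
    return max(codebook, key=lambda w: bsc_likelihood(y, codebook[w], p))

codebook = {1: (0, 0, 0), 2: (1, 1, 1)}  # encoder outputs: runs of length L = 3
uniform = {1: 0.5, 2: 0.5}
for y in [(0, 0, 1), (1, 0, 1), (1, 1, 0)]:
    assert g_map(y, codebook, uniform, p=0.1) == g_ml(y, codebook, p=0.1)
print("MAP and ML decisions coincide under a uniform prior.")
```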
We formulate this definition relative to channel capacity, but emphasize that overhead cost could be formulated in terms of other (fundamental) performance metrics.

Definition 2: A rate $R$ is achievable subject to constraint $\Gamma_1 \subseteq \Gamma$ if there exists a sequence of $(2^{NR}, N)$ codes satisfying constraint $\Gamma_1$ with average error probability tending to zero as $N$ tends to infinity.

Definition 3: The channel capacity subject to constraint $\Gamma_1 \subseteq \Gamma$, denoted $C_{\Gamma_1}$, is the supremum of the rates achievable subject to constraint $\Gamma_1$. For $\Gamma_1 = \Gamma$, achievability subject to $\Gamma_1$ and $C_{\Gamma_1}$ correspond to the conventional definitions of achievability and channel capacity, respectively.

Definition 4: The overhead cost of constraint $\Gamma_1 \subseteq \Gamma$, denoted $O_{\Gamma_1}$, is defined as $O_{\Gamma_1} := C_\Gamma - C_{\Gamma_1}$.

Clearly, we are treating overhead cost as a rate of information, which may not be appropriate in all settings. This treatment works if the performance metric is channel capacity, or $\epsilon$-capacity for a given $\epsilon > 0$ [4], [5]. We stress that we have defined overhead cost in terms of the operational definition of channel capacity. This is important because, depending upon the complexity of the constraint, we may have different representations of the information capacity of the channel. For example, a general formula for channel capacity subject to general constraints is given in [4], [5]. An additive constraint over a memoryless channel leads to a single-letter expression for the channel capacity [4], [5].
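The capacities above are operational quantities, but for an unconstrained discrete memoryless channel the matching information capacity can be computed numerically. The sketch below (our illustration, not part of the paper) uses the classical Blahut-Arimoto iteration and checks the result against the closed form $C_{BSC}(p) = 1 - h(p)$ used in Section IV.

```python
import numpy as np

def blahut_arimoto(W, iters=500):
    """Approximate the capacity (bits/use) of a DMC with transition matrix
    W[x, y] = P(y|x) via the Blahut-Arimoto fixed-point iteration."""
    nx = W.shape[0]
    p = np.full(nx, 1.0 / nx)  # input distribution, initialized uniform
    for _ in range(iters):
        q = p @ W              # induced output distribution P(y)
        # Per-input relative entropy D(W(.|x) || q), in bits.
        d = np.sum(W * np.log2(W / q, where=W > 0, out=np.zeros_like(W)), axis=1)
        p = p * np.exp2(d)     # multiplicative update
        p /= p.sum()
    q = p @ W
    d = np.sum(W * np.log2(W / q, where=W > 0, out=np.zeros_like(W)), axis=1)
    return float(p @ d)

# Sanity check against C_BSC(p) = 1 - h(p) at p = 0.1.
p = 0.1
W = np.array([[1 - p, p], [p, 1 - p]])
h = -p * np.log2(p) - (1 - p) * np.log2(1 - p)
print(blahut_arimoto(W), 1 - h)  # both ~ 0.531 bits per channel use
```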

A. Interpretation of Coding Constraints as Overhead Cost

Among the example constraints we have mentioned, it is perhaps easiest to interpret repetition coding $\Gamma_{RE,L}$ as inducing overhead cost. On a discrete noiseless channel, i.e., $Y_n = X_n$, repetition coding of order $L > 1$ has overhead cost $\frac{L-1}{L}\log_2|\mathcal{X}|$. We will show an example in Section IV for which $O_{\Gamma_{RE,L}}$ varies between 0 and this maximal value as a function of the channel parameters.

Consider the AWGN channel with $\mathcal{X} = \mathbb{R}$, average power constraint $\Gamma_P$, and input constraint $S = \{+\sqrt{P}, -\sqrt{P}\}$. It is less conventional to interpret the input constraint as inducing overhead cost, but fundamentally there is no difference between this constraint and the type of constraint introduced for repetition or other codes.

For $\Gamma_1 \subseteq \Gamma$ corresponding to any of the decoding constraints described in Section II-B, if $O_{\Gamma_1} > 0$, then we can interpret this overhead as additional redundancy in the encoding required to ensure reliable decoding by the constrained decoder $\Gamma_1$. It is well known, however, that both maximum-likelihood and joint-typicality decoding achieve the capacity [6], [3], so that $O_{\Gamma_{ML}} = O_{\Gamma_{JT}} = 0$. We will provide an example in Section IV for which $O_{\Gamma_{HDD}} > 0$.

B. Overhead is Additive

There are many situations in which we may want to impose more than one constraint on a code.

Definition 5: The overhead cost of constraint $\Gamma_2 \subseteq \Gamma$ relative to constraint $\Gamma_1 \subseteq \Gamma$, denoted $O_{\Gamma_2|\Gamma_1}$, is defined as $O_{\Gamma_2|\Gamma_1} := C_{\Gamma_1} - C_{\Gamma_1 \cap \Gamma_2}$.

If $\Gamma_1 \subseteq \Gamma_2$, then $O_{\Gamma_2|\Gamma_1} = 0$, and, in particular, $O_{\Gamma_1|\Gamma_1} = 0$. On the other hand, if $\Gamma_2 \subseteq \Gamma_1$, then $O_{\Gamma_2|\Gamma_1} = C_{\Gamma_1} - C_{\Gamma_2}$, and, in particular, $O_{\Gamma_2|\Gamma} = O_{\Gamma_2}$.

With these definitions, we have a notion of additivity for overhead and relative overhead. Specifically, we have the following chain rule.

Proposition 1 (Chain Rule for Overhead):
$$O_{\Gamma_1 \cap \Gamma_2} = O_{\Gamma_1} + O_{\Gamma_2|\Gamma_1} = O_{\Gamma_2} + O_{\Gamma_1|\Gamma_2}.$$

Proof: From the definition of overhead cost,
$$O_{\Gamma_1 \cap \Gamma_2} = C_\Gamma - C_{\Gamma_1 \cap \Gamma_2} = (C_\Gamma - C_{\Gamma_1}) + (C_{\Gamma_1} - C_{\Gamma_1 \cap \Gamma_2}) = O_{\Gamma_1} + O_{\Gamma_2|\Gamma_1}.$$
Adding and subtracting $C_{\Gamma_2}$ instead of $C_{\Gamma_1}$ yields the other direction.

Fig. 1. The binary symmetric channel (BSC) with crossover probability p.

Fig. 2. Illustration of two uses of a binary symmetric channel (BSC) with crossover probability p under repetition coding of order L = 2 as a single use of the binary symmetric erasure channel (BSEC).

IV. EXAMPLES

In this section we give a few simple examples to illustrate how the conceptual framework established in the previous section can be used to compute overhead cost, and to highlight how it differs from the non-data viewpoint on overhead.

A. Binary Symmetric Channel with Repetition Coding of Order L = 2

Consider communication over the binary symmetric channel (BSC) with crossover probability $p$, shown in Figure 1. The capacity of the BSC is given by
$$C_{BSC}(p) = 1 - h(p), \qquad (1)$$
where $h(p) := -p \log_2 p - (1-p)\log_2(1-p)$ denotes the binary entropy function. In order to model redundancy that has been added by a higher layer, assume that the encoder must operate subject to a repetition coding constraint in which each pair of consecutive inputs to the channel are two identical symbols, i.e., $\Gamma_{RE,2}$. Two uses of the BSC with the same input symbol are equivalent to a single use of the binary symmetric erasure channel (BSEC) shown in Figure 2. By symmetry of the BSEC, a uniform input distribution is optimal and the constrained capacity can be computed as
$$C_{\Gamma_{RE,2}}(p) = \frac{1}{2}\left[ h^{(3)}\!\left(\tfrac{1}{2} - p(1-p),\; \tfrac{1}{2} - p(1-p)\right) - h^{(3)}\!\left(p^2,\; (1-p)^2\right) \right],$$
where
$$h^{(3)}(p_1, p_2) := -p_1 \log_2 p_1 - p_2 \log_2 p_2 - (1 - p_1 - p_2)\log_2(1 - p_1 - p_2).$$
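As a sanity check on this expression, the short script below (illustrative; the function names are ours) evaluates the closed form, compares it against a direct mutual-information calculation for the BSEC of Figure 2, and prints the resulting overhead.

```python
import numpy as np

def entropy(probs):
    """Entropy in bits of a probability vector, ignoring zero entries."""
    q = np.asarray(probs, dtype=float)
    q = q[q > 0]
    return float(-(q * np.log2(q)).sum())

def h3(p1, p2):
    """Ternary entropy function h^(3)(p1, p2) defined in the text."""
    return entropy([p1, p2, 1 - p1 - p2])

def c_bsc(p):
    return 1 - entropy([p, 1 - p])

def c_rep2(p):
    """Constrained capacity C_{Gamma_RE,2}(p) via the closed form above."""
    a = 0.5 - p * (1 - p)
    return 0.5 * (h3(a, a) - h3(p ** 2, (1 - p) ** 2))

def c_rep2_direct(p):
    """Cross-check: I(X;Y)/2 for the induced BSEC with a uniform input
    (one BSEC use costs two BSC uses)."""
    good, erase, flip = (1 - p) ** 2, 2 * p * (1 - p), p ** 2
    h_y = entropy([(good + flip) / 2, (good + flip) / 2, erase])
    h_y_given_x = entropy([good, erase, flip])
    return 0.5 * (h_y - h_y_given_x)

for p in (0.0, 0.1, 0.25, 0.5):
    assert abs(c_rep2(p) - c_rep2_direct(p)) < 1e-12
    print(f"p={p}: C_BSC={c_bsc(p):.4f}, C_RE2={c_rep2(p):.4f}, "
          f"overhead={c_bsc(p) - c_rep2(p):.4f}")
```

At p = 0 the overhead is the full 0.5 bits per channel use of the noiseless case, and it vanishes as p approaches 0.5, matching the discussion that follows.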

Fig. 3. Illustration of the overhead cost of a repetition coding constraint for the binary symmetric channel with crossover probability p, comparing the constrained capacity $C_{\Gamma_{RE,2}}(p)$ against the unconstrained BSC capacity $C(p)$. From the non-data viewpoint, the overhead cost is 0.5 bits per channel use for all p.

Fig. 4. Overhead costs of BPSK and BPSK with HDD for the AWGN channel.

The overhead cost of a repetition coding constraint for the BSC as a function of the crossover probability $p$ is therefore given by
$$O_{\Gamma_{RE,2}}(p) = 1 - h(p) - \frac{1}{2}\left[ h^{(3)}\!\left(\tfrac{1}{2} - p(1-p),\; \tfrac{1}{2} - p(1-p)\right) - h^{(3)}\!\left(p^2,\; (1-p)^2\right) \right].$$
The baseline performance $C_\Gamma(p)$ and the constrained performance $C_{\Gamma_{RE,2}}(p)$ are shown in Figure 3. The overhead cost $O_{\Gamma_{RE,2}}(p)$ between them tends to zero as $p$ approaches 0.5, which is in contrast to the fixed overhead cost of 0.5 bits per channel use for all $p$ under the non-data viewpoint.

B. Additive White Gaussian Noise Channel with Multiple Constraints

Consider communication over the additive white Gaussian noise (AWGN) channel $Y_k = X_k + Z_k$, where the $Z_k$ are iid Gaussian $\mathcal{N}(0, \sigma^2)$. Since capacity is infinite without constraints, we start with an average power constraint $\Gamma_P$ and corresponding capacity
$$C_{AWGN} = \frac{1}{2}\log_2\left(1 + \frac{P}{\sigma^2}\right)$$
as our baseline.

It is relatively easy to treat a repetition coding constraint of order $L$ on the AWGN channel, because an optimal receiver can simply sum the $L$ received values $Y_{kL+1}, Y_{kL+2}, \ldots, Y_{kL+L}$ for each distinct input symbol $x_{kL+1}(W)$, $k = 0, 1, \ldots$. Producing this sufficient statistic yields an equivalent channel of the form $\tilde{Y}_k = L X_k + \tilde{Z}_k$ with a fraction $1/L$ of the channel uses, with $X_k$ still subject to average power constraint $P$, and with $\tilde{Z}_k$ iid Gaussian $\mathcal{N}(0, L\sigma^2)$. Thus,
$$C_{\Gamma_{RE,L}} = \frac{1}{2L}\log_2\left(1 + \frac{LP}{\sigma^2}\right).$$
The overhead $O_{\Gamma_{RE,L}} = C_{AWGN} - C_{\Gamma_{RE,L}}$ on the AWGN channel behaves similarly to the BSC case discussed earlier: the overhead tends to zero as $P/\sigma^2$ tends to zero, and increases as $P/\sigma^2$ increases.

An easy pair of constraints to treat on the AWGN channel is the combination of BPSK signaling and HDD, i.e., input constraint $\Gamma_{\{+\sqrt{P}, -\sqrt{P}\}}$ and decoder constraint $\Gamma_{HDD}$. BPSK and HDD convert the AWGN channel into a BSC with crossover probability $p = Q(\sqrt{P}/\sigma)$, where $Q(x) := \frac{1}{\sqrt{2\pi}}\int_x^{+\infty} e^{-t^2/2}\, dt$. Thus,
$$C_{BPSK,HDD} = 1 - h\!\left(Q\!\left(\sqrt{P}/\sigma\right)\right).$$
Finally, imposing only the BPSK input constraint $\Gamma_{\{+\sqrt{P}, -\sqrt{P}\}}$ leads to the capacity
$$C_{BPSK} = \frac{1}{2}\left[ R\!\left(+\sqrt{P}/\sigma\right) + R\!\left(-\sqrt{P}/\sigma\right) \right],$$
where
$$R(x) := 1 - \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-(t-x)^2/2}\log_2\!\left(1 + e^{-2xt}\right)\, dt.$$
Figure 4 shows the AWGN channel capacity, the BPSK constrained capacity, and the BPSK and HDD constrained capacity. Arrows indicate the BPSK overhead cost $O_{BPSK} = C_{AWGN} - C_{BPSK}$ and the overhead cost of HDD relative to BPSK, $O_{HDD|BPSK} = C_{BPSK} - C_{BPSK,HDD}$.
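The sketch below (our illustration; it assumes SciPy is available, and the function names are our own) evaluates these capacities numerically at a given $P/\sigma^2$ and verifies the chain rule of Proposition 1 for this pair of constraints, with the power-constrained AWGN capacity as the baseline: $O_{BPSK \cap HDD} = O_{BPSK} + O_{HDD|BPSK}$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import erfc

def h2(p):
    """Binary entropy function in bits, for 0 < p < 1."""
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def Q(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * erfc(x / np.sqrt(2))

def c_awgn(snr):
    """Baseline capacity under the average power constraint alone."""
    return 0.5 * np.log2(1 + snr)

def c_rep(snr, L):
    """Repetition of order L: 1/L of the uses of a channel with SNR L * snr."""
    return 0.5 / L * np.log2(1 + L * snr)

def c_bpsk(snr):
    """BPSK-constrained capacity via R(x) with x = sqrt(P)/sigma."""
    x = np.sqrt(snr)
    integrand = lambda t: np.exp(-(t - x) ** 2 / 2) * np.log2(1 + np.exp(-2 * x * t))
    val, _ = quad(integrand, -40, 40)  # R(+x) = R(-x) by symmetry
    return 1 - val / np.sqrt(2 * np.pi)

def c_bpsk_hdd(snr):
    """BPSK plus hard decisions: a BSC with crossover Q(sqrt(P)/sigma)."""
    return 1 - h2(Q(np.sqrt(snr)))

snr = 10 ** (5 / 10)  # P/sigma^2 = 5 dB
O_bpsk = c_awgn(snr) - c_bpsk(snr)
O_hdd_given_bpsk = c_bpsk(snr) - c_bpsk_hdd(snr)
O_total = c_awgn(snr) - c_bpsk_hdd(snr)
print(f"O_BPSK = {O_bpsk:.4f}, O_HDD|BPSK = {O_hdd_given_bpsk:.4f}")
print(f"chain rule: {O_total:.4f} == {O_bpsk + O_hdd_given_bpsk:.4f}")
print(f"O_RE,3 = {c_awgn(snr) - c_rep(snr, 3):.4f}")  # cf. Figure 5
```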

It is interesting to note from Figure 4 that $O_{BPSK}$ is negligible for low $P/\sigma^2$, while for high $P/\sigma^2$ it is $O_{HDD|BPSK}$ that becomes negligible and $O_{BPSK}$ that dominates. This figure illustrates the utility of the chain rule for overhead, as we can isolate which constraints, independently or relative to other constraints, dominate the total overhead cost.

Fig. 5. Overhead costs of BPSK, repetition coding, and combined BPSK and repetition coding for the AWGN channel.

Figure 5 shows the AWGN channel capacity, the BPSK constrained capacity, the repetition coding capacity for L = 3, and the combined BPSK and repetition coding capacity for L = 3. It is interesting to note that for $P/\sigma^2$ less than roughly 3 dB, $O_{RE|BPSK} < O_{BPSK|RE}$, and vice versa for $P/\sigma^2$ greater than roughly 3 dB.

ACKNOWLEDGMENT

This work has been supported in part by NSF grants CCF5-4668 and CNS6-6595.

REFERENCES

[1] B. P. Dunn, "Overhead in Communication Systems as the Cost of Constraints," Ph.D. dissertation, University of Notre Dame, Notre Dame, IN, Dec. 2010. [Online]. Available: http://www.nd.edu/~jnl/pubs/bdunn-phd-nd-.pdf
[2] R. G. Gallager, "Basic Limits on Protocol Information in Data Communication Networks," IEEE Trans. Inform. Theory, vol. 22, no. 4, pp. 385-398, Jul. 1976.
[3] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: John Wiley & Sons, Inc., 1991.
[4] S. Verdú and T. S. Han, "A General Formula for Channel Capacity," IEEE Trans. Inform. Theory, vol. 40, no. 5, pp. 1147-1157, Jul. 1994.
[5] T. S. Han, Information-Spectrum Methods in Information Theory. Berlin: Springer, 2003.
[6] R. G. Gallager, Information Theory and Reliable Communication. New York: John Wiley & Sons, Inc., 1968.