ITU - Telecommunication Standardization Sector
STUDY GROUP 15
Temporary Document BI-095
Original: English
Goa, India, 23-27 October 2000
Question: 4/15
SOURCE: IBM
TITLE: G.gen: Low-density parity-check codes for DSL transmission

ABSTRACT
We propose the application of low-density parity-check (LDPC) codes for DSL transmission. We present simulation results showing that LDPC codes achieve excellent performance for bandwidth-efficient QAM transmission. LDPC codes do not suffer from error floors and offer some unique advantages in terms of practical implementation. They therefore represent an alternative to turbo-codes for advanced coding in ADSL.

Contact: E. Eleftheriou, ele@zurich.ibm.com; S. Ölçer, oel@zurich.ibm.com; IBM Zurich Research Laboratory, 8803 Rüschlikon, Switzerland
1. Introduction

Low-density parity-check (LDPC) codes were introduced by Gallager [1] as a family of linear block codes with parity-check matrices containing mostly zeros and only a small number of ones. The sparsity of the parity-check matrices defining these codes is key to enabling their efficient decoding by a message-passing procedure also known as the sum-product algorithm. LDPC codes and their decoding were reinvented by MacKay and Neal [2], [3] in the mid-1990s, shortly after Berrou et al. introduced the turbo-codes [4] and demonstrated the importance of iterative decoding techniques for approaching channel capacity in practical systems. Subsequently, LDPC codes have generated tremendous interest both theoretically and from an implementation viewpoint, and many new developments have taken place. It is today well acknowledged that LDPC codes perform as well as turbo-codes.

LDPC codes and turbo-codes are based on a similar design philosophy: both can be described from a constrained random code ensemble viewpoint. Also, the decoding techniques used for both methods can be viewed as different instantiations of the same basic decoding process. However, the soft-input soft-output BCJR algorithm [5], or suboptimal versions of it, used for turbo-decoding is rather complex, while the sum-product algorithm used for LDPC decoding lends itself to parallel implementation and is computationally simpler. LDPC codes, on the other hand, may lead to more stringent requirements in terms of storage.

In this contribution we investigate the application of LDPC codes to the DSL transmission problem [6]. We present simulation results for three specific high-rate binary LDPC codes, which are specified in Table 1 in terms of the parameters K, the length of the information block, and N, the code length. Transmitted QAM symbols are obtained from code bits via simple Gray mapping.
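As an illustration of the mapping step, a binary-reflected Gray labelling assigns bit patterns to amplitude levels so that neighbouring levels differ in exactly one bit. The sketch below shows one common way to do this for one dimension of a square QAM constellation; the contribution does not specify the exact labelling, so this particular assignment and the helper names are our assumptions.

```python
def gray(n):
    """Binary-reflected Gray code of the integer n."""
    return n ^ (n >> 1)

def bits_to_level(bits):
    """Map a tuple of bits to one of L = 2**len(bits) bipolar amplitudes
    {-(L-1), ..., -1, +1, ..., L-1} so that adjacent levels differ in
    exactly one bit (Gray mapping, one dimension of a square QAM symbol)."""
    L = 1 << len(bits)
    n = int("".join(str(b) for b in bits), 2)
    idx = [gray(i) for i in range(L)].index(n)  # position in Gray order
    return 2 * idx - (L - 1)

# One dimension of 16-QAM: 2 bits select one of 4 levels
assert [bits_to_level(b) for b in [(0, 0), (0, 1), (1, 1), (1, 0)]] == [-3, -1, 1, 3]
```

With this labelling, a demapping error to an adjacent amplitude corrupts only a single code bit, which is what makes Gray mapping attractive for bit-based decoding.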
Table 1 summarizes the main results in terms of the net coding gains obtained at a symbol-error rate of 10^-7 on an additive white Gaussian noise (AWGN) channel for three different QAM formats. We note that the coding gain values given in this table can be increased further by allowing a larger number of iterations during the decoding process: here, we have intentionally limited the number of iterations in order to shorten the simulation time.

             K      N      Rate K/N   16-QAM                 64-QAM                 4096-QAM
    Code 1   433    495    0.8747     4.9 dB (3.49 b/s/Hz)   4.6 dB (5.24 b/s/Hz)   3.5 dB (10.49 b/s/Hz)
    Code 2   1777   1998   0.8894     6.1 dB (3.55 b/s/Hz)   5.9 dB (5.33 b/s/Hz)   4.8 dB (10.67 b/s/Hz)
    Code 3   4095   4376   0.9358     6.2 dB (3.74 b/s/Hz)   6.1 dB (5.61 b/s/Hz)   5.6 dB (11.2 b/s/Hz)

Table 1: LDPC codes considered for the simulations and net coding gains achieved at a symbol-error rate of 10^-7 for different QAM constellations. The spectral efficiencies are indicated in parentheses.

The codes given in Table 1 are due to MacKay and have been obtained by a random construction method. These codes are used in this contribution to demonstrate the performance of LDPC codes for bandwidth-efficient modulation. The results of Table 1 confirm that LDPC codes offer net coding gains that are similar to those that have been reported for turbo-codes, e.g., [7]. LDPC codes asymptotically achieve excellent performance without exhibiting error floors, and admit a wide range of trade-offs between performance and decoding complexity. For these reasons, they can be considered as an alternative to turbo-codes for DSL transmission.

In Section 2, we give a brief description of LDPC codes and present the principles of the iterative technique used to decode them. In Section 3, we study by simulation the performance of binary LDPC codes for multilevel modulation and, in Section 4, discuss implementation aspects. Finally, we give in Section 5 some conclusions and describe future work.
2. Low-density parity-check (LDPC) codes

A linear block code can be described in terms of a parity-check matrix H which satisfies H x = 0 for all codewords x. Each row of the M x N parity-check matrix defines a parity-check equation that must be satisfied by every codeword x. For example, the well-known (7,4) Hamming code is defined by the parity-check matrix

        | 1 1 1 0 1 0 0 |
    H = | 1 1 0 1 0 1 0 |
        | 1 0 1 1 0 0 1 |

whose rows correspond to the parity-check equations

    check 1:  x1 + x2 + x3 + x5 = 0
    check 2:  x1 + x2 + x4 + x6 = 0
    check 3:  x1 + x3 + x4 + x7 = 0

where x5, x6 and x7 are the check bits and all sums are modulo 2.

LDPC codes differ in three major ways from this simple example: they usually have long block lengths N in order to achieve near-Shannon-limit performance, their parity-check matrices are defined, by construction, in nonsystematic form, and these matrices exhibit a number of 1s that is much smaller than M x N. A parity-check matrix for a (j,k)-regular LDPC code has exactly j ones in each of its columns and k ones in each of its rows.

A parity-check matrix can generally be represented by a graph with two types of nodes: the symbol nodes and the parity-check nodes (or check nodes). A symbol node n, representing code bit x_n, is connected to check node m only if the (m,n)-th element of the parity-check matrix is a 1. No symbol (check) node is connected to another symbol (check) node. For example, the (7,4) Hamming code can be represented by the graph shown in Fig. 1.

[Figure 1: Graph corresponding to the parity-check matrix of the (7,4) Hamming code; symbol nodes x1, ..., x7 above, check nodes 1 to 3 below.]

We note in this specific case that, because the parity-check matrix is given in systematic form, symbol nodes x5, x6 and x7 in the associated graph are connected to single distinct check nodes. The parity-check matrix of a (j,k)-regular LDPC code leads to a graph where every symbol node is connected to precisely j check nodes and every check node is connected to precisely k symbol nodes.
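To make the (7,4) Hamming example concrete, the sketch below forms the three check bits by modulo-2 sums and verifies that the resulting word satisfies H x = 0 in GF(2). This is a minimal illustration; the helper names are ours, not part of the contribution.

```python
# Parity-check matrix of the (7,4) Hamming code (rows = checks, cols = x1..x7)
H = [[1, 1, 1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0, 1, 0],
     [1, 0, 1, 1, 0, 0, 1]]

def encode(u):
    """Append check bits x5, x6, x7 to the 4 information bits u (mod-2 sums)."""
    x5 = (u[0] + u[1] + u[2]) % 2   # check 1
    x6 = (u[0] + u[1] + u[3]) % 2   # check 2
    x7 = (u[0] + u[2] + u[3]) % 2   # check 3
    return u + [x5, x6, x7]

def syndrome(x):
    """Evaluate all parity checks; a codeword yields the all-zero syndrome."""
    return [sum(h * xi for h, xi in zip(row, x)) % 2 for row in H]

assert syndrome(encode([1, 0, 1, 1])) == [0, 0, 0]
```

Flipping any single bit of a codeword makes at least one check fail, which is exactly the property the iterative decoder exploits.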
Graphical representations of LDPC codes are useful both for deriving and for implementing the iterative decoding procedure introduced in [1]. Gallager's decoder is a message-passing decoder (in a sense to be made clear below) based on the so-called sum-product algorithm. We note that the sum-product algorithm is a general algorithm for decoding codes defined on graphs; a wide variety of other algorithms (e.g., the Viterbi algorithm, the forward/backward algorithm, the iterative turbo-decoding algorithm, the fast Fourier transform, ...) can also be derived as specific instances of it.

2.1 Encoding procedure
Encoding is in principle performed by multiplying in GF(2) the information block u by the generator matrix G of the LDPC code: x = u G. Recall that generator and parity-check matrices satisfy the relation G H^T = 0.

2.2 Decoding of LDPC codes

We assume binary transmission initially. Let the codeword x with elements x_i = 0 or 1 be mapped onto the vector of bipolar signals s with elements s_i = +1 or -1 (i = 0, 1, ..., N-1). The vector of channel output signals is given by y = s + n (n: vector of noise samples). The decoding procedure aims at finding the most probable codeword x such that x H^T = 0.

We will not repeat here the details of the LDPC decoding algorithm, which can be found in several of the references given at the end of this contribution. Fundamentally, the algorithm has two alternating parts. Beginning for example at the check nodes (first part), quantities r_mn associated with check node m are updated and passed as messages to the symbol nodes checked by check node m. This operation is performed for all check nodes. In the second part, quantities q_mn associated with symbol node n are updated and passed as messages to the check nodes that involve symbol node n. This operation is performed for all symbol nodes.

[Figure 2: Message-passing decoding: messages r_mn flow from the M check nodes to the N symbol nodes, and messages q_mn flow back.]

These two alternating parts, illustrated in Fig. 2, make up one iteration of the decoding algorithm. At each iteration, it is possible to compute a codeword estimate x^. Decoding is stopped if x^ H^T = 0, or if some other stopping criterion is met (e.g., a maximum number of iterations). We note that the quantity r_mn represents the probability that check m is satisfied, given that symbol n is fixed, say, at 0 and given the (posterior) probabilities of the other symbols entering check m. The quantity q_mn represents the probability that bit n is a 0 (or a 1) given the information obtained via checks other than check m.
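A minimal sketch of the two alternating update steps is given below, written for the small (7,4) Hamming matrix so that it stays readable. For clarity it passes messages as log-likelihood ratios rather than raw probabilities (an equivalent formulation); the structure and all names are ours, and a practical decoder would of course operate on a long sparse LDPC matrix.

```python
import math

# Parity-check matrix of the (7,4) Hamming code used as a toy example
H = [[1, 1, 1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0, 1, 0],
     [1, 0, 1, 1, 0, 0, 1]]
M, N = len(H), len(H[0])

def decode(y, sigma2, max_iter=20):
    """Sum-product decoding in the log-likelihood-ratio (LLR) domain.

    y: received samples for bipolar signalling s = 1 - 2x (x in {0,1}).
    Returns the hard-decision codeword estimate."""
    Lc = [2.0 * yi / sigma2 for yi in y]          # channel LLRs
    # q[m][n]: symbol-to-check messages, r[m][n]: check-to-symbol messages
    q = [[Lc[n] if H[m][n] else 0.0 for n in range(N)] for m in range(M)]
    r = [[0.0] * N for _ in range(M)]
    for _ in range(max_iter):
        # First part: update r_mn at each check node from all other symbols
        for m in range(M):
            cols = [n for n in range(N) if H[m][n]]
            for n in cols:
                prod = 1.0
                for n2 in cols:
                    if n2 != n:
                        prod *= math.tanh(q[m][n2] / 2.0)
                prod = max(min(prod, 0.999999), -0.999999)  # clip for atanh
                r[m][n] = 2.0 * math.atanh(prod)
        # Second part: update q_mn at each symbol node (extrinsic information)
        L = list(Lc)
        for n in range(N):
            rows = [m for m in range(M) if H[m][n]]
            for m in rows:
                L[n] += r[m][n]                   # posterior LLR of bit n
            for m in rows:
                q[m][n] = L[n] - r[m][n]          # exclude check m's own message
        x_hat = [1 if Ln < 0 else 0 for Ln in L]
        # Stop as soon as all parity checks are satisfied
        if all(sum(H[m][n] * x_hat[n] for n in range(N)) % 2 == 0
               for m in range(M)):
            break
    return x_hat
```

For instance, decode([1, 1, 1, -0.2, 1, 1, 1], 1.0) recovers the all-zero codeword: the two checks involving the corrupted fourth sample push its posterior LLR back to a positive value.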
Messages passed between the nodes need not be posterior probabilities but can be likelihood or log-likelihood ratios. In fact, various simplifications of the decoding algorithm have been explored in the published literature and can be adopted for practical implementations, see e.g., [8].

3. Performance of LDPC codes for bandwidth-efficient communications

We study in this section the application of binary LDPC codes to two-dimensional QAM transmission over an AWGN channel. Our objective is to show typical attainable performance with simple encoding and symbol-mapping schemes.
The block diagram of the encoding and decoding processes is shown in Fig. 3. For symbol mapping, b LDPC code bits are translated into one QAM symbol taken from a 2^b-point constellation using Gray mapping. At the receiver, soft demapping of the received noisy QAM symbols provides soft information on the individual code bits in the form of a posteriori probabilities. These probabilities are employed to carry out the message-passing LDPC decoding procedure described in Section 2.

[Figure 3: Multilevel LDPC encoding and decoding: a block of K information bits enters the rate-K/N LDPC encoder; the block of N code bits is mapped to QAM symbols; at the receiver, the soft demapper converts the noisy QAM symbols into a block of N soft values, and the message-passing decoder delivers the decoded information together with a decoding success/failure indication.]

When the employed QAM constellation is square (b even) and the in-phase and quadrature noise components are independent, it is computationally advantageous to perform soft demapping independently for the real and imaginary parts of the received complex signals. We will therefore consider only square L x L QAM constellations. Symbol mapping for the real or the imaginary part of a transmitted QAM symbol is performed by translating a group of b/2 code bits (i_0, i_1, ..., i_{b/2-1}) into one of the L real symbols A_m within the set

    A = {A_0 = -(L-1), A_1 = -(L-3), ..., A_{L/2-1} = -1, A_{L/2} = +1, ..., A_{L-1} = L-1}.

Denoting by y the real or the imaginary part of a noisy received signal, y = A + n, with A in A and n an AWGN sample with variance σ_n^2, the a posteriori probability that bit i_l is zero (alternately one) is computed as

    Pr(i_l = 0 | y) = [ sum over A_j in A with i_l = 0 of exp(-(y - A_j)^2 / (2 σ_n^2)) ]
                      / [ sum over j = 0, ..., L-1 of exp(-(y - A_j)^2 / (2 σ_n^2)) ],   l = 0, 1, ..., b/2 - 1,

where the summation in the numerator is taken over all symbols A_j in A for which i_l = 0.

3.1 Performance and net coding gain

Performance is measured in terms of E_b/N_0, the ratio of energy per bit to noise power spectral density:
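The per-dimension a posteriori probabilities above translate directly into a few lines of code. The sketch below assumes a particular Gray labelling of the four levels of one 16-QAM dimension; that labelling and the function name are our assumptions, not taken from the contribution.

```python
import math

def demap_bit_probs(y, levels, labels, sigma2):
    """A posteriori probability that each label bit is 0, given the real
    (or imaginary) part y of a received noisy QAM symbol.
    levels: the L PAM amplitudes A_j; labels: the b/2-bit label of each level."""
    w = [math.exp(-(y - a) ** 2 / (2 * sigma2)) for a in levels]
    total = sum(w)                      # denominator: sum over all levels
    nbits = len(labels[0])
    probs = []
    for l in range(nbits):
        # numerator: sum over the levels whose l-th label bit is 0
        num = sum(wj for wj, lab in zip(w, labels) if lab[l] == 0)
        probs.append(num / total)
    return probs

# One dimension of 16-QAM: 4 levels with an assumed Gray labelling
levels = [-3, -1, 1, 3]
labels = [(0, 0), (0, 1), (1, 1), (1, 0)]
p = demap_bit_probs(0.9, levels, labels, sigma2=1.0)   # y close to level +1
```

For y = 0.9, close to the level labelled (1, 1), both bit-0 probabilities come out well below 0.5, as expected.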
    E_b/N_0 = σ_a^2 / (2 σ_n^2 β),

where σ_a^2 is the variance of the symbols taken from the 2^b-point QAM constellation and σ_n^2 is the noise variance per dimension. If a rate-K/N LDPC code is employed, each QAM symbol carries β = (K/N) b information bits.

To determine the net coding gain from simulations, we need to compare, at a given error probability, the signal-to-noise ratios required for data transmission at β bit/symbol using uncoded and coded modulation. Since β is generally a rational number, a baseline uncoded two-dimensional QAM system would need to have a noninteger number of constellation points. To avoid this difficulty, we proceed as follows. For a modulation and coding scheme transmitting β bit/symbol, a normalized signal-to-noise ratio SNR_norm can be defined as [9], [10]

    SNR_norm = β (E_b/N_0) / (2^β - 1).

In the case of uncoded transmission (β = b), SNR_norm allows the average symbol-error probability to be expressed as

    P_S(E) ≈ 2 Q(sqrt(3 SNR_norm)),   with   Q(x) = (1/sqrt(2π)) ∫_x^∞ exp(-z^2/2) dz,

nearly independently of the constellation size, provided the latter is large enough. The curve of P_S(E) versus SNR_norm indicates that the gap to capacity for uncoded QAM is equal to ~9.8 dB at a symbol-error rate of 10^-7. We therefore determine the value SNR*_norm of the normalized signal-to-noise ratio needed for the coded system to achieve a symbol-error rate of 10^-7 and compute the net coding gain at that symbol-error rate as

    G = 9.8 - 10 log10(SNR*_norm)   [dB].

We note that an upper limit to the net coding gain measured in this way is ~8.27 dB, since the Shannon limit cannot be approached to within less than 1.53 dB without shaping.

3.2 Simulation results

Simulations were performed for the LDPC codes specified in Table 1 and for 16-QAM, 64-QAM and 4096-QAM. To measure error rates, we transmit one codeword over the AWGN channel and decode using the message-passing (sum-product) algorithm for a given maximum number of iterations.
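These relations are easy to check numerically. The short sketch below (function names are ours) evaluates the baseline expression P_S(E) = 2 Q(sqrt(3 SNR_norm)) and confirms that the uncoded curve crosses a symbol-error rate of 10^-7 near SNR_norm = 9.8 dB, which is the reference point used for the net coding gain.

```python
import math

def Q(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def snr_norm_db(ebn0_db, beta):
    """Normalized SNR in dB: SNR_norm = beta*(Eb/N0) / (2**beta - 1)."""
    return 10 * math.log10(beta * 10 ** (ebn0_db / 10) / (2 ** beta - 1))

def uncoded_ser(snr_norm_db_value):
    """Baseline symbol-error rate P_S(E) ~ 2*Q(sqrt(3*SNR_norm))."""
    return 2 * Q(math.sqrt(3 * 10 ** (snr_norm_db_value / 10)))

def net_coding_gain_db(snr_norm_star_db):
    """G = 9.8 dB - SNR*_norm(dB): net coding gain at SER 1e-7."""
    return 9.8 - snr_norm_star_db

# Sanity check: uncoded QAM needs SNR_norm of about 9.8 dB for SER = 1e-7
assert 5e-8 < uncoded_ser(9.8) < 2e-7
```

A coded scheme whose measured SNR*_norm at SER 10^-7 is, say, 5 dB would thus be credited with a net coding gain of 9.8 - 5 = 4.8 dB.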
We repeat this experiment a large number of times to ensure that the number of observed block errors is at least 100. Figure 4 shows the effect on performance of the maximum number of iterations allowed in the decoding process, for a specific code and QAM format. Very similar results are obtained for the other LDPC codes and spectral efficiencies. For the subsequent simulations, we set the maximum number of iterations to 20 in order to limit the simulation time, with the understanding that the performance results shown can still be improved by allowing a larger value for this parameter.
Figure 4: Effect of the maximum number of iterations for LDPC decoding with Code 2 and 16-QAM.

Figures 5, 6, and 7 give the performance achieved with Code 1, Code 2 and Code 3, respectively, for different QAM constellations in terms of bit-error rate (BER) versus E_b/N_0. Note that the different systems cannot be compared with each other on the basis of these figures, since they involve different spectral efficiencies.

Figure 5: Bit-error rate versus E_b/N_0 for 16, 64 and 4096-QAM using LDPC Code 1.
Figure 6: Bit-error rate versus E_b/N_0 for 16, 64 and 4096-QAM using LDPC Code 2.

Figure 7: Bit-error rate versus E_b/N_0 for 16, 64 and 4096-QAM using LDPC Code 3.

Figures 8, 9, and 10 give the performance achieved with Code 1, Code 2 and Code 3, respectively, for different QAM constellations in terms of symbol-error rate (SER) versus SNR_norm. These figures also show P_S(E), the SER for uncoded modulation. The net coding gains derived from Figs. 8, 9, and 10 are those summarized in Table 1 presented earlier in Section 1. Note that the coding gains given in that table are obtained by extrapolating the results to an error rate of 10^-7. This can be done because the LDPC codes do not exhibit error floors.
Figure 8: Symbol-error rate versus SNR_norm for 16, 64 and 4096-QAM using LDPC Code 1.

Figure 9: Symbol-error rate versus SNR_norm for 16, 64 and 4096-QAM using LDPC Code 2.
Figure 10: Symbol-error rate versus SNR_norm for 16, 64 and 4096-QAM using LDPC Code 3.

4. Implementation

The decoding procedure described in Section 2 is appealing from an implementation viewpoint. For example, as illustrated in Fig. 2, parallel processing can be used to update all check nodes, or all symbol nodes, simultaneously. For regular LDPC codes, each symbol or check node can be implemented as an instantiation of a basic computation unit (an ASIC macro or a DSP routine). For fixed (j,k) parameters, it is then possible to decode different codes simply by either rearranging the connections between the computation units or redefining the routine call parameters. It is furthermore possible to decode codes of different rates and lengths by switching some of the computation units on or off.

4.1 Complexity

For reasons of implementation complexity, it is usually preferable to avoid realizing the LDPC decoding algorithm in the form where the messages being passed directly represent posterior probabilities. A variety of algorithms leading to efficient implementations of the decoder have been described in the published literature, at the price of some loss in achievable coding gain. For the present discussion, we consider the algorithm presented in [8], which, for the signal-to-noise ratio values of interest in DSLs, leads to a very small performance penalty. The complexity of this algorithm amounts to 3N(j-1) additions and M(2k-3) comparisons per iteration, with M the number of rows in the parity-check matrix (this complexity could in fact be reduced even further). The resulting decoding complexity per code bit and per iteration is summarized in Table 2(a) for the three codes studied in Section 3.

             Adds   Comps
    Code 1   6      5.6
    Code 2   9      7.7
    Code 3   9      7.9

    (a)
             Log-MAP                      Max-Log-MAP
             Adds   Comps   Mults   TLUs   Adds   Comps   Mults
             460    14      16      66     14

    (b)

Table 2: Decoding complexity of (a) LDPC codes and (b) turbo-codes, in number of operations per bit and per iteration. (TLU: table lookup)

For comparison, we have included in Table 2(b) the complexity of BCJR-based decoding for binary turbo-codes. We assume the parallel concatenation of a pair of 16-state convolutional codes, as proposed in [7]. The BCJR algorithm is assumed to be realized in the Log-MAP and the Max-Log-MAP forms. We see that the implementation complexity of LDPC decoding is significantly lower than that of the turbo-coding approach.

The operation counts given in Table 2 do not take into account the soft demapping by which soft information on individual code bits is obtained from the channel symbols, since all the algorithms require the same computational effort for this step. Nor do they include the GF(2) operations needed for encoding. Although these exclusive-or operations are traditionally not taken into account when comparing implementation complexities, it should be kept in mind that they may represent a nonnegligible computational effort for long LDPC codes.

5. Conclusion

The study presented in this contribution has considered three specific high-rate LDPC codes to demonstrate that excellent performance can be achieved by LDPC coding for DSL transmission, even with simple encoding/mapping schemes. We will complement the present study by presenting a family of LDPC codes and the specific symbol-mapping scheme to be used for ADSL transmission. Performance under simulated impulse noise and coding/decoding latency are two important issues that will also be discussed; our preliminary results indicate that LDPC codes are no worse than turbo-codes in these respects. LDPC codes offer net coding gains that are similar to the ones obtained with turbo-codes.
They furthermore present some distinct advantages, including
- the absence of error floors,
- a low computational complexity, and
- a wide range of trade-offs for implementation.

We therefore propose to add, as an open item in the issues list, the use of LDPC codes as an alternative for advanced coding in G.dmt.bis and G.lite.bis. This contribution is to be presented under G.gen.
References

[1] "Low-density parity-check codes," R. G. Gallager, IRE Trans. Inform. Theory, vol. IT-8, pp. 21-28, Jan. 1962.
[2] "Near Shannon limit performance of low density parity check codes," D. J. C. MacKay and R. M. Neal, Electron. Lett., vol. 32, no. 18, pp. 1645-1646, Aug. 1996.
[3] "Good error-correcting codes based on very sparse matrices," D. J. C. MacKay, IEEE Trans. Inform. Theory, vol. 45, no. 2, pp. 399-431, Mar. 1999.
[4] "Near Shannon limit error-correcting coding and decoding: Turbo-codes," C. Berrou, A. Glavieux, and P. Thitimajshima, in Proc. 1993 IEEE Int. Conf. on Communications, Geneva, Switzerland, pp. 1064-1070, May 1993.
[5] "Optimal decoding of linear codes for minimizing symbol error rate," L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, IEEE Trans. Inform. Theory, vol. IT-20, pp. 284-287, Mar. 1974.
[6] "On advanced signal processing and coding techniques for digital subscriber lines," G. Cherubini, E. Eleftheriou, and S. Ölçer, presented at the "What is next in xDSL" workshop, Vienna, Austria, September 15, 2000.
[7] "Results of the requirements requested in the coding ad hoc report (BA-108R1) for the proposed turbo codes for ADSL modems by VoCAL Technologies Ltd in BA-00R1," Temporary Document HC-073, ITU-T Study Group 15/4, Huntsville, Canada, 31 Jul. - 4 Aug. 2000.
[8] "Reduced complexity iterative decoding of low-density parity check codes based on belief propagation," M. P. C. Fossorier, M. Mihaljevic, and H. Imai, IEEE Trans. Commun., vol. 47, no. 5, pp. 673-680, May 1999.
[9] "Modulation and coding for linear Gaussian channels," G. D. Forney, Jr., and G. Ungerboeck, IEEE Trans. Inform. Theory, vol. 44, no. 6, pp. 2384-2415, Oct. 1998.
[10] "Trellis precoding: combined coding, precoding and shaping for intersymbol interference channels," M. V. Eyuboglu and G. D. Forney, Jr., IEEE Trans. Inform. Theory, vol. 38, no. 2, pp. 301-314, Mar. 1992.