Vector-LDPC Codes for Mobile Broadband Communications Whitepaper November 23 Flarion Technologies, Inc. Bedminster One 35 Route 22/26 South Bedminster, NJ 792 Tel: + 98-947-7 Fax: + 98-947-25 www.flarion.com 9/3/4
Executive Summary

In today's mobile data landscape, new efficient technologies are garnering wide-scale interest around the globe. As carriers and vendors shift away from the status quo and analyze systems (whether CDMA, GSM, OFDM-based, etc.) on their merits, there is a renewed emphasis on new technologies that can increase the efficiency of the system. One such innovation, arising from the development of the FLASH-OFDM air interface, is Vector-LDPC.

Vector-LDPC, which stands for Vector-Low-Density Parity Check, is a powerful new class of Forward Error Correction (FEC) codes. FEC codes are used to correct errors caused by a noisy channel, and they play a critical role in the efficiency of a wireless telecommunications system. A robust FEC coding solution such as Vector-LDPC leads to increased transmission distance, lower power requirements, and increased throughput in a wide array of communication systems, including wireless (mobile or fixed), satellite, optical fiber, storage, and wireline (cable, DSL). These efficiencies translate into less hardware, lower costs and increased performance for network operators.

The performance of Vector-LDPC codes, proven through a multitude of FLASH-OFDM trials worldwide, exceeds that of Turbo Codes, today's FEC solution for 3G. Vector-LDPC offers superior coding gain, higher data rate, more flexibility, less complexity and less hardware than its 3G equivalent and other error-correction schemes, making it well suited to today's high-speed applications.

This paper summarizes the importance of forward error correction coding to communication systems. In addition, it provides an architecture overview of the Vector-LDPC coding solution, describes its competitive advantages, and shows its impact on the FLASH-OFDM mobile broadband system.
Application to communications systems

Based on LDPC codes, Vector-LDPC is a new technology for increasing the coding gain and robustness against noise in a communications system. This improvement in coding performance can be used to increase the transmission distance, lower the power requirements and increase the throughput in applications such as fixed and mobile wireless, satellite, optical fiber and wired broadband (cable/DSL).

The programmable parallel hardware architecture for encoding and decoding Vector-LDPC codes is available as an intellectual property (IP) core for communications and storage applications. This scalable, programmable, efficient architecture for LDPC encoding and decoding brings better noise immunity, and many high-speed communications applications stand to benefit from the large coding gains, high data rate, and smaller hardware size of these IP cores.

Forward Error Correction Overview

Forward error correction introduces redundancy into a data sequence using an Error-Correction Code (ECC) to protect against errors introduced by the channel. (It is called "forward" to distinguish it from ARQ-based techniques that make use of a reverse channel.) Coding gain measures the increased robustness against channel errors that results from using the code, expressed in terms of the energy required to deliver information. Coding gain can be exploited in various ways: increasing the reach of a communications system, decreasing the amount of power necessary for reliable transmission, and increasing the rate at which information is delivered.

In previous decades, practical error correction typically involved convolutional codes decoded by the Viterbi algorithm, and/or Reed-Solomon codes. A fundamental breakthrough in error correction has emerged over the last decade, offering significantly more coding gain than traditional FEC methods.
These new coding systems, which include Turbo Convolutional Codes (TCC), Turbo Product Codes (TPC), and Low-Density Parity-Check (LDPC) codes, share the property of having random-like structures that ensure a powerful code while admitting simple decoding using iterative methods. The fundamental ideas behind iterative decoding first appeared in 1960 in the Ph.D. thesis of Robert Gallager, which introduced Low-Density Parity-Check codes. It was not until the early 1990s, with the discovery of Turbo Codes, that LDPC codes and the underlying ideas came to the forefront.

LDPC codes have a simple and elegant structure, which has proven to be more amenable to analysis and optimization than that of Turbo Codes. The design of LDPC codes with performance within a few hundredths of a dB of Shannon's ultimate limit on capacity has been demonstrated. While Turbo Codes have gained widespread attention and have been adopted into numerous standards such as 3GPP and CDMA2000, their complexity requirements are still relatively high: in a typical CDMA2000 receiver, the Turbo decoder consumes about 4% of the gate complexity. For most situations, properly designed LDPC codes not only perform better, but also offer lower complexity and greater flexibility in the code design than do Turbo Codes.

LDPC codes are linear binary block codes. The codewords can be expressed as the set of all binary solutions x = (x1, x2, ..., xn) to the parity-check equation Hx^T = 0, where the parity-check matrix H is a binary matrix, as in Fig. 1. The codeword length is represented by n. Each row of H induces one parity-check constraint on x. The number of independent constraints is n-k, and k is the number of information bits that can be encoded with this code. Low-Density Parity-Check (LDPC) codes are called low-density because they are defined by a parity-check matrix H that is sparse, i.e., it has few non-zero entries.
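As a small illustration of the parity-check equation, the following Python sketch tests whether a word satisfies Hx^T = 0 over GF(2). The matrix used is a (7,4) Hamming-style example chosen only because it is easy to check by hand; it is an assumption for illustration, not a Flarion code, and it is far too small to be meaningfully sparse. The function names are hypothetical.

```python
# Illustrative parity-check matrix with n = 7 and n - k = 3 independent rows
# (so k = 4). This is a toy example, NOT one of the FLASH-OFDM codes.
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]


def syndrome(H, x):
    """Return H x^T over GF(2): one bit per parity check (row of H)."""
    return [sum(h * b for h, b in zip(row, x)) % 2 for row in H]


def is_codeword(H, x):
    """x is a codeword exactly when every parity check is satisfied."""
    return not any(syndrome(H, x))


def density(H):
    """Fraction of entries of H that are non-zero."""
    ones = sum(sum(row) for row in H)
    return ones / (len(H) * len(H[0]))


if __name__ == "__main__":
    x = [1, 0, 1, 1, 0, 1, 0]          # satisfies all three checks
    print(is_codeword(H, x))
    x_err = [0] + x[1:]                # flip the first bit
    print(syndrome(H, x_err))          # non-zero syndrome flags the error
    print(density(H))
```

For a practical LDPC code the same test applies unchanged; only the size of H grows while the number of ones per row and column stays small, which is what keeps the density low.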
Another useful and common way of representing an LDPC code is via a graphical representation called a Tanner graph; see Fig. 1. A Tanner graph of an LDPC code is a bipartite graph with variable nodes on one side and constraint (or check) nodes on the other side. Variable nodes correspond one-to-one with the bits xi, hence they correspond to the columns of H. Constraint nodes correspond one-to-one with the parity checks that the bits xi must satisfy, hence they correspond to the rows of H. Edges in the graph connect constraint nodes to variable nodes, where an edge indicates that the associated bit participates in the associated parity check. Thus, the edges in the graph correspond to the 1s in the parity-check matrix H. A bit sequence associated with the variable nodes is a codeword if and only if the modulo-2 sum of the bits that neighbor a check node is 0 for all check nodes.

LDPC codes can be decoded with iterative soft-decision decoding algorithms called message-passing algorithms. The most powerful of these algorithms is known as belief propagation.

[Figure 1. A Parity-Check Matrix and Associated Tanner Graph for an LDPC Code. The columns of the matrix correspond to the bit nodes x1 to x9, the rows to the parity-check nodes A1 to A6, and each 1 in the matrix to an edge in the graph.]

Message-passing decoding of LDPC codes involves simple computations that can be executed in parallel. The Tanner graph serves as the framework for the message-passing algorithm that is used to decode LDPC codes. In these decoders, messages are exchanged back and forth along the edges of the graph. Outgoing messages are computed from the incoming messages to each node using simple computations. Each message represents an estimate of the bit associated with the edge carrying the message. If the corruption of the original message is not too severe, after multiple iterations the decoder outputs a decoded codeword that satisfies all the parity checks.
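The following Python sketch shows message passing in its simplest hard-decision form: each check node tells each neighboring bit the modulo-2 sum of its other neighbors, and each bit takes a majority vote over those opinions and its own received value. The 9-bit, 6-check matrix (row and column parities of a 3x3 grid) is an illustrative assumption and is not claimed to be the matrix of Figure 1; the function names are hypothetical, and this is a teaching sketch rather than the decoder used in FLASH-OFDM.

```python
# Toy parity-check matrix: 9 bits laid out as a 3x3 grid, with one check per
# grid row (first three rows of H) and one per grid column (last three rows).
H = [
    [1, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1, 0, 0, 1],
]


def build_tanner_graph(H):
    """Return adjacency lists (check -> bits, bit -> checks), one edge per 1 in H."""
    check_to_bits = [[j for j, h in enumerate(row) if h] for row in H]
    bit_to_checks = [[] for _ in H[0]]
    for i, bits in enumerate(check_to_bits):
        for j in bits:
            bit_to_checks[j].append(i)
    return check_to_bits, bit_to_checks


def majority_vote_decode(H, received, max_iters=10):
    """Hard-decision message passing; returns (bits, all_checks_satisfied)."""
    check_to_bits, bit_to_checks = build_tanner_graph(H)
    x = list(received)
    for _ in range(max_iters):
        parity = [sum(x[j] for j in bits) % 2 for bits in check_to_bits]
        if not any(parity):
            return x, True
        updated = []
        for j, checks in enumerate(bit_to_checks):
            # parity[i] ^ x[j] is the XOR of the *other* bits in check i,
            # i.e. what check i believes bit j should be.
            opinions = [parity[i] ^ x[j] for i in checks]
            votes_for_one = sum(opinions) + received[j]  # channel value votes too
            total_votes = len(opinions) + 1
            if 2 * votes_for_one > total_votes:
                updated.append(1)
            elif 2 * votes_for_one < total_votes:
                updated.append(0)
            else:
                updated.append(x[j])  # tie: keep the current value
        x = updated
    parity = [sum(x[j] for j in bits) % 2 for bits in check_to_bits]
    return x, not any(parity)


if __name__ == "__main__":
    codeword = [1, 1, 0, 1, 0, 1, 0, 1, 1]   # every grid row and column is even
    received = list(codeword)
    received[4] ^= 1                          # one bit corrupted by the channel
    decoded, ok = majority_vote_decode(H, received)
    print(decoded == codeword, ok)
```

Because every check and every bit is processed independently within an iteration, the two inner loops could run in parallel, which is the property the hardware architecture described later relies on.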
The operation of the message-passing decoders can be understood by focusing on a single bit as follows. Consider a simplified system in which the bits of an LDPC codeword are transmitted over a communications channel such that some of them are corrupted during transmission, so that a 0 becomes a 1 or vice versa. At the receiver, each bit node sees the received bit corresponding to the one that was transmitted from the equivalent node at the transmitter. The node would like to know whether that bit is in error, so it asks all of its neighboring check nodes what they think the bit's value should be. Each neighboring check node then asks its other neighbors what their bit values are and sends back to the original bit node the modulo-2 sum of the answers returned. The bit node now has several opinions as to the bit's correct value, and it must somehow reconcile these opinions. A simple solution could be to take a majority vote. Now, the answers that were returned to the check nodes might have been the received values associated with the variable nodes. Alternatively, the variable nodes could have first queried their other check node neighbors for their opinions about their bit's value and returned a combined result, forwarding information to the requesting check nodes. This gathering of information can proceed through additional
iterations, and the resulting information-forwarding process characterizes message-passing algorithms.

In the above simple example, the opinions forwarded were single bits. Better results are obtained if, in addition, a reliability of the bit is also forwarded. With a realistic channel model, the received bits also have reliabilities associated with them. Under belief propagation, the reliabilities indicate the probability that the indicated bit is correct, and the opinion-combining rules implemented at the nodes combine the messages by applying the rules of probability in a manner that is optimal under the assumption that incoming messages are independent. In practice, the messages are not independent, and so the decoder is sub-optimal in this sense. Designing good LDPC codes amounts to finding structures for which performance is close to optimal and for which optimal performance is close to capacity.

Vector-LDPC Hardware Architecture

Flarion Technologies has developed a hardware architecture for encoding and decoding a specific class of LDPC codes known as Vector-LDPC codes, where the term "vector" refers to the parallel structure. Vector-LDPC codes were invented by Flarion's Chief Scientist, Thomas J. Richardson, who is a pioneer and expert in the analysis and design of LDPC codes. Certain structural properties have been intentionally imposed to allow for an efficient parallel hardware architecture. Fortunately, these constraints place few restrictions on the breadth of Vector-LDPC codes, allowing for the design of near-optimal LDPC codes that maximize coding gain for a variety of code rates and conditions.

This hardware architecture takes advantage of the parallel structure in the Vector-LDPC codes to remove a major obstacle in the practical implementation of LDPC codes: the routing of the messages (which is similar to the interleaving used in Turbo decoding).
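To make the reliability-carrying messages discussed above concrete, the Python sketch below expresses each message as a log-likelihood ratio (LLR), with positive values favoring bit 0 (an assumed sign convention). It shows the standard belief-propagation update at a check node (the "tanh rule"), the widely used min-sum approximation to it, and the additive update at a variable node. These are textbook rules; the whitepaper does not disclose Flarion's own low-complexity approximation, so nothing here should be read as that algorithm, and the function names are hypothetical.

```python
import math


def check_node_update_bp(incoming):
    """Exact belief-propagation check update: each outgoing LLR is computed
    from the incoming LLRs on all *other* edges of the check node."""
    outgoing = []
    for k in range(len(incoming)):
        product = 1.0
        for m, llr in enumerate(incoming):
            if m != k:
                product *= math.tanh(llr / 2.0)
        # Clip so that atanh stays finite for very reliable inputs.
        product = max(min(product, 1.0 - 1e-12), -(1.0 - 1e-12))
        outgoing.append(2.0 * math.atanh(product))
    return outgoing


def check_node_update_min_sum(incoming):
    """Min-sum approximation: sign is the product of the other signs,
    magnitude is the smallest of the other magnitudes."""
    outgoing = []
    for k in range(len(incoming)):
        others = incoming[:k] + incoming[k + 1:]
        sign = 1.0
        for value in others:
            if value < 0:
                sign = -sign
        outgoing.append(sign * min(abs(value) for value in others))
    return outgoing


def variable_node_update(channel_llr, incoming):
    """Variable update: add the channel LLR and all check messages, then
    subtract each edge's own contribution to form its outgoing message.
    Returns (outgoing messages, total LLR used for the bit decision)."""
    total = channel_llr + sum(incoming)
    return [total - message for message in incoming], total


if __name__ == "__main__":
    messages = [2.0, -3.0, 4.0]
    print(check_node_update_bp(messages))
    print(check_node_update_min_sum(messages))
    print(variable_node_update(1.0, [0.5, -2.0]))
```

The min-sum rule needs only comparisons and sign flips, which hints at why approximate decoders can be much cheaper than full floating-point belief propagation while giving up little performance.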
The heart of the architecture is a dedicated programmable parallel processor that reads a description of the particular Vector-LDPC code from memory, and then performs the required parallel computations. The programmability of the Vector-LDPC architecture allows multiple codes of different rates and block lengths to reside in the device at once, and switching between them incurs no overhead, which is important for supporting multiple users and fast link adaptation.

The Flarion Vector-LDPC architecture is an efficient, flexible, highly optimized, high-speed parallel implementation of a low-complexity message-passing algorithm that results in near-capacity performance. Some features of this architecture are as follows:

- Programmable: The Vector-LDPC codes are stored in memory, and can be updated without changing the hardware. This allows for great flexibility in block lengths and code rates.
- Supports multiple codes: The same hardware works for all LDPC codes in FLASH-OFDM, for both traffic and control channels.
  o FLASH-OFDM includes over 5 distinct codes, with code rates varying from 1/6 to 5/6 and block lengths up to 2688 bits.
  o It is possible to switch between codes on the fly with no incurred delay.
- Compact description: The descriptions of the Vector-LDPC codes require relatively little memory.
- Statistical multiplexing: Input and output buffers allow decoding iterations to be spent unevenly over different codewords, so that more iterations can be used on the codewords that need them.

The principles behind the design of the codes are:

- Independent Computations: The independence of computation inherent in LDPC codes is exploited to perform computations in arbitrary order.
- Inherent Parallelism: The parallelism in the graph structure is mirrored in the hardware.
- Routing: The computation is structured to achieve a more efficient data flow. Interleaving is shared between memory access of vectors of messages and a generic routing mechanism.
- Approximate Computation: The simplicity of the decoding algorithm is exploited to find a low-complexity approximate algorithm that essentially uses only addition and shift operations with 5-bit messages. This loses less than . dB relative to full floating-point belief propagation.

The flexibility of this architecture supports many different code designs. In particular, we have taken advantage of this flexibility to introduce a new class of codes known as Multi-Edge Type LDPC codes, which is a generalization of irregular LDPC codes. The codes that Flarion uses in FLASH-OFDM are both Vector-LDPC codes (for high-speed parallel decoding) and Multi-Edge Type LDPC codes (for increased coding gain).

Comparison with Turbo Codes

Figure 2 shows performance curves for codes of rate 1/2 over the additive white Gaussian noise (AWGN) channel, plotting Frame Error Rate against the Signal-to-Noise Ratio (SNR) in terms of Eb/N0. The red curves correspond to a Vector-LDPC code, the blue curves correspond to the Turbo Code in the CDMA2000 standard, and the green curve corresponds to the rate 1/2 convolutional code appearing in the IS-95 standard. Curves for block lengths of 344 bits and 8 bits are shown. In each case, Flarion's Vector-LDPC code outperforms the Turbo Code.
Meanwhile, both of these systems significantly outperform the convolutional code. The primary advantages of Flarion's Vector-LDPC implementation are lower complexity and greater flexibility, but as can be seen in Figure 2, it also offers some improved performance over Turbo Codes.
[Figure 2. Performance Curves for Vector-LDPC, Turbo Codes and Convolutional Codes of Rate 1/2]

Note that the performance curves shown in Figure 2 represent the optimal performance of the Vector-LDPC and Turbo codes under iterative decoding. In practice, certain losses are to be expected due to computational complexity limitations and quantization loss. It is estimated that the Vector-LDPC decoder offers its performance at roughly one-eighth to one-fourth the complexity of the corresponding Turbo Code decoder. Because of this lower complexity, fewer computational resources are needed in practice to achieve near-optimal performance from the Vector-LDPC code.

Implementation in FLASH-OFDM

The Vector-LDPC technology has been tightly integrated into Flarion's FLASH-OFDM air interface for a mobile wireless communications system and end-to-end mobile broadband networking for services and applications based on Internet Protocol (IP). Vector-LDPC codes give better performance than Turbo Codes and other error-correction schemes, yielding advantages in increased transmission distance and superior robustness for FLASH-OFDM.

The flexibility of Vector-LDPC codes has been leveraged in the design of the FLASH-OFDM protocol. For the traffic channels, LDPC codewords of relatively long block lengths (344 to 5248 bits) are used in order to obtain coding gain. For channels used for control, access and signaling, codewords of relatively short length (e.g., less than 3 bits) are used in order to decrease the latency of those messages. The modulation schemes supported by FLASH-OFDM include QPSK, 16QAM, 64QAM and 256QAM. The coding rates range from 1/6 to 5/6, and the system uses adaptive modulation to rapidly switch between codes.

FLASH-OFDM OVERVIEW

FLASH-OFDM is the culmination of a focused design effort to produce a broadband mobile communications system utilizing standard Internet protocols (IP). At the core of the system is a highly efficient, mobile implementation of OFDM, along with Flarion's Vector-LDPC codes, an advanced and efficient forward error correction coding scheme. OFDM is a wireless access method that combines the attributes of its two predecessors, TDMA and CDMA, to address the unique demands posed by mobile users of broadband data and packet voice applications.
Traffic Channel       Code Type   Code rate     Length      Modulation
DL Traffic Channel    LDPC        1/6 to 5/6    344-5248    QPSK-256QAM
UL Traffic Channel    LDPC        1/6 to 5/6    344         QPSK

In Figure 3, the measured performance of these rate options in the downlink is shown in terms of Frame Error Rate (FER) vs. the signal-to-noise ratio (SNR) as measured by Es/N0. The curves are for different combinations of code rates and modulation constellations.

[Figure 3. Performance Curves for Downlink Vector-LDPC Codes. FER vs. Es/N0 for each downlink rate option, with simulation results for comparison.]

The hardware implementation of the Vector-LDPC decoder occupies relatively little hardware area. The base station uses a Virtex-E 2 FPGA where the Vector-LDPC decoder occupies 25 configurable logic blocks (CLB). The mobile terminals use a FLASH-OFDM ASIC chip, on which the Vector-LDPC decoder takes 45K logic gates and approximately 25 Kbits of memory (for block lengths up to about 6K bits).

It should be noted that the speed of the Vector-LDPC decoder supports roughly 6 decoding iterations per codeword. This is many times more than sufficient to handle the typical situation. For example, at a FER of 10^-2, the average number of iterations is estimated to be around 5. Hence the low complexity and parallelism of the Vector-LDPC architecture have enabled the development of very powerful hardware encoder and decoder implementations that comfortably exceed the requirements of the FLASH-OFDM air interface.

Turbo Equalization for Improved Uplink Performance

Turbo Equalization refers to joint demodulation and decoding in the receiver using the Turbo principle. The basic idea can be summarized as follows:
1. the demodulator generates its estimate of the transmitted symbols (with no prior information about those symbols and no knowledge of the error-correction code) from the received channel symbols and feeds them to the decoder;
2. the decoder for the error-correction code makes a decoding attempt based on the output from the demodulator; at the end, it updates the estimates of the transmitted symbols and feeds them back to the demodulator;
3. the demodulator takes the updated estimates from the decoder as prior information on the transmitted symbols, regenerates its estimate based on the received symbols, and again feeds them to the decoder.

Steps 2 and 3 are repeated until decoding is successful.

[Figure 4. Turbo Equalization Block Diagram. Vector-LDPC encoder, QPSK modulation, AWGN channel with phase noise modeled as r_i = e^{jθ} s_i + n_i, demodulation, and Vector-LDPC decoder, with Turbo equalization feedback from the decoder to the demodulator.]

The basic receiver configuration for FLASH-OFDM in the uplink is shown in Figure 4. The uplink transmissions for the traffic channel are organized into sets of 7 QPSK symbols occupying the same tone on 7 contiguous OFDM symbols. A known pilot symbol is included as one of the 7 QPSK symbols. Since the channel typically does not vary significantly over this time period, the pilot symbol is used to obtain initial channel phase information that can in turn be used for demodulating the other 6 QPSK symbols.

Turbo Equalization is used in the FLASH-OFDM uplink to increase the coding gain and improve the robustness of the system. Here, Turbo Equalization refers to the iteration between the pilot-based demodulation process and Vector-LDPC decoding, in which the output of the LDPC decoder is used to improve the estimate of the channel. In other words, the channel phase estimate based on the pilot can be made more accurate using the other QPSK symbols, whose reliability has improved due to the LDPC decoder. Figure 5 shows a single combined Tanner graph for demodulation and decoding. Messages are passed between the bit nodes and modulation nodes in order to yield joint decoding and demodulation.
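The loop described above can be sketched in Python for one set of 7 QPSK symbols under the channel model r_i = e^{jθ} s_i + n_i. The sketch estimates the phase from the pilot alone, derotates, lets a decoder produce symbol estimates, and then re-estimates the phase from all 7 symbols. Everything named here is a hypothetical stand-in: the `decode` callable represents the Vector-LDPC decoder (the demo substitutes a plain QPSK slicer), hard symbol decisions replace the soft messages of the combined Tanner graph, and the noise placement in the demo is contrived so the effect is easy to see.

```python
import cmath
import math


def pilot_phase_estimate(received, pilot_index, pilot_symbol):
    """Initial channel phase from the single known pilot symbol."""
    return cmath.phase(received[pilot_index] * pilot_symbol.conjugate())


def refined_phase_estimate(received, symbol_estimates):
    """Phase estimate that averages over every symbol in the set."""
    correlation = sum(r * s.conjugate() for r, s in zip(received, symbol_estimates))
    return cmath.phase(correlation)


def qpsk_slicer(samples):
    """Stand-in for the decoder: nearest QPSK point for each sample."""
    scale = 1.0 / math.sqrt(2.0)
    symbols = [complex(scale if z.real >= 0 else -scale,
                       scale if z.imag >= 0 else -scale) for z in samples]
    return symbols, True  # (symbol estimates, decoding succeeded)


def turbo_equalize(received, pilot_index, pilot_symbol, decode, max_rounds=3):
    """Iterate demodulation (derotation) and decoding, refining the phase."""
    theta = pilot_phase_estimate(received, pilot_index, pilot_symbol)
    symbols = []
    for _ in range(max_rounds):
        derotated = [r * cmath.exp(-1j * theta) for r in received]
        symbols, success = decode(derotated)
        symbols[pilot_index] = pilot_symbol        # the pilot is always known
        theta = refined_phase_estimate(received, symbols)
        if success:
            break
    return theta, symbols


if __name__ == "__main__":
    a = 1.0 / math.sqrt(2.0)
    sent = [complex(a, a), complex(-a, a), complex(a, -a), complex(-a, -a),
            complex(a, a), complex(-a, -a), complex(a, -a)]
    true_theta = 0.3
    received = [cmath.exp(1j * true_theta) * s for s in sent]
    # Contrived noise on the pilot only (index 0), to bias the initial estimate.
    received[0] += 0.3j * cmath.exp(1j * true_theta) * sent[0]
    first = pilot_phase_estimate(received, 0, sent[0])
    refined, symbols = turbo_equalize(received, 0, sent[0], qpsk_slicer)
    print(abs(first - true_theta), abs(refined - true_theta))
```

In this contrived case the pilot-only estimate carries the full effect of the pilot's noise, while the refined estimate spreads it over all 7 symbols, which is the intuition behind feeding decoder output back to the demodulator.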
[Figure 5. Combined Tanner Graph for Turbo Equalization]

The measured performance resulting from the Turbo Equalizer is shown in Figure 6, which plots Frame Error Rate vs. SNR in terms of Es/N0. Compared with a system without Turbo Equalization (i.e., using differential modulation (DQPSK) and LDPC coding), the relative performance gains are 2.dB, .8dB, .6dB, .5dB, and .dB for Vector-LDPC codes with parameters (n,k) equal to (344,224), (344,432), (344,64), (344,848), and (344,56), respectively, using QPSK modulation. This increased coding gain on the uplink makes the system more robust and can also be used to increase the throughput and range of the system.

In terms of implementation, Turbo Equalization adds less than 5% to the complexity of the Vector-LDPC decoder. Since the Turbo Equalizer only needs to be implemented in the base station, the cost impact on the system is minimal, while the gain in performance is significant.
[Figure 6. Performance Curves for Uplink Vector-LDPC Code. Frame Error Rate vs. Es/N0 (dB).]

Conclusions

In conclusion, Vector-LDPC codes in FLASH-OFDM yield high coding gain while requiring low complexity in decoding. The Vector-LDPC parallel architecture requires relatively low power and small size. The architecture is flexible, with a programmable encoder and decoder that support many different code designs, code rates and codeword lengths.

The key Vector-LDPC innovations that Flarion has developed for the FLASH-OFDM system are as follows:

- The definition of powerful new classes ("Multi-Edge Type") of LDPC codes that offer performance better than Turbo Codes
- The design of near-optimal Vector-LDPC codes for the FLASH-OFDM system using in-house software tools
- The development of a programmable parallel hardware architecture for encoding and decoding Vector-LDPC codes
- The development of a hardware Turbo Equalizer for improving the performance of the uplink through joint decoding and demodulation
References

Thomas J. Richardson and Rüdiger Urbanke, "Multi-Edge Type LDPC Codes," to appear.

Thomas J. Richardson and Rüdiger Urbanke, "The Capacity of Low-Density Parity-Check Codes under Message-Passing Decoding," IEEE Transactions on Information Theory, Special Issue on Codes on Graphs and Iterative Algorithms, March 2001.

Thomas J. Richardson, Amin Shokrollahi and Rüdiger Urbanke, "Design of Capacity-Approaching Irregular Low-Density Parity-Check Codes," IEEE Transactions on Information Theory, Special Issue on Codes on Graphs and Iterative Algorithms, March 2001.

Thomas J. Richardson and Rüdiger Urbanke, "Efficient Encoding of Low-Density Parity-Check Codes," IEEE Transactions on Information Theory, Special Issue on Codes on Graphs and Iterative Algorithms, March 2001.

Sae-Young Chung, Thomas J. Richardson, and Rüdiger Urbanke, "Analysis of Sum-Product Decoding of Low-Density Parity-Check Codes Using a Gaussian Approximation," IEEE Transactions on Information Theory, Special Issue on Codes on Graphs and Iterative Algorithms, March 2001.

R. Michael Tanner, "A Recursive Approach to Low Complexity Codes," IEEE Transactions on Information Theory, vol. IT-27, no. 5, September 1981.

Robert G. Gallager, Low-Density Parity-Check Codes, MIT Press, 1963.

More information about Flarion can be accessed at http://www.flarion.com.