HDL LIBRARY OF PROCESSING UNITS FOR GENERIC AND DVB-S2 LDPC DECODING

Marco Gomes (1,2), Gabriel Falcão (1,2), João Gonçalves (1,2), Vitor Silva (1,2), Miguel Falcão (3), Pedro Faia (2)
(1) Institute of Telecommunications, University of Coimbra, Coimbra, Portugal
(2) Department of Electrical and Computer Engineering, University of Coimbra, Coimbra, Portugal
(3) Chipidea Microelectronica SA, Porto, Portugal
marco@co.it.pt, gff@co.it.pt, jpag@co.it.pt, vitor@co.it.pt, mfalcao@chipidea.com, faia@deec.uc.pt

Keywords: LDPC, HDL, DVB-S2, Iterative Decoding, Scheduling, Tanner Graph.

Abstract: This paper proposes an efficient HDL library of processing units for generic and DVB-S2 LDPC decoders, following a modular and automatic design approach. General-purpose, low-complexity and high-throughput bit node and check node functional models are developed, in both fully serial and parallel architecture versions. A dedicated functional unit for an array processor LDPC decoder architecture for the DVB-S2 standard is also considered. Additionally, an automatic HDL code generator tool for arbitrary decoder architectures and LDPC codes, based on the proposed processing units and Matlab scripts, is described.

1 INTRODUCTION

Low-Density Parity-Check (LDPC) codes (Gallager 1962; MacKay & Neal 1996) are among the most powerful forward error correction codes known and can be applied in a vast number of applications, from data storage to telecommunications. The existence of efficient coding and decoding algorithms, combined with their good decoding performance, attracted the attention of the scientific community and has already led to their inclusion in the recent satellite digital video broadcasting standard (DVB-S2) (ETSI 2005). Although simple, the decoding algorithm presents a significant challenge from the hardware implementation point of view.

LDPC codes are a subset of linear block codes, defined by a sparse parity-check matrix $H$, with which a Tanner graph (Tanner 1981) can be associated, as for any linear block code. This bipartite graph is formed by two types of nodes: Check Nodes (CN), one for each code constraint (rows of $H$), and Bit Nodes (BN), one for each bit of the codeword (columns of $H$), with the connections between them given by $H$. The importance of the Tanner graph is reinforced by the fact that the best known LDPC decoding algorithms, namely the Sum Product Algorithm (SPA) (Gallager 1962; Chen & Fossorier 2002), are all derived from the Tanner graph structure. The iterative procedure is based on an exchange of messages between the BNs and CNs of the Tanner graph, containing beliefs about the value of each codeword bit; these messages (probabilities) are represented either rigorously in their own domain or, more compactly, as log-likelihood ratios (LLR). The iterative procedure stops when a valid codeword is found or the maximum number of iterations is reached (in which case a decoder failure is declared).

A simple iterative decoder can thus be constructed by considering each CN and BN of the Tanner graph as a processing unit, and the connections between them as bidirectional communication channels through which the processed information is sent. In this paper we propose a generic hardware implementation for the CN and BN processing units.

A fully parallel decoder is impracticable for codes of length 64800, such as the ones proposed for the DVB-S2 standard, because of the large silicon area such an implementation would need, imposed not only by the high number of processing units but also by the huge number of connections between them (which causes severe routing problems). Following this line of thought, Kienle et al. (2005) proposed a partially parallel architecture with processing units shared by groups of nodes, which allows a drastic reduction of the silicon area used.
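As a small illustration of how $H$ alone fixes the Tanner graph, the Python sketch below builds, for a toy $3 \times 6$ matrix chosen here purely for illustration (it is not a DVB-S2 matrix), the neighbourhoods $N(m)$ of each CN and $M(n)$ of each BN used in Section 3. The function name `tanner_graph` is hypothetical.

```python
from typing import List, Tuple

def tanner_graph(H: List[List[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (N, M): N[m] lists the BNs checked by CN m, M[n] the CNs that check BN n."""
    rows, cols = len(H), len(H[0])
    N = [[n for n in range(cols) if H[m][n]] for m in range(rows)]  # one list per row of H
    M = [[m for m in range(rows) if H[m][n]] for n in range(cols)]  # one list per column of H
    return N, M

# Toy 3 x 6 parity-check matrix (illustrative only, not taken from any standard).
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
N, M = tanner_graph(H)
print(N)  # [[0, 1, 3], [1, 2, 4], [0, 2, 5]]
print(M)  # [[0, 2], [0, 1], [1, 2], [0], [1], [2]]
```

Every nonzero entry of $H$ becomes one edge, i.e. one bidirectional channel between a CN and a BN processing unit, which is why the wiring of a fully parallel decoder grows with the number of ones in $H$.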

Another advantage of the implementation proposed by Kienle et al. (2005) is that it exploits the particular characteristics, namely the periodicities, of the subset of LDPC codes adopted by the DVB-S2 standard (ETSI 2005), known as LDPC-IRA (LDPC Irregular Repeat and Accumulate) codes. This allows the decoder to work in a reconfigurable way.

The fact that LDPC decoders can be constructed following a modular approach allows the use of auxiliary tools and libraries in their development. It is possible to design Matlab application scripts that, according to certain parameters, create and connect the full set of module units needed for each decoder, according to the target architecture. Furthermore, these application scripts can automatically generate HDL code, since the number of module units and their interconnections depend only on the given parity-check matrix $H$ of the code.

In the following sections we describe in further detail the proposed HDL models for each processing unit. In Section 2 we present a short description of LDPC-IRA codes and the special characteristics of the ones adopted by the DVB-S2 standard. Section 3 presents a brief review of the sum product algorithm in the logarithmic domain (LSPA) following the traditional flooding schedule approach; alternative scheduling methods that speed up the convergence of the LSPA are also discussed in that section. In Section 4, generic hardware modules are proposed for the basic processing units of an LDPC decoder. Section 5 describes the particular characteristics of a generic processing unit for an array processor DVB-S2 LDPC decoder. Finally, in Section 6, we describe the procedure for automatically generating Verilog/VHDL code for an LDPC decoder based on simple Matlab application scripts and previously developed libraries.

2 LDPC-IRA CODES

The new satellite Digital Video Broadcasting standard (DVB-S2) adopted a special class of LDPC codes known as IRA codes (Eroz, Sun & Lee 2004) as the main solution for its FEC system.
LDPC-IRA codes combine the powerful error correction capabilities of LDPC codes with linear encoding complexity. Although the parity-check matrix $H$ of an LDPC code is sparse, the generator matrix needed for encoding, which is obtained from $H$ by Gaussian elimination, is in general not sparse, leading to storage and encoding complexity problems. By restricting $H$ to the form

$$ H_{(n-k)\times n} = \big[\, A_{(n-k)\times k} \;\; B_{(n-k)\times(n-k)} \,\big] = \left[\begin{array}{cccc|ccccc} a_{0,0} & a_{0,1} & \cdots & a_{0,k-1} & 1 & 0 & \cdots & 0 & 0 \\ a_{1,0} & a_{1,1} & \cdots & a_{1,k-1} & 1 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & & \vdots & & \ddots & \ddots & & \vdots \\ a_{n-k-2,0} & a_{n-k-2,1} & \cdots & a_{n-k-2,k-1} & 0 & \cdots & 1 & 1 & 0 \\ a_{n-k-1,0} & a_{n-k-1,1} & \cdots & a_{n-k-1,k-1} & 0 & \cdots & 0 & 1 & 1 \end{array}\right] \qquad (1) $$

where $A$ is a random sparse matrix and $B$ a staircase lower triangular matrix, we obtain an LDPC code with almost the same performance (less than 0.1 dB loss) as the best known LDPC codes of the same dimensions, with linear encoding complexity. The resulting code is systematic, $c = [\, i \;\; p \,]$, with the message (information) bits $i = [\, i_0 \; i_1 \cdots i_{k-1} \,]$ associated with the $A$ matrix and the parity bits $p = [\, p_0 \; p_1 \cdots p_{n-k-1} \,]$ associated with the $B$ matrix. The corresponding BNs of the Tanner graph are known as Information Nodes (IN) and Parity Nodes (PN), respectively. The parity bits can be calculated recursively as:

$$ \begin{aligned} p_0 &= a_{0,0}\, i_0 \oplus a_{0,1}\, i_1 \oplus \cdots \oplus a_{0,k-1}\, i_{k-1} \\ p_1 &= a_{1,0}\, i_0 \oplus a_{1,1}\, i_1 \oplus \cdots \oplus a_{1,k-1}\, i_{k-1} \oplus p_0 \\ &\;\;\vdots \\ p_{n-k-1} &= a_{n-k-1,0}\, i_0 \oplus a_{n-k-1,1}\, i_1 \oplus \cdots \oplus a_{n-k-1,k-1}\, i_{k-1} \oplus p_{n-k-2} \end{aligned} \qquad (2) $$

2.1 H Periodicity

The $H$ matrices of the DVB-S2 LDPC codes have other properties beyond being of IRA type. Periodicity constraints were imposed on the pseudo-random construction of the $A$ matrices, which allows a significant reduction in the storage required for their description, as well as the design of efficient decoding architectures (Kienle et al. 2005). The construction of matrix $A$ is based on dividing the INs into groups of $M$ consecutive ones. All the INs of a group, say group $l$, have the same weight $w_l$, and it is only necessary to choose the CNs that connect to the first IN of the group in order to specify the CNs that connect to each of the remaining $M-1$ INs of that group.
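A minimal sketch of the two ideas above, assuming toy parameters ($M = 3$ instead of 360, two groups, $n - k = 6$, hence $q = 2$) and hypothetical helper names: `build_A` places the ones of $A$ using the cyclic rule formalized in (3) below (IN $j$ of a group, counted from 0, connects to CN $(c + jq) \bmod (n-k)$ for every index $c$ listed for the group's first IN), and `ira_encode` computes the parity bits with the accumulator recursion (2). The syndrome check uses $H = [A \mid B]$ with the staircase $B$ of (1).

```python
from typing import List

def build_A(first_in_cns: List[List[int]], M: int, n_minus_k: int) -> List[List[int]]:
    """Build the (n-k) x k matrix A from the CN indices of the first IN of each group.

    IN j (0-based, j = i - 1 in the notation of (3)) of a group connects to
    CN (c + j*q) mod (n-k) for every index c listed for the group's first IN.
    """
    q = n_minus_k // M
    k = M * len(first_in_cns)
    A = [[0] * k for _ in range(n_minus_k)]
    for l, cns in enumerate(first_in_cns):
        for j in range(M):
            for c in cns:
                A[(c + j * q) % n_minus_k][l * M + j] = 1
    return A

def ira_encode(A: List[List[int]], info: List[int]) -> List[int]:
    """Systematic IRA encoding per (2): p_m = (row m of A) . i  XOR  p_{m-1}."""
    parity, prev = [], 0
    for row in A:
        acc = prev
        for a, bit in zip(row, info):
            acc ^= a & bit
        parity.append(acc)
        prev = acc
    return info + parity

def syndrome(A: List[List[int]], codeword: List[int]) -> List[int]:
    """c H^T over GF(2) for H = [A | B], B with ones on its diagonal and subdiagonal."""
    k = len(A[0])
    info, parity = codeword[:k], codeword[k:]
    s = []
    for m, row in enumerate(A):
        bit = parity[m] ^ (parity[m - 1] if m > 0 else 0)
        for a, b in zip(row, info):
            bit ^= a & b
        s.append(bit)
    return s

# Toy parameters (assumed): M = 3 INs per group, two groups, n - k = 6 CNs, so q = 2.
A = build_A([[0, 3], [1]], M=3, n_minus_k=6)
info = [1, 0, 1, 1, 1, 0]
c = ira_encode(A, info)
print(c, syndrome(A, c))
```

This dense toy loop scans full rows; with a sparse representation of $A$, each parity bit costs one pass over the ones of a row plus one XOR with the previous parity bit, which is where the linear encoding complexity comes from.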
The choice of the $w_l$ CNs connected to the first IN of group $l$ is random, with the restriction that the resulting LDPC code is free of length-4 cycles and that the number of length-6 cycles is as small as possible. Denoting by $c_1, c_2, \ldots, c_{w_l}$ the indices of the CNs that connect to the first IN of group $l$, the indices of the CNs that connect to the $i$-th IN of that group (with $1 \le i \le M$) can be obtained by:

$$ \big(c_1 + (i-1)q\big) \bmod (n-k), \quad \big(c_2 + (i-1)q\big) \bmod (n-k), \quad \ldots, \quad \big(c_{w_l} + (i-1)q\big) \bmod (n-k), \qquad (3) $$

with $q = (n-k)/M$ and $M = 360$ (a common factor for all codes supported by DVB-S2).

3 SOFT-DECODING

The best known LDPC decoding algorithms (Gallager 1962) are based on iterative message passing between the BNs and CNs of the Tanner graph, the messages containing beliefs about the value of each codeword bit. Given an $(n, k)$ LDPC code, we assume BPSK modulation, which maps a codeword $c = (c_1, c_2, \ldots, c_n)$ onto the sequence $x = (x_1, x_2, \ldots, x_n)$ according to $x_i = (-1)^{c_i}$. The modulated vector $x$ is then transmitted through an additive white Gaussian noise (AWGN) channel. The received sequence is $y = (y_1, y_2, \ldots, y_n)$, with $y_i = x_i + n_i$, where $n_i$ is a Gaussian random variable with zero mean and variance $\sigma^2 = N_0/2$.

We denote the set of bits that participate in check $m$ by $N(m)$ and, similarly, the set of checks in which bit $n$ participates by $M(n)$. We also denote by $N(m)\setminus n$ the set $N(m)$ with bit $n$ excluded, and by $M(n)\setminus m$ the set $M(n)$ with check $m$ excluded. Denoting the log-likelihood ratio (LLR) of a random variable $x$ as $L(x) = \ln\big(p(x=0)/p(x=1)\big)$, we designate:

- $Lp_n$: the a priori LLR of BN $n$, derived from the received value $y_n$;
- $Lr_{mn}$: the message sent from CN $m$ to BN $n$, computed from all messages received from the BNs in $N(m)\setminus n$; it is the LLR of BN $n$ assuming that the restriction of CN $m$ is satisfied;
- $Lq_{nm}$: the LLR of BN $n$ that is sent to CN $m$, calculated from all messages received from the CNs in $M(n)\setminus m$ and the channel information $Lp_n$;
- $LQ_n$: the a posteriori LLR of BN $n$.

3.1 Traditional Flooding Schedule

Traditionally, the LDPC iterative decoding procedure follows the so-called flooding schedule, in which all messages sent by the BNs are updated together before being sent to the CN processing units, and vice versa.
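The channel model above fixes the a priori LLR used to initialise the decoder in (4): for BPSK with $x_n = (-1)^{c_n}$ and Gaussian noise of variance $\sigma^2 = N_0/2$, $\ln\big(p(y_n \mid c_n = 0)/p(y_n \mid c_n = 1)\big) = \big((y_n+1)^2 - (y_n-1)^2\big)/(2\sigma^2) = 2y_n/\sigma^2$. The Python sketch below, with hypothetical function names, computes this value and checks it against the direct likelihood ratio.

```python
import math
import random
from typing import List, Tuple

def channel_llr(y: float, sigma: float) -> float:
    """A priori LLR Lp_n = ln[p(y | c = 0) / p(y | c = 1)] = 2 y / sigma^2 (BPSK over AWGN)."""
    return 2.0 * y / sigma ** 2

def transmit(codeword: List[int], sigma: float, rng: random.Random) -> Tuple[List[float], List[float]]:
    """Map x_n = (-1)^c_n, add zero-mean Gaussian noise with std sigma, return (y, Lp)."""
    y = [(1.0 - 2.0 * c) + rng.gauss(0.0, sigma) for c in codeword]
    return y, [channel_llr(v, sigma) for v in y]

def gauss_pdf(v: float, mean: float, sigma: float) -> float:
    return math.exp(-((v - mean) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# The closed form matches the direct likelihood ratio (x = +1 for c = 0, x = -1 for c = 1).
y0, s = 0.3, 0.8
direct = math.log(gauss_pdf(y0, +1.0, s) / gauss_pdf(y0, -1.0, s))
print(round(direct, 6), round(channel_llr(y0, s), 6))  # 0.9375 0.9375

# Hard decisions on the LLR sign recover the codeword when the noise is small.
y, lp = transmit([0, 1, 1, 0], sigma=0.05, rng=random.Random(7))
print([1 if v < 0 else 0 for v in lp])
```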
The Sum Product Algorithm (SPA), proposed by Gallager, is carried out in the logarithmic domain as follows. For each node pair (BN $n$, CN $m$) corresponding to $h_{mn} = 1$ in the parity-check matrix $H$ of the code:

Initialization:

$$ Lq_{nm} = Lp_n = \frac{2 y_n}{\sigma^2} \qquad (4) $$

Iterative body:

A. Calculate the log-likelihood ratio of the message sent from CN $m$ to BN $n$:

$$ Lr_{mn} = \mathop{\boxplus}_{n' \in N(m)\setminus n} Lq_{n'm} \qquad (5) $$

with $a \boxplus b = \operatorname{sign}(a)\operatorname{sign}(b)\min(|a|,|b|) + LUT_1(a,b)$ and $LUT_1(a,b) = \ln\!\big(1 + e^{-|a+b|}\big) - \ln\!\big(1 + e^{-|a-b|}\big)$.

B. Calculate the log-likelihood ratio of the message sent from BN $n$ to CN $m$:

$$ Lq_{nm} = Lp_n + \sum_{m' \in M(n)\setminus m} Lr_{m'n} \qquad (6) $$

C. Compute the a posteriori pseudo-probabilities and perform hard decoding:

$$ LQ_n = Lp_n + \sum_{m' \in M(n)} Lr_{m'n} \qquad (7) $$

$$ \hat{c}_n = \begin{cases} 1, & LQ_n < 0 \\ 0, & LQ_n > 0 \end{cases} \qquad (8) $$

The iterative procedure stops when the decoded word $\hat{c}$ satisfies all parity-check equations of the code ($\hat{c}H^T = 0$) or the maximum number of iterations is reached.

3.2 Alternative Scheduling Methods

It is well known that the SPA, following the traditional flooding-schedule message update rule, is an optimum a posteriori probability (APP) decoding method when applied to codes described by Tanner graphs without cycles (Kschischang et al. 2001). However, good codes always have cycles, and short cycles tend to degrade the performance of iterative message-passing algorithms, moving the results away from optimal. Motivated by this problem and by the goal of faster convergence, new message-passing schedules have been proposed (Zhang & Fossorier 2002; Sharon et al. 2004; Xiao & Banihashemi 2004). Under the flooding schedule, the messages sent by the BNs are all updated (in a serial or parallel manner) before the CN messages can be updated, and vice versa; at each step, the messages used in the computation of a new message all come from the previous iteration. A different approach is to use new information as soon as it is available, so that the next node to be updated can use more up-to-date (fresh) information. This can be done, for example, with two different strategies known as horizontal and vertical scheduling, which yield a considerable reduction in the number of iterations needed to reach a valid codeword (Sharon et al. 2004). The vertical schedule operates along the BNs, which are processed serially. After a BN, say $n$, has been processed, the messages $Lr_{mn'}$ sent by each CN $m \in M(n)$ to all the other BNs $n' \in N(m)\setminus n$ are updated according to (5), taking into account the fresh information $Lq_{nm}$ received from BN $n$. In this way, the next BN to be processed receives more up-to-date information. The horizontal schedule is similar to the vertical schedule, the only difference being that it operates along the CNs.

4 PROCESSING UNITS FOR A GENERIC LDPC DECODER

As already mentioned, a simple iterative decoder can be constructed by considering each CN and BN of the Tanner graph as a processing unit, and the connections between them as bidirectional communication channels through which the processed information is sent. Yet, from the hardware implementation point of view, this approach has some disadvantages (mainly for long and unstructured LDPC codes): the high number of processing units required and, above all, the huge number of connections between them, which imposes severe routing problems. However, even for the best known hardware-structured and efficient LDPC codes, such as the one recently adopted by the DVB-S2 standard (ETSI 2005; Kienle et al. 2005), or for LDPC decoders following different scheduling approaches, the update procedure of a single BN or a single CN remains unchanged. This means that elementary hardware processing units can be developed for both CNs and BNs and, thus, LDPC decoders can be constructed following a modular approach.
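To make the modular view concrete, the Python sketch below (a floating-point model, not the authors' HDL or Matlab code) assembles a flooding-schedule LSPA decoder, equations (4) to (8), from two self-contained node updates: `cn_update` applies (5) with the exact boxplus, and `bn_update` computes $LQ_n$ as in (7) and each outgoing message as $LQ_n - Lr_{mn}$, which equals (6). The function names, the toy matrix and the channel values are assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Tuple

def boxplus(a: float, b: float) -> float:
    """a [+] b = sign(a) sign(b) min(|a|, |b|) + LUT1(a, b), as defined under (5)."""
    s = math.copysign(1.0, a) * math.copysign(1.0, b)
    return s * min(abs(a), abs(b)) + math.log1p(math.exp(-abs(a + b))) - math.log1p(math.exp(-abs(a - b)))

def cn_update(lq: List[float]) -> List[float]:
    """CN unit, eq. (5): output j is the boxplus of all inputs except input j (degree >= 2)."""
    out = []
    for j in range(len(lq)):
        others = lq[:j] + lq[j + 1:]
        acc = others[0]
        for v in others[1:]:
            acc = boxplus(acc, v)
        out.append(acc)
    return out

def bn_update(lp: float, lr: List[float]) -> Tuple[float, List[float]]:
    """BN unit: LQ_n from (7); each outgoing Lq_nm = LQ_n - Lr_mn, which equals (6)."""
    lq_total = lp + sum(lr)
    return lq_total, [lq_total - r for r in lr]

def lspa_flooding(H: List[List[int]], lp: List[float], max_iter: int = 20) -> Tuple[List[int], int]:
    rows, cols = len(H), len(H[0])
    N = [[n for n in range(cols) if H[m][n]] for m in range(rows)]
    M = [[m for m in range(rows) if H[m][n]] for n in range(cols)]
    q: Dict[Tuple[int, int], float] = {(n, m): lp[n] for m in range(rows) for n in N[m]}  # (4)
    r: Dict[Tuple[int, int], float] = {}
    c_hat = [1 if v < 0 else 0 for v in lp]
    for it in range(1, max_iter + 1):
        for m in range(rows):                      # step A: every CN, using the previous Lq
            msgs = cn_update([q[(n, m)] for n in N[m]])
            for n, v in zip(N[m], msgs):
                r[(m, n)] = v
        for n in range(cols):                      # steps B and C: every BN
            lq_total, outs = bn_update(lp[n], [r[(m, n)] for m in M[n]])
            c_hat[n] = 1 if lq_total < 0 else 0    # hard decision (8)
            for m, v in zip(M[n], outs):
                q[(n, m)] = v
        if all(sum(c_hat[n] for n in N[m]) % 2 == 0 for m in range(rows)):  # c_hat H^T = 0
            return c_hat, it
    return c_hat, max_iter

H = [[1, 1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1, 0],
     [1, 0, 1, 0, 0, 1]]
# Codeword [1, 1, 0, 0, 1, 1]; bit 3 arrives with a weak, wrongly signed LLR.
lp = [-2.0, -1.5, 2.2, -0.4, -1.8, -2.5]
print(lspa_flooding(H, lp))  # ([1, 1, 0, 0, 1, 1], 1)
```

The decoder loop only sequences calls to the two node units; replacing the two sweeps with a per-node order that reuses fresh messages would give the vertical or horizontal schedules of Section 3.2 without touching `cn_update` or `bn_update`.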
4.1 BN Processing Unit

A BN processor must calculate the log-likelihood ratio messages sent from the assigned BN to its neighbouring CNs and the a posteriori pseudo-probability associated with the current BN, and perform hard decoding by deciding its bit value. For a BN of weight $w$, the BN processor can be seen as a black box with $w + 1$ inputs, through which it receives the channel information plus the $w$ CN messages $Lr_{mn}$ sent from the CNs connected to it, and $w + 1$ outputs, through which it delivers the hard-decoding decision and sends the $w$ messages $Lq_{nm}$ to the CNs connected to it. From equations (6) and (7), the message sent from BN $n$ to CN $m$ can easily be obtained by

$$ Lq_{nm} = LQ_n - Lr_{mn}. \qquad (9) $$

The computation can thus be optimized and performed in serial or parallel mode. In a parallel version, all the inputs are added together, producing the value of the a posteriori pseudo-probability $LQ_n$. The output messages can then be computed simultaneously by subtracting each entry from the output of that adder. This type of implementation requires an adder capable of adding $w + 1$ inputs of $x$ bits, as well as $w$ adders with $x$-bit outputs to perform the $w$ subtractions. A high number of gates is thus required to implement just a single processing unit, but the system has minimum delay (high throughput), which allows a lower clock frequency and therefore reduced power consumption.

Figure 1: High-level HDL model for a BN processor unit, parallel configuration.

Alternatively, in a serial version, the inputs are added recursively, as shown in figure 2, with the Reg_Sum register initialized with the received channel information. The output messages can be obtained in parallel, as in figure 1, or with a fully serial approach, as shown in figure 2, with a new message being obtained at each clock cycle.
This implementation minimizes hardware complexity (measured in number of logic gates) at the cost of a significant increase in processing time (timing restrictions could require a higher clock frequency). The serial implementation also has the advantage of supporting BNs of any weight, at the expense of little additional control.
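The parallel and serial BN configurations compute the same values; they differ only in how many adders are instantiated and how many clock cycles are spent. A bit-true Python sketch, assuming integer (fixed-point) LLRs and 8-bit saturation of the outgoing messages (the word length $x$ is left open in the text, so 8 is an arbitrary choice here), with hypothetical function names:

```python
from typing import Iterator, List, Tuple

def saturate(v: int, bits: int) -> int:
    """Clip v to the signed range of a 'bits'-wide two's-complement word."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, v))

def bn_parallel(lp: int, lr: List[int], bits: int = 8) -> Tuple[int, List[int], int]:
    """Parallel BN: a (w+1)-input adder gives LQ (7); w subtractors give Lq = LQ - Lr (9)."""
    lq_total = lp + sum(lr)                     # wide internal adder, no saturation here
    msgs = [saturate(lq_total - r, bits) for r in lr]
    return lq_total, msgs, (1 if lq_total < 0 else 0)   # hard decision (8)

def bn_serial(lp: int, lr: List[int], bits: int = 8) -> Iterator[int]:
    """Serial BN: Reg_Sum starts at Lp and absorbs one Lr per clock, then emits one Lq per clock."""
    entries: List[int] = []                     # plays the role of Reg Entrys[0..w-1]
    reg_sum = lp                                # Reg_Sum initialised with the channel information
    for r in lr:                                # accumulation phase: w clock cycles
        entries.append(r)
        reg_sum += r
    for r in entries:                           # output phase: one message per clock cycle
        yield saturate(reg_sum - r, bits)

lp, lr = 12, [-5, 30, 7]                        # weight w = 3; any weight works in bn_serial
total, par, hard = bn_parallel(lp, lr)
print(total, par, hard)                         # 44 [49, 14, 37] 0
print(list(bn_serial(lp, lr)))                  # [49, 14, 37]
```

The serial generator takes $2w$ steps for a BN of weight $w$ but reuses one adder, and nothing in it depends on $w$ being fixed, which mirrors the "any weight" property noted above.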

Figure 2: High-level HDL model for a BN processor unit, serial configuration (entry registers Reg Entrys[0], Reg Entrys[1], Reg Entrys[2], ..., Reg Entrys[w-1]).

4.2 CN Processing Unit

An approach similar to the one used in the previous section can be followed to compute the $Lr_{mn}$ messages sent by a CN. In fact, the boxplus operation defined in (5) can be reversed as

$$ x \boxplus y = z \;\Longleftrightarrow\; x = z \boxminus y, \qquad (10) $$

where the boxminus operation is defined, for $|a| < |b|$, as

$$ a \boxminus b = \operatorname{sign}(a)\operatorname{sign}(b)\,|a| + LUT_2(a,b), \qquad LUT_2(a,b) = \ln\!\big(1 - e^{-|a+b|}\big) - \ln\!\big(1 - e^{-|a-b|}\big). $$

Figure 4: Block diagram of the Boxminus unit.

Sometimes the boxplus operation is simplified even further, at the cost of a small decrease in performance, by setting the correction term to zero. This simplification of the SPA is known as Min-Sum (Chen & Fossorier 2002; Hu et al. 2001).

Based on the proposed boxplus and boxminus hardware modules, it is possible to adopt a serial or parallel configuration for the CN processor (similar to the ones described for the BN processor unit). Nevertheless, in a parallel implementation, the complexity of the boxplus operation requires a boxplus-sum chain of all inputs, as shown in figure 5. Equation (5) can also be rewritten as

$$ Lr_{mn} = \Big( \mathop{\boxplus}_{n' \in N(m)} Lq_{n'm} \Big) \boxminus Lq_{nm}. \qquad (11) $$

However, the $LUT_1(a,b)$ and $LUT_2(a,b)$ functions contain logarithmic operators whose hardware implementation consumes a significant amount of resources. Their implementation can be greatly simplified by approximating them with fixed-point piecewise linear functions, namely with power-of-two multiplying factors (shifts and adders) (Hu et al. 2001; Masera et al. 2005). Boxplus and boxminus operations can then both be implemented at the cost of four additions, one comparison and two corrections, each involving a shift and a constant addition, as shown in figure 3 and figure 4.

Figure 5: High-level HDL model for a CN processor unit, parallel configuration.

The advantages of one configuration over the other are similar to the ones mentioned for the BN processor.
However, it should be noted that the ratio between the silicon area occupied by a parallel implementation and that of a serial implementation is, in this case, significantly higher than for the BN processor, due to the number of operations involved in boxplus and boxminus processing. In fact, the boxplus and boxminus processing units require more gates than the common add and subtract arithmetic operations.

Figure 3: Block diagram of the Boxplus unit.
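The boxplus and boxminus definitions, the Min-Sum simplification and the CN rewriting (11) can be checked numerically. The floating-point Python sketch below uses the exact logarithmic correction terms rather than the piecewise-linear hardware approximations, and its function names are hypothetical; `cn_parallel` follows (11), one boxplus-sum of all inputs followed by one boxminus per output, while `cn_direct` evaluates (5) for comparison.

```python
import math
from functools import reduce
from typing import List

def sgn(v: float) -> float:
    return math.copysign(1.0, v)

def lut1(a: float, b: float) -> float:
    """Boxplus correction: ln(1 + e^-|a+b|) - ln(1 + e^-|a-b|)."""
    return math.log1p(math.exp(-abs(a + b))) - math.log1p(math.exp(-abs(a - b)))

def lut2(a: float, b: float) -> float:
    """Boxminus correction: ln(1 - e^-|a+b|) - ln(1 - e^-|a-b|), valid for |a| < |b|."""
    return math.log(-math.expm1(-abs(a + b))) - math.log(-math.expm1(-abs(a - b)))

def boxplus(a: float, b: float) -> float:
    return sgn(a) * sgn(b) * min(abs(a), abs(b)) + lut1(a, b)

def boxminus(z: float, y: float) -> float:
    """Inverse of boxplus as in (10): x = z [-] y satisfies x [+] y = z; needs |z| < |y|."""
    if not abs(z) < abs(y):
        raise ValueError("boxminus requires |z| < |y|")
    return sgn(z) * sgn(y) * abs(z) + lut2(z, y)

def min_sum(a: float, b: float) -> float:
    """Min-Sum simplification: boxplus with the correction term dropped."""
    return sgn(a) * sgn(b) * min(abs(a), abs(b))

def cn_parallel(lq: List[float]) -> List[float]:
    """CN outputs via (11): one boxplus-sum of all inputs, then one boxminus per input."""
    total = reduce(boxplus, lq)
    return [boxminus(total, v) for v in lq]

def cn_direct(lq: List[float]) -> List[float]:
    """Reference CN outputs via (5): boxplus of all inputs except the one being answered."""
    return [reduce(boxplus, lq[:j] + lq[j + 1:]) for j in range(len(lq))]

lq = [1.2, -0.8, 2.5, 3.1]
print([round(v, 4) for v in cn_parallel(lq)])
print([round(v, 4) for v in cn_direct(lq)])
```

The two lists agree to floating-point precision. The sketch raises an error if an input is exactly zero, because the boxminus would then need $|z| < 0$; a hardware version of (11) has to handle that corner case separately.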

Figure 6: High-level HDL model for a CN processor unit, serial configuration (entry registers Reg Entrys[0], Reg Entrys[1], Reg Entrys[2], ...).

5 PROCESSING UNIT FOR A DVB-S2 LDPC DECODER

The particular characteristics of the LDPC-IRA codes adopted by the DVB-S2 standard make it possible to devise more efficient decoder solutions that overcome the evident limitations of a fully parallel architecture. Figure 7 presents the basic architecture of a partially parallel array processor decoder for DVB-S2 LDPC codes (Kienle et al. 2005). This efficient architecture not only exploits the periodicities of the adopted LDPC-IRA codes, but also has the great advantage of supporting all code rates and code lengths defined by the DVB-S2 standard through a simple reconfiguration mechanism. In this section we suggest a possible implementation for each processor, or functional processing unit (FU), which merges the functions performed by both the BN and CN units. Because a single processing unit is shared by different BNs and CNs, which have different weights, the serial implementations shown in figures 2 and 6 must be adopted. Thus, all messages are loaded serially into the functional units, with the control based on the BN and CN weights. Since the messages sent from CNs to BNs are computed from the previous messages received from BNs, and vice versa, a message value can be discarded once used, and the memory position it occupies can be reused to store the newly computed message. The shuffling network is responsible for the correct exchange of messages between the CNs and BNs, emulating the Tanner graph.

Considering the zigzag connectivity between PNs and CNs, the PNs and INs are updated following different scheduling methods: the traditional flooding schedule is applied to the INs, while the PNs are updated jointly with the CNs following the horizontal schedule approach. This requires some modifications to the CN processing unit of figure 6 in order to construct the basic functional unit.
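The node-to-FU assignment given in the next paragraph can be checked for the rate-1/2 normal frame ($n = 64800$, $k = 32400$, so $q = \alpha = 90$). The Python sketch below uses hypothetical names; `route` additionally applies the index rule (3) to show where the message of the IN at position $i$ of a group lands: writing the first-IN index as $c = aq + b$ gives FU $(a + i) \bmod 360$ at local address $b$, a cyclic shift that is derived here from (3) rather than stated in the text.

```python
from typing import List, Tuple

P = 360  # number of FUs, equal to the group size M = 360 of (3)

def fu_bn_schedule(i: int, k: int) -> List[int]:
    """INs updated serially by FU_i in BN mode: i, i + 360, ..., i + (alpha - 1) * 360."""
    alpha = k // P
    return [i + t * P for t in range(alpha)]

def fu_cn_schedule(i: int, n: int, k: int) -> List[int]:
    """CNs (and their PNs) updated by FU_i in CN mode: j, ..., j + q - 1 with j = i * q."""
    q = (n - k) // P
    return list(range(i * q, i * q + q))

def route(c: int, i: int, n: int, k: int) -> Tuple[int, int]:
    """(FU, local address) of the CN reached by the IN at position i of a group.

    Position i is 0-based (i - 1 in the notation of (3)), so the CN index is
    (c + i*q) mod (n - k); for c = a*q + b this is FU (a + i) mod 360, address b.
    """
    q = (n - k) // P
    cn = (c + i * q) % (n - k)
    return cn // q, cn % q

n, k = 64800, 32400            # DVB-S2 normal frame, rate 1/2: q = alpha = 90
q = (n - k) // P
ins = sorted(x for i in range(P) for x in fu_bn_schedule(i, k))
cns = sorted(x for i in range(P) for x in fu_cn_schedule(i, n, k))
print(ins == list(range(k)), cns == list(range(n - k)))   # True True
print(route(95, 0, n, k), route(95, 1, n, k), route(95, 359, n, k))  # (1, 5) (2, 5) (0, 5)
```

Under this assignment every IN and every CN is visited by exactly one FU, and the messages of one group only need a cyclic rotation across FUs, with a fixed local address, to reach their CNs.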
As mentioned, a single FU is shared by a constant number of INs, CNs and PNs (CNs and PNs are processed jointly), depending on the code length and rate. More precisely, for an $(n, k)$ DVB-S2 LDPC-IRA code, in BN mode the unit $FU_i$, with $i = 0, \ldots, 359$, serially updates the INs $\{i,\; i+360,\; i+2\cdot 360,\; \ldots,\; i+(\alpha-1)\cdot 360\}$, with $\alpha = k/360$. In CN mode, the same FU updates the CNs and PNs $\{j,\; j+1,\; \ldots,\; j+q-1\}$, with $j = i\cdot q$ and $q = (n-k)/360$. The 360 FUs operate in parallel and share all control signals, and they are sufficient to process in real time all the $n$ BNs and $n-k$ CNs of the code. In BN mode, only INs are processed and the FU layout is similar to figure 2.

Figure 7: Array processor architecture for a DVB-S2 LDPC decoder (FUs interconnected through a shuffling network).

Since the INs are divided into groups of 360 consecutive ones, with the properties of all the INs of a group (i.e. their weight and the indices of the CNs to which each one connects) characterized in terms of just the first IN of that group, each set of INs can be processed simultaneously, which appreciably simplifies the decoder control. On the other hand, since the BNs and CNs have different weights, each FU processes its nodes serially.

Figure 8: FU in CN mode and zigzag connectivity between PNs and CNs.

In CN mode, each FU updates not only the associated CNs but also the corresponding PNs (note that there is one PN bit per CN restriction). Given the zigzag connectivity between PNs and CNs, a PN, say $m$, updated according to (6), works as a simple pass-through node, because the message that it sends to CN $m+1$ is simply the message received from CN $m$ added to the channel information, and vice versa (see figure 8). Since each FU processes $q$ consecutive CNs, the PN update can follow a horizontal schedule (PNs and CNs processed simultaneously). In this way, the message that travels through CN $m$, PN $m$ and CN $m+1$ is kept inside the FU, and only the backward message sent from CN $m$ to PN $m-1$, $Lr_{m,PN_{m-1}}$, is saved in the external memory. The equations that describe the operation of the FU in CN mode are:

$$ Lr_{mn} = \Big( \mathop{\boxplus}_{n' \in IN(m)\setminus n} Lq_{n'm} \Big) \boxplus Lq_{PN_m,m} \boxplus Mem \qquad (12) $$

$$ Lr_{m,PN_{m-1}} = \Big( \mathop{\boxplus}_{n' \in IN(m)} Lq_{n'm} \Big) \boxplus Lq_{PN_m,m} \qquad (13) $$

$$ LQ_{PN_{m-1}} = Mem + Lr_{m,PN_{m-1}} \qquad (14) $$

$$ Mem = \Big( \mathop{\boxplus}_{n' \in IN(m)} Lq_{n'm} \boxplus Mem \Big) + Lp_{PN_m} \qquad (15) $$

where $IN(m)$ denotes the set of INs connected to CN $m$, and $Mem$ the internal memory of the FU.

A problem arises when CNs $m$ and $m+1$ are not processed by the same FU. This situation occurs cyclically whenever $(m+1) \bmod q = 0$: if CN $m$ is processed by $FU_i$, then CN $m+1$ is processed by $FU_{i+1}$. It was solved by transferring the memory contents of $FU_i$ to $FU_{i+1}$, with $i = 0, \ldots, 358$, and initializing $FU_0$ with the neutral element (the maximum admissible LLR value). This significantly simplifies the system control. Figure 9 presents the architecture of an FU in CN mode; the FU system control guarantees that equations (12) to (15) are computed in that order.

Figure 9: High-level HDL model for the FU architecture in CN mode.

6 AUTOMATIC HDL CODE GENERATION WITH A MATLAB PRE-PROCESSOR

As mentioned, an LDPC code is a linear code described by a sparse parity-check matrix. Moreover, LDPC codes with good error-correcting capabilities normally have long codewords (more than 10000 bits per codeword), which makes designing the Verilog/VHDL decoder by hand almost impossible. Besides that, minor changes to the $H$ matrix always have considerable repercussions on the structure of the corresponding LDPC decoder, even when the architectural principles remain unchanged, and such simple modifications may cost a considerable amount of time in the development of the decoder's Verilog/VHDL code.

LDPC decoders can be constructed following a modular approach, and the basic LDPC decoding operations, such as boxplus and boxminus, are translated into hardware by independent modules that can be assembled according to the decoder architecture; this allows auxiliary tools and libraries to be used in their development. Following these considerations, it is possible to design Matlab libraries containing the basic LDPC decoder building blocks. These simple blocks (for example, the BN processing unit in parallel configuration) are fully configurable (number of inputs, message precision, etc.). The design of an LDPC decoder for a particular code, according to a previously defined architecture, is thus achieved: a simple Matlab application script receives the parity-check matrix of the code, interprets it and, accordingly, creates and connects the full set of module units needed to implement the required decoder. The procedure is described in figure 10.

Figure 10: Automatic HDL decoder design flowchart (blocks: H matrix, Architecture Type, Resolution, Algorithm Implementation, Algorithm Tests, MATLAB Simulation, HDL Library, HDL Module Generation, HDL Synthesis, FPGA Implementation).
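The generator flow of figure 10 can be imitated in a few lines. The Python sketch below stands in for the authors' Matlab scripts: it reads $H$ and emits a flat structural Verilog netlist with one BN unit per column and one CN unit per row, wired edge by edge. The module names `bn_unit` and `cn_unit`, their parameters (`W`, `WIDTH`) and their port names are hypothetical placeholders, not the interfaces of the proposed library.

```python
from typing import List

def generate_decoder(H: List[List[int]], width: int = 8, name: str = "ldpc_decoder") -> str:
    """Emit a flat structural Verilog netlist: one BN unit per column of H, one CN unit per row."""
    rows, cols = len(H), len(H[0])
    edges = [(m, n) for m in range(rows) for n in range(cols) if H[m][n]]
    v = [f"module {name} (input clk, input rst,",
         f"  input [{width * cols - 1}:0] lp, output [{cols - 1}:0] c_hat);"]
    for m, n in edges:  # one BN->CN bus (q) and one CN->BN bus (r) per Tanner-graph edge
        v.append(f"  wire [{width - 1}:0] q_{n}_{m}, r_{m}_{n};")
    for n in range(cols):
        cns = [mm for (mm, nn) in edges if nn == n]
        lr_in = ", ".join(f"r_{m}_{n}" for m in cns)
        lq_out = ", ".join(f"q_{n}_{m}" for m in cns)
        v.append(f"  bn_unit #(.W({len(cns)}), .WIDTH({width})) bn_{n} (.clk(clk), .rst(rst), "
                 f".lp(lp[{width * (n + 1) - 1}:{width * n}]), .lr_in({{{lr_in}}}), "
                 f".lq_out({{{lq_out}}}), .hard(c_hat[{n}]));")
    for m in range(rows):
        bns = [nn for (mm, nn) in edges if mm == m]
        lq_in = ", ".join(f"q_{n}_{m}" for n in bns)
        lr_out = ", ".join(f"r_{m}_{n}" for n in bns)
        v.append(f"  cn_unit #(.W({len(bns)}), .WIDTH({width})) cn_{m} (.clk(clk), .rst(rst), "
                 f".lq_in({{{lq_in}}}), .lr_out({{{lr_out}}}));")
    v.append("endmodule")
    return "\n".join(v)

H = [[1, 1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1, 0],
     [1, 0, 1, 0, 0, 1]]
print(generate_decoder(H))
```

Changing $H$ only changes the generated text: the hand-written library modules stay the same, which is the time saving argued for above.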

7 CONCLUSIONS

In this paper we have proposed an efficient and generic HDL library of processing units which, combined with Matlab scripting for automatic HDL code generation, allows a flexible approach to the construction of generic and DVB-S2 LDPC decoders. This technique considerably reduces design development time, especially for long codes such as the ones adopted in the DVB-S2 standard.

REFERENCES

Chen, J. & Fossorier, M., 2002, Near Optimum Universal Belief Propagation Based Decoding of Low-Density Parity Check Codes, IEEE Transactions on Communications, vol. 50, no. 3, pp. 406-414.
Eroz, M., Sun, F. & Lee, L., 2004, DVB-S2 low density parity check codes with near Shannon limit performance, International Journal of Satellite Communications and Networking, vol. 22, no. 3, pp. 269-279.
ETSI, 2005, Digital Video Broadcasting (DVB) Second generation framing structure for broadband satellite applications, EN 302 307 V1.1.1.
Gallager, R., 1962, Low-Density Parity-Check Codes, IRE Transactions on Information Theory, vol. IT-8, pp. 21-28.
Hu, X., Eleftheriou, E., Arnold, D. & Dholakia, A., 2001, Efficient Implementations of the Sum-Product Algorithm for Decoding LDPC Codes, IEEE GLOBECOM '01, vol. 2, pp. 1036-1036E.
Kienle, F., Brack, T. & Wehn, N., 2005, A Synthesizable IP Core for DVB-S2 LDPC Code Decoding, DATE '05, vol. 3, pp. 100-105.
Kschischang, F., Frey, B. & Loeliger, H., 2001, Factor Graphs and the Sum-Product Algorithm, IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 498-519.
MacKay, D. & Neal, R., 1996, Near Shannon Limit Performance of Low Density Parity Check Codes, IEEE Electronics Letters, vol. 32, no. 18, pp. 1645-1646.
Masera, G., Quaglio, F. & Vacca, F., 2005, Finite precision implementation of LDPC decoders, IEE Proceedings-Communications, vol. 152, no. 6, pp. 1098-1102.
Sharon, E., Litsyn, S. & Goldberger, J., 2004, An efficient message-passing schedule for LDPC decoding, 23rd IEEE Convention of Electrical and Electronics Engineers in Israel Proceedings, pp. 223-226.
Tanner, R., 1981, A Recursive Approach to Low Complexity Codes, IEEE Transactions on Information Theory, vol. 27, pp. 533-547.
Xiao, H. & Banihashemi, A., 2004, Graph-Based Message-Passing Schedules for Decoding LDPC Codes, IEEE Transactions on Communications, vol. 52, no. 12, pp. 2098-2105.
Zhang, J. & Fossorier, M., 2002, Shuffled Belief Propagation Decoding, Conference Record of the Thirty-Sixth Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 8-15.