
© 2009 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. doi: http://dx.doi.org/10.1109/icc.2009.5199493

Stochastic Decoding of LDPC Codes over GF(q)

Gabi Sarkis, Shie Mannor and Warren J. Gross
Department of Electrical and Computer Engineering, McGill University, Montreal, Quebec, Canada H3A 2A7
Email: gabi.sarkis@mail.mcgill.ca, shie.mannor@mcgill.ca, warren.gross@mcgill.ca

Abstract: Nonbinary LDPC codes have been shown to outperform currently used codes for magnetic recording and several other channels. Currently proposed nonbinary decoder architectures have very high complexity for high-throughput implementations and sacrifice error-correction performance to maintain realizable complexity. In this paper, we present an alternative decoding algorithm based on stochastic computation that has a very simple implementation and minimal performance loss when compared to the sum-product algorithm. We demonstrate the performance of the algorithm when applied to a GF(16) code and provide details of the hardware resources required for an implementation.

I. INTRODUCTION

Low-Density Parity-Check (LDPC) codes are linear block codes that can achieve performance close to the Shannon limit under iterative decoding. Binary LDPC codes have received much interest and are specified in recent wireless and wireline communications standards, for example digital video broadcast (DVB-S2), WiMAX wireless (IEEE 802.16e) and 10 gigabit Ethernet (IEEE 802.3an). Nonbinary LDPC codes defined over q-ary Galois fields (GF(q)) were introduced in [1] and were shown to perform better than equivalent bit-length binary codes for additive white Gaussian noise (AWGN) channels. In [2], Song et al. showed that GF(q) LDPC codes significantly outperform binary LDPC and Reed-Solomon (RS) codes for the magnetic recording channel. Chen et al. [3] demonstrated that LDPC codes over GF(16) perform better than RS codes for general channels with bursts of noise, thus making GF(q) LDPC a candidate to replace RS coding in many storage systems. Djordjevic et al. concluded that nonbinary LDPC codes achieve lower BER than other codes while allowing for higher transmission rates when used with the fiber-optic channel [4].
LDPC codes over GF(q) are defined such that elements of the parity-check matrix H are elements of GF(q). As in the binary case, these codes are decoded by the sum-product algorithm (SPA) applied to the Tanner graph representation of the parity-check matrix H. Unfortunately, the nonbinary values of H result in very high complexity for the check node updates in the graph, presenting a significant barrier to practical realization. The only hardware implementation in the literature is fully serial, consisting of only one variable node and one check node [5].

There have been a number of approaches in the literature to reduce the complexity of the check node update. MacKay et al. proposed using the fast Fourier transform (FFT) to convert convolution to multiplication in the check nodes [6]. Song et al. use the log-domain to replace multiplication with addition [2]. Declercq et al. introduced the extended min-sum (EMS) algorithm as an approximation to the SPA, computing likelihood values for only a subset of the field elements, thus reducing the number of computations performed [7]. While these approaches are simpler than a direct implementation of the SPA, there is a need to further reduce the complexity for practical decoder implementations.

Recently, a new approach to decoding binary LDPC codes based on stochastic computation ([8], [9]) was introduced in [10]. Stochastic decoders use random bit-streams to represent probability messages and result in simple node hardware and reduced wiring complexity. Subsequently, area-efficient fully-parallel high-throughput decoders with performance close to the SPA were demonstrated in field-programmable gate arrays (FPGAs) [11], [12]. We realized that the complexity benefits of stochastic decoding might be even greater for nonbinary LDPC codes and could result in a practical decoder implementation.

In this paper, we present a generalization of stochastic decoding to LDPC codes over GF(q). The algorithm has significantly lower hardware complexity than other nonbinary decoding algorithms in the literature.

II. SUM-PRODUCT DECODING

A. Notation

Since most digital systems transmit data using $2^p$ symbols, the focus in current research is on codes defined over GF($2^p$). In this section, we describe the SPA for decoding LDPC codes over GF($2^p$). However, it should be noted that the SPA works on any field GF(q) with minor modifications to notation and channel likelihood calculations.

The elements of GF($2^p$) can be represented as powers of the primitive element $\alpha$, or using polynomials; the latter form is used in this section, so that the polynomial $i(x) = \sum_{l=1}^{p} i_l x^{l-1}$, where the $i_l$ are binary coefficients, represents an element of GF($2^p$).

The notation used in this section for representing internode messages is similar to that of [7]; namely, U and V represent messages heading in the direction of check and variable nodes respectively. The subscripts represent the source and destination nodes. For example, $U_{xy}$ is a message from node x to node y. All the messages are probability mass function (PMF) vectors indexed using GF($2^p$) elements. Fig. 1a shows this notation applied to a Tanner graph.

B. Algorithm

While nonbinary codes can also be decoded using SPA on Tanner graphs, the check node update is modified because the
elements of H are nonbinary. Therefore, the check constraint for a check node of degree $d_c$ is:

$$ \sum_{k=1}^{d_c} h_k\, i_k(x) = 0, \tag{1} $$

where $h_k$ is the element of H with indices corresponding to the check and variable nodes of interest. This is different from the binary case, where the check constraint is $\sum_{k=1}^{d_c} i_k(x) = 0$. To accommodate this change, Davey et al. [1] assigned values from H as labels to the edges connecting variable and check nodes and integrated the multiplication into the check node functionality. Declercq et al. [7] introduced a third node type called the permutation node, which connects variable and check nodes and performs multiplication as shown in Fig. 1a, thereby reverting the check node constraint to $\sum_{k=1}^{d_c} j_k(x) = 0$ on the permuted symbols $j_k(x) = h_k\, i_k(x)$. While the two approaches are functionally equivalent, the one in [7] results in simpler equations and implementation since all check nodes of the same degree are identical.

Fig. 1: Stochastic decoder graphs with $X$ and $X^{-1}$ denoting forward and inverse permutation operations. (a) Message labels, (b) message propagation with EMs added to the decoder.

The first step in the SPA is computing the channel likelihood vector $L_v[i(x)]$ for each variable node v, which is computed based on the channel model and modulation scheme. The outgoing message from variable node v to permutation node z is given by:

$$ U_{vz} = L_v \times \prod_{p=1,\, p \neq z}^{d_v} V_{pv}, \tag{2} $$

where $\times$ is the term-by-term product of vectors and $d_v$ is the variable node degree. Normalization is needed so that $\sum_{a \in GF(2^p)} U_{vz}[a] = 1$.

Permutation nodes implement multiplication by an element from H when passing messages from the variable to check nodes, and multiplication by the inverse of an element from H in the other direction. As shown in [7], the multiplication and multiplication by inverse can be performed using cyclic shifts of the positions of the values in a message vector, except those values indexed by 0.

The parity-check constraint no longer includes multiplication by elements of H; therefore, the check node update equation is the convolution of incoming messages as shown in [7]:

$$ V_{ct} = \mathop{\circledast}_{p=1,\, p \neq t}^{d_c} U_{pc}, \tag{3} $$

where $\circledast$ denotes convolution. This convolution represents a significant computational challenge in implementing nonbinary LDPC decoders.

III. STOCHASTIC DECODING

A message in the SPA for LDPC codes over GF(q) is a vector containing the probabilities of each of the q possible symbols. Stochastic decoding uses streams of symbols chosen from GF(q) to represent these messages; the number of occurrences of a symbol in a stream divided by the total number of symbols observed in the stream gives the probability of that symbol. The advantage of utilizing such a method for message passing lies in the simple circuitry required to manipulate the stochastic streams to reflect likelihood changes, as presented in Section III-D.

Stochastic decoding of binary LDPC codes results in simple hardware structures. The reader is referred to [8], [9], [10], [11], [12] for details on binary stochastic decoding algorithms and their implementation.

Similar notation to the SPA is used when describing the stochastic decoding message updates, the difference being that messages are serial stochastic streams instead of vectors; thus, an index t is used to denote the location of a symbol within a stream and the stream name is overlined, e.g. $\overline{U}_{vp}(t)$.

A. Node Equations

Winstead et al. [13] presented a stochastic decoding algorithm that uses streams of integers instead of the conventional binary streams. In that work, an integer stream encodes the probabilities of the states in a trellis, leading to a demonstration of trellis decoding of a (16,11) Hamming code and a turbo product decoder built from the Hamming component decoders. However, that work did not interpret the integers as finite field symbols and did not utilize GF(q) arithmetic. In this section we present the node equations for a stochastic decoder for LDPC codes over GF(q). Taking the view that the nonbinary streams are composed of finite field elements, we present message update rules that are much simpler than those derived from a straightforward application of the rules in [13].
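To make the stream representation from the opening of Section III concrete, the following Python sketch draws a stream of GF(q) symbols from a PMF and recovers the PMF by counting occurrences. It is only an illustration: the function names, the labelling of field elements by the integers 0..q-1, and the use of a seeded software random number generator are assumptions made here and are not taken from the paper.

```python
import random
from collections import Counter


def pmf_to_stream(pmf, length, seed=0):
    """Draw `length` symbols (integers 0..q-1, hypothetical labels for the
    GF(q) elements) so that symbol a appears with probability pmf[a]."""
    rng = random.Random(seed)
    symbols = list(range(len(pmf)))
    return rng.choices(symbols, weights=pmf, k=length)


def stream_to_pmf(stream, q):
    """Estimate the PMF as occurrences of each symbol divided by the
    total number of symbols observed in the stream."""
    counts = Counter(stream)
    return [counts[a] / len(stream) for a in range(q)]


if __name__ == "__main__":
    pmf = [0.1, 0.2, 0.3, 0.4]          # example PMF over GF(4), chosen arbitrarily
    stream = pmf_to_stream(pmf, 50_000)
    print(stream_to_pmf(stream, q=4))   # close to pmf for long streams
```

The estimate only approaches the encoded PMF as the stream grows, which is consistent with stochastic decoders needing many decoding cycles per frame.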
Under this view, the trellis representation of the convolution in the check node reduces to Galois field addition. Section III-E demonstrates the performance of the stochastic algorithm when decoding a (256,128)-symbol LDPC code over GF(16).

Variable Node: A stochastic variable node of degree $d_v$ takes as input $d_v$ stochastic streams from permutation nodes in addition to one generated based on channel likelihood values. In [13], the output of a node is updated if its inputs satisfy some constraint; otherwise, the output remains unchanged from the previous iteration. To implement a variable node constraint on an output message stream at time t, we copy the input symbol to the output symbol if the input symbols on all the other incoming edges are equal at time t. For a stochastic variable node with output $\overline{U}_{vp}$ and inputs $\overline{V}_{iv}$, we propose the following update rule:

$$ \overline{U}_{vp}(t) = \begin{cases} a & \text{if } \overline{V}_{iv}(t) = a,\ \forall i : i \neq p \\ \overline{U}_{vp}(t-1) & \text{otherwise} \end{cases} \tag{4} $$
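Update rule (4) can be sketched in a few lines of Python. The simulation below feeds the rule with independent input streams and compares the empirical output distribution with the normalized term-by-term product of the input PMFs, which is what a sum-product variable node computes in equation (2). The function names, the integer symbol labels 0..q-1, the example PMFs and the choice of initial output symbol are assumptions for illustration only.

```python
import random
from collections import Counter


def variable_node_update(inputs, previous):
    """Rule (4): output the common symbol if all inputs agree at this
    time step, otherwise hold the previous output symbol."""
    first = inputs[0]
    if all(s == first for s in inputs):
        return first
    return previous


def simulate_variable_node(pmfs, steps, seed=0):
    """Drive rule (4) with independent streams drawn from `pmfs` and
    return the empirical PMF of the output stream."""
    rng = random.Random(seed)
    symbols = list(range(len(pmfs[0])))
    out = symbols[0]  # arbitrary initial output (an assumption of this sketch)
    counts = Counter()
    for _ in range(steps):
        inputs = [rng.choices(symbols, weights=p)[0] for p in pmfs]
        out = variable_node_update(inputs, out)
        counts[out] += 1
    return [counts[a] / steps for a in symbols]


def normalized_product(pmfs):
    """Normalized term-by-term product of the input PMFs."""
    prod = [1.0] * len(pmfs[0])
    for p in pmfs:
        prod = [x * y for x, y in zip(prod, p)]
    total = sum(prod)
    return [x / total for x in prod]


if __name__ == "__main__":
    pmfs = [[0.5, 0.2, 0.2, 0.1], [0.4, 0.3, 0.2, 0.1]]  # two inputs over GF(4)
    print(simulate_variable_node(pmfs, 100_000))
    print(normalized_product(pmfs))
```

The two printed vectors should agree to within sampling noise as long as the input streams are independent, which is the assumption the rule relies on.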

Using equation (4) and assuming the inputs are independent, the PMF of the output is:

$$ P[\overline{U}_{vp}(t) = c] = \prod_{i \neq p} P[\overline{V}_{iv}(t) = c] + \Bigl(1 - \sum_{a \in GF(q)} \prod_{i \neq p} P[\overline{V}_{iv}(t) = a]\Bigr) P[\overline{U}_{vp}(t-1) = c]. \tag{5} $$

As in [13], if the stochastic streams are assumed to be stationary, then $P[\overline{U}_{vp}(t) = c] = P[\overline{U}_{vp}(t-1) = c]$ and the PMF of $\overline{U}_{vp}(t)$ becomes:

$$ P[\overline{U}_{vp}(t) = c] = \frac{\prod_{i \neq p} P[\overline{V}_{iv}(t) = c]}{\sum_{a \in GF(q)} \prod_{i \neq p} P[\overline{V}_{iv}(t) = a]}. \tag{6} $$

Equation (6) is identical to the normalized output of a sum-product variable node; therefore, equation (4) is a valid update rule for the stochastic variable node.

Permutation Node: The function of the permutation node is to remove multiplication by elements of H from the check node constraint. In the sum-product algorithm this is achieved by a cyclic shift of the message vector elements as in Section II-B. Here, we demonstrate that multiplying the stochastic stream from a variable to a check node by an element of H accomplishes the same result. Assuming a permutation node p which corresponds to $h = \alpha^{i}$, the permutation node output message in a SPA decoder is defined such that each element in the message vector is given by:

$$ U_{pc}[a] = U_{vp}[a \cdot \alpha^{-i}], \quad \forall a \in GF(q). $$

When, in a stochastic decoder, the permutation node multiplies all elements of the input stream by h, the output PMF becomes:

$$ P[\overline{U}_{pc}(t) = a] = P[\overline{U}_{vp}(t) = a \cdot \alpha^{-i}]. $$

The SPA and stochastic output PMFs are identical, and since the multiplicative group of GF(q) is cyclic and multiplication is closed on GF(q), the stochastic permutation node operation is equivalent to that of the SPA. Similarly, it can be shown that for messages passed from check to variable nodes, the inverse permutation node operation is multiplication by $h^{-1}$. It should be noted that $h \neq 0$, since a value of 0 in H signifies the lack of a connection between a variable and a check node. Therefore, there are no permutation nodes with a multiplier h = 0.

Check Node: When deriving the stochastic update message for a check node, a degree-three node is considered and the result is generalized to a check node of any degree.
Let $\overline{U}_{1c}$ and $\overline{U}_{2c}$ be the node inputs, which are assumed to be independent, and $\overline{V}_{cp}$ its output. From equation (3), the output of such a node when using the SPA is given as:

$$ P[V_{cp} = z \mid U_{1c}, U_{2c}] = \sum_{x \oplus y = z} P[U_{1c} = x]\, P[U_{2c} = y], \tag{7} $$

where $\oplus$ is GF(q) addition.

In the stochastic node, we define the output as the GF(q) addition of the inputs, i.e. $\overline{V}_{cp}(t) = \overline{U}_{1c}(t) \oplus \overline{U}_{2c}(t)$. The PMF of the output is computed as:

$$ P[\overline{V}_{cp}(t) = z] = P[\overline{U}_{1c}(t) \oplus \overline{U}_{2c}(t) = z] = \sum_{x \oplus y = z} P[\overline{U}_{1c}(t) = x]\, P[\overline{U}_{2c}(t) = y]. \tag{8} $$

The PMFs (7) and (8) are identical; therefore it is concluded that GF(q) addition is a valid update message for a degree-3 stochastic check node. Since the output of a check node can be computed recursively [7], the previous conclusion can be generalized to a check node of any degree, and the output messages for these nodes are given as:

$$ \overline{V}_{cp}(t) = \sum_{i=1,\, i \neq p}^{d_c} \overline{U}_{ic}(t), \tag{9} $$

where the summation is GF(q) addition. It can be readily shown that the previous node equations reduce to the binary ones presented in [10] for GF(2).

B. Noise-Dependent Scaling and Edge-Memories

In binary stochastic decoding the switching activity can become very low, resulting in poor bit-error-rate performance. This phenomenon is called latch-up and is caused by cycles in the graph that cause the stochastic streams to become correlated, invalidating the independent-stream assumption used to derive equations (4) and (9). Two solutions were proposed in [10]: noise-dependent scaling and edge memories. Both of these methods are used to improve the performance of the GF(q) decoder.

Noise-dependent scaling increases switching activity by scaling down the channel likelihood values. For example, when transmitting data using BPSK modulation over an AWGN channel, the scaled likelihood of each received bit $l'(i)$ is calculated by:

$$ l'(i) = [l(i)]^{2\alpha\sigma_n^2 / Y}, $$

where $l(i)$ is the unscaled bit likelihood, $\sigma_n^2$ is the noise variance, and the ratio $\alpha / Y$ is determined offline to yield the best performance in the SNR range of interest. Accordingly, the equation for computing the channel likelihood values becomes:

$$ L[i(x)] = \prod_{k=1}^{p} [l(i_k)]^{2\alpha\sigma_n^2 / Y}. \tag{10} $$

Edge memories (EMs) are finite-depth buffers inserted between variable nodes and permutation nodes that randomly reorder symbols in the output streams of variable nodes; thus, they break correlation between streams without affecting the overall stream statistics. The EM contents are updated with the variable node output when the node update condition is satisfied, and remain intact otherwise. The output of the EM is that of the variable node in the first case, or a randomly selected symbol from its contents in the second. Because the memory has finite length, older symbols are discarded when new ones are added. Figure 1b demonstrates the message passing mechanism and the location of edge memories within a stochastic decoder.

For complexity comparison, Table I provides the number of operations needed to compute a single check node output message in the FFT-SPA and Log-FFT-SPA algorithms as presented in [2]. It should be noted that the operations for the SPA are for real numbers and quantization will degrade the decoder performance, while those for the stochastic decoder are over a finite field GF($2^p$).

TABLE I: The number of operations needed by FFT-SPA, Log-FFT-SPA, and stochastic decoders to compute a single check node output message, including the permutation node operations.

Algorithm        | Multiplication        | Addition                       | LUT
FFT-SPA [2]      | $2^p(d_c^2 + 4d_c)$   | $p\,2^{p+1} d_c + 2^p$         | 0
Log-FFT-SPA [2]  | 0                     | $(p\,2^{p+1} + 2^{p+2})\,d_c$  | $p\,2^{p+1} d_c$
Stoc.            | $d_c - 1$             | $d_c - 1$                      | 0
Stoc.-LUT        | 0                     | $d_c - 1$                      | $d_c - 1$

C. Algorithm Description

At the beginning of the algorithm the edge memories are initialized using scaled channel likelihood values as PMFs for their content distribution. The following steps describe the stochastic decoding algorithm for each decoding cycle.

1: Variable node messages are computed using equation (4), edge memories are updated where appropriate, and messages are sent from edge memories to permutation nodes.
2: Permutation nodes perform GF(q) multiplication on incoming messages and send the results to check nodes.
3: Check node messages are computed as in equation (9) and are sent to permutation nodes.
4: Permutation nodes perform GF(q) multiplication by inverse and send resulting messages to variable nodes.
5: Each variable node contains counters C[a] corresponding to GF(q) elements. These counters are incremented based on incoming messages and the channel message L(t). A variable node belief is defined as $\arg\max_a C[a]$.
6: Variable node beliefs are updated accordingly.
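The edge-memory behaviour of Section III-B, as used in step 1, and the belief counters of step 5 can be sketched in software as follows. This is an illustrative model only: the class and function names, the fixed-length queue, the seeded random number generator and the integer symbol labels 0..q-1 are assumptions of this sketch, and a hardware implementation would look different.

```python
import random
from collections import deque


class EdgeMemory:
    """Finite-length buffer placed on a variable-node output edge."""

    def __init__(self, length, initial_pmf, seed=0):
        self._rng = random.Random(seed)
        symbols = list(range(len(initial_pmf)))
        # Initial contents are drawn from the (scaled) channel PMF.
        start = self._rng.choices(symbols, weights=initial_pmf, k=length)
        # maxlen makes the deque drop its oldest symbol on every append.
        self._buf = deque(start, maxlen=length)

    def step(self, symbol, updated):
        """Return the symbol sent to the permutation node this cycle.

        If the variable node update condition held (`updated`), store and
        forward the new symbol; otherwise forward a randomly selected
        stored symbol and leave the contents intact."""
        if updated:
            self._buf.append(symbol)
            return symbol
        return self._buf[self._rng.randrange(len(self._buf))]

    def contents(self):
        return list(self._buf)


def belief(counters):
    """Step 5: the belief is the symbol with the largest counter C[a]."""
    return max(range(len(counters)), key=counters.__getitem__)


if __name__ == "__main__":
    em = EdgeMemory(length=4, initial_pmf=[0.7, 0.1, 0.1, 0.1])
    print(em.step(2, updated=True), em.contents())
    print(em.step(0, updated=False), em.contents())
    print(belief([1, 5, 2, 0]))
```

A queue with a fixed maximum length is one simple way to obtain the "older symbols are discarded" behaviour; other replacement policies are possible and are not specified by the text above.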
The streams are processed on a symbol-by-symbol basis, one symbol each cycle (steps 1-5), until the algorithm converges (the variable node beliefs satisfy the check constraints) or a maximum number of iterations is reached. As in the binary algorithm presented in [10], the processing is not packetized.

D. Implementation

While the stochastic decoding algorithm is defined for any finite field, the implementation presented in this section is limited to GF($2^p$) as these are the most utilized fields and they yield the simplest implementation. The polynomial representation of GF($2^p$) is used when implementing the algorithm. This choice greatly simplifies the circuitry needed to perform GF($2^p$) addition. All gate number estimates assume 2-input logic gates in a tree configuration.

Fig. 2: GF(8) stochastic elements. (a) $d_v = 2$ variable node, (b) $d_c = 4$ check node.

Variable Node: To implement the operation specified by equation (4), a GF($2^p$) equality check is needed. XNOR gates and an AND gate are used to perform the check and provide an enable (latch) signal to an edge memory as shown in Fig. 2a. To extend the circuit for a higher order field, more XNOR gates are used and connected to a larger AND gate. This accommodates the increase in the number of bits required to represent each GF($2^p$) symbol in the stochastic streams. For higher degree nodes, the number of inputs to each XNOR gate is increased. The total number of gates, without counters, required by a variable node is:

$$ [\,p(d_v - 1)\ \text{XNOR} + (p - 1)\ \text{AND}\,]\, d_v. \tag{11} $$

Each variable node requires a maximum of $2^p$ counters to track occurrences of each symbol and determine the node belief. The size of the EMs associated with a variable node of degree $d_v$ is $d_v\, l\, p$ bits, where l is the EM length.

Permutation Node: Permutation nodes can be implemented using GF($2^p$) multipliers. For a particular code, the symbols arriving at a permutation node are always multiplied by the same element of H. As a result, the multiplier can be designed to multiply by a specific (constant) element of GF($2^p$) instead of a generic GF($2^p$) multiplier, significantly reducing circuit complexity. Alternatively, look-up tables (LUTs) can be used since their size would not be large. The multiplication by inverse for messages passed in the other direction is implemented in a similar manner. If LUTs are used to implement multiplication, each node requires two LUTs: one for multiplication by h and one for multiplication by $h^{-1}$. An operation LUT contains $2^p - 1$ entries, each p bits wide.

Check Node: The outgoing messages from check nodes are GF($2^p$) summations of incoming messages. Since the GF($2^p$) symbols are represented using the polynomial form, this operation can be realized utilizing XOR operations between corresponding bit lines of messages. The circuit in Fig. 2b is an example of a degree-4 check node in GF(8). To implement a higher degree check node, the number of inputs to each XOR gate is increased to account for the extra incoming messages. Extending this circuit to higher order fields can be done by adding more XOR gates. The total number of gates required by a check node is:

$$ [\,p(d_c - 1)\ \text{XOR}\,]\, d_c. \tag{12} $$
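The check node of equation (9) and the gate counts of equations (11) and (12) can be illustrated with a short Python sketch. With the polynomial representation, a GF($2^p$) symbol is a p-bit integer and field addition is a bitwise XOR. The function names are hypothetical, and computing the XOR of all inputs once and then cancelling each edge's own input is a software shortcut chosen for this sketch, not the circuit of Fig. 2b. The gate-count helpers simply evaluate the formulas as printed above.

```python
from functools import reduce
from operator import xor


def check_node_outputs(inputs):
    """Equation (9): the output on each edge is the GF(2^p) sum (bitwise
    XOR) of the symbols arriving on all the other edges."""
    total = reduce(xor, inputs, 0)
    # XOR-ing an input a second time removes it from the total.
    return [total ^ u for u in inputs]


def variable_node_gates(p, dv):
    """Equation (11) as printed: (XNOR count, AND count) for a variable
    node of degree dv over GF(2^p), counters excluded."""
    return p * (dv - 1) * dv, (p - 1) * dv


def check_node_gates(p, dc):
    """Equation (12) as printed: XOR count for a check node of degree dc."""
    return p * (dc - 1) * dc


if __name__ == "__main__":
    # Degree-4 check node over GF(8): symbols are 3-bit integers.
    print(check_node_outputs([3, 5, 6, 1]))
    # Node sizes shown in Fig. 2: d_v = 2 and d_c = 4 over GF(8), so p = 3.
    print(variable_node_gates(p=3, dv=2))
    print(check_node_gates(p=3, dc=4))
```

Because XOR acts on each bit line independently, the same structure extends to larger fields by widening the symbols, which matches the remark above about adding more XOR gates.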

Fig. 3: FER for a (256,128)-symbol (2,4)-regular LDPC code over GF(16). EM length = 50, $\alpha / Y = 0.5$. (Plot of frame error rate versus $E_b/N_0$ in dB; curves: SP, Stochastic DCmax = $10^6$, Stochastic DCmax = $10^5$.)

Fig. 4: BER for a (256,128)-symbol (2,4)-regular LDPC code over GF(16). EM length = 50, $\alpha / Y = 0.5$. (Plot of bit error rate versus $E_b/N_0$ in dB; curves: SP, Stochastic DCmax = $10^6$, Stochastic DCmax = $10^5$.)

TABLE II: Average number of decoding cycles.

SNR (dB)                 | 2.0   | 2.5  | 3.0  | 3.5  | 4.0
DCavg (DCmax = $10^6$)   | 22599 | 8888 | 4243 | 2329 | 1433
DCavg (DCmax = $10^5$)   | 17958 | 8511 | 4209 | 2326 | 1433

E. Performance

Figures 3 and 4 demonstrate the performance of the stochastic decoder compared to that of a SPA decoder when decoding a (256,128)-symbol LDPC code over GF(16) [14], using an AWGN channel, BPSK, and random codewords. The SPA decoder has a maximum of 1000 iterations, while the stochastic decoder's maximum is $10^6$ decoding cycles (DC). The performance of the two decoders is very similar, and the two decoders perform identically for higher SNR values. The change in the slope of the error rate graph was also observed in [14]. We note that the maximum number of decoding cycles is much greater than the average number of decoding cycles as shown in Table II, with DCavg determining the decoder throughput. Figures 3 and 4 demonstrate that, at higher SNRs, DCmax can be reduced with a small performance loss.

It should be noted that the number of iterations in the SPA decoder and decoding cycles in the stochastic decoder are not directly comparable. SPA iterations involve complex operations; for example, the node operations in EMS [15] involve sorting and iterating over incoming message elements, thus requiring many clock cycles. In a stochastic decoder, a decoding cycle is very simple and can be completed within a single clock cycle.
Also, due to the nature of stochastic computation, the proposed implementation lends itself to pipelining (due to the random order of the messages, the feedback loop in the graph is broken, allowing pipelining [12]), thus enabling clock rates faster than those possible with the SPA.

IV. CONCLUSION

In this paper we presented a stochastic decoding algorithm which we expect to enable practical high-throughput decoding of LDPC codes over GF($2^p$).

ACKNOWLEDGEMENT

The authors would like to thank Prof. D. Declercq from ENSEA for helpful discussions.

REFERENCES

[1] M. Davey and D. MacKay, "Low-density parity check codes over GF(q)," IEEE Commun. Lett., vol. 2, no. 6, pp. 165-167, 1998.
[2] H. Song and J. Cruz, "Reduced-complexity decoding of Q-ary LDPC codes for magnetic recording," IEEE Trans. Magn., vol. 39, no. 2, pp. 1081-1087, 2003.
[3] J. Chen, L. Wang, and Y. Li, "Performance comparison between nonbinary LDPC codes and Reed-Solomon codes over noise bursts channels," in Proc. International Conference on Communications, Circuits and Systems, L. Wang, Ed., vol. 1, 2005, pp. 1-4.
[4] I. Djordjevic and B. Vasic, "Nonbinary LDPC codes for optical communication systems," IEEE Photonics Technology Letters, vol. 17, no. 10, pp. 2224-2226, 2005.
[5] C. Spagnol, W. Marnane, and E. Popovici, "FPGA implementations of LDPC over GF($2^m$) decoders," in Proc. IEEE Workshop on Signal Processing Systems, W. Marnane, Ed., 2007, pp. 273-278.
[6] D. MacKay and M. Davey, "Evaluation of Gallager codes for short block length and high rate applications," in Codes, Systems and Graphical Models. Springer-Verlag, 2000, pp. 113-130.
[7] D. Declercq and M. Fossorier, "Decoding algorithms for nonbinary LDPC codes over GF(q)," IEEE Trans. Commun., vol. 55, no. 4, pp. 633-643, 2007.
[8] B. Gaines, Advances in Information Systems Science. Plenum, New York, 1969, ch. 2, pp. 37-172.
[9] V. Gaudet and A. Rapley, "Iterative decoding using stochastic computation," Electronics Letters, vol. 39, no. 3, pp. 299-301, Feb. 2003.
[10] S. Sharifi Tehrani, W. Gross, and S. Mannor, "Stochastic decoding of LDPC codes," IEEE Commun. Lett., vol. 10, no. 10, pp. 716-718, 2006.
[11] S. Sharifi Tehrani, S. Mannor, and W. J. Gross, "An area-efficient FPGA-based architecture for fully-parallel stochastic LDPC decoding," in Proc. IEEE Workshop on Signal Processing Systems, 17-19 Oct. 2007, pp. 255-260.
[12] S. Sharifi Tehrani, S. Mannor, and W. J. Gross, "Fully parallel stochastic LDPC decoders," IEEE Trans. Signal Process., vol. 56, no. 11, pp. 5692-5703, Nov. 2008.
[13] C. Winstead, V. Gaudet, A. Rapley, and C. Schlegel, "Stochastic iterative decoders," in Proc. International Symposium on Information Theory ISIT, 2005, pp. 1116-1120.
[14] C. Poulliat, M. Fossorier, and D. Declercq, "Design of regular (2, $d_c$)-LDPC codes over GF(q) using their binary images," IEEE Trans. Commun., vol. 56, no. 10, pp. 1626-1635, October 2008.
[15] A. Voicila, F. Verdier, D. Declercq, M. Fossorier, and P. Urard, "Architecture of a low-complexity non-binary LDPC decoder for high order fields," in Proc. International Symposium on Communications and Information Technologies ISCIT '07, F. Verdier, Ed., 2007, pp. 1201-1206.