
Analysis of Low Density Codes and Improved Designs Using Irregular Graphs

Michael G. Luby*    Michael Mitzenmacher†    M. Amin Shokrollahi‡    Daniel A. Spielman§

Abstract

In [6], Gallager introduces a family of codes based on sparse bipartite graphs, which he calls low-density parity-check codes. He suggests a natural decoding algorithm for these codes, and proves a good bound on the fraction of errors that can be corrected. As the codes that Gallager builds are derived from regular graphs, we refer to them as regular codes. Following the general approach introduced in [7] for the design and analysis of erasure codes, we consider error-correcting codes based on random irregular bipartite graphs, which we call irregular codes. We introduce tools based on linear programming for designing linear time irregular codes with better error-correcting capabilities than possible with regular codes. For example, the decoding algorithm for the rate 1/2 regular codes of Gallager can provably correct up to 5.17% errors asymptotically, whereas we have found irregular codes for which our decoding algorithm can provably correct up to 6.27% errors asymptotically. We include the results of simulations demonstrating the effectiveness of our codes on systems of reasonable size.

* International Computer Science Institute, Berkeley, CA. Parts of this research were done while still at the Digital Equipment Corporation Systems Research Center, Palo Alto, CA. Research partially supported by NSF operating grant NCR-9416101. E-mail: luby@icsi.berkeley.edu.
† Digital Equipment Corporation, Systems Research Center, Palo Alto, CA. E-mail: michaelm@pa.dec.com.
‡ International Computer Science Institute Berkeley, and Institut für Informatik der Universität Bonn, Germany. Research supported by a Habilitationsstipendium of the Deutsche Forschungsgemeinschaft, Grant Sh 57/1-1. E-mail: amin@icsi.berkeley.edu.
§ Department of Mathematics, M.I.T. E-mail: spielman@math.mit.edu.

1 Introduction

In [6], Gallager introduces a family of codes based on sparse bipartite graphs, which he calls low-density parity-check codes.
As the codes that Gallager builds are derived from regular graphs, we refer to them as regular codes. He suggests a natural decoding algorithm for these codes, and proves a good lower bound on the fraction of errors that can be corrected, assuming that there are no short cycles in the underlying graph. While much of his work concerns randomly chosen graphs, his analysis does not directly apply to such graphs. Instead, he constructs explicit graphs of large girth to which his analysis does apply.

The main contribution of this paper is the design and analysis of low-density parity-check codes based on irregular graphs. This work follows the general approach introduced in [7] for the design and analysis of erasure codes. There it is shown that using irregular graphs yields codes with much better performance than regular graphs. In accordance with [7], we consider error-correcting codes based on random irregular bipartite graphs, which we call irregular codes. We develop tools based on linear programming for designing linear time encodable and decodable irregular codes with better error-correcting capabilities than regular codes. For example, the rate 1/2 regular codes of Gallager can provably correct up to 5.17% errors, whereas we have found irregular codes that can provably correct up to 6.27%.

The only method we currently have for constructing irregular codes is by randomly choosing the irregular graph. However, the analysis used by Gallager does not directly apply to randomly chosen graphs. Thus, to analyze the performance of the irregular codes, we develop an analysis that applies to randomly chosen graphs. Using techniques from [8] for studying random processes, we can calculate for a random regular graph the fraction of erroneous bits for which Gallager's original algorithm can correct all but an arbitrarily small constant fraction of the errors. Once the number of erroneous bits is reduced to this level, we switch from Gallager's

algorithm to one used by Spielman and Sipser in [15], and prove that this new hybrid method successfully finishes the decoding with high probability. This analysis easily extends to the irregular codes that we introduce. Additionally, the bound on the probability of error we derive using this methodology improves upon the bound derived by Gallager for the regular graphs he explicitly constructed.

Gallager's decoding algorithm is a simplification of "belief propagation" [14]. Belief propagation has been extensively tested with Gallager's low-density parity-check codes [2, 6, 11, 12, 17] and is strongly related to the highly successful turbo codes [1, 3, 10, 5]. In a separate work, we describe empirical tests on irregular codes using a full belief propagation algorithm and demonstrate irregular codes with better performance than regular codes [9]. We believe our analysis here provides an important step towards analyzing codes based on belief propagation techniques.

The paper proceeds as follows: in Section 2.1, we present a description of regular codes and analyze Gallager's decoding scheme. We show in Section 2.2 how expander-based arguments can be used in addition to the previous analysis to demonstrate a decoding algorithm that works with high probability for regular codes. We introduce irregular codes in Section 3, where we demonstrate that our arguments generalize to irregular codes and describe how to find irregular graphs that lead to good codes. In Section 4, we discuss some simulation results that show the effectiveness of our analysis for designing practical codes. We conclude with a discussion of open problems.

2 Regular Codes

2.1 Analyzing Regular Codes

We first review the codes developed by Gallager and his analysis [6]. Later we explain how this analysis combined with the argument from [8] shows that his suggested decoding algorithm corrects all but an arbitrarily small constant fraction of the nodes with high probability for random regular codes.
The decoding algorithm of Gallager's that we analyze is an example of hard decision decoding, which signifies that at each step the state is derived from local decisions of whether each bit is 0 or 1, and this is all the information the state contains (as opposed to more detailed probabilistic information). We note that Gallager also proposes a belief propagation type decoding algorithm, which uses a more complicated state; for more details, see for example [4, 9, 11, 17].

In the following we refer to the nodes on the left and right sides of a bipartite graph as its message nodes and check nodes respectively. A bipartite graph with $n$ nodes on the left and $r$ nodes on the right gives rise to a linear code of dimension $k \ge n - r$ and block length $n$ in the following way: the bits of a codeword are indexed by the message nodes. A binary vector $x = (x_1, \ldots, x_n)$ is a codeword if and only if $Hx = 0$, where $H$ is the $r \times n$ incidence matrix of the graph whose rows are indexed by the check nodes and whose columns are indexed by the message nodes. In other words, $(x_1, \ldots, x_n)$ is a codeword if and only if for each check node the exclusive-or of its incident message nodes is zero.

Figure 1: Representing the code as a tree.

An alternative approach is to allow the nodes on the right to represent bits rather than restrictions, and then use a cascading series of bipartite graphs, as described for example in [16] or [7]. In this situation, we know inductively the correct value of the check nodes in each layer when we correct the message nodes, and the check nodes are the exclusive-or of their incident message nodes.

In the sequel we focus on one bipartite graph only, and assume that only the nodes on the left are in error. The analysis that we provide in this case works for either of the two approaches given above, as we may inductively focus on just one layer in the context of cascading series of graphs [16, 7]. We call the linear codes that are obtained by either of the above constructions regular codes.
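As a small illustration of the parity-check condition $Hx = 0$ described above, the following sketch tests whether a binary vector is a codeword by checking that every check node sees an even number of ones. The graph representation (one list of message-node indices per check node), the function name `is_codeword`, and the toy graph are assumptions made for this example only; they do not come from the paper.

```python
from typing import List, Sequence


def is_codeword(check_neighbors: Sequence[Sequence[int]], x: Sequence[int]) -> bool:
    """Return True iff the XOR of the message bits at every check node is zero.

    check_neighbors[c] lists the message nodes adjacent to check node c,
    i.e. the positions of the ones in row c of the incidence matrix H.
    """
    for neighbors in check_neighbors:
        parity = 0
        for m in neighbors:
            parity ^= x[m]
        if parity != 0:
            return False
    return True


# Hypothetical toy graph: n = 3 message nodes and r = 2 check nodes.
TOY_CHECKS: List[List[int]] = [[0, 1], [1, 2]]
```

In this toy graph the two checks force all three bits to be equal, so the only codewords are 000 and 111, and the dimension is $n - r = 1$.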
Consider a regular random graph with the message nodes having degree $d_\ell$ and the check nodes having degree $d_r$. With probability $p$ a message node receives the wrong bit. The decoding process proceeds in rounds, where in each round first the message nodes send each incident check node a single bit and then the check nodes send each incident message node a single bit.

To picture the decoding process, consider an individual edge $(m, c)$ between a message node $m$ and a check node $c$, and an associated tree describing a neighborhood of $m$. This tree is rooted at $m$, and the tree branches out from the check nodes of $m$ excluding $c$, as shown in Figure 1. For now let us assume that the neighborhood of $m$ is accurately described by a tree for some fixed number of rounds. Each message node $m$ remembers the received bit $r_m$

that is purported to be the correct message bit. (Thus, $r_m$ is not the correct message bit with probability $p$.) Each edge $(m, c)$ remembers a bit $g_{m,c}$ that is a guess of the correct bit of $m$. This bit is continually updated each round based on all information that is passed from $c$ to $m$. During each round a bit is passed in each direction across edge $(m, c)$. Each round consists of an execution of the following script:

- For all edges $(m, c)$ do the following in parallel:
    - If this is the zeroth round, then set $g_{m,c}$ to $r_m$.
    - If this is a subsequent round, then $g_{m,c}$ is computed as follows: if all the check nodes of $m$ excluding $c$ sent the same value to $m$ in the previous round, then set $g_{m,c}$ to this value, else set $g_{m,c}$ to $r_m$.
    - In either case, $m$ sends $g_{m,c}$ to $c$.
- For all edges $(m, c)$ do the following in parallel:
    - The check node $c$ sends to $m$ the exclusive-or of the values it received in this round from its adjacent message nodes excluding $m$.

Of course the parallel work can easily be simulated sequentially. Moreover, the work per round can easily be coded so that it is linear in the number of edges.

Let $p_i$ be the probability that $m$ sends $c$ an incorrect value $g_{m,c}$ in round $i$. Initially $p_0 = p$. Following the work of Gallager, we determine a recursive equation describing the evolution of $p_i$ over a constant number of rounds.

Consider the end of the $i$th round, and consider a check node $c'$ of $m$ other than $c$. The node $c'$ sends $m$ its correct value as long as there are an even number (including possibly 0) of message nodes other than $m$ sending $c'$ the wrong bit. As each bit was incorrectly sent to $c'$ with probability $p_i$, it is easy to check that the probability that $c'$ receives an even number of errors is

$$\frac{1 + (1 - 2p_i)^{d_r - 1}}{2}. \tag{1}$$

Hence, the probability that $m$ was received in error and sent correctly in round $i + 1$ is

$$p_0 \left[\frac{1 + (1 - 2p_i)^{d_r - 1}}{2}\right]^{d_\ell - 1},$$

and similarly the probability that $m$ was received correctly but sent incorrectly in round $i + 1$ is given by

$$(1 - p_0) \left[\frac{1 - (1 - 2p_i)^{d_r - 1}}{2}\right]^{d_\ell - 1}.$$

This yields an equation for $p_{i+1}$ in terms of $p_i$:

$$p_{i+1} = p_0 - p_0 \left[\frac{1 + (1 - 2p_i)^{d_r - 1}}{2}\right]^{d_\ell - 1} + (1 - p_0) \left[\frac{1 - (1 - 2p_i)^{d_r - 1}}{2}\right]^{d_\ell - 1}. \tag{2}$$

Gallager's idea is then to find the supremum $p^*$ of all values of $p_0$ for which the sequence $p_i$ is monotonically decreasing and hence converges to 0. Note, however, that even if $p_i$ converges to 0, this does not directly imply that the process necessarily corrects all message nodes, even with high probability. This is because our assumption that the neighborhood of $(m, c)$ is accurately represented by a tree for arbitrarily many rounds is not true. In fact, even for any constant number of rounds it is true only with high probability.

Gallager proves that, as the block length of the code and girth of the graph grow large, this decoding algorithm works for all $p_0 < p^*$. Since random graphs do not have large girth, Gallager introduced explicit constructions of regular sparse graphs that do have sufficiently large girth for his analysis to hold. We will shortly provide an analysis that shows that Gallager's decoding algorithm successfully corrects a large fraction of errors for a randomly chosen regular graph with high probability. Then in Section 2.2 we show how to ensure the decoding terminates successfully with high probability using a slightly different decoding rule.

Gallager notes that the decoding rule can be relaxed in the following manner: at each round, there is a universal threshold value $b_i$ (to be determined below) that depends on the round number. For each message node $m$ and neighboring check node $c$, if at least $b_i$ neighbors of $m$ excluding $c$ sent the same bit to $m$ in the previous round, then $m$ sends this bit to $c$ in this round; otherwise $m$ sends to $c$ its initial bit $r_m$. The rest of the decoding algorithm is the same. Using the same analysis as for equation (2), we may find a recursive description of the $p_i$. For convenience, we define

$$g(y, t, j) = \left(\frac{1 + y}{2}\right)^{t} \left(\frac{1 - y}{2}\right)^{j - 1 - t}. \tag{3}$$

Also, for convenience we let $z_i = 1 - 2p_i$. Then

$$p_{i+1} = p_0 - p_0 \sum_{t = b_i}^{d_\ell - 1} \binom{d_\ell - 1}{t} g(z_i^{d_r - 1}, t, d_\ell) + (1 - p_0) \sum_{t = b_i}^{d_\ell - 1} \binom{d_\ell - 1}{t} g(-z_i^{d_r - 1}, t, d_\ell). \tag{4}$$

We choose $b_i$ so as to minimize $p_{i+1}$.
To do this we compare the odds of being right initially to the odds of being right using the check nodes and the threshold $b_i$.

As determined by Gallager, the correct choice of $b_i$ is the smallest integer that satisfies

$$\frac{1 - p_0}{p_0} \le \left[\frac{1 + (1 - 2p_i)^{d_r - 1}}{1 - (1 - 2p_i)^{d_r - 1}}\right]^{2b_i - d_\ell + 1}. \tag{5}$$

Note that $b_i$ is an increasing function of $p_i$; this is intuitive, since as $p_i$ decreases, smaller majorities are needed to get an accurate assessment of $m$'s correct value. Also, note that while the algorithm functions by passing values along the edges, it can also keep a running guess for the value of each message node based on the passed values. The algorithm continues until the proposed values for the message nodes satisfy all the check nodes, at which point the algorithm terminates with the belief that it has successfully decoded the message, or it can fail after a preset number of rounds.

It follows from an argument similar to one in [8] that the recursive description given by equation (4) is correct with high probability over any constant number of rounds.

Theorem 1 Let $i > 0$ be an integer constant and let $Z_i$ be the random variable describing the fraction of edges set to pass incorrect messages after $i$ rounds of the above algorithm. Further, let $p_i$ be as given in the recursion (4). Then there is a constant $c$ such that for any $\epsilon > 0$ and sufficiently large $n$ we have

$$\Pr(|Z_i - p_i| > \epsilon) < \exp(-c \epsilon^2 n).$$

Proof: We sketch the proof. There are two considerations requiring care. First, the neighborhood around a message bit $m$ may not take the form of a tree. We show that this does not happen too often with an edge exposure martingale argument. Second, even assuming the number of non-trees is small, we still need to prove tight concentration of $p_i$ around the expectation given that message bits may be wrong initially with probability $p_0$. This follows from a separate martingale argument, exposing the initial values at each node one by one.

For the first consideration, it is easily seen that there is a number $\gamma$ depending on $i$ and the maximum degree of the graph such that the probability that the neighborhood of depth $i$ stemming from an edge is not a tree is $\gamma/n$. For sufficiently large $n$ the value $\gamma/n$ is less than $\epsilon/4$.
Now by exposing the edges one by one using an edge exposure martingale and applying Azuma's inequality [13, Section 4.4] we see that the fraction of edges with non-tree neighborhoods is greater than $\epsilon/2$ with probability at most $\exp(-c \epsilon^2 n)$.

Now let $\tilde{Z}_i$ be the expected fraction of edges set to pass incorrect messages after $i$ rounds. Then $|\tilde{Z}_i - p_i| < \epsilon/2$ with high probability by the above. We can show that $Z_i$ and $\tilde{Z}_i$ are close using a martingale argument, exposing the initial values at the vertices one by one. Again using Azuma's inequality we obtain $\Pr(|Z_i - \tilde{Z}_i| > \epsilon/2) \le \exp(-c \epsilon^2 n)$ for some constant $c$ (depending on $i$). This now gives the assertion. Q.E.D.

Corollary 1 Given a random regular code with $p_i$ as defined by equation (4), if the sequence $p_i$ converges to 0, then for any $\epsilon > 0$ there is a sufficiently large message size $n$ such that Gallager's hard decision decoding correctly decodes all but at most $\epsilon n$ bits in some constant number $r_\epsilon$ of rounds with high probability.

2.2 Completing the Work: Expander-based Arguments

In the previous section we have shown that the hard decision decoding corrects all but an arbitrarily small constant fraction of the message nodes for regular codes with sufficiently large block lengths. The analysis, however, is not sufficient to show that the decoding process completes successfully. In this section, we show how to finish the decoding process with high probability once the number of errors is sufficiently small using slightly different algorithms. Our work utilizes the expander-based arguments in [15, 16].

We first define what we require in terms of the bipartite graph represented by the code being a good expander.

Definition 1 A bipartite graph has expansion $(\alpha, \beta)$ if for all subsets $S$ of size at most $\alpha n$ of the vertices on the left, the size of the neighborhood $N(S)$ of $S$ on the right satisfies $|N(S)| \ge \beta |\delta(S)|$, where $\delta(S)$ is the set of edges attached to vertices in $S$.

Following the notation of [15], we call a message node corrupt if it differs from its correct value, and we call a check node satisfied (respectively unsatisfied) if its value is (is not) the sum of the values of its adjacent message nodes.
The work of [15] shows that if the underlying bipartite graph of a code has sufficient expansion for sets of size up to $\alpha n$, then both of the following algorithms can correct any set of $\alpha n/2$ errors:

- Sequential decoding: if there is a message node that has more unsatisfied than satisfied neighbors, flip the value of that message node. Repeat until no such message node remains.
- Parallel decoding: for each message node, count the number of unsatisfied check nodes among its neighbors. Flip in parallel each message node with a majority of unsatisfied neighbors.

Note that the above algorithms are very similar to Gallager's hard decision decoding algorithm, except that here we need not hold values for each (message node,

check node) pair. We call upon the results of [15] to show that once we use hard decision decoding to correct all but some arbitrarily small fraction of the message nodes, we can finish the process. The next lemma follows from Theorems 10 and 11 of [15].

Lemma 1 Let $\alpha > 0$ and $\beta > 3/4 + \epsilon$ for some fixed $\epsilon > 0$. Let $B$ be an $(\alpha, \beta)$ expander. Then the sequential and parallel decoding algorithms correct up to $\alpha n/2$ errors. The sequential decoding algorithm does so in linear time and the parallel decoding algorithm does so in $O(\log n)$ rounds, with each round requiring a linear amount of work.

We use the following standard lemma to claim that the graph we choose is an appropriate expander, and hence we can finish off the analysis of the decoding process using the previous lemma.

Lemma 2 Let $B$ be a bipartite graph formed as follows with $n$ nodes on the left and $\eta n$ nodes on the right, where $\eta > 0$ is a fixed constant. Suppose that a degree is assigned to each node so that all left nodes have degree at least five, and all right nodes have degree at most $C$ for some constant $C$. Suppose that a random permutation is chosen and used to match each edge out of a left node with each edge into a right node. Then, with probability $1 - O(1/n)$, for some fixed $\alpha > 0$, $\epsilon > 0$, and $\beta = 3/4 + \epsilon$, $B$ is an $(\alpha, \beta)$ expander.

We note that the restriction in Lemma 2 that the left degrees are at least five appears necessary. For example, it is entirely possible for random graphs with degree three on the left to fail to complete using the proposed sequential and parallel algorithms even after almost all nodes have been corrected. A problem occurs when the graph has a small even cycle. In this case, if all the nodes in the cycle are received incorrectly, the algorithm may fail to terminate correctly. (See Figure 2.) Even cycles of any constant length occur with constant probability, so errors remain with constant probability.

Figure 2: If the two left nodes are supposed to be 0, and all other nodes are correct, then the majority tells the left nodes not to change.

To circumvent this problem Gallager designs regular graphs with no small cycles [6].
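The sequential decoding rule from [15] discussed above can be sketched in a few lines. This is a deliberately naive version that recomputes check parities on every pass, so it does not achieve the linear running time claimed in Lemma 1; the function name `sequential_decode`, the adjacency-list representation, and the `max_flips` safety bound are assumptions made for illustration.

```python
from typing import List, Sequence


def sequential_decode(check_neighbors: Sequence[Sequence[int]],
                      received: Sequence[int],
                      max_flips: int = 10_000) -> List[int]:
    """Flip, one at a time, a message node with more unsatisfied than satisfied checks."""
    x = list(received)
    n = len(x)

    # checks_of[m] lists the check nodes adjacent to message node m.
    checks_of: List[List[int]] = [[] for _ in range(n)]
    for c, neighbors in enumerate(check_neighbors):
        for m in neighbors:
            checks_of[m].append(c)

    def unsatisfied(c: int) -> bool:
        parity = 0
        for m in check_neighbors[c]:
            parity ^= x[m]
        return parity == 1

    for _ in range(max_flips):
        flipped = False
        for m in range(n):
            bad = sum(1 for c in checks_of[m] if unsatisfied(c))
            if 2 * bad > len(checks_of[m]):  # strict majority of unsatisfied checks
                x[m] ^= 1
                flipped = True
                break
        if not flipped:
            break  # no message node qualifies, so the procedure stops
    return x
```

On a hypothetical three-node graph with checks `[[0, 1], [1, 2], [0, 2]]`, whose codewords are 000 and 111, a single corrupted bit is flipped back. As the discussion of small even cycles above indicates, success is not guaranteed on graphs without sufficient expansion.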
To circumvent this problem in random graphs, we make a small change in the structure of the graph, similar to that in [7]. Suppose that we use the previous analysis to correct all but at most $\epsilon n$ message bits with high probability. We add an additional $\epsilon' n$ check nodes, where $\epsilon'$ is a constant that depends on $\epsilon$, and construct a regular random graph with degree 5 on the left between all the $n$ message nodes and the $\epsilon' n$ check nodes. The decoding proceeds as before on the original random graph, correcting all but at most $\epsilon n$ message bits. We then use the $\epsilon' n$ check nodes previously held in reserve to correct the remaining message bits using the Sipser-Spielman algorithm. That this procedure works follows directly from Lemmas 1 and 2. Moreover, as both $\epsilon$ and $\epsilon'$ can be made arbitrarily small by Corollary 1, the change in the rate of the code due to this additional structure is negligible, and is ignored in the sequel.

It is worth noting that since explicit constructions are known for regular expanders, using the previous analysis (Theorem 1 and Lemma 1) we may construct regular codes with the same asymptotic performance as Gallager's regular codes that are guaranteed to work except with probability exponentially small in $n$. Gallager proved that his codes and decoding algorithm worked correctly except with probability exponentially small in a root of $n$. Hence our proof yields slightly better bounds on the error probability in this case.

2.3 Theoretically Achievable Error Correction

For every rate, and for every possible left degree and corresponding right degree, the value of $p^*$ can be computed by the above analysis. A natural question to ask is which regular code can achieve the largest value of $p^*$. Among rate 1/2 regular codes, it turns out that the largest $p^*$ is achieved when all left nodes have degree 4 and all right nodes have degree 8, in which case $p^* \approx 0.0517$. Thus, combining Corollary 1, Lemma 1, and Lemma 2, we have shown that when the corresponding bipartite graph is chosen randomly this code can correct all errors with high probability when the initial fraction of errors approaches 0.0517.
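The computation of $p^*$ can be sketched numerically by iterating recursion (4) and bisecting on $p_0$. The sketch below makes a few assumptions that are not taken from the paper: the threshold $b_i$ is found by directly minimizing $p_{i+1}$ over the majority thresholds (the text states that this agrees with the choice in equation (5)), convergence is declared once $p_i$ drops below a small tolerance, and the function names, round limit, and bisection depth are arbitrary.

```python
from math import comb


def g(y: float, t: int, j: int) -> float:
    """g(y, t, j) as in equation (3)."""
    return ((1 + y) / 2) ** t * ((1 - y) / 2) ** (j - 1 - t)


def next_p(p: float, p0: float, d_l: int, d_r: int) -> float:
    """One step of recursion (4), with b_i chosen to minimize p_{i+1}."""
    y = (1 - 2 * p) ** (d_r - 1)
    best = p0  # threshold above d_l - 1: the node always resends its received bit
    for b in range((d_l - 1) // 2 + 1, d_l):
        fixed = sum(comb(d_l - 1, t) * g(y, t, d_l) for t in range(b, d_l))
        broken = sum(comb(d_l - 1, t) * g(-y, t, d_l) for t in range(b, d_l))
        best = min(best, p0 - p0 * fixed + (1 - p0) * broken)
    return best


def converges(p0: float, d_l: int, d_r: int,
              max_rounds: int = 10_000, tol: float = 1e-12) -> bool:
    """True if p_i decreases monotonically below tol when started at p0."""
    p = p0
    for _ in range(max_rounds):
        q = next_p(p, p0, d_l, d_r)
        if q < tol:
            return True
        if q >= p:
            return False
        p = q
    return False


def threshold(d_l: int, d_r: int, iterations: int = 30) -> float:
    """Bisection estimate of the supremum p* of the converging p0 values."""
    lo, hi = 0.0, 0.5
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if converges(mid, d_l, d_r):
            lo = mid
        else:
            hi = mid
    return lo
```

Calling `threshold(4, 8)` gives a numerical estimate for the rate 1/2 code with left degree 4 and right degree 8 discussed above; the finite round limit makes the estimate slightly conservative.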
All of these regular codes run in linear time if we use the sequential decoding algorithm in the final stage. This follows from the fact that we need to run the hard decision decoding only for a constant number of rounds (at linear time per round), and then the sequential decoding algorithm can fix the remaining errors in linear time.

3 Irregular Codes

3.1 Intuition

Before we show how to derive irregular random graphs that improve upon the performance of Gallager's low-density parity-check codes, we offer some intuition as to why irregular graphs prove useful. It is convenient to think of the process as a game, with the message nodes and the check nodes as the players, and each player trying to choose the right number of edges. A constraint on the game is that the message nodes and the check nodes must agree on the total number of edges. From the point of view of a message node, it is best to have high degree, since the more information it gets from its check nodes the more accurately it can judge what its correct value should be. In contrast, from the point of view of a check node, it is best to have low degree, since the higher the degree of a check node, the more likely it is to transmit incorrect guesses to the message node. These two competing requirements must be appropriately balanced.

If one allows irregular graphs, there is more flexibility in balancing these competing requirements. In fact, for the decoding algorithm we describe below, the improved performance arises from varying the degrees of the message nodes. Message nodes with high degree tend to their correct value quickly. These nodes then provide good information to the check nodes, which subsequently provide better information to lower degree message nodes. Irregular graph constructions thus lead to a wave effect, where high degree message nodes tend to get corrected first, and then message nodes with slightly smaller degree, and so on down the line.

3.2 Analyzing Irregular Codes

We now describe a decoding algorithm for codes based on irregular graphs, or what we call irregular codes. Following the notation used in [7], for an irregular bipartite graph we say that an edge has degree $i$ on the left (right) if its left (right) hand neighbor has degree $i$. Let us suppose we have an irregular bipartite graph with some maximum left degree $d_\ell$ and some maximum right degree $d_r$. We specify our irregular graph by sequences $(\lambda_1, \lambda_2, \ldots, \lambda_{d_\ell})$ and $(\rho_1, \rho_2, \ldots, \rho_{d_r})$, where $\lambda_i$ ($\rho_i$) is the fraction of edges with left (right) degree $i$. Further, we define $\rho(x) := \sum_i \rho_i x^{i-1}$.
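Because $\lambda_i$ and $\rho_i$ are fractions of edges rather than fractions of nodes, it can help to convert between the two views: a node of degree $i$ accounts for $i$ edges, so the number of nodes of degree $i$ is proportional to $\lambda_i / i$. The helper below is a minimal sketch of that conversion; the dictionary representation and the function names are assumptions made for this example.

```python
from typing import Dict


def node_degree_fractions(edge_fractions: Dict[int, float]) -> Dict[int, float]:
    """Convert fractions of edges per degree into fractions of nodes per degree."""
    weights = {degree: frac / degree for degree, frac in edge_fractions.items()}
    total = sum(weights.values())
    return {degree: w / total for degree, w in weights.items()}


def average_degree(edge_fractions: Dict[int, float]) -> float:
    """Average node degree implied by an edge-degree sequence."""
    return 1.0 / sum(frac / degree for degree, frac in edge_fractions.items())
```

For instance, if half of the edges attach to degree 2 nodes and half to degree 4 nodes, then two thirds of the nodes have degree 2 and the average degree is 8/3.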
Our decoding algorithm in the case of irregular graphs is similar to Gallager's hard decision decoding as described in Section 2.1, but generalized to take into account the varying degrees of the nodes. Again we look at the process from the point of view of an edge $(m, c)$. Consider the end of the $i$th round, and consider a check node $c'$ of $m$ other than $c$. The node $c'$ sends $m$ its correct value as long as there are an even number (including possibly 0) of other message nodes sending $c'$ the wrong bit. As each bit was incorrectly sent to $c'$ with probability $p_i$, it is simple to check that the probability that $c'$ receives an even number of errors is

$$\frac{1 + \rho(1 - 2p_i)}{2}. \tag{6}$$

Equation (6) is the generalization of equation (1), taking into account the probability distribution on the degree of $c'$.

Also similarly to Section 2.1, after round $i$ a message node $m$ of degree $j$ passes its initial value along $(m, c)$ to check node $c$ unless at least $b_{i,j}$ of the check nodes $c'$ adjacent to $m$ other than $c$ send $m$ the same value. Note that now the threshold value for a node depends on its degree. Also, the value of $b_{i,j}$ changes according to the round.

To analyze the decoding process, consider a random edge $(m, c)$. The left degree of $(m, c)$ is $j$ with probability $\lambda_j$. It thus follows from the same argument as in Section 2.1 that the recursive description for $p_i$ is (again using $z_i = 1 - 2p_i$ and $g$ as defined in equation (3))

$$p_{i+1} = p_0 - \sum_{j=1}^{d_\ell} \lambda_j \left[ p_0 \sum_{t = b_{i,j}}^{j-1} \binom{j-1}{t} g(\rho(z_i), t, j) - (1 - p_0) \sum_{t = b_{i,j}}^{j-1} \binom{j-1}{t} g(-\rho(z_i), t, j) \right]. \tag{7}$$

We need to determine $b_{i,j}$ so as to minimize the value of $p_{i+1}$. As in equation (5), the best value of $b_{i,j}$ is given by the smallest integer that satisfies:

$$\frac{1 - p_0}{p_0} \le \left[\frac{1 + \rho(1 - 2p_i)}{1 - \rho(1 - 2p_i)}\right]^{2b_{i,j} - j + 1}. \tag{8}$$

This equation has an interesting interpretation. Note that $2b_{i,j} - j + 1$ is a constant fixed by the above equation. The value $2b_{i,j} - j + 1 = b_{i,j} - (j - 1 - b_{i,j})$ can be interpreted as the difference between the number of check nodes that agree in the majority and the number that agree in the minority. We call this difference the discrepancy of a node.
Equation (8) tells us that we need only check that the discrepancy is above a certain threshold to decide which value to send, regardless of the degree of the node.

3.3 Designing Irregular Graphs

We now describe techniques for designing codes based on irregular graphs that can handle larger probabilities of error at potentially some expense in encoding and decoding time. Given our analysis of irregular codes, our goal is to find sequences $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_{d_\ell})$ and $\rho = (\rho_1, \rho_2, \ldots, \rho_{d_r})$ that yield the largest possible value of $p_0$ such that the sequence of $p_i$ decreases to 0 for a given rate. We frame this problem in terms of linear programs. Our approach cannot actually determine the best sequences $\lambda$ and $\rho$. Instead, our technique allows us to determine a good vector $\lambda$ given a vector $\rho$ and the desired rate of the code. This proves sufficient for finding codes that perform significantly better than regular codes. (Similarly, we may also apply this technique

to determine a good vector $\rho$ given a vector $\lambda$ and the desired rate; as we explain below, however, this does not prove useful in this setting.)

Let $p_0$ be fixed. For convenience, we use $z = 1 - 2x$ below. For a given degree sequence $\rho = (\rho_1, \rho_2, \ldots, \rho_{d_r})$ let the real valued function $f(x)$ be defined by

$$f(x) = p_0 - \sum_{j=1}^{d_\ell} \lambda_j \left[ p_0 \sum_{t = b_{i,j}}^{j-1} \binom{j-1}{t} g(\rho(z), t, j) - (1 - p_0) \sum_{t = b_{i,j}}^{j-1} \binom{j-1}{t} g(-\rho(z), t, j) \right],$$

where now

$$b_{i,j} = \left\lceil \frac{1}{2} \left( j - 1 + \frac{\log((1 - p_0)/p_0)}{\log((1 + \rho(z))/(1 - \rho(z)))} \right) \right\rceil$$

and the $\lambda_j$ are variables to be determined. Observe that condition (7) now reads as $p_{i+1} = f(p_i)$. For a given $p_0$ and right hand degree sequence $\rho$, we are interested in finding a degree sequence $(\lambda_1, \ldots, \lambda_{d_\ell})$ such that the corresponding function $f(x)$ satisfies $f(x) < x$ on the open interval $(0, p_0)$.

We begin by choosing a set $L$ of positive integers which constitute the range of possible degrees on the left hand side. To find appropriate $\lambda_\ell$, $\ell \in L$, we use the condition $f(x) < x$ above to generate linear constraints that the $\lambda_\ell$ must satisfy by considering different values of $x$. For example, by examining the condition at $x = 0.01$, we obtain the constraint $f(0.01) < 0.01$, which is linear in the $\lambda_\ell$. We generate constraints by choosing for $x$ multiples of $p_0/N$ for some integer $N$. We also include the constraints $\lambda_\ell \ge 0$ for all $\ell \in L$, as well as the constraint

$$\sum_{\ell \in L} \frac{\lambda_\ell}{\ell} = \frac{1}{1 - R} \sum_i \frac{\rho_i}{i}, \tag{9}$$

where $R$ is the rate of the code. This condition expresses the fact that the number of edges incident to the left nodes equals the number of edges incident to the right nodes. We then use linear programming to determine if suitable $\lambda_\ell$ exist that satisfy our derived constraints. The choice for the objective function is arbitrary as we are only interested in the existence of feasible solutions.

Given the solution from the linear programming problem, we can check whether the $\lambda_\ell$ computed satisfy the condition $f(x) < x$ on $(0, p_0)$. The best value for $p_0$ is found by binary search. Due to our discretization, there are usually some conflict intervals in which the solution does not satisfy this inequality.
Choosing large values for the tradeoff parameter $N$ results in smaller conflict intervals, although it requires more time to solve the linear program. For this reason we use small values of $N$ during the binary search phase. Once a value for $p_0$ is found, we use larger values of $N$ for that specific $p_0$ to obtain small conflict intervals. In the last step we get rid of the conflict intervals by slightly decreasing the value of $p_0$.

This linear programming tool allows for efficient search for good codes. That is, given a vector $\rho$ we can find a good partner vector $\lambda$. In a similar fashion, we can find a good partner vector $\rho$ from a given $\lambda$. However, our experiments reveal that the best vector $\rho$ for this decoding algorithm is always the one where all the nodes on the right have the same degree (or all nodes have as close to the same degree as possible).

There is intuition explaining this phenomenon. From the point of view of a message node $m$, it appears best if the expected number of other neighbors a neighboring check node $c$ has is as small as possible. This can be seen as follows. At the end of the $i$th round, the probability that $c$ sends the correct vote to $m$ is $(1 + \rho(1 - 2p_i))/2$. For small $p_i$ values, this is approximately $1 - p_i \sum_{k=1}^{d_r} (k - 1) \rho_k$. To maximize this probability, we seek to minimize $\sum_{k=1}^{d_r} (k - 1) \rho_k$, which is exactly the expected number of other neighbors $c$ has. This quantity is minimized (subject to the constraints $\sum_{k=1}^{d_r} \rho_k = 1$ and equation (9)) when all check nodes have equal degree, or as nearly equal as possible. In contrast, we note that using varying degrees for the check nodes is advantageous when using a more complicated decoding algorithm based on belief propagation [9].

Using the linear programming technique, we have considered graphs where the nodes on the left side may have varying degrees and the nodes on the right side all have the same degree. In other words, we have found good codes by considering $\rho$ vectors with just one nonzero entry. As we shall see in Section 4, this suffices to find codes with significantly better performance than that given by codes determined by regular graphs.
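To make the constraint generation concrete, the sketch below builds, for each grid point $x = k p_0 / N$ in $(0, p_0)$, the coefficient of every $\lambda_j$ in the linear condition $f(x) < x$, and then tests a candidate sequence against all of them. It does not solve the linear program itself; any LP solver could be fed the rows it produces. Several simplifying assumptions are made here and are not taken from the paper: all check nodes share one degree (so $\rho(z) = z^{d_r - 1}$), the threshold for each degree is found by direct search over majority thresholds rather than by the closed-form expression for $b_{i,j}$, and all function names are hypothetical.

```python
from math import comb
from typing import Dict, List, Sequence, Tuple


def g(y: float, t: int, j: int) -> float:
    """g(y, t, j) as in equation (3)."""
    return ((1 + y) / 2) ** t * ((1 - y) / 2) ** (j - 1 - t)


def gain(x: float, p0: float, j: int, right_degree: int) -> float:
    """Coefficient of lambda_j in p0 - f(x), using the best threshold for degree j."""
    y = (1 - 2 * x) ** (right_degree - 1)  # rho(z) for a single right degree
    best = 0.0  # threshold j: empty sums, the node always resends its received bit
    for b in range((j - 1) // 2 + 1, j):
        fixed = sum(comb(j - 1, t) * g(y, t, j) for t in range(b, j))
        broken = sum(comb(j - 1, t) * g(-y, t, j) for t in range(b, j))
        best = max(best, p0 * fixed - (1 - p0) * broken)
    return best


def constraint_rows(p0: float, degrees: Sequence[int], right_degree: int,
                    n_grid: int) -> List[Tuple[List[float], float]]:
    """Rows (coeffs, bound) encoding sum_j coeffs[j] * lambda_j > bound, i.e. f(x) < x."""
    rows = []
    for k in range(1, n_grid):
        x = k * p0 / n_grid
        rows.append(([gain(x, p0, j, right_degree) for j in degrees], p0 - x))
    return rows


def holds_on_grid(lam: Dict[int, float], p0: float, right_degree: int,
                  n_grid: int = 50) -> bool:
    """Check f(x) < x at every grid point for the left edge-degree sequence lam."""
    degrees = sorted(lam)
    rows = constraint_rows(p0, degrees, right_degree, n_grid)
    return all(sum(c * lam[j] for c, j in zip(coeffs, degrees)) > bound
               for coeffs, bound in rows)
```

As a usage example, the regular sequence `{4: 1.0}` with right degree 8 passes the grid test for `p0 = 0.04` and fails it for `p0 = 0.07`, in line with the value of $p^*$ reported for that code in Section 2.3.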
It remains to show that the codes we derive in this manner in fact function as we expect. That is, given a vector $(\lambda_1, \ldots, \lambda_{d_\ell})$, the right degree $d_r$, and the initial error probability $p_0$, if the sequence $p_i$ given by equation (7) is monotonically decreasing and hence converges to 0, then the code obtained from the corresponding irregular random graph corrects a $p_0$-fraction of errors, with high probability.

We first note that the equivalent of Theorem 1 holds in this case as well, by a similar proof (modified to take into account the different degrees). That is, we can use the hard decision decoding algorithm to decrease the number of erroneous bits down to any constant fraction.

To finish the decoding, we use the sequential algorithm from Section 2.2. The overall decoding time is linear. To prove the sequential decoding algorithm works, we need an equivalent of Lemma 1 for irregular graphs.

Lemma 3 Let $\alpha > 0$ and $\beta > 3/4 + \epsilon$ for some fixed $\epsilon > 0$. Suppose that $B$ is an irregular bipartite $(\alpha, \beta)$

expander, and that $d$ is the maximum degree on a left node of $B$. Then the sequential decoding algorithm corrects up to $\alpha n/2d$ errors in linear time.

Proof: We follow Theorem 10 of [15]. We show that the number of unsatisfied check nodes decreases after each step in the sequential algorithm. Let $V$ be the set of corrupt message nodes, with $|V| = v$ and $|\delta(V)| = \bar{d} v$, where $\bar{d}$ denotes the average degree of the nodes in $V$. Suppose there are $u$ unsatisfied check nodes and let $s$ be the number of satisfied neighbors of the corrupt variables. By the expansion of $B$, we have

$$u + s > (3/4) \bar{d} v.$$

As each satisfied neighbor of $V$ shares at least two edges with $V$, and each unsatisfied neighbor shares at least one, we have

$$\bar{d} v \ge u + 2s.$$

It follows that

$$u > \bar{d} v / 2, \tag{10}$$

and hence there is some message node with more than 1/2 of its incident check nodes unsatisfied. Hence at each step the sequential algorithm may flip a message node and decrease the number of unsatisfied check nodes. Therefore the only way the algorithm can fail is if the number of corrupt message nodes increases so that $v \ge \alpha n$ during the algorithm. But if $v \ge \alpha n$ then, by equation (10), $u > \bar{d} \alpha n / 2 \ge \alpha n / 2$. However, initially $u$ is at most $d$ times the maximum number $\alpha n / 2d$ of initial message bit errors, i.e., initially $u \le \alpha n / 2$. As $u$ decreases throughout the course of the algorithm, we cannot have that $v \ge \alpha n$ during the algorithm, and hence it cannot fail. Q.E.D.

It follows that the irregular codes we derive function as we expect as long as our random graphs have sufficient expansion. This expansion property holds with high probability if we choose the minimum degree to be at least five. However, as stated previously, graphs with message nodes of smaller degree may be handled with a small additional structure in the graph.

3.4 Theoretically Achievable Error Correction

We have designed some irregular degree sequences using the linear programming methodology described in Subsection 3.3. The codes we describe all have rate 1/2. These codes perform well in practice as well as according to our theoretical model. However, it is likely that one could find codes that perform slightly better using our techniques.
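A left degree sequence for one of these codes can be sanity-checked against two identities: the $\lambda_\ell$ must sum to 1, and with a single right degree $d_r$ the edge-counting condition (9) determines the rate. The sketch below runs both checks, taking the Code 14 and Code 10' parameters of Table 1 as input. The dictionary values are transcribed from that table, and the function names are hypothetical; if the transcription is off in any digit, the checks below are what would reveal it.

```python
from typing import Dict

# Left edge-degree fractions, keyed by degree (transcribed from Table 1).
CODE_14: Dict[int, float] = {5: 0.496041, 6: 0.173862, 21: 0.077225, 23: 0.252871}
CODE_10_PRIME: Dict[int, float] = {3: 0.123397, 4: 0.555093, 16: 0.321510}


def total_fraction(lam: Dict[int, float]) -> float:
    """Sum of the edge fractions; should be 1 up to rounding in the last digit."""
    return sum(lam.values())


def implied_rate(lam: Dict[int, float], right_degree: int) -> float:
    """Rate R solving sum_l lam_l / l = (1 / (1 - R)) * (1 / right_degree)."""
    left = sum(frac / degree for degree, frac in lam.items())
    return 1.0 - (1.0 / right_degree) / left
```

Both sequences sum to 1 and give a rate of 1/2 to within the six-digit rounding of the published parameters.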
It is worth noting that the Shannon upper bound (or entropy bound) for $p^*$ for codes of rate 1/2 is 11.1%. Although the irregular codes we have designed to date are far from this limit, they are still much better than regular codes.

Code       Right Deg.   Left Degree Parameters
Code 14    14           $\lambda_5 = 0.496041$, $\lambda_6 = 0.17386$, $\lambda_1 = 0.0775$, $\lambda_3 = 0.5871$
Code 22    22           $\lambda_5 = 0.84961$, $\lambda_6 = 0.14061$, $\lambda_7 = 0.068844$, $\lambda_9 = 0.1090$, $\lambda_{30} = 0.119796$, $\lambda_{100} = 0.93135$
Code 10'   10           $\lambda_3 = 0.13397$, $\lambda_4 = 0.555093$, $\lambda_{16} = 0.31510$
Code 14'   14           $\lambda_3 = 0.093368$, $\lambda_4 = 0.346966$, $\lambda_1 = 0.159355$, $\lambda_3 = 0.40031$

Table 1: Parameters of our codes.

Code 14 and Code 22, described fully in Table 1, are two irregular codes that we designed. For Code 14 all nodes on the right have degree 14, and for Code 22 all nodes on the right have degree 22 (see footnote 1). In both these codes, the minimum degree on the left hand side is five. This ensures that the graphs have good expansion as needed in Lemma 2, and thus there is no need for the additional structure discussed in Section 2.2. Using the analysis of Section 3.2, we determine the appropriate value of $p^*$ is approximately 0.0505 for Code 14 and 0.0533 for Code 22.

We can achieve even better performance by considering graphs with smaller degrees on the left. While such graphs do not have sufficient expansion for Lemma 2 to hold, we can use the additional structure discussed in Section 2.2 to finish the decoding. For Code 10' all nodes on the right have degree 10, and for Code 14' all nodes on the right have degree 14. Using the analysis of Section 3.2, we determine the appropriate value of $p^*$ is approximately 0.0578 for Code 10' and 0.0627 for Code 14'. Recall that 0.0517 is the best value of $p^*$ that is possible using regular graphs for rate 1/2 codes.

4 Experimental Results

We include preliminary experimental results for new codes we have found using the linear programming approach. Our experimental design is similar to that of [15], whose results can be compared with ours. We describe a few important details of our experiments and implementations.
In our implementation, we simply run Gallager's decoding technique until it finishes, or until a pre-specified number of rounds pass without success. In our experiments it turns out that it is unnecessary to switch to the modified decoding algorithm of Section 2.2 or use the additional structure described in Section 2.2, as in our experience the hard decision decoding algorithm of Gallager finishes successfully once the number of errors becomes small. We do not perform an actual encoding, but instead for each trial use an initial message consisting entirely of zeroes. To more accurately compare code quality, instead of introducing errors with probability $p$, we set the same number of errors (corresponding to a fraction $p$ of the block length) in each trial.

Footnote 1: Actually, to balance the number of edges, we do allow one node on the right to have a different degree.

It is worthwhile to note that even when the decoding algorithm fails to decode successfully because too many rounds have passed, it can report that failure back. We have yet to see the decoding algorithm produce a codeword that satisfied all constraints but was not the original message, although theoretically it is a possible event.

Our implementation takes as input a schedule that determines the discrepancy value $2b_{i,j} - j + 1$ at each round. This schedule can be calculated according to equation (8). In practice, however, the schedule determined by equation (8) must be modified somewhat. If the discrepancy threshold is changed prematurely, before enough edges transfer the correct value, the decoding algorithm is significantly more likely to fail. Hence changing the threshold according to the round as given by equation (8) often fails to work well when the block size is small, since the variance in the number of edges sending the correct value can be significant. In practice we find that stretching out the schedule somewhat, so that the discrepancy threshold is changed after a few more rounds than the equations suggest, prevents this problem, at the expense of increasing the running time of the decoding algorithm.

In our experiments, a random graph was constructed separately for each trial at a certain error rate. No effort was made to test graphs or weed out potentially bad ones, and hence we expect that our results would be slightly better if several random graphs were tested and the best ones chosen. Following the ideas of [15] and [11], when necessary we remove double edges from our graphs.

4.1 Some Experiments

We first describe experiments on codes of rate 1/2 with 16,000 message bits and 8,000 check bits. In Figure 3, we describe the performance of Code 14 and Code 22 that we introduced in subsection 3.4. Each data point represents the results from 2,000 trials.
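The trial setup described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: the helper names `error_pattern`, `random_edges`, `remove_double_edges`, and `unsatisfied_checks` are hypothetical, the graph is built by randomly matching edge endpoints, and "removing double edges" is interpreted here as keeping a single copy of each repeated edge.

```python
import random

def error_pattern(n, p, rng):
    """Received word for the all-zero message with exactly round(p * n) flipped bits."""
    word = [0] * n
    for i in rng.sample(range(n), round(p * n)):
        word[i] = 1
    return word

def random_edges(left_degrees, right_degree, rng):
    """Randomly match left edge endpoints to right edge endpoints."""
    left = [v for v, d in enumerate(left_degrees) for _ in range(d)]
    rng.shuffle(left)
    # The k-th shuffled endpoint is attached to check node k // right_degree.
    return [(v, k // right_degree) for k, v in enumerate(left)]

def remove_double_edges(edges):
    """Assumption: keep one copy of each repeated (message, check) edge."""
    return sorted(set(edges))

def unsatisfied_checks(edges, word, num_checks):
    """Check nodes whose incident message bits have odd parity."""
    parity = [0] * num_checks
    for v, c in edges:
        parity[c] ^= word[v]
    return [c for c, bit in enumerate(parity) if bit]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n, p = 16000, 0.045
word = error_pattern(n, p, rng)
# Regular baseline: degree 4 on the left, degree 8 on the right.
edges = remove_double_edges(random_edges([4] * n, 8, rng))
bad = unsatisfied_checks(edges, word, n // 2)
print(sum(word), len(bad))
```

Fixing the number of flipped bits, rather than flipping each bit independently with probability $p$, removes one source of variance between trials. With the all-zero message, decoding has succeeded exactly when the word returns to all zeroes, so no encoder is needed.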
Recall that the appropriate value of $p^*$ is approximately 0.0505 for Code 14 and 0.0533 for Code 22. Recall that $p^*$ represents the error rate we would expect to be able to handle for arbitrarily long block lengths, and that we only expect to approach $p^*$ asymptotically in practice as the number of nodes grows. Our results show that for block lengths of 16,000 the codes appear to perform extremely well when a random fraction 0.045 (or 720) of the original message bits are in error. For the 2,000 trials, Code 14 never failed, and Code 22 failed just once. (In fact in 10,000 trials with this number of errors, Code 14 proved successful every time.) The probability that the code succeeds falls slowly as the error probability approaches $p^*$. Further experiments with larger block lengths demonstrate that performance improves with the number of bits in the message, as one would expect.

[Figure 3: Percentage of successes based on 2000 trials. Horizontal axis: percentage of errors (4.5 to 5); vertical axis: percentage of successes; curves: Regular (4/8), Code 14, Code 22.]

These codes therefore perform better than the codes based on regular graphs presented in [15], albeit at the expense of a greater (but still linear) running time. They also perform much better than regular codes. For instance, as mentioned before, the best regular code of rate 1/2 is obtained from random regular bipartite graphs with degree 4 on the left and degree 8 on the right. The performance of this code is also shown in Figure 3. Although the $p^*$ value for this regular code is approximately 0.0517, in practice, with 16,000 message bits this regular code failed 3 times in 2,000 trials with a fraction of 0.045 errors.

We now consider Code 10' and Code 14' introduced in subsection 3.4. The experiments were run on 16,000 message bits and 8,000 check bits for 2,000 trials. In our experiments, we remove both double edges and some small cycles, as suggested in [11]. Recall that the appropriate value of $p^*$ is approximately 0.0578 for Code 10' and 0.0627 for Code 14'.
These codes again perform near what our analysis suggests, and they significantly outperform previous similar codes with similar decoding schemes, including regular codes.

[Figure 4: Percentage of successes based on 2000 trials. Horizontal axis: percentage of errors (5 to 6); vertical axis: percentage of successes; curves: Code 10', Code 14'.]

In summary, irregular codes Code 14 and Code 22 appear superior to any regular code in practice, and irregular codes Code 10' and Code 14' are far superior to any regular code. We have similarly found irregular codes that perform well at other rates.

5 Conclusion

We have proven that a class of linear time error-correcting codes correct a large fraction of errors with high probability. We have also determined new codes based on irregular graphs that perform better than codes based on regular graphs on systems of practical size, as well as described a general technique for producing such codes.

Our work leaves several interesting open questions. An ambitious project is to fully analyze the behavior of either regular or irregular codes when using a decoding algorithm based on belief propagation. Such decoding algorithms are similar to the decoding algorithm of Gallager described in Section 2.1, except that more extensive information is passed through messages along the edges each round. Analyzing these algorithms would be a significant breakthrough in the theory of codes based on low-density parity-check matrices.

Another interesting question is to tie together more strongly the theory and practice of these codes. Our equations that describe the asymptotic behavior of the codes do not tell us which codes perform best for reasonably sized systems (say, with thousands or tens of thousands of bits). A more systematic approach rather than trial and error would be useful.

References

[1] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon Limit Error-Correcting Coding and Decoding: Turbo-Codes", Proceedings of IEEE International Communications Conference, 1993.
[2] J.-F. Cheng and R. J. McEliece, "Some High-Rate Near Capacity Codecs for the Gaussian Channel", 34th Allerton Conference on Communications, Control and Computing.
[3] D. Divsalar and F. Pollara, "On the Design of Turbo Codes", JPL TDA Progress Report 42-123.
[4] G. D. Forney, Jr., "The Forward-Backward Algorithm", Proceedings of the 34th Allerton Conference on Communications, Control and Computing, 1996, pp. 432-446.
[5] B. J. Frey and F. R. Kschischang, "Probability Propagation and Iterative Decoding", Proceedings of the 34th Allerton Conference on Communications, Control and Computing, 1996.
[6] R. G. Gallager, Low-Density Parity-Check Codes, MIT Press, 1963.
[7] M. Luby, M. Mitzenmacher, M. A. Shokrollahi, D. A. Spielman, and V. Stemann, "Practical Loss-Resilient Codes", Proc. 29th Symp. on Theory of Computing, 1997, pp. 150-159.
[8] M. Luby, M. Mitzenmacher, and M. A. Shokrollahi, "Analysis of Random Processes via And-Or Trees", Proc. 9th Symp. on Discrete Algorithms, 1998.
[9] M. Luby, M. Mitzenmacher, M. A. Shokrollahi, and D. A. Spielman, "Improved Low Density Parity Check Codes Using Irregular Graphs and Belief Propagation", submitted to the 1998 International Symposium on Information Theory.
[10] D. J. C. MacKay, R. J. McEliece, and J.-F. Cheng, "Turbo Coding as an Instance of Pearl's 'Belief Propagation' Algorithm", to appear in IEEE Journal on Selected Areas in Communication.
[11] D. J. C. MacKay and R. M. Neal, "Good Error Correcting Codes Based on Very Sparse Matrices", available from http://wol.ra.phy.cam.ac.uk/mackay.
[12] D. J. C. MacKay and R. M. Neal, "Near Shannon Limit Performance of Low Density Parity Check Codes", to appear in Electronic Letters.
[13] R. Motwani and P. Raghavan, Randomized Algorithms, Cambridge University Press, 1995.
[14] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Publishers, 1988.
[15] M. Sipser and D. A. Spielman, "Expander Codes", IEEE Transactions on Information Theory, 42(6), November 1996, pp. 1710-1722.
[16] D. A. Spielman, "Linear Time Encodable and Decodable Error-Correcting Codes", IEEE Transactions on Information Theory, 42(6), November 1996, pp. 1723-1731.
[17] N. Wiberg, "Codes and decoding on general graphs", Ph.D. dissertation, Dept. Elec. Eng., U. Linköping, Sweden, April 1996.