Run-Time Error Detection in Polynomial Basis Multiplication Using Linear Codes

Siavash Bayat-Sarmadi and M. A. Hasan
Department of Electrical and Computer Engineering, University of Waterloo
Waterloo, Ontario, Canada N2L 3G1
{bayat,ahasan}@ece.uwaterloo.ca

Abstract

In this article we consider detection of errors in polynomial basis multipliers, which have applications in channel coding, VLSI testing, and cryptography. Error detection is performed by applying a class of linear codes while the multiplier is in use. Two error detection schemes are presented. Results show that the probability of error detection of our single-input encoding (SIE) scheme using eight redundant bits is approximately 0.996. Additionally, the time and area overheads of the schemes for our bit-serial implementations are in a reasonable range; e.g., for the SIE scheme with eight redundant bits, the area overhead is 39.71% and the time overhead has been observed to be negligible.

1 Introduction

Hardware implementation of some high performance digital systems requires a significant amount of circuitry. In such circuits, faults may occur with a significant probability during the use of the system. Faulty circuits are likely to generate erroneous results, which are undesirable especially in sensitive and critical applications, including deep space channel coding [11], VLSI testing [8], and cryptography [2, 3]. As a result, error correction and detection are important for these digital systems. On the other hand, one of the important and area consuming components of the above mentioned applications is the finite field multiplier.

In this work, we consider detection of random errors in polynomial basis finite field multipliers. Our proposed scheme detects certain errors while the multiplier is working (i.e., run-time error detection).

In order to detect random errors in finite field multipliers, a number of schemes have been proposed in the recent past. One approach to detect these errors in a finite field multiplier is to use parity bits, see for example [1, 4, 10].
The second approach is to scale the inputs of the multiplier by a factor; at the end of the multiplication, the correctness of the result is checked by one or two divisions, see for example [5]. Another approach is to use nonlinear techniques [6], which are expensive in terms of area and time and in turn may not be very efficient for detecting random errors.

This article presents two schemes for the detection of errors in both bit-serial and bit-parallel polynomial basis multipliers over binary extension fields, based on the second approach. The proposed schemes, which are referred to as single-input encoding (SIE) and double-input encoding (DIE), can be applied to any finite field $GF(2^m)$. In these schemes, we use linear codes. Such codes have also been used in [5]. Important differences between this work and [5] are as follows. First, the error model of this work is more generic and the error can occur in any location of the circuit. Secondly, this work gives much more flexibility to choose the field defining and the code generator polynomials. This leads to a reduction in the number of redundant bits and in turn a reduction in the area overhead.

In this article, for the proposed SIE scheme, its probability of undetected error and its overheads in terms of area and time are presented. For the DIE scheme, some comments are made on its error detection capability, and detailed area and time overheads are presented. Results show that, in our bit-serial implementations for eight redundant bits, the area overheads are lower than those of dual modular redundant systems, and the time overheads are quite small, i.e., less than 2%.

The organization of this article is as follows. In Section 2, some preliminaries about polynomial basis multiplication and coding theory are discussed. Two run-time error detection schemes are presented in Section 3. Using one of the schemes, namely the single-input encoding, we develop error detectable bit-serial and bit-parallel multiplier structures in Section 4. The error detection capability of the single-input encoding scheme is then investigated in Section 5.
Our second scheme is explained in Section 6. The time and area overheads of the schemes are presented in Section 7. Finally, Section 8 gives a few concluding remarks.

1-4244-1027-4/07/$25.00 © 2007 IEEE

2 Preliminaries

In this section, polynomial basis multiplication is first briefly reviewed. Then a class of linear codes is explained.

2.1 Polynomial Basis Multipliers

Let $f(x) = x^m + \sum_{i=1}^{m-1} f_i x^i + 1$ be an irreducible polynomial over $GF(2)$ of degree $m$. The polynomial (or canonical) basis is defined as the following set: $\{1, x, x^2, \ldots, x^{m-1}\}$. Each element $A$ of $GF(2^m)$ can be represented using the polynomial basis (PB) as $A = \sum_{i=0}^{m-1} a_i x^i$, where $a_i \in GF(2)$. Let $C$ be the product of two elements $A$ and $B$ of $GF(2^m)$. The PB representation of $C$ is as follows:

\[
C = AB \bmod f(x) = A \sum_{i=0}^{m-1} b_i x^i \bmod f(x) = \sum_{i=0}^{m-1} b_i \cdot A_i = (b_{m-1} \cdot A_{m-1} + b_{m-2} \cdot A_{m-2} + \cdots + b_1 \cdot A_1 + b_0 \cdot A_0), \tag{1}
\]

where $A_0 = A$ and $A_i = x A_{i-1} \bmod f(x)$. The multiplication of $x$ and an arbitrary element $A$ of $GF(2^m)$ is performed as follows:

\[
xA \bmod f(x) = x \sum_{i=0}^{m-1} a_i x^i \bmod f(x) = a_{m-1} + \sum_{i=1}^{m-1} (a_{m-1} f_i + a_{i-1}) x^i. \tag{2}
\]

Hereafter, the hardware that receives $A \in GF(2^m)$ as input and generates $xA \bmod f(x)$ as output will be referred to as the Shift-and-Reduce (SR) module. In (1), $\cdot$ denotes a scalar multiplication of $b_i \in GF(2)$ and $A_i \in GF(2^m)$, and $+$ is a vector addition of two elements of $GF(2^m)$. The hardware for scalar multiplication and that for vector addition are hereafter referred to as the scalar-multiplication and vector-addition modules, respectively. Using SR, scalar-multiplication, and vector-addition modules, one can construct PB multipliers in accordance with (1). For a bit-serial implementation, in addition to these modules, registers are used for storing intermediate results.

2.2 Linear Codes

In an $(n, m)$ block code, the input information sequence is divided into $m$-bit blocks and each block is encoded to an $n$-bit codeword ($n > m$). One important class of block codes is linear codes. These are extensively used in communication applications for correcting/detecting errors in transmission channels. Here, binary linear codes are considered for detecting errors in polynomial basis multipliers.

In the simplest form, an $(n, m)$ block code is linear if and only if the modulo-2 addition of two codewords is also a codeword. Let $V = (v_0, v_1, \ldots, v_{n-1})$ be a codeword.
A polynomial whose coefficients are the components of $V$ is said to be a code polynomial. A code polynomial of degree up to $n-1$ is generated with a polynomial of degree $n-m$ of the following form:

\[
g(x) = 1 + g_1 x + g_2 x^2 + \cdots + g_{n-m-1} x^{n-m-1} + x^{n-m}.
\]

The polynomial $g(x)$ is called a generator polynomial. Every code polynomial in the code is a multiple of $g(x)$. In fact, our $(n, m)$ linear code, which hereafter is referred to as the L code, maps an element of a finite field $GF(2^m)$ to an element of a commutative ring with modulus $f(x)g(x)$, where $f(x)$ is the irreducible polynomial used for representing the elements of $GF(2^m)$. Note that the well-known cyclic code has $x^n - 1$ as the corresponding modulus. For given $f(x)$ and $n$, the use of cyclic codes, however, limits the number of choices of $g(x)$.

3 Run-Time Error Detection Schemes

Errors may be caused by different types of faults such as open faults, short (bridging) faults, and/or stuck-at faults. Furthermore, the faults can be transient or permanent. In this article, we investigate two schemes for detecting random errors. In the first scheme, which lays the foundation for the discussion of the second one, only one of the inputs of the PB multiplier is encoded, i.e., it is multiplied by the generator $g(x)$. The second input is not encoded. In the second scheme, both inputs are encoded. Thus, the first and the second schemes are referred to as single-input encoding (SIE) and double-input encoding (DIE), respectively.

DIE is expected to have a better error detection capability than SIE at the expense of an increased area overhead. Nevertheless, the probability of error detection of SIE can be within an acceptable range because for some applications, for example in an elliptic curve cryptographic processor, the second input either comes from other operations such as adders and multipliers or comes as the direct input to the multiplier. In the first case, if the previous operation has error detection circuitry, its output, which is the second input of the current multiplier, is expected to be error free.
In the second case, one can apply a run-time error detection technique to the input of the multiplier once to avoid faulty inputs.

Depending on the further use of the multiplier's output, the PB multiplier with one of these schemes can produce either an encoded output, i.e., multiplied by only one generator, or an unencoded output.

4 SIE Based Error Detectable Multipliers

As mentioned in Section 2, a PB multiplier can be constructed with three types of modules: 1) SR, 2) scalar multiplication, and 3) vector addition. In the following, $(n, m)$ L codes are applied to the inputs of these modules to obtain error detectable multipliers. For a bit-serial implementation, clearly, the size of the registers should increase from $m$ bits to $n$ bits.

4.1 Scalar-Multiplication and Vector-Addition Modules

Suppose that an $(n, m)$ L code is used and $g(x)$ is the generator polynomial. Let $A, B, S, P \in GF(2^m)$ and $b \in GF(2)$, where the scalar multiplication is $b \cdot A = P$ and the vector addition is $A + B = S$. Suppose $\hat{A}$, $\hat{B}$, $\hat{S}$ and $\hat{P}$ are the results of encoding $A$, $B$, $S$ and $P$, respectively. Thus, for scalar multiplication we have:

\[
b \cdot \hat{A} = b \cdot Ag = Pg = \hat{P},
\]

and for vector addition we have:

\[
\hat{A} + \hat{B} = Ag + Bg = (A + B)g = Sg = \hat{S}.
\]

Accordingly, for using L codes, the sizes of the scalar-multiplication and vector-addition modules should increase from $m$ bits to $n$ bits each.

4.2 SR Module

As shown in Figure 1(a), the unencoded input and the output of the SR module are $U(x) = \sum_{i=0}^{m-1} u_i x^i$ and $U_s(x) = \sum_{i=0}^{m-1} u_{s_i} x^i$, respectively. The code generator polynomial $g(x)$ over $GF(2)$ of degree $n-m$ is used for encoding. The encoded input and the output of the SR module (see Figure 1(b)) are $V(x) = \sum_{i=0}^{n-1} v_i x^i$ and $V_s(x) = \sum_{i=0}^{n-1} v_{s_i} x^i$, respectively.

Figure 1. SR module: (a) with unencoded input, depends on $f(x)$; (b) with encoded input, depends on $F(x)$; (c) details of (b).

In an SR module with unencoded input, we have:

\[
U_s(x) = xU(x) \bmod f(x).
\]

According to (2), and writing $f_0 = 1$:

\[
U_s(x) = \sum_{i=0}^{m-2} u_i x^{i+1} + u_{m-1} \sum_{i=0}^{m-1} f_i x^i = x \sum_{i=0}^{m-1} u_i x^i + u_{m-1} \Big( x^m + \sum_{i=0}^{m-1} f_i x^i \Big) = xU(x) + u_{m-1} f(x). \tag{3}
\]

On the other hand, for encoded inputs to the SR module we have:

\[
V(x) = U(x)g(x). \tag{4}
\]

Thus, using (3) and (4), for input $V(x)$ the output of the SR module is:

\[
V_s(x) = U_s(x)g(x) = xU(x)g(x) + u_{m-1} f(x)g(x) = xV(x) + u_{m-1} f(x)g(x). \tag{5}
\]

Let $F(x) = f(x)g(x)$. Since $F(x)$ can be considered to be fixed, it can be pre-computed.
On the other hand, $v_{n-1} = u_{m-1} \cdot g_{n-m}$ and $g_{n-m} = 1$, thus:

\[
v_{n-1} = u_{m-1}. \tag{6}
\]

Therefore, using (5) and (6) we have:

\[
V_s(x) = xV(x) + v_{n-1} F(x). \tag{7}
\]

Remark 1 Let $\omega(F)$ be the Hamming weight of $F(x)$. The number of XOR gates required for constructing the SR module with encoded input, shown in Figure 1(c), is $\omega(F) - 2$.

4.3 Bit-serial and Bit-parallel Polynomial Basis Multipliers

To construct a bit-serial and a bit-parallel multiplier with run-time error detection capability, we use updated versions of the SR, scalar-multiplication, and vector-addition modules with encoded input. Figure 2(a) shows a bit-serial multiplier with run-time error detection (RTED) capability. For multiplying $A$ and $B$ with RTED capability, register D is initialized with the encoded $A$, i.e., $\hat{A} = Ag$. An error checker can be placed at each of the three locations L1, L2 and L3. In the next section, the frequency of check points will be discussed.

Figure 2(b) shows a bit-parallel multiplier with RTED capability. In the bit-parallel multiplier an error checker can be placed after each module. Thus, there can be as many as $3m - 2$ error checkers for a bit-parallel multiplier.
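The construction of Sections 4.1 to 4.3 can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the hardware design of this article: polynomials over $GF(2)$ are stored as Python integers (bit $i$ is the coefficient of $x^i$), the function names (`clmul`, `polydivmod`, `sr_encoded`, `sie_multiply`) are hypothetical, and the small field and generator used in the demo are arbitrary choices made for readability.

```python
# Polynomials over GF(2) are stored as ints: bit i is the coefficient of x^i.
# All names and the demo parameters below are illustrative assumptions.

def clmul(a: int, b: int) -> int:
    """Carry-less product of a and b, i.e. multiplication in GF(2)[x]."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def polydivmod(a: int, d: int) -> tuple[int, int]:
    """Quotient and remainder of a divided by d in GF(2)[x] (d must be nonzero)."""
    q = 0
    deg_d = d.bit_length() - 1
    while a.bit_length() - 1 >= deg_d:
        shift = a.bit_length() - 1 - deg_d
        q ^= 1 << shift
        a ^= d << shift
    return q, a

def sr_encoded(V: int, F: int, n: int) -> int:
    """Encoded SR module, eq. (7): V_s(x) = x V(x) + v_{n-1} F(x)."""
    top = (V >> (n - 1)) & 1
    return (V << 1) ^ (F if top else 0)

def sie_multiply(A: int, B: int, f: int, g: int) -> int:
    """Bit-serial SIE multiplication: returns the encoded product (A B mod f) g."""
    m = f.bit_length() - 1           # degree of the field polynomial
    n = m + g.bit_length() - 1       # codeword length
    F = clmul(f, g)                  # F(x) = f(x) g(x), pre-computed
    D = clmul(A, g)                  # register D holds the encoded A
    acc = 0
    for i in range(m):
        if (B >> i) & 1:             # scalar multiplication by b_i ...
            acc ^= D                 # ... followed by vector addition
        D = sr_encoded(D, F, n)      # D becomes the encoding of A_{i+1}
    return acc

if __name__ == "__main__":
    f = 0b10011                      # assumed field: x^4 + x + 1
    g = 0b1011                       # assumed generator: x^3 + x + 1
    encoded_c = sie_multiply(0b1011, 0b0110, f, g)
    c, remainder = polydivmod(encoded_c, g)   # checker and decoder in one step
    print("remainder:", remainder, "decoded product:", bin(c))
```

In a fault-free run the accumulated value is a multiple of $g(x)$, so the remainder computed at the end is zero and the quotient is the unencoded product; a nonzero remainder would play the role of the error signal.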

Figure 2. Polynomial-basis multiplication: (a) bit-serial, (b) bit-parallel.

4.4 L Code Encoders and Checkers

Encoders, decoders and/or checkers of linear codes are well studied in the literature, e.g., see [7] for shift register based architectures. For encoding, the data (i.e., an element of $GF(2^m)$) is multiplied by the generator polynomial $g(x)$. The encoder can be implemented in a serial or a parallel fashion. In this work, we only consider the parallel one, since it is much faster. For a parallel implementation of an encoder, a parallel multiplier that multiplies the data by the generator $g(x)$ should be used.

To check whether an $n$-tuple at a certain location in the circuit is a codeword, a checker is placed at that point. A checker basically divides the polynomial corresponding to the $n$-tuple by the generator polynomial $g(x)$ of the L code, and if the division has a nonzero remainder, an error signal is given. Again, checkers can be implemented in a serial or a parallel fashion. For a parallel implementation, a parallel divider can be used.

5 Error Detection Capability

In this section, our error model and the probability of an undetected error of the SIE scheme are given. The frequency of the check points is also discussed.

5.1 Error Modelling

The error model in this work is a bit-flip model. To illustrate the model, suppose that the error free value of a location, say L, of a polynomial basis multiplier is an $n$-tuple, say $v = (v_0, v_1, \ldots, v_{n-1})$. An error vector is also an $n$-tuple, say $e = (e_0, e_1, \ldots, e_{n-1})$. The number of possible errors is $2^n - 1$. The erroneous value of the location L is $v_e = v + e$, where $+$ is bitwise XOR. In other words, an error is a modulo-2 additive term at a certain location of a PB multiplier, and the $i$-th bit of the error vector $e$ being one implies that the $i$-th bit of the value of the location L has changed from 0 to 1 or vice versa. If the location is one of the modules (SR, scalar multiplication, or vector addition), without loss of generality we can assume that the error vector is XORed with the output of the component.
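The checker of Section 4.4 and the bit-flip model above can be illustrated with a short sketch. It rests on assumptions not taken from this article: polynomials over $GF(2)$ are stored as Python integers (bit $i$ is the coefficient of $x^i$), the helper names are hypothetical, and the parameters ($m = 8$, a degree-4 generator, hence $n = 12$) are chosen only to keep the example small. Because the code is linear, the remainder of $v + e$ modulo $g(x)$ equals the remainder of $e$ alone, so the checker's verdict depends only on the error vector.

```python
# Polynomials over GF(2) are stored as ints: bit i is the coefficient of x^i.
# Helper names and parameters are illustrative assumptions.

def clmul(a: int, b: int) -> int:
    """Carry-less product of a and b, i.e. multiplication in GF(2)[x]."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def polymod(a: int, d: int) -> int:
    """Remainder of a divided by d in GF(2)[x] (d must be nonzero)."""
    deg_d = d.bit_length() - 1
    while a.bit_length() - 1 >= deg_d:
        a ^= d << (a.bit_length() - 1 - deg_d)
    return a

def encode(u: int, g: int) -> int:
    """Encoder: multiply the data polynomial by the generator g(x)."""
    return clmul(u, g)

def checker_flags_error(word: int, g: int) -> bool:
    """Checker: divide by g(x) and signal an error on a nonzero remainder."""
    return polymod(word, g) != 0

def inject_error(v: int, e: int) -> int:
    """Bit-flip model: the erroneous value is v XOR e."""
    return v ^ e

g = 0b10011            # assumed generator x^4 + x + 1, so n - m = 4
u = 0b10110101         # arbitrary 8-bit data word (m = 8, n = 12)
v = encode(u, g)       # error-free codeword

# Every single-bit flip is flagged, since x^i is never a multiple of g(x).
single_bit_flags = [checker_flags_error(inject_error(v, 1 << i), g) for i in range(12)]

# An error that is itself a nonzero multiple of g(x) goes unnoticed.
masked = checker_flags_error(inject_error(v, clmul(0b101, g)), g)
```

With these parameters, `single_bit_flags` contains only `True` and `masked` is `False`, which is the distinction between detectable and undetectable error vectors under this model.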
Note that the encoders and checkers should be fault free or at least self-checking [9]. Since in practice the number of redundant bits, $n - m$, is expected to be much less than the size of the input operands of the multiplier, $m$, the self-checking technique is feasible. Therefore, in this work, we assume that these encoders and checkers are fault free or self-checking. In the following, we investigate what kind of errors cannot be detected by this scheme.

5.2 Probability of an Undetected Error

For the purpose of error detection, a received $n$-tuple should be checked to see whether it is still a codeword. Therefore, based on our error model, any nonzero error that is a multiple of the generator polynomial $g(x)$ cannot be detected.

Let the probability of error detection and the probability of an undetected error be referred to as $Pr_D$ and $Pr_U$, respectively. Clearly, $Pr_D = 1 - Pr_U$. Suppose $W_i$ is the number of codewords of weight $i$ in an $(n, m)$ L code, i.e., $W_i$ is the number of codewords that contain $i$ ones. The probability of an undetected error can be computed using this weight distribution of the code. As mentioned, an undetected error occurs when the error vector is one of the nonzero codewords. Thus,

\[
Pr_U = \sum_{i=1}^{n} W_i \, p^i (1 - p)^{n-i},
\]

where $p$ is the probability of a bit of the error vector being one. The weight distribution is known for some special codes such as Hamming codes; however, the distribution is not known for the one we use in this work. Hence, a closed form for $Pr_U$ cannot be obtained, and the probability of an undetected error is investigated by a simulation-based fault injection (the details of the fault injection are skipped for brevity). Figure 3 shows the result of our simulation for the (167, 163), (169, 163) and (171, 163) L codes.

Figure 3. Probability of an undetected error vs. $p$, for the (171, 163), (169, 163) and (167, 163) L codes.

A well-known upper bound for the probability of an undetected error for some $(n, m)$ codes such as Hamming codes is $2^{-(n-m)}$. Here, the numbers of redundant bits are 4, 6 and 8, and the dashed and dotted lines in Figure 3 show the values $2^{-4}$, $2^{-6}$ and $2^{-8}$, respectively. As can be seen in the figure, the values of $Pr_U$ are either smaller than or quite close to the bounds for all three cases.

5.3 Frequency of Check Points

Suppose that there are several multiple-bit errors in a location of the circuit of a PB multiplier. To have the error detection capability $Pr_D$ discussed in the previous section, each of the locations mentioned in Section 4.3 should have a parity checker. This requires a very high area overhead, especially for bit-parallel multipliers. The following lemma helps us reduce the number of checkers considerably.

Lemma 1 Suppose only a maximum of one multiple-bit error occurs per round of a bit-serial multiplier or per row of a bit-parallel multiplier (see Figure 2). Then any such error can be detected with the probability $Pr_D$, discussed in Section 5.2, using a parity checker at L3 of the bit-serial multiplier, or a parity checker before the vertical input of every [...] module and one parity checker after the final [...] module in the bit-parallel multiplier.

Proof 1 The proof is skipped for brevity.

6 Double-Input Encoding (DIE)

Having only one input of the PB multiplier encoded can be of concern. If the second input of the multiplier becomes erroneous, the error cannot be detected.
One way to improve this situation is to encode both input operands. In general, the generators for encoding the inputs can be different. However, there are some issues with regard to choosing the generators that need to be dealt with, and they are briefly discussed in Section 6.2.

6.1 Polynomial Basis Multipliers with Run-Time Error Detection Capability

In the double-input encoding, input $A$ is encoded by the generator $g_1(x)$ and $B$ by $g_2(x)$, where these two generator polynomials need not be different. Let $C = AB \bmod f(x)$, where $f(x)$ is the field defining polynomial. Multiplying each side by $g_1(x)g_2(x)$, we obtain:

\[
C g_1 g_2 = AB g_1 g_2 \bmod f g_1 g_2.
\]

Hence,

\[
E_{g_1 g_2}(C) = E_{g_1}(A) \, E_{g_2}(B) \bmod \mathcal{F}(x),
\]

where $\mathcal{F}(x) = f(x)g_1(x)g_2(x)$ and $E_g(Z)$ implies that $Z$ is encoded by the generator $g$. Let the degrees of $g_1(x)$ and $g_2(x)$ be $r_1$ and $r_2$, respectively. Clearly, the degree of $\mathcal{F}(x)$ is $N = m + r_1 + r_2$. An SR module can be constructed using (7) by replacing $F(x)$ and $n$ with $\mathcal{F}(x)$ and $N$, respectively.

To construct a bit-serial multiplier and/or a bit-parallel multiplier with run-time error detection capability, we use updated versions of the SR, scalar-multiplication, and vector-addition modules in a manner very similar to that shown in Figure 2. Here, the number of rounds of the bit-serial multiplier and the number of rows of the bit-parallel multiplier are $m + r_2$ each.

6.2 Error Detection Using DIE

As in Section 5.1, the bit-flip error model is assumed here. For the purpose of error detection, checkers that use the generator $g_1$ are placed in the same locations as discussed in Section 4.3. If there is no error in the circuit, then the output value of the last checker that uses the generator $g_1$ is $C g_2 = AB g_2$. Therefore, one more checker that uses the generator $g_2$ should be placed at the output of the last checker. Then, the final result of the multiplication is the output of the checker that uses the generator $g_2$. Assuming that only a maximum of one multiple-bit error occurs per round of a bit-serial multiplier or per row of a bit-parallel multiplier, we have:

- If an error occurs on input $B$ and the error is a multiple of $g_2$, it cannot be detected.
- If errors occur on input $A$ and/or inside the PB multiplier and they are not multiples of $g_1$, they are detected. If they are multiples of $g_1$ but the output of the last checker that uses the generator $g_1$ is not a multiple of $g_2$, the errors are detected as well. Otherwise, they are not detected.

Note that $g_2$ can preferably be chosen such that its degree is smaller than that of $g_1$. The polynomial $g_2$ is mainly used for detecting errors in input $B$, although it affects the error detection of the entire multiplier circuit. Furthermore, this choice decreases the area overhead of the scheme.

7 Analysis of Time and Area Overheads

In this section, the area and time overheads of the SIE and the DIE error detection schemes are investigated. We used the NIST recommended field defining polynomial for ECDSA, $f(x) = x^{163} + x^7 + x^6 + x^3 + 1$, for our bit-serial implementations. Furthermore, the code polynomial for the SIE scheme was of degree eight, and the two code polynomials required for the DIE scheme were of degrees eight and three. We described the schemes in VHDL to obtain a realistic approximation of the area and the time overheads. We used ModelSim to simulate the design to check its functional correctness, and we implemented the schemes on a Xilinx Spartan 3 (XC3S5000) FPGA using Xilinx ISE 7.1i.

Table 1. The time and the area overheads of the bit-serial implementations of the SIE and the DIE schemes

Overhead             SIE      DIE
area (%)             39.71    52.94
clock cycle          0        r_2 = 3
clock period (%)¹    0        0
latency (%)          0        1.84

¹ Can be considered as throughput overhead.

The area overhead and the time overhead (clock period overhead or latency overhead) of the bit-serial implementations of the SIE and the DIE schemes for a polynomial basis multiplier are given in Table 1. As expected, DIE has a higher area overhead than SIE. Additionally, both schemes have lower area overheads than that of the conventional dual modular redundant system.
Moreover, the time overhead of SIE has been observed to be negligible and the time overhead of DIE is also very small. Therefore, one can choose either of the above mentioned implementations based on the area overhead, time overhead and/or error detection capability.

8 Conclusions

This article presents two schemes for detection of multiple-bit random errors in binary polynomial basis multipliers using linear codes. Based on our simulation, the probability of an undetected error for the single-input encoding scheme is approximately 0.004 with eight redundant bits in the codewords. Furthermore, the overheads of the error detection schemes for bit-serial implementations are lower than the overhead of the dual modular redundant scheme for a sufficient number of redundant bits. Additionally, the time overheads of the schemes have been observed to be small, i.e., less than 2%.

Acknowledgments

This work was supported in part by an NSERC grant awarded to Dr. Hasan. The authors would also like to thank Dr. Miguel F. Anjos for letting them run part of the simulation on his computer.

References

[1] S. Bayat-Sarmadi and M. A. Hasan. On concurrent detection of errors in polynomial basis multiplication. IEEE Trans. VLSI, 15(4):413-426, April 2007.
[2] D. Boneh, R. DeMillo, and R. Lipton. On the importance of checking cryptographic protocols for faults. In Proc. Int'l Conf. Eurocrypt, pages 37-51. Springer-Verlag, 1997.
[3] M. Ciet and M. Joye. Elliptic curve cryptosystems in the presence of permanent and transient faults. Designs, Codes and Cryptography, 36(1):33-43, July 2005.
[4] S. Fenn, M. Gossel, M. Benaissa, and D. Taylor. Online error detection for bit-serial multipliers in GF(2^m). J. Electronics Testing: Theory and Applications, 13:29-40, 1998.
[5] G. Gaubatz and B. Sunar. Robust finite field arithmetic for fault-tolerant public-key cryptography. In Proc. Workshop FTDC, pages 196-210, 2006.
[6] G. Gaubatz, B. Sunar, and M. G. Karpovsky. Non-linear residue codes for robust public-key arithmetic. In Proc. FTDC Workshop, pages 173-184, 2006.
[7] W. W. Peterson and E. J. Weldon. Error Correcting Codes. MIT Press, Cambridge, MA, 2nd edition, 1972.
[8] D. Pradhan and M. Chatterjee. GLFSR-a new test pattern generator for built-in-self-test. In Proc. Int'l Test Conf., pages 481-490, 1994.
[9] T. Rao and E. Fujiwara. Error-Control Coding for Computer Systems. Prentice Hall, 1989.
[10] A. Reyhani-Masoleh and M. A. Hasan. Fault detection architectures for field multiplication using polynomial bases. IEEE Trans. Comp., 55(9):1089-1103, 2006.
[11] S. B. Wicker and V. K. Bhargava, editors. Reed-Solomon Codes and Their Applications. John Wiley, NY, 1999.