JOURNAL OF COMPUTERS, VOL. 5, NO. 12, DECEMBER 2010, pp. 1847-1855

Fast Algorithm of a 64-bit Decimal Logarithmic Converter

Ramin Tajallipour, Md. Ashraful Islam, and Khan A. Wahid
Dept. of Electrical and Computer Engineering, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
Email: rat177@mail.usask.ca, md142@mail.usask.ca, khan.wahid@usask.ca

Abstract: The paper presents an efficient algorithm to compute the base-10 logarithm of a decimal number. The algorithm uses 64-bit floating-point arithmetic and is based on a digit-by-digit iterative computation that does not require look-up tables, curve fitting, decimal-binary conversion, or division operations. It is the first FPGA prototype of its kind that uses 64-bit (decimal 16-digit) precision. Two numerical examples are presented for illustration. The algorithm produces very accurate results, with a maximum absolute error of $3.53 \times 10^{-14}$. The architecture is pipelined and implemented on a Xilinx Virtex2p FPGA. It costs 6,752 logic cells, produces outputs at a minimum rate of 51 mega-samples/sec, and consumes 125.7 mW of power. The scheme is well suited to timing- and accuracy-critical applications and is compliant with the IEEE754-2008 standard (decimal64 format).

Index Terms: Decimal logarithm converter, floating-point arithmetic, iterative computation, IEEE754-2008

Manuscript received January 1, 2009; revised June 1, 2009; accepted July 1, 2009.

I. INTRODUCTION

Elementary functions such as the logarithm and the exponential have become very useful in many applications, such as financial analysis, tax calculation, internet-based applications, and e-commerce [1], where these operations are used to avoid hardware-expensive multiplication and division operations. In the past, several hardware-efficient methods have been proposed for computing the base-2 logarithm of binary numbers [4][5][11]-[16]. However, after the inclusion of decimal floating-point (FP) operations in the latest IEEE754-2008 standard [6], more researchers have devoted their effort to developing decimal FP algorithms and architectures that efficiently compute logarithms [7][17], exponentiation [28], trigonometric operations, etc.

A study has shown that 55% of the numbers stored in the databases of 50 big organizations are decimal [21]. Several software packages are available to customers for computing on decimal numbers using decimal arithmetic to minimize error [23], but software-implemented decimal arithmetic takes much longer to execute than a hardware version [1], which has given momentum to hardware implementation. IBM has recently implemented decimal FP architecture in their POWER6 [3][19], z9 [29], and z10 microprocessors [20]. Several decimal architectures, such as the multi-operand carry-save adder [24][25], carry look-ahead adder [26], parallel BCD adder [27], and signed-digit adder [18], have been proposed.

Several applications require the direct computation of the decimal (or radix-10) logarithm, for example to measure the pH in chemistry, the earthquake intensity on the Richter scale, the optical density in spectrometry and optics, and the brightness of stars in astronomy [2]. Moreover, the radix-10 logarithm is widely used in computing the ratio of voltage and power levels (called bel) in telecommunications, electronics and acoustics. In most base-10 logarithmic converters, the decimal input is first converted to binary, followed by a base-2 logarithm computation; after completion, the result is converted back to decimal radix. These back-and-forth conversions of bases introduce errors into the system. A generalized iterative algorithm to compute the base-k logarithm has been presented in [8]; however, the division operation in that work limits the performance by incurring erroneous computation. Moreover, the use of lookup tables and the lack of user control over the number of iterations make this algorithm very inefficient for hardware implementation. We have recently presented a 32-bit decimal logarithm (in short, log) converter [22].
While the decimal32 format, as defined in the IEEE754-2008 standard [6], is only used for storage, the decimal64 and decimal128 formats are used for more accurate decimal computation. Motivated by this fact, in this paper we extend the algorithm to compute the radix-10 log using decimal64 precision, which is the first FPGA prototype of its kind. The algorithm is based on a digit-by-digit iterative computation that does not require error correction circuitry, look-up tables, curve fitting, or division operations. The number of iterations of the log converter depends on the user-defined precision. The previous 32-bit design [22] suffers from high latency (e.g., 40 clock cycles) due to an inefficient power-10 algorithm and unpipelined operation. Here, we present a very efficient power-10 module with a pipelined architecture that may take only 4 clock cycles to produce the log result, with a high process throughput of 51 mega-samples/sec. The error analysis shows that the proposed scheme produces very accurate results. The architecture is developed based on 64-bit binary coded decimal (BCD) representation and is compliant with the IEEE decimal64 FP standard.

The paper is organized as follows: Section II presents the background. In Section III, the digit-by-digit algorithm is presented, together with its pseudo-code and two illustrative examples. A detailed hardware implementation is discussed in Section IV, where the different internal modules are described. Section V discusses the performance analysis, with a hardware comparison among related log converter designs. The paper is concluded in Section VI.

doi:10.4304/jcp.5.12.1847-1855

II. BACKGROUND

The general form of any positive number, L, can be expressed as:

$$L = \cdots + \alpha_2 R^{2} + \alpha_1 R^{1} + \alpha_0 R^{0} + \alpha_{-1} R^{-1} + \alpha_{-2} R^{-2} + \cdots \tag{1}$$

where R is the numerical base and $\alpha_i$ is the coefficient for the i-th power of that base, ranging from 0 to R - 1. For a decimal base, R equals 10. After taking the logarithm of any positive decimal number, P, (1) results in the following (2):

$$L = \log_{10} P = \cdots + \alpha_1 10^{1} + \alpha_0 10^{0} + \alpha_{-1} 10^{-1} + \alpha_{-2} 10^{-2} + \cdots, \quad P > 0 \tag{2}$$

The coefficients in (2) can now be divided into two categories: integer (or characteristic: $\ldots, \alpha_1, \alpha_0$) and fraction (or mantissa: $\alpha_{-1}, \alpha_{-2}, \alpha_{-3}, \ldots$). The procedure to compute these coefficients is described in the following section.

III. ALGORITHM FOR DECIMAL64 LOG

A. Decimal64 format in IEEE754-2008

The IEEE 754-2008 decimal FP arithmetic supports the decimal32, decimal64, and decimal128 computation and data interchange formats, and implements all the operations and conversions [6]. The basic decimal FP format is illustrated in Fig. 1. The Sign is a 1-bit field that indicates the sign of the number, where S is 0 or 1. The Combination Field is a (w+5)-bit field that encodes the two most significant bits (MSBs) of the exponent and the most significant digit (MSD) of the coefficient. Not-a-Number (NaN) and infinity (Inf) are also indicated in the Combination Field. The biased exponent is a (w+2)-bit quantity, where the value of the first two bits of the biased exponent taken together is 0, 1, or 2.
The whole encoded exponent is an unsigned binary integer with the largest unsigned value. The value of the exponent is calculated by subtracting an exponent bias from the value of the encoded exponent, so that both negative and positive exponents can be represented.

Figure 1. Decimal floating-point number format in IEEE754-2008

The Trailing Significand Field ($10j$ bits) is formed by appending the decoded continuation digits as a suffix to the most significant digit (MSD) derived from the Combination Field. Each 10-bit group represents three decimal digits, using Densely Packed Decimal (DPD) encoding [30]. The format encodes a total of $p = 3j + 1$ decimal digits, where p is the number of digits in the significand (precision). For the decimal64 format: w = 8, j = 5, exponent bias = 398, and p = 16.

B. Proposed iterative algorithm

The IEEE754-2008 standard [6] defines any non-normalized unsigned decimal fraction as $d_0.d_1 d_2 d_3 \ldots d_{15}$, where $0 \le d_i \le 9$. To be compatible with the standard, we extend our input decimal number, P, as follows:

$$P = (-1)^{S} \times 10^{a} \times P_c \tag{3}$$

where S is the sign, a is the exponent, and $P_c$ is the coefficient (in integer form). Taking the 64-bit log of (3) results in (4):

$$L = \log_{10}(P) = \log_{10}(10^{a}) + \log_{10}(P_c) \tag{4}$$

Here, $000000000000001 \le P_c \le 999999999999999$. The computation of $\log_{10}(P_c)$ follows the iterative algorithm. It starts with the computation of the upper limit of i, called $i_{max}$, which is the number of mantissa digits desired in the final converted answer. $i_{max}$ is set by the user and defines the number of iterations. In order to perform the initial range reduction, we extend (4) further as given below:

$$L = \log_{10}(P) = a + b + \log_{10}(k) \tag{5}$$

where b (range: $1 \le b \le 15$) is the characteristic of $\log_{10}(P)$ and is obtained by detecting leading zeros; k is a decimal fraction (range: $0.1 \le k < 1$). After separating a, and combining (2) and (5), we get the following (6):

$$P = 10^{\,b + \alpha_1 10^{-1} + \alpha_2 10^{-2} + \cdots} \tag{6}$$

Here and in the examples below, $\alpha_0$ denotes the characteristic and $\alpha_i$ ($i \ge 1$) denotes the i-th mantissa digit, i.e., the coefficient of $10^{-i}$. The intermediate data is accumulated into a temporary variable, A, where $1 \le A < 10$:
$$A = \frac{P}{10^{b}} = 10^{\,\alpha_1 10^{-1} + \alpha_2 10^{-2} + \cdots} \tag{7}$$

This division (by $10^{b}$) can be easily implemented by right-shifting the input digits. In order to determine the fractional parts (e.g., $\alpha_1, \alpha_2, \alpha_3, \ldots$), we take the power-10 of (7) as shown below:

$$A^{10} = \left(10^{\,\alpha_1 10^{-1} + \alpha_2 10^{-2} + \cdots}\right)^{10} = 10^{\,\alpha_1 + \alpha_2 10^{-1} + \cdots} \tag{8}$$

Now (8) has a structure similar to (7). The first mantissa coefficient, $\alpha_1$, is computed by simply counting the number of integer digits in (8). The temporary value (stored in A) undergoes another power range reduction and is accumulated back into A. The process continues for the remaining mantissa coefficients until the number of iterations reaches $i_{max}$, set earlier by the user.

For inputs between 0 and 1, the log is negative. The proposed algorithm is capable of handling such cases. For this purpose, we first adjust the decimal point as follows:

$$P = m \times 10^{n}, \quad \text{where } m \ge 1,\ n < 0 \tag{9}$$

Taking the 64-bit radix-10 log on both sides of (9) leads to (10):

$$L = \log_{10} P = n + \log_{10} m \tag{10}$$

Now, the computation of $\log_{10} m$ follows the procedure described in (5)-(8).

C. Pseudocode of the algorithm

The pseudo-code of the proposed algorithm is summarized below and illustrated in Fig. 2:

1. Read the decimal input, P, and the number of mantissa digits in the log result, i
2. Transfer P to A and perform the initial range reduction
3. Detect the number of integers in A
4. Start computing the mantissa coefficients, $\alpha_i$
5. Compute power-10 of A, and perform power range reduction
6. Decrease i by 1
7. If i > 0, repeat steps 3-6

Figure 2. Flow graph of the decimal log algorithm

D. Examples

In order to better illustrate the algorithm, we present two examples:

1) Determine the logarithm of the decimal number P = 123456789.123456 up to three fractional digits. The computation steps are as follows:

Here the user sets the number of fractional digits, so $i_{max} = 3$; hence, the computation process will continue up to the computation of $\alpha_3$, and the final answer will be in the format $\alpha_0.\alpha_1\alpha_2\alpha_3$.
The number of integers in P: N = 9; hence, $\alpha_0 = (9 - 1) = 8$.
Right shift the digits of P by eight digits: $A = (P \gg 8) = 1.23456789123456$.
Compute power-10 of A and accumulate the result: $A = A^{10} = (1.23456789123456)^{10} = 8.2252626737231264115782140871879$.
Compute the number of integers in A: N = 1; hence, $\alpha_1 = (1 - 1) = 0$.
Right shift the digits of A by zero digits: $A = (8.2252626737231264115782140871879 \gg 0) = 8.2252626737231264115782140871879$.
Compute power-10 of A and accumulate the result: $A = A^{10} = (8.2252626737231264115782140871879)^{10} = 1417417401.8443859394911136893318$.
Compute the number of integers in A: N = 10; hence, $\alpha_2 = (10 - 1) = 9$.
Right shift the digits of A by nine digits: $A = (1417417401.8443859394911136893318 \gg 9) = 1.4174174018443859394911136893318$.
Compute power-10 of A and accumulate the result: $A = A^{10} = (1.4174174018443859394911136893318)^{10} = 32.732381445138501594634638993872$.
Once again, compute the number of integers in A: N = 2; hence, $\alpha_3 = (2 - 1) = 1$.
The iteration stops, and the final answer is L = 8.091.

2) Determine the logarithm of the decimal number P = 0.00000123456789 up to two fractional digits. The computation steps are as follows:

Here the input is a fraction ($0 < P < 1$), so $i_{max} = (2 + 1) = 3$; hence, the computation process will
continue up to the computation of $\alpha_3$, and the final answer will be in the format $\alpha_0.\alpha_1\alpha_2\alpha_3$.
Adjust the decimal point: $P = 1.23456789 \times 10^{-6}$; m = 1.23456789, n = -6.
The number of integers in P: N = 1; hence, $\alpha_0 = (1 - 1) = 0$.
Right shift the digits of P by zero digits: $A = (P \gg 0) = 1.23456789$.
Compute power-10 of A and accumulate the result: $A = A^{10} = (1.23456789)^{10} = 8.22526259147257950476114366154$.
Compute the number of integers in A: N = 1; hence, $\alpha_1 = (1 - 1) = 0$.
Right shift the digits of A by zero digits: $A = (8.22526259147257950476114366154 \gg 0) = 8.22526259147257950476114366154$.
Compute power-10 of A and accumulate the result: $A = A^{10} = (8.22526259147257950476114366154)^{10} = 1417417260.35587702142524239761$.
Compute the number of integers in A: N = 10; hence, $\alpha_2 = (10 - 1) = 9$.
Right shift the digits of A by nine digits: $A = (1417417260.35587702142524239761 \gg 9) = 1.41741726035587702142524239761$.
Compute power-10 of A and accumulate the result: $A = A^{10} = (1.41741726035587702142524239761)^{10} = 32.7323487129826281766581287077$.
Once again, compute the number of integers in A: N = 2; hence, $\alpha_3 = (2 - 1) = 1$.
The iteration stops, and the final answer is $L = -6 + 0.091 = -5.909$.

Thus, it can be seen that the algorithm does not require any lookup tables, curve fitting, FP division operations, or error correction circuitry. The following section describes the hardware implementation of the proposed scheme.

IV. HARDWARE IMPLEMENTATION

The architecture of the radix-10 log converter is shown in Fig. 3. It consists of two major units, both connected to a controller: the Synchronous Register-Counter and the Core unit.

Figure 3. Block diagram of the entire system

The core architecture is developed using unsigned BCD representation with an internal precision of 16 digits (64-bit binary). The DPS (decimal-point separator) module detects and separates the decimal point (DP), and then stores the unsigned magnitude in a temporary register. The DP follows a separate path (DP Accumulator, DP Update) that runs in parallel with the core computation. The DP Update module tracks the position of the decimal point and updates it after every computation step.
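As a software cross-check of the digit-by-digit iteration of Section III, the steps can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' hardware: the function name `log10_digit_by_digit` and the `working_digits` parameter are hypothetical, Python's `decimal` module stands in for the BCD datapath, and a 34-digit working precision is assumed so that rounding does not disturb the few digits being extracted.

```python
from decimal import Decimal, localcontext


def log10_digit_by_digit(p: str, i_max: int, working_digits: int = 34) -> Decimal:
    """Return log10(p), truncated to i_max mantissa digits, for p > 0."""
    with localcontext() as ctx:
        ctx.prec = working_digits            # assumed software precision, not the 16-digit datapath
        a = Decimal(p)
        if a <= 0:
            raise ValueError("input must be positive")

        # Initial range reduction: locate the leading digit and move the
        # decimal point so that 1 <= a < 10. For 0 < p < 1 the value is
        # negative and plays the role of n in Eq. (9).
        characteristic = a.adjusted()        # N - 1 when p >= 1
        a = a.scaleb(-characteristic)

        mantissa = []
        for _ in range(i_max):
            a = a ** 10                      # power-10 step, Eq. (8)
            alpha = a.adjusted()             # integer digits of a, minus one (0..9)
            mantissa.append(str(alpha))
            a = a.scaleb(-alpha)             # power range reduction back to 1 <= a < 10

        fraction = Decimal("0." + "".join(mantissa)) if mantissa else Decimal(0)
        return characteristic + fraction


print(log10_digit_by_digit("123456789.123456", 3))   # 8.091
print(log10_digit_by_digit("0.00000123456789", 3))   # -5.909
```

The two calls reproduce Examples 1 and 2. Note that the sketch uses a wider working precision than the 16-digit core, so it illustrates the iteration rather than the hardware's rounding behaviour.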
The converter accepts two inputs: a 16-digit decimal number (P, in BCD) including the decimal point (inputted as hex ) and the desired number of digits after the decimal radix point (i, in binary) in the final log result. Depending on i, a down counter is set, which defines the number of iterations. The core unit performs the fundamental computation, supervised by the Controller. The width of the data lines in all the following figures is given in decimal digits, unless otherwise specified. The architecture of the core unit is shown in Fig. 4. It does not show the interaction with the controller.

Figure 4. Architecture of the core unit

The 16-digit unsigned input is passed to the Zero Detector (ZD) module, which determines the number of integers; this gives the first digit (or coefficient) of the final log result. The coefficient is updated at the same time in the Coefficient Update module. The right-shift (RS) operation to be performed on A is also achieved in the same clock cycle by simply updating the position of the DP in the DP Update module. The intermediate data is passed into the Power-10 Unit and fed back to ZD for further processing. The data flow to ZD is controlled by
the controller (lines not shown) through a 2:1 multiplexer. The process continues until the counter reaches zero, at which point the controller stops the computation. The final result is accumulated in the Result Update unit after combining the outputs of Coefficient Update and DP Update.

A. Power-10 architecture

The Power-10 module is a key unit of the log converter, and the accuracy of the final result depends greatly on its efficient implementation. Because of its complexity, we have explored several options based on a divide-and-conquer algorithm to implement the unit efficiently; these are shown in Fig. 5. Considering the tradeoff between hardware cost and speed, we have chosen option 3 for our implementation. The algorithm is based on recursive powering that requires one parallel multiplier unit and 4 cc to complete. This is a significant improvement over the previous implementation [22], which took 40 cc for such computation.

Figure 5. Power-10 algorithms and the cost of implementation: (a) Option 1; (b) Option 2; (c) Option 3 (one multiplier unit computing $X^{2}$, $X^{4}$, $X^{8}$, $X^{10}$; latency 4 cc)

The overall architecture of the Power-10 unit is shown in Fig. 6(a), where the width of all data lines is 16 digits. It consists of a 16-digit combinational multiplier, an accumulator, and a few latches. The selection bit (Sel) dictates the multiplication operation: 0 for A*A; 1 for A*B. A key step of the proposed algorithm is to count the number of integers before the decimal point to evaluate the coefficients, $\alpha_i$, which may take any value between 0 and 9 (i.e., $0 \le \alpha_i \le 9$). Inside the multiplier, the most significant 16 digits are retained and accumulated for further processing. The data flow is controlled by the controller (lines not shown) using a 2:1 multiplexer. After the first multiplication stage, the output (i.e., $X^{2}$) is stored in a temporary latch so that it can be used later at the fourth multiplication stage. The timing diagram for each clock pulse and the selection/control sequence are shown in Fig. 6(b) [here, n indicates the instance of the clock pulse at any given time, t].

Figure 6. (a) Architecture of the Power-10 unit; (b) Timing diagram

B. 16-digit decimal multiplier

Several efficient methods for decimal multiplication have been proposed in the past [9][10]. Here, we have used a general-purpose 16-digit combinational multiplication algorithm, which is a modified version of [9]. This is another improvement over the previous 32-bit design [22], where a sequential multiplier was used. The multiplier architecture (shown in Fig. 7) is optimized to reach the desired throughput. The multiplier input is recoded, and the partial products are first kept in a redundant format and then accumulated by a tree of redundant adders. Finally, the 32-digit product is obtained by converting the carry-save tree's outputs into BCD format. The presented combinational architecture results in low latency, which is why it is chosen for the log converter. The product, p, is given below, where A and B are the signed multiplier and the multiplicand respectively:

$$p = A \cdot B = \sum_{i=0}^{n-1} A \cdot B_i \cdot 10^{i}, \quad B_i \in [0, 9] \tag{11}$$

In order to make the computation simple, $B_i$ is recoded into two groups, $B_i^{h} \in \{0, 5, 10\}$ and $B_i^{l} \in \{-2, -1, 0, 1, 2\}$, where $B_i = B_i^{h} + B_i^{l}$. Negative numbers are represented in radix-10 complement and are implemented by taking the 9's complement of the BCD digit and adding a 1 at the least significant digit. In order to compute 2X (two times the input), we first double each digit of the input and record it using 5 bits:
one carry (1 bit) and one digit (4 bits). For a 16-digit input, it generates 16 digits and 16 carry bits. In the second step, the carry bits are added; the carry is not propagated to the next digit. The generation of 5X (five times the input) is performed by first computing 10X (ten times the input) and then a simple division operation. The overall architecture of the partial product generator (PPG) is shown in Fig. 8.

Figure 7. 16-digit general purpose BCD multiplier

Figure 8. Block diagram of the partial product generator (PPG)

The partial products are added using an adder tree and converted to BCD. For this conversion, we have used an efficient signed-digit decimal adder [18], which has the benefit of carry-free addition; however, a carry-propagation adder (CPA) must be used to transform the signed-digit sum into an unsigned sum. The operation is shown in Fig. 9 and described as follows by (12):

where c S u (12) 1 1 0 c c1 if u 1 if u. otherwise 1 U 1

Figure 9. Final conversion to BCD using signed-digit adder

V. PERFORMANCE EVALUATION

The architecture of the 64-bit log converter has been prototyped in Verilog and synthesized onto a Xilinx Virtex2p FPGA (xc2vp30ff1152-7). The breakdown of the cost of the different units is shown in Table I. It can be seen that the Power-10 unit (including a combinational decimal multiplier) consumes the most resources of the entire system.

TABLE I. HARDWARE COST OF THE PROPOSED 64-BIT DECIMAL LOG CONVERTER

| Module | Submodule | Bit length | Reg. | Logic cells |
|---|---|---|---|---|
| Core Unit (inc. Power-10) | Others | 64 | 269 | 213 |
| Core Unit (inc. Power-10) | Decimal mult | 64 | 0 | 6,403 |
| Controller and others | | 16 | 73 | 136 |
| Total | | 64 | 342 | 6,752 |

A. Error analysis

For the targeted 64-bit decimal FP applications, the proposed log converter must be able to achieve the minimum accuracy (i.e., 32-bit binary precision as defined in [31]) to guarantee correct operation. In order to compute the maximum error of the converted log result, we have performed an error analysis, where a long test vector comprising 100 16-digit positive decimal numbers (ranging from 0000000000000000 to
9999999999999999 with arbitrary position of the decimal point) is used. The error is computed taking a precision of 22 digits (written in Matlab) as reference. Fig. 10 shows the normalized error plot (in log scale) for only five arbitrary samples at different internal digit precisions. The x-axis shows the number of decimal digits retained after one power-10 iteration. It can be seen from the plot that the algorithm produces less error if a higher number of digits is retained. The proposed architecture is based on 64-bit binary precision, and thus can only handle up to 16 decimal digits. As a result, the error, as seen in the plot, is fixed after 16 digits. The maximum normalized error at this precision is estimated to be $3.53 \times 10^{-14}$.

Figure 10. Normalized error of the log converter for different precision (y-axis: normalized error, in log scale; x-axis: number of decimal digits retained; five sample inputs)

B. Hardware comparisons

Table II compares the results of the proposed log converter with other similar designs. In the cases where the number of logic cells is not available, we have computed it from the slice count (e.g., one slice is equivalent to two logic cells). The latency indicates the minimum number of clock cycles required to produce a decimal 1-digit output. In all cases, similar Xilinx FPGA technology is used, and the maximum absolute error is presented along with the number of digits of accuracy. We start with two binary designs [12][13], which give a rough estimate of the relative comparison between binary and radix-10 designs. Note that the results presented in [12] and [13] are based on the synthesis of HDL code that was originally generated automatically by a C++ program. The work in [7] is based on a curve fitting (linear approximation) algorithm. Due to the use of a look-up table (i.e., ROM mapping), the design takes only 1 cc to produce the output, but the crude approximation algorithm results in large error (e.g.
maximum absolute error is 0.09 with only 3-digit accuracy). Sixteen partition regions are used to achieve such accuracy, and complex error-correction circuitry is required at the end. The work in [17] is based on a digit-recurrence algorithm. With the use of large look-up tables and complex mapping, the error is largely minimized, but at the expense of low operational frequency and reduced throughput (e.g., latency is 18 cc), which makes the scheme unsuitable for time-critical applications. The decimal digit accuracy is 14, which is still lower than that of the proposed algorithm. [17] also briefly discusses the extension to a 64-bit design, but the cost of an actual FPGA implementation is not reported. As a result, we have estimated the cost from the 7-digit core. In [22], the authors have presented an iterative scheme with a sequential multiplier unit that consumes relatively few resources; however, the sequential nature of the architecture and the unpipelined operation limit the performance, yielding a very large latency and low throughput.

Compared to all existing designs, the proposed scheme has much lower computation error and higher digit accuracy (very naturally, as it uses higher precision), lower hardware cost, and higher frequency of operation. Compared to [22], the hardware cost of the proposed 64-bit scheme is higher (and so is the estimated power consumption) for two reasons: (1) the use of a combinational multiplier; (2) the use of a much higher digit precision. However, the (minimum) latency is 4 cc, which makes the proposed scheme very suitable for time- and accuracy-critical applications. The computation algorithm is generalized and scalable, which means that the architecture can be extended to the decimal128 format without a large increase in complexity and hardware cost; this is another advantage of the proposed scheme. As an example, for compliance with the IEEE754-2008 decimal64 format, [17] requires a significant increase in two LUTs from 14-digit to 34-digit and a moderate increase in other processing blocks, with latency increasing by at least a factor of two.
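The precision sweep of the error analysis can be approximated in software. The sketch below is an illustration under stated assumptions rather than the authors' Matlab test bench: the names `power10_truncated`, `log10_truncated` and `abs_error` are hypothetical, truncation (`ROUND_DOWN`) after each product is assumed as the model of "retaining the most significant digits", the four multiplications $X^{2}$, $X^{4}$, $X^{8}$, $X^{8} \cdot X^{2}$ follow option 3 of the Power-10 unit, and Python's `Context.log10` at 30 digits serves as the reference.

```python
from decimal import Context, Decimal, ROUND_DOWN

REFERENCE = Context(prec=30)   # assumed high-precision reference arithmetic


def power10_truncated(x: Decimal, digits: int) -> Decimal:
    """Compute x**10 with four multiplications, keeping `digits` digits after each."""
    ctx = Context(prec=digits, rounding=ROUND_DOWN)
    x2 = ctx.multiply(x, x)
    x4 = ctx.multiply(x2, x2)
    x8 = ctx.multiply(x4, x4)
    return ctx.multiply(x8, x2)          # x**8 * x**2 = x**10


def log10_truncated(p: str, i_max: int, digits: int) -> Decimal:
    """Digit-by-digit log10 with `digits` significant digits kept per product."""
    a = Decimal(p)
    characteristic = a.adjusted()
    a = REFERENCE.scaleb(a, -characteristic)     # 1 <= a < 10, exact shift
    mantissa = []
    for _ in range(i_max):
        a = power10_truncated(a, digits)
        alpha = a.adjusted()                     # 0..9
        mantissa.append(str(alpha))
        a = REFERENCE.scaleb(a, -alpha)
    return characteristic + Decimal("0." + "".join(mantissa))


def abs_error(p: str, i_max: int, digits: int) -> Decimal:
    """Absolute difference between the truncated result and the reference log10."""
    exact = REFERENCE.log10(Decimal(p))
    return abs(REFERENCE.subtract(log10_truncated(p, i_max, digits), exact))


for d in (6, 10, 16):
    print(d, abs_error("1.23456789", 12, d))
```

In this model the error falls as more digits are retained per product, which is the qualitative trend the error plot describes; the exact figures depend on the truncation model and on the number of mantissa digits requested, so they should not be read as the hardware's measured error.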
VI. CONCLUSION

The paper presents a fast algorithm and an efficient implementation for computing the decimal logarithm using 64-bit floating-point arithmetic that complies with the IEEE754-2008 standard. The algorithm is based on a digit-by-digit iterative computation that does not require look-up tables, curve fitting, decimal-binary conversion, or division algorithms. The final logarithmic output is very accurate, with a maximum absolute error of $3.53 \times 10^{-14}$; no correction or rounding circuitry is required, which makes the scheme suitable for timing- and accuracy-critical applications. The architecture is generalized and scalable, and can be extended to the decimal128 format. Future research is directed towards such an extension, as well as the VLSI implementation of the algorithm.
TABLE II. HARDWARE COMPARISONS OF DIFFERENT LOG CONVERTERS

| Design | Base | Bit length (binary) | Scheme | Slices | Logic cells | Freq. (MHz) | Power (mW) | Latency (cc) | No. of digit accuracy | Max. abs. error |
|---|---|---|---|---|---|---|---|---|---|---|
| Detrey et al. [12] (1) | 2 | 31 | ROM | 881 | 1,762 | 11.4 | -- | 30 | -- | -- |
| Detrey et al. [13] (1) | 2 | 31 | ROM | 1,893 | 2,736 | 14.5 | -- | 16 | -- | -- |
| Dongdong et al. [7] | 10 | 32 | Curve fitting | 999 | 1,998 | 50.9 | 8 | 1 | 3 | 0.9 x 10^-1 |
| Dongdong et al. [17] (2) | 10 | 32 | Digit recurrence | 2,842 | 5,684 | 47.7 | -- | 18 | 14 | 0.66 x 10^-14 |
| Ramin et al. [22] (3) | 10 | 32 | Iterative | 526 | 1,053 | 44 | 79 | 40 | 10 | 0.1 x 10^-11 |
| Dongdong et al. [17] | 10 | 64 (4) | Digit recurrence | 4,603 | 9,602 | -- | -- | 36 | -- | -- |
| Proposed | 10 | 64 | Iterative | 3,376 | 6,752 | 51 | 125.7 | 4 | 16 | 3.53 x 10^-14 |

(1) using a Xilinx VirtexII FPGA (xc2v1000-4) device; (2) the max. abs. error reported is based on theoretical calculation; (3) using a VirtexII FPGA (xc2v1000-6) device; (4) the estimated hardware count for 64-bit, since the actual cost was not reported.

ACKNOWLEDGMENT

The authors would like to acknowledge the Natural Science and Engineering Research Council of Canada (NSERC) for its support of this research work.

REFERENCES

[1] M. Cowlishaw, "Decimal Floating-Point: Algorism for Computers," Proc. of the IEEE Symposium on Computer Arithmetic, pp. 104-111, 2003.
[2] Wikipedia, [Online], Available: http://en.wikipedia.org/wiki/logarithm, November 2009.
[3] IBM Power6, IBM Corporation, May 2007.
[4] J. Mitchell, "Computer Multiplication and Division Using Binary Logarithms," IRE Trans. Electron. Computer, pp. 512-517, 1962.
[5] D. Kostopoulos, "An Algorithm for the Computation of Binary Logarithms," IEEE Trans. on Computers, vol. 40, no. 11, pp. 1267-1270, 1991.
[6] The IEEE Standard for Floating-Point Arithmetic (IEEE 754-2008), IEEE Computer Society, Aug 2008.
[7] D. Chen, Y. Choi, L. Chen, D. Teng, K. Wahid, S. Ko, "A Novel Decimal-to-decimal Logarithmic Converter," Proc. of the IEEE Int. Symposium on Circuits and Systems, pp. 688-691, 2008.
[8] H. Lo and J. Chen, "A Hardwired Generalized Algorithm for Generating the Logarithm Base-k by Iteration," IEEE Trans. Computer, vol. C-36, pp. 1363-1367, 1987.
[9] T. Lang and A. Nannarelli, "A Radix-10 Combinational Multiplier," Proc. of the Asilomar Conference on Signals, Systems and Computers, pp. 313-317, 2006.
[10] H. C. Neto and M. P. Vestias, "Decimal Multiplier on FPGA Using Embedded Binary Multipliers," Proc. of the Int. Conf. on Field Programmable Logic and Applications, pp. 197-202, 2008.
[11] M. Ercegovac, "Radix-16 Evaluation of Certain Elementary Functions," IEEE Trans. on Computers, vol. C-22(6), pp. 561-566, 1973.
[12] J. Detrey, F. Dinechin, and X. Pujol, "Return of the Hardware Floating-Point Elementary Function," Proc. of the IEEE Symposium on Computer Arithmetic, pp. 161-168, 2007.
[13] J. Detrey and F. de Dinechin, "A Parameterizable Floating-point Logarithm Operator for FPGAs," Proc. of the 39th Asilomar Conf. on Signals, Systems & Computers, pp. 1186-1190, 2005.
[14] P. T. P. Tang, "Table-driven Implementation of the Logarithm Function in IEEE Floating-point Arithmetic," ACM Trans. on Mathematical Software, vol. 16(4), pp. 378-400, 1990.
[15] C. Wrathall and T. C. Chen, "Convergence Guarantee and Improvements for a Hardware Exponential and Logarithm Evaluation Scheme," Proc. of the IEEE Symposium on Computer Arithmetic, pp. 175-182, 1978.
[16] W. Wong and E. Goto, "Fast Hardware-based Algorithms for Elementary Function Computations using Rectangular Multipliers," IEEE Trans. on Computers, vol. 43(3), pp. 278-294, 1994.
[17] Dongdong Chen, Yu Zhang, Younhee Choi, Moon Ho Lee, Seok-Bum Ko, "A 32-bit Decimal Floating-Point Logarithmic Converter," Proc. of the IEEE Symposium on Computer Arithmetic, pp. 195-203, 2009.
[18] J. Rebacz, E. Oruklu, and J. Saniie, "Performance Evaluation of Multi-Operand Fast Decimal Adders," Proc. of the IEEE Int. Midwest Symposium on Circuits and Systems, pp. 535-538, 2009.
[19] E. Schwarz and S. Carlough, "Power6 decimal divide," Proc. of the IEEE Int. Conf. on Application-specific Systems, Architectures and Processors, pp. 128-133, 2007.
[20] C. Webb, "IBM z10: The next-generation mainframe microprocessor," IEEE Micro, vol. 28, no. 2, pp. 19-29, 2008.
[21] A. Tsang and M. Olschanowsky, "A study of database 2 customer queries," IBM Santa Teresa Laboratory, San Jose, CA, USA, Tech. Rep. TR-03.413, Apr. 1991.
[22] R. Tajallipour, D. Teng, S-B Ko, and K. Wahid, "On the Fast Computation of Decimal Logarithm," Proc. of the IEEE Int. Conf. on Computer and Information Technology, pp. 32-36, 2009.
[23] BigDecimal, [Online], Available: http://java.sun.com/j2se/1.5.0/docs/api/java/math/bigdecimal.html, November 2009.
[24] R. Kenney and M. Schulte, "High-speed multioperand decimal adders," IEEE Trans. on Computers, vol. 54, no. 8, pp. 953-963, 2005.
[25] I. D. Castellanos and J. E. Stine, "Compressor trees for decimal partial product reduction," Proc. of the 18th ACM Great Lakes Symposium on VLSI, pp. 107-110, 2008.
[26] A. Bayrakci and A. Akkas, "Reduced delay BCD adder," Proc. of the IEEE Int. Conf. on Application-specific Systems, Architectures and Processors, pp. 266-271, 2007.
[27] L. Dadda, "Multioperand parallel decimal adder: A mixed binary and bcd approach," IEEE Trans. on Computers, vol. 56, no. 10, pp. 1320-1328, 2007.
[28] D. Chen, Y. Zhang, D. Teng, K. Wahid, M. Lee, and S-B. Ko, "A New Decimal Antilogarithmic Converter," Proc. of the IEEE Int. Symposium on Circuits and Systems, pp. 445-448, 2009.
[29] A. Duale, M. Decker, H. Zipperer, M. Aharoni, and T. Bohizic, "Decimal Floating-Point in z9: An Implementation and Testing Perspective," Journal on IBM Res. and Dev., Jan 2007.
[30] M. Cowlishaw, "Densely Packed Decimal Encoding," IEEE Computers and Digital Techniques, pp. 102-104, 2002.
[31] The IEEE Standard for Binary Floating-Point Arithmetic (IEEE754-1985), IEEE Computer Society.

Ramin Tajallipour completed his B.Sc. in Electrical Engineering at Azad South Tehran University in 2006. After graduation, together with two others, he built a testing ring instrument that was patented and is already in use at SAIPA, a well-known automobile manufacturing company in Iran. In 2007, he joined the Qom power station in Iran as an automation expert and worked there for about 1.5 years. He started his M.Sc. program with Dr. Khan A. Wahid at the Electrical Engineering department of the University of Saskatchewan in fall 2008. He is currently working in the Digital Systems Research Group, and his research interests are in video and image compression and processing, digital design, FPGA, and real-time embedded systems.

Md. Ashraful Islam received his B.Sc. degree in Electrical and Electronic Engineering from the Bangladesh University of Engineering & Technology, Bangladesh, in 2005. He is currently an M.Sc. candidate in the Electrical & Computer Engineering Department at the University of Saskatchewan. Prior to that, he worked as a Lecturer at Southeast University, Bangladesh, and as a Radio planner at Grameenphone Ltd, Bangladesh, and Telecom Malaysia in Bangladesh, respectively. He is a member of the Digital Systems Research Group at the College of Engineering, University of Saskatchewan. His research interests include high-performance digital circuits, FPGA and ASIC design, and VLSI architectures for image processing. He has designed and co-designed two IC chips with area- and power-efficient architectures in 0.18um CMOS (TSMC) technology, which have been reported in his publications.

Khan A. Wahid earned his B.Sc. degree from the Bangladesh University of Engineering and Technology (BUET) in 2000. He received his M.Sc. (2003) and Ph.D. (2007) from the University of Calgary. He was the recipient of numerous prestigious awards and scholarships, including the most distinguished Killam Scholarship and the NSERC Canada Graduate Scholarship for his doctoral research. Dr. Wahid has been working as an Assistant Professor in the Department of Electrical and Computer Engineering at the University of Saskatchewan since July 2007. He has authored over 40 peer-reviewed journal and international conference papers in the fields of digital arithmetic techniques, FPGA and ASIC design, real-time embedded systems, video and image compression, and biomedical imaging systems. He has been serving as a reviewer for the IEEE Transactions on Circuits and Systems for Video Technology, Biomedical Engineering Online, EURASIP Journal on Signal Processing, and Elsevier Journal on Computers and Electrical Engineering since 2006. He is a registered Professional Engineer in the province of Saskatchewan, Canada, and a Member of the Institute of Electrical and Electronics Engineers (IEEE).