Logical Effort of Carry Propagate Adders

David Harris and Ivan Sutherland
Harvey Mudd College / Sun Microsystems Laboratories
E. Twelfth St. Claremont, CA
David_Harris@hmc.edu / Ivan.Sutherland@sun.com

Abstract - A wide assortment of carry propagate adders offers varying area-delay tradeoffs. Wiring and choice of circuit family also affect the size and performance. This paper uses the method of Logical Effort to characterize the effects of architecture, circuit family, and wire capacitance on adder delay. Domino logic offers about a % speedup on most valency- adders. Although Kogge-Stone adders are fastest in the absence of wire, other architectures such as variants on the Sklansky adder offer regular layouts and better delay in the presence of wiring capacitance.

I. INTRODUCTION

Fast adders are widely used in CMOS circuit design. The literature describes many adders including ripple carry, carry lookahead, carry select [], carry skip [], carry increment [, ], Sklansky (conditional sum) [], Brent-Kung [], Kogge-Stone [], Ladner-Fischer [], Han-Carlson [], and Knowles []. Each architecture offers different tradeoffs between delay, area, and wiring complexity. Analytical delay models help designers evaluate these tradeoffs, but simply counting logic levels is inadequate because circuit delay also depends on fanout and wire capacitance. Huang and Ercegovac [] used an RC delay model to evaluate the effect of architecture and wiring capacitance on the Sklansky, Kogge-Stone, and Knowles adder architectures. The method of Logical Effort [] builds on the RC delay model to offer a convenient shorthand for understanding the effects of fanout and gate sizing on delay. Dao and Oklobdzija [, ] applied this method to a few adders and concluded that logical effort predicted absolute delays within -% of SPICE. This paper applies logical effort to understand the delay of eight different adder architectures that can be expressed as prefix computations according to the notation of [].
The results show how adder delay depends on the number of inputs, the adder architecture, the cost of interconnect, and the circuit style. The model shows that most adder architectures can use uniform gate sizes to achieve regular layout with negligible performance loss. An exception is the Sklansky architecture, which has highly irregular fanouts. This leads to a proposal for helper gates to construct very fast adders with regular layouts and low wiring cost.

II. LOGICAL EFFORT OF CIRCUIT BUILDING BLOCKS

The three basic building blocks for an adder are the bitwise Propagate/Generate (PG) cells, the group PG cells, and the sum XORs. High performance datapath adders often build these cells from domino gates, while static CMOS is preferable when design simplicity and power consumption take precedence over utmost performance.

Static CMOS bitwise gates compute generate as $G_i = A_i \cdot B_i$ and propagate as $P_i = A_i + B_i$. The sum is computed as $S_i = (A_i \oplus B_i) \oplus G_{i-1:0}$. Domino designs require monotonic inputs to the sum XOR. This is best done by calculating bitwise and group kill signals (K) and using XOR for propagate so that P, G, and K are 1-of-3 hot.

Define the group PG cell input coming from bits i:k as the upper input and that from k-1:j as the lower input. There are two types of group PG cells. Following the notation of [], we call the cells black cells and gray cells. Black cells compute both $G_{i:j}$ and $P_{i:j}$ as defined in EQ (). Gray cells compute only $G_{i:j}$. Black cells are required when the cell output drives the upper input of another group PG cell. The simpler gray cell may be used when the output drives only lower inputs or sum logic.

Consider four circuit styles: noninverting static CMOS, inverting static CMOS, footless domino, and footed domino. Fig shows the basic cell designs. Inverting static CMOS gates consist of a single stage of logic for each cell (except that the final XOR requires an input inverter). Alternating stages use alternating polarities of inputs and outputs. Black cells contain both the group G and P gates while gray cells have only the G gate.
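The cell functions described above can be sketched in a few lines of Python. This is only an illustrative model, assuming the usual prefix recurrences $G_{i:j} = G_{i:k} + P_{i:k} G_{k-1:j}$ and $P_{i:j} = P_{i:k} P_{k-1:j}$; the function names are hypothetical and the ripple arrangement is used only because it is the simplest way to exercise the cells.

```python
def bitwise_pg(a_bits, b_bits):
    """Bitwise cells (static CMOS form): G_i = A_i AND B_i, P_i = A_i OR B_i."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a | b for a, b in zip(a_bits, b_bits)]
    return g, p


def black_cell(g_up, p_up, g_lo, p_lo):
    """Black cell: returns both group generate and group propagate."""
    return g_up | (p_up & g_lo), p_up & p_lo


def gray_cell(g_up, p_up, g_lo):
    """Gray cell: returns only the group generate."""
    return g_up | (p_up & g_lo)


def ripple_add(a, b, n_bits):
    """Add two n_bits-wide integers with a chain of gray cells (no carry in)."""
    a_bits = [(a >> i) & 1 for i in range(n_bits)]
    b_bits = [(b >> i) & 1 for i in range(n_bits)]
    g, p = bitwise_pg(a_bits, b_bits)
    carry = 0  # generate signal entering bit 0; zero because there is no carry in
    total = 0
    for i in range(n_bits):
        # S_i = (A_i XOR B_i) XOR G_{i-1:0}
        total |= ((a_bits[i] ^ b_bits[i]) ^ carry) << i
        # G_{i:0} from the bitwise signals (upper input) and G_{i-1:0} (lower input)
        carry = gray_cell(g[i], p[i], carry)
    return total, carry


if __name__ == "__main__":
    total, cout = ripple_add(11, 6, 4)
    print(total, cout)  # sum bits and carry out of a 4-bit addition
```

The gray cell is enough here because every group output feeds only a lower input or the sum logic, which is the condition stated above for using the simpler cell.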
Noninverting static CMOS gates add an output inverter to the bitwise and group static gates. Therefore, only the AND-OR and AND functions are required for group G and P, respectively.

Footless domino gates computing 1-of-3 hot P, G, and K signals are shown in the second column. Each consists of a dynamic gate followed by a HI-skew inverter. Keepers and secondary precharge transistors are not shown. The group logic is shown for a black cell; a gray cell omits the P output. In the domino design, $K_{i:0} = \overline{G_{i:0}}$ so monotonic true and complementary versions of the carry signals are available at each final XOR. Footed domino gates are identical except for an extra series clocked evaluation transistor and greater transistor widths to compensate.

Transistors are annotated with widths measured in arbitrary units so that each pulldown stack has unit effective resistance. Table lists the logical effort and parasitic delay of each cell input for each circuit family. The logical effort LE is the ratio of the input capacitance of the gate input to the input capacitance ( units) of an inverter with the same unit effective resistance. The parasitic delay PD is estimated by counting the total transistor width on the output node, assuming diffusion and gate capacitance are approximately equal. In domino and noninverting static CMOS circuits, the output inverter also contributes parasitic delay. LE and PD are used in place of the usual symbols g and p to avoid confusion with generate and propagate.

Notice that the black cell has four inputs: $G_{i:k}$, $G_{k-1:j}$, $P_{i:k}$, and $P_{k-1:j}$. These are denoted as the upper and lower generate and propagate signals, gu, gl, pu, and pl, respectively, and each has a different logical effort. For inverting static CMOS circuits, the logical effort and parasitic delay are the average of the two polarities. If all cell sizes are chosen to provide unit drive as will be done in Section, this gives the correct delay through the path. If some cell sizes are selected for minimum delay, the logical efforts should be the geometric means of the efforts of the two polarities. In this case, the average and geometric mean are nearly identical, so the distinction is unimportant.

Some paths through the static XOR gate involve only a single AOI stage while others also involve the inverter. A conservative estimate calculates the logical effort for the single stage path based on the units of input capacitance on the G input. The parasitic delay is largest for the two-stage path, consisting of / for the inverter to drive its own diffusion parasitics and gate capacitance of the second stage plus / for the diffusion parasitics on the second stage.

In certain cases, buffers reduce the capacitance presented by noncritical forks of the circuit. Assume these buffers have half the drive (twice the resistance) of an ordinary gate and hence half the input capacitance. For the purpose of branching, the buffers therefore contribute only half the capacitance of a gate with comparable logical effort.

---//$. IEEE

Fig. Adder circuit building blocks (bitwise, group, and sum XOR cells, shown for inverting static CMOS and footless domino).

Table. Logical effort and parasitic delay of adder circuit blocks
Columns: Cell | Term | Noninverting CMOS | Inverting CMOS | Footed Domino | Footless Domino
Bitwise | LEbit, PDbit
Black Cell | LEblackgu, LEblackgl, LEblackpu, LEblackpl, PDblackg, PDblackp
Gray Cell | LEgraygu, LEgraygl, LEgraypu, PDgray
Buffer | LEbuf
Sum XOR | LExor, PDxor

III. ADDER ARCHITECTURES

Adders are distinguished by the arrangement of cells in the group PG logic. Fig shows eight such architectures for N=. The upper box contains the bitwise PG logic and the lower box contains the sum logic. In the middle, the prefix tree is built from black cells, gray cells, and white buffers. The vertical axis indicates logic level and the critical path is indicated with a heavy line. For example, the ripple carry adder in Fig a is slow for long additions because the critical path propagates through N- gray cells.

The critical path of each adder is described in more detail in Table. Each row of the table corresponds to the delay of a cell. The delay has three components: an effort delay F based on the size of the load, a parasitic delay P based on the cell itself, and wire delay based on the length of the horizontal wires between cells (measured in columns traversed). For example, the ripple carry adder path begins with inputs coming from a previous unit; these inputs see loading from the bitwise PG cells (LEbit) but their parasitic delay is not part of the adder delay. Then the P signal is computed and drives the upper propagate input of a gray cell. The generate output of this cell in turn drives the lower generate input of the next cell as well as the associated sum XOR. This repeats N- times.
Note that the final gray cell must drive both the S XOR and the C_out gray cell, so the load is the same as on the other gray cells. Finally, the sum XOR contributes a parasitic delay. The effort delay driving the next unit is not counted because an effort delay was already allocated on the primary inputs.

Several simplifying assumptions have been made: All inputs arrive at the same time with equal drive. Only horizontal wires are counted in the wire load. Vertical wires are assumed to be short enough to neglect (or lump into the parasitic gate delay). The $A_i \oplus B_i$ term used to compute the final sum is not explicitly shown and may use buffered versions of the inputs to contribute negligible loading. Wires are assumed to be short enough that only capacitance must be considered, not wire RC delay. This assumption is supported by [].
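The row-by-row description of a critical path lends itself to a small table-driven calculation. The sketch below is one possible encoding, not the authors' script: each row carries an effort F, a parasitic delay P, a horizontal wire length in columns, and a repeat count, and the delay with fixed cell sizes is taken as the sum of those terms. The class and function names, the numeric LE and PD values, the one-column wire per gray cell, and the N-1 repeat count are all assumptions chosen only to make the example run.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    effort: float            # F: total logical effort of the load being driven
    parasitic: float         # P: parasitic delay of the driving cell
    wire_cols: float = 0.0   # columns of horizontal wire in the load
    repeats: int = 1         # how many times this row occurs on the path


def path_delay(stages, w=0.0):
    """Delay with fixed (unit-drive) cells; w is wire capacitance per column."""
    return sum(
        s.repeats * (s.effort + w * s.wire_cols + s.parasitic) for s in stages
    )


def ripple_carry_path(n_bits, le, pd):
    """Critical path rows of a ripple carry adder, following the prose above."""
    return [
        # input -> bitwise cell; the driver's parasitic is outside the adder
        Stage(le["bit"], 0.0),
        # bitwise cell -> upper propagate input of the first gray cell
        Stage(le["graypu"], pd["bit"]),
        # gray cell -> next gray cell (lower generate) plus its sum XOR
        Stage(le["graygl"] + le["xor"], pd["gray"], wire_cols=1, repeats=n_bits - 1),
        # final sum XOR contributes only its parasitic delay
        Stage(0.0, pd["xor"]),
    ]


# Placeholder values, NOT taken from the paper's table.
LE = {"bit": 1.5, "graypu": 1.5, "graygl": 2.0, "xor": 1.5}
PD = {"bit": 1.0, "gray": 2.0, "xor": 3.0}

if __name__ == "__main__":
    for w in (0.0, 0.5):
        print(w, path_delay(ripple_carry_path(4, LE, PD), w=w))
```

Other architectures would differ only in the list of rows returned, which is why a row-based encoding is convenient for comparing many adders.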
Note that in the Brent-Kung and Han-Carlson architectures there is never more than one black or gray cell per pair of bits in any given row. If pipelining is not required, the adder may be condensed to half the width, shortening the lateral wires as indicated in the table.

Fig. Adder architectures: (a) Ripple Carry, (b) Carry Increment, (c) Brent-Kung, (d) Sklansky, (e) Kogge-Stone, (f) Han-Carlson, (g) Knowles [,,,], (h) Ladner-Fischer.

In most of the adder architectures, the stage effort is fairly constant throughout the adder if wire capacitance is neglected. We will see that this means uniform gate sizes may be used throughout with very little loss in performance. In the Sklansky graph, the fanout increases exponentially along the critical path. This leads to very poor performance unless cells have greater drive. One means to provide greater drive is to use larger gates in specific locations, but this increases the number of cells to design and verify and leads to irregular layout.

When trains must climb a steep grade with a heavy load, multiple locomotives are linked together. The extra locomotives are called helpers. In the Sklansky graph, multiple cells may be linked together to provide more current to drive the large fanouts and long wires. Four such adders with helpers are shown in Fig. Each is based on the Sklansky architecture. They differ in the number of columns required and the space available for buffers in pipelined adders.
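The exponential fanout growth in the Sklansky graph can be checked by enumerating its cells. The following is a minimal sketch, assuming the standard Sklansky construction in which, at each level, every bit in the upper half of a block takes its lower input from the top bit of the lower half of that block. The function name is illustrative, and only group cells are counted (buffers that pass a signal straight down are ignored).

```python
def sklansky_max_fanouts(n_bits):
    """Largest number of group cells driven by one node at each tree level."""
    if n_bits < 2 or n_bits & (n_bits - 1):
        raise ValueError("n_bits must be a power of two, at least 2")
    levels = n_bits.bit_length() - 1
    result = []
    for level in range(levels):
        span = 1 << level  # size of the lower half-block at this level
        fanout = {}
        for i in range(n_bits):
            if (i >> level) & 1:  # bit i sits in the upper half of its block
                src = (i // span) * span - 1  # top bit of the lower half
                fanout[src] = fanout.get(src, 0) + 1
        result.append(max(fanout.values()))
    return result


if __name__ == "__main__":
    print(sklansky_max_fanouts(16))
```

Each level doubles the number of cells hanging off a single node, which is the load that the larger gates or the helper cells described above are meant to drive.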
Table. Adder critical paths

Architecture | F | P | wire | repeats | notes
Ripple Carry | LEbit | n/a | n/a | | input -> bit
 | LEgraypu | PDbit | | | bit -> gray pu
 | LEgraygl+LExor | PDgray | | k = N- | gray -> gray gl + xor
Carry Increment | LEbit | n/a | n/a | | input -> bit
 | LEgraygl+LExor | PDbit | | | bit -> gray gl and xor
 | k LEgraygl+LExor | PDgray | k | k =.. ~ N | gray -> many gray gl and xor
 | LExor | PDgray | | | gray -> xor
Brent-Kung | LEbit | n/a | n/a | | input -> bit
 | LEgraypu | PDbit | | | bit -> gray pu
 | LEgraygl+LEbuf | PDgray | k- | k = M - | gray -> gray gl + buf
 | LEgraygl+LEbuf | PDbuf | M- | | buf -> gray gl + buf
 | LEgraygl+LEbuf | PDgray | M-k- | k = M - | gray -> gray gl + buf
 | LExor | PDgray | | | gray -> xor
Ladner-Fischer | LEbit | n/a | n/a | | input -> bit
 | LEgraypu | PDbit | | | bit -> gray pu
 | LEgraygl+LEbuf | PDgray | k- | k = M - | gray -> gray gl + buf
 | LEgraygl+LEbuf | PDgray | M- | | gray -> gray gl + buf
 | LEgraygl+LEbuf | PDgray | M-k- | k = M - | gray -> gray gl + buf
 | LExor | PDgray | | | gray -> xor
Sklansky | LEbit | n/a | n/a | | input -> bit
 | LEgraypu | PDbit | | | bit -> gray pu
 | k LEgraygl+LEbuf | PDgray | k | k = M - | gray -> many gray gl + buf (-> k )
 | LExor + LEgraygl | PDgray | | | gray -> xor + cout
Kogge-Stone | LEbit | n/a | n/a | | input -> bit N/
 | LEblackpl + LEblackpu | PDbit | | | bit N/ -> black pu and pl
 | LEblackpl + LEblackpu | PDblackp | k | k = M - | black p -> black pu and pl
 | LEgraypu | PDblackp | M- | | black p -> gray pu
 | LExor | PDgray | | | gray -> xor
Han-Carlson | LEbit | n/a | n/a | | input -> bit N/
 | LEblackpl + LEblackpu | PDbit | | | bit N/ -> black pu and pl
 | LEblackpl + LEblackpu | PDblackp | k- | k = M - | black p -> black pu and pl
 | LEgraypu | PDblackp | M- | | black p -> gray pu
 | LExor | PDgray | | | gray -> xor
Knowles [,,,] | LEbit | n/a | n/a | | input -> bit N/
 | LEblackpl + LEblackpu | PDbit | | | bit N/ -> black pu and pl
 | LEblackpl + LEblackpu | PDblackp | k | k = M - | black p -> black pu and pl
 | LEblackpl + LEgraypu | PDblackp | M- | | black p -> gray pu and black pl
 | LEgraygl+LEbuf | PDgray | M- | | gray -> gray gl and buf
 | LExor | PDgray | | | gray -> xor

Fig. Helper adders: (a) Helper a, (b) Helper b, (c) Helper ., (d) Helper

IV. LOGICAL EFFORT DELAY MODEL

The method of Logical Effort provides a simple way to determine a lower bound on critical path delay in circuits with negligible wire capacitance. If the path has M stages, a path effort of F, and a parasitic delay of PD, the delay (in τ) achieved with best transistor sizes is

$$D = D_F + PD = M F^{1/M} + PD \qquad ()$$

where D is measured in units of τ, the delay of an ideal inverter with no parasitic capacitance driving an identical inverter. Delay is often normalized to that of a fanout-of-4 inverter with the conversion FO4. In a nm process, FO4 ps.

To illustrate the delay model, consider an N=-bit ripple carry adder. According to the data from the previous sections the least delay is given below. Note that the inverting design is faster because of the extra inverters in the noninverting CMOS version.

Inverting CMOS:
  M = N+ =
  F = (LEbit)(LEgraypu)(LEgraygl + LExor) = (/)(/)(/ + /) =
  D_F = () / = .
  PD = PDbit + PDgray + PDxor = (/) + (./) + (/ + /) = .
  D = . + . = . = . FO4

Noninverting CMOS:
  M = (N+) =
  F = (LEbit)(LEgraypu)(LEgraygl + LExor) = (/)(/)(/ + /) =
  D_F = () / = .
  PD = PDbit + PDgray + PDxor = (/ +) + (/ + ) + (/ + /) =
  D = . + = . = . FO4

In general, achieving least delay requires using different transistor sizes in each gate (although this delay model has assumed that all transistors in a branch scale uniformly). A regular layout with consistent transistor sizes in each type of cell is easier to build but may sacrifice performance. Consider designing all cells to have an arbitrary unit drive (i.e. output conductance). Define an inverter with unit drive to have unit input capacitance. For circuits with a single stage per cell (e.g. inverting static CMOS), the path effort delay is simply the sum of the effort delays of each stage:

$$D_F = \sum_{i=1}^{M} f_i \qquad ()$$

The total delay is still the sum of the path effort and parasitic delays. In a -bit ripple carry adder built from inverting static CMOS gates the delay is
Inverting CMOS:
  D_F = LEbit + LEgraypu + (LEgraygl + LExor) = / + / + (/ + /) =
  D = + . = . = . FO4

In a circuit with two stages per cell (e.g. noninverting static CMOS or domino), let us design the first stage to have unit drive. Choose the size of the second stage for least delay. If the path has C = M/2 cells and the effort of the i-th cell is $F_i$, each cell is a two-stage path whose least effort delay is $2\sqrt{F_i}$, so the path effort delay is

$$D_F = \sum_{i=1}^{C} 2\sqrt{F_i} \qquad ()$$

In a -bit ripple carry adder built from noninverting static CMOS gates the delay is

Noninverting CMOS:
  D_F = 2√LEbit + 2√LEgraypu + 2√(LEgraygl + LExor) = /+ /+ /+ / = .τ
  D = . + = . = . FO4

These delays are only slightly slower than ideal, justifying the use of a regular layout. The two-stage cell delay estimate is optimistic because in a regular design the second stage size will be fixed for each cell. However, the results from the single-stage cell estimate suggest the penalty is not large.

Horizontal wires add capacitance to the load of each stage. Let the wire capacitance be w units per column spanned. w depends on the width of each column, the width and spacing between wires, and the size of a unit transistor; in a trial layout in a nm process, w .. While there is no closed-form solution for the minimum-delay problem with wire capacitance, the delay assuming fixed cell sizes is readily calculated by adding the wire capacitance to the stage effort f or F in EQ () or ().

V. RESULTS

The adder delays were evaluated using a MATLAB script. Table lists delay (in FO4 inverter delays) for various adder architectures and widths assuming no wire capacitance and inverting static CMOS cells. It compares the delay achieved using best transistor sizes with the delay using uniform cell sizes. Observe that the penalty for uniform cell sizes is small in all cases except carry increment and Sklansky (where the fanouts vary wildly from one stage to another). This justifies using uniform cell sizes for most adders and employing helpers on the Sklansky architecture to drive the high fanouts. The remaining results are based on uniform cell sizes.
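The comparison between best sizing and uniform cell sizes can be reproduced in miniature. The sketch below is written in Python rather than MATLAB and is not the script used for the tables. It assumes the lower bound D = M F^(1/M) + PD, a plain sum of stage efforts for single-stage cells, and 2√F per cell for two-stage cells; function names and the stage efforts are made up for illustration, and wire capacitance is modeled simply as an extra term added to each stage effort.

```python
import math


def min_delay(path_effort, n_stages, parasitic):
    """Lower bound with best transistor sizes: M * F**(1/M) + PD."""
    return n_stages * path_effort ** (1.0 / n_stages) + parasitic


def uniform_delay_single_stage(stage_efforts, parasitic, wire_loads=None):
    """Unit-drive cells, one stage per cell: sum of (f_i + wire_i) plus PD."""
    wire_loads = wire_loads or [0.0] * len(stage_efforts)
    return sum(f + c for f, c in zip(stage_efforts, wire_loads)) + parasitic


def uniform_delay_two_stage(cell_efforts, parasitic, wire_loads=None):
    """Two stages per cell, second stage sized for least delay: 2*sqrt(F_i)."""
    wire_loads = wire_loads or [0.0] * len(cell_efforts)
    return sum(
        2.0 * math.sqrt(f + c) for f, c in zip(cell_efforts, wire_loads)
    ) + parasitic


if __name__ == "__main__":
    # Hypothetical stage efforts: one path with constant effort and one whose
    # effort doubles at every stage, as in a Sklansky-like fanout pattern.
    for name, efforts in (("flat", [2.0] * 4), ("steep", [1.0, 2.0, 4.0, 8.0])):
        best = min_delay(math.prod(efforts), len(efforts), 0.0)
        uniform = uniform_delay_single_stage(efforts, 0.0)
        print(name, round(best, 2), round(uniform, 2))
```

When every stage has the same effort the two figures coincide, and the gap opens only when efforts vary strongly from stage to stage, which matches the observation that uniform sizing costs little except in the carry increment and Sklansky adders.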
Table evaluates the effect of adder size by listing the delay of inverting static CMOS and footed domino adders assuming wiring capacitance w=.. Table evaluates the effect of circuit family, again assuming w=.. Table evaluates the impact of wire capacitance on inverting static CMOS adders. The Kogge-Stone, Han-Carlson, and Knowles adders require a large number of parallel wiring tracks for wide adders. This generally entails packing the wires close together, increasing the coupling capacitance on each wire. Huang and Ercegovac [] found this nearly doubles the wire capacitance; therefore these architectures may be evaluated using the w=. column of Table compared against the w=. column for adders with fewer wires.

The critical paths of most architectures (excluding Kogge-Stone, Han-Carlson, and Knowles) pass through a series of gray cell lower generate inputs. These adders may be sped up with asymmetric gray cells that reduce the logical effort LEgraygl at the expense of the other inputs []. This provides on average % speedup on the footed domino circuits, but almost none on the static CMOS circuits where noncritical transistors must be enlarged to preserve unit drive and thus increase parasitic delay.

VI. CONCLUSIONS

The logical effort model facilitates rapid comparison of a wide variety of adder architectures using multiple circuit families while accounting for the costs of fanout and interconnect. The Sklansky architecture is slowed by its high fanout along the critical path. This may be addressed at the expense of regularity by using larger gates along the path. The helper architectures proposed in this paper gang together multiple cells to drive the high fanout nodes while maintaining regularity. Regular designs with unit drive work well in architectures with relatively constant stage efforts, i.e. all except Sklansky and carry increment. In the absence of wiring capacitance, the Kogge-Stone adder is fastest because of its low number of stages and low fanout. When interconnect is considered, the Han-Carlson and helper adders become most attractive.
Han-Carlson requires only half the number of columns, while helper adders are slightly faster at driving the long wires, especially when coupling capacitance is considered.

Fast static CMOS adders have a delay of about,,., and FO4 for,,, and -bit widths, respectively. Most adders have a relatively low stage effort so the footed domino designs are only about % faster than the inverting static CMOS architectures because the high drive capability of domino is not fully exploited. This supports the use of higher-valency [] domino designs. Asymmetric domino gates achieve another % speedup. Inverting static CMOS gates are also slightly faster than their noninverting counterparts except where high fanout capability is needed; however, the difference is much smaller than a method of counting logic levels would predict.

The delays estimated from logical effort are in good agreement with the SPICE results of [], [], and []. However, the best -bit footless domino adder delays of -
FO4 are still distinctly longer than the FO4 delays achieved by the Naffziger domino Ling adder []. The differences may be attributed to the fact that velocity saturation makes tall domino gates slightly faster than simple logical effort models predict, the use of valency- cells and asymmetric gates biased to favor the critical path, and the logic level saved with the Ling algorithm. The fraction of the delay attributed to wires is important but significantly less than in [] because this study assumed layouts with larger input transistors and a narrower column pitch to reduce the impact of wire capacitance.

REFERENCES

A. Beaumont-Smith and C. Lim, "Parallel prefix adder design," Proc. th IEEE Symp. Comp. Arith., pp. -, June.
O. Bedrij, "Carry-select adder," IRE Trans. Electronic Computers, vol. EC-, June, pp. -.
R. Brent and H. Kung, "A regular layout for parallel adders," IEEE Trans. Computers, vol. C-, no., pp. -, March.
N. Burgess, "Accelerated carry-skip adders with low hardware cost," Proc. th Asilomar Conf. Signals, Systems, and Computers, vol., pp. -,.
H. Dao and V. Oklobdzija, "Application of logical effort on delay analysis of -bit static carry-lookahead adder," Proc. th Asilomar Conf. Signals, Systems, and Computers, vol., pp. -,.
H. Dao and V. Oklobdzija, "Application of logical effort techniques for speed optimization and analysis of representative adders," Proc. th Asilomar Conf. Signals, Systems, and Computers, vol., pp. -,.
T. Han and D. Carlson, "Fast area-efficient VLSI adders," Proc. th Symp. Comp. Arith., pp. -, Sept..
Z. Huang and M. Ercegovac, "Effect of wire delay on the design of prefix adders in deep submicron technology," Proc. th Asilomar Conf. Signals, Systems, and Computers, vol., pp. -,.
S. Knowles, "A family of adders," Proc. th IEEE Symp. Comp. Arch., reprinted with corrections in Proc. th IEEE Symp. Comp. Arith., pp. -, June.
P. Kogge and H. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence relations," IEEE Trans. Computers, vol. C-, no., pp. -, Aug..
R. Ladner and M. Fischer, "Parallel prefix computation," J.
ACM, vol., no., pp. -, Oct..
M. Lehman and N. Burla, "Skip techniques for high-speed carry propagation in binary arithmetic units," IRE Trans. Electron Computers, EC-, Dec., pp. -.
S. Naffziger, "A subnanosecond . m b adder design," Intl. Solid-State Circuits Conf.,, pp. -.
J. Sklansky, "Conditional-sum addition logic," IRE Trans. Electronic Computing, vol. EC-, June, pp. -.
I. Sutherland, R. Sproull, and D. Harris, Logical Effort, San Francisco: Morgan Kaufmann Publishers,.
A. Tyagi, "A reduced-area scheme for carry-select adders," IEEE Trans. Computers, vol., no., pp. -, Oct..
N. Weste and D. Harris, CMOS VLSI Design, Addison-Wesley,.
R. Zimmermann, "Non-heuristic optimization and synthesis of parallel-prefix adders," Proc. Intl. Workshop on Logic and Architecture Synthesis, pp. -, Grenoble, France, Dec..

Table. Adder delays: w=; inverting static CMOS
Columns: Minimum Delay (N =, N =, N =, N =) | Uniform Cell Size Delay (N =, N =, N =, N =)
Rows: Ripple, Increment, B-K, L-F, Sklansky, K-S, H-C, Knowles, Helper a, Helper b, Helper ., Helper

Table. Adder delays: w=.; uniform cell size
Columns: Inverting Static CMOS (N =, N =, N =, N =) | Footed Domino (N =, N =, N =, N =)
Rows: Ripple, Increment, B-K, L-F, Sklansky, K-S, H-C, Knowles, Helper a, Helper b, Helper ., Helper

Table. Adder delays: w=.; uniform cell size
Column key: () Inverting CMOS, () Noninverting CMOS, () Footed Domino, () Footless Domino
Columns: N= (), (), (), () | N= (), (), (), ()
Rows: Ripple, Increment, B-K, L-F, Sklansky, K-S, H-C, Knowles, Helper a, Helper b, Helper ., Helper

Table. Adder delays: inverting static CMOS; uniform cell size
Columns: N= (w = /, w = /, w = /, w =) | N = (w = /, w = /, w = /, w =)
Rows: Ripple, Increment, B-K, L-F, Sklansky, K-S, H-C, Knowles, Helper a, Helper b, Helper ., Helper