Logical Effort of Carry Propagate Adders

David Harris and Ivan Sutherland
Harvey Mudd College / Sun Microsystems Laboratories
E. Twelfth St., Claremont, CA
David_Harris@hmc.edu / Ivan.Sutherland@sun.com

Abstract - A wide assortment of carry propagate adders offer varying area-delay tradeoffs. Wiring and choice of circuit family also affect the size and performance. This paper uses the method of Logical Effort to characterize the effects of architecture, circuit family, and wire capacitance on adder delay. Domino logic offers about a % speedup on most valency- adders. Although Kogge-Stone adders are fastest in the absence of wire, other architectures such as variants on the Sklansky adder offer regular layouts and better delay in the presence of wiring capacitance.

I. INTRODUCTION

Fast adders are widely used in CMOS circuit design. The literature describes many adders including ripple carry, carry lookahead, carry select [], carry skip [], carry increment [, ], Sklansky (conditional sum) [], Brent-Kung [], Kogge-Stone [], Ladner-Fischer [], Han-Carlson [], and Knowles []. Each architecture offers different tradeoffs between delay, area, and wiring complexity. Analytical delay models help designers evaluate these tradeoffs, but simply counting logic levels is inadequate because circuit delay also depends on fanout and wire capacitance. Huang and Ercegovac [] used an RC delay model to evaluate the effect of architecture and wiring capacitance on the Sklansky, Kogge-Stone, and Knowles adder architectures.

The method of Logical Effort [] builds on the RC delay model to offer a convenient shorthand for understanding the effects of fanout and gate sizing on delay. Dao and Oklobdzija [, ] applied this method to a few adders and concluded that logical effort predicted absolute delays within -% of SPICE. This paper applies logical effort to understand the delay of eight different adder architectures that can be expressed as prefix computations according to the notation of [].
The results show how adder delay depends on the number of inputs, the adder architectures, the cost of interconnect, and the circuit style. The model shows that most adder architectures can use uniform gate sizes to achieve regular layout with negligible performance loss. An exception is the Sklansky architecture that has highly irregular fanouts. This leads to a proposal for helper gates to construct very fast adders with regular layouts and low wiring cost.

II. LOGICAL EFFORT OF CIRCUIT BUILDING BLOCKS

The three basic building blocks for an adder are the bitwise Propagate/Generate (PG) cells, the group PG cells, and the sum XORs. High performance datapath adders often build these cells from domino gates while static CMOS is preferable when design simplicity and power consumption take precedence over utmost performance. Static CMOS bitwise gates will compute generate as $G_i = A_i \cdot B_i$ and propagate as $P_i = A_i + B_i$. The sum is computed as $S_i = (A_i \oplus B_i) \oplus G_{i-1:0}$. Domino designs require monotonic inputs to the sum XOR. This is best done by calculating bitwise and group kill signals (K) and using XOR for propagate so that P, G, and K are 1-of-3 hot.

Define the group PG cell input coming from bits i:k as the upper input and that from k-1:j as the lower input. There are two types of group PG cells. Following the notation of [], we call the cells black cells and gray cells. Black cells compute both $G_{i:j}$ and $P_{i:j}$ as defined in EQ (). Gray cells compute only $G_{i:j}$. Black cells are required when the cell output drives the upper input of another group PG cell. The simpler gray cell may be used when the output drives only lower inputs or sum logic.

Consider four circuit styles: noninverting static CMOS, inverting static CMOS, footless domino, and footed domino. Fig. shows the basic cell designs. Inverting static CMOS gates consist of a single stage of logic for each cell (except that the final XOR requires an input inverter). Alternating stages use alternating polarities of inputs and outputs. Black cells contain both the group G and P gates while gray cells have only the G gate.
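The cell definitions above can be sketched in a few lines of Python. This is an illustrative model only, not the authors' code: it assumes the usual prefix recurrence $G_{i:j} = G_{i:k} + P_{i:k} G_{k-1:j}$ and $P_{i:j} = P_{i:k} P_{k-1:j}$ for the group cells (the paper's own equation did not survive in this copy), assumes no carry-in, and all function names are hypothetical.

```python
def bitwise_pg(a, b, n):
    """Bitwise generate/propagate lists for n-bit operands, LSB first."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    g = [x & y for x, y in zip(a_bits, b_bits)]  # G_i = A_i AND B_i
    p = [x | y for x, y in zip(a_bits, b_bits)]  # P_i = A_i OR B_i (static CMOS form)
    return a_bits, b_bits, g, p


def black_cell(g_up, p_up, g_lo, p_lo):
    """Combine upper (i:k) and lower (k-1:j) groups into (G_i:j, P_i:j)."""
    return g_up | (p_up & g_lo), p_up & p_lo


def gray_cell(g_up, p_up, g_lo):
    """Same generate logic as a black cell, without the group propagate output."""
    return g_up | (p_up & g_lo)


def ripple_add(a, b, n):
    """n-bit addition built from a chain of gray cells; returns (sum, carry_out)."""
    a_bits, b_bits, g, p = bitwise_pg(a, b, n)
    carry = 0  # assumption: no carry-in
    total = 0
    for i in range(n):
        # S_i = (A_i XOR B_i) XOR G_{i-1:0}
        total |= ((a_bits[i] ^ b_bits[i]) ^ carry) << i
        # G_{i:0} from the bitwise signals (upper) and the incoming carry (lower)
        carry = gray_cell(g[i], p[i], carry)
    return total, carry
```

A gray cell suffices along the ripple chain because each output feeds only a lower generate input and a sum XOR, which matches the rule stated above for when the group propagate output can be omitted.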
Noninverting static CMOS gates add an output inverter to the bitwise and group static gates. Therefore, only the AND-OR and AND functions are required for group G and P, respectively. Footless domino gates computing 1-of-3 hot P, G, and K signals are shown in the second column. Each consists of a dynamic gate followed by a HI-skew inverter. Keepers and secondary precharge transistors are not shown. The group logic is shown for a black cell; a gray cell omits the P output. In the domino design, $K_{i:0} = \overline{G_{i:0}}$ so monotonic true and complementary versions of the carry signals are available at each final XOR. Footed domino gates are identical except for an extra series clocked evaluation transistor and greater transistor widths to compensate. Transistors are annotated with widths measured in arbitrary units so that each pulldown stack has unit effective resistance.

Fig. Adder circuit building blocks. Rows: Bitwise, Group, Sum XOR. Columns include Inverting Static CMOS and Footless Domino.

Table lists the logical effort and parasitic delay of each cell input for each circuit family. The logical effort LE is the ratio of the input capacitance of the gate input to the input capacitance ( units) of an inverter with the same unit effective resistance. The parasitic delay PD is estimated by counting the total transistor width on the output node, assuming diffusion and gate capacitance are approximately equal. In domino and noninverting static CMOS circuits, the output inverter also contributes parasitic delay. LE and PD are used in place of the usual symbols g and p to avoid confusion with generate and propagate.

---//$. IEEE

Table. Logical effort and parasitic delay of adder circuit blocks. Columns: Cell, Term, Noninverting CMOS, Inverting CMOS, Footed Domino, Footless Domino. Rows: Bitwise (LEbit, PDbit); Black Cell (LEblackgu, LEblackgl, LEblackpu, LEblackpl, PDblackg, PDblackp); Gray Cell (LEgraygu, LEgraygl, LEgraypu, PDgray); Buffer (LEbuf); Sum XOR (LExor, PDxor).

Notice that the black cell has four inputs: $G_{i:k}$, $G_{k-1:j}$, $P_{i:k}$, and $P_{k-1:j}$. These are denoted as the upper and lower generate and propagate signals, gu, gl, pu, and pl, respectively, and each has a different logical effort. For inverting static CMOS circuits, the logical effort and parasitic delay are the average of the two polarities. If all cell sizes are chosen to provide unit drive as will be done in Section, this gives the correct delay through the path. If some cell sizes are selected for minimum delay, the logical efforts should be the geometric means of the efforts of the two polarities. In this case, the average and geometric mean are nearly identical, so the distinction is unimportant.

Some paths through the static XOR gate involve only a single AOI stage while others also involve the inverter. A conservative estimate calculates the logical effort for the single stage path based on the units of input capacitance on the G input. The parasitic delay is largest for the two-stage path, consisting of / for the inverter to drive its own diffusion parasitics and gate capacitance of the second stage plus / for the diffusion parasitics on the second stage.

In certain cases, buffers reduce the capacitance presented by noncritical forks of the circuit. Assume these buffers have half the drive (twice the resistance) of an ordinary gate and hence half the input capacitance. For the purpose of branching, the buffers therefore contribute only half the capacitance of a gate with comparable logical effort.

III. ADDER ARCHITECTURES

Adders are distinguished by the arrangement of cells in the group PG logic. Fig. shows eight such architectures for N=. The upper box contains the bitwise PG logic and the lower box contains the sum logic. In the middle, the prefix tree is built from black cells, gray cells, and white buffers. The vertical axis indicates logic level and the critical path is indicated with a heavy line. For example, the ripple carry adder in Fig. a is slow for long additions because the critical path propagates through N- gray cells.

The critical path of each adder is described in more detail in Table. Each row of the table corresponds to the delay of a cell. The delay has three components: an effort delay F based on the size of the load, a parasitic delay P based on the cell itself, and wire delay based on the length of the horizontal wires between cells (measured in columns traversed). For example, the ripple carry adder path begins with inputs coming from a previous unit; these inputs see loading from the bitwise PG cells (LEbit) but their parasitic delay is not part of the adder delay. Then the P signal is computed and drives the upper propagate input of a gray cell. The generate output of this cell in turn drives the lower generate input of the next cell as well as the associated sum XOR. This repeats N- times.
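To make the contrast between a long serial critical path and a logarithmic prefix tree concrete, the following Python sketch computes the group generates $G_{i:0}$ two ways and reports the number of group-cell levels on the longest path. It is a behavioral illustration under the assumption of no carry-in. The function names are hypothetical, and the Kogge-Stone recurrence used is the standard textbook form rather than anything taken from the lost figure.

```python
def ripple_carries(g, p):
    """Group generates G_{i:0} via a serial chain; g and p are LSB-first lists."""
    carries = []
    carry = 0  # assumption: no carry-in
    for gi, pi in zip(g, p):
        carry = gi | (pi & carry)
        carries.append(carry)
    # Bit 0 needs no group cell, so the chain holds len(g) - 1 gray cells.
    return carries, max(len(g) - 1, 0)


def kogge_stone_carries(g, p):
    """Group generates G_{i:0} via a Kogge-Stone style prefix tree."""
    n = len(g)
    G, P = list(g), list(p)
    dist, levels = 1, 0
    while dist < n:
        new_G, new_P = list(G), list(P)
        for i in range(dist, n):
            # Combine the group ending at bit i with the group ending at bit i - dist.
            new_G[i] = G[i] | (P[i] & G[i - dist])
            new_P[i] = P[i] & P[i - dist]
        G, P = new_G, new_P
        dist *= 2
        levels += 1
    return G, levels
```

Both functions return the same carries; only the depth differs (N-1 cells in series against about log2 N levels). Depth alone does not determine delay, which is why the table of critical paths also tracks load and wire length for every stage.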
Note that the final gray cell must drive both the S XOR and the Cout gray cell, so the load is the same as on the other gray cells. Finally, the sum XOR contributes a parasitic delay. The effort delay driving the next unit is not counted because an effort delay was already allocated on the primary inputs.

Several simplifying assumptions have been made:
- All inputs arrive at the same time with equal drive.
- Only horizontal wires are counted in the wire load. Vertical wires are assumed to be short enough to neglect (or lump into the parasitic gate delay).
- The $A \oplus B$ term used to compute the final sum is not explicitly shown and may use buffered versions of the inputs to contribute negligible loading.
- Wires are assumed to be short enough that only capacitance must be considered, not wire RC delay. This assumption is supported by [].

Note that in the Brent-Kung and Han-Carlson architectures there is never more than one black or gray cell per pair of bits in any given row. If pipelining is not required, the adder may be condensed to half the width, shortening the lateral wires as indicated in the table.

Fig. Adder architectures: (a) Ripple Carry, (b) Carry Increment, (c) Brent-Kung, (d) Sklansky, (e) Kogge-Stone, (f) Han-Carlson, (g) Knowles [,,,], (h) Ladner-Fischer.

In most of the adder architectures, the stage effort is fairly constant throughout the adder if wire capacitance is neglected. We will see that this means uniform gate sizes may be used throughout with very little loss in performance. In the Sklansky graph, the fanout increases exponentially along the critical path. This leads to very poor performance unless cells have greater drive. One means to provide greater drive is to use larger gates in specific locations, but this increases the number of cells to design and verify and leads to irregular layout.

When trains must climb a steep grade with a heavy load, multiple locomotives are linked together. The extra locomotives are called helpers. In the Sklansky graph, multiple cells may be linked together to provide more current to drive the large fanouts and long wires. Fig. shows four such adders with helpers. Each is based on the Sklansky architecture. They differ in the number of columns required and the space available for buffers in pipelined adders.
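The exponential fanout growth in the Sklansky graph can be checked with a short Python sketch. It counts, for each logic level, the largest number of group cells whose lower input is driven from a single column. Loads from sum XORs, upper inputs, and wires are deliberately ignored, and the indexing scheme and function names are assumptions chosen for illustration.

```python
from collections import Counter


def sklansky_fanouts(n):
    """Largest lower-input fanout at each level of an n-bit Sklansky tree."""
    fanouts = []
    k = 0
    while (1 << k) < n:
        loads = Counter()
        for i in range(n):
            if (i >> k) & 1:
                # Column i combines with the top column of the block just below it.
                src = ((i >> k) << k) - 1
                loads[src] += 1
        fanouts.append(max(loads.values()))
        k += 1
    return fanouts


def kogge_stone_fanouts(n):
    """Largest lower-input fanout at each level of an n-bit Kogge-Stone tree."""
    fanouts = []
    dist = 1
    while dist < n:
        # Column i takes its lower input from column i - dist, so sources never repeat.
        loads = Counter(i - dist for i in range(dist, n))
        fanouts.append(max(loads.values()))
        dist *= 2
    return fanouts
```

For a 16-bit adder the Sklansky count doubles at every level while the Kogge-Stone count stays at one, which is the irregularity that larger gates or ganged helper cells are meant to absorb.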

Table. Adder critical paths. Columns: F, P, wire, repeats, notes. Row groups: Ripple Carry, Carry Increment, Brent-Kung, Ladner-Fischer, Sklansky, Kogge-Stone, Han-Carlson, Knowles [,,,].

Fig. Helper adders: (a) Helper a, (b) Helper b, (c) Helper ., (d) Helper.

IV. LOGICAL EFFORT DELAY MODEL

The method of Logical Effort provides a simple method for determining a lower bound on critical path delay in circuits with negligible wire capacitance. If the path has M stages, a path effort of F, and a parasitic delay of PD, the delay (in τ) achieved with best transistor sizes is

$$D = D_F + PD = M F^{1/M} + PD \qquad ()$$

where D is measured in units of τ, the delay of an ideal inverter with no parasitic capacitance driving an identical inverter. Delay is often normalized to that of a fanout-of-4 inverter with the conversion FO4. In a nm process, FO4 ps.

To illustrate the delay model, consider an N=-bit ripple carry adder. According to the data from the previous sections the least delay is given below. Note that the inverting design is faster because of the extra inverters in the noninverting CMOS version.

Inverting CMOS:
M = N+ =
F = (LEbit)(LEgraypu)(LEgraygl + LExor) = (/)(/)(/ + /) =
D_F = () / = .
PD = PDbit + PDgray + PDxor = (/) + (./) + (/ + /) = .
D = . + . = . = . FO4

Noninverting CMOS:
M = (N+) =
F = (LEbit)(LEgraypu)(LEgraygl + LExor) = (/)(/)(/ + /) =
D_F = () / = .
PD = PDbit + PDgray + PDxor = (/ +) + (/ + ) + (/ + /) =
D = . + = . = . FO4

In general, achieving least delay requires using different transistor sizes in each gate (although this delay model has assumed that all transistors in a branch scale uniformly). A regular layout with consistent transistor sizes in each type of cell is easier to build but may sacrifice performance. Consider designing all cells to have an arbitrary unit drive (i.e. output conductance). Define an inverter with unit drive to have unit input capacitance. For circuits with a single stage per cell (e.g. inverting static CMOS), the path effort delay is simply the sum of the effort delays of each stage:

$$D_F = \sum_{i=1}^{M} f_i \qquad ()$$

The total delay is still the sum of the path effort and parasitic delays. In a -bit ripple carry adder built from inverting static CMOS gates the delay is

Inverting CMOS:
D_F = LEbit + LEgraypu + (LEgraygl + LExor) = / + / + (/ + /) =
D = + . = . = . FO4

In a circuit with two stages per cell (e.g. noninverting static CMOS or domino), let us design the first stage to have unit drive. Choose the size of the second stage for least delay. If the path has C = M/2 cells and the effort of the ith cell is $F_i$, the path effort delay is

$$D_F = \sum_{i=1}^{C} 2\sqrt{F_i} \qquad ()$$

In a -bit ripple carry adder built from noninverting static CMOS gates the delay is

Noninverting CMOS:
D_F = LEbit + LEgraypu + LEgraygl + LExor = / + / + / + / = .τ
D = . + = . = . FO4

These delays are only slightly slower than ideal, justifying the use of a regular layout. The two-stage cell delay estimate is optimistic because in a regular design the second stage size will be fixed for each cell. However, the results from the single-stage cell estimate suggest the penalty is not large.

Horizontal wires add capacitance to the load of each stage. Let the wire capacitance be w units per column spanned. w depends on the width of each column, the width and spacing between wires, and the size of a unit transistor; in a trial layout in a nm process, w .. While there is no closed-form solution for the minimum-delay problem with wire capacitance, the delay assuming fixed cell sizes is readily calculated by adding the wire capacitance to the stage effort f or F in EQ () or ().

V. RESULTS

The adder delays were evaluated using a MATLAB script. Table lists delay (in FO4 inverter delays) for various adder architectures and widths assuming no wire capacitance and inverting static CMOS cells. It compares the delay achieved using best transistor sizes with the delay using uniform cell sizes. Observe that the penalty for uniform cell sizes is small in all cases except carry increment and Sklansky (where the fanouts vary wildly from one stage to another). This justifies using uniform cell sizes for most adders and employing helpers on the Sklansky architecture to drive the high fanouts. The remaining results are based on uniform cell sizes.
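The three delay expressions of the logical effort delay model can be sketched in Python as follows. This is not the MATLAB script mentioned above. The function names are hypothetical, the numbers in the usage note are invented purely to exercise the formulas, and the two-stage form assumes each cell's effort is split equally over its two stages, giving $2\sqrt{F_i}$ per cell. All results are in units of τ.

```python
import math


def min_path_delay(path_effort, stages, parasitic):
    """Least delay with best sizing: D = M * F**(1/M) + PD."""
    return stages * path_effort ** (1.0 / stages) + parasitic


def uniform_single_stage_delay(stage_efforts, parasitic, wire_columns=None, w=0.0):
    """Unit-drive cells, one stage per cell: D = sum(f_i) + PD.

    Wire load is modelled by adding w units per column spanned to each
    stage effort, as described for fixed cell sizes.
    """
    if wire_columns is None:
        wire_columns = [0] * len(stage_efforts)
    effort = sum(f + w * cols for f, cols in zip(stage_efforts, wire_columns))
    return effort + parasitic


def uniform_two_stage_delay(cell_efforts, parasitic):
    """Two stages per cell, second stage sized for least delay: D = sum(2*sqrt(F_i)) + PD."""
    return sum(2.0 * math.sqrt(f) for f in cell_efforts) + parasitic
```

For instance, two stages with efforts 2 and 8 have a path effort of 16, so `min_path_delay(16.0, 2, 1.0)` is compared against `uniform_single_stage_delay([2.0, 8.0], 1.0)`. The uniform-size result is never smaller, and the gap closes when the stage efforts are nearly equal, which is the situation described for most of the architectures.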
Table evaluates the effect of adder size by listing the delay of inverting static CMOS and footed domino adders assuming wiring capacitance w=.. Table evaluates the impact of circuit family, again assuming w=.. Table evaluates the impact of wire capacitance on inverting static CMOS adders. The Kogge-Stone, Han-Carlson, and Knowles adders require a large number of parallel wiring tracks for wide adders. This generally entails packing the wires close together, increasing the coupling capacitance on each wire. Huang and Ercegovac [] found this nearly doubles the wire capacitance; therefore these architectures may be evaluated using the w=. column of Table compared against the w=. column for adders with fewer wires.

The critical paths of most architectures (excluding Kogge-Stone, Han-Carlson, and Knowles) pass through a series of gray cell lower generate inputs. These adders may be sped up with asymmetric gray cells that reduce the logical effort LEgraygl at the expense of the other inputs []. This provides on average % speedup on the footed domino circuits, but almost none on the static CMOS circuits where noncritical transistors must be enlarged to preserve unit drive and thus increase parasitic delay.

VI. CONCLUSIONS

The logical effort model facilitates rapid comparison of a wide variety of adder architectures using multiple circuit families while accounting for the costs of fanout and interconnect. The Sklansky architecture is slowed by its high fanout along the critical path. This may be addressed at the expense of regularity by using larger gates along the path. The helper architectures proposed in this paper gang together multiple cells to drive the high fanout nodes while maintaining regularity. Regular designs with unit drive work well in architectures with relatively constant stage efforts, i.e. all except Sklansky and carry increment.

In the absence of wiring capacitance, the Kogge-Stone adder is fastest because of its low number of stages and low fanout. When interconnect is considered, the Han-Carlson and helper adders become most attractive.
Han-Carlson requires only half the number of columns, while helper adders are slightly faster at driving the long wires, especially when coupling capacitance is considered. Fast static CMOS adders have a delay of about , , ., and FO4 for , , , and -bit widths, respectively.

Most adders have a relatively low stage effort so the footed domino designs are only about % faster than the inverting static CMOS architectures because the high drive capability of domino is not fully exploited. This supports the use of higher-valency [] domino designs. Asymmetric domino gates achieve another % speedup. Inverting static CMOS gates are also slightly faster than their noninverting counterparts except where high fanout capability is needed; however, the difference is much smaller than a method of counting logic levels would predict.

The delays estimated from logical effort are in good agreement with the SPICE results of [], [], and []. However, the best -bit footless domino adder delays of - FO4 are still distinctly longer than the FO4 delays achieved by the Naffziger domino Ling adder []. The differences may be attributed to the fact that velocity saturation makes tall domino gates slightly faster than simple logical effort models predict, the use of valency- cells and asymmetric gates biased to favor the critical path, and the logic level saved with the Ling algorithm. The fraction of the delay attributed to wires is important but significantly less than in [] because this study assumed layouts with larger input transistors and a narrower column pitch to reduce the impact of wire capacitance.

REFERENCES

A. Beaumont-Smith and C. Lim, Parallel prefix adder design, Proc. th IEEE Symp. Comp. Arith., pp. -, June.
O. Bedrij, Carry-select adder, IRE Trans. Electronic Computers, vol. EC-, June, pp. -.
R. Brent and H. Kung, A regular layout for parallel adders, IEEE Trans. Computers, vol. C-, no., pp. -, March.
N. Burgess, Accelerated carry-skip adders with low hardware cost, Proc. th Asilomar Conf. Signals, Systems, and Computers, vol., pp. -,.
H. Dao and V. Oklobdzija, Application of logical effort on delay analysis of -bit static carry-lookahead adder, Proc. th Asilomar Conf. Signals, Systems, and Computers, vol., pp. -,.
H. Dao and V. Oklobdzija, Application of logical effort techniques for speed optimization and analysis of representative adders, Proc. th Asilomar Conf. Signals, Systems, and Computers, vol., pp. -,.
T. Han and D. Carlson, Fast area-efficient VLSI adders, Proc. th Symp. Comp. Arith., pp. -, Sept.
Z. Huang and M. Ercegovac, Effect of wire delay on the design of prefix adders in deep submicron technology, Proc. th Asilomar Conf. Signals, Systems, and Computers, vol., pp. -,.
S. Knowles, A family of adders, Proc th IEEE Symp. Comp. Arch., reprinted with corrections in Proc. th IEEE Symp. Comp. Arith., pp. -, June.
P. Kogge and H. Stone, A parallel algorithm for the efficient solution of a general class of recurrence relations, IEEE Trans. Computers, vol. C-, no., pp. -, Aug.
R. Ladner and M. Fischer, Parallel prefix computation, J. ACM, vol., no., pp. -, Oct.
M. Lehman and N. Burla, Skip techniques for high-speed carry propagation in binary arithmetic units, IRE Trans. Electron Computers, EC-, Dec., pp. -.
S. Naffziger, A subnanosecond . m b adder design, Intl. Solid-State Circuits Conf.,, pp. -.
J. Sklansky, Conditional-sum addition logic, IRE Trans. Electronic Computing, vol. EC-, June, pp. -.
I. Sutherland, R. Sproull, and D. Harris, Logical Effort, San Francisco: Morgan Kaufmann Publishers,.
A. Tyagi, A reduced-area scheme for carry-select adders, IEEE Trans. Computers, vol., no., pp. -, Oct.
N. Weste and D. Harris, CMOS VLSI Design, Addison-Wesley,.
R. Zimmermann, Non-heuristic optimization and synthesis of parallel-prefix adders, Proc. Intl. Workshop on Logic and Architecture Synthesis, pp. -, Grenoble, France, Dec.

Table. Adder delays: w=; inverting static CMOS. Column groups: Minimum Delay and Uniform Cell Size Delay, each for four adder widths N. Rows: Ripple, Increment, B-K, L-F, Sklansky, K-S, H-C, Knowles, Helper a, Helper b, Helper ., Helper.

Table. Adder delays: w=.; uniform cell size. Column groups: Inverting Static CMOS and Footed Domino, each for four adder widths N. Rows: Ripple, Increment, B-K, L-F, Sklansky, K-S, H-C, Knowles, Helper a, Helper b, Helper ., Helper.

Table. Adder delays: w=.; uniform cell size. Column groups: two adder widths N, each with () Inverting CMOS, () Noninverting CMOS, () Footed Domino, () Footless Domino. Rows: Ripple, Increment, B-K, L-F, Sklansky, K-S, H-C, Knowles, Helper a, Helper b, Helper ., Helper.

Table. Adder delays: inverting static CMOS; uniform cell size. Column groups: two adder widths N, each with four values of w. Rows: Ripple, Increment, B-K, L-F, Sklansky, K-S, H-C, Knowles, Helper a, Helper b, Helper ., Helper.