Performance Comparison of VLSI Adders Using Logical Effort 1


Hoang Q. Dao and Vojin G. Oklobdzija

Advanced Computer System Engineering Laboratory, Department of Electrical and Computer Engineering, University of California, Davis, CA 95616
http://www.ece.ucdavis.edu/acsel
{hqdao,vojin}@ece.ucdavis.edu

Abstract. Application of logical effort to the transistor-level analysis of different 64-bit adder topologies is presented. The logical effort method is used to estimate delay and the impact of different adder topologies, and the validity of the results obtained with the logical effort methodology is evaluated. The tested adder topologies were Carry-Select, Han-Carlson, Kogge-Stone, Ling, and Carry-Lookahead adders. The quality of the obtained estimates was validated by circuit simulation using H-SPICE for a 1.8 V, 0.18 µm Fujitsu technology.

1 Introduction

Delay estimation is critical in the development of efficient VLSI algorithms [2]. Unfortunately, delay estimates are usually presented either in terms of gate delays or in terms of logic levels. Neither of these estimates allows us to properly evaluate different VLSI topologies. One such component, the VLSI adder, is critical in the design of high-performance processors. Counting gate delays is no longer adequate because gate delay depends on the gate type, the number of inputs (fan-in), the output load (fan-out), and the particular implementation. Further, a particular VLSI implementation can use static or dynamic CMOS, where the logic function is usually packed into complex logic blocks. Thus, the notion of a logic gate and its associated gate delay becomes artificial and misleading. In this analysis, we evaluate the use of the logical effort method not only for better delay estimation but also for the evaluation of different adder topologies and their impact on the design of VLSI adders.
The logical effort (LE) analysis [1] models the gate delay using the gate's characteristics and its loading, and compares the gate delay to τ, the delay of a Fan-Out of 1 (FO1) inverter. This latter delay is normally known for a given technology and can serve to estimate the speed. When a gate is loaded, its delay varies linearly with the output load expressed in terms of fan-outs. LE also accounts for the effect of circuit topology by including path branching in the model. Delay estimation using the LE method is quick and sufficiently accurate.

1 This work has been supported by SRC Research Grant No. 931.001, Fujitsu Laboratories of America and California MICRO 01-063.
B. Hochet et al. (Eds.): PATMOS 2002, LNCS 2451, pp. 25-34, 2002. Springer-Verlag Berlin Heidelberg 2002

In order to evaluate the efficiency and usefulness of LE, we have chosen several diverse adder topologies and compared the estimated delay with the one obtained via simulation. The adders chosen for this analysis were: the multiplexer-based adder (MXA), implemented as a static radix-2 64-bit adder with conditional sum in the final stage [3]; Han-Carlson, as static and dynamic radix-2 adders [5][6]; Kogge-Stone, as static and dynamic radix-2 and dynamic radix-4 adders [7][8]; Naffziger's implementation of Ling's adder [9] in a dynamic radix-4 topology [10]; and a Carry-Lookahead (CLA) adder implemented in a dynamic radix-4 topology [4].

The multiplexer-based adder (MXA) takes advantage of the simplicity and speed of the transmission-gate multiplexer implementation [3]. The sums are generated conditionally in groups of 4 bits. The carries to these groups are formed using radix-2 propagates and generates. The generate path is critical, passing through 9 stages that include a total of 7 multiplexers. Thus, the speed of the transmission-gate multiplexer is the dominant factor determining the speed of this adder.

The Han-Carlson and Kogge-Stone adders use a radix-2 structure similar to that of MXA. However, they combine the carries with the half-sum signals in order to obtain the final results. A direct CMOS implementation of the generate and propagate logic was used, allowing both static and dynamic gates. The Han-Carlson adder differs from the Kogge-Stone adder in that it does not create all the carries in the radix-2 structure. Instead, only even carries are created, and odd carries are generated from the even carries. Therefore, in terms of logic stages, Han-Carlson uses one extra stage, while the Kogge-Stone adder is equivalent to MXA in the number of stages.
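The structural difference between the two radix-2 prefix networks can be sketched in a few lines of Python. This is a bit-level behavioral model only, not the authors' circuits: the function names are hypothetical, the carry-in is assumed to be 0, and the "levels" counted here are prefix-combine levels, which exclude the propagate/generate setup and final sum stages that the paper's stage counts include. The odd/even labeling also depends on the bit-indexing convention; here the sparse tree runs on the odd bit positions.

```python
def pg_bits(a, b, n):
    """Per-bit generate and propagate signals of two n-bit operands."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    return g, p

def combine(hi, lo):
    """Prefix operator: merge an upper (g, p) group with the group below it."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)

def kogge_stone(g, p):
    """Full radix-2 tree: every bit position is combined at every level."""
    n = len(g)
    gp = list(zip(g, p))
    levels, d = 0, 1
    while d < n:
        gp = [combine(gp[i], gp[i - d]) if i >= d else gp[i] for i in range(n)]
        d *= 2
        levels += 1
    return [x[0] for x in gp], levels

def han_carlson(g, p):
    """Sparse tree on every other position, plus one extra fix-up level."""
    n = len(g)
    gp = list(zip(g, p))
    # Level 1: each odd position absorbs its lower neighbor.
    gp = [combine(gp[i], gp[i - 1]) if i % 2 == 1 else gp[i] for i in range(n)]
    levels, d = 1, 2
    # Kogge-Stone style doubling, restricted to the odd positions.
    while d < n:
        gp = [combine(gp[i], gp[i - d]) if (i % 2 == 1 and i >= d) else gp[i]
              for i in range(n)]
        d *= 2
        levels += 1
    # Extra level: remaining positions are derived from the finished neighbors.
    gp = [combine(gp[i], gp[i - 1]) if (i % 2 == 0 and i >= 2) else gp[i]
          for i in range(n)]
    levels += 1
    return [x[0] for x in gp], levels

def add(a, b, n=64, network=kogge_stone):
    """n-bit addition (carry-in 0) using the chosen prefix network."""
    g, p = pg_bits(a, b, n)
    group_g, levels = network(g, p)
    carry_in = [0] + group_g[:-1]          # carry into bit i = G[i-1:0]
    total = 0
    for i in range(n):
        total |= (p[i] ^ carry_in[i]) << i  # half-sum XOR carry
    return total, levels
```

For 64-bit operands this model uses 6 combine levels for Kogge-Stone and 7 for Han-Carlson, which mirrors the one-stage difference described above, while Han-Carlson performs roughly half as many combine operations per level.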
Ling's adder obtains high performance by exploiting the wired-OR gate property of emitter-coupled logic. With a CMOS implementation, this advantage is lost. However, it was shown in [10] that high performance could be realized using radix-4 propagates and generates for the carries, together with a conditional sum.

The CLA adder allows a fast implementation, especially in the dynamic radix-4 type [4]. CLA is a textbook example and is the most commonly used adder. However, in the dynamic radix-4 implementation, its large transistor stacks and many stages made it appear slow compared to the other adders.

Using the logical effort method for quick optimization, these adders were evaluated and compared in [12]; that work is extended here with the inclusion of the radix-2 Han-Carlson and radix-2 Kogge-Stone adders. Section 2 outlines the optimization conditions for the adders. The delay of the adders using the logical effort method is discussed in Section 3. The results are compared with H-SPICE simulation in Section 4. The conclusion of the work is given in Section 5.

2 Optimization Conditions

All adders were optimized under the following conditions: maximum input size of µm, maximal allowable transistor size of µm and an equivalent load of 3µm inverter. These conditions were set to obtain reasonable transistor sizes and loads for an adder.
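The delay model behind this kind of optimization can be sketched as follows. This is the standard logical effort formulation from [1], not the authors' own tool: the path effort is F = G·B·H (products of the stage logical efforts and branching efforts, times the path electrical effort), the delay-optimal stage effort is F^(1/N), and the minimum path delay in units of τ is N·F^(1/N) + P. The function names and the parasitic value of 1 for an inverter are illustrative assumptions.

```python
import math

def path_delay(logical_efforts, branching, electrical_effort, parasitics):
    """Minimum path delay in units of tau, plus the optimal per-stage effort.

    logical_efforts   : g_i for each stage on the path
    branching         : b_i for each stage (1.0 where there is no branching)
    electrical_effort : H = C_load / C_in for the whole path
    parasitics        : p_i for each stage
    """
    n = len(logical_efforts)
    path_effort = (math.prod(logical_efforts)
                   * math.prod(branching)
                   * electrical_effort)
    stage_effort = path_effort ** (1.0 / n)   # equal effort per stage is optimal
    return n * stage_effort + sum(parasitics), stage_effort

def fo4_delay(p_inv=1.0):
    """Delay of a fan-out-of-4 inverter in tau: g*h + p with g = 1, h = 4.

    p_inv = 1.0 is a common textbook assumption, not a value from the paper.
    """
    return 1.0 * 4.0 + p_inv
```

With p_inv = 1 an FO4 delay is 5τ in this model; the paper's own figures, such as 55.8τ reported as 11.4 FO4, imply a ratio of about 4.9τ per FO4 for the technology used.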

The wiring capacitance was included. It was computed using the unit-length wiring capacitance and the 1-bit cell width. This width was determined from the preliminary layout of the most congested bit cell. The wire length was determined from the number of bits the wire spanned and the number of wires running in parallel.

Using the logical effort method, the adders were optimized according to the critical paths that were estimated from the adder topology. The delay effort in other paths was computed from the critical one. The optimization process was applied recursively to update the branch factors along the critical path. It finished when all transistor sizes had converged, and the final result was recorded as the adder delay.

3 Delay Effort of Adders

The logical effort of gates was obtained from simulation. This adjustment was necessary for two reasons: first, pMOS and nMOS driving capabilities vary with technology, and second, a better average per-stage delay can be achieved using a p-n ratio in the range of 1.4-1.6. Thus, we needed to repeat the gate delay simulation in order to model the delay accurately; the drain and source areas of the transistors were included to match the real layout better. We used a p-n ratio of 1.5 for performance reasons. Nonetheless, all gates continued to show a linear dependence of delay on fan-out.

In addition, to model the delay accurately, the domino gates were broken into dynamic and static gates. First, the latter have a different driving capability and need to be sized differently. Second, domino gates can be very complex (for example, in the CLA and Ling adders, group generates and group carries drive multiple inverters at different locations on the NMOS stack). Without such separation, it is very difficult to model the delay accurately.

Fig. 1. Multiplexer-based carry select adder: diagram and circuits [3]

Fig. 2. Radix-2 Han-Carlson and Kogge-Stone adders: diagrams and circuits [5][6][7]

3.1 Results

The static radix-2 MXA consists of 9 stages and was implemented using static CMOS (Fig. 1). The radix-2 structure was chosen so that 2-input gates could be used. The generate signals were implemented with transmission-gate multiplexers, which were controlled by the propagate signals and their complements. In [3], single-ended propagate signals were implemented and inverters were needed to generate the complement signals. To avoid this delay penalty, complementary propagate signals were generated directly. The critical path was from the bit-1 propagate through the generate paths to the MSB sum. Along this path, the fan-out was slightly larger than 2. The logical effort optimization achieved a total delay of 55.8τ (11.4 FO4).

Fig. 3. Radix-4 Kogge-Stone adder: diagrams and circuits [7]

The radix-2 Han-Carlson adder (Fig. 2) realizes even carries with the propagate and generate signals of even bits. The odd-bit carries are generated at the end using the even carries. The critical path goes from bit 1 through the generate path to the MSB sum, traversing 10 stages. The propagate paths had an equal number of stages but were loaded less heavily than the most critical generate path. The fan-out along the critical path was less. The total delay was 62.5τ (12.8 FO4) and 55.8τ (11.4 FO4) for the static and dynamic implementations, respectively.

The radix-2 Kogge-Stone adder is similar in architecture to the Han-Carlson adder. The difference is that the propagate and generate signals of all bits are created in the Kogge-Stone adder (Fig. 2). This results in 9 stages, one fewer than in the Han-Carlson adder. The cost, however, was twice as many gates for the propagate and generate signals and a doubling of the number of wires. The critical path went through the generate signals, traversing 9 stages. The fan-out was also less than 2. The total delay after optimization was 57.6τ (11.8 FO4) and 42.6τ (8.7 FO4) for the static and dynamic implementations, respectively. The delay is better than that of the Han-Carlson adder.

The dynamic radix-4 Kogge-Stone adder was implemented in only 6 stages by using redundant logic in the propagate and generate stages and strobe signals for the final sum (Fig. 3). The cost was very high input and internal loading and a large amount of wiring between stages. In addition, the dynamic stages that followed were slow NOR gates. The critical path went through the generate path from bit 0 to the MSB sum. The total delay is 30.1τ (6.2 FO4). This is the best delay seen, showing that the advantage of using fewer stages outweighs the added complexity.

The dynamic radix-4 CLA was realized in 16 stages, or 8 domino gates (Fig. 4). The critical path was from bit 0 through the generate path and the higher-bit carries to the MSB of the sum. A fan-out of 3 was observed along the generate and carry paths. The total delay is 54.3τ (11.1 FO4) due to more loading and longer wires.

Fig. 4. Radix-4 CLA adder: diagrams and circuits [4]

Naffziger's implementation of the modified Ling adder [10] utilizes Ling pseudo-carries and propagate signals [9] to generate the long carries, and a conditional-sum adder for the local carries (Fig. 5). The critical path was chosen through the long carry to the MSB sum and was realized in 9 stages, due to larger gate and wire loading. The local carry and sum paths have more stages than the critical path. They were implemented with faster gates to avoid becoming critical. The total delay is 43.9τ (9.0 FO4).

Fig. 5. Radix-4 modified Ling adder: diagrams and circuits [7]

3.2 Comparison

Table 1 summarizes the delay of the adders using logical effort analysis. The delays are expressed in terms of the inverter delay τ and FO4. The adders with fewer stages are consistently faster. Figure 6 shows the total delay and the number of stages. The delay was found to be linearly proportional to the number of stages in the critical path. It amounted to 1.FO4 and .FO4 per stage, respectively, for the static and dynamic implementations.
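The relation between the two delay units used in this comparison can be checked directly from the reported figures. The short sketch below uses only delay pairs quoted in the text and in Table 1 (MXA, dynamic Ling, and dynamic Han-Carlson); the dictionary layout and function names are illustrative assumptions.

```python
# (delay in tau, delay in FO4, stages on the critical path), as reported
REPORTED = {
    "MXA (static)":  (55.8, 11.4, 9),
    "Ling (dynamic)": (43.9, 9.0, 9),
    "HC (dynamic)":  (47.9, 9.8, 10),
}

def tau_per_fo4(reported):
    """Implied FO4 inverter delay, in tau, for each reported delay pair."""
    return {name: tau / fo4 for name, (tau, fo4, _) in reported.items()}

def fo4_per_stage(reported):
    """Average delay per critical-path stage, in FO4."""
    return {name: fo4 / stages for name, (_, fo4, stages) in reported.items()}

if __name__ == "__main__":
    for name, ratio in tau_per_fo4(REPORTED).items():
        print(f"{name}: {ratio:.2f} tau per FO4")
    for name, per_stage in fo4_per_stage(REPORTED).items():
        print(f"{name}: {per_stage:.2f} FO4 per stage")
```

All three pairs imply roughly 4.9τ per FO4, so the two columns of Table 1 carry the same information in different units; the per-stage averages are simple quotients and are not the fitted per-stage slopes the paper refers to.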

Table 1. Adder delays using logical effort method

Type      Adder   # Stages   LE (τ)   # FO4
Static    MXA     9          55.8     11.4
Static    KS      9          57.6     11.8
Static    HC      10         62.5     12.8
Dynamic   KS-4    6          30.1     6.2
Dynamic   KS-2    9          42.6     8.7
Dynamic   Ling    9          43.9     9.0
Dynamic   HC      10         47.9     9.8
Dynamic   CLA     16         55.8     11.4

Fig. 6. Total delay from logical effort method and number of stages

4 Simulation Results

The worst-case delay of each adder's critical path was simulated with H-SPICE using the 0.18 µm, 1.8 V CMOS technology at 7°C temperature. The results are presented in Table 2. The results obtained using H-SPICE simulations are fairly consistent with the logical effort analysis in terms of relative performance among the adders. That is a good indicator, and it confirms our belief that LE estimates should replace the number of stages or gate counts as delay estimates when developing VLSI algorithms.

Figure 7 shows the delays obtained using H-SPICE and their difference from the logical effort results. The delay of the adders remained dependent on the number of stages. In addition, the per-stage delay degraded to 1.4 FO4 and 0.8 FO4 for the static and dynamic implementations, respectively.

Some inconsistency was observed between the logical effort results and H-SPICE for MXA, which had larger errors than Kogge-Stone and Han-Carlson. The main error came from a larger delay in the multiplexers than was modeled. Because a pMOS-to-nMOS ratio of 1.5 was used, the rising signal was slower than the falling signal. So, the multiplexer did not fully switch until the rising control signal arrived at the multiplexer. Therefore, the multiplexer delay was always determined by the slow rising signal. This corresponds to the worst-case delay, not the average.

Large errors were also seen in the radix-4 dynamic adders. They used high-stack nMOS and had many branches. Therefore, they were harder to model accurately, especially with respect to parasitic delay.

Table 2. Logical effort and simulation delay results

Type      Adder   # Stages   LE (FO4)   HSPICE (FO4)   HSPICE (ps)   Diff. (%)
Static    KS      9          11.8       10.9           853           -8.24
Static    MXA     9          11.4       12.8           1003          10.99
Static    HC      10         12.8       13.3           13            3.49
Dynamic   KS-4    6          6.2        7.4            581           17.11
Dynamic   KS-2    9          8.7        9.2            717           4.8
Dynamic   Ling    9          9.0        9.5            74            5.34
Dynamic   HC      10         9.8        9.9            77            0.85
Dynamic   CLA     16         11.4       14.            117           19.3

Fig. 7. Total delay with H-SPICE and delay difference

Nonetheless, the relative performance among the adders did not vary significantly. Having fewer stages in the critical path helped to improve the delay. Although fewer stages meant more complex gates, which translated into a worse per-stage delay, this degradation was more than offset by the delay reduction due to fewer stages.

5 Conclusion

The use of the Logical Effort method for performance comparison of different adder topologies was presented, with wire capacitance included. The results obtained were consistent with simulation and are encouraging. They show that incorporating Logical Effort into the analysis of VLSI adders can help find better adder topologies.

References

1. I. Sutherland, B. Sproull, D. Harris, Logical Effort: Designing Fast CMOS Circuits, Morgan Kaufmann Publishers, 1999.
2. V. G. Oklobdzija, E. R. Barnes, Some Optimal Schemes for ALU Implementation in VLSI Technology, Proceedings of 7th Symposium on Computer Arithmetic, June 4-6, 1985, University of Illinois, Urbana, Illinois.
3. A. Farooqui, V. G. Oklobdzija, Multiplexer Based Adder for Media Signal Processing, 1998 Symposium on Circuits and Systems.
4. A. Naini, D. Bearden, W. Anderson, A 4.5nS 96-b CMOS Adder Design, in Proc. CICC, Feb. 1992, pp. 25.5.1-25.5.4.
5. S. K. Mathew et al., Sub-500ps 64-b ALUs in 0.18µm SOI/Bulk CMOS: Design and Scaling Trends, Journal of Solid-State Circuits, Nov. 2001.
6. T. Han, D. A. Carlson, Fast Area-Efficient VLSI Adders, 8th IEEE Symposium on Computer Arithmetic, Como, Italy, pp. 49-56, May 1987.
7. P. M. Kogge, H. S. Stone, A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations, IEEE Transactions on Computers, Vol. C-22, No. 8, Aug. 1973, pp. 786-793.
8. J. Park et al., 470ps 64-Bit Parallel Binary Adder, Symposium on VLSI Circuits Digest of Technical Papers.
9. H. Ling, High Speed Binary Adder, IBM Journal of Research and Development, Vol. 25, No. 3, May 1981, p. 156.
10. S. Naffziger, A Sub-Nanosecond 0.5 um 64 b Adder Design, 1996 IEEE International Solid-State Circuits Conference, Digest of Technical Papers, San Francisco, February 8-10, 1996, pp. 362-363.
11. R. P. Brent, H. T. Kung, A Regular Layout for Parallel Adders, IEEE Trans., C-31(3), pp. 260-264, Mar. 1982.
12. H. Q. Dao, V. G. Oklobdzija, Application of Logical Effort Techniques for Speed Optimization and Analysis of Representative Adders, 35th Annual Asilomar Conference on Signals, Systems and Computers, Pacific Grove, California, November 4-7, 2001.
13. V. G. Oklobdzija, High-Performance System Design: Circuits and Logic, IEEE Press, 1999.