A Taxonomy of Parallel Prefix Networks

David Harris
Harvey Mudd College / Sun Microsystems Laboratories
31 E. Twelfth St., Claremont, CA 91711
David_Harris@hmc.edu

Abstract - Parallel prefix networks are widely used in high-performance adders. Networks in the literature represent tradeoffs between number of logic levels, fanout, and wiring tracks. This paper presents a three-dimensional taxonomy that not only describes the tradeoffs in existing parallel prefix networks but also points to a family of new networks. Adders using these networks are compared using the method of logical effort. The new architecture is competitive in latency and area for some technologies.

I. INTRODUCTION

A parallel prefix circuit computes N outputs {Y_N, ..., Y_1} from N inputs {X_N, ..., X_1} using an arbitrary associative two-input operator ∘ as follows [13]:

  Y_1 = X_1
  Y_2 = X_2 ∘ X_1
  Y_3 = X_3 ∘ X_2 ∘ X_1
  ...
  Y_N = X_N ∘ X_{N-1} ∘ ... ∘ X_2 ∘ X_1    (1)

Common prefix computations include addition, incrementation, priority encoding, etc. Most prefix computations precompute intermediate variables {Z_{N:N}, ..., Z_{1:1}} from the inputs. The prefix network combines these intermediate variables to form the prefixes {Z_{N:1}, ..., Z_{1:1}}. The outputs are postcomputed from the inputs and prefixes. For example, adders take inputs {A_N, ..., A_1}, {B_N, ..., B_1}, and C_in and produce a sum output {S_N, ..., S_1} using intermediate generate (G) and propagate (P) prefix signals. The addition logic consists of the following calculations and is shown in Fig. 1.

  Precomputation:   G_{i:i} = A_i B_i ;   G_{0:0} = C_in
                    P_{i:i} = A_i ⊕ B_i ; P_{0:0} = 0    (2)

  Prefix:           G_{i:j} = G_{i:k} + P_{i:k} G_{k-1:j}
                    P_{i:j} = P_{i:k} P_{k-1:j}    (3)

  Postcomputation:  S_i = P_i ⊕ G_{i-1:0}    (4)

There are many ways to perform the prefix computation. For example, serial-prefix structures like ripple-carry adders are compact but have a latency of O(N).
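As a concrete illustration of EQ (1)-(4), here is a minimal Python sketch (not from the paper; the names `prefix_op` and `prefix_adder` are illustrative) that models an N-bit adder using the valency-2 (G, P) operator, with the prefix network evaluated serially, i.e. the O(N) ripple structure just described:

```python
# Sketch of the prefix formulation of addition (EQ 1-4); names are
# illustrative, and the network here is the serial (ripple) evaluation.

def prefix_op(hi, lo):
    """Valency-2 associative operator of EQ (3) on (G, P) span pairs."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)

def prefix_adder(a_bits, b_bits, c_in):
    """Add two LSB-first bit vectors; returns (sum bits, carry out)."""
    # Precomputation (EQ 2): per-bit generate and propagate
    gp = [(a & b, a ^ b) for a, b in zip(a_bits, b_bits)]
    # Prefix computation (EQ 1/3), serially: span holds (G_{i:0}, P_{i:0}),
    # seeded with G_{0:0} = c_in and P_{0:0} = 0
    span = (c_in, 0)
    carries = []
    for g_p in gp:
        carries.append(span[0])      # carry into this bit is G_{i-1:0}
        span = prefix_op(g_p, span)
    # Postcomputation (EQ 4): S_i = P_i xor G_{i-1:0}
    sums = [p ^ c for (_, p), c in zip(gp, carries)]
    return sums, span[0]             # span[0] is now C_out = G_{N:0}
```

Because the operator is associative, any tree arrangement of `prefix_op` computes the same carries; the networks discussed below differ only in how that tree is arranged.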
Fig. 1. Prefix computation: 4-bit adder

Single-level carry lookahead structures reduce the latency by a constant factor. Parallel prefix circuits use a tree network to reduce latency to O(log N) and are widely used in fast adders, priority encoders [3], and other prefix computations. This paper focuses on valency-2 prefix operations (i.e., those that use 2-input associative operators), but the results readily generalize to higher valency [1].

Many parallel prefix networks have been described in the literature, especially in the context of addition. The classic networks include Brent-Kung [2], Sklansky [11], and Kogge-Stone [8]. An ideal prefix network would have log2 N stages of logic, a fanout never exceeding 2 at each stage, and no more than one horizontal track of wire at each stage. The classic architectures deviate from ideal with 2 log2 N stages, fanout of N/2 + 1, and N/2 horizontal tracks, respectively. The Han-Carlson family of networks [5] offers tradeoffs in stages and wiring between Brent-Kung and Kogge-Stone. The Knowles family [7] similarly offers tradeoffs in fanout and wiring between Sklansky and Kogge-Stone, and the Ladner-Fischer family [10] offers tradeoffs between fanout and stages between Sklansky and Brent-Kung. The Kowalczuk, Tudor, and Mlynek prefix network [9] has also been proposed, but this network is serialized in the middle and hence not as fast for wide adders.

This paper develops a taxonomy of parallel prefix networks based on stages, fanout, and wiring tracks. The area of a datapath layout is the product of the number of rows and columns in the network. The latency strongly depends on fanout and wiring capacitance, not just the number of logic levels. Therefore, the latency is evaluated using the method of logical effort [12, 6]. The taxonomy suggests new families of networks with different tradeoffs. One of these networks has area comparable with the smallest

© 2003 IEEE

known network and latency comparable with the fastest known network. Section II reviews the parallel prefix networks in the literature. Section III develops the taxonomy, which reveals a new family of prefix networks. Performance comparisons appear in Section IV, and Section V concludes the paper.

II. PARALLEL PREFIX NETWORKS

Parallel prefix networks are distinguished by the arrangement of prefix cells. Fig. 2 shows six such networks for N = 16. The upper box performs the precomputation and the lower box performs the postcomputation. In the middle, black cells, gray cells, and white buffers comprise the prefix network. Black cells perform the full prefix operation, as given in EQ (3). In certain cases, only part of the intermediate variable is required. For example, in many adder cells, only the G_{i:0} signal is required, and the P_{i:0} signal may be discarded. Such gray cells have lower input capacitance. White buffers are used to reduce the loading of later non-critical stages on the critical path. The span of bits covered by each cell output appears near the output. The critical path is indicated with a heavy line.

The prefix graphs illustrate the tradeoffs in each network between number of logic levels, fanout, and horizontal wiring tracks. All three of these tradeoffs impact latency; Huang and Ercegovac [4] showed that networks with a large number of wiring tracks increase the wiring capacitance because the tracks are packed on a tight pitch to achieve reasonable area. Observe that the Brent-Kung and Han-Carlson networks never have more than one black or gray cell in each pair of bits on any given row. This suggests that the datapath layout may use half as many columns, saving area and wire length.

III. TAXONOMY

Parallel prefix structures may be classified with a three-dimensional taxonomy (l, f, t) corresponding to the number of logic levels, fanout, and wiring tracks.
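The tree evaluation that the Section II networks arrange in different ways can also be sketched in Python; this is a hedged model of the Kogge-Stone arrangement (helper names are illustrative, not from the paper), which reaches every prefix in log2 N stages by combining each bit's running (G, P) span with the span 2^d bits below it at stage d:

```python
# Illustrative model of a Kogge-Stone prefix network over (G, P) pairs;
# the operator matches EQ (3): G_{i:j} = G_{i:k} + P_{i:k} G_{k-1:j},
#                              P_{i:j} = P_{i:k} P_{k-1:j}.

def prefix_op(hi, lo):
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)

def kogge_stone(gp):
    """Return all prefix (G, P) spans from per-bit (g, p) pairs."""
    n = len(gp)
    span = list(gp)
    d = 1
    while d < n:                       # log2(N) stages in total
        nxt = list(span)
        for i in range(d, n):          # every bit combines in parallel;
            nxt[i] = prefix_op(span[i], span[i - d])  # fanout stays at 2
        span = nxt
        d *= 2
    return span
```

The N/2 horizontal tracks of the real network show up here as the `span[i - d]` references, which cross up to N/2 bit positions in the final stage; a serial fold over the same operator yields identical results, so the choice of network trades only latency, fanout, and wiring.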
For an N-bit parallel prefix structure with L = log2 N, l, f, and t are integers in the range [0, L-1] indicating:

  Logic Levels: L + l
  Fanout: 2^f + 1
  Wiring Tracks: 2^t

This taxonomy is illustrated in Fig. 3 for N = 16. The actual logic levels, fanout, and wiring tracks are annotated along each axis in parentheses. The parallel prefix networks from the previous section all fall on the plane l + f + t = L - 1, suggesting an inherent tradeoff between logic levels, fanout, and wiring tracks. The Brent-Kung (L-1, 0, 0), Sklansky (0, L-1, 0), and Kogge-Stone (0, 0, L-1) networks occupy vertices.

Fig. 2. Parallel prefix networks: (a) Brent-Kung, (b) Sklansky, (c) Kogge-Stone, (d) Han-Carlson, (e) [2,1,1,1], (f) Ladner-Fischer

The Ladner-Fischer (L-2, 1, 0) network saves one

level of logic at the expense of greater fanout. The Han-Carlson (1, 0, L-2) network reduces the wiring tracks of the Kogge-Stone network by nearly a factor of two at the expense of an extra level of logic. In general, Han and Carlson describe a family of networks along the diagonal (l, 0, t) with l + t = L - 1. Similarly, the Knowles family of networks occupies the diagonal (0, f, t) with f + t = L - 1, and the Ladner-Fischer networks occupy the diagonal (l, f, 0) with l + f = L - 1. Knowles networks are described by L integers specifying the fanout at each stage. For example, the [8,4,2,1] and [1,1,1,1] networks represent the Sklansky and Kogge-Stone extremes, and [2,1,1,1] was shown in Fig. 2e. In general, a (0, f, t) network corresponds to the network [2^f, 2^(f-1), ..., 1, 1], which is the Knowles network closest to the diagonal.

The taxonomy suggests yet another family of parallel prefix networks found inside the cube with l, f, t > 0. Fig. 4 shows such a (1,1,1) network. For N = 32, the new networks would include (1,1,2), (1,2,1), and (2,1,1).

IV. RESULTS

Table 1 compares the parallel prefix networks under consideration. The delay depends on the number of logic levels, the fanout, and the wire capacitance. All cells are designed to have the same drive capability; this drive is arbitrary and generally greater than minimum. Networks with l > 0 are sparse and require half as many columns of cells. The wire capacitance depends on layout and process and can be expressed by w, the ratio of wire capacitance per column traversed to the input capacitance of a unit inverter. Reasonable estimates from a trial layout in a 180 nm process are w = 0.5 for widely spaced tracks and w = 1 for networks with a large number of tightly spaced wiring tracks. The method of logical effort is used to estimate the latency of adders built with each prefix network, following the assumptions made in [6]. Tables 2-4 show how the latency depends on adder size, circuit family, and wire capacitance.

V.
CONCLUSION

This paper has presented a three-dimensional taxonomy of parallel prefix networks showing the tradeoffs between number of stages, fanout, and wiring tracks. The taxonomy captures the networks used in the parallel prefix adders described in the literature. It also suggests a new family of parallel prefix networks inside the cube. The new architecture appears to have competitive latency in many cases.

Fig. 4. New (1,1,1) parallel prefix network

REFERENCES

[1] A. Beaumont-Smith and C. Lim, "Parallel prefix adder design," Proc. 15th IEEE Symp. Comp. Arith., pp. 218-225, June 2001.
[2] R. Brent and H. Kung, "A regular layout for parallel adders," IEEE Trans. Computers, vol. C-31, no. 3, pp. 260-264, March 1982.
[3] C. Huang, J. Wang, and Y. Huang, "Design of high-performance CMOS priority encoders and incrementer/decrementers using multilevel lookahead and multilevel folding techniques," IEEE J. Solid-State Circuits, vol. 37, no. 1, pp. 63-76, Jan. 2002.
[4] Z. Huang and M. Ercegovac, "Effect of wire delay on the design of prefix adders in deep submicron technology," Proc. 34th Asilomar Conf. Signals, Systems, and Computers, vol. 2, pp. 1713-1717, 2000.
[5] T. Han and D. Carlson, "Fast area-efficient VLSI adders," Proc. 8th Symp. Comp. Arith., pp. 49-56, Sept. 1987.
[6] D. Harris and I. Sutherland, "Logical effort analysis of carry propagate adders," Proc. 37th Asilomar Conf. Signals, Systems, and Computers, 2003.
[7] S. Knowles, "A family of adders," Proc. 15th IEEE Symp. Comp. Arith., pp. 277-281, June 2001.
[8] P. Kogge and H. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence relations," IEEE Trans. Computers, vol. C-22, no. 8, pp. 786-793, Aug. 1973.
[9] J. Kowalczuk, S. Tudor, and D. Mlynek, "A new architecture for an automatic generation of fast pipeline adders," Proc. European Solid-State Circuits Conf., pp. 11-14, 1991.
[10] R. Ladner and M. Fischer, "Parallel prefix computation," J. ACM, vol. 27, no. 4, pp. 831-838, Oct. 1980.
[11] J. Sklansky, "Conditional-sum addition logic," IRE Trans. Electronic Computers, vol. EC-9, pp. 226-231, June 1960.
[12] I. Sutherland, R. Sproull, and D. Harris, Logical Effort, San Francisco: Morgan Kaufmann, 1999.
[13] R. Zimmermann, Binary Adder Architectures for Cell-Based VLSI and their Synthesis, ETH Dissertation 12480, Swiss Federal Institute of Technology, 1997.

Fig. 3. Taxonomy of prefix graphs for N = 16: the axes are l (logic levels), f (fanout), and t (wire tracks), with the actual values in parentheses along each axis. Brent-Kung, Sklansky, and Kogge-Stone sit at the vertices; the Ladner-Fischer, Han-Carlson, and Knowles ([4,2,1,1], [2,1,1,1]) families lie along the edges of the plane l + f + t = L - 1; the new (1,1,1) network lies in its interior.
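The annotations of Fig. 3 can be reproduced mechanically; a small Python sketch (the helper name `taxonomy_plane` is an assumption, not from the paper) enumerates every (l, f, t) triple on the plane l + f + t = L - 1 for N = 16 and prints the logic levels L + l, maximum fanout 2^f + 1, and wiring tracks 2^t defined in Section III:

```python
from math import log2

def taxonomy_plane(n):
    """All (l, f, t) triples with l + f + t = L - 1 for an n-bit network,
    together with the resulting logic levels, max fanout, and tracks."""
    L = int(log2(n))
    rows = []
    for l in range(L):
        for f in range(L - l):
            t = L - 1 - l - f
            rows.append((l, f, t, L + l, 2**f + 1, 2**t))
    return rows

for l, f, t, levels, fanout, tracks in taxonomy_plane(16):
    print(f"({l},{f},{t}): levels={levels}, fanout={fanout}, tracks={tracks}")
```

The three vertices come out as expected: (3,0,0) gives 7 logic levels (Brent-Kung), (0,3,0) gives fanout 9 (Sklansky), and (0,0,3) gives 8 tracks (Kogge-Stone), matching the parenthesized values on the axes of Fig. 3.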

Table 1. Comparison of parallel prefix network architectures

Architecture         Classification  Logic Levels  Max Fanout  Tracks  Cols
Brent-Kung           (L-1, 0, 0)     L + (L-1)     2           1       N/2
Sklansky             (0, L-1, 0)     L             N/2 + 1     1       N
Kogge-Stone          (0, 0, L-1)     L             2           N/2     N
Han-Carlson          (1, 0, L-2)     L + 1         2           N/4     N/2
Knowles [2,1,...,1]  (0, 1, L-2)     L             3           N/4     N
Ladner-Fischer       (1, L-2, 0)     L + 1         N/4 + 1     1       N/2
New (1, 1, 1)        (1, 1, L-3)     L + 1         3           N/8     N/2

Table 2. Adder delays: w = 0.5; inverting static CMOS / footed domino

                     N = 16       N = 32       N = 64       N = 128
Brent-Kung           10.4 / 9.9   13.7 / 13.0  18.1 / 17.4  24.9 / 24.2
Sklansky             13.0 / 8.8   21.6 / 12.4  38.2 / 18.3  70.8 / 28.2
Kogge-Stone          9.4 / 7.4    12.4 / 10.0  17.0 / 14.1  24.8 / 21.5
Han-Carlson          9.9 / 7.7    12.1 / 9.4   15.1 / 12.0  19.7 / 16.1
Knowles [2,1,...,1]  9.7 / 7.9    12.7 / 10.3  17.3 / 14.5  25.1 / 21.8
Ladner-Fischer       10.6 / 8.4   15.2 / 10.8  23.8 / 14.5  40.4 / 20.3
New (1, 1, 1)        10.7 / 8.1   12.9 / 9.8   15.9 / 12.4  20.5 / 16.5

Table 3. Adder delays: w = 0.5; N = 32/64

                     Inverting     Noninverting  Footed       Footless
                     Static CMOS   Static CMOS   Domino       Domino
Brent-Kung           13.7 / 18.1   16.8 / 21.8   13.0 / 17.4  10.7 / 14.6
Sklansky             21.6 / 38.2   16.3 / 23.4   12.4 / 18.3  10.5 / 15.9
Kogge-Stone          12.4 / 17.0   13.4 / 18.0   10.0 / 14.1  8.7 / 12.7
Han-Carlson          12.1 / 15.1   13.3 / 16.4   9.4 / 12.0   7.9 / 10.3
Knowles [2,1,...,1]  12.7 / 17.3   13.6 / 18.3   10.3 / 14.5  8.9 / 12.9
Ladner-Fischer       15.2 / 23.8   14.5 / 19.1   10.8 / 14.5  8.9 / 12.1
New (1, 1, 1)        12.9 / 15.9   13.8 / 16.9   9.8 / 12.4   8.3 / 10.6

Table 4. Adder delays: inverting static CMOS; N = 32/64

                     w = 0        w = 0.25     w = 0.5      w = 0.75     w = 1
Brent-Kung           11.4 / 13.4  12.5 / 15.7  13.7 / 18.1  14.8 / 20.4  15.9 / 22.7
Sklansky             18.5 / 31.9  20.1 / 35.0  21.6 / 38.2  23.1 / 41.4  24.7 / 44.5
Kogge-Stone          9.3 / 10.7   10.9 / 13.9  12.4 / 17.0  13.9 / 20.1  15.5 / 23.3
Han-Carlson          10.5 / 11.9  11.3 / 13.5  12.1 / 15.1  12.9 / 16.7  13.7 / 18.3
Knowles [2,1,...,1]  9.6 / 11.0   11.2 / 14.2  12.7 / 17.3  14.3 / 20.4  15.8 / 23.6
Ladner-Fischer       13.6 / 20.6  14.4 / 22.2  15.2 / 23.8  16.0 / 25.4  16.8 / 27.0
New (1, 1, 1)        11.2 / 12.6  12.1 / 14.3  12.9 / 15.9  13.8 / 17.6  14.6 / 19.2