High-throughput and Low-power DSP Using Clocked-CMOS Circuitry

Manjit Borah, Robert Michael Owens, Mary Jane Irwin
Department of Computer Science and Engineering
Pennsylvania State University
University Park, PA 16802

This work was partially supported by NSF grant no. CDA-8914587.

Abstract

We argue that the clocked-CMOS (C²MOS) circuit family provides a very high throughput and low power alternative to other existing circuit techniques for the fast developing market of portable electronics. By virtue of self-latching gates allowing very fine-grained pipelining, and avoidance of precharge and short-circuit power consumption, the C²MOS circuit offers very good power-delay efficiency. We support our claims through the design of an 8-bit unsigned binary multiplier with pipelining at the gate level which can produce 500 million multiplications per second consuming only 0.8 W power using 1.0 micron technology and a 3.3 V power supply. By comparison, the fastest previously existing pipelined multiplier has a throughput rate of 400 million multiplications per second consuming 0.8 W power at 0.8 micron technology, 5 V, using wave-pipelining.

1 Introduction

Low power consumption and high throughput are the two important requirements for portable and real-time electronic equipment. Pipelining has been used successfully to attain high throughput in digital signal processing (DSP) systems. When the throughput demand is not very high, static CMOS is considered the most power-efficient circuit family. However, for very high throughput applications, static CMOS tends to lose its power-delay efficiency. This is due to the fact that static CMOS requires extra pipelined latches which add extra delay, limiting the throughput rate and increasing the power consumption. If we consider the deepest level of pipelining, then each pipeline block consists of only simple two- or three-input gates (e.g., AND/NAND, OR/NOR, XOR/XNOR), avoiding long chains of transistors in series. The clock rate of a pipelined circuit is determined by the slowest pipeline block.
Non-clocked logic families, like static CMOS and pass-transistor logic, require separate latches, adding extra delay to the clock cycle time and increasing the power consumption. Clocked logic families offer better power-delay characteristics than non-clocked logic families for deeply pipelined circuits. In this paper we show that C²MOS circuits, due to their low power consumption and ability to apply pipelining at a much finer level, can be used to build very high throughput circuits with low power consumption. We present the design of an 8-bit pipelined multiplier for unsigned numbers using C²MOS to demonstrate the advantages of C²MOS logic for the domain. The multiplier, implemented in 1.0 micron technology, can produce throughput at the rate of 500 million multiplications per second with only 44 ns initial latency, consuming a mere 0.8 W power including clock driving circuitry. The fastest existing pipelined multiplier is wave-pipelined using normal process complementary pass transistor logic (NPCPL) and has a throughput rate of 400 million multiplications per second, consuming 0.8 W power with 0.8 micron technology [1]. We compare our design with several other pipelined multiplier designs, including true single-phase logic based, CPL-based, NMOS based and quasi n-p domino logic based designs exploiting pipelining at various granularities.

We briefly describe the C²MOS circuit family in section two. Section three details the structure of the multiplier and its basic building blocks. Section four analyzes its speed and power consumption, validating the claims with SPICE simulation results. Conclusions and future research goals are presented in section five.

2 The C²MOS logic family

The C²MOS logic family was first proposed in 1973 for calculator circuits as a low-power and smaller-area logic alternative [8]. However, at that time dynamic logic, especially logic requiring complicated clocking, was not considered very practical. With the advance in VLSI techniques, the generation of high-speed clocks with controlled skew has become possible. Using very fine-grained pipelining (at the NAND/NOR gate level), C²MOS circuits can be built to produce very high throughput.

The basic principle of the C²MOS circuit is simple: separate the p and the n blocks of the conventional static CMOS gate by two clock transistors T1 and T2 (Figure 1(a)). The p-type block is enabled by T1 while the n-type block is enabled by T2. The two clock transistors are driven by two complementary clocks. The typical waveforms of the two clocks are given in Figure 1(b). When one clock is low and the other is high, the gate evaluates the logic function. In the opposite phase, the output of the gate is disconnected from the logic blocks and the gate `holds' the output state. During the period when the output of a gate is on `hold', the computation of the next gate can take place. To allow this, consecutive gates are driven by complementary clocks (Figure 1(c)). As a result, waves of computation can propagate in a pipelined manner through the gates. Thus, acting as self-latched compute blocks, the C²MOS circuit provides gate-level pipelining at no extra cost.

[Figure 1: (a) C²MOS gate; (b) complementary clocks; (c) consecutive gates compute in alternate phases.]

The advantages of C²MOS logic are many:

- The inputs to a gate are stable (on hold) before the gate starts computing. Therefore, short-circuit current is eliminated: only one of the two logic blocks conducts at any given time.
- The self-latching property of the C²MOS circuit eliminates the need for any extra latches in the design.
- The delay of a pipeline stage in a C²MOS circuit is determined by only a single gate, which may involve only three transistors in series. This enables a C²MOS circuit to run at very high speed.
- Transistor/gate sizing in a C²MOS circuit is simplified due to the fact that it depends only on the fan-out load of the gate.
- C²MOS does not require precharge, and it does not suffer from the charge-sharing problem typical of precharge circuits.
- C²MOS logic does not require complementary inputs (unlike CPL or dual-rail logic) and it can generate output signals with transitions in both directions (in contrast to precharge gates), making it compatible with conventional CMOS circuits.
- The output signals of C²MOS have a complete swing of the voltage range, providing good noise immunity.

There are, however, a couple of issues that need to be addressed while using C²MOS circuitry:

- C²MOS uses complementary clocks, requiring two clock signals to be routed to each gate, resulting in an increase in the global routing. Careful clock routing with balanced paths and a fat-tree structure for clock drivers is used in this work to limit the clock skew. SPICE simulations show that the circuit is capable of tolerating a skew of 0.3 ns at 500 MHz, 3.3 V.
- A C²MOS circuit may suffer from capacitive coupling. This happens when a signal line that crosses over a charged signal changes its voltage, causing a small capacitive discharge on the charged signal. The voltage drop is usually very small. Moreover, in a pipelined design the placement of the gates is usually such that gates from the same pipeline stage are placed in the same slice. Therefore the lines that cross over between slices belong to the same pipeline stage, which `hold' and `evaluate' at the same time, hence capacitive coupling is eliminated.
- Like static CMOS gates, C²MOS gates are inverting (e.g., NAND, NOR, INV). To obtain a non-inverted signal after an odd number of pipeline stages (or an inverted signal after an even number of stages), static CMOS inverters are used. Proper logic decomposition and mapping can be used to reduce such cases to a large extent.
- Like any other dynamic logic family, C²MOS does not allow power-down by disabling the clock.
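The evaluate/hold behaviour described in this section can be illustrated with a small behavioural model. The sketch below is not from the paper: it assumes a chain of four clocked inverters, models each clock period as two half-cycles, and uses the hypothetical names `simulate_c2mos_chain` and `n_stages`. Even-numbered gates evaluate in one half-cycle while odd-numbered gates hold, and the roles swap in the other half-cycle, so a new input bit can enter the chain every clock period.

```python
from typing import List


def simulate_c2mos_chain(inputs: List[int], n_stages: int = 4) -> List[int]:
    """Behavioural sketch of a chain of inverting, self-latched gates.

    Assumption (not from the paper): every gate is a clocked inverter and
    gate k evaluates in half-cycle k % 2, holding its output otherwise.
    Returns the last gate's output sampled at the end of each clock period.
    """
    outs = [0] * n_stages            # held output of every gate (arbitrary start)
    sampled = []
    for x in inputs:                 # one input bit per clock period
        for phase in (0, 1):         # two half-cycles per clock period
            # Each gate sees the held output of its predecessor; gate 0 sees x.
            seen = [x] + outs[:-1]
            for k in range(n_stages):
                if k % 2 == phase:   # this gate evaluates in this half-cycle
                    outs[k] = 1 - seen[k]
                # otherwise the gate holds outs[k] unchanged
        sampled.append(outs[-1])
    return sampled


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 0, 1]
    print(simulate_c2mos_chain(bits))
```

With four stages, the bit sampled at the end of clock period c is the input applied in period c-1, and the first sample only reflects the arbitrary initial state. No separate latch appears anywhere in the model: the hold phase of each gate plays that role.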

2.1 Power-delay comparisons

As mentioned earlier, when the throughput demand for the circuit is very high, pipelining should be applied at the single gate level. Due to the requirement of extra latches, non-clocked circuit families show significant extra overhead in terms of delay and power consumption. To evaluate the power-delay tradeoff offered by different logic families for very deeply pipelined circuits, we consider a pipelined two-input NOR/OR gate implementation using static CMOS, pass-transistor logic and C²MOS. We also used two types of latches, the C²MOS latch and the transmission-gate latch (Figure 2).

[Figure 2: (a) C²MOS and (b) transmission-gate inverting latches.]

The circuit consists of two input latches driving the two inputs to the actual logic gate, which is also latched at the output (Figure 3). The inputs to the input latches are derived from static CMOS inverters and the output of the latched gate drives a capacitive load of 10 pF.

[Figure 3: The sub-circuit used for comparison.]

For our discussion we assume that all the transistors are uniformly sized to 4. We performed SPICE simulations to determine the maximum possible clock speed and the power consumption for each implementation. We measured the delay between the clock edge at the input latches and the signal at the output of the logic block before the output latch. We also measured the average power consumption when the circuit is simulated for all possible combinations of input vectors. Since all the blocks require the same number of clock transistors and the clock routing is also similar, we expect the power consumption of the clock circuitry to be the same in all the cases.

Table 1: Power-delay characteristics (2-input NOR/OR)

gate type    latch type    delay (ns)    power (10^-5 W)    power × delay
C²MOS        -             1.32          5.209              6.876
CMOS         C²MOS         1.66          6.056              10.053
CMOS         trans-gate    1.87          8.332              15.581
pass-gate    C²MOS         2.2           7.487              16.471

Table 1 shows the delay, power and power-delay product of the basic pipeline block for the circuit implementations. The fully C²MOS circuit shows a much superior clock-to-output delay compared to the other types of circuits. Moreover, the power consumption of the C²MOS circuit is much smaller than that of the other implementations, which results in a significantly smaller power-delay product for C²MOS.

2.2 Comparisons with other techniques

True single-phase logic, proposed in [3], has a simpler clocking structure and the advantage of a single-phase clock. Even though the global clock routing in true single-phase logic is simpler than in C²MOS, the number of clock transistors increases compared to C²MOS (Figure 4). The actual logic also becomes more complicated, resulting in a slower gate. SPICE simulations show that the clock-to-output delay for a true-single-phase 2-input NOR gate is 1.7 ns compared to 1.32 ns for C²MOS, while the power consumption without including the clock-driving circuitry is 8.185 × 10^-5 W as compared to 5.209 × 10^-5 W for C²MOS when simulated for a circuit similar to Figure 3. Moreover, the clock drivers in true single-phase logic are expected to provide much sharper rise and fall times, which in turn makes the clock generation more difficult [5].

[Figure 4: True-single-phase gates: (a) n-block; (b) p-block.]

Wave pipelining [9] has been used with complementary pass-transistor logic (CPL) [10] to attain high throughput with low power consumption [1]. But as the authors in [1] point out, the design of a CPL-based wave-pipelined circuit is mostly dependent on balancing the delay along different paths and setting the transistor width ratios to achieve a proper logic threshold. All these properties vary with process, temperature and other external factors. Therefore the performance of such a design may degrade significantly in a real environment. Since the proper functioning of the circuit without data overrun is dependent on the cumulative path delays in the whole design, the design of large circuits using wave-pipelining is very difficult. While implementing a large system the wave speeds of different modules need to be matched, which is another challenging requirement.

3 The 8-bit multiplier design

Multipliers are essential parts of a digital signal processing circuit and are its most time-critical component. Therefore pipelined multipliers are highly desirable, which is evident from the abundant literature on highly pipelined multiplier designs [4, 7, 1, 6, 2]. Earlier pipelined multipliers [2, 6, 4] were based on using one full-adder stage as one pipelined unit. More recent pipelined multipliers exploit pipelining at the half-adder or XOR gate level [7, 1]. The obvious speed advantage of a finer level of pipelining is, however, accompanied by an increased number of pipelined latches and expansive clock circuitry, which result in more power consumption. But the power-delay product comparison has always been favorable to the finer pipelined designs. Here we have exploited the advantage of C²MOS logic gates to design a circuit with pipelining at the single NAND/NOR gate level, resulting in a clock speed much faster than the existing designs, while the lower power-delay characteristic of C²MOS has made it possible to maintain a fairly low power consumption compared to all existing designs.
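The power-delay argument above rests on the per-gate figures of Table 1. As a quick arithmetic check, the sketch below recomputes the power-delay column as delay times power and applies the same product to the true single-phase figures quoted in Section 2.2. The names `TABLE_1`, `power_delay` and `check_table` are illustrative, and the unit of the product (10^-14 J) is inferred from the column units rather than stated in the paper.

```python
# Rows of Table 1: (gate type, latch type, delay in ns, power in units of
# 1e-5 W, published power-delay product). The latch entry of the first row
# is not listed in the table, so it is left as None here.
TABLE_1 = [
    ("C2MOS", None, 1.32, 5.209, 6.876),
    ("CMOS", "C2MOS", 1.66, 6.056, 10.053),
    ("CMOS", "trans-gate", 1.87, 8.332, 15.581),
    ("pass-gate", "C2MOS", 2.2, 7.487, 16.471),
]


def power_delay(delay_ns: float, power_1e5_w: float) -> float:
    """Power-delay product in units of 1e-14 J (1e-9 s times 1e-5 W)."""
    return delay_ns * power_1e5_w


def check_table() -> list:
    """Compare each recomputed product, rounded to 3 places, with Table 1."""
    return [round(power_delay(d, p), 3) == published
            for _, _, d, p, published in TABLE_1]


# True single-phase 2-input NOR figures quoted in Section 2.2. This row is
# derived here for comparison; it is not part of Table 1.
TSPC_PRODUCT = power_delay(1.7, 8.185)

if __name__ == "__main__":
    print(check_table())
    print(TSPC_PRODUCT / power_delay(1.32, 5.209))
```

All four published products agree with the recomputed values, and the product for the true single-phase gate comes out at about twice that of the fully C²MOS block.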
3.1 The structure of the multiplier

Array architectures are commonly used for pipelined multiplier design due to their regular structure and easy interconnects. We designed a pipelined array multiplier using carry-save adder arrays, as discussed in [2]. An array of full-adder cells is used to accumulate and propagate the partial sum and carry. Each row of full-adders also adds a new row of partial products to the partial sum and carry. Therefore skewing latches are used to delay the inputs for the partial product generation logic such that they are presented to the proper row at the proper time. For the final adder stage, which accumulates the partial sum and carry values into the final product, we use a triangular vector merge array of half-adders as described in [2] for the same reasons, i.e., reduction of pipeline latency, reduction in the extra de-skewing latches and regular structure.

3.2 The structure of the basic modules

Our goal is to design the multiplier for maximum throughput. The basic pipeline block of our design is limited to 2-input NAND/NOR gates and inverters. Thus, the stage delay of our design can be as small as the delay incurred in a 2-input NOR gate. The circuit diagrams of the half-adder and the full-adder are shown in Figure 5. Notice that the number of stages in a full adder is four, requiring two clock cycles for it to compute. The isolated small circles are static CMOS inverters used to produce inverted signals without adding extra pipeline stages.

[Figure 5: Circuit diagram of (a) half-adder and (b) full-adder. Legend: static CMOS inverter, cycle separator, phase separator.]

The basic building block of the multiplier consists of a full-adder and an AND gate to compute the partial product to be added to the partial sum and carry signals passed down from the previous row. The partial product computation is overlapped with the first cycle of the full-adder computation and the new partial product is used as the `ci' input in the second cycle of the full-adder. A register is also included to pass on the multiplier bits to the next row for partial product computation.
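The arithmetic performed by the array can be sketched as a purely functional model. The code below is a simplified illustration, not the authors' circuit: it ignores pipelining and the skewing and de-skewing latches, treats every row as a uniform row of full adders (the actual design forms its first rows with AND gates and half adders), and performs the vector merge by repeating rows of half adders until no carries remain. The function names are hypothetical.

```python
def half_adder(a: int, b: int):
    """Return (sum, carry) of two bits."""
    return a ^ b, a & b


def full_adder(a: int, b: int, c: int):
    """Return (sum, carry) of three bits."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def carry_save_multiply(x: int, y: int, n: int = 8) -> int:
    """Unsigned n-bit by n-bit multiply using carry-save rows and a merge."""
    width = 2 * n
    x_bits = [(x >> i) & 1 for i in range(n)]
    y_bits = [(y >> j) & 1 for j in range(n)]
    sums = [0] * width       # partial sum vector, indexed by bit weight
    carries = [0] * width    # partial carry vector, indexed by bit weight

    for j in range(n):
        # AND gates form one row of partial products, aligned to weight i + j.
        pp = [0] * width
        for i in range(n):
            pp[i + j] = x_bits[i] & y_bits[j]
        # One carry-save row: carries are saved, not propagated along the row.
        cells = [full_adder(sums[w], carries[w], pp[w]) for w in range(width)]
        sums = [s for s, _ in cells]
        # Carry-outs move up one weight; the top carry-out is always 0
        # because the running total stays below 2**width.
        carries = [0] + [co for _, co in cells[:-1]]

    # Vector merge: rows of half adders until every carry has been absorbed.
    while any(carries):
        cells = [half_adder(s, c) for s, c in zip(sums, carries)]
        sums = [s for s, _ in cells]
        carries = [0] + [co for _, co in cells[:-1]]

    return sum(bit << w for w, bit in enumerate(sums))


if __name__ == "__main__":
    print(carry_save_multiply(200, 123), 200 * 123)
```

Keeping the sum and carry vectors separate until the final merge is what makes each row's delay independent of the word length, which is the property the gate-level pipelining relies on.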
3.3 Floor-plan and layout

The floor-plan of the 8-bit × 8-bit pipelined multiplier is shown in Figure 6. The data flow is strictly vertical. The multiplicand bits percolate down the carry-save adder rows and participate in the partial product generation with the corresponding multiplier bits in the given row. The row of AND gates at the top computes the first two rows of the partial product array using the least significant two bits of the multiplier (y0, y1) with the corresponding multiplicand bits. These partial products are used in the next row (half adders) to compute the first stage of carry-save addition. The most significant partial product for each row is computed by the `and' block in the left-most column. The outputs of the last row of full-adders are input to the triangular half-adder array for computing the vector merge addition, producing the final most significant product bits. The total latency of the 8-bit multiplier pipeline is 22 cycles.

[Figure 6: The floor-plan of the 8-bit × 8-bit multiplier (inputs x0-x7 and y0-y7, outputs p0-p15).]

4 Results and analysis

The layout of the C²MOS 8 × 8 multiplier is shown in Figure 7. The multiplier contains 6156 transistors and has a silicon area of 0.90 mm × 0.89 mm using 1.0 micron technology. This initial layout was not optimized for area, and the area can be reduced significantly with better layout techniques. The clock driver and distribution circuitry accounts for about 70% of the total power consumption of the multiplier. Therefore it is important to design the clock drivers properly to minimize the power consumption. We designed a tree-based CMOS clock driver circuit to distribute the load of the clock tree into three driver stages. The third stage of the driver tree consists of sixteen inverting buffers which are driven by four inverting buffers in the second stage, and they are in turn driven by a single inverting buffer. We believe that more reduction in the power consumption can be obtained by further optimization in the clock drivers. We simulated the multiplier core together with the clock driver tree using SPICE with 1.0 micron technology parameters at 3.3 V supply voltage at room temperature. The SPICE waveforms for the output of the clock drivers and the eight most significant product bits for the 8-bit × 8-bit multiplier running at 500 MHz are shown in Figure 8. The average power consumption of the 8-bit × 8-bit multiplier including the clock drivers is found to be 0.8 W at 500 MHz.

[Figure 7: Layout of the 8-bit × 8-bit multiplier.]

[Figure 8: SPICE output of the 8-bit × 8-bit multiplier with clock drivers at 500 MHz, 3.3 V.]

Table 2: Description of the existing multipliers

name          precision    comments
Noll          8 × 8        1.0 μm NMOS, 3 V
Hatamian      8 × 8        2.5 μm CMOS, 5 V
Lu            12 × 12      1.0 μm CMOS, 5 V (quasi-domino), mult-accum.
Somasekhar    8 × 8        1.6 μm CMOS, 5 V (true 1-φ)
Ghosh         8 × 8        0.8 μm CMOS, 5 V (NPCPL)
This work     8 × 8        1.0 μm CMOS, 3.3 V (C²MOS)

Table 2 lists some of the existing pipelined multiplier designs with respect to their precision and process technology. A comparison of the performance and power consumption of the existing multipliers with the current work is presented in Table 3. The last column in Table 3 shows the power-delay product for the multiplier designs. Our C²MOS multiplier, simulated at 1.0 micron CMOS, runs faster than any of the existing designs with low power consumption. Observe that the C²MOS circuit has a much superior power-delay product, which is highly desirable for portable DSP applications.

Table 3: Comparison of different pipelined multipliers

name          clk-rate (MHz)    power (Watts)    latency (nsec)    pow × delay (mW/MHz)
Noll          330               1.5              54.5              4.54
Hatamian      70                0.25             228.57            3.57
Lu            200               1.3              65.0              6.5
Somasekhar    230               0.54             52.17             2.35
Ghosh         400               0.8              37.5              2.0
This work     500               0.8              44.0              1.6

5 Conclusion

Through the example of an 8-bit pipelined multiplier for unsigned numbers, we have shown that C²MOS is an energy-efficient logic family for very high-throughput applications. The multiplier presented in this paper is the fastest existing pipelined multiplier, with a throughput of 500 million multiplications per second and a power consumption of 0.8 Watt with 1.0 micron HP technology and 3.3 V supply voltage. The multiplier circuit is being fabricated as a tiny-chip using a 2.0 micron process for initial testing. The authors are currently investigating the system-level design of low-power and high-performance DSP systems for portable electronic equipment using C²MOS.

References

[1] D. Ghosh and S. K. Nandy. A 400 MHz Wave-Pipelined 8 × 8-bit Multiplier in CMOS Technology. In Proceedings of ICCD, pages 198-201, 1993.
[2] Mehdi Hatamian and Glenn L. Cash. A 70-MHz 8-bit × 8-bit Parallel Pipelined Multiplier in 2.5 μm CMOS. IEEE Journal of Solid-State Circuits, SC-21(4):505-513, August 1986.
[3] Y. Ji-Ren, I. Karlsson, and C. Svensson. A True Single-Phase-Clock Dynamic CMOS Circuit Technique. IEEE Journal of Solid-State Circuits, SC-22(5):899-901, October 1987.
[4] Fang Lu and Henry Samueli. A 200-MHz CMOS Pipelined Multiplier-Accumulator Using Quasi-Domino Dynamic Full-Adder Cell Design. IEEE Journal of Solid-State Circuits, 28(2):123-132, February 1993.
[5] E. D. Man and M. Schobinger. Power Dissipation in the Clock System of Highly Pipelined ULSI CMOS Circuits. In Proc. of International Workshop on Low Power Design, pages 133-138, 1994.
[6] T. G. Noll, D. Schmitt-Landsiedel, H. Klar, and G. Enders. A Pipelined 330-MHz Multiplier. IEEE Journal of Solid-State Circuits, SC-21(3):411-416, June 1986.
[7] D. Somasekhar and V. Visvanathan. A 230-MHz Half-Bit Level Pipelined Multiplier Using True Single-Phase Clocking. IEEE Transactions on VLSI, 1(4):415-422, December 1993.
[8] Y. Suzuki, K. Odagawa, and T. Abe. Clocked CMOS Calculator Circuitry. IEEE Journal of Solid-State Circuits, SC-8(6):462-469, December 1973.
[9] D. C. Wong, G. De Micheli, and M. Flynn. Designing High-Performance Digital Circuits Using Wave-Pipelining. IEEE Transactions on Computer-Aided Design, 12(1):25-46, Jan 1993.
[10] K. Yano, T. Yamanaka, T. Nishida, M. Saito, K. Shimohigashi, and Akihiro Shimizu. A 3.8-ns CMOS 16 × 16-b Multiplier Using Complementary Pass-Transistor Logic. IEEE Journal of Solid-State Circuits, 25(2):388-395, April 1990.