REPORT DOCUMENTATION PAGE Form Approved OMB NO. 0704-0188 Public Reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comment regarding this burden estimates or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188,) Washington, DC 20503. 1. AGENCY USE ONLY ( Leave Blank) 2. REPORT DATE 3. REPORT TYPE AND DATES COVERED 4. TITLE AND SUBTITLE 5. FUNDING NUMBERS 6. AUTHOR(S) 7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 8. PERFORMING ORGANIZATION REPORT NUMBER 9. SPONSORING / MONITORING AGENCY NAME(S) AND ADDRESS(ES) U. S. Army Research Office P.O. Box 12211 Research Triangle Park, NC 27709-2211 10. SPONSORING / MONITORING AGENCY REPORT NUMBER 11. SUPPLEMENTARY NOTES The views, opinions and/or findings contained in this report are those of the author(s) and should not be construed as an official Department of the Army position, policy or decision, unless so designated by other documentation. 12 a. DISTRIBUTION / AVAILABILITY STATEMENT Approved for public release; distribution unlimited. 13. ABSTRACT (Maximum 200 words) 12 b. DISTRIBUTION CODE. 14. SUBJECT TERMS 15. NUMBER OF PAGES 16. PRICE CODE 17. SECURITY CLASSIFICATION OR REPORT UNCLASSIFIED NSN 7540-01-280-5500 18. SECURITY CLASSIFICATION ON THIS PAGE UNCLASSIFIED 19. SECURITY CLASSIFICATION OF ABSTRACT UNCLASSIFIED 20. LIMITATION OF ABSTRACT UL Standard Form 298 (Rev.2-89) Prescribed by ANSI Std. 239-18 298-102

Design Technologies for Energy-Efficient VLSI Systems
Final Progress Report

Prof. Marios C. Papaefthymiou, Principal Investigator
20 January 2007

U.S. Army Research Office
Grant No. DAAD19-03-1-0122

Advanced Computer Architecture Laboratory
Department of Electrical Engineering and Computer Science
University of Michigan
Ann Arbor, MI 48109-2122

Approved for public release. Distribution unlimited.

The views, opinions, and/or findings contained in this report are those of the author(s) and should not be construed as an official Department of the Army position, policy, or decision, unless so designated by other documentation.

Contents

1 Statement of the problems studied
2 Summary of the most important findings
  2.1 Boost Logic
  2.2 Charge-recovery ASIC design
  2.3 GHz-class resonant clocking
  2.4 Charge-recovery SRAM
3 List of publications and technical reports
  3.1 Papers published in peer-reviewed journals
  3.2 Papers published in non-peer-reviewed journals or in conference proceedings
  3.3 Papers presented at meetings but not published in conference proceedings
  3.4 Manuscripts submitted but not published
  3.5 Technical reports submitted to ARO
4 List of all participating research personnel
5 Report of inventions

1 Statement of the problems studied

This project has investigated novel design technologies for energy-efficient VLSI systems. Its primary focus has been on charge-recovery circuits. These circuits achieve higher energy efficiency than their conventional counterparts by steering currents to flow across devices with low voltage drops, while recycling undissipated energy in parasitic capacitors. Previous investigations into charge recovery have resulted in complex circuits and architectures that are impractical for high-speed design. This project has led to the discovery of practical low-complexity charge-recovery circuits that achieve high energy efficiency at clock frequencies in excess of 1GHz. The results of this research have been validated through silicon prototyping and experimentation. For four of the inventions resulting from this project, the University of Michigan has filed utility and provisional patent applications with the US Patent and Trademark Office.

2 Summary of the most important findings

The main contributions of this research project were the following.

Boost Logic for GHz charge-recovery operation: Boost Logic is a novel dynamic charge-recovery family that operates with a two-phase power-clock waveform. In post-layout Spice simulations of 16-bit multipliers in a 130nm bulk silicon process at 1GHz, Boost Logic implementations achieve 5-10 times higher energy efficiency than minimum-energy pipelined and voltage-scaled static CMOS at the expense of 2-3 times longer latency. In a fully integrated test chip implemented using a 130nm bulk silicon process and on-chip inductors, chains of Boost Logic gates operate at clock frequencies up to 1.3GHz with a 1.5V supply. When resonating at 850MHz with a 1.2V supply, the Boost Logic test chip achieves 60% charge recovery. Boost Logic is the fastest charge-recovery design reported to date.
Charge-recovery ASIC design methodology: This ASIC methodology relies on a novel charge-recovery flip-flop design and a metal-only clock distribution network. By enabling the recovery of charge from the clock distribution network, this methodology yields ASIC designs with minimal clock power dissipation. A resonant-clocked ASIC for the Discrete Wavelet Transform has been designed using this methodology and industry-standard tools. On-chip circuitry is used to generate a single-phase resonant clock of sinusoidal shape. Correct operation has been confirmed experimentally for clock frequencies up to 300MHz, with measured clock power savings ranging between 60% and 75% at 115MHz, depending on primary input activity.

GHz-class resonant clocking: The potential of resonant clocking for energy-efficient design at GHz-class operating frequencies has been evaluated through chip measurements of a 1.1GHz resonant clock distribution network in a 130nm bulk silicon process. Energy savings on the order of 45% have been demonstrated.

Low-power charge-recovery static memory (SRAMs): The proposed SRAM architecture relies on balanced loading to achieve high-efficiency charge recovery from the bit lines. In Spice simulations of SRAM arrays in a 250nm bulk silicon process, the proposed architecture dissipates 27% less power than its conventional counterpart.

The remainder of this section provides more details about each of our main contributions.

2.1 Boost Logic

Charge-recovery architectures reduce energy dissipation by steering currents across devices with low voltage differences while recycling the energy stored in their capacitors [1, 4]. The efficient operation of these designs is the result of an energy-speed trade-off. This project has led to the discovery of Boost Logic, a high-speed charge-recovery circuit family that is capable of operating at GHz-class frequencies with high efficiency by trading off power dissipation for latency of operation [11, 12, 13, 14]. Boost Logic achieves significant energy savings over voltage-scaled static CMOS across a range of frequencies much higher than previously demonstrated in the charge-recovery literature.

A unique feature of Boost Logic that enables energy-efficient and high-throughput operation is an aggressively scaled, conventionally switching Logic stage that operates in tandem with a charge-recovery Boost stage. Logic performs the logical evaluation of a Boost Logic gate, operating at an ultra-low DC supply voltage of approximately one threshold voltage. After Logic pre-resolves the differential outputs of a Boost Logic gate to the level of about one threshold voltage, Boost amplifies the difference between the output nodes to the full rail in an energy-efficient charge-recovery manner, providing a large overdrive to fanout gates and thereby reducing delay in their Logic stages. Figure 1 shows the structure of a Boost Logic gate.

Figure 1: Schematic of Boost Logic gate with pseudo-NMOS pulldown.
The Logic stage can be implemented in any transistor topology as long as it supports the use of clocked transistors M5-M8. These clocked transistors decouple the Logic stage from the output nodes when the Boost stage drives them. The pseudo-NMOS implementation shown in Figure 1 trades off the voltage difference in the pre-resolved output nodes (pseudo-NMOS gates do not swing to the full rail) for lower gate loading to achieve better performance at higher operating frequencies. At lower operating frequencies, the use of a dual-rail CMOS topology in Logic offers the advantages of full-rail evaluation, the lack of crowbar current, and reduced susceptibility to process variation.

The DC power-supply rails of Logic are set so that the potential difference between them is approximately one threshold voltage. The Boost stage resembles back-to-back inverters, with the only difference being that the DC supply rails are replaced by the two power-clock phases φ and φ̄.

Figure 2 shows the outputs of an inverter in a ring configuration of four Boost Logic inverters. During Logic, an initial potential difference is developed between the two complementary outputs. During Boost, Logic is deactivated, and the power-clock waveforms drive the outputs to the full rails. These outputs in turn drive fanout Logic stages. As the power-clock phases swing back toward each other, the transistors in the Boost stage enter cutoff, isolating Boost from the outputs. At that time, Logic once again begins to evaluate.

Figure 2: Simulation waveform of Boost Logic inverter (voltages of the two outputs and the two power-clock phases vs. time, 0 to 4ns).

To compare the performance of Boost Logic designs with their conventional counterparts, we designed 16-bit carry-save multipliers in static CMOS and Boost Logic using a 130nm bulk silicon process. In the Boost multiplier, a two-phase power-clock waveform was obtained using an H-bridge clock generator. The clock generator was driven at the target frequency, and the inductor value was set so that the natural frequency of the LC system formed by the parasitic capacitance of the circuit and the inductor of the clock generator matched the target operating frequency of 1GHz. The static CMOS multiplier was pipelined and voltage-scaled for minimum power, achieving the target clock frequency of 1GHz with a 1V supply and 8 cycles of latency. With regular-Vth devices, the latency of the Boost multiplier was 3 times longer (24 cycles), but its energy efficiency was 5 times higher (15.8pJ vs. 80.1pJ per cycle).
With low-Vth devices, the latency of the Boost multiplier was 2 times longer (16 cycles), while its energy efficiency was almost 10 times higher (10.64pJ vs. 80.1pJ per cycle).

In a fully integrated test chip that we implemented using a 130nm bulk silicon process and on-chip inductors, chains of Boost Logic gates operate at clock frequencies up to 1.3GHz with a 1.5V supply. Figure 3 shows a block diagram and Figure 4 shows a microphotograph of our test chip. The test structures on the chip were 8 chains of AND, OR, XOR, and INV gates. Each chain had 200 gates. An on-chip H-bridge clock generator was used with a 2.4nH on-chip inductor. The four clock generator switches were driven at the target clock frequency. Figure 5 gives measured current and inferred power dissipation in the supply of the test chip over a range of operating frequencies from 700MHz to 1.1GHz. Correct operation was verified up to 1.3GHz. The natural frequency of the chip was measured at approximately 850MHz. At that frequency, energy per cycle was measured at 26pJ, yielding a 60% reduction in power dissipation over switching of the same capacitive load.

Figure 3: Block diagram of Boost Logic test chip.

Figure 4: Microphotograph of Boost Logic test chip.

Figure 5: Measured current and corresponding per-cycle energy vs. frequency.

2.2 Charge-recovery ASIC design

Another significant contribution of this project has been the development of an ASIC methodology for charge-recovery design. This methodology relies on a flip-flop architecture that can operate with a sinusoidal clock to yield a resonant-clocked ASIC with a metal-only clock distribution network.

To demonstrate the effectiveness of our charge-recovery ASIC design methodology, we have designed and tested a synthesized ASIC that performs a 7-bit Discrete Wavelet Transform. The chip has been fabricated in a 250nm bulk-CMOS process through MOSIS. Comprising close to 4,000 gates, our ASIC is clocked by a resonant charge-recovering waveform of sinusoidal shape. Figure 6 shows a microphotograph of our resonant-clocked chip [17]. The lower left corner of the die contains our experimental energy-recovering design, which consists of an ASIC core, an on-chip resonant clock generator, and some testing logic. The energy-recovering flip-flops are driven by a resonant waveform generated using an off-chip surface-mount inductor and the on-chip power-clock generator.

Figure 6: Microphotograph of the resonant-clocked ASIC chip.

A schematic of the energy-recovering flip-flop used in our ASIC is shown in Figure 7. This flip-flop consists of a charge-recovery dynamic buffer that drives a pair of cross-coupled NOR gates as the static latch element. Our flip-flop latches on rising pulses of the power-clock. The input needs to be stable by the time the power-clock is roughly halfway to its peak, and should be held stable until the power-clock is at the peak. The flip-flop draws more current from the power-clock when active (i.e., when the data is changing), thus changing the effective load seen by the power-clock generator.

Figure 7: Energy-recovering sinusoidal flip-flop used in the resonant-clocked ASIC chip.

Our chip includes a single-cycle feedback control resonant power-clock generator, shown in Figure 8, that is capable of reacting to changes in its load. The amplitude of the power-clock signal is sampled and compared against a reference level. The result of this comparison is used to decide, on a cycle-by-cycle basis, whether or not to turn on the main NMOS power switch to pump more energy into the power-clock. This control is critical for achieving ultra-low dissipation when the ASIC is idling.

Figure 8: Clock generator used in ASIC chip.

Figure 9 shows the measured energy dissipation of the clock network in our resonant-clocked ASIC chip at several frequencies between 100MHz and 300MHz. At each frequency point, the voltage was scaled down to the minimum required for correct operation.
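The cycle-by-cycle amplitude control described for the power-clock generator of Figure 8 can be illustrated with a small behavioral sketch. This is not the circuit used on the chip: the function name, the fractional per-cycle loss, and the fixed amplitude boost per replenishing pulse are all hypothetical simplifications chosen only to show the compare-then-pump decision.

```python
def simulate_single_cycle_control(cycles, v_ref, v_start, loss_per_cycle, boost_per_pulse):
    """Behavioral model of a cycle-by-cycle amplitude controller.

    All parameters are hypothetical: amplitudes are in volts, loss_per_cycle
    is the fraction of amplitude lost per cycle (a stand-in for the load),
    and boost_per_pulse is the amplitude added when the power switch fires.
    Returns a list of (fired, amplitude) tuples, one per cycle.
    """
    amplitude = v_start
    history = []
    for _ in range(cycles):
        # Sample the power-clock amplitude and compare it with the reference.
        fired = amplitude < v_ref
        if fired:
            # Turn on the power switch for this cycle to pump in energy.
            amplitude += boost_per_pulse
        # Losses in the load reduce the amplitude every cycle.
        amplitude *= (1.0 - loss_per_cycle)
        history.append((fired, amplitude))
    return history


def count_pulses(history):
    """Number of cycles in which the power switch was turned on."""
    return sum(1 for fired, _ in history if fired)


# Hypothetical comparison: an idle load (low loss) vs. an active load (high loss).
idle = simulate_single_cycle_control(100, v_ref=1.0, v_start=1.0,
                                     loss_per_cycle=0.01, boost_per_pulse=0.05)
active = simulate_single_cycle_control(100, v_ref=1.0, v_start=1.0,
                                       loss_per_cycle=0.05, boost_per_pulse=0.05)
print("replenishing pulses, idle load:  ", count_pulses(idle))
print("replenishing pulses, active load:", count_pulses(active))
```

In this toy model the idle load needs far fewer replenishing pulses than the active one, which is the qualitative behavior the text attributes to the controller when the ASIC is idling.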

Figure 9: Measured power consumption of the resonant-clocked ASIC (power consumption in mW vs. operating frequency in MHz). Two curves are plotted: C_CLK V^2 f for the clock network itself (C_CLK = 112pF), and P(CLK.R), which includes the power consumption of the clock generator and flip-flops.

The inductor and DC supplies were connected externally. For reference, we plot a quadratic curve fit to the function C_CLK V^2 f evaluated at each of the (voltage, frequency) pairs. This curve represents the dissipation required to drive the same clock capacitance if charge-recovery techniques were not used. At f = 300MHz, the clock was overdriven using an inductance value larger than 1/(ω^2 C), resulting in suboptimal power dissipation at that frequency. At 205MHz, the measured clock power dissipation was 4.5mW, about 5 times less than required to drive the same clock capacitance with conventional means. These dramatic power savings are due to operation near the resonance of the inductor in conjunction with the clock capacitance.

Figure 10: Measured power-clock spectrum at 200MHz (relative magnitude in dB vs. frequency in MHz).

In addition to reduced power dissipation, charge-recovery circuitry has the potential to operate with substantially reduced electromagnetic interference. To provide empirical evidence in support of this largely unexplored fact, we analyzed the spectrum of the measured power-clock waveform when resonating at 200MHz. The spectrum obtained is shown in Figure 10, zoomed in on the region of interest from 0 to 1GHz. This data was obtained by recording 100,000 voltage samples at 100ps/sample at the off-chip inductor terminal. Assuming linear characteristics from the parasitic elements between the inductor terminal and the on-chip clock network, this data should be proportional to the actual clock signal on-chip. The graph shows the presence of substantially attenuated odd and even harmonics. Specifically, the first 3 harmonics are 22dB, 36dB, and 43dB below the fundamental, respectively. In contrast, the first harmonic of a square waveform at 600MHz is about 12dB below the fundamental. The spur at roughly 10MHz could be attributed to a periodicity in the datapath self-test activity, as it corresponds roughly to the spectrum of one of the self-test signature outputs. An alternate hypothesis is that it results from some coupling with one of the I/O pads slewing.

2.3 GHz-class resonant clocking

Resonant clock distribution has the potential to reduce clock power and achieve low clock skew and jitter [3]. In this project, a two-phase resonant clock network with a programmable driver and loading has been designed and evaluated. This network uses a clock generator that is driven at a reference clock frequency. It also uses the size and duty cycle of the replenishing switches in the clock generator to adjust clock amplitude. Programmable loading allows for different balanced/imbalanced load configurations, enabling the investigation of clock amplitude, power, and skew at resonance and off resonance for operating frequencies in the 900MHz to 1.2GHz range. Included is on-chip circuitry for measuring skew and clock amplitude. Figure 11 gives a microphotograph of our resonant-clocked test chip.

Figure 11: Microphotograph of GHz resonant clock distribution test chip.
4nH spiral inductors connected in parallel are placed symmetrically around the center of the H-tree clock network. A single central H-bridge clock generator is used to compensate for power losses and maintain clock amplitude using switches of programmable size that are driven by replenishing pulses of programmable duty cycle.

Figure 12 gives measured total power dissipation as a function of operating frequency. All data are obtained with a total capacitance of approximately 52pF per phase, yielding a resonant frequency of 990MHz. For each data point, switch size and duty cycle have been chosen to yield minimum power dissipation at the corresponding operating frequency while maintaining the same average amplitude over all 16 leaf nodes in the network. The curves report average results for 4 test chips. The maximum difference in measured amplitude and power among the 4 chips is less than 6%. Power dissipation is minimized when the system is driven at its resonant frequency. Maximum relative power savings of the resonant clock system over conventional clocking are in the 45% range.

Figure 12: Measured total power dissipation and efficiency vs. frequency (power dissipation in mW and relative power savings in % vs. frequency in MHz, for curves labeled 1.01V to 1.16V).

Figure 13 gives measured clock amplitude and power dissipation as functions of frequency for four switch size/duty cycle configurations (average from 4 test chips). The results show that configurations with larger switch size and smaller replenishing duty cycle can dissipate less power, while maintaining the same amplitude. When the driving frequency is 10% off resonance at 1.1GHz, power dissipation increases by 3[...] (720µm, 30%) [...] than the (580µm, 44%) [...] (630µm, 44%) results.

Figure 13: Measured power dissipation and clock amplitude vs. frequency (clock amplitude in V and power dissipation in mW vs. frequency in MHz, for configurations w810d30, w630d44, w720d30, and w580d44).
In general, larger switches reduce resistance in the clock generator and increase clock amplitude. A smaller duty cycle reduces current from Vdd to Vss, hence lowering power dissipation.
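The relation between the reported capacitance (about 52pF per phase) and the reported resonant frequency (990MHz) can be checked with the standard lumped LC formula f = 1/(2π√(LC)). The sketch below is only a back-of-the-envelope aid: the report does not state the effective inductance or the equivalent circuit, so both cases shown (an inductor resonating with one phase capacitance, and an inductor connected across the two phases so that it sees the two capacitances in series) are assumptions, and the function names are hypothetical.

```python
import math


def resonant_frequency(inductance_h, capacitance_f):
    """Natural frequency in Hz of an ideal lumped LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def inductance_for_resonance(frequency_hz, capacitance_f):
    """Inductance in H that resonates with capacitance_f at frequency_hz."""
    omega = 2.0 * math.pi * frequency_hz
    return 1.0 / (omega ** 2 * capacitance_f)


C_PHASE = 52e-12   # capacitance per phase, from the text
F_RES = 990e6      # resonant frequency, from the text

# Assumption 1: the inductor resonates with a single phase capacitance.
l_single = inductance_for_resonance(F_RES, C_PHASE)

# Assumption 2: the inductor sits across the two phases and therefore
# sees the two phase capacitances in series (C_PHASE / 2).
l_differential = inductance_for_resonance(F_RES, C_PHASE / 2.0)

print(f"single-phase model: L = {l_single * 1e9:.3f} nH")
print(f"differential model: L = {l_differential * 1e9:.3f} nH")
print(f"round-trip check:   f = {resonant_frequency(l_single, C_PHASE) / 1e6:.1f} MHz")
```

Under these idealized assumptions the required effective inductance comes out at roughly 0.5nH or 1nH, respectively; parasitic resistance and distributed wiring, which the formula ignores, would shift the real figure.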

2.4 Charge-recovery SRAM

The application of energy recovery to memory design is particularly compelling, due to the substantial switching capacitance in memory arrays. Early work on energy recovery memories has used multiple power-clocks [15, 9, 2, 8, 10, 16], resulting in designs with multiple-cycle latency and limited scalability. Low-complexity energy recovery memories with a single-phase power-clock have been reported in [7, 6]. These memories exhibit data/operation-dependent variations in the capacitive load presented to the power-clock, however, resulting in limited energy efficiency due to poor resonance.

In this project, we have explored CLM, an energy recovery SRAM architecture that presents a constant capacitive load to the power-clock, regardless of memory operations or data access patterns [5]. In CLM, non-selective precharge is used to ensure a constant memory load during write operations, regardless of data pattern. Furthermore, when bit lines are disconnected from the power-clock during non-write cycles, dummy bit lines of equal capacitance are connected to the power-clock, maintaining a constant memory load. CLM provides single-cycle operations using a single-phase power-clock for low complexity and efficient high-speed operation.

A schematic diagram of a bit-line in the proposed constant-load charge-recovery memory is shown in Figure 14. In this design, each bit-line pair BLT and BLF is shadowed by a dummy bit-line. Each bit-line is selectively connected to the column memory cells and a sense amplifier. The dummy bit-line has no connection with the column cells or the sense amplifier, however. During each write cycle, exactly one of the drivers DT or DF turns on, transferring charge between the system inductor and exactly one of the bit-lines BLT or BLF, respectively.

Figure 14: CLM column with dummy bit-line and charge recovery.
During each non-write cycle, the driver DD turns on, connecting the dummy bit-line with the power-clock. The dummy bit-line is designed so that it presents approximately the same load on the power-clock as an actual bit-line. Thus, the load of the power-clock remains constant during write and non-write cycles, maintaining a constant amplitude and maximizing energy efficiency.
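The constant-load idea can be made concrete with a small behavioral sketch of one column. It is an illustration only: the capacitance value is a normalized placeholder, the mapping of data bit 1 to driver DT and bit 0 to driver DF is an assumption, and the `unbalanced_column_load` baseline is a hypothetical stand-in for a design without a dummy bit-line rather than any memory described in the report.

```python
C_BITLINE = 1.0  # normalized bit-line capacitance (hypothetical value)


def clm_column_load(is_write, data_bit, c_bitline=C_BITLINE, c_dummy=C_BITLINE):
    """Return (active_driver, load) seen by the power-clock for one cycle.

    Write cycle: exactly one of DT/DF turns on and connects BLT or BLF.
    Non-write cycle: DD turns on and connects the dummy bit-line.
    The bit-to-driver mapping (1 -> DT, 0 -> DF) is an assumption.
    """
    if is_write:
        driver = "DT" if data_bit else "DF"
        return driver, c_bitline
    return "DD", c_dummy


def unbalanced_column_load(is_write, data_bit, c_bitline=C_BITLINE):
    """Hypothetical baseline with no dummy bit-line: no load when not writing."""
    if is_write:
        driver = "DT" if data_bit else "DF"
        return driver, c_bitline
    return None, 0.0


# A short, arbitrary mix of (is_write, data_bit) operations.
operations = [(True, 1), (False, 0), (True, 0), (False, 1), (True, 1)]

clm_loads = [clm_column_load(w, d)[1] for w, d in operations]
baseline_loads = [unbalanced_column_load(w, d)[1] for w, d in operations]

print("CLM loads:     ", clm_loads)
print("baseline loads:", baseline_loads)
```

With the dummy bit-line matched to a real bit-line, the load list for CLM contains a single value for every operation, whereas the baseline alternates between two values; a mismatch between `c_dummy` and `c_bitline` would reintroduce an operation-dependent load.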

To assess the performance of our proposed energy recovery SRAM architecture, we have designed a 128×256 SRAM using a 250nm bulk silicon process. In Spice simulations with a 2.5V supply, CLM functions correctly at clock frequencies up to 400MHz. Using an ideal power-clock waveform, CLM achieves power reductions in excess of 37% over its conventional counterpart with a 42/58 write/non-write operation mix. Assuming lossless power-clock generation, the proposed SRAM dissipates 38% less power than its conventional counterpart at 400MHz, 2.5V. When the power dissipation of the power-clock generator is taken into account, overall power savings are 27%.

3 List of publications and technical reports

3.1 Papers published in peer-reviewed journals

V. S. Sathe, J.-Y. Chueh, and M. C. Papaefthymiou. Energy-efficient GHz-class charge-recovery logic. IEEE Journal of Solid State Circuits, Vol. 42, No. 1, January 2007.

X. Liu, Y. Peng, and M. C. Papaefthymiou. Practical repeater insertion for low power: What repeater library do we need? IEEE Transactions on Computer-Aided Design of Integrated Circuits, Vol. 25, No. 5, pp. 917-924, May 2006.

X. Liu and M. C. Papaefthymiou. HyPE: Hybrid power estimation for IP-based systems-on-chip. IEEE Transactions on Computer-Aided Design of Integrated Circuits, Vol. 24, No. 7, pp. 1089-1103, July 2005.

S. Kim, C. Ziesler, and M. C. Papaefthymiou. Charge-recovery computing on silicon. IEEE Transactions on Computers, Vol. 54, No. 6, June 2005.

X. Liu and M. C. Papaefthymiou. A Markov chain sequence generator for power macromodeling. IEEE Transactions on Computer-Aided Design of Integrated Circuits, Vol. 23, No. 7, pp. 1048-1062, July 2004.

J. Kim and M. C. Papaefthymiou. Block-based multi-period refresh for energy efficient dynamic memory. IEEE Transactions on VLSI Systems, Vol. 11, No. 6, pp. 1006-1018, December 2003.

X. Liu and M. C. Papaefthymiou. Design of a 20 Mb/s 256-state Viterbi decoder. IEEE Transactions on VLSI Systems, Vol. 11, No. 6, pp. 965-975, December 2003.

S. Kim, C. Ziesler, and M. C. Papaefthymiou. Fine-grain real-time reconfigurable pipelining. IBM Journal of Research and Development, Vol. 47, No. 5/6, pp. 599-609, September/November 2003.

3.2 Papers published in non-peer-reviewed journals or in conference proceedings

J.-Y. Chueh, V. Sathe, and M. C. Papaefthymiou. 900MHz to 1.2GHz two-phase resonant clock network with programmable driver and loading. In IEEE 2006 Custom Integrated Circuits Conference, September 2006.

V. Sathe, J.-Y. Chueh, and M. C. Papaefthymiou. A 1.1GHz charge recovery logic. In International Solid-State Circuits Conference, February 2006.

V. Sathe, C. Ziesler, and M. C. Papaefthymiou. GHz-class charge recovery logic. In International Symposium on Low-Power Electronics and Design, August 2005.

V. Sathe, J.-Y. Chueh, J. Kim, C. Ziesler, S. Kim, and M. C. Papaefthymiou. Fast, efficient, recovering, and irreversible. In 1st Workshop on Reversible Computing of the 2005 ACM Computing Frontiers Conference, May 2005.

J.-Y. Chueh, V. Sathe, and M. C. Papaefthymiou. Two-phase resonant clock distribution. In Proceedings of the 2005 IEEE International Symposium on VLSI, May 2005.

V. Sathe, M. C. Papaefthymiou, and C. Ziesler. Boost Logic: A high-speed energy recovery circuit family. In Proceedings of the 2005 IEEE International Symposium on VLSI, May 2005.

X. Liu, Y. Peng, and M. C. Papaefthymiou. RIP: An efficient hybrid repeater insertion scheme for low power. In Proceedings of the 2005 Conference on Design, Automation, and Test in Europe, March 2005.

V. S. Sathe, C. H. Ziesler, M. C. Papaefthymiou, S. Kim, and S. Kosonocky. A synchronous interface for SoCs with multiple voltage and clock domains. In 17th IEEE International SOC Conference, September 2004.

D. Velenis, E. G. Friedman, and M. C. Papaefthymiou. Clock tree layout design for reduced delay uncertainty. In 17th IEEE International SOC Conference, September 2004.

J. Kim and M. C. Papaefthymiou. Constant-load energy recovery memory for efficient high-speed operation. In International Symposium on Low-Power Electronics and Design, August 2004.

X. Liu, Y. Peng, and M. C. Papaefthymiou. Practical repeater insertion for low power: What repeater library do we need? In Proceedings of the 41st ACM/IEEE Design Automation Conference, June 2004.

J.-Y. Chueh, C. Ziesler, and M. C. Papaefthymiou. Empirical evaluation of timing and power in resonant clock distribution. In 2004 IEEE International Symposium on Circuits and Systems, May 2004.

J. Kim, M. C. Papaefthymiou, and A. Tayyab. An algorithm for geometric load balancing with two constraints. In Proceedings of the 18th International Parallel and Distributed Processing Symposium, April 2004.

J.-Y. Chueh, C. Ziesler, and M. C. Papaefthymiou. Experimental evaluation of resonant clock distribution. In Proceedings of the 2004 IEEE International Symposium on VLSI, February 2004.

C. Ziesler, J. Kim, V. Sathe, and M. C. Papaefthymiou. A 225MHz resonant clocked ASIC chip. In International Symposium on Low Power Electronics and Design, August 2003.

3.3 Papers presented at meetings but not published in conference proceedings

N/A

3.4 Manuscripts submitted but not published

V. Sathe, J. Kao, and M. C. Papaefthymiou. A 1.2GHz resonant-clocked 8-bit 14-tap FIR filter.

3.5 Technical reports submitted to ARO

N/A

4 List of all participating research personnel

Prof. Marios C. Papaefthymiou, PI. Promoted to Professor of Electrical Engineering and Computer Science, effective September 2005. Director, Advanced Computer Architecture Laboratory, University of Michigan, September 2000-present.

Conrad Ziesler, Graduate Research Assistant. Ph.D., April 2004. Ziesler declined an Assistant Professor position at the University of Minnesota to join a West Coast startup. He is currently with PA Semi, Inc.

Joohee Kim, Graduate Research Assistant. Ph.D., August 2004. Kim assumed a position with Hynix Corporation.

Juang-Ying Chueh, Graduate Research Assistant. Ph.D., April 2006. Chueh assumed a position with AMD Corporation.

Visvesh Sathe, Graduate Research Assistant.

Sujay Phadke, Graduate Research Assistant.

5 Report of inventions

The following inventions were disclosed to the University of Michigan. Patent applications were filed as indicated. These inventions have been exclusively licensed by Cyclos Semiconductor, a venture-backed startup that commercializes ultra-low-power semiconductor technologies.

Clock distribution network architecture for resonant clocked systems. File No. 3531, University of Michigan Technology Transfer Office, October 2006. US Provisional Patent Application, December 2006.

Energy-recovering low-swing low-activity data bus. File No. 2929, University of Michigan Technology Transfer Office, August 2004.

Automatic synchronization of resonant and legacy clock domains. File No. 2928, University of Michigan Technology Transfer Office, August 2004.

Dual-frequency resonant clocking. File No. 2927, University of Michigan Technology Transfer Office, August 2004.

Dynamic frequency and voltage scaling of resonant clocks. File No. 2926, University of Michigan Technology Transfer Office, August 2004.

Energy recovery boost logic. File No. 2833, University of Michigan Technology Transfer Office, March 2004. US Provisional Patent Application, June 2004. US Patent Application, June 2005.

Automatic tuning system for resonant clock generator. File No. 2803, University of Michigan Technology Transfer Office, February 2004. US Provisional Patent Application, June 2004.

Low power flip-flop with gate enable and scan chain enable. File No. 2802, University of Michigan Technology Transfer Office, June 2004. US Provisional Patent Application, October 2005. US Patent Application entitled "Ramped clock digital storage control" was filed October 2006.

References

[1] W. C. Athas, L. J. Svensson, J. G. Koller, N. Tzartzanis, and Y. Chou. Low-power digital systems based on adiabatic-switching principles. IEEE Transactions on VLSI Systems, 2(4):398-406, December 1994.

[2] S. Avery and M. Jabri. A three-port adiabatic register file suitable for embedded applications. In International Symposium on Low Power Electronics and Design, pages 288-292. IEEE, August 1998.

[3] S. C. Chan, P. J. Restle, K. L. Shepard, N. K. James, and R. L. French. A 4.6GHz resonant global clock distribution network. In IEEE International Solid-State Circuits Conference, pages 342-343, February 2004.

[4] J. S. Denker. A review of adiabatic computing. In Proc. of the 1994 Symposium on Low Power Electronics/Digest of Technical Papers, pages 94-97, October 1994.

[5] J. Kim and M. C. Papaefthymiou. Constant-load energy recovery memory for efficient high-speed operation. In Proc. of International Symposium on Low Power Electronics and Design, August 2004.

[6] J. Kim and C. H. Ziesler. Fixed-load energy recovery memory for low power. In Proc. IEEE International Symposium on VLSI, February 2004.

[7] J. Kim, C. H. Ziesler, and M. C. Papaefthymiou. Energy recovering static memory. In Proc. of International Symposium on Low Power Electronics and Design, August 2002.

[8] J. H. Kwon, J. Lim, and S. I. Chae. A three-port nRERL register file for ultra-low-energy applications. In International Symposium on Low Power Electronics and Design, pages 161-166. IEEE, August 2000.

[9] Y. Moon and D. Jeong. An efficient charge recovery logic circuit. IEEE Journal of Solid-State Circuits, SC-31(4):514-522, April 1996.

[10] K. W. Ng and K. T. Lau. A novel adiabatic register file design. Journal of Circuits, Systems, and Computers, 10(1):67-76, 2000.

[11] V. Sathe, J.-Y. Chueh, and M. C. Papaefthymiou. A 1.1GHz charge recovery logic. In IEEE International Solid-State Circuits Conference, February 2006.

[12] V. Sathe, J.-Y. Chueh, and M. C. Papaefthymiou. Energy-efficient GHz-class charge-recovery logic. IEEE Journal of Solid-State Circuits, January 2007.

[13] V. Sathe, M. C. Papaefthymiou, and C. Ziesler. Boost logic: A high-speed energy-recovery circuit family. In Proc. of IEEE International Symposium on VLSI, May 2005.

[14] V. Sathe, M. C. Papaefthymiou, and C. Ziesler. A GHz-class charge recovery logic. In Proc. of International Symposium on Low-Power Electronics and Design, August 2005.

[15] D. Somasekhar, Y. Ye, and K. Roy. An energy recovery static RAM memory core. In Symposium on Low Power Electronics, pages 62-63, 1995.

[16] N. Tzartzanis, W. C. Athas, and L. Svensson. A low-power SRAM with resonantly powered data, address, word, and bit lines. In European Solid-State Circuits Conference, 2000.

[17] C. Ziesler, J. Kim, V. Sathe, and M. C. Papaefthymiou. A 225MHz resonant clocked ASIC chip. In Proc. of International Symposium on Low-Power Electronics and Design, August 2003.