Energy-Recovery CMOS Design

Jay Moon (University of Southern California) and Bill Athas (Apple Computer, Inc.)
jsmoon@usc.edu / athas@apple.com
UCLA EE215B, March 05, 2001

Outline
- Motivation
- Review of CMOS switching energetics
- Adiabatic charging
- Energy-Recovery CMOS
  - Stepwise charging
  - Clock-powered logic (CPL)
  - Harmonic resonant charging
- Future research

Motivation
- High-performance and low-power computing
- It's becoming increasingly difficult to get rid of the heat generated by VLSI chips
- Battery life for portables

Types of power dissipation
- Dynamic power dissipation
  - Charging and discharging capacitances
  - Short-circuit current
- Static power dissipation
  - Sub-threshold currents
  - Drain-junction leakage

Capacitor energy equations
- Suppose that at time t a charge q has been transferred from one plate to the other. The potential is $v = q/C$.
- For a charge-transfer increment $dq$, the additional work is $dE = v\,dq = \frac{q}{C}\,dq$.
- For the total charge transfer $Q$: $E = \int_0^Q \frac{q}{C}\,dq = \frac{Q^2}{2C} = \frac{1}{2}CV^2$, using $Q = CV$.
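The integral can be checked numerically. This minimal Python sketch (ours, not from the slides) sums the work $(q/C)\,dq$ over small increments and compares it with $\frac{1}{2}CV^2$; the function names and the 100 fF / 3.3 V values are hypothetical.

```python
def capacitor_energy_numeric(C: float, V: float, steps: int = 100_000) -> float:
    """Midpoint-rule sum of dE = (q/C) dq for q from 0 to Q = C*V."""
    Q = C * V
    dq = Q / steps
    energy = 0.0
    for k in range(steps):
        q = (k + 0.5) * dq        # charge already moved, at the middle of this increment
        energy += (q / C) * dq    # work = potential v = q/C times the increment dq
    return energy


def capacitor_energy_closed_form(C: float, V: float) -> float:
    """E = Q^2 / (2C) = (1/2) C V^2."""
    return 0.5 * C * V ** 2


if __name__ == "__main__":
    C, V = 100e-15, 3.3           # hypothetical 100 fF node charged to 3.3 V
    print(capacitor_energy_numeric(C, V))
    print(capacitor_energy_closed_form(C, V))
```

Because the integrand is linear in q, the midpoint rule reproduces the closed form up to floating-point rounding.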

CMOS switching energetics
Interestingly (and thankfully), CMOS energetics can be analyzed and understood from the CMOS inverter.
- Charge is conserved
- Energy is conserved
- Neglect leakage current
- Neglect short-circuit current
[Figure: CMOS inverter powered from the supply (PS) rails V and 0, driving load C; $E_{PS} = VQ = CV^2$]

The charging event
[Figure: inverter charging C from the supply rail V; $E_{PS} = VQ = CV^2$, $E_{HEAT} = \frac{1}{2}CV^2$]
- The power supply delivers a charge packet of size $Q = CV$: $E_{PS} = CV \cdot V = CV^2$
- The capacitor stores $E_C = \frac{1}{2}CV^2$
- $E_{PS} - E_C = \frac{1}{2}CV^2 = E_{HEAT}$: this much energy is dissipated in the pFET
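The $\frac{1}{2}CV^2$ lost in the pFET does not depend on how strong the pFET is. A minimal sketch, assuming the pFET behaves as an ideal resistor R and integrating with forward Euler (function name and component values are hypothetical), tallies supply energy, stored energy, and switch heat for a step from 0 to V:

```python
def charge_through_resistor(R, C, V, steps_per_tau=1000, n_tau=20):
    """Step-charge C from 0 V to V through R (forward Euler).
    Returns (energy from supply, energy stored on C, energy dissipated in R)."""
    tau = R * C
    dt = tau / steps_per_tau
    v = 0.0
    e_supply = 0.0
    e_heat = 0.0
    for _ in range(steps_per_tau * n_tau):   # ~20 time constants: fully charged
        i = (V - v) / R                      # current through the switch resistance
        e_supply += V * i * dt               # supply delivers charge i*dt at potential V
        e_heat += i * i * R * dt             # Joule heating in the switch
        v += i * dt / C
    return e_supply, 0.5 * C * v * v, e_heat


if __name__ == "__main__":
    C, V = 100e-15, 3.3                      # hypothetical load and supply
    for R in (1e3, 10e3):                    # two hypothetical switch resistances
        e_ps, e_c, e_heat = charge_through_resistor(R, C, V)
        print(f"R={R:8.0f}  E_PS/CV^2={e_ps / (C * V * V):.3f}  "
              f"E_C/CV^2={e_c / (C * V * V):.3f}  E_HEAT/CV^2={e_heat / (C * V * V):.3f}")
```

Both resistances give about 1.000, 0.500 and 0.500: a stronger switch charges the node sooner but wastes the same energy.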

The discharging event
[Figure: inverter discharging C to the 0 rail]
- The power supply gets the charge back at potential 0: $E_{PS} = 0 \cdot Q = 0$
- The energy on the capacitor goes from $\frac{1}{2}CV^2$ to 0: $E_C - 0 = \frac{1}{2}CV^2 = E_{HEAT}$
- This much energy is dissipated in the nFET
- All of the charge is returned to the PS at potential 0

Complex gates and pass logic
[Figure: complex gate or pass-logic network between the PS rails V and 0, driving load C]
- Circuit topology does not change the energetics
- It's about the potential of the charge, not where the charge goes

Power supply perspectives
- Inject charge at the highest allowed voltage, $V_{DD}$
- Recover returned charge at the lowest allowed voltage, 0
- The simple scheme of shorting capacitors to $V_{DD}$ or ground through switches is maximally wasteful from an energy-conservation standpoint

Power equation
- $\frac{1}{2}CV^2$ is dissipated to charge the capacitor
- $\frac{1}{2}CV^2$ is dissipated to discharge the capacitor
- $CV^2$ is dissipated per charge/discharge cycle
- If we cycle the capacitor F times per second: $P = FCV^2$ (power is the rate at which work is done)
- Note that if you need to cycle a capacitor N times from a battery, it doesn't matter whether you do it fast or slow: the battery is just as dead either way
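The battery remark follows from $P = FCV^2$: running N cycles at frequency F takes N/F seconds, so the energy drawn is $FCV^2 \cdot N/F = NCV^2$ whatever F is. A quick Python check with hypothetical values (function names are ours):

```python
import math


def dynamic_power(F: float, C: float, V: float) -> float:
    """P = F*C*V^2: each charge/discharge cycle dissipates C*V^2."""
    return F * C * V ** 2


def battery_energy(N: int, C: float, V: float) -> float:
    """Energy drawn to cycle C through N full cycles; no frequency term."""
    return N * C * V ** 2


C, V, N = 10e-12, 3.3, 10**9                      # hypothetical: 10 pF switched 1e9 times
slow = dynamic_power(1e6, C, V) * (N / 1e6)       # 1 MHz, running for N/F seconds
fast = dynamic_power(100e6, C, V) * (N / 100e6)   # 100 MHz, running for N/F seconds

assert math.isclose(slow, fast) and math.isclose(fast, battery_energy(N, C, V))
print(f"slow: {slow:.4f} J, fast: {fast:.4f} J")
```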

Voltage scaling
- Energy decreases quadratically with the voltage: $E \propto V_{DD}^2$
- Delay increases as the voltage is reduced: $\tau \propto V_{DD}/(V_{DD} - V_{TH})^2$
- $\tau_{3.3V}/\tau_{2.0V} = 0.3$ and $E_{3.3V}/E_{2.0V} = 2.7$ (assuming $V_{TH} = 1$ V)
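The two ratios on the slide follow directly from the stated proportionalities. A quick check in Python (first-order models from the slide, not a device simulation; function names are ours):

```python
def relative_delay(vdd: float, vth: float) -> float:
    """Delay model from the slide: tau ~ Vdd / (Vdd - Vth)^2 (arbitrary units)."""
    return vdd / (vdd - vth) ** 2


def relative_energy(vdd: float) -> float:
    """Energy model from the slide: E ~ Vdd^2 (arbitrary units)."""
    return vdd ** 2


VTH = 1.0  # threshold assumed on the slide
delay_ratio = relative_delay(3.3, VTH) / relative_delay(2.0, VTH)
energy_ratio = relative_energy(3.3) / relative_energy(2.0)

print(f"tau(3.3 V) / tau(2.0 V) = {delay_ratio:.2f}")   # 0.31
print(f"E(3.3 V) / E(2.0 V)     = {energy_ratio:.2f}")  # 2.72
```

Dropping from 3.3 V to 2.0 V cuts energy by about 2.7 times but makes the logic about 3.2 times slower (1/0.31).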

Voltage scaling effects
[Figure: PowerMill™ simulations of a 16-bit microprocessor]

Energy vs. cycle time
[Figure: energy vs. cycle time]

Adiabatic charging
- Charge from a variable-voltage source (e.g., a linear ramp from 0 to V over time T) through resistance R into C
- Assuming that R is the on-resistance of the switch, the dissipation for charging or discharging C is $E = \frac{RC}{T}CV^2$ when $T \gg RC$
- Energy can be traded for delay by increasing the charge-transport time
- Model the FETs as simple resistors ($R_{up}$ and $R_{dn}$)
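The $T \gg RC$ formula can be compared against a direct simulation. This sketch (our idealization: the switch is a fixed resistor, the source ramps linearly then holds at V) integrates the RC circuit with forward Euler and also evaluates a closed form for the same model, which we derived for this illustration rather than taking from the slides; all names are hypothetical.

```python
import math


def ramp_charge_dissipation(R, C, V, T, steps_per_tau=1000, settle_taus=10):
    """Energy dissipated in R while a linear ramp (0 -> V over T, then held at V)
    charges C. Forward-Euler integration; the switch is modeled as a resistor R."""
    tau = R * C
    dt = tau / steps_per_tau
    t, v, e_heat = 0.0, 0.0, 0.0
    while t < T + settle_taus * tau:          # ramp, then let C settle to V
        v_source = V * min(t / T, 1.0)
        i = (v_source - v) / R
        e_heat += i * i * R * dt
        v += i * dt / C
        t += dt
    return e_heat


def exact_ramp_dissipation(R, C, V, T):
    """Closed form for the same ideal RC model: ramp-phase loss plus the
    (1/2) C dV^2 lost when the remaining lag dV closes after the ramp."""
    tau = R * C
    x = T / tau
    ramp = R * (C * V / T) ** 2 * tau * (x - 2 * (1 - math.exp(-x)) + 0.5 * (1 - math.exp(-2 * x)))
    lag = (tau * V / T) * (1 - math.exp(-x))
    return ramp + 0.5 * C * lag ** 2


if __name__ == "__main__":
    R, C, V = 1.0, 1.0, 1.0                   # normalized units: RC = 1, CV^2 = 1
    for T in (10.0, 100.0):
        sim = ramp_charge_dissipation(R, C, V, T)
        exact = exact_ramp_dissipation(R, C, V, T)
        approx = (R * C / T) * C * V ** 2     # slide formula, valid for T >> RC
        print(f"T = {T:5.0f} RC: simulated {sim:.4f}, exact {exact:.4f}, (RC/T)CV^2 = {approx:.4f}")
```

In normalized units the loss is about 0.090 CV² at T = 10RC and 0.0099 CV² at T = 100RC, closing in on $(RC/T)CV^2$ as T grows, versus 0.5 CV² for an abrupt step.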

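Adiabatic-charging principle
[Figure: conventional digital CMOS (C switched to $V_{DD}$ or ground through $R_{up}$ and $R_{dn}$) vs. adiabatic charging (C driven by a ramp of duration T through $R_{up}$ and $R_{dn}$, each dissipating $\xi(RC/T)CV^2$)]
- Conventional digital CMOS: $E_{cycle} = CV^2$
- Adiabatic charging: $E_{cycle} = 2\xi\frac{RC}{T}CV^2$, where ξ is a shape factor set by the charging waveform (ξ = 1 for a linear ramp)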
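Dividing the two expressions gives the energy/delay trade explicitly: dissipation drops by a factor $T/(2\xi RC)$, so every 10 times less energy costs a 10 times longer ramp. A sketch with hypothetical R and C, assuming ξ = 1 and staying in the $T \gg RC$ regime where the formula holds (function names are ours):

```python
def conventional_cycle_energy(C: float, V: float) -> float:
    """E_cycle = C V^2 for abrupt switching to V_DD and ground."""
    return C * V ** 2


def adiabatic_cycle_energy(R: float, C: float, V: float, T: float, xi: float = 1.0) -> float:
    """E_cycle = 2 xi (RC/T) C V^2 (charge plus discharge), valid for T >> RC."""
    return 2 * xi * (R * C / T) * C * V ** 2


def required_ramp_time(R: float, C: float, reduction: float, xi: float = 1.0) -> float:
    """Ramp time T giving E_conventional / E_adiabatic = reduction: T = 2 xi RC * reduction."""
    return 2 * xi * R * C * reduction


if __name__ == "__main__":
    R, C, V = 10e3, 100e-15, 3.3     # hypothetical: 10 kOhm switch, 100 fF load (RC = 1 ns)
    for reduction in (10, 100):
        T = required_ramp_time(R, C, reduction)
        ratio = conventional_cycle_energy(C, V) / adiabatic_cycle_energy(R, C, V, T)
        print(f"{reduction}x less energy needs T = {T * 1e9:.0f} ns (check: {ratio:.1f}x)")
```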

Energy-Recovery CMOS
[Figure: energy source → energy-efficient clock driver → clock-powered chip]
Exploit the on-chip capacitances of CMOS VLSI to reduce power dissipation below the conventional limit ($FCV^2$) using adiabatic charging and energy recovery. This research includes:
- Clock-energy recovery techniques: clock-powered logic, balancing power against speed
- Stepwise charging (charge recycling) techniques for low-power VLSI pin drivers and LCD panels
- Harmonic resonant charging techniques for the clock signals of conventional chips

Stepwise charging
[Figure: load C charged from 0 to V through intermediate steps V/N, ..., (N-1)V/N supplied by tank capacitors $C_T$]
- The load C is switched from 0 to V and vice versa through N steps
- $C_T$ should be roughly 10 times larger than C
- Only one supply voltage is required
- The intermediate step voltages converge after a few cycles
- The dissipation for charging or discharging C is $E = \frac{1}{2}\frac{CV^2}{N}$
- The overhead for controlling the FETs needs to be considered
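The convergence of the tank voltages can be seen in a small charge-sharing model. This sketch is our idealization, not the authors' circuit: every switch closes fully, so each connection loses $\frac{1}{2}\frac{C_1C_2}{C_1+C_2}\Delta V^2$; the tanks start discharged; $C_T = 10C$ as on the slide; the function names are hypothetical.

```python
def share(c1: float, v1: float, c2: float, v2: float):
    """Connect two capacitors through a switch until equal voltage.
    Returns (final voltage, energy dissipated in the switch)."""
    vf = (c1 * v1 + c2 * v2) / (c1 + c2)
    loss = 0.5 * (c1 * c2 / (c1 + c2)) * (v1 - v2) ** 2
    return vf, loss


def stepwise_cycle(tanks, C, CT, V):
    """One up/down cycle of load C through the tank capacitors (lowest first)
    and the rails. Updates `tanks` in place; returns energy dissipated."""
    v, lost = 0.0, 0.0
    for k in range(len(tanks)):               # up: tanks near V/N, 2V/N, ...
        v, e = share(C, v, CT, tanks[k])
        tanks[k] = v
        lost += e
    lost += 0.5 * C * (V - v) ** 2            # last step from the supply rail V
    v = V
    for k in reversed(range(len(tanks))):     # down: tanks in reverse order
        v, e = share(C, v, CT, tanks[k])
        tanks[k] = v
        lost += e
    lost += 0.5 * C * v ** 2                  # last step to ground
    return lost


def simulate(N: int, C: float = 1.0, CT: float = 10.0, V: float = 1.0, cycles: int = 20000):
    """Run many cycles from discharged tanks; return (tank voltages, loss per transition)."""
    tanks = [0.0] * (N - 1)
    lost = 0.0
    for _ in range(cycles):
        lost = stepwise_cycle(tanks, C, CT, V)
    return tanks, lost / 2


if __name__ == "__main__":
    for N in (1, 2, 4):
        tanks, e = simulate(N)
        ideal = 0.5 / N                       # slide: (1/2) C V^2 / N
        print(f"N={N}: tanks={[round(x, 3) for x in tanks]}, "
              f"E/transition={e:.4f} CV^2, ideal={ideal:.4f}")
```

With $C_T = 10C$ the tanks settle slightly above kV/N and the loss per transition lands several percent above the ideal (for N = 4, about 0.134 versus 0.125 CV²); a larger $C_T$ brings both closer to the slide's figure, and N = 1 reproduces the conventional 0.5 CV².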

2-Stepwise Driver
[Figure: 2-step driver schematic with inputs in and d_in, transistors t, p and n, a tank capacitor $C_T$ at V/2, and load $C_L$]

2-Stepwise Driver
[Figure: the same driver with charge paths (1) to (4) marked]
- Event 1: $\frac{1}{2}C(V/2)^2$ stored, $\frac{1}{2}C(V/2)^2$ dissipated
- Event 2: $\frac{1}{2}C(V/2)^2$ added, $\frac{1}{2}C(V/2)^2$ dissipated
- Event 3: $\frac{1}{2}C(V/2)^2$ recovered, $\frac{1}{2}C(V/2)^2$ dissipated
- Event 4: $\frac{1}{2}C(V/2)^2$ dissipated
- Total dissipation: $4 \cdot \frac{1}{2}C(V/2)^2 = \frac{1}{2}CV^2$
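The event totals can be checked with exact fractions. This sketch assumes an ideal tank held at V/2 and reads events 1 to 4 as 0 → V/2 from the tank, V/2 → V from the supply, V → V/2 back into the tank, and V/2 → 0 to ground (our reading of the figure); the function name is hypothetical.

```python
from fractions import Fraction


def two_step_accounting(C=Fraction(1), V=Fraction(1)):
    """Exact energy bookkeeping for an ideal 2-step driver whose tank stays at V/2.
    Each step moves the load by V/2 through a switch, losing (1/2) C (V/2)^2."""
    def step_loss(dv):
        return C * dv * dv / 2

    half = V / 2
    losses = {
        "event 1: 0 -> V/2 from tank": step_loss(half),
        "event 2: V/2 -> V from supply": step_loss(V - half),
        "event 3: V -> V/2 back into tank": step_loss(V - half),
        "event 4: V/2 -> 0 to ground": step_loss(half),
    }
    # Only event 2 draws from the supply: charge C*V/2 delivered at potential V.
    supply_energy = (C * (V - half)) * V
    return losses, supply_energy


if __name__ == "__main__":
    losses, supply = two_step_accounting()
    for name, e in losses.items():
        print(f"{name}: {e} CV^2")
    print("total dissipated:", sum(losses.values()), "CV^2")  # 1/2
    print("drawn from supply:", supply, "CV^2")              # 1/2
```

The energy drawn from the supply equals the total dissipation, so the tank's net energy per cycle is zero and a full cycle costs half of the conventional $CV^2$.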

Clock-powered logic
- Exploits adiabatic charging to reduce dissipation
- Uses clocks as global time-varying voltage sources
- The challenge is to use the clock to drive data nodes
[Figure: clock line pulsing 0 → 1 → 0]

Clock-Powered logic design
- Need an efficient clock driver
- Innovate in the design of clock-steering logic
- Use conventional precharged, pass-transistor, and static logic
- Use the clock-steering logic for high-capacitance nodes

Resonant clock driver
[Figure: $V_{dc}$ source, off-chip inductor, and on-chip capacitive load receiving a power pulse]
- Build up energy in the inductor
- Transfer it to the load as a pulse
- Recover the pulsed energy in the inductor
- Repeat the process
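The transfer-and-recover steps can be illustrated with an idealized series RLC loop. This sketch assumes the inductor has already been charged to current I0 (the build-up step is not modeled), lumps switch and wiring losses into R, and uses normalized, hypothetical values; it is not the AC-1 driver.

```python
import math


def lc_pulse_transfer(L, C, R, I0, steps_per_period=10_000):
    """Series R-L-C loop: the inductor starts with current I0, the load C is empty.
    Integrates (semi-implicit Euler) until the load voltage returns to zero, i.e. the
    pulse energy is back in the inductor. Returns (peak fraction of the initial energy
    reaching C, fraction recovered in L)."""
    e0 = 0.5 * L * I0 ** 2
    dt = 2 * math.pi * math.sqrt(L * C) / steps_per_period
    i, v = I0, 0.0
    peak = 0.0
    for _ in range(10 * steps_per_period):        # safety bound on the loop
        i += (-R * i - v) / L * dt                  # inductor: L di/dt = -R i - v
        v_next = v + i * dt / C                     # load:     C dv/dt = i
        peak = max(peak, 0.5 * C * v_next ** 2)
        if v > 0 and v_next <= 0:                   # pulse over, energy back in L
            return peak / e0, 0.5 * L * i ** 2 / e0
        v = v_next
    raise RuntimeError("load voltage never returned to zero (overdamped?)")


if __name__ == "__main__":
    L, C, R = 1.0, 1.0, 0.05                        # normalized units; Q = sqrt(L/C)/R = 20
    peak, recovered = lc_pulse_transfer(L, C, R, I0=1.0)
    wd = math.sqrt(1 / (L * C) - (R / (2 * L)) ** 2)  # damped angular frequency
    print(f"peak on load: {peak:.3f}, recovered: {recovered:.3f}, "
          f"analytic exp(-pi R / (L wd)) = {math.exp(-math.pi * R / (L * wd)):.3f}")
```

With Q = 20 about 85% of the pulse energy returns to the inductor, in line with the analytic $e^{-\pi R/(L\omega_d)}$; a smaller R recovers more.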

The all-resonant clock driver (a.k.a. blip driver)
[Figure: two inductors L fed from $V_{dc}$ driving clock phases $\phi_1$ and $\phi_2$, each with load $C_\phi$]
- The self-oscillating driver generates almost non-overlapping clock pulses
- Highly efficient because of the all-resonant gate drive
- Trade-off between frequency stability and power efficiency

Clocked buffers
[Figure: clocked buffer with clocks $\phi_1$ and $\phi_2$, input $D_{in}$ through an isolation transistor ($V_{iso}$), a clock-pass transistor whose gate-to-channel capacitance is used for bootstrapping, and a pull-down clamp transistor ($V_{bn}$) for noise immunity]
- The clock-pass transistor is critical for speed and power performance
- Bootstrapping yields high conductance per unit gate capacitance
- The clock voltage swing can be decoupled from the logic voltage swing
- "Hot clocks": clocks that swing above the supply

Clocked buffers
[Figure: clocked-buffer operation for inputs 0 and 1, showing $\phi_1$, $\phi_2$, $V_{iso}$, the clock-pass transistor, and $V_{bn}$; signal levels 0, 1, A and 1 + A]

Clock-powered logic
- Eliminate pFETs and the complements of clocks (smaller circuits, simpler clock requirements)
- Precharge transistors are hot-clocked nFETs
- Pass gates in latches are hot-clocked nFETs
- Move more capacitive loads to the clock-powered paths
- Pass-transistor logic (e.g., in muxes) powered by clocks (not shown)
[Figure: energy-recovery (ER) latches clocked by $\phi_1$ and $\phi_2$ with isolation gates at $V_{iso}$, around a precharged logic block with load $C_p$]

The AC-1 processor experiment
Objectives
- Design and implement a low-power processor based on clock-powered logic and the blip driver
- Evaluate the significance of the blip driver for low-power operation
- Compare the clock-powered processor to a conventional, static-CMOS alternative
Approach
- Select a 16-bit ISA
- Design a five-stage pipelined microarchitecture
- Use energy-recovery latches to inject and retract energy at large capacitive loads
- Design logic and latches using mostly-nMOS circuit styles
- Include both conventional and blip drivers (for evaluation purposes)
- Design an implementation of the same ISA using purely conventional static-CMOS techniques

AC-1 microarchitecture
[Figure: AC-1 datapath block diagram (PC, PLA control, register file, ALU, and buses PC_B, I_B, D_B, A_B, clocked by $\phi_1$ and $\phi_2$)]
- RISC ISA (Bunda 93)
- 16-bit data
- 16-bit instructions
- 16 registers
- Conventional 5-stage pipeline
- Integer operations only (no multiply or divide)

AC-1 processor
- Clock-powered logic
- Resonant clock driver
- 16-bit data and instructions
- 16 registers
- 0.5 µm n-well CMOS
- 5-stage pipeline
- ~13K transistors

AC-1c: a conventional processor
- Same target process
- Cascade library cells
- 30K transistors
- 5.5 mm²
- Uses gated clocks to reduce power dissipation
Important differences
- Custom vs. library cells
- Optimizations
- Clock gating in AC-1c (40%)

Processor core summary
- AC-1: first-generation clock-powered processor
  - Mostly-nMOS logic style
  - Hot clocks
  - Custom layout
- AC-1c: first-generation conventional processor
  - Static CMOS
  - Cascade Epoch standard-cell library
- ACPL: second-generation clock-powered processor
  - Static CMOS
  - Low-swing clocks
  - Custom low-power fixed-cell library
  - Cascade Epoch for place and route
- DC-1: second-generation conventional processor
  - Static CMOS
  - Single-phase clocking
  - Custom low-power fixed-cell library
  - Cascade Epoch for place and route

Processor comparison
[Figure: mW/MHz (0 to 1.4) vs. frequency (0 to 160 MHz) for AC-1 without energy recovery, AC-1c, ACPL without energy recovery, DC-1, AC-1 with 6.5x energy recovery, and ACPL with 6.5x energy recovery]

Resonant clock drivers
[Figure: controller, resonant clock driver, and clock-powered chip whose clock load varies between $C_{small}$ and $C_{big}$]
- The difficulty with clock-powered logic is in the clock driver
- Resonant circuits offer the highest efficiency
- Low-power techniques that minimize the switched capacitance in real time do not work well with resonant clock drivers: the clocks will vary in phase, amplitude, and pulse width
- Stabilizing the clock load maximizes the capacitive load
- It's an open research topic
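For a self-oscillating resonant driver the clock frequency is set by the tank, $f = 1/(2\pi\sqrt{LC})$, so gating part of the clock load on and off shifts the frequency along with the phase and pulse width. A sketch assuming an ideal LC tank, with hypothetical component values and names:

```python
import math


def resonant_frequency(L: float, C: float) -> float:
    """Natural frequency of an ideal LC tank: f = 1 / (2 pi sqrt(L C))."""
    return 1.0 / (2 * math.pi * math.sqrt(L * C))


L_TANK = 10e-9   # hypothetical 10 nH inductor
C_NOM = 1e-9     # hypothetical 1 nF nominal clock load
f_nom = resonant_frequency(L_TANK, C_NOM)

for scale in (0.7, 1.0, 1.3):  # load shrinks or grows as logic is gated on/off
    f = resonant_frequency(L_TANK, C_NOM * scale)
    print(f"C = {scale:.1f} x C_nom: f = {f / 1e6:6.1f} MHz ({f / f_nom - 1:+.1%})")
```

A 30% drop or rise in load moves the frequency by about +20% or -12%, which is why a varying switched capacitance is hard to reconcile with a free-running resonant clock.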

Harmonic resonant charging
- Sinusoids: easy and efficient to generate; low overhead; hard to work with, very "undigital"
- Staircase: simple to generate and control; high overhead; positive-going only
- Blips: the advantages of sinusoids; can be complementary; positive-going only
- Harmonic resonant driver: we thought this would be hard (practically); now we think it is highly doable
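Why adding harmonics makes a resonant waveform more digital can be illustrated with a Fourier sketch. This is our illustration, not the authors' resonator design: it assumes a 10%-90% definition of transition time and compares a pure sinusoid with a sinusoid plus its third harmonic (the first two terms of a square wave's Fourier series), reporting each rising edge as a fraction of the period. All names are hypothetical.

```python
import math


def fundamental(t: float) -> float:
    return math.sin(t)


def with_third(t: float) -> float:
    # first two terms of the Fourier series of a square wave
    return math.sin(t) + math.sin(3 * t) / 3


def first_peak(f, step: float = 1e-4) -> float:
    """Scan forward from t = 0 until f stops increasing."""
    t = 0.0
    while f(t + step) > f(t):
        t += step
    return t


def rise_fraction(f) -> float:
    """10%-90% rise time as a fraction of the 2*pi period, for an odd periodic f
    that rises monotonically from -peak to +peak around t = 0."""
    tp = first_peak(f)
    amp = f(tp)
    target = 0.8 * amp                 # 90% point of a swing from -amp to +amp
    lo, hi = 0.0, tp
    for _ in range(60):                # bisection on the monotonic segment [0, tp]
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 2 * lo / (2 * math.pi)      # odd symmetry: the 10% point sits at -t90


if __name__ == "__main__":
    print(f"sinusoid:        {rise_fraction(fundamental):.3f} of the period")
    print(f"+ 3rd harmonic:  {rise_fraction(with_third):.3f} of the period")
```

The pure sinusoid spends about 30% of its cycle in each transition (asin(0.8)/π ≈ 0.295); adding one harmonic cuts that to about 14%, at the cost of some ripple on the flat portion.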

Harmonic resonator design
[Figure: harmonic resonator design]

Harmonic resonator results
- 2nd harmonic resonator: 85% energy efficiency; slew time of 10% of the total cycle time
- 4th harmonic resonator: 80% energy efficiency; slew time of 6% of the total cycle time

Harmonic resonator result
- As R becomes smaller, the slew rate decreases while the power increases

Harmonic resonator result
- The frequency of the output signal doesn't change for a 30% variation of the load capacitance, while the energy efficiency suffers

Future research
- Clock-powered logic and the blip driver have been developed as a practical way of exploiting adiabatic charging for CMOS microprocessors
- How about digital signal processors?
  - Where does the power go in a DSP? Bus transactions vs. computation
- Energy-recovery SRAM, DRAM, SAM
  - Capacitance variance is minimal because the bitlines are dual
- Driving clock networks using harmonic resonators

References
- ACMOS homepage (still alive): http://www.isi.edu/acmos
- Online paper archive: http://www.isi.edu/acmos/acmospapers.html
- Books:
  - Rabaey and Pedram (Eds.), Low Power Design Methodology
  - Chandrakasan and Brodersen (Eds.), Low Power CMOS Design
- The most recent paper was published in JSSC, Nov. 2000, pp. 1561-1570