Energy-Recovery CMOS Design
Jay Moon (Univ. of Southern California), Bill Athas (Apple Computer, Inc.)
jsmoon@usc.edu / athas@apple.com
March 05, 2001
UCLA EE215B
Outline
- Motivation
- Review of CMOS switching energetics
- Adiabatic charging
- Energy-Recovery CMOS
- Stepwise charging
- Clock-powered logic (CPL)
- Harmonic resonant charging
- Future Research
Motivation
- High-performance and low-power computing
- It's becoming increasingly difficult to get rid of the heat generated by VLSI chips
- Battery life for portables
Types of power dissipation
- Dynamic power dissipation
  - Charging and discharging capacitances
  - Short-circuit current
- Static power dissipation
  - Sub-threshold currents
  - Drain-junction leakage
Capacitor energy equations
- Suppose at time t, a charge q has been transferred from one plate to the other
- The potential is $v = q/C$
- For a charge transfer increment of $dq$, the additional work is: $dE = v\,dq = \frac{q}{C}\,dq$
- For the total charge transfer Q: $E = \int dE = \int_0^Q \frac{q}{C}\,dq = \frac{1}{2}\frac{Q^2}{C}$
- With $Q = CV$: $E = \frac{1}{2}\frac{Q^2}{C} = \frac{1}{2}CV^2$
CMOS switching energetics
- Interestingly (and thankfully), CMOS energetics can be analyzed and understood from the CMOS inverter
- Charge is conserved
- Energy is conserved
- Neglect leakage current
- Neglect short-circuit current
[Figure: CMOS inverter driving load C from power supply PS at voltage V; $E_{PS} = VQ = CV^2$]
The charging event
[Figure: inverter charging load C from 0 to V from power supply PS; $E_{PS} = VQ = CV^2$, $E_{HEAT} = \frac{1}{2}CV^2$]
- Power supply delivers a charge packet of size $Q = CV$
- $E_{PS} = CV \cdot V = CV^2$
- $E_C = \frac{1}{2}CV^2$
- $E_{PS} - E_C = \frac{1}{2}CV^2 = E_{HEAT}$
- This much energy is dissipated in the pfet
The discharging event
[Figure: inverter discharging load C from V to 0; $E_{PS} = 0 \cdot Q = 0$]
- Power supply gets the charge at potential 0
- $E_{PS} = 0$
- The energy on the capacitor goes from $\frac{1}{2}CV^2$ to 0
- $E_C - 0 = \frac{1}{2}CV^2 = E_{HEAT}$
- This much energy is dissipated in the nfet
- All of the charge is returned to the PS at potential 0
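The energy bookkeeping of the charging event can be checked numerically. The following is a minimal Python sketch, assuming the pfet behaves as a fixed resistor R (an idealization) and using hypothetical values for R, C, and V. The function name `charge_through_resistor` is illustrative and not from the slides.

```python
def charge_through_resistor(r_ohm, c_farad, v_supply, steps_per_tau=1000, n_tau=20):
    """Forward-Euler simulation of charging C from a fixed supply through R.

    Returns (e_supply, e_heat, e_cap) in joules.
    """
    tau = r_ohm * c_farad
    dt = tau / steps_per_tau
    v_cap = 0.0
    e_supply = 0.0   # energy drawn from the supply
    e_heat = 0.0     # energy dissipated in the resistor
    for _ in range(steps_per_tau * n_tau):
        i = (v_supply - v_cap) / r_ohm      # current through the switch resistance
        e_supply += v_supply * i * dt
        e_heat += i * i * r_ohm * dt
        v_cap += i * dt / c_farad
    e_cap = 0.5 * c_farad * v_cap ** 2      # energy left on the capacitor
    return e_supply, e_heat, e_cap


# Hypothetical values: 1 pF load, 3.3 V supply, two different switch resistances.
C_LOAD, V_SUPPLY = 1e-12, 3.3
for r in (1e3, 1e4):
    e_ps, e_heat, e_c = charge_through_resistor(r, C_LOAD, V_SUPPLY)
    cv2 = C_LOAD * V_SUPPLY ** 2
    print(f"R={r:.0e}: E_PS/CV^2={e_ps / cv2:.3f}, "
          f"E_HEAT/CV^2={e_heat / cv2:.3f}, E_C/CV^2={e_c / cv2:.3f}")
```

Under these assumptions the dissipated fraction should not depend on R: a larger resistance only stretches the charging time, which matches the slide's point that the loss is set by the potential of the charge.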
Complex gates and pass logic
[Figure: complex gate and pass-logic network driving load C between V and 0 from power supply PS]
- Circuit topology does not change energetics
- It's about the potential of the charge, not where the charge goes
Power supply perspectives
- Inject charge at the highest allowed voltage, $V_{DD}$
- Recover returned charge at the lowest allowed voltage, 0
- Simple scheme of shorting capacitors to $V_{DD}$ or ground through switches
- Maximally wasteful from an energy conservation standpoint
Power equation
- $\frac{1}{2}CV^2$ is dissipated to charge the capacitor
- $\frac{1}{2}CV^2$ is dissipated to discharge the capacitor
- $CV^2$ is dissipated per charge/discharge cycle
- If we cycle the capacitor F times per second: $P = F\,CV^2$
- Power is the rate at which work is done
- Note that if you need to cycle a capacitor N times from a battery, it doesn't matter whether you do it fast or slow. The battery is just as dead either way.
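As a small worked example of the power equation, the Python sketch below evaluates $P = F\,CV^2$ and the number of cycles a battery can supply. The numeric values and the function names `dynamic_power` and `cycles_from_battery` are hypothetical and chosen only for illustration.

```python
def dynamic_power(freq_hz, c_farad, v_volt):
    """Conventional switching power P = F * C * V^2, in watts."""
    return freq_hz * c_farad * v_volt ** 2


def cycles_from_battery(e_battery_joule, c_farad, v_volt):
    """Number of full charge/discharge cycles a battery can pay for.

    Each cycle costs C * V^2 no matter how quickly the cycles are issued.
    """
    return e_battery_joule / (c_farad * v_volt ** 2)


# Hypothetical numbers: 1 nF of switched capacitance at 3.3 V.
C_SWITCHED, V_DD = 1e-9, 3.3
for f in (10e6, 100e6):
    print(f"F = {f / 1e6:.0f} MHz -> P = {dynamic_power(f, C_SWITCHED, V_DD):.4f} W")

# The cycle budget does not depend on F, only on C and V.
print(f"cycles from a 1 kJ battery: {cycles_from_battery(1e3, C_SWITCHED, V_DD):.3e}")
```

Frequency appears only in the power, not in the cycle budget, which is the "just as dead either way" observation in numeric form.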
Voltage scaling
- Energy decreases quadratically with the voltage: $E \sim V_{DD}^2$
- Delay increases as the voltage is reduced: $\tau \sim V_{DD} / (V_{DD} - V_{TH})^2$
- $\tau_{3.3V} / \tau_{2.0V} = 0.3$
- $E_{3.3V} / E_{2.0V} = 2.7$
- (assuming $V_{TH} = 1$ V)
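The two ratios on this slide follow directly from the scaling relations. The Python sketch below reproduces them; the function names are illustrative, and the proportionality constants are dropped because they cancel in the ratios.

```python
def relative_delay(v_dd, v_th):
    """Delay up to a constant factor: tau ~ V_DD / (V_DD - V_TH)^2."""
    if v_dd <= v_th:
        raise ValueError("model only applies for V_DD above V_TH")
    return v_dd / (v_dd - v_th) ** 2


def relative_energy(v_dd):
    """Switching energy up to a constant factor: E ~ V_DD^2."""
    return v_dd ** 2


V_TH = 1.0  # threshold voltage assumed on the slide
delay_ratio = relative_delay(3.3, V_TH) / relative_delay(2.0, V_TH)
energy_ratio = relative_energy(3.3) / relative_energy(2.0)
print(f"tau(3.3V)/tau(2.0V) = {delay_ratio:.2f}")
print(f"E(3.3V)/E(2.0V)     = {energy_ratio:.2f}")
```

In this model, dropping the supply from 3.3 V to 2.0 V cuts energy by about 2.7x at the cost of roughly 3x longer delay.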
Voltage scaling effects
- PowerMill™ simulations of a 16-bit microprocessor
Energy vs. Cycle time
Adiabatic charging
- Charging from a variable-voltage source (e.g. linear ramp)
[Figure: voltage ramp from 0 to V over time T driving C through R]
- Model the FETs as simple resistors ($R_{up}$ and $R_{dn}$)
- Assuming that R is the on-resistance of the switch, the dissipation for charging or discharging C is: $E = \frac{RC}{T}\,CV^2$ when $T \gg RC$
- Energy can be traded for delay by increasing the charge transport time
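The $T \gg RC$ approximation can be compared against the exact dissipation for an ideal linear ramp. The sketch below is a minimal Python version under stated assumptions: a constant resistance R, a supply that ramps linearly from 0 to V over time T and then holds at V, and hypothetical component values. The closed form comes from integrating $i^2R$ with $i(t) = \frac{CV}{T}(1 - e^{-t/RC})$ during the ramp; it is an illustration and is not taken from the slides.

```python
import math


def ramp_charging_dissipation(r_ohm, c_farad, v_final, t_ramp):
    """Energy dissipated in R when C is charged by a linear ramp 0 -> V over T.

    Includes the small residual loss after the ramp ends, while the capacitor
    voltage settles to the held supply voltage.
    """
    tau = r_ohm * c_farad
    x = t_ramp / tau
    decay = math.exp(-x)
    cv2 = c_farad * v_final ** 2
    # Integral of i^2 * R over the ramp interval.
    e_ramp = (cv2 / x) * (1.0 - (2.0 / x) * (1.0 - decay)
                          + (0.5 / x) * (1.0 - decay ** 2))
    # Voltage still across R at the end of the ramp, lost while settling.
    v_resid = (v_final / x) * (1.0 - decay)
    e_settle = 0.5 * c_farad * v_resid ** 2
    return e_ramp + e_settle


# Hypothetical values: R = 1 kOhm, C = 1 pF (RC = 1 ns), V = 3.3 V.
R_ON, C_LOAD, V_SWING = 1e3, 1e-12, 3.3
cv2 = C_LOAD * V_SWING ** 2
for t_ramp in (0.01e-9, 1e-9, 10e-9, 100e-9):
    e_exact = ramp_charging_dissipation(R_ON, C_LOAD, V_SWING, t_ramp)
    e_approx = (R_ON * C_LOAD / t_ramp) * cv2
    print(f"T/RC = {t_ramp / (R_ON * C_LOAD):7.2f}: "
          f"exact = {e_exact / cv2:.4f} CV^2, (RC/T)CV^2 = {e_approx / cv2:.4f} CV^2")
```

For very short ramps the exact result tends toward the conventional $\frac{1}{2}CV^2$, while for long ramps it tracks $(RC/T)\,CV^2$, so the slide's formula should only be applied in the slow-ramp regime.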
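Adiabatic-charging principle
[Figure: two panels. Conventional digital CMOS: C switched between $V_{DD}$ and ground through $R_{up}$ and $R_{dn}$. Adiabatic charging: C driven through $R_{up}$ and $R_{dn}$ by a ramp of duration T, with $\xi\frac{RC}{T}CV^2$ dissipated in each.]
- Conventional digital CMOS: $E_{cycle} = CV^2$
- Adiabatic charging: $E_{cycle} = 2\xi\frac{RC}{T}\,CV^2$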
Energy-Recovery CMOS
[Figure: energy source, energy-efficient clock driver, clock-powered chip]
- Exploit the on-chip capacitances of CMOS VLSI to reduce power dissipation below the conventional limit ($F\,CV^2$) using adiabatic charging and energy recovery
- This research includes:
  - Clock-energy recovery techniques
  - Clock-powered logic: balanced power versus speed
  - Stepwise charging (charge recycling) technique for
    - Low-power VLSI pin drivers
    - LCD panels
  - Harmonic resonant charging technique for
    - Clock signal for conventional chip
Stepwise charging
[Figure: load C stepped through voltages 0, V/N, ..., (N-1)V/N, V using tank capacitors $C_T$]
- The load C is switched from 0 to V and vice versa through N steps
- $C_T$ should be roughly 10 times larger than C
- Only one supply voltage is required
- Intermediate step voltages converge after a few cycles
- Dissipation for charging or discharging C is: $E = \frac{1}{2}\,\frac{CV^2}{N}$
- The overhead for controlling the FETs needs to be considered
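The convergence of the intermediate voltages and the 1/N reduction can be illustrated with a simple charge-sharing model. The Python sketch below assumes ideal switches that are left closed until the load fully settles, N-1 identical tank capacitors that start empty, and hypothetical component values. It ignores the FET control overhead mentioned above, and the function name `stepwise_cycle_energy` is illustrative.

```python
def stepwise_cycle_energy(n_steps, c_load, c_tank, v_supply, n_cycles=200):
    """Simulate repeated charge/discharge cycles of a stepwise driver.

    Returns (energy dissipated in the last full cycle, final tank voltages).
    """
    tanks = [0.0] * (n_steps - 1)   # tank capacitors start discharged
    v_load = 0.0
    e_cycle = 0.0

    def share(v_now, idx):
        """Connect the load to tank idx; return (settled voltage, energy lost)."""
        v_tank = tanks[idx]
        v_settled = (c_load * v_now + c_tank * v_tank) / (c_load + c_tank)
        c_series = c_load * c_tank / (c_load + c_tank)
        loss = 0.5 * c_series * (v_now - v_tank) ** 2
        tanks[idx] = v_settled
        return v_settled, loss

    for _ in range(n_cycles):
        e_cycle = 0.0
        for idx in range(n_steps - 1):              # step up through the tanks
            v_load, loss = share(v_load, idx)
            e_cycle += loss
        e_cycle += 0.5 * c_load * (v_supply - v_load) ** 2   # last step to supply
        v_load = v_supply
        for idx in reversed(range(n_steps - 1)):    # step down through the tanks
            v_load, loss = share(v_load, idx)
            e_cycle += loss
        e_cycle += 0.5 * c_load * v_load ** 2       # last step to ground
        v_load = 0.0
    return e_cycle, tanks


# Hypothetical values: 1 pF load, tank capacitors 10x larger, 3.3 V supply.
C_LOAD, V_SUPPLY = 1e-12, 3.3
for n in (1, 2, 4):
    e, tank_v = stepwise_cycle_energy(n, C_LOAD, 10 * C_LOAD, V_SUPPLY)
    print(f"N={n}: E_cycle = {e / (C_LOAD * V_SUPPLY ** 2):.3f} CV^2, "
          f"tank voltages / V = {[round(v / V_SUPPLY, 3) for v in tank_v]}")
```

Note that the function reports the loss for a full charge plus discharge cycle, which is twice the per-transition figure on the slide. With a finite $C_T$ the result sits somewhat above the ideal $CV^2/N$ per cycle and should approach it as $C_T$ grows relative to C.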
2-Stepwise Driver
[Figure: 2-stepwise driver schematic and timing; labels: in, d_in, pulse width $t_p$, tank capacitor $C_T$ at V/2, load $C_L$]
2-Stepwise Driver
[Figure: 2-stepwise driver with switching events (1) to (4) marked; labels: in, d_in, pulse width $t_p$, tank capacitor $C_T$ at V/2, load $C_L$]
- Event 1: $\frac{1}{2}C(V/2)^2$ stored, $\frac{1}{2}C(V/2)^2$ dissipated
- Event 2: $\frac{1}{2}C(V/2)^2$ added, $\frac{1}{2}C(V/2)^2$ dissipated
- Event 3: $\frac{1}{2}C(V/2)^2$ recovered, $\frac{1}{2}C(V/2)^2$ dissipated
- Event 4: $\frac{1}{2}C(V/2)^2$ dissipated
- Total dissipation: $4 \times \frac{1}{2}C(V/2)^2 = \frac{1}{2}CV^2$
Clock-powered logic
- Exploits adiabatic charging to reduce dissipation
- Uses clocks as global time-varying voltage sources
- The challenge is to use the clock to drive data nodes
[Figure: clock line steered onto data nodes carrying values 0, 1, 0]
Clock-Powered logic design
- Need an efficient clock driver
- Innovate in the design of clock-steering logic
- Use conventional precharged, pass-transistor, static logic
- Use the clock-steering logic for high-capacitance nodes
Resonant clock driver
[Figure: $V_{dc}$ supply, off-chip inductor, power pulse, on-chip capacitive load]
- Build up energy in the inductor
- Transfer it to the load as a pulse
- Recover the pulsed energy in the inductor
- Repeat the process
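The build-up, transfer, and recovery steps can be put in rough numbers with an ideal lossless LC model. The Python sketch below is only an idealization and not the driver circuit on the slide: it assumes no resistance, no $V_{dc}$ supply interaction, and hypothetical values for L, C, and the peak inductor current.

```python
import math


def resonant_pulse(l_henry, c_farad, i_peak):
    """Ideal LC energy transfer from an inductor to a capacitive load.

    Returns (energy stored in the inductor, peak load voltage,
    time to move the energy from the inductor to the load).
    """
    e_inductor = 0.5 * l_henry * i_peak ** 2
    # All of 1/2 L I^2 ends up as 1/2 C V^2 on the load in the lossless case.
    v_peak = i_peak * math.sqrt(l_henry / c_farad)
    # Quarter of the resonant period 2*pi*sqrt(LC): current peak to voltage peak.
    t_transfer = 0.5 * math.pi * math.sqrt(l_henry * c_farad)
    return e_inductor, v_peak, t_transfer


# Hypothetical values: 100 nH off-chip inductor, 1 nF on-chip clock load.
e_l, v_pk, t_tr = resonant_pulse(100e-9, 1e-9, 0.1)
print(f"inductor energy = {e_l:.3e} J, peak load voltage = {v_pk:.3f} V, "
      f"transfer time = {t_tr * 1e9:.2f} ns")
```

In a real driver each transfer loses some energy in the series resistance, so the supply only has to replace that loss on each repetition rather than the full $CV^2$.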
The all-resonant clock driver (a.k.a. blip driver)
[Figure: $V_{dc}$ supply, two inductors L, clock outputs ϕ1 and ϕ2, each with clock load Cϕ]
- Self-oscillating driver generates almost non-overlapping clock pulses
- Highly efficient because of all-resonant gate drive
- Trade-off between frequency stability and power efficiency
Clocked buffers
[Figure: clocked buffer schematic; labels: ϕ1, ϕ2, $V_{iso}$, $D_{in}$, $V_{bn}$, isolation transistor, clock-pass transistor, pull-down clamp transistor for noise immunity; gate-to-channel capacitance used for bootstrapping]
- Clock-pass transistor is critical for speed and power performance
- Bootstrapping yields high conductance per gate capacitance
- Clock voltage swing can be decoupled from the logic voltage swing
- Hot clocks: clock swings above supply
Clocked buffers
[Figure: two clocked-buffer examples showing node levels (0, 1, A, 1+A) around the clock-pass transistor; labels: ϕ1, ϕ2, $V_{iso}$, $V_{bn}$]
Clock-powered logic
- Eliminate pfets and complements of clocks (smaller circuits, simpler clock requirements)
- Precharge transistors are hot-clocked nfets
- Pass gates in latches are hot-clocked nfets
- Move more capacitive loads to the clock-powered paths
- Pass-transistor logic (e.g. in muxes) powered by clocks (not shown)
[Figure: precharged logic block between two ER latches; labels: ϕ1, ϕ2, $V_{iso}$, $C_p$]
The AC-1 processor experiment
- Objectives
  - Design and implement a low-power processor based on clock-powered logic and the blip driver
  - Evaluate the significance of the blip driver for low-power operation
  - Compare the clock-powered processor to a conventional, static CMOS alternative
- Approach
  - Select a 16-bit ISA
  - Design a five-stage pipelined microarchitecture
  - Use energy-recovery latches to inject and retract energy at large capacitive loads
  - Design logic and latches using mostly-nmos circuit styles
  - Include both conventional and blip drivers (for evaluation purposes)
  - Design an implementation of the same ISA using purely conventional static-CMOS techniques
AC-1 microarchitecture
[Figure: AC-1 datapath; labels include PLA control, register file (RF), ALU, buses A_B, D_B, I_B, PC_B, and clock phases ϕ1 and ϕ2]
- RISC ISA (Bunda 93)
- 16-bit data
- 16-bit instructions
- 16 registers
- Conventional 5-stage pipeline
- Integer operations only (no multiply or divide)
AC-1 processor
- Clock-powered logic
- Resonant clock driver
- 16-bit data & instructions
- 16 registers
- 0.5 µm n-well CMOS
- 5-stage pipeline
- ~13K transistors
AC-1c: a conventional processor
- Same target process
- Cascade library cells
- 30k transistors
- 5.5 um^2
- Uses gated clocks to reduce power dissipation
- Important differences
  - Custom vs library cells
  - Optimizations
  - Clock gating in AC-1c (40%)
Processor core summary
- AC-1
  - First-generation clock-powered processor
  - Mostly nmos logic style
  - Hot clocks
  - Custom layout
- AC-1c
  - First-generation conventional processor
  - Static CMOS
  - Cascade Epoch standard-cell library
- ACPL
  - Second-generation clock-powered processor
  - Static CMOS
  - Low-swing clocks
  - Custom low-power fixed-cell library
  - Cascade Epoch for place and route
- DC-1
  - Second-generation conventional processor
  - Static CMOS
  - Single-phase clocking
  - Custom low-power fixed-cell library
  - Cascade Epoch for place and route
Processor comparison
[Figure: energy per cycle (mW/MHz, axis 0 to 1.4) versus frequency (MHz, axis 0 to 160) for six cases: AC-1, no energy recovery; AC-1c; ACPL, no energy recovery; DC-1; AC-1, 6.5x energy recovery; ACPL, 6.5x energy recovery]
Resonant clock drivers
[Figure: controller and resonant clock driver feeding a clock-powered chip whose load varies between $C_{small}$ and $C_{big}$]
- The difficulty with clock-powered logic is in the clock driver
- Resonant circuits offer the highest efficiency
- Low-power techniques that minimize the switched capacitance in real time do not work well with resonant clock drivers
- The clocks will vary in phase, amplitude, and pulse width
- Stabilizing the clock load maximizes the capacitive load
- It's an open research topic
Harmonic resonant charging
- Sinusoids
  - Easy and efficient to generate
  - Low overhead
  - Hard to work with, very undigital
- Staircase
  - Simple to generate and control
  - High overhead
  - Positive-going only
- Blips
  - Advantages of the sinusoids
  - Can be complementary
  - Positive-going only
- Harmonic resonant driver
  - We thought this would be hard (practically)
  - Now think it is highly doable
Harmonic resonator design
Harmonic resonator results
- 2nd Harmonic Resonator
  - 85% energy efficiency
  - Slew rate: 10% of total cycle time
- 4th Harmonic Resonator
  - 80% energy efficiency
  - Slew rate: 6% of total cycle time
Harmonic resonator result
- As R becomes smaller, slew rate decreases while power increases
Harmonic resonator result
- Frequency of the output signal doesn't change for a 30% variation of load capacitance, while energy efficiency suffers
Future research
- Clock-powered logic and the blip driver have been developed as a practical way of exploiting adiabatic charging for CMOS microprocessors
- How about digital signal processors?
  - Where does the power go in a DSP?
  - Bus transactions vs. computation
- Energy-recovery SRAM, DRAM, SAM
  - Capacitance variance is minimal because bitlines are dual
- Driving the clock network using a harmonic resonator
References
- ACMOS homepage (still alive): http://www.isi.edu/acmos
- Online paper archive: http://www.isi.edu/acmos/acmospapers.html
- Books
  - Rabaey, Pedram (Eds.), Low Power Design Methodology
  - Chandrakasan, Brodersen (Eds.), Low Power CMOS Design
- Most recent paper is published in JSSC, Nov. 2000, pp. 1561-1570