EE241 - Spring 2004
Advanced Digital Integrated Circuits
Borivoje Nikolic

Lecture 17
Low-Power Design: Dynamic Body Bias, Energy Recovery in CMOS, SOI

Announcements
- Midterm project reports due this Friday
- Please post links on your project web page
- Homework #3 due after the Spring break
Class Material
- Last lecture
  - Dynamic voltage scaling
  - Leakage control
- Today's lecture
  - Dynamic body bias
  - Energy recovery techniques
  - Silicon-on-insulator

Dynamic Body Bias
- Similar concept to dynamic voltage scaling
- Control loop adjusts the substrate bias to meet the timing
- Can also be used simply as a two-state runtime/sleep control
- Limited range of threshold adjustments (<100 mV)
- Limited leakage reduction (<10x)
- No delay penalty
- Can increase speed by forward bias
- Energy cost of charging/discharging the substrate capacitance
Dynamic Body Bias
[Figure: dynamic body bias]

VTCMOS Variants
[Figure: VTCMOS variants. Courtesy of IEEE Press, New York, 2000]
Substrate Bias in VT
[Figure: Kuroda, JSSC 11/96]

Dynamic Body Bias
[Figure: dual-V_T core with PMOS and NMOS bias generators; labels V_HIGH, V_LOW, V_CC, V_SS, PMOS body, NMOS body; 450 mV FBB and 500 mV RBB]
- Active mode: forward body bias (FBB)
  - Local V_CC tracking
- Idle mode: reverse body bias (RBB)
  - Triple well needed
Tschanz, ISSCC 03
Substrate Bias Effect
$$V_{TH} = V_{T0} + \gamma\left(\sqrt{2\phi_F - V_{BS}} - \sqrt{2\phi_F}\right) \approx V_{T0} - \gamma_1 V_{BS}$$

Body Bias Layout
[Figure: ALU layout showing ALU core LBGs and sleep transistor LBGs]
Number of ALU core LBGs: 30
Number of sleep transistor LBGs: 10
PMOS device width: 13 mm
Area overhead: 8%
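The threshold shift available from body bias can be estimated numerically. The sketch below assumes the textbook NMOS body-effect expression, V_TH = V_T0 + γ(√(2φ_F − V_BS) − √(2φ_F)), and illustrative device values (V_T0 = 0.35 V, γ = 0.3 V^0.5, φ_F = 0.4 V) that are not taken from the slides. Only the 450 mV forward and 500 mV reverse bias points come from the lecture.

```python
import math


def threshold_voltage(v_bs, v_t0=0.35, gamma=0.3, phi_f=0.4):
    """NMOS threshold voltage (V) for a body-to-source bias v_bs (V).

    v_bs < 0 is reverse body bias, v_bs > 0 is forward body bias.
    v_t0, gamma (V^0.5) and phi_f (V) are illustrative assumptions.
    """
    two_phi_f = 2.0 * phi_f
    if v_bs >= two_phi_f:
        raise ValueError("v_bs must stay below 2*phi_f")
    return v_t0 + gamma * (math.sqrt(two_phi_f - v_bs) - math.sqrt(two_phi_f))


def linear_coefficient(gamma=0.3, phi_f=0.4):
    """Slope gamma_1 of the first-order expansion V_TH ~ V_T0 - gamma_1 * V_BS."""
    return gamma / (2.0 * math.sqrt(2.0 * phi_f))


if __name__ == "__main__":
    # Bias points from the lecture: 500 mV RBB, zero bias, 450 mV FBB.
    for v_bs in (-0.5, 0.0, 0.45):
        exact = threshold_voltage(v_bs)
        approx = 0.35 - linear_coefficient() * v_bs
        print(f"V_BS = {v_bs:+.2f} V  exact = {exact:.3f} V  linear = {approx:.3f} V")
```

With these assumed parameters the shift stays under 100 mV in both directions, in line with the limited adjustment range the lecture attributes to body bias.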
Leakage Power Savings vs. Decap
[Figure: normalized leakage power in idle mode vs. idle time, at 1.32 V, 75 °C; curves for low-leakage 133 nF decap on virtual V_CC and for no decap on virtual V_CC; annotated savings of 40% and 90%]
- Dual-V_T core on virtual V_CC
- Overhead: charging and discharging of the virtual V_CC capacitance
- Minimize capacitance on virtual V_CC

Decoupling Capacitor Placement
[Figure: dual-V_T core with decap on full supply vs. decap on virtual supply, compared on performance, convergence time, and oxide leakage savings; labels: oxide leakage, longer time constant, reduced leakage]
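The trade-off between the virtual-supply capacitance and idle time can be illustrated with a break-even estimate. This is a minimal sketch under two assumptions not stated on the slides: the virtual V_CC node collapses fully during sleep, so each sleep/wake cycle costs about C·V² from the supply, and the leakage power saved while asleep is a constant (the 50 mW figure below is purely hypothetical). The 133 nF and 1.32 V values are the ones quoted in the lecture.

```python
def wakeup_energy(c_virtual, v_cc):
    """Energy (J) drawn from the supply to recharge the virtual rail.

    Assumes the rail fully discharged while idle, so the supply
    delivers C*V^2 (half stored, half dissipated in the switch).
    """
    return c_virtual * v_cc ** 2


def break_even_idle_time(c_virtual, v_cc, p_leak_saved):
    """Idle time (s) beyond which sleeping saves net energy.

    p_leak_saved is the leakage power (W) avoided while asleep,
    treated as constant; this is a simplifying assumption.
    """
    if p_leak_saved <= 0:
        raise ValueError("p_leak_saved must be positive")
    return wakeup_energy(c_virtual, v_cc) / p_leak_saved


if __name__ == "__main__":
    p_saved = 50e-3  # hypothetical leakage power saved in idle mode, W
    for c in (133e-9, 13.3e-9):
        t = break_even_idle_time(c, 1.32, p_saved)
        print(f"C = {c * 1e9:.1f} nF -> break-even idle time = {t * 1e6:.2f} us")
```

Because the break-even time scales linearly with capacitance in this model, less decap on the virtual rail pays back the sleep overhead sooner.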
Total Active Power Savings (Fixed activity: a = 0.05)
[Figure: total power savings (0% to 20%) vs. number of consecutive idle cycles (T_OFF) and number of consecutive active cycles (T_ON); curves for PMOS sleep transistor (1.32 V) and body bias (1.28 V; active: FBB, idle: ZBB); annotations: max 18%, max 8%]
- Power savings for T_OFF > ~100 idle cycles
- Reference: 450 mV FBB to core with clock gating, 1.28 V, 4.05 GHz, 75 °C

Body Biasing
- Body biasing with a local control loop can be used to lower the impact of process variations
- Used to limit die-to-die and within-die variations
Normalized Delay vs. V_DD and V_TH
[Figure: normalized delay vs. V_TH (V) for V_DD = 1.0 V, 1.5 V, 3.0 V, and 5.0 V, with V_TH variations of ±0.05 V and ±0.15 V marked. Sakurai, Kuroda]

Self-Adjusting Threshold-Voltage Scheme (SATS)
[Figure: SATS]
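One way to see why threshold variation matters more at low supply is a delay model. The sketch below assumes the alpha-power law, delay ∝ V_DD / (V_DD − V_TH)^α, with α = 1.3 and a nominal V_TH of 0.3 V; the model choice and both values are illustrative assumptions, not data read from the plot.

```python
def gate_delay(v_dd, v_th, alpha=1.3):
    """Relative gate delay from the alpha-power law (arbitrary units)."""
    if v_dd <= v_th:
        raise ValueError("v_dd must exceed v_th")
    return v_dd / (v_dd - v_th) ** alpha


def delay_spread(v_dd, v_th_nom, dv_th, alpha=1.3):
    """Return (fast, slow) delay normalized to the nominal-V_TH delay.

    fast uses v_th_nom - dv_th, slow uses v_th_nom + dv_th.
    """
    nominal = gate_delay(v_dd, v_th_nom, alpha)
    fast = gate_delay(v_dd, v_th_nom - dv_th, alpha)
    slow = gate_delay(v_dd, v_th_nom + dv_th, alpha)
    return fast / nominal, slow / nominal


if __name__ == "__main__":
    v_th_nom = 0.3  # assumed nominal threshold, V
    for v_dd in (1.0, 1.5, 3.0, 5.0):
        fast, slow = delay_spread(v_dd, v_th_nom, 0.15)
        print(f"V_DD = {v_dd:.1f} V: delay from {fast:.2f}x to {slow:.2f}x nominal")
```

Under this model the same ±0.15 V threshold variation produces a much wider delay spread at 1.0 V than at 3.0 V, which is the motivation for tightening V_TH with body bias at low supply voltages.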
SATS Experimental Results
[Figure: SATS experimental results]

Substrate Biasing
[Figure: Tschanz, JSSC 11/02]
Effectiveness of Substrate Bias
[Figure: die-to-die variations]

Effectiveness of Substrate Bias
[Figure: within-die variations]
Dynamic Voltage Scaled Microprocessor
[Figure: TX3900 with VT and VS blocks, user logic, and PLL; external V_DD = 3.3 V ±10%, internal V_DDL = 0.8 V to 2.9 V ±5%]
[Figure: power dissipation (mW) vs. operating frequency (MHz), measurement and theory, for CMOS at V_DD = 3.3 V and for the VS scheme with optimized internal V_DD. Courtesy: Prof. Kuroda]

Adapting V_DD and V_TH
[Figure: Miyazaki, ISSCC 02]
Adapting V_DD and V_TH
[Figure: Miyazaki, ISSCC 02]

Optimal EDP Contours
[Figure: optimal EDP contours]
Sizing, Supply, Threshold Optimization
- Reference design: D_ref(V_dd^max, V_th^ref)
[Figure: optimal V_dd, V_th, and w for topologies Inv, Add, Dec at (E_Lk/E_Sw)_ref = 0.1%, 1%, 10%; ranges bounded by V_dd^max, V_dd^min, V_th^max, V_th^min]
- Large variation in optimal circuit parameters V_dd^opt, V_th^opt, w^opt
- Technology parameters (V_dd^max, V_th^ref) rarely optimal

Adiabatic Circuits
[Figure: ramp with rise time t_r charging C through R]
Adiabatic charging: $E = \frac{RC}{t_r} C V^2$ (for $t_r \gg RC$)
- Applying slow input slopes reduces E below CV^2
- Useful for driving large capacitors (buffers)
- Power reduction > 4x for pad drivers (1 MHz) (ISI)
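The adiabatic charging expression, E = (RC/t_r)·C·V² for t_r much larger than RC, can be evaluated directly to see how the ramp time trades against dissipated energy. The component values below (100 Ω, 10 pF, 3.3 V) are assumed for illustration and are not taken from the slides.

```python
def adiabatic_ramp_energy(r, c, v, t_r):
    """Energy (J) dissipated in R when C is charged to V by a ramp of duration t_r.

    Uses E = (R*C / t_r) * C * V^2, which only holds for t_r >> R*C.
    """
    if t_r <= r * c:
        raise ValueError("formula requires t_r much larger than R*C")
    return (r * c / t_r) * c * v ** 2


def reference_energy(c, v):
    """C*V^2, the figure the lecture compares the adiabatic energy against."""
    return c * v ** 2


if __name__ == "__main__":
    r, c, v = 100.0, 10e-12, 3.3  # assumed driver resistance, pad load, swing
    for t_r in (10e-9, 100e-9, 1e-6):
        ratio = adiabatic_ramp_energy(r, c, v, t_r) / reference_energy(c, v)
        print(f"t_r = {t_r * 1e9:7.1f} ns -> E / (C V^2) = {ratio:.4f}")
```

The ratio to CV² is simply RC/t_r, so every tenfold increase in ramp time cuts the dissipation by ten, at the cost of a slower transition.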
Adiabatic Computing: Basic Concepts
- When charging a capacitor through an RC network with a slowly changing ramp, power dissipation is reduced by reducing the slope of the ramp.
- No switch should ever be enabled while there is a voltage across it.
- Make sure every node is reset to its original state before performing the next operation!
  - Reversible computing -> take energy back to the source -> ensure that the state is known

Adiabatic Computing
Principles of storing and erasing information:
- Energy dissipation of the combinational logic can be made arbitrarily small by operating the circuit slowly enough.
- Information can be loaded into memory circuits while dissipating only an arbitrarily small energy.
- Information can be copied with arbitrarily small energy.
- Erasing the last copy of a piece of information dissipates an irreducible, finite amount of energy.
Koller, Athas, PhysComp 92; Landauer, IBM J. Res. Dev. 61
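For scale, the irreducible erasure energy associated with Landauer's argument is usually quoted as k·T·ln 2 per bit. The slides do not state this number; the sketch below computes it at an assumed 300 K and compares it with CV² for a hypothetical 1 fF node switched at 1 V.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K, exact SI value


def landauer_limit(temperature_k):
    """Minimum energy (J) to erase one bit at the given temperature: k*T*ln(2)."""
    if temperature_k <= 0:
        raise ValueError("temperature must be positive")
    return BOLTZMANN * temperature_k * math.log(2.0)


if __name__ == "__main__":
    e_min = landauer_limit(300.0)   # assumed room temperature
    e_node = 1e-15 * 1.0 ** 2       # C*V^2 for a hypothetical 1 fF node at 1 V
    print(f"Landauer limit at 300 K: {e_min:.3e} J")
    print(f"C*V^2 for 1 fF at 1 V:   {e_node:.3e} J")
    print(f"ratio: {e_node / e_min:.2e}")
```

The gap of several orders of magnitude between the two figures is why the practical limits discussed in this lecture come from circuit losses rather than from the erasure bound itself.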
Six-Phase Charge Transfer
[Figure: one-bit delay. Watkins, JSSC 12/67]

Split-Level Charge Recovery
[Figure: Younis, Knight, IWLPD 94]
Adiabatic Circuits
[Figure: cascaded stages A, B, C clocked by φ_0, φ_1, φ_2]
- Hold the inputs of each stage until the output energy has been returned
From Athas

Reversible Pipelines
- Make the return path different
[Figure: In, clk, logic, return logic, Out, C]
- Problem: always results in C V_th^2 / 2 loss!
Reversible Pipelines
[Figure: pipeline stages d_{i-1}, d_i, d_{i+1} with input d_in and clock phases φ_0 to φ_3; phases labeled hold, reset, return]

Partially Erasable Latches
[Figure: latch with transistors M1 to M4, clock Pck, and outputs F1]
- Output stays at V_th
- Stored energy is ½ C V_th^2, vs. ½ C V^2
- How to use this?
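The benefit of leaving the output at V_th instead of erasing a full swing follows from the two stored-energy expressions, ½·C·V_th² and ½·C·V². A minimal sketch of the comparison, using assumed values (10 fF node, 3.3 V swing, 0.7 V threshold) that are not from the slides:

```python
def stored_energy(c, v):
    """Energy (J) stored on capacitance c (F) at voltage v (V): C*V^2/2."""
    return 0.5 * c * v ** 2


def residual_fraction(v_th, v_swing):
    """Fraction of the full-swing stored energy left on a node resting at V_th."""
    if v_swing <= 0:
        raise ValueError("v_swing must be positive")
    return (v_th / v_swing) ** 2


if __name__ == "__main__":
    c, v_swing, v_th = 10e-15, 3.3, 0.7  # assumed node values
    print(f"full swing:  {stored_energy(c, v_swing):.3e} J")
    print(f"left at Vth: {stored_energy(c, v_th):.3e} J")
    print(f"fraction:    {residual_fraction(v_th, v_swing):.3f}")
```

The capacitance cancels in the ratio, so the fraction depends only on (V_th/V)² and grows quickly as the supply is scaled down toward the threshold.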
Partially Erasable Latches
[Figure: latch with transistors M1 to M6, clocks Pck and Pck1, inputs F0, outputs F1]
- Requires 4 clocks for interfacing
Denker, ISLPED 94

Single Pck + Auxiliary Clock
[Figure: latch with transistors M1 to M8 (device sizes 2.4µ/1.2µ), clock Pck, auxiliary clock CX, inputs F0, outputs F1; timing diagram of Pck, CX, F0, F1]
Maksimovic et al., ISLPED 97
Clock Generation
Principle
[Figure: labels L, Pck, V_DD/2, Clk, Q, R, C, Logic]
Implementation
[Figure: labels L, V_G, V_B, Pck, f_c, Clk, Clk/2, Enable, Q, CX; 720µ/1.2µ device; logic stages 1 to n with signals F_0 to F_n]

Single Pck + Reference Voltages
Cascading gates
[Figure: Kim, Papaefthymiou, ISLPED 98]
Adiabatic µP
[Figure: Athas et al., JSSC 12/97; Athas et al., JSSC 11/00]

Adiabatic µP
[Figure: adiabatic µP]
E-R Latch
[Figure: E-R latch with dynamic logic and with PTL]

Other Ideas
- Charge recycling bus: H. Yamauchi et al., JSSC 4/95
- Adiabatic display driver: J. Ammer, ISSCC 99
- Various examples of charge-recycling logic
Silicon on Insulator (SOI)
References:
- Chapter 5 by Shahidi, Assaderaghi, Antoniadis
- K. Bernstein, N.J. Rohrer, SOI Circuit Design Concepts, Kluwer, 2000.
- K. Bernstein, ISSCC 00 SOI Tutorial
- 2001 Microprocessor Design Workshop, lectures by C.T. Chuang and R. Preston
- Articles from Chandrakasan/Brodersen, IEEE Press, 1998.

SOI Transistor
[Figure: SOI transistor. Bernstein, ISSCC 00]
SOI Devices
- Partially depleted (PD)
  - Pros:
    - Easier to manufacture (Si thickness)
    - Scalable, tolerance to variations
    - Decoupling V_T from Si thickness
  - Cons:
    - Floating body effects: I-V kink, parasitic bipolar effect
- Fully depleted (FD)
  - Pros:
    - Significantly reduced floating body effects
    - Sharper subthreshold S
  - Cons:
    - V_T is a function of the charge in the body (varies)
    - Manufacturability, compatibility with bulk CMOS

SOI Microprocessors
Comp.   | Processor                   | Freq.                  | Technology                    | Comment                     | Source
IBM     | 64b Power4                  | 1.10 GHz               | PD/SOI 0.08 um Leff, 7LM Cu   |                             | ISSCC 01
IBM     | 64b Power4 (Test Chip)      | 1.00 GHz               | PD/SOI 0.08 um Leff, 7LM Cu   |                             | IEDM 99
IBM     | 64b Power4                  | 1.00 GHz               | PD/SOI 0.08 um Leff, 7LM Cu   |                             | Hot Chip 99, EE Times 99
IBM     | 64b PowerPC                 | 660 MHz                | PD/SOI 0.08 um Leff, 7LM Cu   | Migration from 0.12 um Leff | ISSCC 00
IBM     | 64b PowerPC                 | 550 MHz                | PD/SOI 0.12 um Leff, 6LM Cu   | Bulk 450 MHz 0.12 um Leff   | ISSCC 99
IBM     | 32b PowerPC (PowerPC750)    | 580 MHz                | PD/SOI 0.12 um Leff, 6LM Cu   | Bulk 480 MHz 0.12 um Leff   | ISSCC 99
Samsung | 64b DEC Alpha               | 600 MHz                | FD/SOI 0.25 um Lgate, 4LM Al  | Bulk 433 MHz 0.35 um Lgate  | ISSCC 99
DEC     | StrongArm-110 (Core Only)   | 230 MHz (Tester Limit) | PD/SOI 0.35 um                | 20% Perf. Over Bulk         | IEDM 97
From C.T. Chuang
SOI Design
Advantages:
- Less capacitance (~5-40%)
- Lower power
- Reduced effective V_T, short channel effects, body effect
- Layout simplicity (no wells, plugs, ...)
Disadvantages:
- History-dependent timing
- Increased device leakage
- Body effect issues
- Self heating
- Decoupling capacitance

SOI Timing Issues
[Figure: SOI timing issues, with µs constants and ps constants. Courtesy of IEEE Press, New York, 2000]
Floating Body Effects in PD SOI
- Neither S nor D junction is biased, so the body floats.
- Effects:
  - Threshold variability
  - Kink in output characteristics
[Figure: PD vs. FD]

Floating Body Effects in PD SOI (cont'd)
- Parasitic bipolar transistor
SOI Circuit Considerations
- Static circuits: history-dependent delay
- First switch (often slowest) vs. second switch (often fastest)
Bernstein, ISSCC 00

Initial State
- Initial input at low
  - nfet V_B determined by back-to-back diodes
  - pfet V_B at V_DD initially
  - pfet V_B before input falling transition determined by capacitive coupling
- Initial input at high
  - nfet V_B at GND initially
  - nfet V_B before input rising transition determined by capacitive coupling
  - pfet V_B determined by back-to-back diodes
First Two Transitions
[Figure: first two transitions. Courtesy of IEEE Press, New York, 2000]

History-Dependent Delay
[Figure: Bernstein, ISSCC 00]
History-Dependent Delay
[Figure: history-dependent delay]

History-Dependent Delay
- Convergence to steady state
- Noise margins!
Dynamic Circuits in SOI
- Dynamic history
- Bipolar effect
- Less charge sharing

Parasitic Bipolar
[Figure: dynamic lookahead adder (transistors T0 to T10, nodes ND1 and ND2, CLK) showing the parasitic bipolar current and the noise propagated from the previous stage]
- The cumulative effect of parasitic bipolar current and propagated noise causes data corruption after the 3rd stage in the chain
C.T. Chuang (M. Canada et al., ISSCC 1999)
Pre-discharging Nodes
- Intermediate nodes discharged to prevent parasitic bipolar effect
[Figure: bulk design vs. SOI design of a dynamic gate with inputs A0, B0, A1, B1, clock CLK, nodes X and Y, and output OUT]
C.T. Chuang (D. H. Allen et al., ISSCC 1999)

Dynamic Circuit Techniques
- Conditional feedback
- Set up inputs during precharge
- Pre-discharge intermediate node
- Cross-connected inputs (stack swizzling)
- Re-order pulldown tree
[Figure: dynamic gate with CLK, inputs A, B, E, F, feedback FB_L, and output OUT]
C.T. Chuang (D. H. Allen et al., ISSCC 1999)
Pass-Transistor Logic in SOI
[Figure: inverter and keeper]

PTL in SOI (Assaderaghi 94): DTMOS
- Example: V_T = 0.4 V at 0 V; 0.17 V at 0.5 V
DTMOS
- Body and gate tied together

DTMOS
[Figure: I_D vs. V_DS for normal and DT operation]
DTMOS
[Figure: subthreshold currents for SOI NMOS and PMOS transistors with bodies grounded vs. DTMOS]

0.5V SOI Pass-Gate Logic
[Figure: CPL with buffer type A, gate-body connection (GBC), and buffer type B, input-body connection (IBC). Fuse et al., ISSCC 96]
0.5V SOI Pass-Gate Logic
[Figure: 0.5 V SOI pass-gate logic]