Low-Power VLSI
Seong-Ook Jung, 2011. 5. 6., sjung@yonsei.ac.kr
VLSI SYSTEM LAB, YONSEI University, School of Electrical & Electronic Engineering
Contents
1. Introduction
2. Power classification
3. Power performance relationship
4. Low power design
   1. Architecture and algorithm level
   2. Block and logic level
   3. Circuit level
   4. Device level
5. OMAP processor
6. Summary
YONSEI Univ.
Introduction
Technology Scaling
- Technology scaling follows Moore's law: the number of transistors that can be placed on an integrated circuit has doubled approximately every 18 months.
[1] Microprocessor Hall of Fame, Intel, 2004
Development Trend
- Scaling (More Moore): more devices are integrated in a chip.
  - New scaling road map: not only geometrical scaling for 2D devices, but also equivalent scaling for 3D devices.
  - Beyond bulk CMOS: FinFET, SOI.
- Functional diversification (More than Moore): several functions are merged in a chip.
[2] ITRS (International Technology Roadmap for Semiconductors) 2009
SoC Performance
- SoC performance increases exponentially, thanks to both device technology and design methodology.
[2] ITRS 2009
SoC Power Consumption Problem
- SoC power consumption also increases severely: after 15 years, 10x the power is required.
[2] ITRS 2009
SoC Power Density Problem
- Power density (power consumption per die area, W/cm²) increases exponentially.
- At this rate, chips would reach the power densities of nuclear power plants or rocket nozzles within a few years.
[1] Microprocessor Hall of Fame, Intel, 2004
Process Variation Problem
- Process variation is a result of scaling. It consists of global variation and local variation.
- Global variation: comes from fabrication, lot, and wafer processes, and shows up as different process corners (NMOS-PMOS: SS/SF/TT/FS/FF).
- Local variation: truly random variation between devices with identical layout.
[3] Synopsis, 2005
[4] http://cnx.org
Process Variation Problem
- Performance variation due to process variation:
  - Frequency difference: 30%
  - Leakage current difference: 20x
- Process variation should be considered in SoC design.
[5] A. Devgan, Berkeley
Effect of the Process Variation
- Low voltage / low power limitation:
  - $I_D \propto (W/L)(V_{DD} - V_{TH})^{\alpha}$
  - $V_{TH}$ variation → $I_D$ variation → performance variation.
  - More design margin is needed because of process variation → $V_{DD}$ ↑.
- Yield limitation: because of process variation, failure probability ↑ → yield ↓.
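The low-voltage limitation above can be illustrated numerically. The sketch below evaluates the drain-current relation $I_D \propto (W/L)(V_{DD} - V_{TH})^{\alpha}$ at a fast and a slow threshold corner and reports the resulting current spread. The values $\alpha = 1.3$, $V_{TH} = 0.3$ V and a threshold shift of ±30 mV are illustrative assumptions, not figures from the slides, and the function names are hypothetical.

```python
def drain_current(vdd, vth, alpha=1.3, w_over_l=1.0):
    """Relative drain current, proportional to (W/L) * (V_DD - V_TH)^alpha."""
    return w_over_l * (vdd - vth) ** alpha


def current_spread(vdd, vth_nominal=0.3, delta_vth=0.03, alpha=1.3):
    """Fractional I_D spread between the low-V_TH and high-V_TH corners."""
    fast = drain_current(vdd, vth_nominal - delta_vth, alpha)
    slow = drain_current(vdd, vth_nominal + delta_vth, alpha)
    nominal = drain_current(vdd, vth_nominal, alpha)
    return (fast - slow) / nominal


# The same threshold shift costs relatively more current at a lower supply.
for vdd in (1.2, 0.9, 0.6):
    print(f"V_DD = {vdd:.1f} V -> I_D spread = {current_spread(vdd):.1%}")
```

Under these assumptions the spread grows as $V_{DD}$ falls, which is why a low-voltage design needs more margin for the same process variation.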
Low-Power SoC
- Low power VLSI design
- Low process variation (high yield) design
[2] ITRS Roadmap 2009
Power Classification
Power Classification
- Power consumption of CMOS circuits:
  $P_{total} = P_{dynamic} + P_{static}$
  $P_{dynamic} = P_{sw} + P_{sc}$
Switching Power
- $P_{sw}$ is due to the charging and discharging (output transitions) of the capacitors driven by the circuit in response to input transitions.
- $I = C_L \, dV/dt = C_L \Delta V f$, so $P_{sw} = I V_{DD} = C_L \Delta V V_{DD} f$.
- In a digital circuit $\Delta V = V_{DD}$, so $P_{sw} = C_L V_{DD}^2 f$.
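As a quick numerical check of the switching power relation $P_{sw} = C_L V_{DD}^2 f$, the sketch below computes the power for one operating point and for a reduced supply. The capacitance, voltage and frequency values are illustrative assumptions only.

```python
def switching_power(c_load, vdd, freq):
    """P_sw = C_L * V_DD^2 * f, assuming a full-swing output (delta V = V_DD)."""
    return c_load * vdd ** 2 * freq


# Hypothetical operating point: 1 nF total switched capacitance at 500 MHz.
p_nominal = switching_power(c_load=1e-9, vdd=1.0, freq=500e6)
p_scaled = switching_power(c_load=1e-9, vdd=0.8, freq=500e6)

print(f"P_sw at 1.0 V: {p_nominal:.3f} W")
print(f"P_sw at 0.8 V: {p_scaled:.3f} W ({p_scaled / p_nominal:.0%} of nominal)")
```

Because the dependence on $V_{DD}$ is quadratic, a 20% supply reduction removes roughly a third of the switching power at the same frequency.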
Short Circuit Power
- $P_{sc}$ is caused by the simultaneous conduction of the PMOS and NMOS during input and output transitions.
- $P_{sc} = (\beta/12)(V_{DD} - 2V_{TH})^3 (t_3 - t_1)$
Static Power: $P_{sub}$, $P_{gate}$ and $P_{junc}$
- $P_{sub}$: sub-$V_{TH}$ leakage, which flows when $V_{GS} < V_{TH}$.
  $P_{sub} \propto \exp[(V_{GS} - V_{TH})/(m v_T)] \cdot V_{DD}$
- $P_{gate}$: in an ideal MOSFET $I_{gate} = 0$, but in a short-channel MOSFET $I_{gate}$ exists because of the thin $T_{OX}$.
  $P_{gate} \propto W L (V_{GS}/T_{OX})^2 \cdot V_{DD}$
- $P_{junc}$: reverse PN junction leakage.
  $P_{junc} \propto (\exp[V_D/v_T] - 1) \cdot V_{DD}$
[6] K.M.Cao, BSIM4 Gate Leakage Model Including Source-Drain Partition, IEDM, 2000
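The exponential dependence of subthreshold leakage on $V_{TH}$ is easier to appreciate with numbers. The sketch below evaluates the factor $\exp[-\Delta V_{TH}/(m v_T)]$ by which the leakage changes when $V_{TH}$ is raised at fixed $V_{GS}$. The slope factor $m = 1.5$ and the temperature of 300 K are illustrative assumptions, and the function names are hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23          # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_k=300.0):
    """v_T = kT/q in volts."""
    return BOLTZMANN * temp_k / ELECTRON_CHARGE


def subthreshold_leakage_ratio(delta_vth, m=1.5, temp_k=300.0):
    """Leakage multiplier when V_TH rises by delta_vth at fixed V_GS."""
    return math.exp(-delta_vth / (m * thermal_voltage(temp_k)))


def subthreshold_swing(m=1.5, temp_k=300.0):
    """V_TH increase (in volts) that cuts subthreshold leakage by 10x."""
    return m * thermal_voltage(temp_k) * math.log(10.0)


print(f"10x leakage reduction needs about {subthreshold_swing() * 1e3:.0f} mV of extra V_TH")
print(f"+100 mV V_TH scales leakage by {subthreshold_leakage_ratio(0.1):.3f}")
```

With these assumed values, each additional 90 mV or so of threshold voltage buys roughly one decade of subthreshold leakage.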
Power Performance Relationship
$V_{DD}$ Reduction
- Power consumption equations:
  $P_{sw} = C_L V_{DD}^2 f$
  $P_{sc} = (\beta/12)(V_{DD} - 2V_{TH})^3 (t_3 - t_1)$
  $P_{sub} \propto \exp[(V_{GS} - V_{TH})/(m v_T)] \cdot V_{DD}$
  $P_{gate} \propto W L (V_{GS}/T_{OX})^2 \cdot V_{DD}$
  $P_{junc} \propto (\exp[V_D/v_T] - 1) \cdot V_{DD}$
- Case 1: $V_{DD}$ ↓ → all power consumption ↓.
- However, $Delay \propto C_L V_{DD}/I_D \propto C_L V_{DD}/(V_{DD} - V_{TH})^{\alpha}$, so if $V_{DD}$ ↓, delay ↑ → performance loss.
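The tradeoff in this case can be put into numbers with the delay relation $Delay \propto C_L V_{DD}/(V_{DD} - V_{TH})^{\alpha}$. The sketch below compares a nominal and a reduced supply. The values $V_{TH} = 0.3$ V, $\alpha = 1.3$ and the two supply voltages are illustrative assumptions, and the function names are hypothetical.

```python
def relative_delay(vdd, vth=0.3, alpha=1.3):
    """Delay proportional to V_DD / (V_DD - V_TH)^alpha, with C_L held constant."""
    return vdd / (vdd - vth) ** alpha


def relative_switching_power(vdd):
    """Switching power proportional to V_DD^2, with C_L and f held constant."""
    return vdd ** 2


v_ref, v_low = 1.0, 0.7
power_ratio = relative_switching_power(v_low) / relative_switching_power(v_ref)
delay_ratio = relative_delay(v_low) / relative_delay(v_ref)

print(f"Switching power: {power_ratio:.2f}x")
print(f"Gate delay:      {delay_ratio:.2f}x")
```

Under these assumptions the switching power roughly halves while the gate delay grows by more than 40%, which is the performance loss the slide refers to.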
$V_{DD}$ Scaling Limitation
- Low-$V_{DD}$ limitation with process variation: $V_{DD,min} = V_{T0} + K \sigma(V_T)$, where $\sigma(V_T)$ is the 1-sigma $V_T$ variation.
- $\sigma(V_T) \propto T_{ox} N_A^{0.25} (LW)^{-0.5}$
- $\sigma(V_T)$ increases significantly with technology scaling ($LW$ ↓), so $V_{DD}$ scaling meets its limit.
- A process-variation-tolerant circuit design technique is required.
[7] K.Itoh, Adaptive Circuits for the 0.5-V Nanoscale CMOS Era, ISSCC, 2009
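A small sketch can show how the two relations on this slide interact: $\sigma(V_T) \propto T_{ox} N_A^{0.25} (LW)^{-0.5}$ and $V_{DD,min} = V_{T0} + K\sigma(V_T)$. All numbers below ($V_{T0} = 0.3$ V, $K = 5$, a starting $\sigma(V_T)$ of 30 mV, and a 4x reduction in gate area) are illustrative assumptions, and the function names are hypothetical.

```python
def sigma_vt_scale(tox_ratio, na_ratio, area_ratio):
    """Relative change of sigma(V_T), proportional to T_ox * N_A^0.25 * (LW)^-0.5.

    Each argument is new_value / old_value for that parameter.
    """
    return tox_ratio * na_ratio ** 0.25 * area_ratio ** -0.5


def vdd_min(vt0, k, sigma_vt):
    """V_DD,min = V_T0 + K * sigma(V_T)."""
    return vt0 + k * sigma_vt


sigma_old = 0.03                                      # V, assumed
sigma_new = sigma_old * sigma_vt_scale(1.0, 1.0, 0.25)  # gate area LW shrinks 4x

print(f"V_DD,min before scaling: {vdd_min(0.3, 5, sigma_old):.2f} V")
print(f"V_DD,min after scaling:  {vdd_min(0.3, 5, sigma_new):.2f} V")
```

With everything else fixed, shrinking the gate area by 4x doubles $\sigma(V_T)$, so the minimum usable supply rises even though the device itself got smaller.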
High $V_{TH}$
- Power consumption equations:
  $P_{sw} = C_L V_{DD}^2 f$
  $P_{sc} = (\beta/12)(V_{DD} - 2V_{TH})^3 (t_3 - t_1)$
  $P_{sub} \propto \exp[(V_{GS} - V_{TH})/(m v_T)] \cdot V_{DD}$
  $P_{gate} \propto W L (V_{GS}/T_{OX})^2 \cdot V_{DD}$
  $P_{junc} \propto (\exp[V_D/v_T] - 1) \cdot V_{DD}$
- Case 2: $V_{TH}$ ↑ → $P_{sc}$ ↓ and, especially, $P_{sub}$ ↓.
- However, $Delay \propto C_L V_{DD}/I_D \propto C_L V_{DD}/(V_{DD} - V_{TH})^{\alpha}$, so if $V_{TH}$ ↑, delay ↑ → performance loss.
Low Frequency
- Power consumption equations:
  $P_{sw} = C_L V_{DD}^2 f$
  $P_{sc} = (\beta/12)(V_{DD} - 2V_{TH})^3 (t_3 - t_1)$
  $P_{sub} \propto \exp[(V_{GS} - V_{TH})/(m v_T)] \cdot V_{DD}$
  $P_{gate} \propto W L (V_{GS}/T_{OX})^2 \cdot V_{DD}$
  $P_{junc} \propto (\exp[V_D/v_T] - 1) \cdot V_{DD}$
- Case 3: $f$ ↓ → $P_{sw}$ ↓.
- However, $Throughput \propto f$, so lowering $f$ means performance loss.
Tradeoff
- There is a tradeoff between low power and high performance.
- Low power design: power reduction without performance degradation.
Low power design
Low Power Design Methodology
To make a low power SoC:
- Architecture and algorithm levels
  - Parallelism, pipeline
- Block and logic levels
  - $V_{DD}$ / frequency scheduling by monitoring workload (AVFS)
  - Temperature management to reduce leakage current
- Circuit level
  - Circuit type (dynamic, static, ...)
  - Circuit technique (Dual $V_{DD}$, Dual $V_{TH}$, MTCMOS, ...)
- Device level
  - Control of the process parameters: halo doping, retrograde well
  - Low leakage new devices: SOI, FinFET
Architecture and Algorithm Levels
Parallelism
- Reference (a simple adder-comparator datapath): $P_{ref} = C_{ref} V_{ref}^2 f_{ref}$
- Parallel implementation: $P_{par} = C_{par} V_{par}^2 f_{par}$, so
  $\frac{P_{par}}{P_{ref}} = \frac{C_{par}}{C_{ref}} \cdot \frac{f_{par}}{f_{ref}} \cdot \frac{V_{par}^2}{V_{ref}^2} = (N + \varepsilon) \cdot \frac{1}{N} \cdot \frac{V_{par}^2}{V_{ref}^2}$
- $N$: degree of parallelism. $\varepsilon$: a slight increase in capacitance due to the extra routing.
[8] A.P. Chandrakasan, Minimizing power consumption in digital CMOS circuits, Proc. of IEEE, 1995
Pipeline
- Reference (a simple adder-comparator datapath): $P_{ref} = C_{ref} V_{ref}^2 f_{ref}$
- Pipeline implementation: $P_{pipe} = C_{pipe} V_{pipe}^2 f_{pipe}$, so
  $\frac{P_{pipe}}{P_{ref}} = \frac{C_{pipe}}{C_{ref}} \cdot \frac{f_{pipe}}{f_{ref}} \cdot \frac{V_{pipe}^2}{V_{ref}^2} = (1 + \varepsilon) \cdot \frac{V_{pipe}^2}{V_{ref}^2}$
- $N$: number of pipeline stages. $\varepsilon$: a slight increase in capacitance due to the extra latches.
[8] A.P. Chandrakasan, Minimizing power consumption in digital CMOS circuits, Proc. of IEEE, 1995
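The power ratios for the parallel and pipelined datapaths can be compared with a short calculation. In the sketch below, `eps` stands for the slight capacitance increase mentioned on the slides (extra routing or extra latches), expressed relative to $C_{ref}$, and `v_ratio` is the reduced supply divided by the reference supply. The symbol name, the value 0.15 and the voltage ratio 0.58 are illustrative assumptions, not values taken from the slides.

```python
def parallel_power_ratio(n, eps, v_ratio):
    """P_par / P_ref for N-way parallelism.

    Assumes C_par = (n + eps) * C_ref and f_par = f_ref / n.
    """
    return (n + eps) * (1.0 / n) * v_ratio ** 2


def pipeline_power_ratio(eps, v_ratio):
    """P_pipe / P_ref, assuming C_pipe = (1 + eps) * C_ref and f_pipe = f_ref."""
    return (1.0 + eps) * v_ratio ** 2


print(f"2-way parallel, reduced supply: {parallel_power_ratio(2, 0.15, 0.58):.3f}")
print(f"Pipelined, reduced supply:      {pipeline_power_ratio(0.15, 0.58):.3f}")
print(f"2-way parallel, same supply:    {parallel_power_ratio(2, 0.15, 1.0):.3f}")
```

The last line shows why the voltage term matters: without lowering the supply, the extra capacitance makes the parallel version slightly worse than the reference. The saving comes from the relaxed timing, which allows a lower supply voltage at the same throughput.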
Circuit Level
Circuit Level Low Power Techniques
- Multiple channel length
- Stacked transistor
- Dual $V_{DD}$
- Dual $V_{TH}$
- MTCMOS (Multi Threshold voltage CMOS)
- DVS (Dynamic Voltage Scaling): open loop / closed loop
Critical Path
- Critical path: the worst-case delay path. It determines the SoC's maximum performance.
- Number of critical paths << number of non-critical paths.
- A fast non-critical path is just wasteful.
- Because of the tradeoff between power and performance, increasing the delay of non-critical paths can reduce power.
Multiple Channel Length
- Threshold voltage roll-off: longer L → higher Vt → low leakage with low performance.
- Used in non-critical paths.
Stacked Transistor
- $V_M$ level: $V_M > 0$ due to leakage current.
  - Negative $V_{GS\_MN1}$
  - Positive $V_{SB\_MN1}$ → increase in $V_{TH}$ by the body effect
- $P_{sub} \propto \exp[(V_{GS} - V_{TH})/(m v_T)] \cdot V_{DD}$ → large reduction in $I_{sub}$.
- Primary input vector control is used to exploit the stack effect in standby mode.
Dual $V_{DD}$
- Basic idea:
  - $V_{DDL}$: logic gates off the critical path
  - $V_{DDH}$: logic gates on the critical path
- Reduces power without degrading performance.
- Figure legend: shaded = $V_{DDL}$, non-shaded = $V_{DDH}$.
Dual $V_{DD}$: Design Issue & Target
- Issue: static current flows in a $V_{DDH}$ gate if it is directly driven by a $V_{DDL}$ gate ($V_{SG} > 0$ → static current).
  - A level converter is needed, with an overhead in area and power.
- Design target: for a given circuit, choose the gates to run at $V_{DDL}$ so as to minimize power consumption while maintaining performance, taking the level converters into account.
Dual $V_{TH}$ Voltages
- HVt: assigned to transistors in non-critical paths. Saves leakage in both standby and active modes.
- LVt: assigned to transistors in critical paths. Maintains performance.
MTCMOS: Basic
- MTCMOS: Multiple Threshold voltage CMOS
- Low power and low energy:
  $E_{TOT} = E_{STD} + E_{ACT} = P_{static} \cdot t_{STD} + P_{dynamic} \cdot t_{ACT}$
  - Portable device: $t_{STD} \gg t_{ACT}$
- Basic circuit scheme:
  - Two different Vt: HVt (0.5~0.6V), LVt (0.2~0.3V)
  - Two operating modes: active, standby
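The energy split $E_{TOT} = P_{static} \cdot t_{STD} + P_{dynamic} \cdot t_{ACT}$ explains why standby leakage matters so much for portable devices. The sketch below evaluates it for a device that is in standby 99% of the time. The power figures and the 10x standby leakage reduction are illustrative assumptions, not measurements from the slides.

```python
def total_energy(p_static, t_standby, p_dynamic, t_active):
    """E_TOT = P_static * t_STD + P_dynamic * t_ACT (joules)."""
    return p_static * t_standby + p_dynamic * t_active


# Hypothetical duty cycle: 1 s active for every 99 s in standby.
t_active, t_standby = 1.0, 99.0
p_dynamic = 100e-3        # W while active, assumed
p_static_lvt = 1e-3       # W standby leakage with low-Vt logic only, assumed
p_static_mtcmos = 0.1e-3  # W standby leakage with high-Vt sleep switches, assumed

e_lvt = total_energy(p_static_lvt, t_standby, p_dynamic, t_active)
e_mtcmos = total_energy(p_static_mtcmos, t_standby, p_dynamic, t_active)

print(f"Low-Vt only: {e_lvt * 1e3:.1f} mJ")
print(f"MTCMOS:      {e_mtcmos * 1e3:.1f} mJ")
```

In this example the standby term is about half of the total energy even though the static power is 100x smaller than the dynamic power, so cutting standby leakage has a large effect.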
MTCMOS: Scheme
- Active mode:
  - SL = 1, complement of SL = 0
  - $V_{DDV}$ → $V_{DD}$, $V_{GNDV}$ → $V_{GND}$
  - LVt → operating frequency
- Standby mode:
  - SL = 0, complement of SL = 1
  - $V_{DDV}$ and $V_{GNDV}$ are floating
  - HVt → leakage
MTCMOS: Constraint
- Performance constraint, according to:
  - Normalized foot/head switch size: $W_H/W_L$
  - Normalized capacitance on $V_{DDV}$/$V_{GNDV}$: $C_V/C_O$
- Area penalty: relatively small, because the head/foot switches are shared by all logic gates on a chip (global foot switch).
DVFS: Basic Concept
- Basic concept: $P_{dynamic} = C V_{DD}^2 f$, so $V_{DD}$ and frequency are scaled simultaneously.
- $V_{DD}$ scaling: the best way to lower $P_{dynamic}$, because $P_{dynamic} \propto V_{DD}^2$.
- Frequency scaling:
  - Operating frequency = throughput.
  - Not all tasks require maximum throughput.
  - By controlling the frequency, the SoC improves its energy efficiency.
[10] T.Burd, A Dynamic Voltage Scaled Microprocessor System, JSSC, 2000
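The idea that not every task needs maximum throughput can be sketched as a simple operating-point selection: pick the slowest frequency and voltage pair that still meets the task deadline, then compare the dynamic energy per task. The operating-point table, the effective capacitance and the workloads below are hypothetical values chosen only for illustration, and this is not the control algorithm of any particular processor.

```python
# Hypothetical (frequency in Hz, V_DD in V) operating points.
OPERATING_POINTS = [(200e6, 0.8), (400e6, 1.0), (600e6, 1.2)]


def pick_operating_point(cycles, deadline_s, points=OPERATING_POINTS):
    """Return the slowest (f, V_DD) pair that still meets the deadline."""
    for freq, vdd in sorted(points):
        if cycles / freq <= deadline_s:
            return freq, vdd
    return max(points)  # nothing meets the deadline: run as fast as possible


def task_energy(cycles, vdd, c_eff=1e-9):
    """Dynamic energy of a task: C * V_DD^2 per cycle, times the cycle count."""
    return c_eff * vdd ** 2 * cycles


light = pick_operating_point(cycles=2e6, deadline_s=0.02)
heavy = pick_operating_point(cycles=10e6, deadline_s=0.02)
saving = 1.0 - task_energy(2e6, light[1]) / task_energy(2e6, 1.2)

print(f"Light task runs at {light[0] / 1e6:.0f} MHz, {light[1]} V")
print(f"Heavy task runs at {heavy[0] / 1e6:.0f} MHz, {heavy[1]} V")
print(f"Energy saved on the light task vs. always running at 1.2 V: {saving:.0%}")
```

Note that lowering the frequency alone would not change the energy per task in this model, since the cycle count is fixed. The saving comes from the lower $V_{DD}$ that the lower frequency permits.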
DVFS: Open Loop vs. Closed Loop
- Open loop system
  - Cannot adapt to PVT variations
  - Needs more design margin
  - Example: Enhanced SpeedStep technology of Intel
- Closed loop system
  - Can adapt to PVT variations
  - Needs less design margin
  - Examples: Intelligent Energy Management technology of ARM, SmartReflex2 of the TI OMAP processor
[11] Enhanced Speed Step technology, Intel
DVFS (SONY, PDA) Block Diagram
- Closed loop system
[12] M.Nakai, Dynamic Voltage and Frequency Management for a Low-Power Embedded Microprocessor, JSSC, 2005
Delay Synthesizer Structure
- Composed of not only a simple transistor delay factor, but also wire delay and rise/fall delay.
- Gate delay component: one with nominal gate length and another with long gate length.
- RC delay component: wires from each of the four metal layers, with a total length of 14 mm.
Delay Synthesizer Effect
Operation (DVC+DFC)
Operation procedure:
- Low → High: the main logic clock frequency is changed after the DVC confirms that the voltage has increased enough.
- High → Low: both the DVC reference clock and the system clock are changed simultaneously.
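The asymmetry between the two transitions is an ordering rule: when speeding up, the voltage must be confirmed before the clock is raised, while when slowing down both can change at once. A minimal sketch of that rule follows. The function name and step labels are hypothetical, and the sketch only models the order of events, not the actual control hardware.

```python
def transition_sequence(f_now_hz, f_target_hz):
    """Return the ordered steps for a frequency change, per the rule above."""
    if f_target_hz > f_now_hz:
        # Low -> High: voltage first, clock only after the voltage is confirmed.
        return [
            "DVC: raise V_DD",
            "DVC: confirm V_DD has increased enough",
            "DFC: raise main logic clock",
        ]
    if f_target_hz < f_now_hz:
        # High -> Low: the lower clock is always safe at the current voltage.
        return ["DVC + DFC: change DVC reference clock and system clock together"]
    return []  # no change requested


for step in transition_sequence(100e6, 200e6):
    print(step)
```

The ordering exists because a circuit running at a higher clock than its current voltage supports would violate timing, whereas running a slower clock at a temporarily higher voltage only wastes a little energy.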
Device Level
Device Level Low Power Technique: FinFET
- FinFET: vertical structure. Planar MOSFET width = FinFET height.
- $\sigma(V_T) \propto T_{ox} N_A^{0.25} (LW)^{-0.5}$
- As scaling goes on, the variation of the planar MOSFET gets worse, so $V_{DD}$ scaling becomes impossible.
- However, the FinFET's $\sigma(V_T)$ does not degrade:
  - The FinFET width does not occupy the active area.
  - As scaling goes on, the $L \cdot W$ of the FinFET can be maintained.
  - $V_{DD}$ scaling is possible → low power.
[7] K.Itoh, Adaptive Circuits for the 0.5-V Nanoscale CMOS Era, ISSCC, 2009
OMAP Processor
OMAP Processor
- OMAP processor
  - Dual core platform
  - Multimedia hardware accelerators for video and graphics
  - Frame buffers
  - Various dedicated and general purpose interfaces
- Power saving modes
  - Idle (clock stopped)
  - Retention for low leakage
  - Fast re-start and power-off mode
- Power gating technique
ISSCC05, 138-139
Power Domains
5 power domains:
- Processor core 1
- Processor core 2
- Hardware accelerator (graphics)
- Always on
- Rest of the chip (including the interconnects and various peripherals)
ISSCC05, 138-139
Power Gating
- Power gating
  - A global mesh built with the highest metal layer distributes power and ground across the chip.
  - The local mesh is broken up to reflect the power domain partitioning.
  - Power switches connect the global mesh to the local mesh according to the operating mode and the switch control.
  - If a power domain is on, its power switches connect its local plane to the global plane, i.e., the constant power supply. Otherwise that plane drifts to a potential near ground.
- Power switch
  - Embedded in power domains by placing power switches at a regular pitch in a staggered manner, and by placing power switches around hard IPs.
  - Header switch: 90um PMOS with 200uA current driving capability in the worst case.
  - Multiple fingers and redundant vias.
ISSCC05, 138-139
Embedded Power Domains
Other power management cells:
- Retention flip-flops
- Constantly powered buffers to transport critical signals through a power domain that is potentially off
- Isolation cells to prevent the propagation of a non-state
ISSCC05, 138-139
Power Switching Control
- Problem: current surges and dynamic IR drop.
- Two-pass turn-on mechanism:
  - A weak PMOS sinks a low current for power restore: turned on first.
  - A strong PMOS delivers the current for normal operation: turned on next.
ISSCC05, 138-139
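The benefit of turning on a weak switch before the strong one can be seen with a first-order estimate. The sketch below treats each switch as a resistor and the gated domain as a discharged capacitor, so the peak current through a switch at the moment it closes is the voltage across it divided by its resistance. The resistances, the supply voltage and the 90% restore level are illustrative assumptions, and the model ignores the weak switch staying in parallel once the strong switch closes.

```python
def inrush_peak(v_across, r_switch):
    """Peak current (A) when a switch of resistance r_switch closes across v_across."""
    return v_across / r_switch


def two_pass_peaks(vdd, r_weak, r_strong, restored_fraction):
    """Peak currents for weak-then-strong turn-on.

    restored_fraction is the share of V_DD already reached on the local mesh
    when the strong switch turns on.
    """
    weak_peak = inrush_peak(vdd, r_weak)
    strong_peak = inrush_peak(vdd * (1.0 - restored_fraction), r_strong)
    return weak_peak, strong_peak


single_pass = inrush_peak(1.2, 1.0)  # strong switch closed on a discharged domain
weak_peak, strong_peak = two_pass_peaks(1.2, r_weak=50.0, r_strong=1.0,
                                        restored_fraction=0.9)

print(f"Single pass peak: {single_pass:.2f} A")
print(f"Two pass peaks:   {weak_peak:.3f} A (weak), {strong_peak:.2f} A (strong)")
```

In this simplified model the two-pass sequence lowers the worst peak by about 10x, at the cost of a longer restore time while the weak switch charges the domain.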
Current Surge and Power Restore
ISSCC05, 138-139
Leakage Current Reduction
- In off mode, the leakage current comes from the power switches and the power management cells.
- 4 power switches per Kgate
- ~40X leakage reduction
ISSCC05, 138-139
SRAM Retention
- Footer and header diodes
  - In active mode, the diodes are bypassed.
  - During retention mode, one diode is enabled:
    - The field across the array is reduced.
    - Reverse body bias is applied.
- Leakage saving (2x)
ISSCC05, 138-139
Dual Gate Length
- Dual gate length
  - Standby mode: 30% leakage reduction.
  - Active mode: active leakage current saving, which is very useful if many blocks are idle in active mode.
- Vdd scaling during the slow active mode
  - 300mV scaling: 2X leakage reduction.
ISSCC05, 138-139
Summary
Summary
- Green SoC design: low power and process variation tolerant SoC design.
- $P = \underbrace{P_{sw} + P_{sc}}_{P_{dynamic}} + \underbrace{P_{sub} + P_{gate} + P_{junc}}_{P_{static}}$
- Power and performance: trade-off.
- Low power design
  - Architecture and algorithm level: parallelism, pipeline
  - Block and logic level: workload monitoring, $V_{DD}$/frequency scheduling
  - Circuit level
    - Long channel: reduce $I_{leak}$ by using $V_{TH}$ roll-off ($V_{TH}$ ↑)
    - Stacked MOSFET: reduce $I_{leak}$ by using the body effect ($V_{TH}$ ↑) and negative $V_{GS}$
    - Dual $V_{DD}$: use low $V_{DD}$ on non-critical paths
    - Dual $V_{TH}$: use high $V_{TH}$ on non-critical paths
    - MTCMOS: use high-$V_{TH}$ sleep transistors (low leakage in standby mode) and low-$V_{TH}$ logic (high performance in active mode)
    - DVFS: reduce dynamic power by controlling both $V_{DD}$ and frequency
  - Device level: FinFET