Low-Power VLSI
Seong-Ook Jung
2013. 5. 27.
sjung@yonsei.ac.kr
VLSI SYSTEM LAB, YONSEI University
School of Electrical & Electronic Engineering

Contents
1. Introduction
2. Power classification & power-performance relationship
3. Low power design
   1. Architecture and algorithm level
   2. Circuit level
   3. Device level
   4. NTV operation
4. Summary

Introduction

Technology Scaling
- Moore's law: the number of transistors that can be placed on an integrated circuit has doubled approximately every two years (often quoted as every 18 months). [1]
[1] http://en.wikipedia.org

Development Trend
- Scaling (More Moore)
  - More devices are integrated on a chip
  - New scaling roadmap: not only geometrical scaling for 2D devices, but also equivalent scaling for 3D devices
  - Beyond bulk CMOS: FinFET, SOI
- Functional Diversification (More than Moore)
  - Several functions are merged on a chip
[2] ITRS (International Technology Roadmap for Semiconductors) 2011

SoC Performance
- SoC performance increases exponentially, thanks to both device technology and design methodology. [2]

SoC Power Consumption Problem
- SoC power consumption also increases severely: after 15 years, about 10x the power is required.
[2] ITRS (International Technology Roadmap for Semiconductors) 2011

Process Variation Problem
- Process variation is a result of scaling and has a global and a local component
- Global variation
  - Comes from fabrication, lot, and wafer processes
  - Appears as different process corners (NMOS-PMOS: SS/SF/TT/FS/FF)
- Local variation
  - Truly random variation between devices with identical layout
[3] Synopsys, 2005
[4] http://cnx.org

Process Variation Problem
- Performance variation due to process variation
  - Frequency difference: 30%
  - Leakage current difference: 20x
- Process variation should be considered in SoC design
[5] A. Devgan, Leakage Issues in IC Design: Part 3, IBM

Effect of the Process Variation
- Limitation for low voltage / low power operation
  - $I_D \propto (W/L)(V_{DD} - V_{TH})^{\alpha}$
  - $V_{TH}$ variation -> $I_D$ variation -> performance variation
  - More design margin is needed because of process variation, which pushes $V_{DD}$ up
- Yield limitation
  - Process variation -> higher failure probability -> lower yield
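
As a rough illustration of why a $V_{TH}$ spread hurts more at low $V_{DD}$, the following sketch runs a small Monte Carlo over the alpha-power relation above. The numbers (alpha = 1.3, mean $V_{TH}$ = 0.3 V, sigma = 30 mV) and the function names are assumptions chosen for illustration, not values from the slides.

```python
import random
import statistics


def drive_current(vdd, vth, alpha=1.3, w_over_l=1.0):
    """Alpha-power-law drive current in arbitrary units: (W/L) * (V_DD - V_TH)^alpha."""
    overdrive = max(vdd - vth, 0.0)  # clamp: this simple model has no sub-threshold current
    return w_over_l * overdrive ** alpha


def current_spread(vdd, vth_mean, vth_sigma, samples=10000, seed=0):
    """Relative standard deviation of I_D for a Gaussian V_TH spread."""
    rng = random.Random(seed)  # fixed seed keeps the sketch repeatable
    currents = [drive_current(vdd, rng.gauss(vth_mean, vth_sigma))
                for _ in range(samples)]
    return statistics.pstdev(currents) / statistics.mean(currents)


if __name__ == "__main__":
    for vdd in (1.0, 0.7, 0.5):
        spread = current_spread(vdd, vth_mean=0.3, vth_sigma=0.03)
        print(f"V_DD = {vdd:.1f} V -> sigma(I_D)/mean(I_D) = {spread:.3f}")
```

The relative spread grows as $V_{DD}$ approaches $V_{TH}$, which is the reason extra design margin is needed at low voltage.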

Importance of Low Power VLSI Design
- Low power VLSI design
- Process variation tolerant design
[2] ITRS (International Technology Roadmap for Semiconductors) 2011

State-of-the-art Low Power VLSI in Commercialized Products (figure)
[6] http://www.techinsights.com

Specifications of State-of-the-art Low Power VLSI
- HKMG & DVFS?
- In this class, we will study a variety of low power design techniques.
[1] http://en.wikipedia.org/

Power Classification & Power Performance Relationship

Power Classification
- Power consumption of CMOS circuits
  - $P_{total} = P_{dynamic} + P_{leakage}$
  - $P_{dynamic} = P_{sw} + P_{sc}$
  - $P_{leakage} = P_{sub} + P_{gate} + P_{junc}$

Switching Power: $P_{sw}$
- $I = C_L \, dV/dt = C_L V f$, where $V$ is the output voltage swing
- $P_{sw} = I V_{DD} = C_L V V_{DD} f$
- In a digital circuit $V = V_{DD}$, so $P_{sw} = C_L V_{DD}^2 f$
- $P_{sw}$ is due to the charging and discharging (output transitions) of the capacitors driven by the circuit in response to input transitions.
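
A minimal numeric sketch of the switching power formula above. The capacitance, voltage, and frequency values are hypothetical and only show how strongly $V_{DD}$ enters the result.

```python
def switching_power(c_load, vdd, freq):
    """P_sw = C_L * V_DD^2 * f, in watts (farads, volts, hertz)."""
    return c_load * vdd ** 2 * freq


if __name__ == "__main__":
    c_total = 1e-9   # assumed total switched capacitance: 1 nF
    freq = 1e9       # assumed clock frequency: 1 GHz
    for vdd in (1.2, 1.0, 0.8):
        print(f"V_DD = {vdd:.1f} V -> P_sw = {switching_power(c_total, vdd, freq):.2f} W")
```

Because $V_{DD}$ enters quadratically, lowering the supply from 1.2 V to 1.0 V in this example removes about 30% of the switching power at the same frequency.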

Short Circuit Power: $P_{sc}$
- Flows while $V_{TN} < V_{IN} < V_{DD} - |V_{TP}|$
  - $V_{IN} = V_{TN}$ at $t_1$
  - $V_{IN} = V_{DD} - |V_{TP}|$ at $t_3$
- $P_{sc}$ is caused by the simultaneous conduction of PMOS and NMOS during input and output transitions.
- $P_{sc} = (\beta/12)(V_{DD} - 2V_{TH})^3 (t_3 - t_1)$

Leakage Power: $P_{sub}$, $P_{gate}$ & $P_{junc}$
- $P_{sub}$
  - Ideal MOSFET: $I_{sub} = 0$
  - In a short-channel MOSFET, $I_{sub}$ exists when $V_{GS} < V_{TH}$
  - $P_{sub} \propto \exp[(V_{GS} - V_{TH})/(m v_T)] \, V_{DD}$
- $P_{gate}$
  - Ideal MOSFET: $I_{gate} = 0$
  - In a short-channel MOSFET, $I_{gate}$ exists because of the thin $T_{OX}$
  - $P_{gate} \propto W L \, (V_{GS}/T_{OX})^2 \, V_{DD}$
- $P_{junc}$
  - Reverse PN junction leakage
  - $P_{junc} \propto [\exp(V_D/v_T) - 1] \, V_{DD}$
[7] K. M. Cao, BSIM4 Gate Leakage Model Including Source-Drain Partition, IEDM, 2000
[8] http://www.altera.com/
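
To get a feel for the exponential in the $P_{sub}$ expression, the sketch below evaluates how much the subthreshold current changes for a given $V_{TH}$ shift at fixed $V_{GS}$. The slope factor m = 1.5 and the temperature of 300 K are assumptions, and the function names are illustrative.

```python
import math

BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_kelvin=300.0):
    """v_T = kT/q in volts."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


def subthreshold_leakage_ratio(delta_vth, m=1.5, temp_kelvin=300.0):
    """Factor by which I_sub changes when V_TH shifts by delta_vth volts.

    Follows I_sub ~ exp((V_GS - V_TH) / (m * v_T)) with V_GS held constant,
    so a negative delta_vth (lower threshold) gives a factor above 1.
    """
    return math.exp(-delta_vth / (m * thermal_voltage(temp_kelvin)))


if __name__ == "__main__":
    for shift_mv in (-100, -50, 50, 100):
        ratio = subthreshold_leakage_ratio(shift_mv / 1000.0)
        print(f"delta V_TH = {shift_mv:+d} mV -> I_sub x {ratio:.2f}")
```

Under these assumptions, a 100 mV reduction in $V_{TH}$ raises the subthreshold current by roughly an order of magnitude.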

Power vs. Performance
- Power consumption equations
  - $P_{sw} = C_L V_{DD}^2 f$
  - $P_{sc} = (\beta/12)(V_{DD} - 2V_{TH})^3 (t_3 - t_1)$
  - $P_{sub} \propto \exp[(V_{GS} - V_{TH})/(m v_T)] \, V_{DD}$
  - $P_{gate} \propto W L \, (V_{GS}/T_{OX})^2 \, V_{DD}$
  - $P_{junc} \propto [\exp(V_D/v_T) - 1] \, V_{DD}$
- Case 1: lowering $V_{DD}$ reduces every power component
  - However, $\text{Delay} \propto C_L V_{DD}/I_D \propto C_L V_{DD}/(V_{DD} - V_{TH})^{\alpha}$
  - Thus lower $V_{DD}$ -> longer delay -> performance loss
- Case 2: raising $V_{TH}$ reduces $P_{sc}$ and, especially, $P_{sub}$
  - However, $\text{Delay} \propto C_L V_{DD}/I_D \propto C_L V_{DD}/(V_{DD} - V_{TH})^{\alpha}$
  - Thus higher $V_{TH}$ -> longer delay -> performance loss
- Case 3: lowering $f$ reduces $P_{sw}$
  - However, throughput $\propto f$, so this is also a performance loss

Tradeoff w.r.t. $V_{TH}$
- There is a tradeoff between power and performance
- Low power design: power reduction without performance degradation
[9] S. Mutoh, Review of low-voltage CMOS LSI technology as a standard in the 21st century, 1998
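
The tradeoff in Case 1 can be tabulated with a few lines of code. This is a sketch under assumed values ($V_{TH}$ = 0.3 V, alpha = 1.3, constant $C_L$ and $f$); it uses only the proportionalities listed above, so results are normalized to the 1.0 V case rather than given in absolute units.

```python
def relative_delay(vdd, vth=0.3, alpha=1.3):
    """Delay ~ V_DD / (V_DD - V_TH)^alpha, with C_L folded into the constant."""
    if vdd <= vth:
        raise ValueError("this alpha-power model only applies for V_DD > V_TH")
    return vdd / (vdd - vth) ** alpha


def relative_switching_power(vdd):
    """P_sw ~ V_DD^2 at fixed C_L and f."""
    return vdd ** 2


if __name__ == "__main__":
    ref_delay = relative_delay(1.0)
    ref_power = relative_switching_power(1.0)
    for vdd in (1.0, 0.8, 0.6):
        delay = relative_delay(vdd) / ref_delay
        power = relative_switching_power(vdd) / ref_power
        print(f"V_DD = {vdd:.1f} V: power x{power:.2f}, delay x{delay:.2f}")
```

Switching power falls quadratically, while delay rises faster and faster as $V_{DD}$ approaches $V_{TH}$.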

Low Power Design - Architecture and Algorithm Levels

Parallelism
- A lower $V_{DD}$ and a lower frequency are used, at the expense of an area penalty
- By adopting parallelism: power ~ 0.36x, area ~ 3.4x
[10] A. P. Chandrakasan, Minimizing power consumption in digital CMOS circuits, Proc. of IEEE, 1995
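
The 0.36x figure can be reproduced from $P \propto C V_{DD}^2 f$. The sketch below assumes the operating point as I recall it from [10]: two parallel datapaths with about 2.15x the switched capacitance, each clocked at half the rate, with the supply lowered from 5 V to 2.9 V. Treat these inputs as assumptions and check them against the paper.

```python
def power_ratio(cap_ratio, vdd_ratio, freq_ratio):
    """New power relative to the reference design, from P ~ C * V_DD^2 * f."""
    return cap_ratio * vdd_ratio ** 2 * freq_ratio


if __name__ == "__main__":
    # Assumed inputs: 2.15x capacitance, 2.9 V instead of 5 V, half the clock rate.
    parallel = power_ratio(cap_ratio=2.15, vdd_ratio=2.9 / 5.0, freq_ratio=0.5)
    print(f"parallel datapath power ratio: {parallel:.2f}")
```

Throughput is unchanged because two units at half the clock rate produce the same number of results per second. The saving comes entirely from the lower supply voltage that the relaxed timing allows.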

Pipeline
- By inserting additional pipeline latches, the logic depth of the critical path is reduced, so the logic can be operated at a slower rate (with a lower $V_{DD}$) for the same throughput
- By adopting pipelining: power ~ 0.39x, area ~ 1.3x
[10] A. P. Chandrakasan, Minimizing power consumption in digital CMOS circuits, Proc. of IEEE, 1995

Low Power Design - Circuit Level

Critical Path
- Critical path: the worst-case delay path
  - Determines the SoC's maximum performance
- Number of critical paths << number of non-critical paths
  - A fast non-critical path is simply wasteful
  - By increasing the delay of non-critical paths, we can reduce power, because power trades off against performance

Dual $V_{DD}$
- Basic idea
  - $V_{DDL}$: logic gates off the critical path
  - $V_{DDH}$: logic gates on the critical path
  - Reduces power without degrading performance
- Figure legend: shaded = $V_{DDL}$, non-shaded = $V_{DDH}$
[11] K. Usami, Automated low-power technique exploiting multiple supply voltages applied to a media processor, JSSC, 1998
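
The dual-$V_{DD}$ idea can be sketched as a simple slack check. This is a deliberately simplified illustration and not the assignment algorithm of [11]: it treats the slack of each gate independently, ignores level converters, and uses hypothetical supply values and a hypothetical 1.4x slowdown at $V_{DDL}$.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    name: str
    delay_ns: float  # gate delay at V_DDH
    slack_ns: float  # timing slack available to this gate
    cap_ff: float    # switched capacitance


def assign_supplies(gates, vddh=1.2, vddl=0.9, slowdown=1.4):
    """Give V_DDL to every gate whose extra delay at V_DDL fits in its slack."""
    assignment = {}
    for gate in gates:
        extra_delay = gate.delay_ns * (slowdown - 1.0)
        assignment[gate.name] = vddl if extra_delay <= gate.slack_ns else vddh
    return assignment


def switching_power_ratio(gates, assignment, vddh=1.2):
    """Dual-supply switching power relative to an all-V_DDH design (same f)."""
    baseline = sum(g.cap_ff * vddh ** 2 for g in gates)
    dual = sum(g.cap_ff * assignment[g.name] ** 2 for g in gates)
    return dual / baseline


if __name__ == "__main__":
    netlist = [
        Gate("g1", delay_ns=0.10, slack_ns=0.00, cap_ff=2.0),  # on the critical path
        Gate("g2", delay_ns=0.10, slack_ns=0.20, cap_ff=2.0),  # plenty of slack
        Gate("g3", delay_ns=0.05, slack_ns=0.01, cap_ff=1.0),  # too little slack
    ]
    supplies = assign_supplies(netlist)
    print(supplies)
    print(f"power ratio: {switching_power_ratio(netlist, supplies):.3f}")
```

A real flow has to share slack along each path and account for level-converter overhead, so the achievable saving is smaller than this per-gate estimate suggests.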

Dual $V_{TH}$
- High-$V_{TH}$
  - Assigned to transistors in non-critical paths
  - Saves leakage in both standby and active modes
- Low-$V_{TH}$
  - Assigned to transistors in critical paths
  - Maintains performance
[12] J. T. Kao, Dual-Threshold Voltage Techniques for Low-Power Digital Circuits, JSSC, 2000

MTCMOS
- MTCMOS: Multiple Threshold voltage CMOS
- Basic circuit scheme
  - Two different $V_{TH}$: high-$V_{TH}$ (0.5~0.6 V) / low-$V_{TH}$ (0.2~0.3 V)
  - Two operating modes: active / standby
- Operation
  - Active mode: $SL = 1$, $\overline{SL} = 0$
    - $V_{DDV} \approx V_{DD}$, $V_{GNDV} \approx V_{GND}$
    - Low-$V_{TH}$ logic -> high operating frequency
  - Standby mode: $SL = 0$, $\overline{SL} = 1$
    - $V_{DDV}$ and $V_{GNDV}$ are floating
    - High-$V_{TH}$ sleep transistors -> low leakage
[13] M. Anis, Dynamic and leakage power reduction in MTCMOS circuits, Proc. of IEEE, 2002
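
A back-of-the-envelope estimate of the leakage gap between the two threshold ranges quoted above. It assumes a subthreshold swing of 90 mV/decade (an assumption, not a value from the slides) and compares devices of equal width at equal $V_{GS}$, so it ignores sleep-transistor sizing and stack effects.

```python
def leakage_reduction_factor(vth_high, vth_low, swing_mv_per_decade=90.0):
    """Approximate I_sub(low-V_TH) / I_sub(high-V_TH) at equal V_GS and width.

    Each `swing_mv_per_decade` of extra threshold voltage cuts the
    subthreshold current by one decade.
    """
    delta_mv = (vth_high - vth_low) * 1000.0
    return 10.0 ** (delta_mv / swing_mv_per_decade)


if __name__ == "__main__":
    # Mid-points of the ranges on the MTCMOS slide: 0.55 V and 0.25 V.
    factor = leakage_reduction_factor(vth_high=0.55, vth_low=0.25)
    print(f"high-V_TH device leaks about {factor:.0f}x less")
```

Under these assumptions the high-$V_{TH}$ device leaks roughly three orders of magnitude less, which is why gating the low-$V_{TH}$ logic with high-$V_{TH}$ switches is effective in standby.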

DVFS: Basic Concept
- $P_{dynamic} = C V_{DD}^2 f$
- DVFS scales $V_{DD}$ and frequency simultaneously
- $V_{DD}$ scaling
  - The most effective way to lower $P_{dynamic}$, because $P_{dynamic} \propto V_{DD}^2$
- Frequency scaling
  - Operating frequency = throughput
  - Not all tasks require maximum throughput
  - By controlling the frequency, the SoC improves its energy efficiency
[14] G. Dhiman, Analysis of Dynamic Voltage Scaling for System Level Energy Management, HotPower, 2008

SONY Microprocessor DVFS Block (figure)
[15] M. Nakai, Dynamic Voltage and Frequency Management for a Low-Power Embedded Microprocessor, JSSC, 2005
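
The basic DVFS concept, that a task needing less than maximum throughput can run at a lower frequency and supply voltage, can be sketched as follows. The operating-point table and the effective capacitance are hypothetical, and this is not the policy of the SONY design in [15].

```python
# Hypothetical (frequency in Hz, V_DD in volts) operating points.
OPERATING_POINTS = [
    (200e6, 0.8),
    (400e6, 0.9),
    (600e6, 1.0),
    (800e6, 1.1),
    (1000e6, 1.2),
]


def pick_operating_point(cycles, deadline_s, points=OPERATING_POINTS):
    """Return the slowest (freq, vdd) pair that still meets the deadline."""
    for freq, vdd in sorted(points):
        if cycles / freq <= deadline_s:
            return freq, vdd
    raise ValueError("deadline cannot be met at any operating point")


def task_energy(cycles, vdd, c_eff=1e-9):
    """Dynamic energy E = C * V_DD^2 * cycles, in joules (c_eff is assumed)."""
    return c_eff * vdd ** 2 * cycles


if __name__ == "__main__":
    cycles, deadline = 3e6, 0.010  # 3 million cycles due within 10 ms
    freq, vdd = pick_operating_point(cycles, deadline)
    saved = 1.0 - task_energy(cycles, vdd) / task_energy(cycles, 1.2)
    print(f"run at {freq / 1e6:.0f} MHz, {vdd:.1f} V, saving {saved:.0%} energy vs. 1.2 V")
```

Dynamic energy per task does not depend on frequency directly. The saving comes from the lower $V_{DD}$ that the lower frequency permits.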

DVFS Block Diagram
- DVFS is a closed-loop system
  - DVC: $V_{DD}$ control circuit
  - DFC: frequency control circuit

Delay Synthesizer
- Structure
  - Composed of not only a simple transistor delay factor, but also wire delay and rise/fall delay
  - Gate delay component: one chain with nominal gate length and another with long gate length
  - RC delay component: wires from each of the four metal layers, with a total length of 14 mm

Delay Synthesizer Effect (figure)

Operation (DVC + DFC)
- Operation procedure
  - Low -> High: the main logic clock frequency is changed after the DVC confirms that the voltage has increased enough
  - High -> Low: both the DVC reference clock and the system clock are changed simultaneously
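
The ordering rule in the operation procedure can be written as a small sequencing function. This is a sketch only: the action names are hypothetical labels for the steps described above, not signals or registers of the actual design.

```python
def dvfs_transition(current_mhz, target_mhz):
    """Return the ordered list of actions for a frequency change.

    Low -> High: request the higher voltage first and raise the system clock
    only after the DVC reports that the voltage is high enough.
    High -> Low: the DVC reference clock and the system clock change together,
    since the logic always meets timing at the lower frequency.
    """
    if target_mhz > current_mhz:
        return [
            ("set_dvc_reference_clock", target_mhz),
            ("wait_for_dvc_voltage_ok",),
            ("set_system_clock", target_mhz),
        ]
    if target_mhz < current_mhz:
        return [("set_dvc_reference_and_system_clock", target_mhz)]
    return []  # no change requested


if __name__ == "__main__":
    for step in dvfs_transition(100, 200):
        print("up:  ", step)
    for step in dvfs_transition(200, 100):
        print("down:", step)
```

The asymmetry exists because running at the higher frequency before the voltage has risen would violate timing, whereas the reverse order is always safe.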

Performance Enhancement (figure)

Low Power Design - Device Level

High-K & Metal Gate
- SiO2 -> high-k material
  - A thicker gate dielectric can be used without performance degradation
  - Gate leakage is substantially decreased
  - Low power & high performance
- Poly gate -> metal gate
  - A poly gate is not suitable for high-k material: a high switching voltage is required
  - A metal gate solves this problem of the poly gate
  - Low power
[16] http://www.automationnotebook.com

FinFET
- Characteristics
  - Vertical structure
  - FinFET effective width = fin thickness + 2 x fin height
- Scaling
  - As scaling continues, the variation of planar MOSFETs gets worse, so $V_{DD}$ scaling becomes impossible
  - The $V_{TH}$ variation of a FinFET, however, can be reduced
    - Fully depleted device -> superior short-channel control
    - Undoped body -> no random dopant fluctuation
  - $V_{DD}$ scaling is possible -> low power
[17] T. Chiarella, "Migrating from Planar to FinFET for Further CMOS Scaling: SOI or Bulk?", ESSDERC, 2009
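
The effective-width relation above is easy to evaluate. The fin dimensions in this sketch are hypothetical, and the multi-fin extension (total width = number of fins x per-fin width) is an assumption added for illustration.

```python
def finfet_effective_width(fin_thickness_nm, fin_height_nm, n_fins=1):
    """W_eff = n_fins * (fin thickness + 2 * fin height), in nanometres."""
    return n_fins * (fin_thickness_nm + 2 * fin_height_nm)


if __name__ == "__main__":
    for fins in (1, 2, 3):
        width = finfet_effective_width(fin_thickness_nm=10, fin_height_nm=30, n_fins=fins)
        print(f"{fins} fin(s): W_eff = {width} nm")
```

With a fixed fin geometry, the drive strength can only be adjusted in whole-fin steps, unlike the continuously adjustable width of a planar device.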

Low Power Design - Near-Threshold Voltage (NTV) Operation

Low Voltage Operation
- Near- and sub-$V_{TH}$ digital circuit design has focused on low power consumption.
- Sub-$V_{TH}$ operation
  - Suitable only for specific applications in which performance is not a concern
- Near-$V_{TH}$ operation
  - Suitable for applications that use DVFS, such as the AP in a cell phone
  - Balanced trade-off between power and performance
[18] R. G. Dreslinski, "Near-Threshold Computing: Reclaiming Moore's Law Through Energy Efficient Integrated Circuits", Proc. of IEEE, 2010

NTV Application
- Samsung's and Intel's products
  - Samsung: ARM Cortex-A7 processor
  - Intel: IA-32 processor

Samsung's NTV
- Samsung mentioned NTV operation in its press release of Dec. 19, 2012:

  Samsung Electronics Co., Ltd. announced that it reached another milestone in the development of 14nm FinFET process technology with the successful tape-out of multiple development vehicles in collaboration with its key design and IP partners. As part of its 14nm FinFET development process, Samsung, and its ecosystem partners ARM, Cadence, Mentor and Synopsys taped out multiple test chips ranging from a full ARM Cortex-A7 processor implementation to a SRAM-based chip capable of operation near threshold voltage levels as well as an array of analog IP.

  Samsung used Synopsys tools optimized for FinFET devices to implement additional IP on this vehicle, including low power SRAMs intended to operate with the power supply close to threshold voltage levels. The move from two-dimensional transistors to three-dimensional transistors introduces several new IP and EDA tool challenges including modeling. The multi-year collaboration between Samsung and Synopsys has delivered foundational modeling technologies for 3D parasitic extraction, circuit simulation and physical design-rule support of FinFET devices.

[19] http://www.samsung.com/global/business/semiconductor/news-events/press-releases/detail?newsid=12461

Intel's NTV
- Intel presented 3 papers applying NTV operation at ISSCC 2012.
[20] Intel Labs at ISSCC 2012, Intel Corporation, 2012

Issues of NTV Operation
- Performance variation
  - In the near-$V_{TH}$ region, the dependence of the driving current on $V_{TH}$, $V_{DD}$, and temperature approaches exponential, which significantly increases the performance variation.
[18] R. G. Dreslinski, "Near-Threshold Computing: Reclaiming Moore's Law Through Energy Efficient Integrated Circuits", Proc. of IEEE, 2010
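
To illustrate how the current sensitivity grows toward the exponential regime, the sketch below uses an EKV-style interpolation, $I_D \propto [\ln(1 + e^{(V_{GS} - V_{TH})/(2 m v_T)})]^2$, which is quadratic well above threshold and exponential below it. This model is swapped in for illustration and is not taken from the slides or from [18]. The values m = 1.5, $v_T$ = 25.85 mV, nominal $V_{TH}$ = 0.3 V, and a +/-30 mV shift are assumptions.

```python
import math


def ekv_current(vgs, vth, m=1.5, v_t=0.02585):
    """EKV-style drain current in arbitrary units, smooth across threshold."""
    x = (vgs - vth) / (2.0 * m * v_t)
    return math.log1p(math.exp(x)) ** 2


def current_ratio_for_vth_shift(vdd, vth=0.3, shift=0.03):
    """I_D at (V_TH - shift) divided by I_D at (V_TH + shift), with V_GS = V_DD."""
    return ekv_current(vdd, vth - shift) / ekv_current(vdd, vth + shift)


if __name__ == "__main__":
    for vdd in (1.0, 0.6, 0.4, 0.2):
        ratio = current_ratio_for_vth_shift(vdd)
        print(f"V_DD = {vdd:.1f} V -> fast/slow current ratio = {ratio:.2f}")
```

The same $V_{TH}$ spread produces a modest current spread at nominal voltage and a much larger one near and below threshold.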

Summary
- State-of-the-art VLSI: low power & process variation tolerant design
- $P = \underbrace{P_{sw} + P_{sc}}_{P_{dynamic}} + \underbrace{P_{sub} + P_{gate} + P_{junc}}_{P_{static}}$
- Power and performance: trade-off
- Low power design
  - Architecture and algorithm level: parallelism, pipelining
  - Circuit level
    - Long channel
    - Stacked
    - Dual $V_{DD}$
    - Dual $V_{TH}$
    - MTCMOS
    - DVFS
  - Device level: FinFET, HKMG
  - NTV operation