Jan Rabaey, Low Power Design Essentials, Springer, 2009. http://web.me.com/janrabaey/lowpoweressentials/home.html
Dimitrios Soudris, Christian Piguet, and Costas Goutis, Designing CMOS Circuits for Low Power, Kluwer Academic Publishers, Sept. 2002.
A. Chandrakasan and R. Brodersen, Low Power Digital CMOS Design, Kluwer Academic Publishers, 1995.
J. Rabaey and M. Pedram, Eds., Low Power Design Methodologies, Kluwer Academic Publishers, 1995.
Design Course
Figure 4: Upcoming technologies for fabricating integrated circuits [Shekhar Borkar, Intel, 2006]
High Volume Manufacturing               2004  2006  2008  2010  2012  2014  2016  2018
Technology Node (nm)                      90    65    45    32    22    16    11     8
Integration Capacity (10^9 transistors)    2     4     8    16    32    64   128   256
Delay = CV/I scaling: 0.7, ~0.7, >0.7 (delay scaling will slow down)
Energy/Logic Op scaling: >0.35, >0.5, >0.5 (energy scaling will slow down)
Variability: Medium, High, Very High
Emerging Technologies for fabricating integrated systems
To present low-power estimation and optimization methodologies and techniques at all design levels:
- System/Behavioral Level
- Architecture/Register-Transfer Level
- Logic Level
- Circuit/Transistor Level
- Aspects from System-on-Chip design
Power is the rate at which energy is delivered or exchanged; during operation, electrical energy is converted into heat.
Power dissipation: the rate at which energy is drawn from the source (V_dd) and converted into heat.
- Large market of portable devices, e.g., laptops and mobile phones
- Higher transistor integration: the Teraflops Research Chip contains 1.9 billion transistors
- Need for green computers: PCs consume 10% of total electrical energy
Battery Technology Improvements
Power Evolution over Technology Generations
- Reduce chip capacitance through process scaling ==> expensive
- Reduce voltage levels from 5 V to 3.3 V to 2 V ==> the industry is slow to move (microprocessors, memory, ...)
- Better circuit techniques ==> gated clocks, power-down of non-operational units
Example: IBM 80 MHz PowerPC RISC (3 W @ 3.3 V)
- Power-management logic determines activity on a per-cycle basis
- Clocks of idle blocks are turned off: 12-30% savings
- Doze, Nap, and Sleep modes (5 mW)
Pentium-1: 15 W (5 V, 66 MHz)
Pentium-2: 8 W (3.3 V, 133 MHz)
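As a rough check against $P = C_L V_{dd}^2 f$, the sketch below scales the Pentium-1 figure to the Pentium-2 supply voltage and clock. It assumes (our assumption, not the slide's) that the switched capacitance is unchanged, and then attributes the whole gap to the quoted 8 W to a change in effective capacitance; other factors are not modeled.

```python
def dynamic_power_ratio(v_old, v_new, f_old, f_new):
    """Ratio P_new / P_old predicted by P = C * Vdd^2 * f,
    assuming the switched capacitance C is unchanged."""
    return (v_new / v_old) ** 2 * (f_new / f_old)

# Figures quoted on the slide
P1_WATTS, P1_VDD, P1_MHZ = 15.0, 5.0, 66.0
P2_WATTS, P2_VDD, P2_MHZ = 8.0, 3.3, 133.0

ratio = dynamic_power_ratio(P1_VDD, P2_VDD, P1_MHZ, P2_MHZ)
predicted = P1_WATTS * ratio
# Change in effective switched capacitance needed to match the quoted 8 W
implied_c_ratio = P2_WATTS / predicted

print(f"predicted power ratio    : {ratio:.2f}")
print(f"predicted Pentium-2 power: {predicted:.1f} W")
print(f"implied C_new / C_old    : {implied_c_ratio:.2f}")
```

Voltage scaling alone predicts about 13.2 W, so under this simple model the quoted 8 W implies roughly 40% less effective switched capacitance.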
Year                         1999   2002   2005   2008   2011   2014
Feature size (nm)             180    130    100     70     50     35
Logic transistors/cm^2       6.2M    18M    39M    84M   180M   390M
Cost/transistor (mc)        1.735  0.580  0.255  0.110  0.049  0.022
Pads/chip                    1867   2553   3492   4776   6532   8935
Clock (MHz)                  1250   2100   3500   6000  10000  16900
Chip size (mm^2)              340    430    520    620    750    900
Wiring levels                 6-7      7    7-8    8-9      9     10
Power supply (V)              1.8    1.5    1.2    0.9    0.6    0.5
High-performance power (W)     90    130    160    170    175    183
Battery power (W)             1.4      2    2.4    2.8    3.2    3.7
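One quantity the roadmap table implies but does not list is the average supply current, $I = P / V_{dd}$. The sketch below derives it for the high-performance row; it is plain arithmetic on the table values, not a roadmap figure.

```python
# (year, Vdd in volts, high-performance power in watts), copied from the roadmap table
ROADMAP = [
    (1999, 1.8, 90), (2002, 1.5, 130), (2005, 1.2, 160),
    (2008, 0.9, 170), (2011, 0.6, 175), (2014, 0.5, 183),
]

def supply_current(power_w, vdd_v):
    """Average supply current I = P / Vdd, in amperes."""
    return power_w / vdd_v

for year, vdd, power in ROADMAP:
    print(f"{year}: {supply_current(power, vdd):6.1f} A")
```

With these numbers the current grows from 50 A in 1999 to about 366 A in 2014, while power only roughly doubles.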
The power consumption in digital CMOS circuits:
$P_{avg} = P_{dynamic} + P_{short\text{-}circuit} + P_{leakage}$
- Dynamic power consumption: charging and discharging capacitors
- Short-circuit currents: a short-circuit path between the supply rails during switching
- Leakage (static): leaking diodes and transistors
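$P_{dynamic} = C_L V_{DD}^2 N f$
where V_DD is the supply voltage, C_L the load capacitance, N the average number of output transitions per clock cycle (counted as 0→1 transitions, each of which draws C_L V_DD^2 from the supply), and f the operating frequency.
[Figure: (a)-(c) CMOS gate with input IN driving load C_L at OUT; charging current from V_dd and discharging current to ground]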
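A minimal sketch of the dynamic-power formula, reading N as the average number of 0→1 output transitions per cycle. The node parameters in the usage line are hypothetical and only illustrate the units.

```python
def dynamic_power(c_load, vdd, n_transitions, freq):
    """P_dynamic = C_L * Vdd^2 * N * f, in watts.

    c_load        -- load capacitance C_L in farads
    vdd           -- supply voltage V_DD in volts
    n_transitions -- average number of 0->1 output transitions per clock cycle
    freq          -- clock frequency f in hertz
    """
    return c_load * vdd ** 2 * n_transitions * freq

# Hypothetical node: 100 fF load, 1.2 V supply,
# one 0->1 transition every four cycles, 500 MHz clock
p = dynamic_power(100e-15, 1.2, 0.25, 500e6)
print(f"{p * 1e6:.1f} uW")
```

Because of the $V_{DD}^2$ term, halving the supply voltage at fixed C_L, N and f cuts dynamic power by a factor of four.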
Dynamic Power Consumption (2)
For technologies down to 0.35 µm, dynamic power accounts for about 80% of total consumption.
Goal ==> reduce dynamic power consumption by:
- reducing capacitance
- reducing the supply voltage
- reducing the frequency
- reducing switching activity
- or a combination of the above
Power = Energy/transition × transition rate
$P = C_L V_{dd}^2 f_{0\to1} = C_L V_{dd}^2 P_{0\to1} f = C_{EFF} V_{dd}^2 f$
Power dissipation is data dependent: it is a function of switching activity.
$C_{EFF}$ = effective capacitance = $C_L P_{0\to1}$
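In practice $P_{0\to1}$ is often estimated from simulation rather than by hand. The hypothetical helpers below count 0→1 transitions in a 0/1 trace, assuming one sample per clock cycle, and scale C_L to obtain $C_{EFF}$.

```python
def zero_to_one_probability(trace):
    """Fraction of cycle-to-cycle steps in which the signal goes 0 -> 1."""
    rises = sum(1 for prev, cur in zip(trace, trace[1:]) if prev == 0 and cur == 1)
    return rises / (len(trace) - 1)

def effective_capacitance(c_load, trace):
    """C_EFF = C_L * P(0->1), estimated from a sampled 0/1 trace."""
    return c_load * zero_to_one_probability(trace)

# Hypothetical trace: 8 steps, 3 of them rising
trace = [0, 1, 1, 0, 1, 0, 0, 1, 1]
print(zero_to_one_probability(trace))  # 0.375
```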
Power Consumption Is Data Dependent
Example: static 2-input NOR gate
Assume: P(A=1) = 1/2, P(B=1) = 1/2 (independent inputs, uncorrelated from cycle to cycle)
Then: P(Out=1) = 1/4
$P_{0\to1} = P(Out=0) \cdot P(Out=1) = 3/4 \cdot 1/4 = 3/16$
$C_{EFF} = 3/16 \cdot C_L$
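The NOR computation can be reproduced by enumerating the input combinations. The sketch assumes independent inputs and an output that is uncorrelated between consecutive cycles, so that $P_{0\to1} = P(Out=0) \cdot P(Out=1)$; exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import product

def static_activity(gate, p_inputs):
    """P(0->1) at a static gate output: P(Out=0) * P(Out=1),
    for independent inputs with P(input=1) given in p_inputs."""
    p_one = Fraction(0)
    for bits in product((0, 1), repeat=len(p_inputs)):
        weight = Fraction(1)
        for bit, p in zip(bits, p_inputs):
            weight *= p if bit else 1 - p
        if gate(*bits):
            p_one += weight
    return (1 - p_one) * p_one

def nor2(a, b):
    return int(not (a or b))

half = Fraction(1, 2)
print(static_activity(nor2, [half, half]))  # 3/16
```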
The short-circuit power is caused by the direct path from the power supply to ground during the transition phase:
$P_{short\text{-}circuit} = K (V_{dd} - 2V_t)^3 \tau N f$
where K is a constant that depends on the transistor sizes and the technology, V_t is the threshold voltage of the nMOS and pMOS transistors, τ is the rise or fall time of the input signal, N is the average number of transitions at the inverter's output, and f is the clock frequency.
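A direct transcription of the short-circuit formula, with K normalized to 1 (a hypothetical value; the real constant depends on sizing and technology). The guard reflects that, in this model, no short-circuit path exists when $V_{dd} \le 2V_t$, since the nMOS and pMOS transistors are then never on at the same time.

```python
def short_circuit_power(k, vdd, vt, tau, n_transitions, freq):
    """P_sc = K * (Vdd - 2*Vt)^3 * tau * N * f.

    Returns 0 when Vdd <= 2*Vt: in this model both transistors are then
    never on simultaneously (subthreshold conduction is ignored)."""
    overdrive = vdd - 2 * vt
    if overdrive <= 0:
        return 0.0
    return k * overdrive ** 3 * tau * n_transitions * freq

# Normalized comparison (K = tau = N = f = 1), Vt = 0.7 V
p_33 = short_circuit_power(1, 3.3, 0.7, 1, 1, 1)
p_25 = short_circuit_power(1, 2.5, 0.7, 1, 1, 1)
print(f"P_sc(2.5 V) / P_sc(3.3 V) = {p_25 / p_33:.3f}")
```

Because of the cubic dependence, going from 3.3 V to 2.5 V with V_t = 0.7 V reduces the short-circuit term by roughly a factor of five.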
Leakage (static) power consists of the reverse-bias diode leakage at the transistor drains and the subthreshold current through the channel of a turned-off transistor.
[Figure: the leakage of a reverse-biased pMOS transistor: p+ regions in an n-type substrate tied to V_dd, with leakage current through the reverse-biased drain-substrate diode]
[Figure: subthreshold leakage with respect to gate-source voltage: log I_D versus V_GS (volts), showing the subthreshold and saturated regions and the effect of decreasing V_DS]
Dimitrios Soudris, DUTH
Power consumption of data transfer and storage compared with datapath operations, in both hardware [Men95] and software [Tiw94, Gon96].
Relative energy per operation:
16-bit carry-select adder      1
16-bit multiplier              3.6
8x128x16 SRAM (read)           4.4
8x128x16 SRAM (write)          9
External I/O access            10
16-bit memory access           33
[Figure: relative energy of RISC components: storage, interconnect, clocks, other]
Power savings increase at higher levels of abstraction:
- System/behavior level: 10-20x
- RT/logic level: 2-5x
- Transistor/layout level: 20-50%
- Algorithm: transformations to exploit concurrency
- Architecture: parallelism and pipelining
- Circuit/Logic: transistor sizing, fast logic structures
- Technology: threshold-voltage reduction, feature-size scaling
- System: partitioning, power-down, power states
- Algorithm: complexity, concurrency, regularity, locality, data representation
- Architecture: concurrency, instruction-set selection, signal correlations, data representation, data encoding
- Circuit/logic: transistor sizing, logic optimization, power-down, layout optimization
- Technology: advanced packaging, SOI
[Figure: low-power design flow. (a) System Specifications → System-Level Design → Architecture-Level Design → Logic-Level Design → Circuit-Level Design / Layout Synthesis. (b) Analysis/estimation at each level: system level (power models for system-level components), architecture level (power models for macrocells, control logic), logic level (power models for gates, cells), circuit level.]
[Figure: normalized delay versus V_dd (volts), 2.0 µm technology, for a multiplier, clock generator, adder, ring oscillator, adder (SPICE), and microcoded DSP chip]
$T_d = \frac{C_L V_{dd}}{I}, \quad I \sim (V_{dd} - V_t)^2$
$\frac{T_d(V_{dd}=2)}{T_d(V_{dd}=5)} = \frac{2 \cdot (5-0.7)^2}{5 \cdot (2-0.7)^2} \approx 4.4$
Delay is relatively independent of logic function and style.
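The delay ratio follows from the square-law current model $I \sim (V_{dd} - V_t)^2$ with $V_t = 0.7$ V; the sketch evaluates it numerically. C_L cancels in the ratio, so it is omitted.

```python
def normalized_delay(vdd, vt):
    """T_d ~ Vdd / (Vdd - Vt)^2; C_L is a common factor and is dropped."""
    return vdd / (vdd - vt) ** 2

ratio = normalized_delay(2.0, 0.7) / normalized_delay(5.0, 0.7)
print(f"T_d(2 V) / T_d(5 V) = {ratio:.2f}")  # T_d(2 V) / T_d(5 V) = 4.38
```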
[Figure: normalized power-delay product versus V_dd (volts) for a 51-stage ring oscillator and an 8-bit adder, showing quadratic dependence]
$P \times t_d = E_t = C_L V_{dd}^2$
$\frac{E(V_{dd}=2)}{E(V_{dd}=5)} = \frac{C_L \cdot 2^2}{C_L \cdot 5^2} = 0.16$
- Strong function of voltage ($V^2$ dependence)
- Relatively independent of logic function and style
- The power-delay product improves as V_DD is lowered
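The energy ratio can be checked the same way. Because $E = C_L V_{dd}^2$, the result does not depend on the value chosen for C_L; 1.0 below is arbitrary.

```python
def switching_energy(c_load, vdd):
    """Energy per transition (power-delay product): E = C_L * Vdd^2."""
    return c_load * vdd ** 2

ratio = switching_energy(1.0, 2.0) / switching_energy(1.0, 5.0)
print(ratio)  # 0.16
```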
[Figure: delay versus V_dd and I_D versus V_GS, for V_t = 0 and V_t = 0.2 V]
Lowering V_t reduces the speed loss at low supply voltage, but increases leakage.
Interesting design approach: design for $P_{leakage} = P_{dynamic}$.
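To make the trade-off concrete, the sketch pairs the square-law delay model $T_d \sim V_{dd}/(V_{dd}-V_t)^2$ with a common first-order subthreshold approximation, off-current $\propto 10^{-V_t/S}$, where S is the subthreshold swing. The values S = 100 mV/decade and V_dd = 1.0 V are assumptions for illustration, not values from the slide.

```python
# Hypothetical parameters, for illustration only
SUBTHRESHOLD_SWING = 0.1  # volts per decade of off-current (assumed)
VDD = 1.0                 # supply voltage in volts (assumed)

def relative_delay(vdd, vt):
    """Square-law delay model: T_d ~ Vdd / (Vdd - Vt)^2."""
    return vdd / (vdd - vt) ** 2

def relative_leakage(vt, swing=SUBTHRESHOLD_SWING):
    """First-order subthreshold model: off-current ~ 10^(-Vt / S)."""
    return 10 ** (-vt / swing)

for vt in (0.4, 0.3, 0.2):
    d = relative_delay(VDD, vt) / relative_delay(VDD, 0.4)
    leak = relative_leakage(vt) / relative_leakage(0.4)
    print(f"Vt = {vt:.1f} V: delay x{d:.2f}, leakage x{leak:.0f}")
```

Under these assumptions, lowering V_t from 0.4 V to 0.2 V cuts the delay by a bit less than half while raising leakage a hundredfold, which is why balancing leakage against dynamic power becomes a design target.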
Transistor sizing trade-off:
- Small W/L: lower capacitance, but a higher voltage is needed for the same speed
- Large W/L: higher capacitance, but a lower voltage suffices
Larger devices are useful only when the load is interconnect dominated; minimum-sized devices are usually optimal for low power.
[Figure: global bus architecture versus local bus architecture]
Shared resources incur switching overhead.
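The switching overhead of shared resources can be illustrated by counting bit flips on a bus, using flips as a proxy for switched capacitance (assuming equal capacitance per bus line). The two 8-bit streams below are hypothetical: each changes slowly on its own, but stream_b is the bitwise complement of stream_a, a deliberately extreme case. Time-multiplexing them onto one shared bus destroys each stream's temporal correlation.

```python
def bus_toggles(words):
    """Total number of bit flips on a bus that carries `words` in sequence."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))

# Hypothetical slowly varying 8-bit streams (stream_b = ~stream_a & 0xFF)
stream_a = [0x10, 0x11, 0x12, 0x13, 0x14, 0x15]
stream_b = [0xEF, 0xEE, 0xED, 0xEC, 0xEB, 0xEA]

# Local architecture: each stream has its own bus
dedicated = bus_toggles(stream_a) + bus_toggles(stream_b)
# Global architecture: both streams interleaved on one shared bus
shared = bus_toggles([w for pair in zip(stream_a, stream_b) for w in pair])
print(dedicated, shared)  # 16 80
```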
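[Figure: circuit with reconvergent fanout: inputs A and B produce X, and B reconverges with X at output Z]
Reconvergence: $P(Z=1) = P(B=1) \cdot P(X=1 \mid B=1)$
Computing such conditional probabilities becomes complex and quickly intractable.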
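The formula above corresponds to Z being the AND of X and B. Since the logic function producing X is not recoverable from the slide, the sketch assumes a hypothetical X = A OR B and compares the exact probability (by enumerating the primary inputs) with the naive estimate that treats X and B as independent.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)  # P(A=1) = P(B=1) = 1/2, independent primary inputs

def x_gate(a, b):
    return a | b  # hypothetical choice: X = A OR B

def z_gate(x, b):
    return x & b  # Z = X AND B, matching P(Z=1) = P(B=1) * P(X=1 | B=1)

# Exact: enumerate the primary inputs A and B
exact = sum(HALF * HALF for a, b in product((0, 1), repeat=2)
            if z_gate(x_gate(a, b), b))

# Naive: treat X and B as independent at the inputs of the Z gate
p_x = sum(HALF * HALF for a, b in product((0, 1), repeat=2) if x_gate(a, b))
naive = p_x * HALF

print(exact, naive)  # 1/2 3/8
```

The naive estimate is wrong because X and B are correlated through the shared input B. Exact enumeration grows as $2^n$ in the number of primary inputs, which is why the problem quickly becomes intractable.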
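[Figure: precharged dynamic gate: precharge transistor M_p to V_DD, pull-down network (PDN) with inputs In_1, In_2, In_3, evaluation transistor M_e, output Out]
Power is only dissipated when Out = 0!
$C_{EFF} = P(Out=0) \cdot C_L$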
Example: dynamic 2-input NOR gate
Assume: P(A=1) = 1/2, P(B=1) = 1/2
Then: P(Out=0) = 3/4
$C_{EFF} = 3/4 \cdot C_L$
Switching activity is always higher in dynamic circuits.
Switching activity for precharged dynamic gates:
$P_{0\to1} = P_0$
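The sketch below compares $P_{0\to1} = P_0 P_1$ for a static gate with $P_{0\to1} = P_0$ for a precharged dynamic gate, under the same assumption of independent inputs that are 1 with probability 1/2. The NAND2 entry is an extra case, not one from the slides.

```python
from fractions import Fraction
from itertools import product

def output_one_probability(gate, n_inputs, p=Fraction(1, 2)):
    """P(Out=1) for independent inputs that are each 1 with probability p."""
    total = Fraction(0)
    for bits in product((0, 1), repeat=n_inputs):
        weight = Fraction(1)
        for bit in bits:
            weight *= p if bit else 1 - p
        if gate(bits):
            total += weight
    return total

def static_activity(p1):
    return (1 - p1) * p1  # static gate: P(0->1) = P0 * P1

def dynamic_activity(p1):
    return 1 - p1  # precharged dynamic gate: P(0->1) = P0

GATES = {
    "NOR2": lambda bits: int(not any(bits)),
    "NAND2": lambda bits: int(not all(bits)),
}

for name, gate in GATES.items():
    p1 = output_one_probability(gate, 2)
    print(name, static_activity(p1), dynamic_activity(p1))
```

Since $P_1 \le 1$, $P_0 P_1 \le P_0$, so the dynamic activity can never be lower than the static one for the same logic function.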