Low-Power CMOS VLSI Design
Lan-Da Van (范倫達), Ph.D.
Department of Computer Science, National Chiao Tung University, Taiwan, R.O.C.
Fall 2017
ldvan@cs.nctu.edu.tw
http://www.cs.nctu.tw/~ldvan/
Outline
- Introduction
- Low-Power Process-Level Design (not covered here)
- Low-Power Logic/Circuit-Level Design
- Low-Power Algorithm/Architecture-Level Design
- Low-Power System-Level Design
- Conclusion
- References
VLSI-DSP-6-2
Low Power Design: An Ongoing and Important Discipline
- Historical figures of merit for VLSI design:
  - Performance (circuit speed and system quality)
  - Chip area (circuit cost)
- Power dissipation is now an equally important metric in VLSI design.
- No single major source of power savings exists across all design levels; a new way of thinking is required.
- Companies lack a basic power-conscious culture, and designers need to be educated in this respect.
- Overall goal: reduce power dissipation while maintaining an adequate throughput rate.
Motivation - Microprocessor
Low Power: Competitive Reasons
- Battery-powered systems
  - Extend battery life
  - Reduce weight and size
- High-performance systems
  - Cost
    - Package (chip carrier, heat sink, card slots, ...)
    - Power systems (supplies, distribution, regulators, ...)
    - Fans (noise, power, reliability, area, ...)
    - Operating cost to customer
    - Re-start issue
  - Reliability: failure rate increases by 4X at T = 110 °C vs. 70 °C
  - Size and weight
The Power Crisis: Portability
- PDAs, cellular phones, notebook computers, etc.
- Expected battery lifetime increase over the next 5 years: 30-40%
A Multimedia Terminal: The Infopad
- Present-day battery technology (year 1990): 20 lbs for 10 hrs
VLSI Signal Processing System Design Space
- Design metrics: cost, performance, test, power, area
- Design levels: system, algorithm, architecture, logic, circuit, process
Low Power System Design Space
- System: power budgeting, S/H partitioning, power management, core selection
- Algorithm: algorithmic reduction, data transformation, CSE, low-complexity operation
- Architecture: parallelism, pipelining, re-timing, unfolding, signal ordering, glitch minimization, data representation, resource allocation, multi-clock
- Logic/Circuit: logic style, arithmetic, glitch/noise minimization, re-sizing, adaptive voltage scaling, multi-Vdd, multi-Vth, multi-clock, layout, power-driven P&R
- Process: low-power device, alternative technology, multi-Vth
Where Does Power Go in CMOS?
Sources of power dissipation:
$P = P_{\text{dynamic}} + P_{\text{short-circuit}} + P_{\text{leakage}} + P_{\text{static}}$
Definitions:
- Dynamic (switching) power: $P_{\text{dynamic}} = \alpha C V^2 f$
  - Charging and discharging of parasitic capacitors
  - $\alpha$: switching activity factor
- Short-circuit power: $P_{\text{short-circuit}} = I_{sc} V$
  - Direct path between the supply rails during switching
- Leakage power: $P_{\text{leakage}} = I_{\text{leakage}} V$
  - Reverse-bias diode leakage
  - Sub-threshold conduction
- Static power: $P_{\text{static}} = I_{\text{static}} V$
  - Each input node is connected to a fixed, stable voltage
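
As a rough illustration of the four-term breakdown, the sketch below evaluates each component for one operating point. The function name `power_breakdown` and every numeric value are hypothetical, chosen only to show how the terms combine; they are not taken from the slides.

```python
def power_breakdown(alpha, c_load, v_dd, freq, i_sc, i_leak, i_static):
    """Return each power component in watts, following
    P = alpha*C*V^2*f + I_sc*V + I_leakage*V + I_static*V."""
    return {
        "dynamic": alpha * c_load * v_dd ** 2 * freq,
        "short_circuit": i_sc * v_dd,
        "leakage": i_leak * v_dd,
        "static": i_static * v_dd,
    }


# Hypothetical operating point (assumed values, for illustration only).
parts = power_breakdown(
    alpha=0.25, c_load=1e-12, v_dd=2.0, freq=1e8,
    i_sc=1e-6, i_leak=1e-7, i_static=0.0,
)
total = sum(parts.values())
for name, watts in parts.items():
    print(f"{name:>13}: {watts:.3e} W ({100 * watts / total:.1f}% of total)")
```

With these assumed numbers the dynamic term dominates, which is why the following slides concentrate on it.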
Dynamic Power Consumption (1/2)
Power = (energy per transition) × (transition rate)
$P = C_L V_{dd}^2 f_{0\to1} = C_L V_{dd}^2 P_{0\to1} f = C_{EFF} V_{dd}^2 f$
where $C_{EFF} = C_L P_{0\to1}$ is the effective capacitance.
Dynamic Power Consumption (2/2)
Reduce $P_{0\to1}$, $C_L$, $V_{dd}$, and $f$ for low-power design:
- $P_{0\to1}$: reduce the transition probability
- $C_L$: minimize the geometry and remove redundancy
- $V_{dd}$: reduce the power supply level
- $f$: use the lowest clock frequency that meets the requirement
Power dissipation is a data-dependent function of switching activity, so it is pattern dependent.
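
The following minimal sketch compares the effect of halving each of the four parameters in $P = P_{0\to1} C_L V_{dd}^2 f$. The baseline values are hypothetical; only the ratios matter.

```python
def dynamic_power(p_01, c_load, v_dd, freq):
    """Dynamic power in watts: P = P_0->1 * C_L * V_dd^2 * f."""
    c_eff = p_01 * c_load  # effective capacitance C_EFF
    return c_eff * v_dd ** 2 * freq


# Hypothetical baseline operating point (assumed values).
baseline = {"p_01": 0.25, "c_load": 1e-12, "v_dd": 5.0, "freq": 100e6}
p_base = dynamic_power(**baseline)

# Halve one parameter at a time and record the power relative to baseline.
relative = {}
for name in baseline:
    scaled = dict(baseline)
    scaled[name] = baseline[name] / 2
    relative[name] = dynamic_power(**scaled) / p_base
    print(f"halving {name:>6}: power x {relative[name]:.2f}")
```

Halving the supply voltage cuts dynamic power to one quarter, while halving any other parameter only halves it, because $V_{dd}$ enters quadratically.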
Choice of Logic Style
- The power-delay product improves as the supply voltage decreases.
- The best logic style minimizes the power-delay product (i.e., energy) for a given delay constraint.
Type of Logic Function: NOR
Example: static-style 2-input NOR gate
Truth table of 2-input NOR gate:
  A  B  Out
  0  0  1
  0  1  0
  1  0  0
  1  1  0
Assume P(A=1) = 1/2 and P(B=1) = 1/2. Then:
  P(Out=1) = 1/4
  P(0->1) = P(Out=0) × P(Out=1) = 3/4 × 1/4 = 3/16
  $\alpha_{0\to1}$ = 3/16
2-Input NOR Gate Transition Probability
$P_1 = (1-P_A)(1-P_B)$
$P_{0\to1} = P_0 P_1 = \bigl(1-(1-P_A)(1-P_B)\bigr)(1-P_A)(1-P_B)$
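
The NOR expression can be evaluated directly. This is a small sketch that assumes independent inputs, as the slide's derivation does; the function name `nor2_p01` is illustrative.

```python
from fractions import Fraction


def nor2_p01(p_a, p_b):
    """0->1 transition probability of a 2-input NOR gate with
    independent inputs: P_0->1 = (1 - P_1) * P_1."""
    p1 = (1 - p_a) * (1 - p_b)  # output is 1 only when both inputs are 0
    return (1 - p1) * p1


half = Fraction(1, 2)
print(nor2_p01(half, half))  # matches the 3/16 worked out on the slide
```

Using `Fraction` keeps the result exact, so it can be compared with the hand calculation.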
Type of Logic Function: XOR
Example: static-style 2-input XOR gate
Truth table of 2-input XOR gate:
  A  B  Out
  0  0  0
  0  1  1
  1  0  1
  1  1  0
Assume P(A=1) = 1/2 and P(B=1) = 1/2. Then:
  P(Out=1) = 1/2
  P(0->1) = P(Out=0) × P(Out=1) = 1/2 × 1/2 = 1/4
  $\alpha_{0\to1}$ = 1/4
2-Input XOR Gate Transition Probability
$P_1 = P_A(1-P_B) + P_B(1-P_A) = P_A + P_B - 2P_AP_B$
$P_{0\to1} = P_0 P_1 = \bigl(1-(P_A+P_B-2P_AP_B)\bigr)(P_A+P_B-2P_AP_B)$
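
Closed-form expressions like the one for XOR can be cross-checked by enumerating a gate's truth table. The sketch below is one way to do that for any small gate, assuming independent inputs; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def output_one_probability(gate, input_probs):
    """P(Out=1): sum the probability of every input pattern that drives
    the output high, assuming the inputs are independent."""
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(input_probs)):
        if gate(*bits):
            weight = Fraction(1)
            for bit, p in zip(bits, input_probs):
                weight *= p if bit else 1 - p
            total += weight
    return total


def transition_probability(gate, input_probs):
    """P_0->1 = P_0 * P_1 for a static gate."""
    p1 = output_one_probability(gate, input_probs)
    return (1 - p1) * p1


def xor2(a, b):
    return a ^ b


p_a, p_b = Fraction(1, 4), Fraction(1, 2)
enumerated = transition_probability(xor2, [p_a, p_b])
p1_closed = p_a + p_b - 2 * p_a * p_b
closed_form = (1 - p1_closed) * p1_closed
print(enumerated, closed_form)
```

The same two functions work unchanged for other gates, since only the `gate` callable changes.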
Which One is Your Choice?
XOR or NOR: which one is better suited to low-power design?
Glitching Activity in CMOS Network
- Figure labels: (x, c) = (0, 0) and (x, c) = (1, 0)
- $\alpha_{0\to1}$ can be greater than 1 due to glitching!
Glitching in a Carry Ripple Adder
Chain vs Tree Datapath (1/2)
Chain: A, B -> O1; O1, C -> O2; O2, D -> F
Tree: O1 and O2 are computed in parallel from separate input pairs, then combined into F
                         O1      O2      F
P_1 (Chain)              1/4     1/8     1/16
P_0 = 1 - P_1 (Chain)    3/4     7/8     15/16
P_0->1 (Chain)           3/16    7/64    15/256
P_1 (Tree)               1/4     1/4     1/16
P_0 = 1 - P_1 (Tree)     3/4     3/4     15/16
P_0->1 (Tree)            3/16    3/16    15/256
Chain vs Tree Datapath (2/2)
Chain-to-tree ratios at each node:
                                          O1    O2      F
P_0->1 (Chain) / P_0->1 (Tree)            1     0.58    1       (ideal, without delay)
alpha_0->1 (Chain) / alpha_0->1 (Tree)    1     0.83    1.47    (practical, with delay)
Which one is better for low-power design?
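
The ideal (zero-delay) rows of the chain and tree tables can be reproduced with a few lines of arithmetic. The sketch assumes 2-input AND gates and four independent inputs with P(1) = 1/2. The slides do not name the gate type, but these assumptions match the tabulated probabilities. The practical ratios that include glitching (0.83 and 1.47) come from delay-aware simulation and are not modeled here.

```python
from fractions import Fraction


def p01(p1):
    """Zero-delay 0->1 transition probability of a node with P(1) = p1."""
    return (1 - p1) * p1


def and_chain(input_probs):
    """P(1) at O1, O2, F for a chain of 2-input AND gates (assumed gate type)."""
    nodes = []
    p = input_probs[0]
    for p_in in input_probs[1:]:
        p = p * p_in
        nodes.append(p)
    return nodes


def and_tree(input_probs):
    """P(1) at O1, O2, F for a balanced tree of 2-input AND gates."""
    a, b, c, d = input_probs
    o1, o2 = a * b, c * d
    return [o1, o2, o1 * o2]


inputs = [Fraction(1, 2)] * 4
chain = and_chain(inputs)
tree = and_tree(inputs)
ratios = [p01(c) / p01(t) for c, t in zip(chain, tree)]
for node, ratio in zip(("O1", "O2", "F"), ratios):
    print(f"{node}: chain/tree = {float(ratio):.2f}")
```

The O2 ratio evaluates to 7/12, which rounds to the 0.58 listed in the table.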
Glitching at the Datapath Level
- Figure: irregular vs. regular datapath structures; the irregular case is annotated "Two glitches!"
How to Minimize Glitching?
- Equalize the lengths of the timing paths through the design.
Data Representation (1/2)
- Figure: plots versus bit position, with the sign bit marked
Data Representation (2/2): Binary vs. Gray Encoding
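
The figure for this slide is not recoverable, so the sketch below only illustrates the general idea behind the comparison: for a sequentially incrementing value, Gray encoding toggles one bit per step, while binary encoding toggles more on average. The 4-bit counting sequence is an assumed example, not data from the slide.

```python
def to_gray(n):
    """Convert a non-negative integer to its reflected Gray code."""
    return n ^ (n >> 1)


def bit_toggles(codes):
    """Total number of bit flips between consecutive code words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(codes, codes[1:]))


binary_seq = list(range(16))                  # 4-bit counting sequence
gray_seq = [to_gray(n) for n in binary_seq]

print("binary toggles:", bit_toggles(binary_seq))
print("gray toggles:  ", bit_toggles(gray_seq))
```

For data that is not sequential (for example, random samples), the advantage of Gray encoding largely disappears, so the choice depends on the signal statistics.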
Signal Reordering Operation
- Ex1. Y = AB + AC = A(B + C)
- Ex2. Y = 3X = X + (X << 1)
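
Both examples trade expensive operations for cheaper ones without changing the result. The sketch below checks the two identities on integers; the function names are illustrative, and the operation counts in the comments refer to the hardware the expressions would map to.

```python
def y_two_multipliers(a, b, c):
    """Y = AB + AC: two multiplications and one addition."""
    return a * b + a * c


def y_one_multiplier(a, b, c):
    """Y = A(B + C): one addition and one multiplication."""
    return a * (b + c)


def triple_multiply(x):
    """Y = 3X using a constant multiplication."""
    return 3 * x


def triple_shift_add(x):
    """Y = X + (X << 1): one shift (wiring only in hardware) and one addition."""
    return x + (x << 1)


print(y_two_multipliers(3, 4, 5), y_one_multiplier(3, 4, 5))
print(triple_multiply(7), triple_shift_add(7))
```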
Resource Sharing Can Increase Activity (1/2)
Separate bus structure:
  Number of bus transitions per cycle = 2(1 + 1/2 + 1/4 + ...) = 4,
  where the factor 2 accounts for the two separate buses, 1 is the transition probability of the LSB, 1/2 is the transition probability of the second LSB, and so on.
Bus sharing: (figure)
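
The series 1 + 1/2 + 1/4 + ... is what a free-running binary counter produces: the LSB toggles every cycle, the next bit every second cycle, and so on. The sketch below assumes each bus carries the output of such a counter (an assumption; the slide text does not name the data source) and counts toggles over one full period of an 8-bit counter.

```python
from fractions import Fraction


def counter_toggles_per_cycle(width):
    """Average number of bit toggles per increment of a wrapping
    binary counter of the given width, measured over one full period."""
    period = 1 << width
    total = sum(
        bin(value ^ ((value + 1) % period)).count("1")
        for value in range(period)
    )
    return Fraction(total, period)


per_bus = counter_toggles_per_cycle(8)
two_buses = 2 * per_bus
print(f"one bus:   {float(per_bus):.4f} transitions/cycle")
print(f"two buses: {float(two_buses):.4f} transitions/cycle")
```

For a finite width the result is slightly below the limit of 4, and it approaches 4 as the counter gets wider.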
Resource Sharing Can Increase Activity (2/2)
- Figure: activity versus bit position
Lowering V_dd Increases Delay
Reducing V_dd
Architecture Trade-offs: Reference Datapath
Parallel Datapath
Pipelined Datapath
Summary: A Low-Power Data Path
Architecture type                                  Voltage   Area   Power
Reference datapath (no pipelining/parallelism)     5 V       1      1
Pipelined datapath                                 2.9 V     1.3    0.37
Parallel datapath                                  2.9 V     3.4    0.34
Pipeline-parallel datapath                         2.0 V     3.7    0.18
(Area and power are normalized to the reference datapath.)
- Operate at the lowest possible speed, using a low supply voltage.
- Use architecture optimization to compensate for the slower operation.
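
To see how much of the saving in the summary table comes from voltage scaling alone, the sketch below divides each normalized power figure by $(V_{dd}/5\,\text{V})^2$. What remains is read here as the combined change in effective capacitance and clock frequency. That interpretation follows from $P = C_{EFF} V_{dd}^2 f$ and is an inference, not something stated on the slide.

```python
REFERENCE_VDD = 5.0  # supply voltage of the reference datapath, from the table

# (supply voltage in volts, normalized power) as listed in the summary table
TABLE = {
    "pipelined": (2.9, 0.37),
    "parallel": (2.9, 0.34),
    "pipeline-parallel": (2.0, 0.18),
}


def voltage_factor(v_dd, v_ref=REFERENCE_VDD):
    """Power scaling due to supply voltage alone: (V_dd / V_ref)^2."""
    return (v_dd / v_ref) ** 2


def residual_factor(v_dd, power):
    """Normalized power divided by the voltage factor, i.e. the part of
    the change attributed to C_EFF * f (an inferred quantity)."""
    return power / voltage_factor(v_dd)


for name, (v_dd, power) in TABLE.items():
    print(f"{name:>17}: V^2 factor = {voltage_factor(v_dd):.3f}, "
          f"residual = {residual_factor(v_dd, power):.3f}")
```

The residual stays close to 1 in every row, so almost all of the power reduction is due to the quadratic dependence on supply voltage. The architectural changes mainly make the lower voltage usable at the same throughput.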
Computational Complexity of DCT Algorithms
Low-Power Cache and Register Configuration
- Application profiling
  - Trade-off between performance, power, and size
- Rules of thumb
  - Store and access the most frequently used instructions
  - Avoid accessing a larger cache or register file
  - Partition the cache and registers
  - Be aware of partitioning: partition, partition!
- Memory hierarchy (figure): CPU, registers, L1 cache, L2 cache, memory
Outline
- Introduction
- Low-Power Process-Level Design (not covered here)
- Low-Power Logic/Circuit-Level Design
- Low-Power Algorithm/Architecture-Level Design
- Low-Power System-Level Design
  - Low Power System Perspective
  - Low Power Applications
- Conclusion
- References
Power Down Techniques
Software versus Hardware
- Software
  - Advantages: free (but not always), high flexibility, ease of compatibility, less staff
  - Disadvantages: high power consumption, slow in execution, inefficient
- Hardware
  - Advantages: high speed, low power, high efficiency
  - Disadvantages: larger staff, high die cost, low flexibility, low compatibility
Energy-Efficient Software Coding
- The potential for power reduction through software modification is relatively unexploited.
- Code size and algorithmic efficiency can significantly affect energy dissipation.
- Pipelining at the software level: VLIW coding style
References:
- V. Tiwari et al., "Power analysis of embedded software: a first step towards software power minimization," IEEE Trans. on VLSI, vol. 2, no. 4, Dec. 1994.
- J. Snyder et al., "Low-power software for low-power people," 1994 IEEE Symp. on Low Power Electronics.
Power-Hungry Clock Network
- H-tree design: deficiencies based on the Elmore delay model
- PLL: every designer (digital or analog) should have knowledge of PLLs
  - Multiple frequencies in chips/systems generated by a PLL
  - Low main frequency
  - But: jitter and noise, gain and bandwidth, pull-in and lock time, stability
- Asynchronous => use gated clocks, sleep mode
Power Analysis in the Design Flow
Applications I: Wireless Computing/Communication
Applications II: A Portable Multimedia Terminal
Applications III: System on Chip (SOC)
- Entire system function
- Logic + memory
- More than two types of devices
- Allows more freedom in architecture
- Hardware and software partitioning
Conclusions
- Design that trades off low power against high speed is an essential requirement for many applications.
- Low power affects cost, size, weight, performance, and reliability.
- Reduce $P_{0\to1}$, $C_L$, $V_{dd}$, and $f$ for low-power design at every level.
References
[1] A. Chandrakasan and R. W. Brodersen, "Minimizing power consumption in digital CMOS circuits," Proceedings of the IEEE, vol. 83, no. 4, pp. 498-523, Apr. 1995.
[2] A. Chandrakasan, "Architectures for Ultra Low-Power Design," in tutorial B3 of ASP-DAC, 1995.
[3] A. Chandrakasan, "Low-Voltage/Low-Power Digital Design," in tutorial of Workshop on Low-Power Low-Voltage and RF IC for Wireless Communication System, 1996, Taiwan.
[4] T. Sakurai, "Low Power Circuit Design Methodology," in tutorial B2 of ASP-DAC, 1995.
[5] Chapter 17 of Textbook.
Self-Test Exercises
- STE1: Derive the switching-activity expression of a 2-input AND gate, and simulate the histogram of the transition probability ($P_{0\to1}$) versus $P_A$ and $P_B$.
- STE2: Derive the switching-activity expression of a 3-input NAND gate.