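Low Power VLSI System Design
Lectures 4 & 5: Logic-Level Power Optimization
Prof. R. Iris Bahar
September 8 &, 7

Logic Restructuring Revisited
- Logic restructuring: changing the topology of a logic network to reduce transitions.
- AND: $P_{0 \to 1} = P_0 \times P_1 = (1 - P_A P_B) \times P_A P_B$
- With input probabilities of 0.5: $(1 - 0.25) \times 0.25 = 0.1875$
- [Figure: chain and tree implementations of a four-input AND with inputs A, B, C, D (each with probability 0.5), internal nodes W, X, Y, Z and output F. Chain node activities are 0.1875, 0.1094 and 0.0586; tree node activities are 0.1875, 0.1875 and 0.0586.]
- The chain implementation has a lower overall switching activity than the tree implementation for random inputs.
- This analysis ignores glitching effects.

Glitching in Static CMOS Networks
- Gates have a nonzero propagation delay, resulting in spurious transitions or glitches (dynamic hazards).
- [Figure: gate network with nodes X and Z under a unit-delay model, with the waveforms at X and Z.]

Glitching in an RCA
- [Figure: output voltage (V) versus time (ps) for the sum outputs of the adder after the inputs and C_in switch.]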
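The chain-versus-tree comparison above can be checked numerically. The following is a minimal sketch, assuming independent inputs and the AND-gate formula $P_{0 \to 1} = (1 - P_1) P_1$; the function names are illustrative and, like the slide, the model ignores glitching.

```python
def and_gate(p_a, p_b):
    """Return (P(out = 1), P(0 -> 1)) for a 2-input AND with independent inputs."""
    p1 = p_a * p_b
    return p1, (1.0 - p1) * p1


def chain_activity(p):
    """Total switching activity of ((A AND B) AND C) AND D."""
    total = 0.0
    p_node = p[0]
    for p_in in p[1:]:
        p_node, activity = and_gate(p_node, p_in)
        total += activity
    return total


def tree_activity(p):
    """Total switching activity of (A AND B) AND (C AND D)."""
    p_y, act_y = and_gate(p[0], p[1])
    p_z, act_z = and_gate(p[2], p[3])
    _, act_f = and_gate(p_y, p_z)
    return act_y + act_z + act_f


inputs = [0.5, 0.5, 0.5, 0.5]
print(chain_activity(inputs))  # 0.1875 + 0.109375 + 0.05859375
print(tree_activity(inputs))   # 0.1875 + 0.1875 + 0.05859375
```

For uniformly random inputs the chain total is smaller because its second node has a lower probability of being 1 than either first-level node of the tree.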
Balanced Delay Paths to Reduce Glitching
- Glitching is due to a mismatch in the path lengths in the logic network; if all input signals of a gate change simultaneously, no glitching occurs.
- So equalize the lengths of timing paths through logic.

Theorem
- For correct operation with minimum energy consumption, a Boolean gate must produce no more than one event per transition.
- Output logic state changes: one transition is necessary.
- Output logic state unchanged: no transition is necessary.

Inertial Delay of an Inverter
- $d = (d_{HL} + d_{LH}) / 2$
- [Figure: V_in and V_out versus time, showing the delays d_HL and d_LH.]

Multi-Input Gate
- DPD: differential path delay.
- When the gate delay d < DPD, a hazard or glitch appears at the output.
- [Figure: input and output waveforms of a multi-input gate versus time.]
Balanced Path Delays
- [Figure: a gate with delay d < DPD; a delay buffer inserted on one input path balances the path delays, so no glitch appears at the output.]

Glitch Filtering by Inertia
- [Figure: a gate with delay d > DPD; the glitch is filtered by the inertial delay of the gate.]

Theorem
- Given that events occur at the input of a gate with inertial delay d at times $t_1 \le \dots \le t_n$, the number of events at the gate output cannot exceed
  $\min\left(n,\ 1 + \frac{t_n - t_1}{d}\right)$

Minimum Transient Design
- Minimum transient energy condition for a Boolean gate: $|t_i - t_j| < d$, where $t_i$ and $t_j$ are arrival times of input events and d is the inertial delay of the gate.
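The bound in the theorem and the minimum transient condition can be evaluated directly. This is a minimal sketch with illustrative function names; it assumes the fractional part of $(t_n - t_1)/d$ is rounded down, since an event count is an integer.

```python
import math


def max_output_events(arrival_times, d):
    """Upper bound on output events: min(n, 1 + floor((t_n - t_1) / d))."""
    if not arrival_times:
        return 0
    times = sorted(arrival_times)
    n = len(times)
    # Assumption: round the quotient down, because events are counted in whole numbers.
    return min(n, 1 + math.floor((times[-1] - times[0]) / d))


def is_minimum_transient(arrival_times, d):
    """True when every pair of input events satisfies |t_i - t_j| < d."""
    return max(arrival_times) - min(arrival_times) < d


print(max_output_events([0.0, 1.0, 5.0], d=2.0))     # inputs spread wider than d
print(max_output_events([0.0, 1.0, 1.5], d=2.0))     # all inputs arrive within d
print(is_minimum_transient([0.0, 1.0, 1.5], d=2.0))
```

When all arrival times fall within one inertial delay, the bound collapses to a single output event, which is the minimum transient condition.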
Balanced Delay Method
- All input events arrive simultaneously.
- Overall circuit delay is not increased (no increase in critical path delay).
- Delay buffers may have to be inserted.

Hazard Filter Method
- Gate delay is made greater than the maximum input path delay difference.
- No delay buffers are needed (least transient energy).
- Overall circuit delay may increase.

Define Timing Variables
- $d_i$: gate delay.
- Define two timing window variables per gate output:
  - $t_i$: earliest time of signal transition at gate i.
  - $T_i$: latest time of signal transition at gate i.
- Glitch suppression constraint: $T_i - t_i < d_i$
- [Figure: gate i with delay $d_i$, input timing windows $(t_1, T_1), \dots, (t_n, T_n)$ and output window $(t_i, T_i)$.]

Designing a Glitch-Free Circuit
- Maintain the specified critical path delay.
- Glitches are suppressed at all gates by:
  - path delay balancing,
  - increasing the inertial delay of gates,
  - inserting delay buffers when necessary.
- A linear program optimally combines all objectives.
- [Figure: a gate with delay D fed by paths with delays $d_1$ and $d_2$, where $|d_1 - d_2| < D$.]
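The timing-window variables can be illustrated with a small analysis pass. The sketch below is not the linear program itself: it only propagates the windows through a fixed netlist and reports the gates that violate $T_i - t_i < d_i$. The netlist format, the gate names and the assumption that primary inputs switch at time 0 are all hypothetical.

```python
def timing_windows(gates, order):
    """Propagate earliest (t) and latest (T) transition times.

    gates: gate name -> (delay, list of fanin names); any name that is not
           a key of `gates` is a primary input, assumed to switch at time 0.
    order: gate names in topological order.
    """
    t, T = {}, {}
    for g in order:
        delay, fanins = gates[g]
        t[g] = min(t.get(f, 0.0) for f in fanins) + delay
        T[g] = max(T.get(f, 0.0) for f in fanins) + delay
    return t, T


def glitch_prone(gates, order):
    """Return the gates that violate the constraint T_i - t_i < d_i."""
    t, T = timing_windows(gates, order)
    return [g for g in order if not (T[g] - t[g] < gates[g][0])]


# g2 sees one input through g1 (late) and one primary input directly (early).
unit_delay = {"g1": (1.0, ["a", "b"]), "g2": (1.0, ["g1", "c"])}
print(glitch_prone(unit_delay, ["g1", "g2"]))

# Hazard filter method: raise the delay of g2 above its input window width.
filtered = {"g1": (1.0, ["a", "b"]), "g2": (2.0, ["g1", "c"])}
print(glitch_prone(filtered, ["g1", "g2"]))
```

In the second netlist the glitch is suppressed without a buffer, at the cost of a longer path through g2, which matches the trade-off listed for the hazard filter method.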
An Example: Full Adder
- [Figure: full-adder circuit; critical path delay = 6.]
- [Figure: solution with maxdelay = 6; critical path delay = 6.]
- [Figure: solution with maxdelay = 6; critical path delay = 6.]
- [Figure: solution with maxdelay = 7; critical path delay = 7.]
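Solution: maxdelay [...]
- [Figure: full-adder solution; critical path delay = 4.]

Four-Bit ALU
- [Table: maxdelay versus number of buffers inserted; the cell values are not recoverable.]
- Maximum power savings (zero-buffer design): [...]

ALU4: Original and Low-Power
- [Figure: original and low-power versions of the four-bit ALU.]

C7552 Circuit: Spice Simulation
- Power saving: average 58%, peak 68% (Agrawal, 7).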
Power as a Function of V_DD
- [Figure: plot versus V_DD (V).]
- Decreasing V_DD decreases dynamic energy consumption (quadratically).
- But it increases gate delay (decreases performance).
- Determine the critical path(s) at design time and use high V_DD for the transistors on those paths for speed.
- Use a lower V_DD on the other gates, especially those that drive large capacitances (since this yields the largest energy benefits).

Multiple V_DD Considerations
- How many V_DD? Two is common.
  - Many chips already have two supplies (one for core and one for I/O).
- When combining multiple supplies, level converters are required whenever a module at the lower supply drives a gate at the higher supply (step-up).
  - If a gate supplied with V_DDL drives a gate at V_DDH, the PMOS never turns off.
  - The cross-coupled PMOS transistors do the level conversion.
  - The NMOS transistors operate on a reduced supply.
- Level converters are not needed for a step-down change in voltage.
- The overhead of level converters can be mitigated by doing conversions at register boundaries and embedding the level conversion inside the flip-flop.
- [Figure: level converter with input V_in from the V_DDL domain and output V_out in the V_DDH domain.]

Dual-Supply Inside a Logic Block
- Minimum energy consumption is achieved if all logic paths are critical (have the same delay).
- Clustered voltage-scaling:
  - Each path starts with V_DDH and switches to V_DDL (gray logic gates) when delay slack is available.
  - Level conversion is done in the flip-flops at the end of the paths.
- [Figure: logic block with gates operating at V_DD high and gates operating at V_DD low.]
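Clustered voltage-scaling along a single path can be sketched as a search for the earliest point at which the path may switch from V_DDH to V_DDL without exceeding its delay budget. The function names, the per-gate delay lists and the equal-capacitance energy model below are assumptions for illustration, not a synthesis algorithm from the lecture.

```python
def cvs_switch_point(delays_high, delays_low, budget):
    """Return the smallest k such that gates[:k] at V_DDH and gates[k:] at
    V_DDL meet the delay budget, or None if even the all-V_DDH path is too slow.

    Assumes each gate is at least as slow at V_DDL as at V_DDH.
    """
    n = len(delays_high)
    for k in range(n + 1):
        total = sum(delays_high[:k]) + sum(delays_low[k:])
        if total <= budget:
            return k
    return None


def relative_dynamic_energy(k, n, vddh, vddl):
    """Path energy relative to all-V_DDH, assuming equal switched capacitance
    per gate and dynamic energy proportional to V_DD squared."""
    return (k * vddh ** 2 + (n - k) * vddl ** 2) / (n * vddh ** 2)


# Hypothetical 4-gate path: 1 time unit per gate at V_DDH, 2 at V_DDL.
k = cvs_switch_point([1, 1, 1, 1], [2, 2, 2, 2], budget=6)
print(k)
print(relative_dynamic_energy(k, 4, vddh=1.0, vddl=0.5))
```

Because the low-voltage gates sit at the end of the path, the only step-up conversion happens at the receiving flip-flop, which is where the slides place the level converter.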
Leakage (static) power dissipation
- Gate leakage
- Drain junction leakage
- Sub-threshold current
- Sub-threshold current is the dominant factor.
- All increase exponentially with temperature!
- [Figure: inverter with supply V_DD, output V_out and leakage current I_leakage.]

Leakage (static) power dissipation
- Reducing V_DD reduces dynamic power dissipation, BUT reduced V_DD also means reduced performance.
- Improve performance by reducing V_th, BUT reduced V_th means increased leakage current.
- $I_{leakage} = I_0 \, e^{(V_{GS} - V_{th})/(n V_T)} \left(1 - e^{-V_{DS}/V_T}\right) \approx I_0 \, e^{(V_{GS} - V_{th})/(n V_T)}$
- $I_0$ is a function of gate oxide, mobility, and size of device.
- $V_T$ is the thermal voltage (26 mV at T = 300 K).
- Notice that $V_{DS}$ is much larger than $V_{GS}$ for a device that is off.

Controlling Leakage Current
- $I_{leakage} \approx I_0 \, e^{(V_{GS} - V_{th})/(n V_T)}$
- What can we do to control leakage current? The available variables are $V_{GS}$, $V_{th}$ and $V_T$.

Static Dual-V_th Assignment
- Minimum energy consumption is achieved if all logic paths are critical (have the same delay).
- Use the lower threshold on timing-critical paths.
  - The higher threshold is applied only to gates off the critical paths.
  - The assignment does not take circuit activity into account.
- What about dual-V_th synthesis?
- What are the limitations of dual V_th?
  - Remember that leakage occurs whether devices are switching or not (i.e., active or not).
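The exponential dependence of sub-threshold leakage on V_th can be made concrete with a short calculation. This is a minimal sketch of the leakage expression above; the slope factor n = 1.5, the normalised $I_0 = 1$ and the example threshold voltages are assumed values chosen only for illustration.

```python
import math

V_T = 0.026  # thermal voltage in volts at about 300 K


def subthreshold_leakage(v_gs, v_th, v_ds, i0=1.0, n=1.5, v_t=V_T):
    """I0 * exp((V_GS - V_th) / (n * V_T)) * (1 - exp(-V_DS / V_T)).

    i0 and n are illustrative assumptions, not values from the lecture.
    """
    return (i0 * math.exp((v_gs - v_th) / (n * v_t))
            * (1.0 - math.exp(-v_ds / v_t)))


# An off device (V_GS = 0) with the full supply across it (V_DS = 1.0 V).
low_vth = subthreshold_leakage(v_gs=0.0, v_th=0.3, v_ds=1.0)
high_vth = subthreshold_leakage(v_gs=0.0, v_th=0.4, v_ds=1.0)

# Raising V_th by 100 mV divides the leakage by exp(0.1 / (n * V_T)).
print(low_vth / high_vth)
```

With these assumed parameters, a 100 mV increase in V_th reduces leakage by roughly an order of magnitude, which is why dual-V_th assignment is effective on non-critical gates.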
Problem: Leakage Reduction
- This circuit is designed in 65nm CMOS technology using low threshold transistors. Each gate has a delay of [...] and a leakage current of [...] nA.
- Given that a gate with high threshold transistors has a delay of [...] ps and leakage of [...] nA, optimally design the circuit with dual-threshold gates to minimize the leakage current without increasing the critical path delay.
- What is the percentage reduction in leakage power?
- What will the leakage power reduction be if a [...]% increase in the critical path delay is allowed?

Solution 1: No Delay Increase
- The critical path(s) are shown by the dashed line (delay = [...]).
- None of the gates on the critical path can be assigned a high V_T.
- Also, the inverters that are on four-gate long paths cannot be assigned high threshold, because then the delay of those paths will become [...]7 ps.
- The remaining inverters and the NOR gate can be assigned high V_T (shaded grey).
- Reduction in leakage power = 1 - (4 x [...] + 7 x [...])/([...]) = [...].7%
- [Figure: circuit with four high-threshold gates shaded grey; critical path delay = [...] ps.]

Solution 2: [...]% Delay Increase
- Several solutions are possible.
- Notice that any [...]-gate path can have high threshold gates.
- Four and five gate paths can have only one high threshold gate.
- One solution is shown in the figure, where six high threshold gates are shown with shading and the critical path is shown by a dashed red arrow.
- Reduction in leakage power = 1 - (6 x [...] + 5 x [...])/([...]) = 49.9%
- [Figure: circuit with six high-threshold gates shaded; critical path delay = [...]9 ps.]

Standby/Active Modes
- We want computation to be as fast as possible when needed.
- When computation is not needed, we don't care:
  - Clock gating: inhibits switching, reduces dynamic power.
  - Standby mode: cuts off or reduces power supply, reduces static power.
- When do we enter standby mode?
  - After some period of inactivity (reactive).
  - By analyzing the resource needs of the current computation.
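The leakage-reduction arithmetic in the two solutions has the same form: the leakage of the mixed design divided by the leakage of the all-low-threshold design, subtracted from one. The sketch below uses the gate counts from the solutions (4 high and 7 low, then 6 high and 5 low). The per-gate leakage values are assumptions for illustration (a hypothetical 10:1 ratio between low- and high-threshold gates), not values taken from the problem statement.

```python
def leakage_reduction(n_high, n_low, leak_low, leak_high):
    """Fractional leakage reduction versus an all-low-threshold design."""
    baseline = (n_high + n_low) * leak_low
    mixed = n_high * leak_high + n_low * leak_low
    return 1.0 - mixed / baseline


# Hypothetical per-gate leakage values (arbitrary units, 10:1 ratio assumed).
LEAK_LOW_VTH = 10.0
LEAK_HIGH_VTH = 1.0

no_delay_increase = leakage_reduction(4, 7, LEAK_LOW_VTH, LEAK_HIGH_VTH)
with_delay_increase = leakage_reduction(6, 5, LEAK_LOW_VTH, LEAK_HIGH_VTH)
print(f"{100 * no_delay_increase:.2f}%")
print(f"{100 * with_delay_increase:.2f}%")
```

Only the ratio of the two leakage values matters for the percentage, so the result does not depend on the unit chosen.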
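Reducing static power
- $I_{leakage} \approx I_0 \, e^{(V_{GS} - V_{th})/(n V_T)}$
- V_th has an exponential effect on leakage current.
- Increasing V_th will lower static power dissipation.
  - Control directly (using high V_th devices).
  - Control indirectly (taking advantage of circuit topology and input values).

Stack Effect
- Consider a 2-input NAND gate.
- $V_{th} = V_{T0} + \gamma \left( \sqrt{\left| -2\phi_F + V_{SB} \right|} - \sqrt{\left| -2\phi_F \right|} \right)$
- $I_D = I_S \, e^{(V_{GS} - V_{th})/(n kT/q)} \left(1 - e^{-V_{DS}/(kT/q)}\right)$
- Leakage is lowest when both inputs A = B = 0 V.
- The intermediate node $V_X$ settles to $V_X \approx V_{th} \ln(1 + n)$.
- [Table: $V_X$ and $I_{SUB} = f(I_S, V_{GS}, V_{BS})$ for each combination of inputs A and B. Surviving entries: $V_X = V_{th} \ln(1 + n)$ with $V_{GS} = V_{BS} = -V_X$; $V_X = V_{DD} - V_T$ with $V_{GS} = V_{BS} =$ [...]; the remaining entries are lost.]
- Leakage reduction due to stacked transistors is called the stack effect.

Common Subexpression Extraction Example
- Before extraction:
  - $y_1 = ac + bc + de$
  - $y_2 = ae + be + ef$
- After extracting the common subexpression $t = (a + b)$:
  - $y_1 = tc + de$
  - $y_2 = te + ef$
- [Figure: multi-level network with inputs $x_1, x_2, \dots, x_n$ and outputs y.]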
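The extraction example can be checked exhaustively. The sketch below assumes all literals appear uncomplemented, as the expressions are written in this copy, and verifies over every input assignment that factoring out t = (a + b) leaves both outputs unchanged. The function names are illustrative.

```python
from itertools import product


def flat(a, b, c, d, e, f):
    """Two-level form: y1 = ac + bc + de, y2 = ae + be + ef."""
    y1 = (a and c) or (b and c) or (d and e)
    y2 = (a and e) or (b and e) or (e and f)
    return bool(y1), bool(y2)


def factored(a, b, c, d, e, f):
    """Multi-level form sharing the common subexpression t = a + b."""
    t = a or b
    y1 = (t and c) or (d and e)
    y2 = (t and e) or (e and f)
    return bool(y1), bool(y2)


equivalent = all(
    flat(*bits) == factored(*bits)
    for bits in product([False, True], repeat=6)
)
print(equivalent)

# Literal counts: 6 + 6 = 12 before, 2 (t) + 4 + 4 = 10 after.
```

The shared node t is computed once and fans out to both outputs, which is the structural change that the power analysis of factoring has to account for.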
Low Power Logic Synthesis
- How can the sub-expression impact the power dissipation of the circuit?
  - Remember $P = \frac{1}{2} \alpha C V^2 f_{clk}$
- Signal probabilities and transition densities of all nodes in a circuit remain unchanged with factoring.
- What changes?
  - Capacitances at the output of the driver gate of each node containing literals found in the sub-expression.
  - Factoring cuts down the switching activity of internal nodes in the sub-expression term (the term appears once instead of L times).

FSM and Combinational Logic Synthesis
- Combinational logic:
  - Consider signal activity when selecting the best common sub-expression to pull out during multi-level logic synthesis.
  - Factor the highest-activity common sub-expression out of all affected expressions.
- Latches:
  - Consider conditions under which computation is needed.
  - Pre-compute logic ahead of time and selectively prevent the latch from updating.
- FSM:
  - Consider the likelihood of state transitions during state assignment.
  - Minimize the number of signal transitions on present state inputs V.

Precomputation-Based Optimization
- Basic idea:
  - Precompute circuit output logic values one cycle before they are needed.
  - Use the precomputed information in the next clock cycle to disable unneeded hardware.
  - This reduces switching activity.
- Disable via clock enable to state bit latches.
- Must be careful: precomputation hardware can add to area and lengthen the clock period.

Precomputation Architecture
- [Figure: precomputation architecture.]
- The group of registers marked R are controlled by precomputation logic.
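Precomputation can be illustrated with a magnitude comparator, an example chosen here for illustration rather than taken from the lecture. The sketch assumes the precomputation logic inspects only the most significant bits: when they differ, the result of A > B is already known and the register group holding the remaining bits does not need to load. All names are hypothetical.

```python
def precompute(a_msb, b_msb):
    """Return (known, value); known is True when the MSBs alone decide A > B."""
    if a_msb != b_msb:
        return True, a_msb > b_msb
    return False, None


def compare_with_precomputation(a, b, width):
    """Return (a > b, low_registers_loaded) for unsigned width-bit operands."""
    msb = width - 1
    known, value = precompute((a >> msb) & 1, (b >> msb) & 1)
    if known:
        return value, False  # gated registers keep their old contents
    mask = (1 << msb) - 1
    return (a & mask) > (b & mask), True


WIDTH = 4
loads = 0
for a in range(1 << WIDTH):
    for b in range(1 << WIDTH):
        result, loaded = compare_with_precomputation(a, b, WIDTH)
        assert result == (a > b)
        loads += loaded

# Fraction of input pairs for which the low-order registers must load.
print(loads / (1 << (2 * WIDTH)))
```

For uniformly distributed operands the most significant bits differ half of the time, so the gated register group is idle in about half of the cycles; the extra comparison logic is the area and timing overhead the slide warns about.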