Leakage Power Minimization in Deep-Submicron Circuits
Politecnico di Torino, Dip. di Automatica e Informatica, 1019 Torino, Italy
enrico.macii@polito.it

Outline
Introduction. Design for low leakage: basics. Existing approaches. Sleep Transistor Insertion: principle. Automated STI. Methodology. Preliminary results. Extensions. Conclusions.

Electronic Technology Today: Convergence
[Figure: application and technology timeline, 1960s to 2000s. Applications: watch chip, calculator, SRAM, microprocessor, FLASH, DRAM, server/mainframe. Technologies: PMOS, bipolar, ECL, BiCMOS.]

Power Dissipation in CMOS Circuits
CMOS technology dominates in modern ICs.
Power dissipation of a gate:
$P = P_{SW} + P_{SC} + P_{leak}$
where $P_{SW}$ is the switching (or dynamic) power, $P_{SC}$ the short-circuit power, and $P_{leak}$ the leakage (or stand-by) power.
In older technologies (0.um and above), $P_{leak}$ was marginal compared with switching power, so switching power minimization was the primary objective. In deep-submicron processes, $P_{leak}$ becomes critical.

Leakage vs. Dynamic Power in Current Circuits
Example: ASICs [source: STMicroelectronics].
Example: Microprocessors [source: Intel].
[Figures: leakage vs. dynamic power and power density (Watts/cm²) across technology nodes (180nm, 130nm, 90nm, 65nm); percentage breakdown of leakage, I/O and dynamic power for Itanium processors (core + cache) at 180nm and 130nm.]

Power Dissipation Due to Leakage
Leakage power becomes comparable to dynamic power as technology scales.
[Figure: CMOS inverter (PMOS and NMOS devices, input V_IN, output V_OUT, load C_L, supply V_DD) showing the sub-threshold current I_sub and the gate current I_gate.]
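The gate power decomposition $P = P_{SW} + P_{SC} + P_{leak}$ can be made concrete with a small sketch. It computes the total power and the share taken by leakage. All numbers are hypothetical placeholders, not data from the STMicroelectronics or Intel figures. They only illustrate how the leakage share changes once $P_{leak}$ becomes comparable to $P_{SW}$.

```python
from dataclasses import dataclass


@dataclass
class GatePower:
    """Power components of one gate, in watts (values are hypothetical)."""
    p_sw: float    # switching (dynamic) power
    p_sc: float    # short-circuit power
    p_leak: float  # leakage (stand-by) power

    def total(self) -> float:
        # P = P_SW + P_SC + P_leak
        return self.p_sw + self.p_sc + self.p_leak

    def leakage_share(self) -> float:
        # Fraction of the gate's power budget spent in leakage
        return self.p_leak / self.total()


# Hypothetical gates: same switching and short-circuit power, but in the
# "deep-submicron" case leakage has grown to be comparable to switching power.
older = GatePower(p_sw=10e-6, p_sc=1e-6, p_leak=0.1e-6)
dsm = GatePower(p_sw=10e-6, p_sc=1e-6, p_leak=9e-6)

for label, gate in (("older", older), ("deep-submicron", dsm)):
    print(f"{label}: total = {gate.total() * 1e6:.1f} uW, "
          f"leakage share = {gate.leakage_share():.1%}")
```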
Power Dissipation Due to Leakage (Cont.)
Leakage power of a gate:
$P_{leak} = V_{DD} \cdot I_L$
where $V_{DD}$ is the supply voltage and $I_L$ the leakage current.
The leakage current $I_L$ has two major contributions:
$I_L = I_{sub} + I_{gate}$
$I_{sub}$: sub-threshold current, caused by the low threshold voltage.
$I_{gate}$: gate current, caused by the reduced thickness of the gate oxide.
$I_{sub}$ dominates, but grows by X per generation. $I_{gate}$ is less relevant, but grows much faster (00X per generation).

Low-Leakage Design
Leakage power minimization is a design problem, not just a technology/process problem.
For memory macros: optimization based on ad-hoc solutions (cell optimization).
For cell-based logic: optimization requires design automation, and integration with existing tools (at both the logic and the physical level) is mandatory.
Different solutions have been proposed for both sub-threshold and gate leakage.

Existing Approaches to Low-Leakage Design
Sub-threshold leakage: variable-threshold (VT), dual-threshold (DT), multi-threshold (MT), sleep transistor insertion (STI), multi-voltage (MV), body biasing (reverse, RBB, and forward, FBB), state assignment.
Gate leakage: boosted gate MOS (BGMOS), P-type domino, pin reordering and state assignment.

DT
Low-threshold cells are 1-0% faster than high-threshold cells, but have 10x higher leakage.
Libraries containing both high-Vth and low-Vth gates do exist. Use low-Vth gates on critical paths and high-Vth cells for the rest.
Approach: synthesize and map the design onto high-Vth cells only, which gives the minimum-leakage implementation. Then replace the high-Vth cells on the critical path with low-Vth cells to meet the timing constraints.
The leakage power increase required to meet the timing constraints may vary from 0% to 00%.

MT
Use multi-threshold cells that can operate at low-Vth in active mode and at high-Vth in stand-by mode.

MT (Cont.)
Leakage power control comes from two effects: transistor stacking, and the low sub-threshold leakage current of high-Vth transistors.
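The DT approach starts from an all-high-Vth mapping and swaps critical-path cells to low-Vth until timing is met. It can be sketched as a greedy loop. The slides do not specify which cell to swap first. The rule used here, which picks the largest delay gain per unit of extra leakage, is an assumption. Path enumeration is also simplified to an explicit list of paths. The cell names, delays and leakages are hypothetical, and each cell's low-Vth leakage must be strictly higher than its high-Vth leakage.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Cell:
    name: str
    delay_hvt: float   # delay of the high-Vth variant [ps]
    delay_lvt: float   # delay of the (faster) low-Vth variant [ps]
    leak_hvt: float    # leakage of the high-Vth variant [nW]
    leak_lvt: float    # leakage of the (leakier) low-Vth variant [nW]
    low_vt: bool = False  # start from the all-high-Vth, minimum-leakage mapping

    def delay(self) -> float:
        return self.delay_lvt if self.low_vt else self.delay_hvt

    def leakage(self) -> float:
        return self.leak_lvt if self.low_vt else self.leak_hvt


def path_delay(path: Sequence[str], cells: Dict[str, Cell]) -> float:
    return sum(cells[n].delay() for n in path)


def dual_vth_assign(cells: Dict[str, Cell], paths: List[Sequence[str]],
                    t_max: float) -> List[str]:
    """Swap high-Vth cells on the current critical path to low-Vth until
    every path meets t_max; returns the names of the swapped cells."""
    swapped: List[str] = []
    while True:
        critical = max(paths, key=lambda p: path_delay(p, cells))
        if path_delay(critical, cells) <= t_max:
            return swapped
        candidates = [cells[n] for n in critical if not cells[n].low_vt]
        if not candidates:
            raise ValueError("timing not met even with all low-Vth cells")
        # Assumed heuristic: best delay gain per unit of extra leakage.
        best = max(candidates, key=lambda c: (c.delay_hvt - c.delay_lvt)
                   / (c.leak_lvt - c.leak_hvt))
        best.low_vt = True
        swapped.append(best.name)


# Hypothetical netlist: two paths sharing cell g1, 10x leakage for low-Vth.
cells = {
    "g1": Cell("g1", 30, 25, 1.0, 10.0),
    "g2": Cell("g2", 40, 32, 1.0, 10.0),
    "g3": Cell("g3", 20, 17, 0.5, 5.0),
    "g4": Cell("g4", 25, 21, 1.0, 10.0),
}
paths = [["g1", "g2", "g3"], ["g1", "g4"]]
swapped = dual_vth_assign(cells, paths, t_max=80)
total_leak = sum(c.leakage() for c in cells.values())
print(swapped, total_leak)  # ['g2', 'g3'] 17.0
```

In this toy case only two of the four cells become low-Vth, so the leakage increase stays confined to the critical path.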
Principle of MT: insert high-Vth transistors in series with the pull-up and pull-down networks to reduce the sub-threshold leakage current while maintaining high-speed operation in active mode.
[Figure: MT gate with Sleep-controlled transistors, virtual GND and GND.]
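Why a high-Vth series device limits leakage can be seen from a first-order model of the sub-threshold current. This is a standard textbook approximation, not taken from the slides. In that model the off-state current falls by 10x for every S millivolts of extra threshold voltage, where S is the sub-threshold swing in mV/decade. The threshold values and swing below are hypothetical. With a 100 mV difference and S = 100 mV/decade, the model reproduces the "10x higher leakage" quoted for low-threshold cells.

```python
def subthreshold_leakage_ratio(vth_high_mv: float, vth_low_mv: float,
                               swing_mv_per_decade: float) -> float:
    """I_sub(low-Vth) / I_sub(high-Vth) in a first-order model where the
    off-state sub-threshold current drops by 10x for every
    `swing_mv_per_decade` millivolts of extra threshold voltage."""
    return 10 ** ((vth_high_mv - vth_low_mv) / swing_mv_per_decade)


# Hypothetical values: 100 mV threshold difference, S = 100 mV/decade.
ratio = subthreshold_leakage_ratio(vth_high_mv=400, vth_low_mv=300,
                                   swing_mv_per_decade=100)
print(f"low-Vth device leaks about {ratio:.1f}x more")
```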
MT (Cont.)
Limitations of MT:
Impact on area: each cell includes two extra transistors for low-leakage stand-by operation, which is a significant area cost. The PMOS transistor is normally much larger than the NMOS one (e.g., form factor ~ 0). Huge buffering circuitry is needed. Process modifications are required to support the implementation of high-Vth transistors.
Impact on performance: power-gated logic cells slow down when the circuit is active, and there is a re-activation delay when a set of powered-down cells is re-enabled.

STI
Modify the MT approach by using the same sleep transistors to control blocks of higher complexity, and by avoiding the PMOS sleep transistor.
[Figure: a block of N cells connected to a virtual GND, which a Sleep-controlled transistor switches to GND.]

STI (Cont.)
Further modification: use low-Vth sleep transistors.
Consequences: all devices are fabricated using the same process. Sub-threshold leakage power is reduced by the transistor-stacking effect only, which is a smaller but still significant reduction.
Example: [Figure: low-Vth gated logic connected to Vgnd, with a SLEEP-controlled sleep transistor; leakage-control cell.]

Automated STI
Issues:
Granularity of STI insertion. Large blocks: size of the sleep transistors and driving strength of the sleep signals. Small blocks: number of sleep transistors and size of the control logic.
Design of sleep transistor cells: different sizes and driving strengths, compliance with the cells in the library, area and delay control.
Selection of the gates to which STI should be applied: requires layout information.
Generation of the sleep signals: area, timing and power overhead.

Post-layout STI for combinational circuits.
STI is performed on a row-by-row basis. Sleep transistors are added at the boundaries of each row and connected to a common virtual ground.
Assumptions: all cells in the circuit can potentially be controlled by sleep transistors. A single control signal drives all the sleep transistors, and it is available from some external module (e.g., a microprocessor).
Design and characterization of a library of sleep transistor cells.
Flow: [Figure: flow diagram with steps Placed Row, Calculate Cluster of Gates, Size and Insert Sleep Transistor, Update Layout; inputs Area Constraint, Delay Constraint, Sleep Transistor Library.]
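The flow's sizing step takes a placed row and chooses a sleep transistor cell from the library under the area constraint. It can be sketched as a simple filter-and-pick. Several parts of this sketch are assumptions: the library entries, their names (ST_X1 and so on), the characterized maximum currents, and the rule of taking the highest-current cell that fits in the free area plus the tolerated overhead. None of these should be read as the actual library or tool.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SleepCell:
    name: str        # hypothetical library cell name
    area_um2: float  # cell footprint [um^2]
    i_max_ua: float  # max current sustained within the tolerated slow-down [uA]


def pick_sleep_cell(library: Sequence[SleepCell], free_area_um2: float,
                    row_area_um2: float,
                    area_overhead: float) -> Optional[SleepCell]:
    """Largest library cell (by current capability) that fits in the row's
    free area plus the tolerated area overhead, given as a fraction of the
    row area; None if no cell fits."""
    budget = free_area_um2 + area_overhead * row_area_um2
    fitting = [c for c in library if c.area_um2 <= budget]
    return max(fitting, key=lambda c: c.i_max_ua, default=None)


# Hypothetical three-cell library and one placed row.
library = [
    SleepCell("ST_X1", 2.0, 50.0),
    SleepCell("ST_X2", 4.0, 100.0),
    SleepCell("ST_X4", 8.0, 200.0),
]
cell = pick_sleep_cell(library, free_area_um2=3.0, row_area_um2=100.0,
                       area_overhead=0.02)
print(cell.name if cell else "no sleep cell fits")  # ST_X2
```

The chosen cell's current capability then acts as the row's current budget when cells are clustered under the delay constraint.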
Controlling area penalty:
Use part of the empty area between cells, according to the tolerated congestion overhead (compaction).
Resize the floorplan according to the tolerated area overhead.
For each row, consider the largest sleep transistor that can be inserted (free area + allowed overhead).
[Figure: placed row showing the free area, and the free area plus the area overhead.]

Controlling delay penalty:
The maximum sustainable current of each sleep transistor is computed from the tolerated slow-down in active mode.
Cell selection explores each row gate by gate. It starts from the cell with the longest timing path and moves back towards the primary inputs.
The re-activation time penalty is traded off (or nullified) by not power-gating some of the cells whose arrival times are shorter than the re-activation delay of the sleep transistors.
For each row, the process stops when the current budget is exhausted or the re-activation time penalty is violated.

Experimental set-up:
Six benchmarks (from 1900 to 600 standard cells).
Circuits synthesized onto a 0.1um technology library from STMicroelectronics.
Sleep transistor cells chosen to guarantee a total performance degradation below % in active mode.
Tolerated area overhead set to %.
Results: leakage power reductions of around 80%. Total power savings, accounting for cell dynamic and internal power, are around 19%.
[Table: original vs. optimized power for benchmarks Block1 to Block6; averages: 79.7% leakage power reduction, 18.9% total power saving.]
Results (cont.): area overhead around .% and circuit delay increase of %.
[Table: number of gates, number of sleep cells, original area (µm²), optimized area (µm²) and area increase for benchmarks Block1 to Block6.]
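The row-level cell selection under "Controlling delay penalty" can be sketched as a single pass over one placed row. Gates are explored from the longest arrival time back towards the inputs, and the pass stops at the first gate that would violate the re-activation delay or exceed the current budget. Some inputs are assumptions made for illustration: the per-gate current estimates, the arrival times, the budget and re-activation delay values, and the Gate type. The per-row budget would correspond to the maximum sustainable current of the row's sleep transistor.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Gate:
    name: str
    arrival_ps: float  # arrival time at the gate [ps]
    current_ua: float  # estimated active-mode current through virtual GND [uA]


def select_gated_cells(row: Sequence[Gate], i_budget_ua: float,
                       t_react_ps: float) -> List[str]:
    """Choose the cells of one placed row to connect to the row's sleep
    transistor (virtual ground)."""
    selected: List[str] = []
    used_ua = 0.0
    # Explore from the cell with the longest timing path back toward the inputs.
    for g in sorted(row, key=lambda x: x.arrival_ps, reverse=True):
        if g.arrival_ps < t_react_ps:
            break  # this cell and all later ones would expose the re-activation delay
        if used_ua + g.current_ua > i_budget_ua:
            break  # current budget of the row's sleep transistor is exhausted
        selected.append(g.name)
        used_ua += g.current_ua
    return selected


# Hypothetical row of four placed gates.
row = [Gate("u1", 120, 30), Gate("u2", 450, 40),
       Gate("u3", 300, 50), Gate("u4", 80, 20)]
print(select_gated_cells(row, i_budget_ua=100, t_react_ps=100))  # ['u2', 'u3']
```

Stopping at the first violation matches the stated termination rule. Because the gates are sorted by decreasing arrival time, every cell left after a re-activation violation would also violate it.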
Extensions to the post-layout STI approach:
Handle sequential circuits: state retention in sleep mode is a problem, so low-leakage flip-flops (based on the concept of the Balloon circuit) need to be designed.
Automatic extraction of the logic conditions for sleep: idle conditions from clock gating can be reused by exploiting the ODC-driven clock gating approach, with minimum overhead (logic shared with the gated-clock circuitry).
Preliminary results on the design of a low-leakage storage element (flip-flop with Balloon circuit):
Leakage current in stand-by mode: regular flip-flop, 116nA; low-leakage flip-flop, 10nA.
Marginal delay increase.

Conclusions
Leakage accounts for around -10% of the power budget at 180nm; this grows to 0-% at 130nm and to -0% at 90nm.
Leakage power minimization must be addressed from the design standpoint, not just at the technology/process level.
Several low-leakage design approaches have been introduced recently.
STI is promising, although it requires significant methodology and tool-infrastructure support.
Preliminary results are currently under evaluation.