New Approaches to Total Power Reduction Including Runtime Leakage


Dennis Sylvester
University of Michigan, Ann Arbor
Electrical Engineering and Computer Science
http://vlsida.eecs.umich.edu
dennis@eecs.umich.edu

Colleagues on this work: Prof. David Blaauw, Ashish Srivastava, Dongwoo Lee, Harmander Deogun, Rajeev Rao, Saumil Shah

Components of power dissipation
[Figure: power trend versus channel length (um), broken down into switching and sub-threshold leakage components. Figure source: Intel]
- Increasing contribution of static (leakage) power
- Leakage is significant in both standby mode (mobile apps) and runtime (high-performance non-mobile parts)

Reducing Power Dissipation
- Pressing need to reduce power dissipation
  - High-performance designs
    - Packaging / cooling costs
    - Power supply integrity
    - Reliability (temperature)
  - Mobile applications
    - In addition to above: battery life
- Circuit performance is generally determined by a small fraction of the gates
  - Requires the availability of very high performance devices
    - Higher Vdd
    - Lower threshold voltage
    - Aggressive gate length
- All gates in the design contribute to power dissipation
  - Would like to use slower devices whenever possible (higher Vth, lower Vdd, possibly longer gate lengths)

Multiple Vth
- Exponential reduction in leakage power
- Cost: additional masks
- Value of higher threshold
  - Tradeoff: delay penalty vs. leakage reduction
- Can be easily incorporated into standard design flows
  - Multi-threshold library
  - Tradeoff: library size, runtime
  - Generally threshold selection is done at gate level: 2X library size
- Provides runtime leakage power reduction
  - Contrary to standby mode based approaches

Multiple Vdd
- Quadratic reduction in switching power: $P_{\text{switching}} \sim \alpha_{sw} C_L V_{DD}^2 f$
- Roughly cubic reduction in leakage power (DIBL, V*Ioff)
- Value of lower Vdd
  - 0.6 to 0.7 times Vdd_high
  - 0.5*Vdd_high in dual-Vth processes
  - Ref: Usami

Multiple Vdd: Topological Constraint
- Vdd_low cells cannot be directly connected to Vdd_high cells
  - PMOS does not turn off
  - Results in static current
- [Figure: a VDDL cell driving a VDDH cell, with the static current path marked]
- Level converters (LCs) are used to up-convert a low Vdd signal to a high Vdd signal
  - Incurs delay and energy overhead
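As a rough illustration of the scaling claims on the Multiple Vdd slide, the sketch below evaluates the quadratic switching-power relation and the "roughly cubic" leakage trend at the supply ratios quoted there. The function names and the pure power-law leakage model are assumptions made for illustration; the slides state only the qualitative trend.

```python
def switching_power_ratio(vdd_ratio: float) -> float:
    """Switching power relative to the Vdd_high design (P ~ alpha * C * Vdd^2 * f)."""
    return vdd_ratio ** 2


def leakage_power_ratio(vdd_ratio: float, exponent: float = 3.0) -> float:
    """Leakage power relative to the Vdd_high design.

    Assumes a simple power law; the default exponent of 3 is only the
    'roughly cubic' trend from the slide, not a device model.
    """
    return vdd_ratio ** exponent


if __name__ == "__main__":
    for ratio in (0.5, 0.6, 0.7):
        print(
            f"Vdd_low/Vdd_high = {ratio:.1f}: "
            f"switching x{switching_power_ratio(ratio):.2f}, "
            f"leakage x{leakage_power_ratio(ratio):.2f}"
        )
```

Under these assumptions, halving the supply cuts switching power to one quarter and leakage power to roughly one eighth, before level-converter overhead is counted.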

Multiple Vdd: 2 General Approaches
- Clustered Voltage Scaling (CVS)
  - Only one voltage transition along a path
  - Level conversion only at flip-flops
- Extended CVS (ECVS)
  - Multiple voltage transitions along a path
  - Level conversion using asynchronous LCs
- 40-50% improvement in power observed

Other Issues in Multi-Vdd
- Generation of additional voltage supplies
- Impact on power grid design
- Hard to use standard design tools
  - Simple Power Compiler based approach found to provide only a 6% power reduction
- Cell layout must change
- Increase in routing costs

Outline
- Concurrent Vdd/Vth assignment and sizing algorithm
- Standby mode leakage reduction using state, Vth, and Tox assignment
- Runtime leakage reduction with bus encoding + novel Vth assignment strategies

Our Approach: Overview
- Seek to maximize total power reduction in a dual Vdd/Vth design
  - Uses Vdd, Vth, and sizing: VVS
- VVS is a two-pass approach
  - Uses sensitivity metrics to minimize power in each pass
- 1st pass: CVS with concurrent up-sizing
  - Generates slack and allows for a larger fraction of gates to be set to low Vdd
- 2nd pass: move back towards primary outputs (POs), setting gates to high Vth and re-setting gates to high Vdd or resizing to recover slack
  - Continue while total power dissipation is found to decrease

Gate Level Vdd/Vth Assignment
- Perform timing analysis and begin CVS
  - Initial circuit synthesized at Vdd_high and Vth_low
- Obtain the candidate set of gates (front)
  - Do not serve as input to any high Vdd gate
  - If set to low Vdd will violate timing
- [Figure: example circuit with gates numbered 1 to 8]

Backward Pass
- Order candidates based on a metric
  - Slack, capacitance, etc.
- To meet timing, size up gates
  - Gates to be sized up are obtained based on sensitivity
  - Size up until timing is again met
- $\text{Sensitivity} = \Delta D / \Delta\text{Area}$
- $\Delta D = \sum_{\text{arcs}} \Delta\text{delay}_{\text{arc}}(t) \cdot \dfrac{1}{k + \text{slack}_{\text{arc}} - \min(\text{slack})}$
  - k is a small positive number
  - Weights arcs that impact critical paths
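The sizing sensitivity on the Backward Pass slide can be sketched as follows. The transcript has lost some symbols, so this reading is an assumption: each timing arc contributes its delay change weighted by 1 / (k + slack_arc - min slack), and the sum is divided by the area increase. The function name, the tuple layout, and the example numbers are hypothetical.

```python
from typing import Iterable, Tuple


def sizing_sensitivity(
    arcs: Iterable[Tuple[float, float]],
    delta_area: float,
    k: float = 0.01,
) -> float:
    """Slack-weighted delay improvement per unit of added area.

    arcs       -- (delay_change, slack) pairs for the arcs affected by upsizing
    delta_area -- area added by the upsizing move (must be positive)
    k          -- small positive constant that keeps the weight finite
                  on the most critical arc
    """
    arcs = list(arcs)
    if delta_area <= 0:
        raise ValueError("delta_area must be positive")
    min_slack = min(slack for _, slack in arcs)
    weighted = sum(
        delay_change / (k + slack - min_slack) for delay_change, slack in arcs
    )
    return weighted / delta_area


# Two arcs with equal delay change: the one at minimum slack dominates.
critical_and_loose = [(0.1, 0.0), (0.1, 1.0)]
sensitivity = sizing_sensitivity(critical_and_loose, delta_area=2.0)
```

With this weighting, an arc sitting at the minimum slack gets a weight of 1/k, so gates on critical paths are upsized first.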

Backward Pass, cont.
- Stopping criterion
  - When a gate is set to low Vdd, only a fixed number of gates are upsized
  - The total power dissipation measure is not used as a stopping test, in the hope of escaping local minima
  - The end of the pass is signaled when no candidate gates can be set to low Vdd
  - The best solution seen is stored and restored at the end of the pass

Forward Pass
- Candidate gates which define the front are now those that
  - Operate at low Vdd
  - Have all high Vdd gates as inputs
- Select gates on the front and set them to high Vdd or upsize them
- Select gates to be set to high Vth
- Commit these changes if total power is found to decrease
- Stop when there are no available options for gate upsizing or high Vdd assignment
- The gates are set to high Vth based on their sensitivity
  - Sensitivities of the form ΔPower/ΔDelay
  - Weighted by slack
  - All gates are candidates to be set to high Vth (no topological constraints)
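The "store the best solution seen and restore it at the end of the pass" idea can be sketched generically. This is not the authors' implementation: the state representation, the move list, and the toy power values below are all hypothetical, chosen only to show a pass that keeps going through a power increase and still returns the best point visited.

```python
from typing import Callable, Iterable, Tuple, TypeVar

State = TypeVar("State")


def run_pass(
    initial: State,
    moves: Iterable[Callable[[State], State]],
    total_power: Callable[[State], float],
) -> Tuple[State, float]:
    """Apply every move in order and return the lowest-power state seen."""
    best, best_power = initial, total_power(initial)
    current = initial
    for move in moves:
        current = move(current)  # accepted even if power goes up
        power = total_power(current)
        if power < best_power:
            best, best_power = current, power
    return best, best_power


# Toy example: the state is a step index, power rises before it falls.
power_by_step = [5.0, 6.0, 4.0, 7.0]
moves = [lambda step: step + 1] * 3
best_state, best_power = run_pass(0, moves, lambda step: power_by_step[step])
```

A pass that stopped at the first power increase would have returned step 0 here; continuing and restoring the best point returns step 2.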

Results
- 0.13µm process; timing constraint is 20% slower than the absolute fastest design point (optimally sized, all Vdd_high and Vth_low)
- Vdd_high = 1.2V, Vth_high = 0.23V
- Vdd_low = 0.6V, Vth_low = 0.12V
- High switching activity at primary inputs

% savings compared to initial design:

               Initial power (uW)            CVS only             Backward pass               VVS
Circuit   Leakage  Switching    Total   Leak. Switch.   Total   Leak. Switch.   Total   Leak. Switch.   Total
c432         35.4       81.7    117.1    0.5%    1.9%    1.5%    0.5%    1.9%    1.5%   57.8%    6.0%   21.7%
c880         48.9      140.1    188.9   20.6%   19.8%   20.0%   20.6%   19.8%   20.0%   44.0%   22.9%   28.4%
c1908        75.3      202.7    278.0    5.4%    5.6%    5.5%    5.4%    5.6%    5.5%   44.1%    7.4%   17.4%
c2670       100.0      248.9    349.0   20.3%   21.4%   21.1%   20.2%   37.8%   32.7%   20.2%   37.8%   32.7%
c3540       131.6      302.6    434.2    3.4%    6.5%    5.6%    2.8%   26.4%   19.2%   49.4%   26.1%   33.2%
c5315       210.9      413.8    624.7   21.2%   25.4%   23.9%   18.9%   50.5%   39.9%   19.0%   50.7%   40.0%
c6288       544.3     1716.2   2260.5    1.1%   15.7%   12.2%    1.0%   15.8%   12.2%   20.3%   19.4%   19.6%
c7552       214.9      521.4    736.3   30.2%   32.7%   32.0%   36.4%   50.8%   46.6%   36.6%   51.2%   46.9%
Huffman      60.2      144.8    205.0    9.1%    9.3%    9.2%   20.9%   27.2%   25.4%   35.6%   27.0%   29.5%
SOVA1      1483.2     3270.1   4753.3   42.7%   45.3%   44.5%   50.7%   57.0%   55.0%   83.3%   58.6%   66.3%
SOVA2      3481.5     8016.7  11498.2    4.9%    5.1%    5.1%   41.5%   69.0%   60.7%   49.0%   69.8%   63.5%
Average     290.5      704.2    994.7   15.4%   18.4%   17.6%   17.7%   29.3%   25.8%   41.0%   30.7%   33.6%

- CVS+sizing (backward pass) does much better than just CVS

Impact of Circuit Activity
- For low activities the algorithm successfully steers toward a better solution by attacking leakage power more directly
  - In some benchmarks switching power is increased to minimize total power
  - At low activities the solution converges to dual-Vth + sizing
- VVS provides a single cohesive algorithm that seeks out the best power reduction over a range of switching activities
  - Ex: across functional units in a design

Average power reduction by component across switching activities:

Activity       Static  Dynamic   Total
High (3)          41%      31%     34%
Nominal (1)       69%      16%     45%
Low (1/3)         73%       7%     59%
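To see how the per-component savings in the results table combine into the total, the following sketch weights each saving by the corresponding initial power. The function name is hypothetical; the inputs are the c432 and c880 VVS rows from the table.

```python
def total_savings(
    leak_uw: float,
    switch_uw: float,
    leak_saving: float,
    switch_saving: float,
) -> float:
    """Fractional total power saving from per-component fractional savings."""
    saved = leak_uw * leak_saving + switch_uw * switch_saving
    return saved / (leak_uw + switch_uw)


# c432 with VVS: 57.8% leakage saving, 6.0% switching saving.
c432_total = total_savings(35.4, 81.7, 0.578, 0.060)
# c880 with VVS: 44.0% leakage saving, 22.9% switching saving.
c880_total = total_savings(48.9, 140.1, 0.440, 0.229)

print(f"c432: {100 * c432_total:.1f}%  c880: {100 * c880_total:.1f}%")
```

Because switching power is the larger share of the initial total in these runs, a large leakage saving (57.8% on c432) translates into a much smaller total saving when the switching saving is modest.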

Other results
- For high switching activities, VVS assigns many gates to the low Vdd and low Vth combination to attack dynamic power
- Exhaustive cutset enumeration was performed to find optimal results
  - VVS performs close to optimal
  - Least effective when the optimal front lies in the middle of the circuit (more possibilities)
- [Figure: % of total gates assigned to (Vdd_high, Vth_low), (Vdd_low, Vth_low), (Vdd_high, Vth_high), and (Vdd_low, Vth_high) versus % backoff (10, 20, 30)]

Backoff   Initial power (uW)   Final power, VVS (uW)   Final power, cutset enumeration (uW)   % Difference
1.2                   117.10                   91.70                                  91.70          0.00%
1.3                    95.70                   74.27                                  73.90          0.38%
1.4                    78.60                   57.84                                  56.90          1.20%
1.5                    72.60                   56.12                                  51.50          6.37%
1.6                    66.70                   48.40                                  46.80          2.40%

Outline
- Concurrent Vdd/Vth assignment and sizing algorithm
- Standby mode leakage reduction using state, Vth, and Tox assignment
- Runtime leakage reduction with bus encoding + novel Vth assignment strategies
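The slides do not say how the "% Difference" column in the cutset-enumeration comparison is normalized. The listed values appear consistent with the gap between VVS and the enumerated optimum divided by the initial power; the sketch below checks that reading against the table rows. The normalization is an inference from the numbers, not something stated in the source.

```python
# (backoff, initial uW, VVS uW, cutset-enumeration uW, listed % difference)
rows = [
    (1.2, 117.10, 91.70, 91.70, 0.00),
    (1.3, 95.70, 74.27, 73.90, 0.38),
    (1.4, 78.60, 57.84, 56.90, 1.20),
    (1.5, 72.60, 56.12, 51.50, 6.37),
    (1.6, 66.70, 48.40, 46.80, 2.40),
]


def gap_percent(initial: float, vvs: float, optimal: float) -> float:
    """Gap between VVS and the optimum, as a percentage of initial power."""
    return 100.0 * (vvs - optimal) / initial


gaps = [gap_percent(initial, vvs, opt) for _, initial, vvs, opt, _ in rows]
for (backoff, *_rest, listed), gap in zip(rows, gaps):
    print(f"backoff {backoff}: computed {gap:.2f}%, listed {listed:.2f}%")
```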

Leakage Current Components
- Subthreshold leakage (I_sub)
  - Dominant when device is OFF
  - Enhanced by reduced V_t from process scaling
- Gate tunneling leakage (I_gate)
  - Due to aggressive scaling of gate oxide thickness (T_ox)
  - A super-exponential function of T_ox
  - Comparable to I_sub in 90nm technologies
- [Figure: subthreshold current and effective gate tunneling current (uA/um, log scale) versus year (1990 to 2020) and technology node (350 nm down to 25 nm); annotation: high-k dielectrics expected to reach mainstream]

Low Power Standby Mode
- Previous approaches to put a circuit into standby mode
  - State assignment [Halter, CICC1997]
  - Multi-threshold CMOS (MTCMOS) [Mutoh, JSSC1995]
  - Dual-V_t assignment [Wei, DAC1998]
  - Simultaneous state and V_t assignment [Lee, DAC2003]
  - Only for subthreshold leakage reduction
- Proposed work
  - Leakage current reduction in standby mode
  - Minimize both I_sub and I_gate
  - Simultaneous state, V_t and T_ox assignment
- Gate leakage for PMOS
  - One order of magnitude smaller than NMOS
  - PMOS I_gate is considered negligible in current analysis

Introduction: Dual V_t and Dual T_ox
- Exploit dual oxide thickness technologies (becoming available)
  - Dual T_ox for I_gate minimization
  - Dual V_t for I_sub minimization

Assignment                      Normalized values
V_t     Oxide thickness         Leakage   Delay
Low     Thin                       1.00    1.00
High    Thin                       0.31    1.33
Low     Thick                      0.51    1.26
High    Thick                      0.05    1.69

- T_ox ~ 3A, V_t ~ 120mV, I_gate/I_leak = 36%
- Both high V_t and thick T_ox: very large performance impact

Overview of Approach
- If input state is unknown
  - Cannot be predicted which transistors will be ON or OFF
  - Some transistors must be assigned to both high-V_t and thick oxide
- Given a known input state
  - OFF device: I_gate is small
    - Considered only for high-V_t
  - ON device: no impact on I_sub
    - Only needs to be considered for thick T_ox
  - A transistor need not be assigned to both high-V_t and thick T_ox
    - Significantly improved leakage/delay trade-off
  - Only a subset of transistors need to be considered for high-V_t or thick T_ox
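The reasoning in "Overview of Approach" can be written as a small decision rule: once the input state is known, an OFF transistor is a candidate only for high V_t, and an ON transistor only for thick T_ox. The sketch below applies it to a 2-input NAND. The function and transistor names are hypothetical, and treating an ON PMOS as needing no change follows the slide's statement that PMOS I_gate is considered negligible.

```python
from typing import Dict


def leakage_candidate(kind: str, is_on: bool) -> str:
    """Which knob is worth considering for one transistor in a known state."""
    if not is_on:
        return "high_vt"    # OFF device: subthreshold leakage dominates
    if kind == "nmos":
        return "thick_tox"  # ON NMOS: gate tunneling dominates
    return "none"           # ON PMOS: gate leakage treated as negligible


def nand2_candidates(a: int, b: int) -> Dict[str, str]:
    """Candidates for the four transistors of a NAND2 under inputs (a, b)."""
    inputs = {"a": a, "b": b}
    result = {}
    for name, value in inputs.items():
        result[f"nmos_{name}"] = leakage_candidate("nmos", is_on=value == 1)
        result[f"pmos_{name}"] = leakage_candidate("pmos", is_on=value == 0)
    return result


candidates = nand2_candidates(a=1, b=0)
```

No transistor ends up needing both high V_t and thick T_ox, which is the source of the improved leakage/delay trade-off described above.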

Exploit Input Pin Re-ordering
- I_gate dependence on input pin ordering [Lee, DAC2003]
  - I_gate depends strongly on the position of ON/OFF transistors
  - Place the OFF transistor at the bottom of the stack
- Reduce performance penalty of thick-oxide transistors

Cell Library Options
- Library options
  - Trade-off points for a given gate: 4 vs. 2
    - Details in the paper (DATE04)
  - V_t or T_ox assignment control in a stack: individual-based vs. stack-based
    - Both libraries have the same number of cells
- Resulting option grid: stack control (individual or uniform) x number of trade-off points (2 or 4)
- Design rule constraint for different V_t and T_ox assignment

Heuristics
- Exact solution has a search space of size $2^{n+2m}$ (where n is # of PIs and m is # of gates)
- Branch and bound approach used
- Heuristic 1
  - Both state & gate tree: only one downward traversal
  - Gate tree: pre-sorted by leakage
  - Tends to produce a fast, high quality solution
- Heuristic 2
  - Gate tree: only one downward traversal
  - State tree: search w/ time limit
- Results indicate
  - Heuristic 1: fast runtime
  - Heuristic 2: better results

Results: leakage current comparison between heuristics
- 5% of maximum delay penalty
  - Delay penalty scale: 0% is the delay with all low V_t & thin T_ox; 100% is the delay with all high V_t & thick T_ox (marked points: 0%, 5%, 10%, 25%, 100%)
- Baseline (all low V_t & thin T_ox) is the average of 10K random input vectors
- Current: (uA), time: (sec)

          All low V_t         Heu1                    Heu2
Circuit   & thin T_ox   I_leak     X   Time    I_leak     X   Time
c432             24.5      6.9   3.6      2       3.8   6.5   1800
c499             65.8     24.8   2.7      6      23.4   2.8   1800
c880             50.1      8.7   5.7      7       7.7   6.5   1800
c1355            70.8     15.4   4.6      6      13.1   5.4   1800
c1908            56.7     14.7   3.9      4      13.5   4.2   1800
c2670           104.7     14.7   7.1     75      12.3   8.5   1800
c3540           128.5     21.6   6.0     17      19.9   6.5   1800
c5315           221.2     31.1   7.1    200      30.5   7.3   1800
c6288           346.8    114.7   3.0     63     107.5   3.2   1800
c7552           270.0     32.6   8.3    393      31.3   8.6   1800
alu64           260.0     42.2   6.2    455      40.4   6.4   1800
AVG                              5.3                    6.0
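The size of the exact search space quoted on the Heuristics slide, 2^(n+2m), can be reproduced by brute-force counting on a tiny instance. The sketch assumes one reading of the exponent that the slides do not spell out: each primary input has 2 states and each gate has 4 library options (2 bits). The instance size is hypothetical.

```python
from itertools import product


def search_space_size(num_inputs: int, num_gates: int) -> int:
    """Closed-form size of the exact search space, 2^(n + 2m)."""
    return 2 ** (num_inputs + 2 * num_gates)


def enumerate_assignments(num_inputs: int, num_gates: int):
    """Yield every (input state, per-gate option) pair.

    Assumes 2 values per primary input and 4 library options per gate.
    """
    states = product((0, 1), repeat=num_inputs)
    gate_options = product(range(4), repeat=num_gates)
    return product(states, gate_options)


# Hypothetical toy circuit: 4 primary inputs, 3 gates.
counted = sum(1 for _ in enumerate_assignments(4, 3))
print(counted, search_space_size(4, 3))
```

Even this toy case has 1024 combinations, which is why the slides turn to branch and bound with heuristics for real benchmarks.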

Results: leakage current comparison vs. previous work
- At 25% delay penalty

          All low V_t    V_t & State     V_t, T_ox & State
Circuit   & thin T_ox   I_leak     X       I_leak     X
c432             24.5      8.2   3.0          2.7   9.2
c499             65.8     23.8   2.8          7.5   8.8
c880             50.1     16.2   3.1          7.0   7.1
c1355            70.8     23.9   3.0          7.6   9.3
c1908            56.7     18.2   3.1          6.2   9.2
c2670           104.7     30.0   3.5         11.3   9.2
c3540           128.5     40.3   3.2         13.7   9.4
c5315           221.2     70.6   3.1         24.1   9.2
c6288           346.8      112   3.1         36.8   9.4
c7552           270.0     84.2   3.2         28.3   9.5
alu64           260.0     75.3   3.5         28.0   9.3
AVG                              3.1                9.1

Results: leakage current comparison between cell library options
- At 5% delay constraint

          All low V_t   4-option indiv.   2-option indiv.   4-option uniform   2-option uniform
Circuit   & thin T_ox   I_leak      X     I_leak      X     I_leak       X     I_leak       X
c432             24.5      6.9    3.6        7.5    3.3        6.7     3.7        7.8     3.1
c499             65.8     24.8    2.7       27.6    2.4       26.2     2.5       28.6     2.3
c880             50.1      8.7    5.7        9.0    5.6        9.4     5.3       10.3     4.8
c1355            70.8     15.4    4.6       17.0    4.2       22.4     3.2       23.8     3.0
c1908            56.7     14.7    3.9       15.2    3.7       15.2     3.7       15.8     3.6
c2670           104.7     14.7    7.1       12.2    8.6       16.2     6.5       14.8     7.1
c3540           128.5     21.6    6.0       23.9    5.4       25.2     5.1       24.7     5.2
c5315           221.2     31.1    7.1       30.7    7.2       32.1     6.9       33.0     6.7
c6288           346.8    114.7    3.0      120.6    2.9      134.0     2.6      149.6     2.3
c7552           270.0     32.6    8.3       31.2    8.7       32.0     8.4       30.6     8.8
alu64           260.0     42.2    6.2       42.3    6.2       42.8     6.1       46.9     5.5
AVG                              5.28              5.27               4.91               4.77
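The X columns in these results are not defined on the slides. The numbers are consistent with X being the baseline leakage (all low V_t and thin T_ox) divided by the optimized leakage; the sketch below checks that assumption on three rows of the "V_t & State" data. The variable and function names are hypothetical.

```python
# Baseline leakage and "V_t & State" leakage for three benchmark circuits.
baseline = {"c432": 24.5, "c499": 65.8, "c880": 50.1}
vt_and_state = {"c432": 8.2, "c499": 23.8, "c880": 16.2}


def reduction_factor(baseline_leak: float, optimized_leak: float) -> float:
    """Leakage reduction factor: how many times smaller the leakage became."""
    return baseline_leak / optimized_leak


factors = {
    name: round(reduction_factor(baseline[name], vt_and_state[name]), 1)
    for name in baseline
}
print(factors)
```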

Outline
- Concurrent Vdd/Vth assignment and sizing algorithm
- Standby mode leakage reduction using state, Vth, and Tox assignment
- Runtime leakage reduction with bus encoding + novel Vth assignment strategies

Runtime leakage in buses
- 50% of total chip leakage in inverters/buffers
- Much of this in repeaters, which are:
  - Very wide
  - Growing in #
  - Heavily speed constrained, so often use low Vth
  - Do not experience stack effect as multi-input gates do
- Standby leakage reduction is relatively easy compared to runtime
  - We can absorb a delay penalty when we know that no new data is coming
  - In runtime, data can come at any time; must be ready to process as fast as possible
- What can we do besides dual-Vth?

Staggered Vth bus design
- [Figure: normalized dynamic energy versus normalized delay (1.00 to 2.00) for HVT, SVT, and LVT buses, with a 60% annotation]
- Selective use of high-Vth devices yields the possibility of low leakage in runtime
  - Stagger them along the wire to create a very low leakage state
  - Delay (or dynamic energy) penalty is much lower than all high-Vth
- We cannot dictate state in runtime, so this does not help in general

Unless we can dictate state
- Encoding to enforce proper state
  - Choose a 3-to-4 encoding, also eliminate worst-case crosstalk
- Exact encoding selected to minimize total power
  - Requires anticipated state and transition probabilities
  - Ex: what is the most common state, what is the most common transition
  - Also consider the encode/decode logic complexity

Reducing encoding complexity
- [Figure: # of logic gates and normalized total power versus % tolerance (0 to 20), with the optimal tolerance marked]
- We consider all possible encodings (mappings from input states to actual transmitted encoded states) within T% of minimal
- Then use logic complexity as tiebreaker
- Results in 1-2% power penalty with 13% fewer gates/area overhead

Results (includes delay overhead)
- [Figure: normalized total power (static and dynamic) for LVT bus vs. SVT bus across benchmarks bzip2, crafty, eon, gap, gcc, gzip, mcf, parser, twolf, test_1]
  - 0.13um CMOS at 105C, 64-bit Alpha architecture running 9 applications (address bus)
  - 26% total power savings on average, 42% leakage reduction
  - Maximal switching activity case (Test_1): total power still reduced
- [Figure: normalized total power (static and dynamic) for crosstalk aware only vs. crosstalk and leakage aware encodings across benchmarks bzip2, crafty, eon, gap, gcc, gzip, mcf, parser, twolf]
  - Compared to previous crosstalk-aware approaches, we save 54% total power (nearly all of it in leakage)
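The tolerance-based selection described under "Reducing encoding complexity" (keep every encoding within T% of the minimum total power, then break ties by logic complexity) can be sketched as below. The candidate names, power values, and gate counts are hypothetical placeholders, not data from the slides.

```python
from typing import List, Tuple

# (encoding name, normalized total power, number of logic gates)
Candidate = Tuple[str, float, int]


def pick_encoding(candidates: List[Candidate], tolerance_pct: float) -> Candidate:
    """Pick the simplest encoding whose power is within T% of the minimum."""
    min_power = min(power for _, power, _ in candidates)
    limit = min_power * (1.0 + tolerance_pct / 100.0)
    eligible = [c for c in candidates if c[1] <= limit]
    # Fewest gates wins; remaining ties go to the lower-power encoding.
    return min(eligible, key=lambda c: (c[2], c[1]))


candidates = [
    ("enc_a", 1.000, 16),  # minimum power, most gates
    ("enc_b", 1.015, 14),  # 1.5% more power, fewer gates
    ("enc_c", 1.080, 10),  # 8% more power, fewest gates
]

strict = pick_encoding(candidates, tolerance_pct=0.0)
relaxed = pick_encoding(candidates, tolerance_pct=2.0)
```

With zero tolerance only the minimum-power encoding qualifies; a 2% tolerance admits the slightly costlier but simpler encoding, which mirrors the small power penalty for fewer gates reported on the slide.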

Alternate Repeater Vth Assignments
- Other possibilities of Vth assignment in repeaters can help reduce leakage in runtime
- Separate NMOS/PMOS Vth (SPNVt)
  - All PMOS are low-Vth, all NMOS are high-Vth
  - Advantages: predictable leakage (state independent), balances fast/slow paths through the repeater chain, easy to manufacture
- Mixed Vth
  - Wide devices such as in repeaters are split into parallel fingers, separated by a contacted pitch
  - Assign a fraction, α, of total width to low-Vth (1 - α is then high-Vth)
  - Effectively a third Vth with speed and leakage behavior intermediate to high/low Vth
  - No manufacturing costs for this 3rd Vth, no area penalties since parallel fingers are spaced out significantly already

Vth assignment scheme results
- [Figure: normalized dynamic energy (1.0 to 1.6) versus normalized delay (0.6 to 1.0) for SVt, LVt, SPNVt, and Mixed (α) Vt]
- Hybrid approaches are possible; upper bits in 64-bit address buses are usually zeroes
  - Stagger to favor low-leakage 0s
- Mixed config: α = 0.3
- Achievable speed is best for mixed, also good for SPNVt
- Runtime leakage of α = 0.3 is 54% lower than low-Vth with small dynamic energy penalty
- Total average power reduction is 14%
  - Switching behavior taken from 11 benchmark applications, address bus
  - Strongly depends on ratio of static to dynamic power
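A first-order way to think about the mixed-Vth repeater is a width-weighted blend of the two device types. The sketch below assumes leakage scales linearly with the width assigned to each Vth, which is a simplification not stated in the slides, and the per-width leakage values in the example are hypothetical.

```python
def mixed_vth_leakage(
    alpha: float,
    leak_low_vth: float,
    leak_high_vth: float,
) -> float:
    """Leakage of a device with fraction alpha of its width at low Vth.

    leak_low_vth / leak_high_vth are the leakages the full-width device
    would have if built entirely from low-Vth or high-Vth fingers.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * leak_low_vth + (1.0 - alpha) * leak_high_vth


# Hypothetical numbers: high-Vth fingers leak 5x less than low-Vth ones.
leak = mixed_vth_leakage(alpha=0.3, leak_low_vth=1.0, leak_high_vth=0.2)
print(f"leakage relative to all low-Vth: {leak:.2f}")
```

Sweeping alpha between 0 and 1 moves the device continuously between the all-high-Vth and all-low-Vth corners, which is the "effectively a third Vth" behavior the slide describes.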

Conclusions
- Need to leverage multi-everything to address the power management gap
  - EDA must enable simultaneous sizing, Vdd, and Vth assignment; the 3 major knobs in power reduction
  - Total power reductions on the order of 35-60% are achievable
- Standby mode leakage can be effectively reduced by combining state assignment with Vth and Tox assignment
  - Sizable leakage reductions (5-9X) with modest delay penalties (3-15% vs. all low Vt and thin Tox)
  - Much less overhead than MTCMOS, body biasing
- Runtime leakage in global interconnect repeaters can be addressed using Vth assignment schemes (sometimes with encoding)
  - 40-54% leakage reductions with small dynamic power penalty
  - Total power savings depends heavily on static/dynamic ratio
    - Implies these techniques improve with scaling
  - Mixed Vth provides pseudo-continuous Vth assignment, opening up a range of new optimizations in the energy/delay design space