Topics: Low Power Techniques
Based on Penn State CSE477 Lecture Notes, 2002, M.J. Irwin, and adapted from Digital Integrated Circuits, 2002, J. Rabaey

Review: Energy & Power Equations
$E = C_L V_{DD}^2 P_{0\to1} + t_{sc} V_{DD} I_{peak} P_{0\to1} + V_{DD} I_{leakage}$
$f_{0\to1} = P_{0\to1} \cdot f_{clock}$
$P = C_L V_{DD}^2 f_{0\to1} + t_{sc} V_{DD} I_{peak} f_{0\to1} + V_{DD} I_{leakage}$
- First term: dynamic power (~90% today and decreasing relatively)
- Second term: short-circuit power (~8% today and decreasing absolutely)
- Third term: leakage power (~2% today and increasing)
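As a rough numeric illustration of the power equation, the sketch below evaluates its three terms separately for a single lumped node. Every parameter value is hypothetical, chosen only so that the split lands near the 90/8/2 proportions quoted on the slide; none of the numbers come from the lecture.

```python
def power_components(c_load, v_dd, f_01, t_sc, i_peak, i_leak):
    """Return (dynamic, short_circuit, leakage) power in watts."""
    dynamic = c_load * v_dd ** 2 * f_01          # C_L * V_DD^2 * f_0->1
    short_circuit = t_sc * v_dd * i_peak * f_01  # t_sc * V_DD * I_peak * f_0->1
    leakage = v_dd * i_leak                      # V_DD * I_leakage
    return dynamic, short_circuit, leakage


# Hypothetical operating point (assumed values, not from the slides).
p_01 = 0.25              # probability of a 0->1 transition per clock cycle
f_clock = 100e6          # clock frequency in Hz
f_01 = p_01 * f_clock    # f_0->1 = P_0->1 * f_clock

dynamic, short_circuit, leakage = power_components(
    c_load=10e-12,   # 10 pF lumped load
    v_dd=2.5,        # supply voltage in volts
    f_01=f_01,
    t_sc=100e-12,    # short-circuit conduction time in seconds
    i_peak=22e-3,    # peak short-circuit current in amperes
    i_leak=14e-6,    # leakage current in amperes
)
total = dynamic + short_circuit + leakage

for name, value in (("dynamic", dynamic),
                    ("short-circuit", short_circuit),
                    ("leakage", leakage)):
    print(f"{name:>13}: {value * 1e3:.4f} mW ({100 * value / total:.1f}%)")
```

Only the dynamic term depends on $V_{DD}$ quadratically, which is why supply scaling appears so often in the rest of these notes.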

Power and Energy Design Space
Constant throughput/latency, design time:
- Active energy: logic design, reduced V_DD, sizing, multi-V_DD
- Leakage energy: + multi-V_T
Constant throughput/latency, non-active modules:
- Active energy: clock gating
- Leakage energy: sleep transistors, multi-V_DD, variable V_T
Variable throughput/latency, run time:
- Active energy: DFS, DVS (dynamic frequency, voltage scaling)
- Leakage energy: + variable V_T

Bus Multiplexing
- Buses are a significant source of power dissipation due to high switching activities and large capacitive loading
  - 15% of total power in the Alpha 21064
  - 30% of total power in the Intel 80386
- Share long data buses with time multiplexing (S1 uses even cycles, S2 odd)
[Figure: sources S1 and S2 and destinations D1 and D2, connected by dedicated buses and by one shared bus]
- But what if data samples are correlated (e.g., sign bits)?

Correlated Data Streams
[Figure: bit switching probabilities (0 to 1) versus bit position (MSB 14 down to LSB 0) for muxed and dedicated buses]
- For a shared (multiplexed) bus the advantages of data correlation are lost (the bus carries samples from two uncorrelated data streams)
- Bus sharing should not be used for positively correlated data streams
- Bus sharing may prove advantageous in a negatively correlated data stream (where successive samples switch sign bits), since a dedicated bus would already see more random switching
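The loss of correlation on a shared bus can be seen with a small counting experiment. The sketch below is illustrative only: the two 16-bit two's-complement streams (one slowly rising positive ramp, one slowly falling negative ramp) and the helper `toggles` are assumptions made up for this example, not data from the lecture.

```python
WIDTH = 16  # assumed bus width in bits


def toggles(words, width=WIDTH):
    """Count bit transitions between successive words driven on one bus."""
    mask = (1 << width) - 1
    total = 0
    for prev, cur in zip(words, words[1:]):
        # XOR marks the bits that differ; the mask keeps the low `width`
        # bits of Python's two's-complement representation.
        total += bin((prev ^ cur) & mask).count("1")
    return total


# Two hypothetical streams whose successive samples are positively correlated.
s1 = [100 + i for i in range(64)]    # small positive values, sign bits all 0
s2 = [-100 - i for i in range(64)]   # small negative values, sign bits all 1

# Dedicated buses: each stream keeps its own wires.
dedicated = toggles(s1) + toggles(s2)

# Shared bus: S1 uses even cycles and S2 uses odd cycles.
muxed_words = [w for pair in zip(s1, s2) for w in pair]
muxed = toggles(muxed_words)

print("dedicated toggles:", dedicated)
print("muxed toggles:    ", muxed)
```

On the dedicated buses only a few low-order bits change per sample, while on the shared bus the sign-extension bits flip on nearly every cycle.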

Glitch Reduction by Pipelining
- Glitches depend on the logic depth of the circuit; gates deeper in the logic network are more prone to glitching
  - arrival times of the gate inputs are more spread due to delay imbalances
  - usually affected more by primary input switching
- Reduce logic depth by adding pipeline registers
  - additional energy is used by the clock and pipeline registers
[Figure: five-stage pipeline (Fetch, Decode, Execute, Memory, WriteBack) with PC, Instruction, MAR, I$, D$, MDR, and clocked pipeline stage isolation registers]

Clock Gating
- Most popular method for power reduction of clock signals and functional units
- Gate off the clock to idle functional units, e.g., floating point units
  - need logic to generate the disable signal
    - increases complexity of control logic
    - consumes power
    - timing is critical to avoid clock glitches at the OR gate output
  - additional gate delay on the clock signal
    - the gating OR gate can replace a buffer in the clock distribution tree
[Figure: clock and disable signals feeding an OR gate that clocks the register (Reg) in front of a functional unit]
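A minimal behavioural sketch of OR-gate clock gating follows. It samples the signals once per clock half-cycle, so it shows how many rising edges reach the register but does not model glitches or gate delay. The waveforms and helper names are hypothetical; the disable signal is deliberately changed only while the clock is high, which is when the OR output is already 1.

```python
def gated_clock(clk, disable):
    """OR-gate gating: the output is held high whenever disable is 1."""
    return [c | d for c, d in zip(clk, disable)]


def rising_edges(wave):
    """Count 0 -> 1 transitions, i.e. edges that would clock a register."""
    return sum(1 for a, b in zip(wave, wave[1:]) if a == 0 and b == 1)


# Eight clock cycles, sampled at half-cycle granularity (low phase, high phase).
clk = [0, 1] * 8

# Hypothetical disable signal: asserted at index 3 and released at index 11,
# both of which are high phases of clk.
disable = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

gated = gated_clock(clk, disable)

print("edges without gating:", rising_edges(clk))
print("edges with gating:   ", rising_edges(gated))
```

Each suppressed edge saves the switching energy of the register's clock load and of any logic that would otherwise see new register outputs.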

Clock Gating in a Pipelined Datapath
- For idle units (e.g., floating point units in the Exec stage, the WB stage for instructions with no write back operation)
[Figure: five-stage pipeline (Fetch, Decode, Execute, Memory, WriteBack) with PC, Instruction, MAR, I$, D$, MDR, and clk gated by "No FP" and "No WB" signals]

Review: Dynamic Power as a Function of V_DD
- Decreasing V_DD decreases dynamic energy consumption (quadratically)
- But it increases gate delay (decreases performance)
[Figure: normalized t_p (1 to 5.5) versus V_DD (0.8 V to 2.4 V)]
- Determine the critical path(s) at design time and use high V_DD for the transistors on those paths for speed. Use a lower V_DD on the other logic to reduce dynamic energy consumption.
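To get a feel for the quadratic dependence, the sketch below estimates the dynamic energy of a dual-V_DD design relative to running everything at the high supply. It assumes energy scales as C * V_DD^2 and ignores level converters and any change in switching activity. The capacitance split and both voltages are hypothetical values.

```python
def dual_vdd_energy_ratio(critical_fraction, v_high, v_low):
    """Dynamic energy of a dual-V_DD design relative to an all-high-V_DD design.

    critical_fraction is the share of switched capacitance that stays on
    the high supply; the remainder is moved to the low supply.
    """
    low_fraction = 1.0 - critical_fraction
    return critical_fraction + low_fraction * (v_low / v_high) ** 2


# Hypothetical design: 30% of the switched capacitance is on critical paths.
ratio = dual_vdd_energy_ratio(critical_fraction=0.3, v_high=2.5, v_low=1.8)
print(f"relative dynamic energy: {ratio:.3f}")
```

In this made-up case roughly a third of the dynamic energy is saved without touching the critical paths.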

Dynamic Frequency and Voltage Scaling
- Intel's SpeedStep
  - Hardware that steps down the clock frequency (dynamic frequency scaling, DFS) when the user unplugs from AC power
  - PLL steps from 650 MHz to 500 MHz
  - CPU stalls during SpeedStep adjustment

Dynamic Frequency and Voltage Scaling
- Transmeta LongRun
  - Hardware that applies both DFS and DVS (dynamic supply voltage scaling)
    - 32 levels of V_DD from 1.1 V to 1.6 V
    - PLL from 200 MHz to 700 MHz in increments of 33 MHz
  - Triggered when a CPU load change is detected by software
    - heavier load: ramp up V_DD and, when it is stable, speed up the clock
    - lighter load: slow down the clock and, when the PLL locks onto the new rate, ramp down V_DD
  - CPU stalls only during PLL relock (< 20 microseconds)
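The ordering rule on the LongRun slide (raise the voltage before the clock, lower the clock before the voltage) can be sketched as a small planner. This is an illustration of the sequencing only: the step names and the frequency/voltage pairing used in the example are hypothetical and do not describe Transmeta's actual interface or operating points.

```python
def transition_steps(cur_mhz, new_mhz, new_vdd):
    """Return the ordered steps for a combined frequency/voltage change."""
    if new_mhz > cur_mhz:
        # Heavier load: the supply must be high enough before the clock speeds up.
        return [("set_vdd", new_vdd),
                ("wait_vdd_stable", None),
                ("set_clock_mhz", new_mhz)]
    if new_mhz < cur_mhz:
        # Lighter load: slow the clock first, then it is safe to lower the supply.
        return [("set_clock_mhz", new_mhz),
                ("wait_pll_lock", None),
                ("set_vdd", new_vdd)]
    return []  # no frequency change requested


# Hypothetical operating points within the ranges quoted on the slide.
speed_up = transition_steps(cur_mhz=300, new_mhz=600, new_vdd=1.5)
slow_down = transition_steps(cur_mhz=600, new_mhz=300, new_vdd=1.2)

print("speed up: ", [name for name, _ in speed_up])
print("slow down:", [name for name, _ in slow_down])
```

In both directions the circuit never runs at a clock rate that the present supply voltage cannot sustain.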

Dynamic Thermal Management (DTM)
- Trigger Mechanism: When do we enable DTM techniques?
- Initiation Mechanism: How do we enable the technique?
- Response Mechanism: What technique do we enable?

DTM Trigger Mechanisms
- Mechanism: How to deduce temperature?
  - Direct approach: on-chip temperature sensors
    - Based on differential voltage change across 2 diodes of different sizes
    - May require more than one sensor
    - Hysteresis and delay are problems
- Policy: When to begin responding?
  - Trigger level set too high means higher packaging costs
  - Trigger level set too low means frequent triggering and loss in performance
  - Choose the trigger level to exploit the difference between average and worst case power

DTM Initiation and Response Mechanisms
- Operating system or microarchitectural control?
  - Hardware support can reduce the performance penalty by 20-30%
- Initiation of policy incurs some delay
  - When using DVS and/or DFS, much of the performance penalty can be attributed to enabling/disabling overhead
  - Increasing policy delay reduces overhead; smarter initiation techniques would help as well
- Thermal window (100K cycles+)
  - Larger thermal windows smooth short thermal spikes

DTM Activation and Deactivation Cycle
[Figure: timeline with the events Trigger Reached, Turn Response On, Check Temp, Check Temp, Turn Response Off, separated by the intervals Initiation Delay, Response Delay, Policy Delay, Shutoff Delay]
- Initiation Delay: OS interrupt/handler
- Response Delay: invocation time (e.g., adjust clock)
- Policy Delay: number of cycles engaged
- Shutoff Delay: disabling time (e.g., re-adjust clock)
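The activation and deactivation cycle can be sketched as a small state machine over temperature samples. This is a simplified model under stated assumptions: initiation, response, and shutoff delays are treated as zero, the policy delay is counted in samples rather than cycles, and the temperature trace and trigger level are made-up numbers.

```python
def dtm_trace(temps, trigger, policy_delay):
    """Return one flag per sample: True while the DTM response is engaged.

    The response turns on when a sample reaches the trigger level, stays on
    for at least `policy_delay` samples, and turns off at the first
    temperature check after that which reads below the trigger.
    """
    engaged = False
    remaining = 0
    flags = []
    for temp in temps:
        if not engaged and temp >= trigger:
            engaged = True              # trigger reached: turn response on
            remaining = policy_delay
        elif engaged:
            if remaining > 0:
                remaining -= 1          # policy delay still running
            if remaining == 0 and temp < trigger:
                engaged = False         # check temp: cool enough, turn off
        flags.append(engaged)
    return flags


# Hypothetical temperature samples in degrees C and a hypothetical trigger.
temps = [80, 84, 86, 83, 82, 81, 80, 86, 79]
flags = dtm_trace(temps, trigger=85, policy_delay=3)
print(flags)
```

A longer policy delay means fewer on/off transitions for the same trace, which matches the slide's point that increasing the policy delay reduces enabling/disabling overhead.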

DTM Savings Benefits
[Figure: temperature versus time, marking the cooling capacity designed for without DTM, the cooling capacity designed for with DTM, the DTM trigger level, the resulting system cost savings, and the intervals with DTM disabled and with DTM/response engaged]

Speculated Power of a 15 mm µP
[Figure: four panels of active and leakage power (Watts, 0 to 70) versus temperature (30 to 110 C) for a 15 mm die. Percentage labels in each panel, from lowest to highest temperature:]
- 0.25µ, 2 V: 0%, 0%, 0%, 0%, 1%, 1%, 1%, 2%, 3%
- 0.18µ, 1.4 V: 0%, 0%, 1%, 1%, 2%, 3%, 5%, 7%, 9%
- 0.13µ, 1 V: 1%, 2%, 3%, 5%, 8%, 11%, 15%, 20%, 26%
- 0.1µ, 0.7 V: 6%, 9%, 14%, 19%, 26%, 33%, 41%, 49%, 56%

Review: Leakage as a Function of Design Time V_T
- Reducing V_T increases the subthreshold leakage current (exponentially)
- But reducing V_T decreases gate delay (increases performance)
[Figure: I_D (A) versus V_GS (0 V to 1 V) for V_T = 0.4 V and V_T = 0.1 V]
- Determine the critical path(s) at design time and use low V_T devices on the transistors on those paths for speed. Use a high V_T on the other logic for leakage control.
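The exponential dependence can be made concrete with the usual subthreshold approximation, in which the off current at V_GS = 0 scales as 10^(-V_T / S), where S is the subthreshold slope. The sketch below compares the two thresholds from the plot. The slope of 100 mV per decade is an assumed, typical-order value and is not given on the slide.

```python
def leakage_ratio(v_t_low, v_t_high, slope_v_per_decade):
    """Ratio of subthreshold leakage at v_t_low to leakage at v_t_high.

    Assumes I_off is proportional to 10 ** (-V_T / S) with everything else
    (device size, temperature, supply) held equal.
    """
    return 10 ** ((v_t_high - v_t_low) / slope_v_per_decade)


# Thresholds from the slide's plot; the slope S is an assumption.
ratio = leakage_ratio(v_t_low=0.1, v_t_high=0.4, slope_v_per_decade=0.1)
print(f"leakage at V_T = 0.1 V is about {ratio:.0f}x the leakage at V_T = 0.4 V")
```

Under this assumption every 100 mV of threshold reduction costs a decade of leakage, which is why low-V_T devices are reserved for critical paths.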

Review: Variable V_T (ABB) at Run Time
$V_T = V_{T0} + \gamma \left( \sqrt{|-2\phi_F + V_{SB}|} - \sqrt{|-2\phi_F|} \right)$
where
- $V_{T0}$ is the threshold voltage at $V_{SB} = 0$
- $V_{SB}$ is the source-bulk (substrate) voltage
- $\gamma$ is the body-effect coefficient
- For an n-channel device, the substrate is normally tied to ground
- A negative bias causes V_T to increase from 0.45 V to 0.85 V
- Adjusting the substrate bias at run time is called adaptive body-biasing (ABB)
[Figure: V_T (0.4 V to 0.9 V) versus V_SB (-2.5 V to 0 V)]
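The body-effect relation can be checked numerically against the 0.45 V to 0.85 V range quoted on the slide. The sketch below uses the standard textbook form with absolute values under the square roots. The parameter values (gamma = 0.4 V^0.5 and phi_F = -0.3 V) are assumed typical values rather than numbers from the lecture, and V_SB = 2.5 V is taken to mean the substrate sits 2.5 V below the source.

```python
import math


def threshold_voltage(v_t0, gamma, phi_f, v_sb):
    """V_T = V_T0 + gamma * (sqrt(|-2*phi_F + V_SB|) - sqrt(|-2*phi_F|))."""
    return v_t0 + gamma * (math.sqrt(abs(-2 * phi_f + v_sb))
                           - math.sqrt(abs(-2 * phi_f)))


V_T0 = 0.45    # threshold at V_SB = 0, from the slide
GAMMA = 0.4    # body-effect coefficient in V^0.5 (assumed)
PHI_F = -0.3   # Fermi potential in volts for an n-channel device (assumed)

vt_no_bias = threshold_voltage(V_T0, GAMMA, PHI_F, v_sb=0.0)
vt_biased = threshold_voltage(V_T0, GAMMA, PHI_F, v_sb=2.5)

print(f"V_T with substrate at source potential: {vt_no_bias:.3f} V")
print(f"V_T with substrate 2.5 V below source:  {vt_biased:.3f} V")
```

With these assumed parameters the biased threshold comes out close to the 0.85 V shown at the left edge of the plot.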

Next class
- Testing and Verification
- Exam April 12th
- No lab tomorrow
- Work on final project