CMOS Process Variations: A Critical Operation Point Hypothesis


Janak H. Patel
Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
jhpatel@uiuc.edu

Computer Systems Colloquium (EE380)
Stanford University
April 2, 2008

© 2008 Janak H. Patel

Outline
- CMOS process variations
  - Current status
  - Future projections
- A new hypothesis on the Critical Operation Point
  - A thought experiment giving rise to the hypothesis
  - Two real experiments in support of the hypothesis
- Potential exploits of the new hypothesis
  - Power savings in large data centers

Process Variations
- Sources of variations
  - Gate oxide thickness (T_OX)
  - Random Doping Fluctuations (RDF)
  - Device geometry, lithography in the nanometer region
- Transistor threshold voltage (V_T)
  - Subthreshold current, leakage, power, frequency
- Range of variations
  - 100% V_T variation across a modern chip
  - 30% speed variation across a wafer
  - 100% leakage (static power) variation in a wafer

Static Variations Today (source: Shekhar Borkar, Intel)
[Figure]

FMAX Statistical Analysis
[Figure]
Source: K. A. Bowman, S. G. Duvall, and J. D. Meindl, "Impact of die-to-die and within-die parameter fluctuations on the maximum clock frequency distribution for gigascale integration," IEEE Journal of Solid-State Circuits, vol. 37, no. 2, pp. 183-190, Feb. 2002.

Process Variations and Slack Time
[Figure: flip-flops (FFs) holding the present state drive combinational logic; the clocked FFs at its outputs capture the next state. The time remaining after signal propagation is the slack time (guard band, safety margin).]
Slack time is reduced by:
- Process variations
- Clock frequency
- Supply voltage
- Ambient temperature
- Gate and pin switching rate
- Years of aging
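As a reminder of how these factors act on slack, one common textbook way to write the slack of a register-to-register path is sketched below. The symbols are assumptions for illustration and do not appear in the slides: T_clk is the clock period, t_clk-to-q the launch flip-flop delay, t_logic the combinational path delay, and t_setup the capture flip-flop setup time. Clock skew and jitter are ignored.

```latex
t_{\mathrm{slack}} = T_{\mathrm{clk}} - \left( t_{\mathrm{clk\to q}} + t_{\mathrm{logic}} + t_{\mathrm{setup}} \right),
\qquad T_{\mathrm{clk}} = \frac{1}{f_{\mathrm{clk}}}
```

Under this simplified model, raising the clock frequency shrinks T_clk, while lower supply voltage, aging, and unfavorable process variation increase t_logic. A timing error occurs on a path once its slack becomes negative.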

Errors and Process Variations
[Figure: errors per day/month (axis ticks 1, 2) versus reducing the process guard band (e.g. reducing slack time).]
Parameters:
- Clock frequency
- Supply voltage
- Ambient temperature
- Gate and pin switching rate
- Years of aging
- Process variations
Errors:
- All are timing errors
- No spontaneous bit flips

Protecting Against Process Variations
- If the error rate from added delays remains relatively small, we can use some of the established techniques:
  - iRoC, Razor, BISER, etc.
  - Error coding: parity codes, arithmetic codes, residue codes, parity prediction, algorithm-based fault tolerance, TMR, etc.
  - Time redundancy such as RESO
- What if the error rate is massive?
- Are massive errors possible in a good chip?

How many flip-flops on critical paths?
- Consider a 1 GHz chip with a million flip-flops (FFs)
- Divide the 1 ns clock period into 1000 bins
- Put an FF in bin p if the longest path at its input has a delay of p picoseconds
- How many FFs are in bins 900 ps to 950 ps?
[Figure: histogram of flip-flop count (0 to 200,000) versus path delay in picoseconds (0 to 1000 ps).]

How many flip-flops on critical paths? (continued)
[Figure: 1M FFs around the combinational logic (present state, signal propagations, next state, clock), with the 900 ps to 950 ps window marked on a 0 to 1000 ps path-delay axis.]
- 200,000? 100,000? 50,000?
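The binning question can be answered mechanically once per-flip-flop worst-case delays are available, for instance from a timing report. The following is a minimal sketch under that assumption; the function names and the six sample delays are made up for illustration and are not from the talk.

```python
from collections import Counter

def bin_flip_flops(worst_delays_ps):
    """Map each bin p (in ps) to the number of FFs whose longest input path has delay p."""
    return Counter(round(d) for d in worst_delays_ps)

def count_in_window(bins, lo_ps, hi_ps):
    """Count FFs whose bin lies in the inclusive window [lo_ps, hi_ps]."""
    return sum(n for p, n in bins.items() if lo_ps <= p <= hi_ps)

# Made-up worst-case delays for six flip-flops, in picoseconds (illustration only).
example_delays_ps = [120, 450, 905, 930, 950, 990]
bins = bin_flip_flops(example_delays_ps)
near_critical = count_in_window(bins, 900, 950)
print(near_critical)  # 3
```

With real data, the same count over a million flip-flops gives the number that the slide asks the reader to guess.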

A Thought Experiment
- Let us conservatively assume 100,000 FFs are on critical paths (10% of the total)
- Consider any of the following factors that reduce the slack time of these FFs:
  - Increase clock frequency (reduces cycle time)
  - Decrease supply voltage (increases gate delays)
  - Add years of aging (gates get slower with age)
  - Increase process variations (larger sigma)
- Assume just 10% of the critical FFs get their inputs late this cycle
- This implies 10,000 flip-flops produce errors in a single clock cycle!
- A massive number of errors results within a few clock cycles

Do Your Own Thought Experiment!
Example:
- Total number of flip-flops: 400,000
- Only 5% of these are on critical paths: 20,000 FFs
- Only 1% of these receive critical signals: 200 FFs
- In 10 consecutive clock cycles: 2000 errors!
Do your own thought experiment:
- Estimate the number of FFs on critical paths from a timing analysis or synthesis report
- Guesstimate the percentage of active signals
- How many errors in 10, 100 or 1000 consecutive clock cycles?
- Is there any scenario that doesn't lead to a catastrophic failure in an extremely short time?
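The arithmetic of both thought experiments can be reproduced with a few lines of Python. This is only a sketch: the function names are hypothetical, percentages are passed as whole numbers to keep the arithmetic exact, and it assumes the same fraction of critical flip-flops is late in every cycle.

```python
def late_flip_flops(total_ffs, critical_pct, late_pct):
    """FFs that capture a late input in one cycle, using whole-number percentages."""
    critical_ffs = total_ffs * critical_pct // 100
    return critical_ffs * late_pct // 100

def errors_over_cycles(total_ffs, critical_pct, late_pct, cycles):
    """Accumulated errors, assuming the same fraction is late in every cycle."""
    return late_flip_flops(total_ffs, critical_pct, late_pct) * cycles

print(late_flip_flops(1_000_000, 10, 10))     # 10000 errors in a single cycle
print(errors_over_cycles(400_000, 5, 1, 10))  # 2000 errors in 10 cycles
```

Substituting counts from a real timing or synthesis report for the three inputs gives the estimate that the slide invites the reader to make.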

A New Hypothesis
[Figure: two panels. OLD: errors per hour/day (axis ticks 1, 2). NEW: errors per cycle (axis ticks 10^4, 10^6). Horizontal axis in both panels: increase clock frequency, or decrease supply voltage, or increase ambient temperature, or increase process variations.]

Hypothesis of Critical Operation Point
In large CMOS circuits there exists a Critical Operating Frequency F_C and a Critical Voltage V_C for a fixed ambient temperature T, such that:
- Any frequency above F_C causes massive errors
- Any voltage below V_C causes massive errors
- At any frequency below F_C or voltage above V_C, no process-related errors occur
In practice, F_C and V_C are not single points, but are confined to an extremely narrow range for a given ambient temperature T_C.

F_C and V_C: Points or a Range?
- During a systematic search for the critical point, one will find a point at which the system crashes
- The critical point varies within a very narrow range from one experimental search to another, most likely due to temperature variations
- In practice it is impossible to control the junction temperature of each transistor to a precise value T_C
[Figure: errors per cycle (axis ticks 10^4, 10^6); outcome of two distinct experiments on the same chip.]

Experiments to Disprove the Hypothesis
- Subject a large chip to slowly increasing frequency or slowly decreasing supply voltage
- At each step, exercise the chip extensively and monitor continuously for any errors
- Two microprocessors were set up for detecting errors in the presence of reduced supply voltage:
  - PowerPC 750, 2.5 V, 233 MHz: C program to exercise the chip and monitor for errors
  - Pentium-M, 1.308 V, 2 GHz: third-party program to keep the CPU 100% busy and report errors (more like a power virus!)

Experiment to find which of these two?
[Figure: two panels, both against decreasing supply voltage. OLD: errors per day/month (axis ticks 1, 2). NEW: errors per cycle (axis ticks 10^4, 10^6).]

Experimental Setup
- A single-board computer with a PowerPC 750: 233 MHz, 2.5 V power supply
- A Hewlett-Packard E3631A power supply: digital control in steps of 10 millivolts
- A blow-drier to raise the ambient temperature
- A program written to stress all major functional blocks
  - Tried to maximize execution rate (load)
  - Tried to maximize logic switching rate
  - Every operation was checked against known good values, and any error was reported instantly

Stressing PowerPC 750 (233 MHz)

| Routine | Operations per loop | Number of loops | Total operations | Approx. running time | Approx. operations per second |
|---|---|---|---|---|---|
| Register Unit | 40 | 8,000,000 | 320,000,000 | 6.34 s | 50.47 x 10^6 |
| Instruction Fetch Unit | 32 | 8,000,000 | 256,000,000 | 92.04 s | 2.78 x 10^6 |
| Integer Addition | 40 | 8,000,000 | 320,000,000 | 9.35 s | 34.22 x 10^6 |
| Integer Subtraction | 40 | 8,000,000 | 320,000,000 | 9.12 s | 35.09 x 10^6 |
| Integer Multiplication | 58 | 8,000,000 | 464,000,000 | 18.21 s | 25.48 x 10^6 |
| Integer Division | 50 | 8,000,000 | 400,000,000 | 33.72 s | 11.86 x 10^6 |
| Logical AND | 20 | 8,000,000 | 160,000,000 | 0.71 s | 225.35 x 10^6 |
| Logical OR | 20 | 8,000,000 | 160,000,000 | 0.64 s | 250.00 x 10^6 |
| Logical XOR | 20 | 8,000,000 | 160,000,000 | 0.71 s | 225.35 x 10^6 |
| Integer Unit 2 | 40 adds & multiplies | 8,000,000 | 640,000,000 | 48.75 s | 13.13 x 10^6 |
| Floating Point Add | 20 | 8,000,000 | 160,000,000 | 0.82 s | 195.12 x 10^6 |
| Floating Point Subtract | 20 | 8,000,000 | 160,000,000 | 0.82 s | 195.12 x 10^6 |
| Floating Point Multiply | 20 | 8,000,000 | 160,000,000 | 0.83 s | 192.77 x 10^6 |
| Floating Point Divide | 20 | 8,000,000 | 160,000,000 | 0.82 s | 195.12 x 10^6 |
| Branch Processing Unit | 7 | 8,000,000 | 56,000,000 | 6.09 s | 9.20 x 10^6 |
| Load/Store Unit | 320 loads, 192 stores | 80,000 | 40,960,000 | 13.24 s | 3.09 x 10^6 |
| Data Cache | 2 | 3,300,000 | 6,600,000 | 15.97 s | 0.41 x 10^6 |

Results of Lowering Supply Voltage (PowerPC 750 µP)

| Chip No. | No. of Tests | Critical Supply Voltage V_C | Observations: System Hangs | Observations: Program Crashed |
|---|---|---|---|---|
| 1 | 45 | 1.99 V to 2.10 V | 31 | 14 |
| 2 | 35 | 2.00 V to 2.08 V | 26 | 9 |
| 3 | 25 | 2.10 V to 2.29 V | 18 | 7 |
| 4 | 25 | 2.08 V to 2.20 V | 17 | 8 |

- The nominal supply voltage of 2.5 V is reduced in steps of 1/100th of a volt, with the clock frequency constant at 233 MHz
- No data error was ever observed at user-visible registers!

More Recent Experiment
- Processor: Pentium-M with SpeedStep technology
  - Rated at 2 GHz at a core voltage of 1.308 V
- Experiment
  - While keeping the CPU 100% busy at 2 GHz, reduced the voltage in steps of 16 mV
  - Third-party software claimed to report errors
  - Reduced the voltage 15 steps down to 1.068 V with no errors
  - At the next step down, to 1.052 V, the CPU crashed
  - No errors observed, only crashes!
  - Similar results at seven other frequencies
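The step-down search described in the experiments can be sketched as a simple loop. This is an illustration only, not the software used in the talk: `runs_clean` is a hypothetical callback standing in for "apply this voltage, run the stress workload, report whether anything went wrong", and the 1060 mV failure threshold of `fake_chip` is invented so that the numbers match the slide. Voltages are kept in integer millivolts to avoid floating-point drift.

```python
def find_critical_voltage_mv(start_mv, step_mv, floor_mv, runs_clean):
    """Step the supply down from start_mv until the stress test stops running cleanly.

    runs_clean(mv) is a hypothetical callback: it should apply the voltage,
    run the stress workload, and return True only if no error or crash occurred.
    start_mv is assumed to be a known-good voltage and is not re-tested.
    Returns (last_good_mv, first_bad_mv); first_bad_mv is None if the floor is reached.
    """
    last_good = start_mv
    mv = start_mv - step_mv
    while mv >= floor_mv:
        if not runs_clean(mv):
            return last_good, mv
        last_good = mv
        mv -= step_mv
    return last_good, None

# Stand-in for real hardware: a pretend chip that fails below 1060 mV (invented value).
def fake_chip(mv):
    return mv >= 1060

last_good, first_bad = find_critical_voltage_mv(1308, 16, 500, fake_chip)
print(last_good, first_bad)  # 1068 1052
```

Fifteen 16 mV steps from 1308 mV give 1068 mV, and the sixteenth gives 1052 mV, which matches the voltages quoted on the slide.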

Experiment on Pentium-M: Critical Voltage vs Frequency
[Figure: supply voltage (0.5 V to 1.4 V) versus clock frequency in GHz (0.80, 1.06, 1.20, 1.33, 1.46, 1.60, 1.73, 1.86, 2.00), with two series: V Specification and V Critical.]

Some Remarks on the Experiment
Possible explanation for the observations:
- A modern processor has a large number of flip-flops that are not user visible, e.g. prefetch buffers, history tables, reservation stations, write buffers, and state controllers for everything from moving instructions and data to controlling a cache
- Control logic fails simultaneously with the ALU datapath
- Massive errors occur in control and data in a single cycle
- Instruction flow is completely disrupted, so no error could be reported
- Catastrophic failure!

Personal Remarks
- CMOS technology is robust now and will continue to be so for the foreseeable future
  - Process-variation-related errors, if any, must be massive
  - No industry can survive with massive failures
  - Process variations must remain bounded within some reasonable limits
- Moore's Law continues to hold!
  - 45 nm with (HiK+MG) has lower RDF and T_OX variations than 65 nm [Kelin J. Kuhn, "Reducing Variation in Advanced Logic Technologies: Approaches to Process and Design for Manufacturability of Nanoscale CMOS," IEDM 2007]

Exploiting Process Variations
- If the critical operation point hypothesis holds:
  - Above the critical frequency F_C massive failure occurs; below this point error-free operation results
  - Below the critical supply voltage V_C massive failure occurs; above it error-free operation results
- In data centers with 1000s of µPs, operating each µP with the lowest V_C for a given frequency can save lots of power
- As the number of cores approaches 100 or more, it would be imperative to use a different voltage-frequency pair (F_C, V_C) for each core on the same die

Dynamic Power Savings in Pentium-M
Power savings when operating the processor at V_C for each F_C
[Figure: power saving (0% to 35%) versus clock frequency in GHz (0.80, 1.06, 1.20, 1.33, 1.46, 1.60, 1.73, 1.86, 2.00).]
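The slides do not state the power model behind this chart. A rough check is possible under the standard assumption that CMOS dynamic (switching) power scales as V^2 * f, so that at a fixed frequency the saving depends only on the voltage ratio. The sketch below uses that assumption, ignores leakage, and takes 1.068 V (the last error-free voltage quoted for 2 GHz) as a stand-in for V_C; the function name is hypothetical.

```python
def dynamic_power_saving(v_spec, v_crit):
    """Fractional dynamic-power saving at fixed frequency, assuming P is proportional to V^2 * f."""
    return 1.0 - (v_crit / v_spec) ** 2

# 2 GHz operating point: specified 1.308 V versus the last error-free 1.068 V.
saving = dynamic_power_saving(1.308, 1.068)
print(f"{saving:.1%}")  # 33.3%
```

A saving of about one third at 2 GHz falls within the 0% to 35% range of the chart's axis, although the exact plotted values cannot be recovered from the transcription.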

Future Research
- Need to verify the proposed hypothesis with more experiments or simulations
- Off-line test
  - To determine several critical frequency-voltage pairs (F_C, V_C) for each die and possibly each core on the die
- On-line test
  - To establish new frequency-voltage pairs (F_C, V_C) in the field at the time of deployment
  - To monitor aging, since (F_C, V_C) may shift with age
- Self-test
  - Self-calibrate periodically to arrive at the current (F_C, V_C)

Questions? Comments?
