On-silicon Instrumentation

An approach to alleviate the variability problem
Peter Y. K. Cheung, Department of Electrical and Electronic Engineering
18th March 2014, U. of York

How we started (in 2006)
Process variability was a hot issue at the time.
The curses of FPGAs: used for ANY design, so worst case is assumed in everything.
The blessings of FPGAs: self-test is almost free (bitstream storage and time), and there is the ability to reconfigure.
The opportunity: LATE BINDING
Page 2

What is Conventional Binding?
Current practice: the mapping from logical view to physical view is performed once for ALL chips at place-and-route.
Page 3

What is Late Binding?
LATE BINDING: part of the mapping from logical view to physical view is performed AS LATE AS POSSIBLE, for EACH chip, based on its individual characteristics.
Page 4

Late Binding
[Figure: FPGA configuration flow in which measured delays feed the LATE BINDING ALGORITHM]
Page 5

Instrument 1: Ring Oscillators
Application of Instrument 1: investigate process variability in FPGAs.
How bad is stochastic variation compared with systematic variation at 90nm?
Page 6

Xilinx Altera interoperability!
Device under test: Altera Cyclone II
Measurement circuit: Xilinx Virtex-4
Page 7

Have we measured the right thing?
[Figure: ring-oscillator frequency (x 10^8, roughly 2.55 to 2.7) plotted against LAB row (Y) and LAB column (X)]
Thermal effects and self-heating?
Sensitivity to place and route?
Measurement error?
Page 8

Error source       Error (3σ)
Noise              0.038%
Scan order         0.002%
Place and route    0.223%
LSB of counter     0.02% (max)
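The "LSB of counter" entry in the error table can be illustrated with a small sketch. It assumes the ring-oscillator frequency is read by counting oscillator edges over a fixed gate time, which the slides do not spell out; the function names and the example numbers are hypothetical.

```python
def ring_oscillator_frequency(count: int, gate_time_s: float) -> float:
    """Frequency estimate from an edge count taken over a fixed gate time.

    Assumption: a simple gated counter; the talk does not describe the
    exact measurement circuit.
    """
    return count / gate_time_s


def counter_lsb_error_percent(count: int) -> float:
    """Worst-case quantisation error of one count, as a percentage of the reading."""
    return 100.0 / count


# Hypothetical reading: 5200 counts in a 20 microsecond gate.
freq_hz = ring_oscillator_frequency(5200, 20e-6)
lsb_error = counter_lsb_error_percent(5200)
```

Under this assumption, keeping the LSB error at or below 0.02% requires at least 5000 counts per reading, so the gate time has to grow as the oscillator slows down.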

Modelling
measured loop delay = model of correlated variation + stochastic variation
[Figure: three surface plots over row and column: measured delay (about 1.85 to 1.95 x 10^-9), correlated-variation model (same range), and stochastic variation (about -4 to 4 x 10^-11)]
Page 9

[Figure: histogram of delay model residuals (percent of mean, -2 to 2) against probability (0 to 0.2)]
Cyclone II FPGAs: 90nm technology, EP2C35 part, 18 devices
Stochastic 3σ variation per LUT = ±3.54%
Correlated variation per LUT < 3.66%
Sedcole & Cheung, "Within-die Delay Variability in 90nm FPGAs and Beyond", FPT 2006
Page 10
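The decomposition "measured delay = correlated model + stochastic residual" can be sketched as follows. This is only an illustration: it uses a neighbourhood mean as the correlated model, which is an assumption and not necessarily the model fitted in the talk. All names are hypothetical.

```python
from statistics import mean, pstdev


def decompose(delay, radius=1):
    """Split a 2-D delay map into a smooth (correlated) part and a residual.

    Assumption: the correlated component is approximated by the mean of the
    surrounding (2*radius+1)^2 window, clipped at the array edges.
    """
    rows, cols = len(delay), len(delay[0])
    correlated = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                delay[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            correlated[r][c] = mean(window)
    stochastic = [
        [delay[r][c] - correlated[r][c] for c in range(cols)] for r in range(rows)
    ]
    return correlated, stochastic


def three_sigma_percent(stochastic, delay):
    """3-sigma spread of the residuals, as a percentage of the mean delay."""
    residuals = [x for row in stochastic for x in row]
    delays = [x for row in delay for x in row]
    return 300.0 * pstdev(residuals) / mean(delays)
```

A perfectly uniform map gives a zero residual, and by construction the two parts always add back to the measured map.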

The Ring Oscillator is a BAD Instrument
Easy to implement, but:
It gets hot.
Poor representation of circuit paths in real FPGA designs.
Combinatorial loops!?
Inaccurate: it only gives the average of the delays for rising and falling transitions, NOT the worst case.
No thanks.
[Figure: CMOS inverter (Vdd, PMOS, NMOS, Out) with output waveform showing t_rise and t_fall]
Page 11

Instrument 2: Failure Rate Detector
Failure Rate Detector (FRD) circuit
[Figure: clock period compared with CUT delay; error histogram against frequency]
A combinatorial circuit in a pipelined structure forms the CUT.
The clock frequency is increased until the pipeline fails.
The EDC detects the error and increments the error count in the EHA.
Page 12

KEY IDEA: Exploit PLL Measurement Resolution
\Delta t = t_1 - t_2 = \frac{1}{f} - \frac{1}{f + \Delta f} = \frac{\Delta f}{f (f + \Delta f)} \approx \frac{\Delta f}{f^2}
[Figure: clock periods t_1 and t_2 at frequencies f and f + \Delta f]
Worst-case timing resolution from 300 to 800MHz = 1.33ps
Average timing resolution < 1ps
Page 13

Assumptions
Clock jitter is approximately Gaussian with a symmetrical probability distribution.
[Figure: pdf and cdf of the clock edge position against t, with the expected clock edge at the 50% point]
Given that the probability density function (pdf) of the clock jitter is symmetrical, the resultant cumulative distribution function (cdf) has its 50% point centred at the expected position of the clock edge.
Page 14
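A short numeric sketch of the two ideas on these slides: the time step obtained from one PLL frequency step, and locating the 50% point of a measured failure curve. The 120 kHz frequency step is an assumption chosen because it reproduces roughly 1.33ps at 300MHz; the slides do not state the step size. Function names are hypothetical.

```python
def timing_step_s(f_hz: float, df_hz: float) -> float:
    """Change in clock period when the frequency moves from f to f + df.

    Uses df / (f * (f + df)), which equals 1/f - 1/(f + df) without the
    floating-point cancellation of subtracting two nearly equal periods.
    """
    return df_hz / (f_hz * (f_hz + df_hz))


def half_crossing(xs, probs):
    """Linearly interpolate the x value where a measured probability crosses 50%.

    xs and probs are equal-length sequences ordered along the sweep.
    Returns None if the curve never crosses 0.5.
    """
    for i in range(1, len(xs)):
        p0, p1 = probs[i - 1], probs[i]
        if p0 != p1 and min(p0, p1) <= 0.5 <= max(p0, p1):
            frac = (0.5 - p0) / (p1 - p0)
            return xs[i - 1] + frac * (xs[i] - xs[i - 1])
    return None


ASSUMED_STEP_HZ = 120e3  # assumption, not taken from the slides
worst_case = timing_step_s(300e6, ASSUMED_STEP_HZ)   # about 1.33 ps
best_case = timing_step_s(800e6, ASSUMED_STEP_HZ)    # about 0.19 ps
```

Because the step shrinks as 1/f^2, the coarsest resolution is at the low end of the frequency range, which is consistent with the worst case being quoted for the 300 to 800MHz span.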

Failure Rate Profile Explained
[Figure: measured failure rate curve, annotated with the regions where the positive edge failed and where the negative edge failed, alongside Clock and D waveforms for Case 1 and Case 2]
Page 15

Application 1 (Instr 2): Better LUT Delay Map
FPGA chip-wide delay map
Results obtained from Cyclone II using the measurement circuit.
CUT: minimum 2 LUTs as inverter.
Page 16
Wong & Cheung, "Self-measurement of combinatorial circuit delays in FPGAs", ACM TRETS, (2) 2, pp. 1-22, 2009 & FPT 2008

Application 1: LUT Delay Map video
Videos showing how FPGA timing fails progressively as the test clock frequency is increased
Page 17

Application 2 (Instr 2): Clock Delay Variabilities
LUT delay measurements for Virtex-5 XC5VLX50-1
How much variability comes from the clock tree?
Page 18

Differential delay measurement circuit
[Figure: clock source, common clock path, launch circuit and common signal path; signal path p1 and clock path c1 feed a capture circuit with output v1; signal path p2 and clock path c2 feed a capture circuit with output v2]
Delay diff = [t(p1) - t(c1)] - [t(p2) - t(c2)]
If p1 is near p2 (and c1 near c2) then spatially correlated variations cancel out.
Page 19

Differential delay measurement example
Delay diff = [t(p1) - t(c1)] - [t(p2) - t(c2)]
Page 20
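The cancellation claimed for the differential measurement can be checked numerically. The sketch below assumes that a spatially correlated variation adds the same offset to both neighbouring signal paths and the same offset to both neighbouring clock paths; the delay values are made up for illustration.

```python
def delay_diff(t_p1: float, t_c1: float, t_p2: float, t_c2: float) -> float:
    """Differential delay: [t(p1) - t(c1)] - [t(p2) - t(c2)]."""
    return (t_p1 - t_c1) - (t_p2 - t_c2)


# Hypothetical path delays in seconds.
base = delay_diff(1.90e-9, 0.50e-9, 1.88e-9, 0.51e-9)

# Assumed correlated offsets: one shared by p1 and p2, one shared by c1 and c2.
signal_offset = 0.05e-9
clock_offset = 0.02e-9
shifted = delay_diff(
    1.90e-9 + signal_offset,
    0.50e-9 + clock_offset,
    1.88e-9 + signal_offset,
    0.51e-9 + clock_offset,
)
```

Both offsets drop out of the difference, so `shifted` equals `base` up to floating-point rounding; only the uncorrelated part of each path remains.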

Components of signal path and clock tree
Simplified lumped model of delays.
Components are isolated by making incremental routing changes.
Variances are calculated from the measured differences.
A regression equation of variances can be solved.
[Figure: lumped delay model with capture outputs v1 and v2]
Page 21

Results
Solve linear regression equations to find standard deviations of delays:
[Figure: lumped model annotated with component standard deviations 4.4ps, 4.1ps, 5.6ps, 7.2ps and σ = 2.8ps]
Page 22
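The variance regression can be sketched as a small least-squares problem. The sketch assumes the delay components are independent, so the variance of each measured difference is the sum of the variances of the components included in it. The design matrix and numbers are invented for illustration and are not the values behind the slide.

```python
import math


def solve_least_squares(a, b):
    """Solve min ||a x - b|| via the normal equations and Gauss-Jordan elimination."""
    n = len(a[0])
    ata = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    atb = [sum(row[i] * y for row, y in zip(a, b)) for i in range(n)]
    m = [ata[i] + [atb[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Each row marks which (hypothetical) components contribute to one measurement.
design = [[1, 1, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]]
measured_variances = [13.0, 25.0, 20.0, 29.0]  # ps^2, made-up values

component_variances = solve_least_squares(design, measured_variances)
component_sigmas = [math.sqrt(v) for v in component_variances]
```

With these made-up inputs the recovered standard deviations are 2, 3 and 4 ps. With real, noisy measurements a fitted variance can come out slightly negative, which would need handling before taking the square root.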

Application 2 Results: How much clock skew?
What is the minimum clock skew variation in a single clock region?
Estimated σ = 12ps
Similar to LUT delay variation (σ = 11ps)
Sedcole, Wong and Cheung, "Characterisation of FPGA Clock Variability", IEEE International Symposium on VLSI, pp. 322-328 (2008)
Page 23

Problem with Instrument 2
Good resolution
Only works for combinational circuits
Needs access to both inputs and outputs of the capture registers
Need a better method suitable for a black-box approach
Page 24

Instrument 3: Delay Measurement using Transition Probability
No synchronous error detector needed
Infer timing errors by observing Transition Probability (TP)

The TP Method
TP = (No. of transitions) / (No. of test clock cycles in a freq. step)
[Figure: toggle signal driven through the path to an asynchronous transition counter; TP plotted against frequency, marking where the path fails (f_max)]
Slide 25

How about complex circuits?
Drive inputs with random patterns
[Figure: TP at the 2nd output bit of a 4x4 fixed-point multiplier driven by random inputs]
The curve no longer has the nice characteristics of a single path's f_max, but it is formed by a combination of them from each failing path.
Slide 26
Wong & Cheung, "Improved Delay Measurement method in FPGA based on Transition Probability", ACM Symposium on FPGA 2011
Wong & Cheung, "A Timing Measurement Platform for Arbitrary Black-box circuit based on Transition Probability", TVLSI 2013
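The TP definition and its use for a single toggling path can be sketched as below. The detection rule (flag the first frequency step whose TP departs from the low-frequency baseline by more than a tolerance) is an assumption for illustration, not the published algorithm; names and data are hypothetical.

```python
def transition_probability(n_transitions: int, n_cycles: int) -> float:
    """TP = number of observed transitions / number of test clock cycles in the step."""
    return n_transitions / n_cycles


def estimate_fmax(steps, tolerance=0.01):
    """Return the highest frequency whose TP still matches the baseline.

    steps: list of (freq_hz, n_transitions, n_cycles), sorted by ascending
    frequency. The first step is taken as the error-free baseline (assumption).
    """
    baseline = transition_probability(steps[0][1], steps[0][2])
    last_good = None
    for freq_hz, n_transitions, n_cycles in steps:
        tp = transition_probability(n_transitions, n_cycles)
        if abs(tp - baseline) > tolerance:
            return last_good
        last_good = freq_hz
    return last_good


# Made-up sweep: the path keeps up until 500 MHz, then transitions go missing.
sweep = [
    (480e6, 10000, 10000),
    (490e6, 10000, 10000),
    (500e6, 9990, 10000),
    (510e6, 7200, 10000),
]
fmax_hz = estimate_fmax(sweep)
```

For a multi-path circuit driven by random inputs, the baseline is no longer a single plateau, so a rule like this would have to be applied to the shape of the whole curve rather than one threshold.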

Accuracy and precision
Isolated single path:
Resolution: ~1 to 2ps (depends on clock generator)
Measurement based on nominal clock period (centre of jitter distribution)
Entire circuit (multi-path):
Resolution: same as single path
Measurement based on minimum clock period (min. of jitter distribution)
Largest Design Tested
Slide 27

Application 1 (Instr. 3): Dealing with delay variability due to ageing
Degradation characterisation
Accelerated life test
Measure and model how logic slows down over time under stresses: heat, voltage and different switching stresses
Stott, Wong & Cheung, "Degradation in FPGAs: Measurement and Modelling", ACM Symposium on FPGA 2010
Slide 28

Demo: 10 years' worth of degradation in 17 seconds of video
Cyclone III
Accelerated life test with 4 types of input stress @ 125°C, 1.8V
TP test every hour @ 35°C, 1.2V (default voltage)
Path under test / stress: 300 MHz toggle, 1 Hz toggle, static 1, static 0
[Figure: delay plots for the stressed paths]
NBTI: Negative Bias Temperature Instability
Slide 29

What do the results tell us about degradation in FPGAs?

Application 2 (Instr. 3): Variation-aware place-and-route
Idea:
Measure a chip-specific delay map (Variation Map)
Place the critical part of the design into the fast region (Variation-aware Placement)
[Figure: FPGA variation map shaded from slower to faster]
Slide 31

However,
Practical use of FPGAs involves a large number of chips, NOT just one specific chip
Many variation maps: each chip has a unique variation map (and optimum placement)
Very time consuming: variation-aware placement for each chip
Slide 32

Solution
Pattern classification / clustering: group similar patterns together into a finite no. of classes
Perform variation-aware placement for each class
For additional chip(s): find the best-matching pattern / class (e.g. use the placement optimised for Class-12)
Reduces total run time while retaining close-to-optimal placement
Slide 33

Goals of Variation-Aware
Investigate how to use measured variation maps to improve timing performance
With reasonable execution time overhead
Integration into practical work flow for industry
What we have achieved so far:
Two-stage variation-aware placement
Variation-aware partial rerouting
Variation-aware retiming
Guan, Wong, Constantinides & Cheung, "A two-stage variation-aware placement method for FPGAs exploiting variation maps classification", FPL 2012
Guan, Wong, Constantinides & Cheung, "A Variation-adaptive Retiming Method Exploiting Reconfigurability", FPL 2013
Page 34
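The classification idea on the Solution slide can be sketched with a plain k-means over flattened variation maps, followed by a nearest-class lookup for a new chip. k-means is used here only as a stand-in; the cited papers may use a different classifier. The maps, names and deterministic initialisation are all assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two flattened variation maps."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_class(centroids, variation_map):
    """Index of the class whose centroid best matches the given map."""
    return min(range(len(centroids)), key=lambda k: sq_dist(centroids[k], variation_map))


def cluster_maps(maps, k, iterations=20):
    """Group variation maps into k classes (k-means, first k maps as seeds)."""
    centroids = [list(m) for m in maps[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for m in maps:
            groups[nearest_class(centroids, m)].append(m)
        for i, group in enumerate(groups):
            if group:  # keep the old centroid if a class ends up empty
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


# Made-up 2x2 delay maps (flattened): two chips fast on the left, two on the right.
measured = [
    [1.0, 1.0, 2.0, 2.0],
    [2.0, 2.0, 1.0, 1.0],
    [1.1, 0.9, 2.1, 1.9],
    [1.9, 2.1, 0.9, 1.1],
]
classes = cluster_maps(measured, k=2)
new_chip_class = nearest_class(classes, [1.0, 1.0, 2.2, 2.0])
```

Placement would then be run once per class rather than once per chip, and a new chip simply reuses the placement stored for `new_chip_class`.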

Results
Combined all Optimisation Methods
Page 35

Where have we got to?
Instrument                     Applications
1. Ring Oscillator             Stochastic vs Systematic Variation
2. Timing Error Detection      LUT delay map characterisation; Clock skew measurement
3. Transition probability      Degradation characterisation; Variation-aware P&R and re-timing
Our instruments so far operate OFF-LINE.
Need another method to perform delay measurement under normal operational conditions.
4. Online Slack Measurement    Online Health Monitoring; Dynamic voltage/frequency scaling
Page 36

Instrument 4: Online Slack Measurement (OSM)
[Figure: application circuit (input regs, logic, regs, output) driven by Clock, with a slack measurement circuit in which shadow register Reg S, driven by Shadow Clock with a phase lead, is compared against Reg D to produce a discrepancy signal]
Levine, Stott, Constantinides & Cheung, "Online Measurement of Timing in Circuits: for Health Monitoring & Dynamic Voltage and Frequency Scaling", FCCM 2012
Page 37

Applications (Instr. 4): Health monitoring & Dynamic Voltage/Frequency Scaling
Measure the actual timing slack in the circuit while it is working normally, using the Online Slack Measurement (OSM) technique
Use timing slack to reduce the timing margin in order to:
Reduce power, or
Increase throughput, or
A combination of the two
Levine, Stott & Cheung, "Dynamic Voltage & Frequency Scaling with Online Slack Measurement", ACM FPGA Symposium, 2014
Page 38
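One way to read the shadow-register scheme is as a sweep: the shadow clock's phase lead is increased until the shadow register starts to disagree with the main register, and the largest lead without a discrepancy approximates the slack. The sketch below encodes that reading; it is an interpretation of the slide, and the function name and sweep data are hypothetical.

```python
def slack_from_sweep(period_s: float, sweep) -> float:
    """Estimate timing slack from a shadow-clock phase sweep.

    sweep: list of (phase_lead_degrees, discrepancy_seen), ordered by
    increasing phase lead. The estimate is the largest lead, converted to
    seconds, that was reached before the first discrepancy.
    """
    slack_s = 0.0
    for phase_lead_deg, discrepancy_seen in sweep:
        if discrepancy_seen:
            break
        slack_s = phase_lead_deg / 360.0 * period_s
    return slack_s


# Made-up sweep at a 100 MHz clock (10 ns period), 36 degree phase steps.
observed = [(0, False), (36, False), (72, False), (108, True), (144, True)]
slack_s = slack_from_sweep(10e-9, observed)  # 72 degrees of a 10 ns period
```

The resolution of this estimate is one phase step (1 ns in the example), so a finer phase step gives a tighter bound on the true slack.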

Timing Safety Margins
[Figure: stack of margins between best case (150 MHz) and worst case (100 MHz): inter-die variation, intra-die variation, degradation, temperature, noise]
Page 39

Reduced Timing Margin
[Figure: the same stack of margins (inter-die variation, intra-die variation, degradation, temperature, noise) showing the actual guardband and the reclaimed variation margin, with 150 MHz, 130 MHz and 100 MHz marked]
Page 40

Dynamic Scaling
Dynamic Voltage Scaling (DVS): scale the voltage; frequency is constrained
Dynamic Frequency Scaling (DFS): scale the frequency; voltage is constrained
Dynamic Voltage & Frequency Scaling (DVFS): scale both the voltage and frequency; power is constrained
Page 41

Experiments
A variety of functional benchmarks from FloPoCo and Spiral
Contain memory and DSP
LUTs: 1.1k to 5.4k, Regs: 0.9k to 5.1k
Automatically instrumented for online slack measurement
Overheads:
1.1% increase in LUTs
2.5% increase in Regs
1.8% decrease in model fmax
Page 42
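The DVS case on the Dynamic Scaling slide (frequency fixed, voltage scaled) can be sketched as a simple loop that keeps lowering the core voltage while the measured slack stays above a guard band. The control policy, the `measure_slack_ps` callback and the toy slack model are all assumptions for illustration; the talk does not describe the controller at this level.

```python
def scale_voltage_mv(measure_slack_ps, v_nominal_mv, v_min_mv, step_mv, guard_ps):
    """Lower the supply in fixed steps while measured slack stays >= guard band.

    measure_slack_ps: hypothetical callback returning the online slack
    measurement (in picoseconds) at a given core voltage (in millivolts).
    Integer units avoid floating-point drift in the loop.
    """
    voltage_mv = v_nominal_mv
    while (
        voltage_mv - step_mv >= v_min_mv
        and measure_slack_ps(voltage_mv - step_mv) >= guard_ps
    ):
        voltage_mv -= step_mv
    return voltage_mv


def toy_slack_model_ps(voltage_mv: int) -> int:
    """Made-up linear model: slack shrinks by 10 ps per mV below nominal."""
    return (voltage_mv - 900) * 10


settled_mv = scale_voltage_mv(
    toy_slack_model_ps, v_nominal_mv=1200, v_min_mv=900, step_mv=25, guard_ps=1000
)
```

With the toy model the loop settles at 1000 mV, the lowest step that still leaves 1000 ps of slack. A real controller would also need to raise the voltage again when temperature or ageing erodes the slack.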

Experiment Rig
Altera Cyclone IV FPGA (Terasic DE0-Nano)
Temperature-controlled package
PSU supplies core voltage and provides power measurement
Page 43

Dynamic Voltage Scaling Results
[Figure: power (mW, 0 to 250) per benchmark (fpadd64, fpexp64, dct1d, fplog32, fpmult32, fpexp32, filter) for nominal, DVS (85°C) and DVS (27°C), annotated with reductions of -25% and -34%]
Page 44

Dynamic Frequency Scaling Results
[Figure: throughput (10^6 s^-1, 0 to 250) per benchmark (dct1d, fpadd64, fpmult32, filter, fplog32, fpexp64, fpexp32) for STA, DFS (85°C) and DFS (27°C), annotated with gains of +31% and +39%]
Page 45

Automation Tools: no hands!
Tools to automatically insert TP delay (TPD) and online slack measurement (OSM) circuitry
Fully compatible with vendor's compilers
Requires little to no manual intervention
[Figure: tool flow with stages Application HDL, Compile Bare Application, Timing Report, Identify Critical Registers, Add Sensors, P&R Constraints, Final Compile, Bitstream, Calibration, Operating Parameters]
Page 46

Summary
Instrument                     Applications
1. Ring Oscillator             Stochastic vs Systematic Variation
2. Timing Error Detection      LUT delay map characterisation; Clock skew measurement
3. Transition probability      Degradation characterisation; Variation-aware P&R and re-timing
4. Online Slack Measurement    Online Health Monitoring; Dynamic voltage/frequency scaling
Page 47

Conclusions
Variability: this problem is here to stay. What is our response?
Just give up... yield will become zero...
The semiconductor industry will always solve the problem...
On-silicon instrumentation:
Will become increasingly important
When coupled with reconfigurability, opens up new possibilities
VLSI chips: no need to treat them all the same (clones)
Reconfigurability may be the answer to the variability and reliability challenge
Page 48

Acknowledgement
Thanks to EPSRC for support of these grants:
Variation-Adaptive Design in FPGAs
PLATFORM: Custom Computing for Advanced Digital Systems
PLATFORM: Field-Programmable Logic for Custom Computing
PROGRAMME: PRiME (Power-efficient, Reliable, Many-core Embedded systems)
Xilinx and Altera
My students/RAs working/worked with me on this topic: Sedcole, Wong, Stott, Guan, Levine, Davis
Page 49

Advertisement
EPSRC-funded CENTRE FOR DOCTORAL TRAINING (CDT) in HIGH-PERFORMANCE EMBEDDED AND DISTRIBUTED SYSTEMS (HiPEDS)
Department of EEE and Department of Computing, Imperial College London
50+ new PhD positions from October 2014 until 2020
Page 50