Towards PVT-Tolerant Glitch-Free Operation in FPGAs

Safeen Huda and Jason H. Anderson
ECE Department, University of Toronto, Canada
24th ACM/SIGDA International Symposium on FPGAs, February 22, 2016

Motivation
[Image: Google data centre]
FPGA power is increasingly critical because of new markets: data centers and mobile electronics.
Previous studies have shown that glitch power is a significant contributor to overall dynamic power consumption:
Shum et al. [1]: ~26% of core dynamic power.
Lamoureux et al. [2]: ~30% of signal transitions are due to glitches.

Reviewing
[Figure: combinational gate with inputs X1 and X2 and output Y]
Unequal arrival times on the input pins of combinational logic may result in spurious transitions (glitches).
These spurious transitions dissipate power.
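A minimal sketch of this effect, assuming an ideal zero-delay two-input XOR gate whose inputs both rise from 0 to 1. The slide does not name the gate type; the function name and the picosecond values are illustrative only.

```python
from itertools import groupby

def xor_transitions(t_x1, t_x2):
    """Output transitions of an ideal XOR whose inputs rise at t_x1 and t_x2 (ps)."""
    inputs = {"x1": 0, "x2": 0}
    output = 0
    transitions = []
    events = sorted([(t_x1, "x1"), (t_x2, "x2")])
    # Apply all input changes that share a timestamp before re-evaluating the gate.
    for time, group in groupby(events, key=lambda event: event[0]):
        for _, name in group:
            inputs[name] ^= 1
        new_output = inputs["x1"] ^ inputs["x2"]
        if new_output != output:
            transitions.append((time, new_output))
            output = new_output
    return transitions

aligned = xor_transitions(100, 100)   # inputs arrive together: output never moves
skewed = xor_transitions(100, 250)    # 150 ps skew: output pulses high, then low
```

With aligned inputs the output stays at 0. With skewed inputs the output makes two transitions that carry no logical information, and the pulse width equals the arrival-time difference.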

Previous Approaches: Path Delay Balancing
[Figure: gate with inputs X1, X2 and output OUT, showing a glitch of width t_g and a delay Δt added on one input]
Add delay to fast-arriving inputs to equalize the delay difference.
Delay-balancing circuitry may have low overhead [2,3].

Delay Predictability of FPGA Circuitry
[Figure: pass-transistor mux driving a CMOS buffer, transistors M1-M7]
FPGAs consist of CMOS and pass-gate based circuitry.
CMOS gates: rise/fall delay is inversely proportional to μ and to (V_DD − V_T).
NMOS pass-gates: fall delay is similar to CMOS; rise delay has an inverse quadratic relationship with (V_DD − V_T).

The two styles of circuitry have different sensitivities to temperature.
Rise/fall delay balance is hard to ensure with NMOS pass-gates.
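The different temperature sensitivities can be illustrated with a toy first-order model. Everything numeric below is an assumption made for illustration: the mobility exponent, the threshold-voltage temperature coefficient, the supply voltage, and the inclusion of μ in the pass-gate expression. The sketch does not attempt to reproduce measured FPGA delays.

```python
def mobility(temp_c, exponent=1.5):
    """Assumed model: mobility falls with absolute temperature, normalised to 0 C."""
    return ((temp_c + 273.15) / 273.15) ** (-exponent)

def threshold(temp_c, vt_at_0c=0.45, dvt_dt=-1e-3):
    """Assumed model: V_T drops linearly by 1 mV per degree C."""
    return vt_at_0c + dvt_dt * temp_c

def cmos_delay(temp_c, vdd=1.1):
    # Delay inversely proportional to mu and (V_DD - V_T), arbitrary units.
    return 1.0 / (mobility(temp_c) * (vdd - threshold(temp_c)))

def pass_gate_rise_delay(temp_c, vdd=1.1):
    # Delay with an inverse quadratic dependence on (V_DD - V_T), arbitrary units.
    return 1.0 / (mobility(temp_c) * (vdd - threshold(temp_c)) ** 2)

# Ratio of the delay at 85 C to the delay at 0 C for each circuit style.
cmos_ratio = cmos_delay(85.0) / cmos_delay(0.0)
pass_ratio = pass_gate_rise_delay(85.0) / pass_gate_rise_delay(0.0)
```

Under these assumed parameters the two ratios differ, so a delay that balances a CMOS path against a pass-gate path at one temperature no longer balances them at another.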

LUT Delay Characteristics
[Figure: Stratix-III LUT rise and fall delays (ps, axis 0 to 1000) for LUT inputs A-F at 0°C and 85°C]
Stratix-III LUT delays for each input at different temperatures.
Inputs A and B have a negative temperature dependence and a large rise/fall delay imbalance.
Inputs C-F have different delay characteristics.

Limitation of Path Delay Balancing
Difficult to balance input delays over all logic states and operating conditions!
[Figure: the same gate (inputs X1, X2, output OUT) at 0°C and at 85°C, with Δt set equal to t_g in each case]

Proposed Glitch Reduction Technique
Want:
Reliable glitch reduction over all PVT corners.
A glitch reduction technique applicable over a wide range of circuits.
Low cost.
This work:
Replace the glitch-filtering FF/latch with an asynchronous element.
This eliminates the cost and granularity problems of the synchronous technique.
[Figure: glitch filter (GF) between node Y and OUT in a datapath between flip-flops clocked by Φ_CLK]

Asynchronous Glitch Filtering Circuitry
[Figure: circuit schematic with input stage, glitch filtering stage, and output stage; labels IN, B1, delay line, gating circuitry, M5-M7, OUT]
The output stage is disconnected from the input immediately following a transition at the input stage.
A transition must propagate through the delay line before being sampled by the output.

If the input is a pulse with width t_pw < t_d (the delay-line delay), it will not appear at the output.
A programmable delay line allows glitches of varying widths to be filtered.

This implies better tolerance to PVT.
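The filtering rule (pulses narrower than t_d do not reach the output) can be sketched as a behavioural model. This is an idealised inertial-delay abstraction, not the transistor-level circuit on the slides; the function name and the waveform are hypothetical.

```python
def glitch_filter(transitions, t_d, initial=0):
    """Idealised glitch filter.

    transitions: time-sorted list of (time_ps, value) input changes.
    A new input value reaches the output, t_d later, only if the input
    then stays stable for at least t_d. Shorter pulses are dropped.
    """
    output = []
    output_value = initial
    for index, (time, value) in enumerate(transitions):
        if index + 1 < len(transitions):
            next_time = transitions[index + 1][0]
        else:
            next_time = float("inf")
        stable_long_enough = (next_time - time) >= t_d
        if stable_long_enough and value != output_value:
            output.append((time + t_d, value))
            output_value = value
    return output

# Input goes high at 0 ps, dips low for 50 ps at 500 ps, and falls at 1000 ps.
waveform = [(0, 1), (500, 0), (550, 1), (1000, 0)]
filtered = glitch_filter(waveform, t_d=100)   # 50 ps dip is narrower than t_d
passed = glitch_filter(waveform, t_d=40)      # 50 ps dip is wider than t_d
```

With t_d = 100 ps the dip is removed and every surviving edge is delayed by t_d, which is the delay penalty discussed on the following slides. With t_d = 40 ps the dip passes through.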

Cost of glitch filtering: when used, the delay penalty is t_D (hundreds of ps to the ns range).
It cannot be applied to all paths; use it only to fill slack.

Cost of glitch filtering: when unused, the delay penalty is ~30 ps (can be much smaller with an improved topology).

Cost of glitch filtering: area is small (shown later) and is dominated by delay-line range and granularity.

Proposed Architecture
[Figure: BLE containing a K-LUT, a flip-flop, and an output buffer augmented with glitch filtering circuitry (GFC)]
Augment the BLE output buffer with asynchronous glitch filtering circuitry (GFC).
The GFC's programmable delay line allows a glitch suppression/slack trade-off to be made at each BLE output.
The GFC's delay-line setting is determined by the CAD flow.

CAD Flow
[Figure: flow chart. Benchmark -> Quartus compilation (map, fit, asm, sta, eda) -> post-routing delay-annotated netlist -> functional simulation and timing simulation (Modelsim, .vcd files) -> power analysis (PowerPlay) of each, whose difference gives glitch power data -> glitch power analysis -> glitch statistics for each node in the circuit -> glitch filter setting optimization -> glitch filter settings -> final power analysis (Modelsim timing simulation + Quartus PowerPlay) -> power savings]
Uses Quartus for compilation and power analysis.
Uses Modelsim for timing and functional analysis.
Uses custom routines for extraction of glitch power statistics and for optimization.

Compile the design with Quartus and output a delay-annotated netlist.

Run timing and functional simulations with Modelsim.

Use Quartus to analyse power based on the timing and functional simulations.
The difference between the two simulations corresponds to glitch power.

Collect glitch statistics for each node in the circuit: the number of glitches for each pulse width, and the glitch power dissipated at each node.
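A rough sketch of what such an extraction routine might look like for a single node. It is not the authors' tool. It assumes a synchronous design in which a node makes at most one functional transition per clock cycle, so that extra transition pairs within a cycle are glitch pulses; the function names and sample numbers are hypothetical.

```python
from collections import Counter, defaultdict

def glitch_pulse_widths(transition_times, clock_period):
    """Widths (ps) of glitch pulses, pairing up transitions within each clock cycle."""
    by_cycle = defaultdict(list)
    for time in sorted(transition_times):
        by_cycle[time // clock_period].append(time)
    widths = []
    for times in by_cycle.values():
        # Transitions 0-1, 2-3, ... form pulses; an unpaired last one is functional.
        widths.extend(end - start for start, end in zip(times[0::2], times[1::2]))
    return widths

def width_histogram(widths, bin_width):
    """Number of glitches per pulse-width bin (bin labelled by its lower edge)."""
    return Counter((width // bin_width) * bin_width for width in widths)

def glitch_power(timing_sim_power, functional_sim_power):
    """Glitch power as the difference between the two power analyses."""
    return timing_sim_power - functional_sim_power

# Transition times (ps) of one node from a timing simulation, 5000 ps clock.
times = [100, 220, 700, 5030, 5260, 10400]
widths = glitch_pulse_widths(times, clock_period=5000)
histogram = width_histogram(widths, bin_width=100)
```

In a real flow the transition times would come from parsing the timing-simulation .vcd file, which is omitted here.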

Optimize power given the glitch statistics and the timing graph.

Append a behavioural model of the GF to the netlist.
Run timing simulation with the glitch filter settings.
Run power analysis with the new netlist and compare with the previous result.

Glitch Filter Setting Optimization
[Figure: timing graph with nodes n_x and n_y feeding n_z]
Given: a timing graph for a circuit, with nodes n_i representing combinational gates, FFs, PIs, POs, etc.
Edges are annotated with delays.

[Figure: the timing graph with a glitch power versus pulse width plot (p versus t) on each edge]
Annotate edges with glitch power density functions GP_i(t).
These can be obtained through a combination of functional, timing, and power analyses.

[Figure: the timing graph with glitch filter nodes g_x, g_y, g_z appended to the outputs of n_x, n_y, n_z]
Append nodes g_i to the outputs of nodes n_i; these represent glitch filters.
The delay from n_i to g_i corresponds to the glitch filter setting at i. We shall call this delay d_i.

For every choice of d_i we get a corresponding glitch power decrease at node i: all glitches with pulse width less than d_i are suppressed.
d_i cannot be set arbitrarily: it must obey timing constraints.
Certain settings of d_i may have adverse effects: they can potentially create new glitches downstream.
The optimization tool must be aware of this and avoid such settings where needed.
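The relationship between d_i and the glitch power removed at node i can be sketched by summing a discretised glitch profile over all pulse widths below d_i. The profile values and the function name are made up for illustration.

```python
def suppressed_power(glitch_profile, d):
    """Glitch power removed at a node by a filter delay d (ps).

    glitch_profile maps pulse width (ps) to the glitch power (uW)
    dissipated by glitches of that width. Widths below d are filtered.
    """
    return sum(power for width, power in glitch_profile.items() if width < d)

# Hypothetical discretised GP_i(t) for one node: width in ps -> power in uW.
profile = {50: 40, 150: 30, 250: 20, 400: 10}

# Saving for each candidate filter setting.
savings = {d: suppressed_power(profile, d) for d in (0, 100, 200, 300)}
```

The saving is non-decreasing in d_i, which is why the timing constraints, rather than the power objective, are what limit each setting.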

[Figure: delay-annotated netlist and glitch statistics GP_i(t) -> formulate MILP problem (.lp file) -> Gurobi Optimizer -> glitch filter settings. The sample .lp file shown reads: Maximize x + y + z; Subject To c0: x + y = 1, c1: x + 5 y + 2 z <= 10, qc0: x + y + [ x ^ 2 - 2 x * y + 3 y ^ 2 ] <= 5; Bounds 0 <= x <= 5, z >= 2; Generals x y z]
The optimization problem can be described as a set of linear equations with linear constraints.
Cast the problem as a Mixed Integer Linear Program (MILP).
Solve using the Gurobi optimizer.
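To make the shape of the problem concrete, here is a toy version on the three-node graph from the earlier slides (n_x and n_y feeding n_z). It swaps the MILP and Gurobi for exhaustive search, which is only workable at this tiny size, and it is not the authors' formulation. All delays, the clock period, and the glitch profiles are invented, and it ignores the effect of a filter on downstream glitching.

```python
from itertools import product

SETTINGS = (0, 100, 200, 300)                  # candidate filter delays, ps (assumed)
CLOCK_PERIOD = 1000                            # ps (assumed)
EDGE_DELAY = {"x": 300, "y": 500, "z": 200}    # delay after each filter, ps (assumed)
PROFILE = {                                    # pulse width (ps) -> glitch power (uW)
    "x": {50: 10, 150: 5},
    "y": {50: 8, 250: 12},
    "z": {150: 20, 250: 15},
}

def saved(node, d):
    """Glitch power removed at a node when its filter delay is d."""
    return sum(power for width, power in PROFILE[node].items() if width < d)

def meets_timing(d):
    """Longest path through x or y, then z, must fit in the clock period."""
    arrival_z = max(d["x"] + EDGE_DELAY["x"], d["y"] + EDGE_DELAY["y"])
    return arrival_z + d["z"] + EDGE_DELAY["z"] <= CLOCK_PERIOD

def best_settings():
    """Try every combination of settings and keep the feasible one saving most."""
    best, best_saving = None, -1
    for values in product(SETTINGS, repeat=3):
        d = dict(zip("xyz", values))
        if not meets_timing(d):
            continue
        total = sum(saved(node, d[node]) for node in d)
        if total > best_saving:
            best, best_saving = d, total
    return best, best_saving

settings, saving = best_settings()
```

The search space grows as the number of settings raised to the number of filters, which is the reason a solver-based formulation is needed for real circuits.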

Experimental Study
Used MCNC benchmarks to assess power reduction possibilities.
Easy to use random vectors to perform power analysis.
Supplied 10000 random vectors for timing/functional simulation.
As previously reported, glitches account for ~26% of core dynamic power [1].
Used a benchmark set from UMass [4]: a communications/signal processing benchmark set of 6 designs.
Glitch power was insignificant for one design (ava).
For the other designs, it ranged from 5-22% of core dynamic power.
All delay/power/functional characterization of circuits was done using STMicroelectronics 65nm models.

Want to understand the best design parameters for the GF.
The delay setting of each GF is constrained by physical limits.
An ideal GF delay line has infinite range (the delay can be as large as needed) and infinitesimal resolution (the GF delay can be set to any value).
A real GF delay line has finite range (the maximum delay is bounded) and finite resolution (GF delay values are quantized: 100 ps, 200 ps, ...).
Finite resolution and range limit the ability to optimize glitch power.
Increased range and/or finer resolution means increased area; GF area is dominated by the delay line.
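The effect of finite range and resolution can be sketched as a mapping from the delay an ideal line would use to the nearest setting a real line can provide. Rounding down (so the filter never consumes more slack than intended) is an assumption, as are the function names and the requirement that the range be a multiple of the resolution.

```python
def realizable_delay(ideal_ps, resolution_ps, max_ps):
    """Largest delay-line setting that does not exceed the ideal delay or the range."""
    clipped = min(ideal_ps, max_ps)
    return (clipped // resolution_ps) * resolution_ps

def num_settings(resolution_ps, max_ps):
    """Number of distinct delay values, including 0 (filter unused)."""
    return max_ps // resolution_ps + 1

examples = [
    realizable_delay(250, resolution_ps=100, max_ps=500),  # limited by resolution
    realizable_delay(370, resolution_ps=100, max_ps=300),  # limited by range
    realizable_delay(50, resolution_ps=100, max_ps=500),   # too small to realize
]
```

Halving the resolution step or doubling the range roughly doubles the number of settings the delay line must support, which is one way to see why both parameters drive area.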

Delay Line Parameters
Glitch power reduction ranges from ~62% to ~72% as delay-line granularity and range are varied.
An ideal delay line allows ~75% of glitch power to be removed.
Real parameters come very close to ideal.

Power vs Area Overhead
Used [5] to estimate LUT area and GF area.
Assume a LUT:routing area ratio of 25:75.

~2% area overhead allows ~62% glitch power reduction, which corresponds to a ~12% decrease in core dynamic power.
Up to ~72% glitch power reduction is possible at ~7% area cost, a ~14% core dynamic power reduction.
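As a consistency check on these figures, one can back out the share of core dynamic power that glitches must account for. This assumes the saving in core dynamic power is simply the glitch share multiplied by the glitch power reduction, with no other change; the function name is illustrative.

```python
def implied_glitch_share(core_dynamic_saving, glitch_reduction):
    """Glitch fraction of core dynamic power implied by the two reported figures."""
    return core_dynamic_saving / glitch_reduction

low_area_point = implied_glitch_share(0.12, 0.62)    # ~2% area overhead point
high_area_point = implied_glitch_share(0.14, 0.72)   # ~7% area overhead point
# Both evaluate to roughly 0.19, i.e. glitches are about a fifth of core dynamic power.
```

The two operating points imply nearly the same glitch share, so the reported glitch-power and core-dynamic-power reductions are mutually consistent under this assumption.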

For the designs studied in this work, the delay penalty is ~1%.
A 30 ps LUT delay increase may not correspond to a 1% critical path hit for other designs.
The penalty can be reduced with an improved GF topology.

Conclusion
Proposed a glitch reduction technique which overcomes some of the shortcomings of previous techniques.
Able to reduce glitch power by ~60-70% at an area overhead of ~2-3%, with critical path degradation of ~1%.
This corresponds to a total core dynamic power reduction of ~12-14%.
Future work:
Improve the CAD flow. The current assumptions/approach are inherently pessimistic; a model relating arrival time to glitch power could improve decision making significantly.
Investigate heuristics to replace the MILP formulation.
Assess multi-PVT-corner optimization of glitch power.

References
[1] Shum, W. and Anderson, J.H. "FPGA glitch power analysis and reduction." IEEE ISLPED 2011.
[2] Lamoureux, J. et al. "GlitchLess: Dynamic power minimization in FPGAs through edge alignment and glitch filtering." IEEE TVLSI 16.11 (2008): 1521-1534.
[3] Dinh, Q. et al. "A routing approach to reduce glitches in low power FPGAs." IEEE TCAD 29.2 (2010): 235-240.
[4] www.ecs.umass.edu/ece/tessier/rcg/benchmarks/
[5] Chiasson, C. and Betz, V. "COFFE: Fully-Automated Transistor Sizing for FPGAs." IEEE FPT 2013.

Backup Slides

Previous Approaches: Pipelining
[Figure: gate with inputs X1, X2 whose output Y is registered by a flip-flop before OUT; glitch of width t_g at Y]
Synthesis techniques to heavily pipeline [3].
Effective in removing glitches: they are blocked from propagating to the interconnect.
Potentially not suitable/applicable to all designs.

Previous Approaches: Synchronous Glitch Filtering
[Figure: datapath between flip-flops clocked by Φ_CLK, with an additional FF/latch clocked by Φ_s sampling intermediate node Y]
Add an additional FF or latch to sample intermediate points in the datapath using a clock with a different phase Φ_s [4].
Does not affect the behaviour of the original design.
Has limited scope to filter glitches; may not be applicable to many parts of a complex design.
Supplying Φ_s may be costly or not feasible.

Routing Delay Characteristics
Stratix-III routing delay characteristics extracted from a benchmark.
Shows that state- and temperature-dependent routing delay is highly unpredictable.

Some Implementation Details
[Figure: schematics of a current-starved (CS) delay cell and a conventional delay cell]
Two possible delay cell implementations are considered in this paper.
Current-starved (CS): potentially lower area overhead, at a cost of increased PVT sensitivity.
Conventional: less sensitive to PVT, greater area.

Additional Constraint
[Figure: timing graph with filter delays d_x, d_y and edge delays t_x, t_y from g_x, g_y into n_z]
Obtaining GP_v(t) is time consuming.
We would like to minimize the number of times we compute these. Ideally, compute only once!
The glitch profile is a function of the relative arrival times of signals at the inputs to combinational nodes.

What happens if $d_x \neq d_y$?
Relative arrival time at z: $(d_x + t_x) - (d_y + t_y) = (t_x - t_y) + (d_x - d_y) \neq (t_x - t_y)$
GP_z(t) is a function of the relative arrival times at z.

Setting $d_x \neq d_y$ makes GP_z(t) stale: it needs to be recomputed!
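A small numeric check of this relation, using made-up delays in picoseconds. The function names are illustrative.

```python
def relative_arrival(d_x, t_x, d_y, t_y):
    """Relative arrival time at z: (d_x + t_x) - (d_y + t_y)."""
    return (d_x + t_x) - (d_y + t_y)

def profile_is_stale(d_x, d_y):
    """GP_z(t) was measured with no filter delays; unequal delays invalidate it."""
    return d_x != d_y

T_X, T_Y = 300, 500   # edge delays into z, ps (assumed)

baseline = relative_arrival(0, T_X, 0, T_Y)         # no filtering
equal = relative_arrival(100, T_X, 100, T_Y)        # d_x == d_y
unequal = relative_arrival(200, T_X, 0, T_Y)        # d_x != d_y
```

Equal filter delays shift both inputs together and leave the relative arrival time at its baseline value, so the measured profile remains valid. Unequal delays change it by d_x − d_y.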

Our Approach
Only compute GP_v(t) once.
If the relative arrival times at v are altered, GP_v(t) is discarded; from the optimizer's point of view, v's power cannot be reduced.
Ensure that the relative arrival times of consequential nodes (the nodes whose power we want to reduce) are preserved.
Nodes whose worst-case glitch power is deemed insignificant are free to have their input relative arrival times altered.
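One way to picture this bookkeeping is a sketch in which a node's saving counts only while the filters on its fan-in all use the same delay. This is a simplification for illustration: it looks only at immediate fan-in (as in the three-node example), and the data structures, names, and numbers are hypothetical rather than taken from the authors' tool.

```python
def countable_saving(fanin, delays, saving):
    """Total glitch power saving the optimizer may claim.

    fanin:  node -> list of nodes whose filtered outputs feed it
    delays: node -> chosen glitch filter delay d_i (ps)
    saving: node -> glitch power (uW) its own filter would remove
    A node's saving is dropped if its fan-in filters have unequal delays,
    because its glitch profile is then treated as discarded.
    """
    total = 0
    for node, node_saving in saving.items():
        fanin_delays = {delays[source] for source in fanin.get(node, [])}
        if len(fanin_delays) <= 1:   # no fan-in filters, or all equal
            total += node_saving
    return total

FANIN = {"z": ["x", "y"]}
SAVING = {"x": 3, "y": 2, "z": 5}

preserved = countable_saving(FANIN, {"x": 100, "y": 100, "z": 200}, SAVING)
altered = countable_saving(FANIN, {"x": 200, "y": 100, "z": 200}, SAVING)
```

When z is a consequential node, the optimizer therefore has an incentive to keep d_x equal to d_y; when z's glitch power is insignificant, losing its saving costs little and the fan-in delays can be chosen freely.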

Optimization Problem (revisited)
[Figure: timing graph with glitch filter nodes g_x, g_y, g_z and a glitch power plot on each edge]
Objective: minimize P_total subject to timing constraints and soft arrival-time equalization constraints.