Towards PVT-Tolerant Glitch-Free Operation in FPGAs. Safeen Huda and Jason H. Anderson, ECE Department, University of Toronto, Canada. 24th ACM/SIGDA International Symposium on FPGAs, February 22, 2016.
Motivation: FPGA power is increasingly critical because of new markets: data centers and mobile electronics. Previous studies have shown that glitch power is a significant contributor to overall dynamic power consumption. Shum et al. [1]: ~26% of core dynamic power. Lamoureux et al. [2]: ~30% of signal transitions are due to glitches. [Image: Google data centre]
Reviewing: Unequal arrival times on the input pins of combinational logic may result in spurious transitions. These spurious transitions dissipate power. [Figure: logic gate with inputs X1, X2 and output Y]
Previous Approaches: Path Delay Balancing. Add delay to fast-arriving inputs to equalize the delay difference. Delay balancing circuitry may have low overhead [2,3]. [Figure: logic gate with inputs X1, X2 and output OUT, showing inserted delay $\Delta t$ and glitch width $t_g$]
Delay Predictability of FPGA Circuitry: FPGAs consist of CMOS and pass-gate based circuitry. CMOS gates: rise/fall delay is inversely proportional to $\mu$ and $(V_{DD} - V_T)$. NMOS pass-gates: fall delay is similar to CMOS; rise delay has an inverse quadratic relationship with $(V_{DD} - V_T)$. [Figure: pass-transistor mux driving a CMOS buffer, transistors M1 to M7]
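To make the two proportionalities above concrete, here is a minimal Python sketch. It assumes only the first-order relationships stated on the slide; all constants are normalised to 1, and the 10% loss of overdrive is a hypothetical figure chosen for illustration, not a measured value.

```python
def cmos_delay(overdrive, mobility=1.0):
    """Relative CMOS rise/fall delay: proportional to 1 / (mu * (V_DD - V_T))."""
    return 1.0 / (mobility * overdrive)


def pass_gate_rise_delay(overdrive):
    """Relative NMOS pass-gate rise delay: proportional to 1 / (V_DD - V_T)^2."""
    return 1.0 / overdrive ** 2


nominal = 1.0  # normalised overdrive V_DD - V_T (assumed unit)
reduced = 0.9  # hypothetical 10% loss of overdrive

cmos_ratio = cmos_delay(reduced) / cmos_delay(nominal)
pass_ratio = pass_gate_rise_delay(reduced) / pass_gate_rise_delay(nominal)
print(f"CMOS delay grows {cmos_ratio:.3f}x, pass-gate rise delay grows {pass_ratio:.3f}x")
```

Under these assumptions the same change in overdrive moves the pass-gate rise delay more than the CMOS delay, which is one way to see why a delay balance tuned at one operating point need not hold at another.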
Delay Predictability of FPGA Circuitry: The two styles of circuitry have different sensitivities to temperature. Rise/fall delay balance is hard to ensure with NMOS pass-gates.
LUT Delay Characteristics: Stratix-III LUT delays for each input at different temperatures. Inputs A and B have negative temperature dependence and a large rise/fall delay imbalance. Inputs C to F have different delay characteristics. [Figure: delay (ps, axis from 0 to 1000) versus LUT input (A to F), showing rise and fall delay at 0 °C and 85 °C]
Limitation of Path Delay Balancing: It is difficult to balance input delays over all logic states and operating conditions! [Figure: logic gate with inputs X1, X2 and output OUT at 0 °C and at 85 °C, with $\Delta t$ set to $t_g$ in each case]
Proposed Glitch Reduction Technique. Want: reliable glitch reduction over all PVT corners; a glitch reduction technique applicable over a wide range of circuits; low cost. This work: replace the glitch filtering FF/latch with an asynchronous element. This eliminates the cost and granularity problems of the synchronous technique. [Figure: datapath between two flip-flops clocked by $\Phi_{CLK}$, with a glitch filter (GF) between node Y and OUT]
Asynchronous Glitch Filtering Circuitry: The output stage is disconnected from the input immediately following a transition at the input stage. The transition must propagate through the delay line before being sampled by the output. [Figure: circuit with an input stage (IN), a glitch filtering stage (delay line and gating circuitry) and an output stage (OUT)]
Asynchronous Glitch Filtering Circuitry: If the input is a pulse with width $t_{pw} < t_d$ (the delay-line delay), it will not appear at the output. A programmable delay line allows glitches of varying widths to be filtered, which implies better tolerance to PVT.
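The filtering rule above can be sketched as a small behavioural model. This is an illustrative inertial-delay model in Python, not the authors' circuit or their simulation model. The function name `filter_glitches`, the waveform encoding as `(time_ps, value)` pairs, and the choice to pass pulses of width exactly `t_d` are all assumptions made for the example.

```python
def filter_glitches(initial, transitions, t_d):
    """Behavioural sketch of a glitch filter with delay-line delay t_d.

    transitions: list of (time_ps, value) input changes, times strictly increasing.
    An input change reaches the output, delayed by t_d, only if the input then
    holds its new value for at least t_d; shorter pulses are dropped.
    """
    out = []
    out_val = initial
    for i, (t, v) in enumerate(transitions):
        t_next = transitions[i + 1][0] if i + 1 < len(transitions) else float("inf")
        if t_next - t >= t_d and v != out_val:
            out.append((t + t_d, v))
            out_val = v
    return out


# A 50 ps glitch at 100 ps, followed by a genuine 500 ps pulse at 400 ps.
wave = [(100, 1), (150, 0), (400, 1), (900, 0)]
print(filter_glitches(0, wave, t_d=200))
```

With `t_d = 200`, the 50 ps pulse is suppressed and the genuine pulse appears at the output 200 ps late, which is the delay penalty the filter incurs when in use.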
Asynchronous Glitch Filtering Circuitry: Cost of glitch filtering: when used, there is a delay penalty $t_D$ (hundreds of ps to the ns range). The filter therefore cannot be applied to all paths; use it only to fill slack.
Asynchronous Glitch Filtering Circuitry: Cost of glitch filtering: when unused, the delay penalty is ~30 ps (can be much smaller with an improved topology).
Asynchronous Glitch Filtering Circuitry: Cost of glitch filtering: area is small (will be shown later) and is dominated by delay-line range and granularity.
Proposed Architecture: Augment the BLE output buffer with asynchronous glitch filtering circuitry (GFC). The GFC's programmable delay line allows a glitch suppression/slack trade-off to be made at each BLE output. The GFC's delay line setting is determined by the CAD flow. [Figure: BLE containing a K-LUT, a flip-flop and an output buffer augmented with glitch filtering circuitry (GFC)]
CAD Flow: Uses Quartus for compilation and power analysis, Modelsim for timing and functional analysis, and custom routines for extraction of glitch power statistics and for optimization. [Figure: flow diagram. Benchmark; Quartus compilation (map, fit, asm, sta, eda); post-routing delay-annotated netlist; functional and timing simulation (Modelsim, .vcd files); power analysis of each simulation (PowerPlay); diff giving glitch power data and glitch statistics for each node in the circuit; glitch filter setting optimization; final power analysis (Modelsim timing simulation with glitch filter settings plus Quartus PowerPlay); power savings]
CAD Flow: Compile the design with Quartus and output a delay-annotated netlist.
CAD Flow: Run timing and functional simulations with Modelsim.
CAD Flow: Use Quartus to analyse power based on the timing and functional simulations. The difference between the two simulations corresponds to glitch power.
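The subtraction described above can be sketched in a few lines of Python. This is not the authors' tooling: the per-node dictionaries, the node names and the microwatt values are hypothetical stand-ins for whatever the power reports actually contain.

```python
def glitch_power(timing_power, functional_power):
    """Per-node glitch power: timing-simulation power minus functional-simulation power.

    The functional simulation has no delays and hence no glitches, so the
    difference is attributed to glitching.
    """
    return {
        node: timing_power[node] - functional_power.get(node, 0)
        for node in timing_power
    }


# Hypothetical per-node dynamic power in microwatts.
timing = {"n1": 50, "n2": 30}
functional = {"n1": 35, "n2": 30}
print(glitch_power(timing, functional))
```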
CAD Flow: Collect glitch statistics for each node in the circuit: the number of glitches for each pulse width, and the glitch power dissipated at each node.
CAD Flow: Optimize power given the glitch statistics and the timing graph.
CAD Flow: Append a behavioural model of the GF to the netlist. Run the timing simulation with the glitch filter settings. Run power analysis with the new netlist and compare with the previous result to obtain the power savings.
Glitch Filter Setting Opt.: Given a timing graph for a circuit, with nodes $n_i$ representing combinational gates, FFs, PIs, POs, etc., and edges annotated with delays. [Figure: timing graph with nodes $n_x$ and $n_y$ feeding $n_z$]
Glitch Filter Setting Opt.: Annotate edges with glitch power density functions $GP_i(t)$. These can be obtained through a combination of functional, timing, and power analyses. [Figure: the timing graph with a plot of $p$ versus $t$ on each edge]
Glitch Filter Setting Opt.: Append nodes $g_i$ to the outputs of nodes $n_i$; these represent glitch filters. The delay from $n_i$ to $g_i$ corresponds to the glitch filter setting at $i$. We shall call this delay $d_i$.
Glitch Filter Setting Opt.: For every choice of $d_i$ we get a corresponding glitch power decrease at node $i$: all glitches with pulse width less than $d_i$ are suppressed. We cannot set $d_i$ arbitrarily. It must obey timing constraints, and certain settings of $d_i$ may have adverse effects, potentially creating new glitches downstream. The optimization tool must be aware of this and avoid such settings where needed.
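As a minimal illustration of how a setting $d_i$ maps to a power decrease, the sketch below treats $GP_i(t)$ as a discrete histogram keyed by pulse width. The histogram representation, the function name `glitch_power_saved` and the numbers are assumptions for the example; the slides do not specify how $GP_i(t)$ is stored.

```python
def glitch_power_saved(gp, d):
    """Glitch power removed at one node by a filter setting d.

    gp: dict mapping glitch pulse width (ps) to the power dissipated by
        glitches of that width (a discretised stand-in for GP_i(t)).
    Every glitch with pulse width strictly less than d is suppressed.
    """
    return sum(power for width, power in gp.items() if width < d)


# Hypothetical histogram: power (arbitrary units) per glitch pulse width in ps.
gp_i = {100: 4, 200: 3, 400: 2}
for d in (100, 250, 500):
    print(d, glitch_power_saved(gp_i, d))
```

The saving grows in steps as `d` passes each populated pulse width, so increasing a setting only helps when it crosses a width at which glitches actually occur.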
Glitch Filter Setting Opt.: The optimization problem can be described as a set of linear equations with linear constraints. Cast the problem as a Mixed Integer Linear Program (MILP) and solve it using the Gurobi optimizer tool. [Figure: flow from the delay-annotated netlist and the glitch statistics $GP_i(t)$, through "Formulate MILP problem" and an .lp file, to the Gurobi Optimizer, which outputs the glitch filter settings. The sample .lp file shown reads: Maximize x + y + z; Subject To c0: x + y = 1; c1: x + 5 y + 2 z <= 10; qc0: x + y + [ x ^ 2 - 2 x * y + 3 y ^ 2 ] <= 5; Bounds 0 <= x <= 5, z >= 2; Generals x y z]
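To show the shape of the problem without depending on a solver, the sketch below swaps the MILP for an exhaustive search over a tiny instance. It is not the authors' formulation: the path-slack constraint (sum of filter delays on a path must not exceed that path's slack), the data layout and all numbers are assumptions, and the constraints about creating new glitches downstream are left out.

```python
from itertools import product


def best_settings(gp, paths, settings):
    """Exhaustively pick one delay setting per node to maximise glitch power saved.

    gp:       {node: {pulse_width_ps: power}} per-node glitch histograms
    paths:    list of (nodes_on_path, slack_ps); added delay must fit in the slack
    settings: allowed delay-line values in ps (0 means the filter is unused)
    """
    nodes = sorted(gp)
    best, best_saved = None, -1
    for choice in product(settings, repeat=len(nodes)):
        d = dict(zip(nodes, choice))
        # Timing constraint: total added delay along each path within its slack.
        if any(sum(d[n] for n in path) > slack for path, slack in paths):
            continue
        saved = sum(p for n in nodes for w, p in gp[n].items() if w < d[n])
        if saved > best_saved:
            best, best_saved = d, saved
    return best, best_saved


gp = {"x": {100: 5, 200: 1}, "z": {100: 2, 200: 4}}
paths = [(["x", "z"], 400)]  # hypothetical path through x and z with 400 ps slack
print(best_settings(gp, paths, settings=[0, 150, 250]))
```

Exhaustive search scales exponentially with the number of nodes, which is why a real flow would hand the same objective and constraints to a MILP solver or a heuristic.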
Experimental Study: Used MCNC benchmarks to assess power reduction possibilities. It is easy to use random vectors to perform power analysis on them; 10000 random vectors were supplied for timing/functional simulation. As previously reported, glitches account for ~26% of core dynamic power [1]. Also used the benchmark set from UMass [4], a communications/signal processing benchmark set of 6 designs. Glitch power was insignificant for one design (ava); for the other designs, it ranged from 5% to 22% of core dynamic power. All delay/power/functional characterization of circuits was done using STMicroelectronics 65nm models.
Experimental Study: We want to understand the best design parameters for the GF. The delay setting of each GF is constrained by physical limits. An ideal GF delay line has infinite range (the delay can be as large as needed) and infinitesimal resolution (the GF delay can be set to any value). A real GF delay line has finite range (the maximum delay is bounded) and finite resolution (GF delay values are quantized: 100 ps, 200 ps, ...). Finite resolution and range limit the ability to optimize glitch power. Increasing the range and/or making the resolution finer increases area, because GF area is dominated by the delay line.
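The effect of finite range and resolution on a desired filter delay can be sketched as follows. The function name, the round-down policy (chosen here so the realised delay never exceeds what the timing slack allowed) and the example values are assumptions, not details taken from the slides.

```python
def quantize_delay(ideal_ps, resolution_ps, max_ps):
    """Map an ideal filter delay onto a real delay line.

    The delay is first clipped to the line's range, then rounded down to a
    multiple of the resolution. Integer picoseconds are assumed throughout.
    """
    clipped = min(ideal_ps, max_ps)
    return (clipped // resolution_ps) * resolution_ps


# Hypothetical delay line: 100 ps steps, 800 ps maximum.
for ideal in (90, 370, 1250):
    print(ideal, "->", quantize_delay(ideal, resolution_ps=100, max_ps=800))
```

A glitch slightly wider than the realised setting escapes the filter, which is how coarse resolution or short range translates into lost power savings.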
Delay Line Parameters: Glitch power reduction ranges from ~62% to ~72% as delay line granularity and range are varied. An ideal delay line allows ~75% of glitch power to be removed, so real parameters come very close to ideal.
Power vs Area Overhead: Used [5] to estimate LUT area and GF area. Assume LUT:Routing = 25:75.
Power vs Area Overhead: ~2% area overhead allows ~62% glitch power reduction, which corresponds to a ~12% decrease in core dynamic power. Up to ~72% glitch power reduction is possible at ~7% area cost, giving a ~14% core dynamic power reduction.
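As a consistency check on the two operating points above, the ratio of the core dynamic power reduction to the glitch power reduction gives the implied share of glitch power in core dynamic power. This share is derived here from the rounded figures on the slide; it is not a number the slides state.

```latex
f_{\text{glitch}} \approx \frac{\Delta P_{\text{dyn}}}{\Delta P_{\text{glitch}}}
  = \frac{0.12}{0.62} \approx 0.19,
\qquad
\frac{0.14}{0.72} \approx 0.19
```

Both points imply roughly the same share, about 19%, which lies between the 5% to 22% range reported for the UMass designs and the ~26% reported for MCNC.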
Power vs Area Overhead: For the designs studied in this work, the delay penalty is ~1%. The 30 ps LUT delay increase may not correspond to a 1% critical path hit for other designs. It can be reduced with an improved GF topology.
Conclusion: Proposed a glitch reduction technique which overcomes some of the shortcomings of previous techniques. It is able to reduce glitch power by ~60-70% at an area overhead of ~2-3%, with critical path degradation of ~1%. This corresponds to a total core dynamic power reduction of ~12-14%. Future work: improve the CAD flow, which has inherently pessimistic assumptions and a pessimistic approach (a model relating arrival time to glitch power could improve decision making significantly); investigate heuristics to replace the MILP formulation; assess multi-PVT-corner optimization of glitch power.
References
[1] Shum, W. and Anderson, J.H. "FPGA glitch power analysis and reduction." IEEE ISLPED 2011.
[2] Lamoureux, J. et al. "GlitchLess: Dynamic power minimization in FPGAs through edge alignment and glitch filtering." IEEE TVLSI 16.11 (2008): 1521-1534.
[3] Dinh, Q. et al. "A routing approach to reduce glitches in low power FPGAs." IEEE TCAD 29.2 (2010): 235-240.
[4] www.ecs.umass.edu/ece/tessier/rcg/benchmarks/
[5] Chiasson, C. and Betz, V. "COFFE: Fully-Automated Transistor Sizing for FPGAs." IEEE FPT 2013.
Backup Slides
Previous Approaches: Pipelining. Synthesis techniques to heavily pipeline the design [3]. Effective in removing glitches, which are blocked from propagating to the interconnect. Potentially not suitable/applicable to all designs. [Figure: logic gate with inputs X1, X2 and output Y, followed by a flip-flop driving OUT]
Previous Approaches: Synchronous Glitch Filtering. Add an additional FF or latch to sample intermediate points in the datapath using a clock with a different phase $\Phi_s$ [4]. This does not affect the behaviour of the original design. It has limited scope to filter glitches and may not be applicable to many parts of a complex design. Supplying $\Phi_s$ may be costly or not feasible. [Figure: datapath between flip-flops clocked by $\Phi_{CLK}$, with an additional flip-flop clocked by $\Phi_s$ between node Y and OUT]
Routing Delay Characteristics: Stratix-III routing delay characteristics extracted from a benchmark. They show that the routing-state- and temperature-dependent delay is highly unpredictable.
Some Implementation Details: Two possible delay cell implementations are considered in this paper. The current-starved (CS) cell potentially has lower area overhead, at a cost of increased PVT sensitivity. The conventional cell is less sensitive to PVT but has greater area. [Figure: transistor-level schematics of the CS delay cell and the conventional delay cell]
Additional Constraint: Obtaining $GP_v(t)$ is time consuming, so we would like to minimize the number of times we compute it. Ideally, compute it only once! The glitch profile is a function of the relative arrival times of signals at the inputs to combinational nodes. [Figure: timing graph with filter delays $d_x$, $d_y$ and arrival times $t_x$, $t_y$ on the inputs of $n_z$]
Additional Constraint: What happens if $d_x \neq d_y$? The relative arrival time at $z$ becomes $(d_x + t_x) - (d_y + t_y) = (t_x - t_y) + (d_x - d_y) \neq (t_x - t_y)$. $GP_z(t)$ is a function of the relative arrival times at $z$, so setting $d_x \neq d_y$ makes $GP_z(t)$ stale and it would need to be recomputed!
Our Approach: Only compute $GP_v(t)$ once. If the relative arrival times at $v$ are altered, $GP_v(t)$ is discarded; from the optimizer's point of view, $v$'s power cannot be reduced. Ensure that the relative arrival times of consequential nodes (the nodes whose power we want to reduce!) are preserved. Nodes whose worst-case glitch power is deemed to be insignificant are free to have their input relative arrival times altered.
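The staleness rule can be sketched as a simple check over the timing graph. This is an illustrative simplification, not the authors' implementation: it looks only at the filters directly driving each node, as in the two-input example, and ignores delay shifts accumulated further upstream. The names `stale_nodes`, `fanin` and `d` are hypothetical.

```python
def stale_nodes(fanin, d):
    """Return the nodes whose glitch profile would be discarded.

    fanin: {node: [driver nodes]} for combinational nodes
    d:     {driver: filter delay in ps}; drivers without a filter count as 0
    A node keeps its profile only if all of its drivers received the same
    added delay, so that relative arrival times at its inputs are unchanged.
    """
    return {
        v for v, drivers in fanin.items()
        if len({d.get(u, 0) for u in drivers}) > 1
    }


fanin = {"z": ["x", "y"], "w": ["z"]}
print(stale_nodes(fanin, {"x": 200, "y": 100, "z": 100}))  # z sees unequal delays
print(stale_nodes(fanin, {"x": 200, "y": 200, "z": 100}))  # nothing is stale
```

An optimizer following the approach above would either forbid unequal settings on the drivers of a consequential node or accept them and give up any claimed saving at that node.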
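Optimization Problem (revisited): Objective: minimize $P_{total}$ subject to timing constraints and soft arrival time equalization constraints. [Figure: timing graph with nodes $n_x$, $n_y$, $n_z$, glitch filter nodes $g_x$, $g_y$, $g_z$ and a plot of $p$ versus $t$ on each edge]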