Probabilistic and Variation-Tolerant Design: Key to Continued Moore's Law
Tanay Karnik, Shekhar Borkar, Vivek De
Circuit Research, Intel Labs
Outline
  Variations: process, supply voltage, and temperature
  Impact of variations on circuits and microarchitecture
  Variation tolerance and reduction: process, circuit, and microarchitecture techniques
  Summary
P, V, T Variations
  Process: die-to-die variation and within-die variation; static for each die
  Device Ion: time-dependent degradation; very slow (years)
  Voltage: chip activity change and current delivery (RLC); dynamic, ns to 10-100 us; within-die variation
  Temperature: activity and ambient change; dynamic, 100-1000 us; within-die variation
Frequency & SD Leakage
  Figure: normalized frequency (0.9 to 1.4) versus normalized leakage Isb (0 to 20), 0.18 micron, ~1000 samples. Annotated spreads: 30% and 20X. Sample groups: low freq / low Isb, high freq / medium Isb, high freq / high Isb.
Vt Distribution
  Figure: number of chips versus VTn (mV, about -40 to +32), 0.18 micron, ~1000 samples, with a ~30 mV annotation. Sample groups: high freq / high Isb, high freq / medium Isb, low freq / low Isb.
Frequency Distribution
  Figure: number of chips versus normalized frequency (1.00 to 1.37). Sample groups: high freq / high Isb, high freq / medium Isb, low freq / low Isb.
Isb Distribution
  Figure: number of chips (1 to 100) versus normalized Isb (1.00 to 20.11). Sample groups: high freq / high Isb, high freq / medium Isb, low freq / low Isb.
Process variation impacts Fmax distribution
  Figure: cumulative distribution (%) versus normalized Fmax (0.7 to 1.4) for die-to-die (D2D) variation, within-die (WID) variation, and WID + D2D combined; the shift between the curves is annotated as the mean Fmax loss.
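The slide's point can be illustrated with a small Monte Carlo sketch. This is not the authors' model: it assumes normally distributed delay shifts, a nominal path delay of 1.0, and hypothetical values for `sigma_d2d`, `sigma_wid`, and the number of critical paths. Each die gets one shared D2D shift, each critical path gets its own WID shift, and the slowest path sets the die's Fmax.

```python
import random
import statistics

def simulate_fmax(n_dies, n_paths, sigma_d2d, sigma_wid, seed=0):
    """Return normalized Fmax per die; nominal path delay is 1.0 (assumed model)."""
    rng = random.Random(seed)
    fmax = []
    for _ in range(n_dies):
        d2d_shift = rng.gauss(0.0, sigma_d2d)  # shared by every path on the die
        # Each critical path gets its own independent within-die shift.
        worst = max(1.0 + d2d_shift + rng.gauss(0.0, sigma_wid)
                    for _ in range(n_paths))
        fmax.append(1.0 / worst)  # the slowest path sets the die frequency
    return fmax

# Hypothetical sigmas and path count, chosen only for illustration.
d2d_only = simulate_fmax(2000, 100, sigma_d2d=0.05, sigma_wid=0.0)
wid_and_d2d = simulate_fmax(2000, 100, sigma_d2d=0.05, sigma_wid=0.03)
mean_loss = 1.0 - statistics.mean(wid_and_d2d) / statistics.mean(d2d_only)
print(f"mean Fmax loss from WID variation: {mean_loss:.1%}")
```

In this toy model the D2D term only widens the distribution, while the WID term lowers its mean, because every die is limited by its worst path.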
WID variation scaling trends
  Figure (CD control): WID CD variation 3σ (% of nominal length, 0% to 15%) by technology generation (130nm, 90nm, 65nm).
  Figure (variation components): WID CD variation 3σ (nm, 0 to 8), split into systematic and random components, by technology generation (130nm, 90nm, 65nm).
  CD control stays at a roughly fixed % of nominal gate length
  Random variations become more dominant with scaling
Systematic vs. random variations
  Figure: impact on mean Fmax (µ), plotted as % mean Fmax loss (0% to 20%) versus logic depth (2 to 20; smaller logic depth corresponds to a deeper pipeline), for systematic + random variation and for random variation only.
  Deeper pipelining worsens the impact of random variation
  Total variation impact is insensitive to pipeline depth
Impact of steeper speedpath walls
  Figure: impact on mean Fmax (µ), plotted as % mean Fmax loss (0% to 20%) versus number of critical paths (10 to 10000).
  Variation impact increases as the number of critical paths increases
Supply Voltage Variation
  Figure: supply voltage (V) versus time (µsec), bounded by Vmax (reliability & power) and Vmin (frequency).
  Activity changes
  Current delivery: RI and L(di/dt) drops
  Dynamic: ns to 10-100 us
  Within-die variation
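As a worked example of the RI and L(di/dt) terms named on this slide, the sketch below adds the resistive drop and the inductive drop for a single load step. All component values are hypothetical and chosen only to make the arithmetic easy to follow; a real power delivery network has several RLC stages.

```python
def supply_droop_v(resistance_ohm, inductance_h, load_current_a,
                   current_step_a, step_time_s):
    """First-order droop estimate: R*I plus L*(di/dt), in volts."""
    ir_drop = resistance_ohm * load_current_a
    ldidt_drop = inductance_h * current_step_a / step_time_s
    return ir_drop + ldidt_drop

# Hypothetical values: 1 mOhm, 10 pH, 50 A load, 20 A step over 10 ns.
droop = supply_droop_v(resistance_ohm=1e-3, inductance_h=10e-12,
                       load_current_a=50.0, current_step_a=20.0,
                       step_time_s=10e-9)
print(f"estimated droop: {droop * 1e3:.0f} mV")
```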
Temperature Variation
  Figure: on-die temperature map (°C); cache at 70°C, core at 120°C.
  Activity and ambient change
  Dynamic: 100-1000 us
  Within-die variation
Impact on Path Delays
  Figure: path delay probability versus delay.
  Path delay variability is due to variations in Vcc, Vt, and temperature
  It impacts individual circuit performance and power
  Objective: full-chip performance, power, and yield
  Multivariable optimization of individual circuits: Vcc, Vt, size
  Optimize each circuit for full-chip objectives
Circuit Design Tradeoffs
  Figure: two panels plotting power and target frequency probability (normalized, 0 to 2) against transistor size (small to large) and against low-Vt usage (low to high).
  Higher probability of meeting the target frequency with:
    1. Larger transistor sizes
    2. Higher low-Vt usage
  But with a power penalty
Impact of Critical Paths
  Figure: number of dies (0% to 60%) versus clock frequency (0.9 to 1.5) for an increasing number of critical paths.
  Figure: mean clock frequency (1.0 to 1.4) versus number of critical paths (1 to 25).
  With an increasing number of critical paths:
    Both σ and µ become smaller
    Lower mean frequency
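The trend on this slide can be reproduced with a short simulation. The sketch below is an illustration under assumed conditions, not the authors' experiment: path delays are independent and normally distributed around 1.0 with a hypothetical `sigma_path`, and die frequency is the inverse of the slowest path delay.

```python
import random
import statistics

def fmax_stats(n_paths, n_dies=4000, sigma_path=0.05, seed=1):
    """Mean and standard deviation of die frequency for n_paths critical paths."""
    rng = random.Random(seed)
    freqs = []
    for _ in range(n_dies):
        # The slowest of the independent critical paths limits the die.
        slowest = max(rng.gauss(1.0, sigma_path) for _ in range(n_paths))
        freqs.append(1.0 / slowest)
    return statistics.mean(freqs), statistics.stdev(freqs)

# Path counts taken from the slide's axis (1 to 25).
results = {n: fmax_stats(n) for n in (1, 9, 17, 25)}
for n, (mu, sigma) in results.items():
    print(f"{n:2d} critical paths: mean Fmax = {mu:.3f}, sigma = {sigma:.4f}")
```

Taking the maximum over more paths pushes the expected worst-case delay up and narrows its spread, which is why both µ and σ of the frequency shrink.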
Impact of Logic Depth
  Figure: number of samples (%) versus variation (%, -16% to +16%) for NMOS and PMOS device Ion and for delay.
  Measured σ/µ at logic depth 16:
    NMOS Ion σ/µ   5.6%
    PMOS Ion σ/µ   3.0%
    Delay σ/µ      4.2%
  Figure: ratio of delay-σ to Ion-σ (0.0 to 1.0) versus logic depth (16 and 49).
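One way to see why delay variation is smaller than device variation, and why the ratio falls with logic depth, is a simple averaging model. The sketch below is an assumption, not the slide's measurement method: each gate delay has a systematic component shared by all gates on the path (`sigma_sys`) and an independent random component (`sigma_rand`), and single-gate delay variation stands in for device Ion variation. The sigma values are hypothetical.

```python
import math

def delay_to_gate_sigma_ratio(depth, sigma_sys, sigma_rand):
    """Ratio of path-delay sigma/mu to single-gate sigma/mu (assumed model)."""
    gate_sigma = math.hypot(sigma_sys, sigma_rand)
    # The shared systematic part does not average out; the random part
    # averages down with the number of gates in the path.
    path_sigma = math.sqrt(sigma_sys ** 2 + sigma_rand ** 2 / depth)
    return path_sigma / gate_sigma

for depth in (1, 16, 49):
    ratio = delay_to_gate_sigma_ratio(depth, sigma_sys=0.03, sigma_rand=0.04)
    print(f"logic depth {depth:2d}: path sigma/mu is {ratio:.2f}x the single-gate value")
```

Under this model the ratio approaches the systematic fraction as depth grows, so shallower logic (deeper pipelining) leaves more of the random variation exposed.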
µarchitecture Tradeoffs
  Figure: two panels plotting frequency and target frequency probability (normalized, 0 to 1.5) against logic depth (large to small) and against number of µarchitecture critical paths (less to more).
  Higher target frequency with:
    1. Shallow logic depth
    2. Larger number of critical paths
  But with lower probability
Forward Body Bias
  Router chip with body bias:
    Die size         10.1 x 10.1 mm2
    Technology       150nm CMOS
    Transistors      6.6 million
    Area overhead    2%
    Power overhead   1%
  Die photo labels: CBG, I/O: S-Links, Digital Core, 24 LBGs, PLL, I/O: F-Links.
  Figure: normalized operating frequency (0 to 1.5) versus forward body bias (0 to 600 mV) at 1.2V, 110 C, with 450 mV marked.
  Figure: Fmax (250 to 2000 MHz) versus Vcc (0.9 to 1.7 V) at Tj ~ 60 C, for the body bias chip with 450 mV FBB and for the NBB chip and the body bias chip with ZBB.
  Figure: Fmax (250 to 2000 MHz) versus active power (0 to 20 W) at Tj ~ 60 C, for the body bias chip with 450 mV FBB and with ZBB.
  FBB increases circuit frequency and SD leakage
Reverse Body Bias
  Figure: intrinsic leakage reduction factor (1X to 100X) versus target Ioff (0.01 to 1000 nA/µm) at 110C with 0.5V RBB; trend annotations: higher VT, shorter L, lower VT.
  Figure: ICC (1E-09 to 1E-05 A) versus reverse VBS (0 to 1.5 V) for Lnom and Lwc, 150nm, 27C, chip measurement.
  Figure: total leakage power measured on a 0.18µ test chip, for a microprocessor critical path circuit and an I/O circuit.
    Tech       Opt. RBB   Ioff reduction
    0.35 µm    2V         1000X
    0.18 µm    0.5V       10X
  RBB reduces SD leakage
  Less effective with shorter L, lower VT, and scaling
Adaptive Body Bias: Experiment
  Figure: 5.3 mm x 4.5 mm die with multiple subsites (1.6 x 0.24 mm each, 21 sites per die). Each subsite contains a circuit under test (CUT), delay element, phase detector (PD) and counter, resistor networks, and a bias amplifier.
    Technology                   150nm CMOS
    Number of subsites per die   21
    Body bias range              0.5V FBB to 0.5V RBB
    Bias resolution              32 mV
  Die frequency: Min(F1..F21)
  Die power: Sum(P1..P21)
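The die-level definitions on this slide (frequency is the minimum over the subsites, power is the sum) can be sketched together with a toy per-subsite bias search. The bias range and 32 mV resolution come from the slide; everything else is hypothetical: the linear frequency model (`k_freq`), the exponential power model (`k_leak`), and the rule of picking the most reverse bias that still meets the target are illustration only, not the authors' circuit or algorithm.

```python
import math

BIAS_STEP_V = 0.032   # bias resolution from the slide
BIAS_RANGE_V = 0.5    # 0.5 V FBB to 0.5 V RBB

def bias_settings():
    """Bias values in volts; positive = forward bias, negative = reverse bias."""
    n = int(BIAS_RANGE_V / BIAS_STEP_V)  # whole steps that fit on each side
    return [k * BIAS_STEP_V for k in range(-n, n + 1)]

def site_freq(f0, bias_v, k_freq=0.2):
    # Hypothetical model: frequency rises linearly with forward bias.
    return f0 * (1.0 + k_freq * bias_v)

def site_power(p0, bias_v, k_leak=4.0):
    # Hypothetical model: power grows exponentially with forward bias.
    return p0 * math.exp(k_leak * bias_v)

def pick_bias(f0, f_target):
    """Most reverse bias that still meets the target, or None if none does."""
    for b in bias_settings():  # ordered from most reverse to most forward
        if site_freq(f0, b) >= f_target:
            return b
    return None

def die_metrics(site_f0, site_p0, f_target):
    """Return (die frequency, die power) after per-site biasing, or None."""
    biases = [pick_bias(f0, f_target) for f0 in site_f0]
    if any(b is None for b in biases):
        return None  # at least one subsite cannot reach the target
    freqs = [site_freq(f0, b) for f0, b in zip(site_f0, biases)]
    powers = [site_power(p0, b) for p0, b in zip(site_p0, biases)]
    return min(freqs), sum(powers)  # Min(F1..Fn), Sum(P1..Pn)

# Three hypothetical subsites: one slow, one nominal, one fast.
print(die_metrics([0.95, 1.0, 1.05], [1.0, 1.0, 1.0], f_target=1.0))
```

In this sketch slow subsites receive forward bias and fast subsites receive reverse bias, which is the behavior the adaptive scheme relies on.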
Adaptive Body Bias: Results
  Figure: number of dies versus frequency around f target; dies that are too slow receive FBB and dies that are too leaky receive RBB under ABB.
  Figure: accepted die (0% to 100%) by frequency bin for no body bias, ABB, and within-die ABB; annotations: 100% yield, 97% in the highest bin.
  For a given frequency and power density:
    100% yield with ABB
    97% of dies in the highest frequency bin with ABB for within-die variability
Vcc Variation Reduction
  Figure: Vcc waveforms with die caps and without die caps.
  On-die decoupling capacitors reduce Vcc variation
  Cost: area and gate oxide leakage concerns
  On-die voltage down-converters and regulators
Adaptive supply voltage
  Figure: total power (normalized) and standby leakage power (normalized) versus frequency (normalized, 0.85 to 1.1) and switched capacitance (normalized). Annotations recoverable from the extraction: 1.05V, 110 C, 40 C, α: 0.03, 10 W/cm2, 0.5 W/cm2.
  Figure: die count (0% to 100%) by frequency bin (0.9 to 1.05) for fixed supply and adaptive supply.
  Adaptive supply improves the frequency bin split under a maximum power constraint
Adaptive supply & body bias
  Figure: die count (0% to 100%) by frequency bin (0.9 to 1.05) for adaptive supply, adaptive body bias, and adaptive supply + body bias. Annotated percentages (assignment to bars not recoverable from the extraction): 74%, 79%, 79%, 71%.
  Figure: PMOS body bias (V, -0.4 to 0.4) versus NMOS body bias (V, -0.4 to 0.4) for adaptive VBS and for adaptive VDD + VBS, with quadrants labeled P FBB / N RBB, P FBB / N FBB, P RBB / N RBB, and P RBB / N FBB. Annotated percentages: 2%, 25%.
Temperature Control
  Figure: temperature versus time (usec), with Tmax set by frequency & power and a throttle event marked.
  When temperature exceeds the threshold:
    1. Lower frequency (activity)
    2. Lower Vcc
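The two throttle actions listed on this slide can be sketched as a single control step. The scale factors and the 100 C threshold below are hypothetical, and applying both actions in one step is an assumption; the power estimate uses the standard first-order relation that dynamic power scales with frequency times Vcc squared.

```python
def throttle_step(temp_c, freq, vcc, t_max_c, freq_scale=0.9, vcc_scale=0.95):
    """Return the (freq, vcc) to apply next; throttle only above the threshold."""
    if temp_c <= t_max_c:
        return freq, vcc  # within limits: leave the operating point alone
    # Over temperature: lower frequency (activity) and lower Vcc.
    return freq * freq_scale, vcc * vcc_scale

def relative_dynamic_power(freq, vcc):
    """Dynamic power relative to freq = vcc = 1.0, using P ~ f * Vcc^2."""
    return freq * vcc ** 2

freq, vcc = throttle_step(temp_c=110.0, freq=1.0, vcc=1.0, t_max_c=100.0)
print(f"throttled: freq={freq:.2f}, vcc={vcc:.2f}, "
      f"dynamic power={relative_dynamic_power(freq, vcc):.2f}")
```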
Impact on Design Methodology
  Figure: path delay probability versus delay, with variability due to variations in Vdd, Vt, and temperature.
  Figure: number of paths versus delay relative to the delay target, for deterministic and probabilistic design.
  Figure: frequency versus leakage power for deterministic and probabilistic design; annotations: 10X variation, ~50% of total power.
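To make the deterministic versus probabilistic contrast concrete, the sketch below computes the probability that paths meet a delay target instead of giving a single pass/fail answer. It assumes normally distributed, independent path delays, which is a simplification; the mean, sigma, target, and path count are hypothetical.

```python
from statistics import NormalDist

def timing_yield(mean_delay, sigma_delay, target_delay, n_paths=1):
    """Probability that n_paths independent paths all meet the delay target."""
    p_path = NormalDist(mean_delay, sigma_delay).cdf(target_delay)
    return p_path ** n_paths

# A deterministic check would say a 1.00 nominal delay meets a 1.10 target.
one_path = timing_yield(1.00, 0.05, 1.10)
hundred_paths = timing_yield(1.00, 0.05, 1.10, n_paths=100)
print(f"one path: {one_path:.3f}, 100 such paths: {hundred_paths:.3f}")
```

A path that passes with margin in a deterministic analysis can still fail on a meaningful fraction of dies once many similar paths share the same target.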
Limitations of deterministic design
  Figure (multi-Vt or Le devices): path count (10 to 10000, log scale) versus path delay (0.95 to 1.00) for multi-Vt or Le and single-Vt designs; annotation: 10X.
  Figure (transistor sizing): path count versus path delay relative to the target, for optimally sized and oversized designs.
                          Power (cost)   Die area (cost)   Mean Fmax loss
    Nominally optimized   Low            Low               High
    Variation tolerant    High           High              Low
Probabilistic design concept
  Inputs: WID CD variations (systematic variations), WID drive current profile, WID leakage current profile, WID temperature profile, node activity data
  Design steps: size transistors; assign Vt or Le
  Probabilistic optimization metric to maximize:
    (expected mean Fmax)^α × 1 / (switching power + expected leakage power)^β × 1 / (die area)^γ
  Flow: best pre-silicon design (estimated), then post-silicon tuning, then final design (actual)
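The optimization metric on this slide can be written as a small function for comparing candidate designs. The form below follows a reading of the slide text (expected mean Fmax raised to α, divided by total power raised to β and die area raised to γ); the exact exponent placement, the candidate numbers, and the choice of weights are assumptions for illustration.

```python
def design_metric(expected_fmax, switching_power, expected_leakage_power,
                  die_area, alpha=1.0, beta=1.0, gamma=1.0):
    """Fmax^alpha / ((switching + leakage power)^beta * area^gamma)."""
    total_power = switching_power + expected_leakage_power
    return expected_fmax ** alpha / (total_power ** beta * die_area ** gamma)

# Hypothetical candidates: (expected Fmax, switching power, leakage power, area).
designs = {
    "nominally optimized": (0.90, 0.70, 0.30, 1.00),
    "variation tolerant": (0.98, 0.80, 0.35, 1.10),
}
for alpha in (1.0, 4.0):
    scores = {name: design_metric(*d, alpha=alpha) for name, d in designs.items()}
    best = max(scores, key=scores.get)
    print(f"alpha={alpha}: best design is {best}")
```

Which design wins depends on the weights: with these numbers, weighting Fmax more heavily favors the variation-tolerant design despite its higher power and area.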
Active input offset compensation circuit
  Figure: circuit schematic with input Vin, differential currents Idif1 and Idif2, branch currents Id1 and Id2, inputs inn and inp, control bits d0 to d7, bias nodes, and bus labels <254:127> and <0>.
  Simple structure
  Programmable output voltage with 8-bit high resolution
Generated input voltage offsets
  Figure: generated offset Vos (mV, about -270 to 280) versus digital input offset calibration code (h00 to hff, midpoint h7f).
  Voltage offset range: -200 mV to 200 mV
  Voltage offset resolution: 1.56 mV per bit
  Active input voltage offset compensation circuit
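The quoted resolution is consistent with an 8-bit code spanning the 400 mV range: 400 mV / 256 = 1.5625 mV per step. The sketch below shows that arithmetic with a linear code-to-offset mapping; the linearity and the direction of the mapping (code h00 at the negative end) are assumptions, since the slide text does not state them.

```python
def offset_lsb_mv(v_min_mv=-200.0, v_max_mv=200.0, bits=8):
    """Offset step per code, in mV, for a range split into 2**bits steps."""
    return (v_max_mv - v_min_mv) / (1 << bits)

def offset_mv(code, v_min_mv=-200.0, v_max_mv=200.0, bits=8):
    """Assumed linear mapping from calibration code to offset voltage in mV."""
    if not 0 <= code < (1 << bits):
        raise ValueError("code out of range")
    return v_min_mv + code * offset_lsb_mv(v_min_mv, v_max_mv, bits)

print(f"step: {offset_lsb_mv()} mV")
for code in (0x00, 0x80, 0xFF):
    print(f"code h{code:02x}: {offset_mv(code)} mV")
```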
Major Paradigm Shift
  From deterministic design to probabilistic and statistical design
    A path delay estimate is probabilistic (not deterministic)
  Multi-variable design optimization for:
    Yield and bin splits
    Parameter variations
    Active and leakage power
    Performance
  Post-silicon offset cancellation
Summary
  Parameter variations will become worse with technology scaling
  Robust, variation-tolerant circuits and microarchitectures are needed
  Multi-variable design optimizations must consider parameter variations
  Major shift from deterministic to probabilistic design