Incorporating Variability into Design
Jim Farrell, AMD
Designing Robust Digital Circuits Workshop, UC Berkeley
28 July 2006
Outline
o Motivation
o Hierarchy of design tradeoffs
o Design infrastructure for variability
o Circuit techniques for variability
o Architectural techniques for variability
o Conclusion
Motivation
o Beyond basic yield, variation costs frequency, which reduces revenue.
o Example of how we might partition a lot into frequency bins:
[Figure: a lot plotted on Fmax and Sidd axes, partitioned into Top Bin, Mid Bin, and Bottom Bin]
How Variability Affects Circuit Delay
o Which components are more affected by variations: LARGE gates or SMALL gates?
o Example: inverter chain with different inverter sizes
[Figure: Large Gates vs. Small Gates; delay variation (%) vs. gate width (um)]

Gate Width (um)   0.24   0.5   1.0   1.5   2.0   3.0
Variation (%)     35     29    26    20    20    20

o Small gates are affected most: delay variation falls from 35% at 0.24 um to 20% at 1.5 um and above.
Impact of Variation on Leakage Power
[Figure: leakage device distribution and leakage distribution vs. relative speed / part frequency, from -3 sigma through nominal to +3 sigma]

$$P \approx C_{TOT}\,\alpha F V_{dd}^{2} + N_{TOT}\,\alpha F V_{dd} I_{CO} + N_{ON} I_{LEAK} V_{dd}$$

The three terms are switching power, crossover power, and leakage power, respectively.
o Selective use of high-Vt and longer-channel devices shifts the leakage profile to lower values
o But poly bias is limited in CPP (contacted poly pitch)
From Naffziger, 2006 VLSI Ckt Symp
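The power expression on this slide can be evaluated term by term. The following is a minimal sketch, not taken from the cited paper: the function name and every numeric value are hypothetical, and I_CO is treated as crossover charge per transition (in coulombs) so that the middle term comes out in watts, which is an assumption about the slide's notation.

```python
def power_components(c_tot, n_tot, n_on, alpha, freq, vdd, i_co, i_leak):
    """Return (switching, crossover, leakage) power in watts.

    c_tot  : total switched capacitance (F)
    n_tot  : total device count
    n_on   : number of devices contributing leakage
    alpha  : activity factor
    freq   : clock frequency (Hz)
    vdd    : supply voltage (V)
    i_co   : crossover charge per transition (C), an assumed interpretation
    i_leak : leakage current per device (A)
    """
    switching = c_tot * alpha * freq * vdd ** 2
    crossover = n_tot * alpha * freq * vdd * i_co
    leakage = n_on * i_leak * vdd
    return switching, crossover, leakage


# Hypothetical part: all values below are illustrative only.
nominal = power_components(c_tot=50e-9, n_tot=1e8, n_on=5e7, alpha=0.1,
                           freq=2e9, vdd=1.0, i_co=5e-17, i_leak=1e-7)

# Same part with per-device leakage doubled, e.g. a fast, short-channel part.
leaky = power_components(c_tot=50e-9, n_tot=1e8, n_on=5e7, alpha=0.1,
                         freq=2e9, vdd=1.0, i_co=5e-17, i_leak=2e-7)

for name, parts in (("nominal", nominal), ("leaky", leaky)):
    total = sum(parts)
    print(f"{name}: total {total:.2f} W, leakage share {parts[2] / total:.0%}")
```

Only the leakage term moves between the two cases, which is why a shift in the device distribution shows up directly in the leakage distribution.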
Limits to Poly Biasing for Leakage Variation Reduction
o Contacted poly pitch includes:
  o PC width
  o CA width
  o Spacer and overlay tolerances
o Competing requirements for:
  o CA resistance
  o CA-PC capacitance
  o CA-PC breakdown voltage
  o And maximum poly bias for leakage
[Figure: layout view with CA, PC, and RX labeled]
Infrastructure to Reduce Variability
o This is the boring stuff
o Classic engineering isolation of problem areas
  o Is this a problem that should be addressed?
  o Simulation, data, and judgment
o Critical work at the beginning of the project
o Examples:
  o Contact vs. salicide resistance
  o Feedback sizing in M/S flops
Infrastructure: Transistor R,C Model
o Question: do I care about the variation in contact or salicide resistance?
[Figure: transistor R,C model built from Rc and Rdiff elements]
Comparing Source vs. Contact Effect on Transistor Delay
[Figure: normalized delay difference (%), 0 to 11, vs. N width; series Rc(min)-Rc(0), Rc(nom)-Rc(0), Rc(max)-Rc(0), Ra(min)-Ra(0), Ra(nom)-Ra(0), Ra(max)-Ra(0)]
o Contact resistance dominates over source resistance
o Conclusion: concentrate on minimizing the variation impact of contacts
Choices for M/S Flops: Clocked Feedback Is Safer for Variation
[Figure: M/S flop schematic with Din, CLK, and Qb]
o The ability to use ratioed feedback is severely compromised by the writeability vs. noise margin tradeoff
Consider Variation in the Design of Flops
[Figure: flop schematic with Din, CLK, and Qb]
o A noise pulse on the clock requires analysis of keeper device sizing vs. the data path
o Reducing keeper stack sizing by half for clock power reduces noise margin from 14% to 9% at 6 sigma due to Vt and Leff variation
Circuit Techniques for Variability
o Based on triage from infrastructure work
o Focus on the goals of the design:
  o Power
  o Timing
  o Voltage range
o Look for opportunities to reduce constraints imposed by standard circuits
Latching
o In 45/32nm with a 16 FO4 T_CYCLE and the standard MSFF design, the time available for logic is 16 - 1.5 - 1.5 - 1.5 = 11.5 usable FO4 delays
  o T_SU + T_CQ = 3 FO4s, T_JITTER = 1.5 FO4s (~25 ps typical)
  o Jitter is likely to be larger than this given process variability trends
  o The jitter budget must comprehend path variability due to supply droop and device/wire variations
o A flop that tolerates variations in clock and data arrival time without hurting frequency is highly desirable
  o Should also provide low flow-through time
  o Should provide the ability to incorporate logic for reduced delay
o Multiple flavors of flops are a usage and support cost (e.g. SA flop, fqbx, msfqbx, etc.)
  o If a single flop can provide good performance and robustness, that's better
o AMD's eighth-generation processors have ~140,000 flops
  o At 10fC switching cap on average (10% data activity) and assuming 25% avg. clock gating, 15% of our C_AC budget is spent on flops
[Figure: clock cycle timing diagram showing T_JIT, T_CQ, T_LOGIC, and T_SU]

$$T_{LOGIC} = T_{CYCLE} - T_{CQ} - T_{SU} - T_{JIT}$$

From Naffziger, 2006 VLSI Ckt Symp
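The timing budget above is simple enough to check numerically. This is a small sketch using the slide's numbers, with T_SU + T_CQ = 3 FO4 split evenly as in the slide's own arithmetic. The function name and the larger jitter values in the loop are hypothetical, chosen only to show how the usable logic time shrinks as jitter grows.

```python
def usable_logic_fo4(t_cycle, t_cq, t_su, t_jit):
    """T_LOGIC = T_CYCLE - T_CQ - T_SU - T_JIT, all in FO4 delays."""
    return t_cycle - t_cq - t_su - t_jit


# Slide numbers: 16 FO4 cycle, 1.5 FO4 each for T_CQ, T_SU, and T_JIT.
baseline = usable_logic_fo4(16.0, 1.5, 1.5, 1.5)

# Hypothetical jitter values beyond the slide's 1.5 FO4 figure.
for t_jit in (1.5, 2.0, 2.5, 3.0):
    t_logic = usable_logic_fo4(16.0, 1.5, 1.5, t_jit)
    print(f"T_JIT = {t_jit:.1f} FO4 -> T_LOGIC = {t_logic:.1f} FO4 "
          f"({100 * t_logic / 16.0:.1f}% of the cycle)")
```

Every FO4 of added jitter comes straight out of the logic budget, which is the motivation for a flop that tolerates clock and data arrival variation.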
Some Ways to Shoulder the Variation Burden: Adaptive Clocking
o Programmable delay buffers
o Empirically set the clock edge to optimize frequency
o Higher granularity gives more variation tolerance
o LBIST and GA search algorithms show promise for per-part optimization
From Naffziger, 2006 VLSI Ckt Symp
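One way to picture per-part clock-edge tuning is as a search over delay-buffer settings that minimizes the achievable cycle time. The sketch below is an illustration under stated assumptions, not the method from the cited paper: it models a linear pipeline, considers setup only (hold constraints are ignored), uses exhaustive search as a stand-in for the LBIST-driven or GA search the slide mentions, and all names, delays, and tap values are hypothetical.

```python
from itertools import product


def min_period(stage_delays, skews):
    """Smallest cycle time meeting setup on every stage.

    Stage i launches at skews[i] and must arrive by period + skews[i + 1],
    so it needs period >= delay + skews[i] - skews[i + 1].
    """
    return max(d + skews[i] - skews[i + 1] for i, d in enumerate(stage_delays))


def best_skews(stage_delays, taps):
    """Try every tap setting on the interior clock points; ends stay at 0."""
    n_inner = len(stage_delays) - 1
    best = None
    for inner in product(taps, repeat=n_inner):
        skews = (0,) + inner + (0,)
        period = min_period(stage_delays, skews)
        if best is None or period < best[0]:
            best = (period, skews)
    return best


# Hypothetical part: three unbalanced stages (delays in FO4), taps of -2..+2 FO4.
delays = [10, 14, 9]
print("zero skew period:", min_period(delays, (0, 0, 0, 0)))
print("tuned (period, skews):", best_skews(delays, range(-2, 3)))
```

On silicon, `min_period` would be replaced by a measurement (for example a pass/fail frequency sweep), and the exhaustive loop becomes impractical as the number of buffers grows, which is where a smarter search is needed. Finer tap steps allow a closer fit to each part's actual stage delays.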
Logic vs. Metal Delay Variations
o Logic: delta-L dominated
o Interconnect: linewidth and material dominated
o 65nm timing tools are dominated by worst-case assumptions
o The next generation needs to consider the statistical variation of the logic and the interconnect
o Biasing metal linewidth may provide margin for critical interconnect, much like biasing Lpoly for transistors
o Ideally, consider layout of interconnect to incorporate dependencies
o Metal fill improves variation
Metal Fill Tradeoffs
o Opportunity to improve variation from processing
o Designer needs to be cognizant of
  o additional capacitance (and coupling if fill is not tied)
  o CAD tool burden: data size, fill generation
  o data prep time
[Figure: layout before fill and after fill]
Adaptive Circuits
o Example: duty cycle correction for process variation through a training cycle (Agarwal, 2006 VLSI)
[Figure: Clkin from PLL passes through a delay line (nodes a, b, c, d) set by control bits from the detector/counter, producing Clkout with the duty cycle corrected for variation]
Architectural Techniques to Reduce Variability
o Goal: change the problem to one that can be solved
o Example: rather than make every cell perfect, use redundancy to reduce the effect of weak bitcells
o Consider the larger issue of identifying weak cells
o Anticipate problems during the full life of the product
Architectural Options: Separate Voltage Planes
[Figure: floorplan with four cores, four L2 caches, and an L3 cache, with blocks labeled by supplies VDD0 through VDD8]
o Solves the problem of how to increase headroom for small memory cells
o A common technique, but with difficulty in supplying multiple supplies and architectural latency when crossing boundaries (Kanno, 2006 ISSCC)
o Architectural choices in how many supplies can be supported or tied together
Adaptive Supply Voltage
[Figure: energy per operation vs. channel length (short, nominal, long) and Vdd (low to high)]
o Per-part and dynamic voltage management are key
o More range flexibility and finer-grain response will provide differentiation
From Naffziger, 2006 VLSI Ckt Symp
Options for Weak Memory Elements
[Figure: block diagram with Cache, Datapath1, RegFile, Datapath2, and Ctrl]
o Identify weakness at test
o Redundancy in cache
o Fusing off bad entries in the register file (virtual vs. physical registers)
o Datapath/ctrl can use redundant storage elements at significant area cost (similar to ultradrowsy from L. Clark et al., JSSC Feb 2005)
o Difficult problem for test: BIST for weakness
o Ideally, probe for weakness over the life of the product and soft-fuse off any failing cells
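The cache redundancy option can be sketched as a fuse map that steers accesses away from rows found weak at test. This is a minimal illustration assuming simple row-level sparing. The function names, array size, and spare count are hypothetical, and a real repair scheme may also use column spares or ECC.

```python
def build_fuse_map(weak_rows, n_spares):
    """Map each weak row to a spare index, or return None if unrepairable."""
    weak = sorted(set(weak_rows))
    if len(weak) > n_spares:
        return None                   # more weak rows than spares
    return {row: spare for spare, row in enumerate(weak)}


def resolve_row(row, fuse_map, n_rows):
    """Return the physical row to access; spares sit after the n_rows normal rows."""
    if row in fuse_map:
        return n_rows + fuse_map[row]
    return row


# Hypothetical 256-row array with 2 spare rows; BIST flagged rows 7 and 3.
fuses = build_fuse_map([7, 3, 7], n_spares=2)
print("fuse map:", fuses)
print("row 7 ->", resolve_row(7, fuses, n_rows=256))
print("row 5 ->", resolve_row(5, fuses, n_rows=256))
```

Soft-fusing over the life of the product would amount to rebuilding this map whenever a later weakness probe flags a new row, as long as spares remain.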
Conclusions
o Variation requires a new body of infrastructure that must be analyzed for critical requirements
o Minimizing the impact through circuit techniques is the first answer
o Consider the overall problem: architectural tradeoffs can also minimize the effect of variation

2006, Advanced Micro Devices. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc.