A Software Technique to Improve Yield of Processor Chips in Presence of Ultra-Leaky SRAM Cells Caused by Process Variation
Maziar Goudarzi, Tohru Ishihara, Hiroto Yasuura
System LSI Research Center, Kyushu University, Fukuoka, Japan
Outline
- Background
  - Process variation
  - Power in nanometer embedded processor-based systems
- Our work [1]
  - Motivation
  - Approach
  - Experiments
- Summary and future work
[1] This is part of the CREST Ultra Low Power Design Projects sponsored by Japan Science and Technology Corporation (JST), http://www.slrc.kyushu-u.ac.jp/~ishihara/crest/e_kenkyu.html
Background: Process Variation
[Figure: intra-die and inter-die variation at 0.35 μm, 0.18 μm, and 90 nm (feature size scale down) *]
- Both inter-die and intra-die variations become increasingly important!
- We focus on threshold voltage (V_th) variation
* Source: X. Li, J. Le, L. Pileggi, Projection-Based Statistical Analysis of Full-Chip Leakage Power with Non-Log-Normal Distributions, DAC, 2006.
Our Focus: Intra-die (Within-die) V_th Variation
- Large intra-die variation: current 3-sigma = 13%, V_th 3-sigma = 67mV
- Variation is huge in small transistors:
  $\sigma_{V_{th}} = \frac{q}{C_{ox}} \sqrt{\frac{N_a W_{dm}}{3 L W}}$
  L, W: effective channel length and width
  q: electron charge
  C_ox: oxide capacitance
  N_a: substrate doping concentration
  W_dm: maximum depletion width
[Figure: nMOS, MAU 14x14, Ids variation (%) over X(0)..X(12) and Y(0)..Y(12). L = 0.1um, W = 0.4um, Av. = 203.7uA, Sigma = 4.4%, min. = -11.4%, max. = 11.4%]
[Figure: nMOS, MAU 14x14, Vth variation (mV) over X(0)..X(12) and Y(0)..Y(12). L = 0.1um, W = 0.4um, Av. = 308.3uA, Sigma = 22.1mV, min. = -66.6mV, max. = 57.0mV]
Source: Eijiro Toyoda, DFM: Device & Circuit Design Challenges, Int'l Forum on Semiconductor Technology, 2004
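The sigma(V_th) relation on this slide says the spread grows as the gate area L x W shrinks. A minimal sketch of that scaling follows. The device values (oxide thickness, doping, depletion width) are hypothetical placeholders chosen only to exercise the formula, not figures from the talk, and C_ox is assumed to be the per-area capacitance of a SiO2 gate oxide.

```python
import math

Q_ELECTRON = 1.602176634e-19        # electron charge [C]
EPS_SIO2 = 3.9 * 8.8541878128e-12   # SiO2 permittivity [F/m] (assumed gate dielectric)

def sigma_vth(length_m, width_m, tox_m, na_per_m3, wdm_m):
    """sigma_Vth = (q / C_ox) * sqrt(N_a * W_dm / (3 * L * W)), result in volts."""
    c_ox = EPS_SIO2 / tox_m  # oxide capacitance per unit area [F/m^2]
    return (Q_ELECTRON / c_ox) * math.sqrt(
        na_per_m3 * wdm_m / (3.0 * length_m * width_m)
    )

# Hypothetical device parameters, for illustration only.
params = dict(tox_m=2e-9, na_per_m3=1e24, wdm_m=35e-9)

large = sigma_vth(0.2e-6, 0.8e-6, **params)  # 4x the gate area
small = sigma_vth(0.1e-6, 0.4e-6, **params)  # L and W used in the slide's plots

print(f"sigma_Vth, large device: {large * 1e3:.2f} mV")
print(f"sigma_Vth, small device: {small * 1e3:.2f} mV")
print(f"ratio small/large: {small / large:.2f}")
```

Quartering the gate area doubles sigma(V_th) regardless of the placeholder values, which is why minimum-size transistors are the most exposed.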
Unavoidable Cause of V_th Variation: Random Dopant Fluctuation (RDF)
- Nature of variations: systematic and random
[Figure: Random dopant fluctuations. Mean number of dopant atoms (10 to 10000) vs. technology node (1000 nm down to 32 nm). Source: S. Borkar]
[Figure: ITRS-2005 roadmap forecast. Vth variability (%, 0 to 120) for 2006 to 2018; series: all sources, doping alone]
Our Focus: Leakage Power
- Power consumption
  - Dynamic: activity-based
  - Static (leakage): activity-independent
- Trend
  - Traditionally: Dynamic >> Static
  - Nanometer technologies: Static >> Dynamic
Source: P.K. Huang, S. Ghiasi (DAC 06)
Our Focus: Cache Memories
- Largest portion of chips => biggest leakage
- Minimum-area transistors => most susceptible to process variation:
  $\sigma_{V_{th}} = \frac{q}{C_{ox}} \sqrt{\frac{N_a W_{dm}}{3 L W}}$
[Figure: die photos. PowerPC TM: 40% of core area (labels: Data Cache, L2 Cache, Instruction $). StrongARM-110 TM: 75% of core area]
Process Variation at 90nm
  $I_{subthreshold} \propto \frac{W}{L\,T_{ox}} V_T^2 \exp\left(-\frac{V_{th}}{\alpha V_T}\right)$
  V_T: thermal voltage (25mV at room temperature)
  α: sub-threshold factor (1.40~1.65)
  T_ox: oxide thickness
Year   min. L [nm]   V_TH [V] (1)   V_TH [V] (2)
2004   37 (90)       0.32           0.12
2005   32 (80)       0.33           0.09
2006   28 (70)       0.34           0.06
(1): Low Operating Power process; (2): MPU process
- Ultra-Leaky Transistor (ULT): a transistor that leaks beyond a given constraint
[Figure: threshold voltage distribution around the mean, with a large-delay tail and a large-leak tail. Annotations: 5σ Vth = 0.3V; 100 tr.; 1 transistor out of 64K-Byte SRAM; 330x; leakage is 1,400x higher than nominal!]
- ±σ: 68.3%, ±2σ: 95.4%, ±3σ: 99.7%, ±4σ: 99.9936%, ±5σ: 99.99994%
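The figure annotations on this slide can be roughly reproduced from the subthreshold relation and a Gaussian V_th distribution. The sketch below rests on assumptions not stated on the slide: α = 1.65 (the top of the quoted range), σ(V_th) = 60 mV (so that 5σ = 0.3 V), and a 64 KB array of six-transistor cells with only the data bits counted. With these values, the 4σ and 5σ results land near the slide's "330x / 100 tr." and "1,400x / 1 transistor" annotations.

```python
import math

V_T = 0.025        # thermal voltage [V], from the slide
ALPHA = 1.65       # assumption: upper end of the 1.40~1.65 range
SIGMA_VTH = 0.060  # assumption: 60 mV, so that 5 sigma = 0.3 V

# Assumption: 64 KB of data bits, six transistors per SRAM cell.
N_TRANSISTORS = 64 * 1024 * 8 * 6

def leakage_ratio(k_sigma):
    """Leakage relative to nominal when V_th is k_sigma standard deviations low."""
    return math.exp(k_sigma * SIGMA_VTH / (ALPHA * V_T))

def expected_transistors_beyond(k_sigma, n_transistors=N_TRANSISTORS):
    """Expected count of transistors in the one-sided Gaussian tail beyond k_sigma."""
    tail_probability = 0.5 * math.erfc(k_sigma / math.sqrt(2.0))
    return tail_probability * n_transistors

for k in (4, 5):
    print(f"{k} sigma: leakage x{leakage_ratio(k):.0f}, "
          f"about {expected_transistors_beyond(k):.1f} transistors in the tail")
```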
Ultra-Leaky SRAM Cells Problem
- Ultra-leaky cache cells and ultra-leaky cache lines: those containing one or more ULTs
- Problem
  - Ultra-leaky cache cells dissipate a lot of power
  - Especially for long-standby applications, they cause rapid discharge of the battery
[Figure: Share of ULTs in total cache leakage. Leakage of ULTs (%, 0 to 30) and number of ULTs (%, 0 to 0.14) at 90nm, 80nm, 70nm]
Ultra-Leaky SRAM Cells Problem (cont'd)
- Naïve solution: mark as faulty, replace with spare row/column
- Disadvantages
  - Spares may be leaky themselves
  - Spares should replace slow/faulty cells as well
  - Fuse-blowing is expensive and slow
  - Aging may introduce ULTs over time
  - Temperature may also introduce ULTs
[Figure: cache SRAM cells (ultra-leaky, normal, slow) with spare rows and columns]
Our Fundamental Observation: Cell Leakage is Value-Dependent
[Figure: six-transistor SRAM cell (M1 to M6) with one transistor marked as ULT. Labels: Word line; Bit line; !Bit line; Q; !Q; charged to V_dd at inactive mode]
- If M2, M3, or M5 is leaky, the SRAM cell is 1-leaky
- If M1, M4, or M6 is leaky, the SRAM cell is 0-leaky
- Our approach: store the leakage-safe value when entering standby mode
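The classification on this slide maps directly to a lookup. The sketch below assumes that "1-leaky" means the cell leaks excessively while holding 1, so its leakage-safe value is 0, and the reverse for "0-leaky". The function name and the "both" / "normal" labels are illustrative, not from the talk.

```python
ONE_LEAKY_TRANSISTORS = frozenset({"M2", "M3", "M5"})   # from the slide
ZERO_LEAKY_TRANSISTORS = frozenset({"M1", "M4", "M6"})  # from the slide

# Assumption: an x-leaky cell leaks when storing x, so the safe value is the other bit.
SAFE_VALUE = {"1-leaky": 0, "0-leaky": 1}

def classify_cell(leaky_transistors):
    """Return '1-leaky', '0-leaky', 'both', or 'normal' for one SRAM cell."""
    leaky = set(leaky_transistors)
    unknown = leaky - ONE_LEAKY_TRANSISTORS - ZERO_LEAKY_TRANSISTORS
    if unknown:
        raise ValueError(f"unknown transistor names: {sorted(unknown)}")
    is_one_leaky = bool(leaky & ONE_LEAKY_TRANSISTORS)
    is_zero_leaky = bool(leaky & ZERO_LEAKY_TRANSISTORS)
    if is_one_leaky and is_zero_leaky:
        return "both"      # no stored value avoids the leak
    if is_one_leaky:
        return "1-leaky"
    if is_zero_leaky:
        return "0-leaky"
    return "normal"        # any stored value is fine

kind = classify_cell({"M3"})
print(kind, "-> store", SAFE_VALUE[kind], "before standby")
```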
Flow of Operations
- Offline testing / booting phase (fabricated chip): detect leaky cache lines
- System active mode: RTOS schedules apps
- Decision to go standby: suppress leaky cache lines
- Actually go standby
- System standby mode: leakage is saved here
- Wakeup: return to system active mode
Suitable for long-standby low-power applications
Offline Testing Phase
- Goal: detect location of ULTs
  - Location accuracy: cache line or cache cell
- Idea: ΔI_DDQ testing. If the leaky cell is sensitized, the quiescent current reflects an abnormal change.
- General outline: write all 0's, then all 1's, to every cache line and measure the leakage current
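One way to read the general outline is as a per-line comparison of quiescent current with the line holding all 0's versus all 1's. The sketch below is a software simulation under that assumption. `SimulatedCache`, its `write_line` / `measure_iddq` methods, the current values, and the threshold are all hypothetical stand-ins for the real test hardware, and the actual procedure used by the authors may differ.

```python
class SimulatedCache:
    """Toy model: each line leaks a base current, plus extra when a ULT is sensitized."""

    def __init__(self, num_lines, ult_lines, base_na=1.0):
        # ult_lines maps line index -> (bit value that leaks, extra current in nA)
        self.values = [0] * num_lines
        self.ult_lines = ult_lines
        self.base_na = base_na

    def write_line(self, line, bit):
        self.values[line] = bit  # fill the whole line with this bit value

    def measure_iddq(self):
        total = self.base_na * len(self.values)
        for line, (leaky_bit, extra_na) in self.ult_lines.items():
            if self.values[line] == leaky_bit:
                total += extra_na
        return total

def detect_leaky_lines(cache, num_lines, threshold_na):
    """Return {line: leakage-safe bit} for lines whose current changes abnormally."""
    safe_values = {}
    for line in range(num_lines):
        cache.write_line(line, 0)
        current_with_zeros = cache.measure_iddq()
        cache.write_line(line, 1)
        current_with_ones = cache.measure_iddq()
        delta = current_with_ones - current_with_zeros
        if delta > threshold_na:
            safe_values[line] = 0   # leaks more when holding 1's
        elif delta < -threshold_na:
            safe_values[line] = 1   # leaks more when holding 0's
    return safe_values

cache = SimulatedCache(8, {2: (1, 400.0), 5: (0, 900.0)})
leaky = detect_leaky_lines(cache, 8, threshold_na=100.0)
print(leaky)
```

Only the line under test changes between the two measurements, so leakage from other lines cancels in the difference.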
Improvement in Leakage Yield
- Leakage yield = % of chips meeting a given leakage constraint
[Figure: Leakage yield (%) vs. ULT leakage (nA, 200 to 840); series: with our technique, original]
- Experiments: Monte Carlo simulation
  - 1000 chips
  - 32 Kb data + 22 Kb tag
  - 60 mV within-die V_th variation
  - Nominal values from a 90nm process: V_th = 320 mV, nominal transistor leakage = 0.345 nA
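The leakage-yield definition on this slide reduces to a counting formula. A minimal sketch follows. The per-chip leakage figures are made-up illustration data, not results from the Monte Carlo experiment, and "meeting" the constraint is assumed to mean less than or equal to it.

```python
def leakage_yield(chip_leakages_na, constraint_na):
    """Percentage of chips whose total leakage meets (<=) the constraint."""
    if not chip_leakages_na:
        raise ValueError("need at least one chip")
    passing = sum(1 for leak in chip_leakages_na if leak <= constraint_na)
    return 100.0 * passing / len(chip_leakages_na)

# Hypothetical per-chip leakage values in nA, for illustration only.
example_chips = [100.0, 250.0, 300.0, 500.0]
print(leakage_yield(example_chips, constraint_na=300.0))
```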
Maximum Leakage Power Saving vs. Within-die Variation
[Figure: Max. leakage saving (%, 0 to 80) vs. ULT leakage (nA, 200 to 860); series: V_th std. deviation of 60 mV, 70 mV, 80 mV, 90 mV. Nominal transistor leakage = 0.345 nA]
Associated Costs
Cost          Why to pay                                                           When to pay
Power         Run instructions to store leakage-safe values in leaky cache lines   When going to standby mode
Performance   Invalidated, but later-referenced, cache contents                    After returning from standby mode
Area          Leakage-measurement on-chip circuitry                                Chip design & manufacturing
Analysis of Costs
- Energy benefit and performance cost depend linearly on the number of leaky cells cured (N):
  $\mathrm{EnergySaving}(t) = N \, (P_{leak} \cdot t - E_{lock} - E_{fetch})$
  $\mathrm{Perf.\ Penalty} \le N \, (T_M - T_c)$
  N: number of leaky cells cured
  t: time duration spent in standby
  P_leak: avg. power saved per cured cache line
  E_lock: energy for locking leakage-safe value in the cache
  E_fetch: energy for fetching invalidated data if needed
  T_M: memory access time
  T_c: cache access time
[Figure: Leakage power saving (nW, 0 to 4000) vs. max. performance penalty (ns, 9 to 153); series: ULTs leak 900 nA, 400 nA, 200 nA]
- Results for M32R processor: 0.18u process, 200mW @ 50MHz; memory latency: 10 ns; cache latency: 1 ns
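The two cost relations on this slide are linear in N, and setting the energy saving to zero gives a break-even standby time. A minimal sketch follows. The 10 ns and 1 ns latencies are from the slide, while the power and energy figures are hypothetical placeholders, and `min_standby_duration` is a derived helper rather than a formula shown in the talk.

```python
def energy_saving(n, t_s, p_leak_w, e_lock_j, e_fetch_j):
    """EnergySaving(t) = N * (P_leak * t - E_lock - E_fetch), in joules."""
    return n * (p_leak_w * t_s - e_lock_j - e_fetch_j)

def max_perf_penalty(n, t_mem_s, t_cache_s):
    """Upper bound on added latency: N * (T_M - T_c), in seconds."""
    return n * (t_mem_s - t_cache_s)

def min_standby_duration(p_leak_w, e_lock_j, e_fetch_j):
    """Standby time at which EnergySaving(t) crosses zero (derived, independent of N)."""
    return (e_lock_j + e_fetch_j) / p_leak_w

# Latencies from the slide (memory 10 ns, cache 1 ns).
T_MEM, T_CACHE = 10e-9, 1e-9
# Hypothetical power and energy figures, for illustration only.
P_LEAK, E_LOCK, E_FETCH = 400e-9, 20e-9, 20e-9

t_break_even = min_standby_duration(P_LEAK, E_LOCK, E_FETCH)
print(f"break-even standby time: {t_break_even:.3f} s")
print(f"penalty for 17 cured cells: {max_perf_penalty(17, T_MEM, T_CACHE) * 1e9:.0f} ns")
print(f"saving after 1 s, 17 cells: {energy_saving(17, 1.0, P_LEAK, E_LOCK, E_FETCH):.3e} J")
```

With T_M - T_c = 9 ns, the penalty steps in multiples of 9 ns, which matches the 9 to 153 ns axis of the slide's plot.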
Effect of the Processor Used
[Figure: Leakage saving (nW, 0 to 4000) vs. max. perf. penalty (ns, 9 to 153); series: M32R, ARM920]
[Figure: Minimum standby duration (s, 0 to 0.2) vs. ULT leakage (nA: 200, 400, 800, 900); series: M32R, ARM920]
- M32R: 0.18u, 200mW @ 50 MHz
- ARM920: 0.18u, 0.8mW / MHz [1]
[1] http://www.arm.com
Summary & Future Work
- Presented a software technique to suppress, during standby mode, leakage of ultra-leaky transistors
  - No major hardware/circuit change required
  - Only uses already-popular cache-control instructions
  - Useful even for dynamic effects such as aging and temperature
- Results
  - Reduced leakage power in standby mode
  - Salvaged chips containing ULTs => higher yield for long-standby low-power applications
- Future work
  - Reduce leakage power, even in active mode, by matching cache contents with the less-leaky state of cache cells
Thanks! + Q&A