Dual-K Versus Dual-T Technique for Gate Leakage Reduction: A Comparative Perspective
S. P. Mohanty, R. Velagapudi and E. Kougianos
Dept. of Computer Science and Engineering, University of North Texas
Email: smohanty@cs.unt.edu
Homepage: http://www.cs.unt.edu/~smohanty/
ISQED 2006
Outline of the Talk
- Introduction
- Related Work
- Methods for Gate Leakage Reduction
- Datapath Component Library
- Gate Leakage Optimization
- Experimental Results
- Conclusions
Why Low Power?
Power affects:
- Packaging costs
- Chip and system cooling costs
- Power supply rail
- Noise and reliability
- Environmental
- Battery life
Power Dissipation in CMOS
- Static Dissipation: sub-threshold current, gate leakage, reverse-biased diode leakage, contention current
- Dynamic Dissipation: capacitive switching, gate leakage, short circuit
Source: Weste and Harris 2005
Leakages in CMOS
- I1: reverse bias pn junction (both ON & OFF)
- I2: subthreshold leakage (OFF)
- I3: gate leakage current (both ON & OFF)
- I4: gate current due to hot carrier injection (both ON & OFF)
- I5: gate induced drain leakage (OFF)
- I6: channel punch through current (OFF)
Figure: NMOS cross-section (gate, source, drain, N+ regions, P-substrate) marking the paths of I1 to I6.
Source: Roy 2003
Power Dissipation Redistribution
Figure: normalized power dissipation (log scale) and physical gate length (nm) versus year, 1990 to 2020, with curves for dynamic power, sub-threshold leakage, gate leakage, gate leakage with high-K, and the gate length trajectory.
Source: Hansen Thesis 2004
Gate Leakage Paths in an Inverter
- Low Input: Input supply feeds tunneling current.
- High Input: Gate supply feeds tunneling current.
Figure: inverter with low input (V_in = V_low, V_out = V_high) and high input (V_in = V_high, V_out = V_low), marking the ON and OFF transistors and the tunneling components I_gs, I_gcs, I_gd and I_gcd.
NOTE: Gate to body component found to be negligible.
Gate Leakage Reduction Techniques
Research in gate leakage reduction is in full swing, but is not as mature as that of dynamic power or subthreshold leakage.
Few methods:
- Dual T_ox (Sultania - TVLSI Dec 2005, Sultania - DAC 2004, Sirisantana - IEEE DTC Jan-Feb 2004, Mohanty - VLSI Design 2006)
- Dual K (Mukherjee - ICCD 2005)
- Pin and Transistor Reordering (Sultania - ICCD 2004, Lee - DAC 2003)
Related Works: Behavioral Level
Subthreshold Leakage:
- Khouri - TVLSI 2002: Algorithms for subthreshold leakage power analysis and reduction using dual-V_Th approach.
- Gopalakrishnan - ICCD 2003: Dual-V_Th approach for reduction of subthreshold current through binding.
Gate Leakage:
- Mohanty - VLSI Design 2006: Dual-T_ox approach for reduction of gate leakage current.
Related Works: Logic / Transistor Level
Gate Leakage Reduction:
- Lee - TVLSI 2004: Pin reordering to minimize gate leakage during standby positions of logic gates.
- Sultania - TVLSI Dec 2005 and Sultania - DAC 2004: Heuristic for dual-T_ox assignment for gate leakage and delay tradeoff.
- Sirisantana - IEEE DTC Jan-Feb 2004: Use of multiple channel lengths and multiple gate oxide thicknesses for reduction of leakage.
- Mukherjee - ICCD 2005: Introduced dual-K approach for reduction of gate leakage.
Key Contributions of this Paper
- Introduces a dual dielectric assignment approach for architectural level gate leakage reduction.
- Presents a simulated annealing based optimization for gate leakage current reduction during behavioral synthesis.
- Compares the two approaches (dual dielectric versus dual thickness, i.e. Dual-K versus Dual-T).
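Dual-K: Low K gate and High K gate
Figure: two transistor cross-sections (N+ regions in a P substrate), one with a low-K gate dielectric and one with a high-K gate dielectric.
- Low K gate: larger I_gate, smaller delay
- High K gate: smaller I_gate, larger delay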
Dielectrics for Replacement of SiO2
- Silicon Oxynitride (SiOxNy) (K=5.7 for SiON)
- Silicon Nitride (Si3N4) (K=7)
- Oxides of Aluminum (Al), Titanium (Ti), Zirconium (Zr), Hafnium (Hf), Lanthanum (La), Yttrium (Y), Praseodymium (Pr), and their mixed oxides with SiO2 and Al2O3
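Dual-T: Low T gate and High T gate
Figure: two transistor cross-sections (N+ regions in a P substrate), one with a thin (low-T) gate oxide and one with a thick (high-T) gate oxide.
- Low T gate: larger I_gate, smaller delay
- High T gate: smaller I_gate, larger delay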
Dual-K Vs Dual-T Approach
Assumption: within a functional unit, all transistors have the same K_gate (Dual-K) or the same T_gate (Dual-T).
Figure: two panels, Dual K and Dual T.
Synthesis for Low Gate Leakage
Flow:
1. Input HDL
2. Compilation
3. Data Flow Graph
4. Transformation
5. Behavioral Scheduler for Gate Leakage Reduction
6. Resource Allocation and Binding
7. Datapath and Control Generation
8. RTL Description
9. Logic Synthesis
10. Physical Synthesis
11. Output Layout Description
Supporting blocks: Gate Leakage and Delay Estimator; Characterized Cells (Dual-K or Dual-T)
Datapath Component Library: 3 Level Bottom-up Hierarchical Approach
Levels: Transistor Level, Logic Level, Datapath Components
- We observed that a NAND gate has the least gate leakage of all the basic logic gates.
- Therefore we constructed the datapath components using NAND gates.
Datapath Component Library
- First we characterize the NAND gate using analog simulations, and then we characterize the functional units.
- We assume that an n-bit functional unit is a network of n_total NAND gates, of which n_cp are on the critical path.
- We do not consider the effect of interconnect wires; we focus on the gate leakage current dissipation and propagation delay of the active units only.
Datapath Component Library: Logic
- BSIM4 model based simulations are used to calculate the gate leakage I_ox and the propagation delay T_pd.
- Due to the unavailability of silicon data, we used an analytical estimate for area calculations. The NAND gate area A_NAND follows the analytical model of Bowman (TED, Aug 2001) and depends on the following quantities:
  W_NMOS = NMOS width
  f = minimum feature size for a technology
  K_inv = area of a minimum size inverter using f
  AR_NAND = aspect ratio of the NAND gate
  n_in = number of inputs
  β_NAND = ratio of PMOS width to NMOS width
Source: Bowman TED 2001 Aug
Datapath Component Library: Logic
The 2-input NAND gate has four input states, each with its own gate leakage current:
- input 00: I_00 (State 1)
- input 01: I_01 (State 2)
- input 10: I_10 (State 3)
- input 11: I_11 (State 4)
Assuming all states to be equiprobable:
$I_{gateNAND} = \dfrac{I_{00} + I_{01} + I_{10} + I_{11}}{4}$
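The state averaging can be sketched in a few lines of Python. This is only an illustration: the function name and the per-state currents are hypothetical placeholders, not values from the talk.

```python
def average_gate_leakage(state_currents):
    """Mean gate leakage over the input states, assuming equiprobable states."""
    if not state_currents:
        raise ValueError("at least one input state is required")
    return sum(state_currents.values()) / len(state_currents)


# Hypothetical per-state currents in µA (illustrative only, not measured data).
example_states = {"00": 1.0, "01": 2.0, "10": 3.0, "11": 4.0}
i_gate_nand = average_gate_leakage(example_states)
```

If state probabilities were known, a weighted mean could replace the plain mean; the slides assume equal weights.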
Datapath Component Library
Gate leakage current of an n-bit functional unit:
$I_{gateFU} = \sum_{i=1}^{n_{total}} I_{gateNAND_i}$
where $I_{gateNAND_i}$ is the average gate leakage current dissipation of the i-th 2-input NAND gate in the functional unit, assuming all states to be equiprobable.
Similarly, the propagation delay and silicon area of an n-bit functional unit are:
$T_{pdFU} = \sum_{i=1}^{n_{cp}} T_{pdNAND_i}$
$A_{FU} = \sum_{i=1}^{n_{total}} A_{NAND_i}$
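A minimal sketch of the functional-unit roll-up, assuming every NAND gate in the unit has identical characteristics so that each sum reduces to a product. The `NandCell` type, the function name and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NandCell:
    i_gate: float  # average gate leakage of one NAND gate
    t_pd: float    # propagation delay of one NAND gate
    area: float    # silicon area of one NAND gate


def characterize_fu(cell, n_total, n_cp):
    """Roll up leakage, delay and area of a functional unit built from NANDs.

    Leakage and area sum over all n_total gates; delay sums only over the
    n_cp gates on the critical path.
    """
    if not 0 <= n_cp <= n_total:
        raise ValueError("n_cp must lie between 0 and n_total")
    return {
        "i_gate": n_total * cell.i_gate,
        "t_pd": n_cp * cell.t_pd,
        "area": n_total * cell.area,
    }


# Hypothetical cell values and gate counts, for illustration only.
fu = characterize_fu(NandCell(i_gate=0.5, t_pd=0.25, area=2.0), n_total=8, n_cp=4)
```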
Gate Leakage Vs Permittivity
Figure: I_gateFU (log scale) versus K, for K from 0 to 25, with curves for Adder, Subtractor, Multiplier, Divider, Comparator, Register and Multiplexer.
Fitted model: $I_{gateFU}\,(\mu A) = A\,e^{-K/\alpha} + I_{gateFU0}$
As the gate dielectric constant increases, the gate leakage current decreases.
Propagation Delay Vs Permittivity
Figure: T_pdFU (log scale) versus K, for K from 0 to 25, with curves for Adder, Subtractor, Multiplier, Divider, Comparator, Register and Multiplexer.
Fitted model: T_pdFU (ns) is a piecewise function of K, with one branch for 2.5 ≤ K ≤ 6 and another for 6 ≤ K ≤ 30; the fitting constants are A1, A2, K0, β, C, α and T_pdFU0.
As the gate dielectric constant increases, the propagation delay increases.
Gate Leakage Vs SiO2 Thickness
Figure: I_gateFU versus T_ox.
Fitted model: $I_{gateFU}\,(\mu A) = A \exp\!\left(-\dfrac{T_{ox}}{\alpha}\right) + \beta$
As the gate oxide thickness increases, the gate leakage current decreases.
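To illustrate how such a fit might be used when comparing two oxide thicknesses, the sketch below assumes a decaying exponential of the form A·exp(−T_ox/α) + β. The functional form's sign convention and the fitting constants are assumptions chosen for illustration, not values from the talk; only the 1.4 nm and 1.7 nm thicknesses come from the experimental setup.

```python
import math


def gate_leakage_ua(t_ox_nm, a, alpha, beta):
    """Assumed fit: I_gate (µA) = a * exp(-t_ox / alpha) + beta."""
    return a * math.exp(-t_ox_nm / alpha) + beta


# Hypothetical fitting constants (illustrative only, not from the slides).
A, ALPHA, BETA = 1.0e6, 0.1, 0.0

low_t = gate_leakage_ua(1.4, A, ALPHA, BETA)   # thinner oxide, leakier
high_t = gate_leakage_ua(1.7, A, ALPHA, BETA)  # thicker oxide, less leaky
reduction_pct = 100.0 * (low_t - high_t) / low_t
```

With real characterization data, the constants would come from curve fitting against the BSIM4 simulation results for each functional unit.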
Propagation Delay Vs SiO2 Thickness
Figure: T_pdFU versus T_ox.
Fitted model: $T_{pdFU}\,(ns) = \dfrac{A_1 - A_2}{1 + \left(T_{ox}/\beta\right)^{\nu}} + A_2$
As the gate oxide thickness increases, the propagation delay increases.
Silicon Area Vs SiO2 Thickness
Figure: A_FU versus T_ox.
Fitted model: $A_{FU}\,(nm^2) = \alpha\,T_{ox} + \beta$
As the gate oxide thickness increases, the area increases.
Simulated Annealing for Optimization
- Analogous to the annealing process, the mobility of nodes in a DFG depends on the total available resources.
- Nodes of a DFG are analogous to atoms, and temperature is analogous to the total number of available resources.
- To maximize the leakage reduction, a node must be scheduled in such a way that a higher thickness (or higher dielectric constant) resource can be assigned to it.
- The chance of assigning a higher thickness (or higher dielectric constant) resource is greater when more such resources are available.
Optimization Algorithm
Simulated Annealing Algorithm (UDFG, DTF, LRM)
  Initialize Available Resources
  While there exists a schedule with available resources
    i = Number of iterations
    Perform resource constrained ASAP and ALAP
    Initial Solution ← ASAP Schedule
    S ← Allocate Bind()
    Initial gate leakage ← gate leakage(S)
    While (i > 0)
      Generate a random Tox in range (Tox - ΔTox, Tox + ΔTox)
      Generate random transition from S to S*
      ΔI ← gate leakage(S) - gate leakage(S*)
      if (ΔI > 0) then S ← S*
      i ← i - 1
    end While
    Decrement available resources
  end While
  return S
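The inner loop of the algorithm can be sketched in Python under strong simplifications: the operations are assumed to execute one after another (so total delay is a plain sum), scheduling and binding are collapsed into a single low/high choice per operation, and only moves that lower leakage are accepted, as in the listing. The function name, parameters and all numbers are hypothetical; the outer loop that decrements the available resources is omitted.

```python
import random


def anneal_assignment(n_ops, leak, delay, max_delay, max_high,
                      iterations=1000, seed=0):
    """Accept-if-better search over low/high resource assignments.

    leak and delay map "low"/"high" to the per-operation leakage and delay.
    max_high is the number of available high-thickness (or high-K) resources.
    """
    rng = random.Random(seed)
    state = ["low"] * n_ops  # initial solution: all fast, leaky resources

    def total(table, assignment):
        return sum(table[kind] for kind in assignment)

    current = total(leak, state)
    for _ in range(iterations):
        # Random transition S -> S*: flip the resource type of one operation.
        candidate = list(state)
        j = rng.randrange(n_ops)
        candidate[j] = "high" if candidate[j] == "low" else "low"
        # Reject moves that break the resource or time constraint.
        if candidate.count("high") > max_high:
            continue
        if total(delay, candidate) > max_delay:
            continue
        # Keep S* only if the gate leakage decreases (delta > 0).
        delta = current - total(leak, candidate)
        if delta > 0:
            state, current = candidate, current - delta
    return state, current


# Hypothetical per-operation leakage and delay values, for illustration only.
final_state, final_leak = anneal_assignment(
    n_ops=4,
    leak={"low": 10, "high": 1},
    delay={"low": 1, "high": 2},
    max_delay=6,
    max_high=4,
)
```

In this toy setup the time constraint, not the resource count, limits how many operations can move to the high-thickness resource. A textbook simulated annealer would also accept some uphill moves with a temperature-dependent probability; the listing above does not, so neither does this sketch.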
Experimental Results
- For the single-cycle case, the critical path delay of the circuit is the sum of the delays of the vertices on the longest path of the DFG.
- For the multicycling or chaining case, it is the number of control steps multiplied by the delay of the slowest resource.
- The delay trade-off factor (DTF) is used to provide various time constraints for our experiments.
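Both critical path delay definitions are easy to state in code. The sketch below assumes the DFG is given as a successor map over an acyclic graph; the function names, the example graph and the delay values are hypothetical.

```python
from functools import lru_cache


def single_cycle_delay(successors, delay):
    """Sum of vertex delays along the longest path of an acyclic DFG."""

    @lru_cache(maxsize=None)
    def longest_from(node):
        tails = (longest_from(s) for s in successors.get(node, ()))
        return delay[node] + max(tails, default=0)

    return max(longest_from(node) for node in delay)


def multicycle_delay(n_control_steps, resource_delays):
    """Number of control steps times the delay of the slowest resource."""
    return n_control_steps * max(resource_delays)


# Hypothetical DFG: a and b feed c, and c feeds d (delays in ns).
dfg_delay = {"a": 2, "b": 3, "c": 1, "d": 4}
dfg_succ = {"a": ["c"], "b": ["c"], "c": ["d"]}

t_single = single_cycle_delay(dfg_succ, dfg_delay)  # longest path is b, c, d
t_multi = multicycle_delay(3, [2, 3, 4])
```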
Experimental Results
- For the single-thickness baseline, we used a nominal oxide thickness of 1.4 nm and SiO2 (K=3.9) as the nominal dielectric, taken from the BSIM4.4.0 model.
- For the dual thickness approach, the following pair is considered: 1.4 nm and 1.7 nm.
- For the dual dielectric approach, the following pair is considered: SiO2 (K=3.9) and Si3N4 (K=7).
- The results take into account the gate leakage current, area and propagation delay of the functional units, interconnect units and storage units present in the datapath circuit.
Experimental Results
Figure: design-space plots for the DCT and ARF benchmarks; axes are Area in µm² (×10^4), Delay in ns, and Gate Tunneling Current in µA.
- Each layer corresponds to a different resource constraint; each time the number of T_oxh multipliers is decreased, a new layer is formed.
- We observed that the number of design corners decreases when more multipliers of T_oxh thickness are used, since the delay increases and the mobility of the nodes is restricted in order to satisfy the time constraint.
Experimental Results
Figure: average % reduction of tunneling current (axis range 65 to 90) for Dual T and Dual K on the benchmarks ARF, BPF, DCT, EWF, FIR, HAL, IIR and LMS.
Conclusions and Future Works
- A comparison of dual thickness and dual dielectric approaches for reduction of gate leakage during behavioral synthesis is presented.
- A simulated annealing based algorithm for simultaneous scheduling and binding of functional units is introduced.
- The tradeoff between gate leakage, area and performance is explored.
- Both approaches for gate leakage reduction account for the ON as well as the OFF state.
Conclusions and Future Works
- Experiments show significant reductions in gate leakage current without performance penalty.
- The dual dielectric method is found to be more effective than the dual thickness approach.
- This work on gate leakage will be extended to provide a broader solution to the problem of power dissipation in all its forms at the behavioral level.
- Dual-K or Dual-T based design may need more masks for the lithographic process during fabrication compared to single-K or single-T.