Leakage Control Techniques for Designing Robust, Low Power Wide-OR Domino Logic for Sub-130nm CMOS Technologies


Bhaskar Chatterjee, Manoj Sachdev
Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada
bhaskar@vlsi.uwaterloo.ca

Ram Krishnamurthy
Microprocessor Research, Intel Labs, Intel Corporation, Hillsboro, OR, US
ram.krishnamurthy@intel.com

Abstract

In this paper, we discuss the design of leakage-tolerant wide-OR domino gates for deep submicron (DSM) bulk CMOS technologies. Technology scaling is resulting in a 3-5x increase in transistor I_OFF/µm per generation, which in turn causes a 5%-30% noise margin degradation in high performance domino gates. We investigate several techniques that can improve the noise margin of domino logic gates and thereby ensure their reliable operation for sub-130nm technologies. Our simulations indicate that selective usage of dual-V_TH transistors shows acceptable energy-delay tradeoffs for the 90nm technology. However, techniques such as supply voltage (V_cc) reduction and the use of non-minimum L_e transistors are required to ensure robust and scalable wide-OR domino designs for the 70nm generation.

1. Introduction

Aggressive technology scaling over the past 30 years has improved circuit performance and allowed designers to achieve unprecedented levels of on-die integration. However, as the transistor threshold voltage is scaled, there is a 3-5x increase in the off-state current (I_OFF) per generation. As a result, ensuring low power operation of complex ICs has become a major design challenge, especially for mobile and battery-operated devices [1, 2, 4, 10, 14]. Figure 1 shows the scaling trends of the threshold voltage (V_TH) and the I_ON/I_OFF ratio for both high- and low-V_TH transistors for sub-130nm technologies, obtained using the Berkeley Predictive Technology Models [3].
Our simulations indicate that, as the technology is scaled from 130nm to 70nm, the transistor I_ON/I_OFF ratio degrades by 26x for the high-V_TH case and 42x for the low-V_TH case. The exponential increase in leakage current is expected to offset the savings in switching energy (CV^2 scaling) obtained from technology scaling [8, 14]. Furthermore, the degraded transistor I_ON/I_OFF ratio, the scaled device geometries and power supply voltage, and the ever-increasing switching frequency all contribute to reduced noise margins for DSM domino logic gates. In fact, the noise margin of wide-OR domino gates is being degraded by 5%-30% per generation [11]. Such gates are normally used in the design of high performance register files (RFs) [11]. Wide-OR domino gates are especially susceptible to leakage-induced false evaluations because of their multiple pulldown paths, which is expected to seriously compromise their reliable operation in future DSM technologies. There is therefore a need to investigate techniques that reduce leakage current and improve circuit robustness while minimizing the associated performance overheads. In this paper, we investigate the following techniques in the context of wide-OR domino gates:

- Upsized p-MOS keeper [9]
- Selective usage of dual V_TH [5, 15]
- Pseudo-static technique [11]
- Selective usage of non-minimum L_e transistors [7, 16]
- Supply voltage reduction [6, 13]

Figure 1: I_ON/I_OFF and V_TH scaling for sub-130nm generations (n-MOS transistor, typical corner, nominal V_cc; I_ON/I_OFF falls by 26x for high V_TH and by 42x for low V_TH; axes: technology generation (nm), I_ON/I_OFF, threshold voltage (mV))

We study the impact of the above techniques on the following parameters: propagation delay, leakage and switching energy, and DC robustness.
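To put the per-generation figures above in perspective, the following back-of-the-envelope sketch compounds the quoted 3-5x I_OFF growth over the two scaling steps from 130nm to 70nm (130nm to 90nm, then 90nm to 70nm). The function name is illustrative and does not come from the paper.

```python
def compounded_growth(per_generation_factor: float, generations: int) -> float:
    """Total multiplicative growth after a number of technology generations."""
    return per_generation_factor ** generations


# Two scaling steps: 130nm -> 90nm -> 70nm.
GENERATIONS = 2

low = compounded_growth(3.0, GENERATIONS)   # lower bound of the 3-5x range
high = compounded_growth(5.0, GENERATIONS)  # upper bound of the 3-5x range

print(f"I_OFF grows by roughly {low:.0f}x to {high:.0f}x over {GENERATIONS} generations")
```

The compounded range is of the same order as the 26x and 42x I_ON/I_OFF degradation reported above. The ratio also reflects changes in I_ON, so an exact match is not expected.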
The rest of the paper is organized as follows. In Section 2 we discuss the design of wide-OR domino gates and quantify the DC robustness degradation caused by technology scaling. In Sections 3 and 4, we present the different techniques and their associated design tradeoffs for the 90nm and 70nm technologies. Section 5 presents our conclusions.

2. Wide Domino: Design and Robustness Scaling

Wide-OR domino gates are used in the design of local and global bit lines (LBL, GBL) of high performance RFs. Figure 2 shows an 8-wide domino gate with 2-stack n-MOS pulldowns implemented using compound domino logic (CDL). In addition to the 2-stack pulldowns, high performance functional unit blocks (FUBs) also use single n-MOS pulldowns (GBLs). The inputs to the pulldown network are normally domino compatible. This allows removal of the clocked footer transistor, which reduces the stack height, improves performance and lowers switching energy. In this paper, we consider the worst-case conditions for both DC robustness and propagation delay. As indicated in Figure 2, the worst-case gate delay occurs when only one of the pulldown paths is selected and the wide-OR gate operates as a high performance MUX. During the evaluation phase (CLK=1), if the gate signals of both transistors are high (A_0, B_0 = 1), the dynamic node evaluates to ground (Dyn_node=0), causing the static gate output to transition to V_cc (OUT=1). Typically, in RF applications, the signals B_0-B_7 are set up ahead of time while the MUX select signals (A_0-A_7) are timing critical [11]. This fact is subsequently exploited in the selective assignment of dual V_TH and non-minimum L_e for the 90nm and 70nm designs.

Figure 2: Wide-OR domino gate for RFs (LBL organization), showing the keeper, Clk, Dyn_node, the 8 parallel n-MOS pulldowns (A_0-A_7, B_0-B_7) and OUT; the worst-case leakage is I_leak times the number of pulldown paths

In this paper, we use DC robustness as our metric for determining the noise margin of wide-OR domino gates. The DC robustness is defined with respect to the node OUT (for both 2-nMOS LBL and 1-nMOS GBL pulldowns) and can be better understood with the help of the simulation waveforms shown in Figure 3. DC robustness waveforms are obtained under worst-case leakage conditions when the signals A_0-A_7 are subjected to DC noise (simulated using a slow ramp signal). The voltage at which the wide-OR domino output (OUT) equals the input is identified as the unity gain noise margin (UGNM) point.
DC robustness for a given technology is defined as the normalized UGNM (UGNM/V_cc). This definition of DC robustness is well established in the context of leakage-tolerant domino logic design [9, 11, 12]. The results shown in Figure 3 indicate that a 5% p-MOS keeper results in a DC robustness of ~7% for an 8-wide domino gate in the 130nm technology under worst-case conditions. We use this as our reference design to set the target DC robustness for the 90nm and 70nm technologies. This allows us to compare the different techniques and quantify the various design tradeoffs. It is possible to set a different absolute value for the robustness threshold, but the general trends and energy-delay tradeoffs would remain unaffected.

Figure 3: DC robustness waveforms for 130nm (typical corner, nominal V_cc; traces: CLK, OUT and the input noise on A_0-A_7; UGNM ~7% of V_cc; axes: time (ns), voltage (V))

Figure 4 shows the impact of technology scaling on DC robustness for the 8-wide LBL with a 5% p-MOS keeper. Our results indicate that, for the 90nm (70nm) technology, there is a 24% (4%) degradation in DC robustness. It should be noted that the data in Figure 4 for the 130nm and 90nm technologies correspond to all low-V_TH designs, whereas the data for 70nm correspond to a dual-V_TH design. This is because an all low-V_TH 70nm design shows unacceptable noise margin under worst-case conditions and fails to operate due to excessive transistor leakage. The DC robustness of wide-OR domino gates with 1-nMOS pulldowns shows scaling trends similar to those in Figure 4. It is clear from these results that the 3-5x increase in I_OFF per generation will significantly degrade the noise margin of high performance domino logic gates, resulting in possible false evaluations. Therefore, we need to explore alternative design and leakage control techniques that improve DC robustness and allow reliable operation of DSM domino gates.
Figure 4: Wide-OR domino DC robustness scaling trends for the 130nm, 90nm and 70nm technologies (typical corner, 5% keeper; robustness threshold ~7%)
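The UGNM definition used in this section can be illustrated with a short post-processing sketch: given sampled waveforms of the input noise ramp and of OUT, it finds the point where OUT rises to meet the input and normalizes it by V_cc. This is a minimal sketch, not the authors' simulation flow; the function name and the sample data are hypothetical.

```python
from typing import Sequence, Tuple


def unity_gain_noise_margin(
    v_in: Sequence[float], v_out: Sequence[float], vcc: float
) -> Tuple[float, float]:
    """Return (UGNM in volts, UGNM / Vcc) from sampled DC-noise waveforms.

    v_in  : slow-ramp noise voltage applied to the select inputs
    v_out : corresponding voltage at the wide-OR output (OUT)
    The UGNM point is the first crossing where OUT rises to equal the input.
    """
    if len(v_in) != len(v_out):
        raise ValueError("v_in and v_out must have the same length")
    for i in range(1, len(v_in)):
        d0 = v_out[i - 1] - v_in[i - 1]  # OUT minus input at previous sample
        d1 = v_out[i] - v_in[i]          # OUT minus input at current sample
        if d0 < 0.0 <= d1:
            # Linear interpolation of the crossing inside this interval.
            frac = d0 / (d0 - d1)
            ugnm = v_in[i - 1] + frac * (v_in[i] - v_in[i - 1])
            return ugnm, ugnm / vcc
    raise ValueError("OUT never crosses the input ramp")


# Hypothetical sample data (not taken from the paper's simulations).
ramp = [0.0, 0.1, 0.2, 0.3, 0.4]
out = [0.0, 0.0, 0.05, 0.5, 1.0]
ugnm_volts, dc_robustness = unity_gain_noise_margin(ramp, out, vcc=1.2)
print(f"UGNM = {ugnm_volts:.3f} V, DC robustness = {dc_robustness:.1%}")
```

Linear interpolation between samples is an assumption made for simplicity; a finer simulation time step reduces its error.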

3. Techniques for Improving Robustness

In this section we discuss some of the techniques that can be used to improve the UGNM and robustness of wide-OR domino gates for DSM technologies. We present the energy-delay tradeoffs associated with the techniques mentioned earlier, discuss their applicability to both 2-stack and 1-stack domino designs (LBL and GBL), and show their scaling trends for the 90nm and 70nm generations.

3.1 Keeper Upsizing

The simplest technique to improve domino logic noise margin is to strengthen the p-MOS pullup keeper. This ensures that the normally ON p-MOS transistor sources a larger linear-mode current to offset the increased I_OFF current of the pulldown network. Our simulations indicate that the p-MOS keeper has to be upsized by 2x (2.3x) for the 90nm (70nm) generation to maintain iso-robustness (UGNM ~7%). As the keeper size is increased, it contends with the pulldown network, resulting in increased propagation delay and switching energy. Figure 5 shows the energy-delay tradeoffs for an 8-wide 2-stack LBL design for the 90nm and 70nm generations using upsized keepers. Our results indicate that, when upsized keepers are used to meet the noise margin threshold, there is a 2%-6% delay degradation and a ~2% increase in switching energy. In addition, there is an 11%-14% reduction in leakage energy. This results from the fact that the dynamic node is firmly anchored to V_cc (reduced DC droop), causing less subthreshold leakage in the subsequent static NAND gate. This technique is simple and can be used for domino gates with both 2-stack and 1-stack (LBL, GBL) n-MOS pulldowns. However, it is clear that the energy-delay tradeoffs associated with keeper upsizing are not favourable for designing high performance datapaths.

Figure 5: Impact of upsized keeper on DSM domino gates (normalized energy-delay plots for 90nm and 70nm, typical corner simulations)

3.2 Dual V_TH Technique

The dual-V_TH technique is based on the selective usage of low and high threshold transistors to minimize leakage current while limiting the delay degradation. The high-V_TH transistors help reduce the leakage current and the charge loss from the dynamic node, thereby improving the UGNM. The 2-stack LBL domino gates are organized such that the gate signals for the bottom transistors, B_0-B_7, are connected to the local bitcells and are set up ahead of time. The performance-critical Read Select signals, however, typically drive long interconnects and are connected to the transistors A_0-A_7. Under worst-case conditions, these signals may be subjected to input noise while signals B_0-B_7 are held at V_cc and their transistors are ON. Consequently, transistors A_0-A_7 determine the domino gate leakage and the worst-case UGNM. In the dual-V_TH scheme, we use high V_TH for these transistors, while low-V_TH transistors are used for B_0-B_7 to limit the overall performance degradation. Figure 6 shows the simulation results indicating the energy-delay tradeoffs involved with a dual-V_TH LBL scheme for the 90nm technology.

Figure 6: Dual V_TH domino logic energy-delay tradeoffs for 90nm (normalized energy-delay plots; typical corner, 3% keeper)

Our results indicate that the reduction in leakage current associated with the dual-V_TH technique allows us to use a weaker p-MOS keeper (3%) to meet the noise margin threshold. Therefore, for the 90nm technology, it is possible to limit the delay degradation to within 2%. The selective usage of high-V_TH transistors also allows a 4% reduction in leakage energy. In addition, the weaker p-MOS keeper results in less pulldown contention, allowing a 1.5% savings in switching energy. However, for the 70nm technology, the leakage currents of both the high- and low-V_TH transistors increase by 3-5x.
As a result, the dual-V_TH technique needs to be used in conjunction with an upsized p-MOS keeper to meet the robustness threshold. To maintain iso-robustness, a dual-V_TH LBL design needs a 2.3x (11.3%) p-MOS keeper, which results in a 6% delay degradation. Furthermore, the dual-V_TH technique cannot be used effectively for designing robust 1-stack wide domino gates. Thus, GBL designs require an all high-V_TH pulldown with a 1.9x (9.5%) upsized keeper, resulting in a 10% delay degradation. In both cases, the upsized keeper results in a ~2% increase in switching energy due to extra contention during evaluation. It is clear from these results that, for the 70nm generation, the dual-V_TH technique alone cannot guarantee robust operation of wide-OR domino logic gates.
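The leakage benefit of a high-V_TH transistor follows from the exponential dependence of weak-inversion current on threshold voltage. The sketch below uses the common first-order relation I_OFF ∝ 10^(-V_TH/S), where S is the subthreshold swing. The 100 mV/decade default and the function name are assumptions made for illustration, and the model ignores DIBL, body effect and gate leakage, so it is not the paper's simulation model.

```python
def leakage_reduction(delta_vth_mv: float, swing_mv_per_decade: float = 100.0) -> float:
    """Factor by which subthreshold leakage drops when V_TH is raised.

    First-order model: I_OFF is proportional to 10 ** (-V_TH / S), so raising
    V_TH by delta_vth_mv divides I_OFF by 10 ** (delta_vth_mv / S).
    swing_mv_per_decade (S) is an assumed value, not taken from the paper.
    """
    if swing_mv_per_decade <= 0.0:
        raise ValueError("subthreshold swing must be positive")
    return 10.0 ** (delta_vth_mv / swing_mv_per_decade)


# Hypothetical V_TH offsets between the low- and high-V_TH devices.
for delta in (50.0, 100.0, 150.0):
    print(f"V_TH raised by {delta:.0f} mV -> leakage lower by {leakage_reduction(delta):.1f}x")
```

Under this assumption each additional S millivolts of threshold voltage buys one decade of leakage reduction, which is why assigning high V_TH only to the noise-exposed transistors A_0-A_7 can relax the keeper requirement.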

3.3 Pseudo-Static Technique

The pseudo-static technique [11] has been advanced as a means of designing robust wide-OR domino logic gates for DSM technologies. In this section we briefly study this technique and discuss its impact on LBL and GBL designs. The pseudo-static circuit technique is explained with the help of Figure 7. This technique improves the UGNM by reducing the leakage current and the dynamic node charge loss through transistors N2-N6. Firstly, the order of the pulldown n-MOS transistors is reversed, whereby the performance-critical signals (A_0-A_7) are connected to the bottom of the LBL stack. Secondly, a minimum-sized p-MOS transistor (P1) is used to pull up the internal stack node voltage (V_X) to V_cc for all deselected paths.

Figure 7: Robust domino design using pseudo-static scheme (keeper, Clk, complemented inputs B_0#-B_7#, select inputs A_0-A_7, pullup transistors P1-P7, pulldown transistors N1-N6, internal node V_X, OUT)

Thirdly, a 2-input static NOR gate is used to turn OFF transistor N2 when the pulldown path is deselected (A_0=0). This scheme ensures that both transistors in the n-MOS stack are OFF, and that N2 has a higher effective threshold voltage (reverse body bias and reduced DIBL effect) and a negative V_GS bias. As a result, there is a significant reduction in the leakage current through N2, resulting in improved UGNM. In fact, our simulations indicate that it is possible to maintain iso-robustness for the 70nm technology while using an all low-V_TH n-MOS pulldown and a 3% p-MOS keeper. However, this technique suffers from several drawbacks that result in delay degradation and in increased overall switching and leakage energy:

1. The reversal of transistor order results in the performance-critical signals (A_0-A_7, Read Selects) being placed further from the gate output.
2. The p-MOS transistors (P1-P7) add capacitance to the intermediate node V_X and precharge the node to V_cc. This is unlike the normal LBL design, where the data is set up ahead of time, pre-discharging the corresponding node to ground.
3. The critical path has an extra stage of inversion due to the 2-input NOR gate. Furthermore, the NOR gate has to be designed to aid the 0-to-1 transition, resulting in increased p-MOS transistor widths. As a result, there is increased leakage through the deselected NOR gates and added capacitive loading at the intermediate node V_X.
4. When a particular pulldown path is deselected (A_0=0), the p-MOS transistor (P1) turns ON, and the voltage across N1 (V_X) approaches V_cc. The final steady-state voltage is reached when the I_OFF current of N2 and the linear current of P1 equal the I_OFF of N1. Our simulations for the 70nm technology indicate that, under worst-case conditions, the V_X node voltage equals ~0.95 V_cc. This implies that even though the leakage current through N2 is reduced, resulting in improved UGNM, the overall leakage current is actually increased, with the extra current flowing through the parallel path formed by transistors P1-N1.
5. The extra capacitance introduced by P1-P7 and the NOR gates results in higher switching energy.
6. This technique depends on the availability of the intermediate node V_X and is therefore not suitable for robust GBL designs with single n-MOS pulldown stacks.

The above drawbacks of the pseudo-static technique offset the delay improvements resulting from an all low-V_TH pulldown and a 3% p-MOS keeper design. This is clear from the energy-delay tradeoffs for the 70nm LBL design shown in Figure 8. Our simulations indicate that the pseudo-static LBL meets the DC robustness threshold while incurring a 9% delay penalty. In addition, there is an 8% increase in switching energy, with a 4% savings in leakage energy. This implies that the static NOR delay and leakage (with the 2 p-MOS stack upsized for improved performance) degrade the overall switching and leakage energy of the wide-OR domino gate.
In addition, the worst-case noise margin for a 1-stack n-MOS pulldown degrades with scaling and cannot be improved using this circuit technique.

Figure 8: Pseudo-static LBL energy-delay plots for 70nm technology (normalized energy-delay plots; typical corner, 3% keeper)

4. Non-minimum L_e, Scaled V_cc: Robust 70nm Design

In this section, we focus on the selective usage of non-minimum channel length (L_e) transistors and supply voltage scaling on wide-OR domino gates for the 70nm generation. We first investigate the effect of both techniques in the I_ON-I_OFF plane at the transistor level, and then discuss the energy-delay tradeoffs associated with both LBL (2-stack) and GBL (1-stack) organizations.

4.1 Transistor Level I_ON-I_OFF Tradeoffs

Several different techniques can be used to reduce transistor leakage current. Some of them depend on supply voltage reduction, while others are based on increasing the transistor threshold. Reducing the power supply has a twofold impact on leakage power: it reduces the transistor DIBL current and lowers the V_cc·I_OFF product. Increasing the transistor channel length, on the other hand, results in a higher threshold voltage, which in turn gives an exponential reduction of the weak inversion current. However, both techniques also reduce the transistor I_ON current, $I_{ON} \propto (V_{cc} - V_{TH})^{\alpha}$, and cause performance degradation. A technique that offers larger leakage power and energy reductions with minimum delay degradation is more efficient and is suitable for robust, high performance logic designs. Figure 9 compares the effectiveness of the two techniques for the 70nm technology using transistor level simulations in which the supply voltage is reduced by 25% and the channel length is increased by 33%, respectively. We compare the two techniques in the [V_cc·I_OFF]-[V_cc/I_ON] plane. The first term is the leakage power, while the second term reflects the delay degradation associated with each technique.

Figure 9: Leakage techniques compared for 70nm technology (typical corner simulations; normalized V_cc·I_OFF versus normalized V_cc/I_ON for supply scaling and non-minimum L_e, with a baseline point at nominal V_cc and channel length and data points A, B and C; annotations: ~5% delay reduction, ~30% leakage power reduction)

Our simulation results indicate that lowering the power supply is a more efficient leakage control technique than using non-minimum L_e, since it results in less delay degradation.
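The delay axis of the comparison plane above can be explored with a small numerical sketch that combines the alpha-power-law drive current, I_ON ∝ (V_cc - V_TH)^α, with the CV^2 dependence of switching energy mentioned in the introduction. The nominal supply, threshold voltage and α below are hypothetical values chosen only to make the example run; they are not the paper's device parameters, so the printed numbers are not expected to match Figure 9.

```python
def on_current(vcc: float, vth: float, alpha: float, k: float = 1.0) -> float:
    """Alpha-power-law drive current: I_ON = k * (Vcc - Vth) ** alpha."""
    if vcc <= vth:
        raise ValueError("model is only meaningful for Vcc above Vth")
    return k * (vcc - vth) ** alpha


def delay_metric(vcc: float, vth: float, alpha: float) -> float:
    """Delay term used in the comparison plane: Vcc / I_ON."""
    return vcc / on_current(vcc, vth, alpha)


def normalized_delay(vcc_scale: float, vcc_nom: float, vth: float, alpha: float) -> float:
    """Delay metric at a scaled supply, relative to the nominal supply."""
    return delay_metric(vcc_scale * vcc_nom, vth, alpha) / delay_metric(vcc_nom, vth, alpha)


def normalized_switching_energy(vcc_scale: float) -> float:
    """Switching energy scales as C * Vcc**2 at constant capacitance."""
    return vcc_scale ** 2


# Hypothetical device values (assumptions, not taken from the paper).
VCC_NOM = 1.0   # volts
VTH = 0.2       # volts
ALPHA = 1.3     # velocity-saturation exponent

scale = 0.75    # the 25% supply reduction considered in the text
print(f"normalized Vcc/I_ON:         {normalized_delay(scale, VCC_NOM, VTH, ALPHA):.3f}")
print(f"normalized switching energy: {normalized_switching_energy(scale):.4f}")
```

The leakage axis (V_cc·I_OFF) is left out on purpose, because it depends on DIBL and subthreshold parameters that the text does not give.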
It is clear from data points A and B that, for the same amount of leakage power, supply scaling offers 5% less delay degradation. Conversely, for the same delay (points A and C), there is ~30% lower leakage power consumption. In addition, supply voltage scaling yields a quadratic savings in switching energy, as opposed to the near-linear increase associated with using non-minimum channel length transistors. This increase can be attributed to the higher switching capacitance caused by the larger effective W·L_e product of the transistors.

4.2 Robust, Energy Efficient 70nm Wide-OR Domino

In this section, we study the impact of the above techniques on 8-wide, 2-stack pulldown 70nm domino logic gates. Both techniques are also applicable to 1-stack n-MOS pulldown (GBL) domino designs. In this study, the domino supply voltage was lowered by up to 28%. The channel lengths of the top transistors (A_0-A_7) were increased (by up to 33%) while those at the bottom (B_0-B_7) were left unchanged. This is similar to the approach adopted for the dual-V_TH design described in Section 3.2. Figure 10 shows the impact of non-minimum L_e transistors on LBL designs while meeting the noise margin threshold at each data point. As the channel length is increased, the leakage current reduces, allowing downsizing of the p-MOS keeper (from 11.3% to 6%). It is clear from these results that the reduction in leakage energy is offset by an increase in switching energy. Therefore, the reduction in total energy depends on the relative ratio of the switching and leakage energy components. In addition, the reduction in I_OFF depends on the proportion of the weak inversion current in the total off-state current.
Our results indicate that, with the selective usage of non-minimum L_e transistors (L_e + 33%), the propagation delay degrades by ~4% while resulting in ~2% savings in total energy. It should be noted that the weakened keeper helps limit the delay impact associated with this technique to within 4%.

Figure 10: Energy-delay plots for 70nm using non-minimum L_e (typical corner simulations; normalized energy and normalized delay versus normalized channel length; keeper size decreasing from 11.3% to 6%)

The results in Figure 11 correspond to the case when the supply voltage is reduced from the nominal value to 0.72 V_cc. All the data points correspond to a 7% DC noise margin. As the supply voltage is scaled, there is a corresponding reduction in leakage current, allowing the p-MOS keeper to be downsized from 11.3% to 5%. Our results indicate that when the power supply is scaled by 4%, the delay degradation is ~4%, allowing a ~35% reduction in total energy. This implies that limited supply voltage scaling can be used for DSM wide-OR domino logic gates to ensure robust designs and low power operation while limiting the performance penalty to within acceptable limits. A similar 4% scaling of the power supply for the GBL results in ~5% delay degradation with 38% savings in total energy.

Figure 11: Energy-delay plots for 70nm with supply scaling (typical corner simulations; normalized total energy (switching + leakage) and normalized delay versus normalized supply voltage; keeper size decreasing from 11.3% to 5%; the ~4% supply scaling point is marked)

5. Conclusion

In this paper, we discussed the impact of technology scaling on domino logic gates. In particular, we focussed on the noise margin degradation of wide-OR domino gates. We compared several circuit and leakage control techniques that can be used to ensure robust domino logic operation for the sub-130nm generations. Our results indicate that while the dual-V_TH technique is suitable for the 90nm technology, limited supply voltage scaling (0%-5%) followed by the usage of non-minimum L_e transistors demonstrates improved energy-delay tradeoffs for the 70nm generation. It is expected that such techniques will ensure robust, low-power operation of high performance DSM domino logic gates.

6. Acknowledgements

The authors would like to acknowledge O. Semenov, S. Naraghi and C. Kwong from the University of Waterloo, and S. Hsu and S. Borkar from Intel Corp., for their encouragement and support.

7. References

[1] J. D. Meindl, Low Power Microelectronics: Retrospect and Prospect, Proceedings of the IEEE, vol. 83, no. 4, pp. 619-635, 1995.
[2] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, Low Power CMOS Digital Design, IEEE Journal of Solid State Circuits, vol. 27, no. 4, pp. 473-484, 1992.
[3] http://www-device.eecs.berkeley.edu: BSIM3 100nm and 70nm predictive technology process files.
[4] V. De and S. Borkar, Technology and Design Challenges for Low Power and High Performance, Proceedings of the International Symposium on Low Power Design, pp. 163-168, 1999.
[5] K. Roy, S. Mukhopadhyay, and H. M. Meimand, Leakage Current Mechanisms and Leakage Reduction Techniques in Deep-Submicrometer CMOS Circuits, Proceedings of the IEEE, vol. 91, no. 2, pp. 305-327, Feb. 2003.
[6] M. R. Stan, Optimal Voltages and Sizing for Low Power, 12th IEEE International Conference on VLSI Design, pp. 428-433, 1999.
[7] N. Sirisantana, L. Wei, and K. Roy, High-Performance Low-Power CMOS Circuits Using Multiple Channel Length and Multiple Oxide Thickness, Proceedings of the International Conference on Computer Design, pp. 227-232, 2000.
[8] T. Kuroda, CMOS Design Challenges to Power Wall, International Conference on Microprocessors and Nanotechnology, pp. 6-7, 2001.
[9] S. O. Jung, K. W. Kim, and S. Kang, Noise Constrained Power Optimization for Dual V_T Domino Logic, Proceedings of the International Symposium on Circuits and Systems, pp. 158-161, 2001.
[10] A. Chandrakasan, W. J. Bowhill, and F. Fox, Design of High Performance Microprocessor Circuits, IEEE Press, Piscataway, N.J., 2000.
[11] R. Krishnamurthy, A. Alvandpour, G. Balamurugan, N. Shanbhag, K. Soumyanath, and S. Borkar, A 130nm 6-GHz 256x32 bit Leakage-Tolerant Register File, IEEE Journal of Solid State Circuits, vol. 37, no. 5, pp. 624-632, May 2002.
[12] S. Thompson, I. Young, and M. Bohr, Dual Threshold and Substrate Bias: Keys to High Performance, Low Power, 0.1µm Logic Designs, Symposium on VLSI Technology, pp. 69-70.
[13] R. Krishnamurthy, S. Hsu, M. Anders, B. Bloechel, B. Chatterjee, M. Sachdev, and S. Borkar, Dual Supply Voltage Clocking for 5GHz 130nm Integer Execution Core, Symposium on VLSI Circuits, pp. 128-129, 2002.
[14] T. Kuroda, Low-Power, High Speed CMOS VLSI Design, Proceedings of the IEEE Conference on Computer Design, pp. 310-315, 2002.
[15] J. T. Kao and A. Chandrakasan, Dual-Threshold Voltage Techniques for Low-Power Digital Circuits, IEEE Journal of Solid State Circuits, vol. 35, no. 7, pp. 1009-1018, July 2000.
[16] B. Chatterjee, M. Sachdev, S. Hsu, R. Krishnamurthy, and S. Borkar, Effectiveness and Scaling Trends of Leakage Control Techniques for Sub-130nm CMOS Technologies, Proceedings of the International Symposium on Low Power Electronics and Design, pp. 122-127, 2003.