CHAPTER 3 NEW SLEEPY-PASS GATE

3.1 INTRODUCTION

A circuit-level design technique is presented in this chapter to reduce the overall leakage power in conventional CMOS cells. The new leakage power reduction technique, the Sleepy-pass gate, targets static power dissipation in the standby mode of operation using sleep transistors in a pass gate (transmission gate) structure. Unlike the sleep transistor technique, the Sleepy-pass gate technique retains the exact logic state; and, unlike the LECTOR technique, it can utilize high Vth transistors, with leakage savings of 180X or greater (compared with a CMOS NAND gate in a 100nm process). Therefore, far better than many prior approaches, the Sleepy-pass gate technique can achieve ultra-low leakage power consumption while saving state. First, the structure of the Sleepy-pass gate technique is explained using a two input NAND gate. Then the details of Sleepy-pass gate operation are described for active mode and sleep (standby) mode.

3.2 SLEEPY-PASS GATE

3.2.1 Structure of Sleepy-pass Gate

Figure 3.1 represents a pair of complementary MOS transistors connected in parallel, known as the CMOS pass gate (transmission gate)

configuration, which passes both 0 and 1 well. When the sleep signal on the gate terminal of the PMOS is Logic 0, its complement sleep_bar (Logic 1) is applied to the gate terminal of the NMOS, allowing both transistors to conduct and pass the signal at IN to OUT. When the sleep signal on the PMOS gate is Logic 1, the complementary Logic 0 is applied to the NMOS gate, turning both transistors off and forcing a high-impedance condition on both the IN and OUT nodes. This high-impedance condition represents the third "state" (high, low, or high-Z). Thus, the pass gate in its OFF state acts as an open circuit offering high resistance. When ON, the design acts as a voltage-controlled resistor connecting input and output, providing true bidirectional connectivity without degradation of the input signal.

Figure 3.1 Pass gate (transmission gate) logic

Figure 3.2 plots the ON resistance (R ON) of the pass gate as the input voltage is swept from Gnd to VDD, assuming the output voltage closely follows. The effective ON resistance is the parallel combination of the NMOS and PMOS resistances and is relatively constant across the full range of input voltages. The OFF resistance, however, is very high, in the range of several megaohms.
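The parallel combination described above can be illustrated with a minimal sketch. The resistance values below are hypothetical placeholders chosen only to show the trend (they are not taken from Figure 3.2 or from any simulation), and the function and variable names are illustrative assumptions.

```python
def parallel(r_n: float, r_p: float) -> float:
    """Equivalent resistance (ohms) of the NMOS and PMOS channels in parallel."""
    return (r_n * r_p) / (r_n + r_p)


# Hypothetical channel resistances in ohms: (NMOS, PMOS).
# The NMOS conducts well for low inputs, the PMOS for high inputs.
samples = {
    "Vin near Gnd": (4e3, 40e3),
    "Vin mid-rail": (8e3, 8e3),
    "Vin near VDD": (40e3, 4e3),
}

# Effective ON resistance of the pass gate at each sample point.
r_on = {label: parallel(r_n, r_p) for label, (r_n, r_p) in samples.items()}

for label, value in r_on.items():
    print(f"{label}: R_ON = {value:.0f} ohm")
```

Because one device weakens as the other strengthens, the parallel value stays within a narrow band even though each individual resistance changes by a factor of ten in this toy data.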

Figure 3.2 ON resistance of pass gate (Weste 2005)

The sleep transistor concept used for dynamic circuits in (Kursun 2004) was adapted and modified to work for leakage reduction in static CMOS complementary circuits. A combination of high Vth and standard Vth sleep transistors is used in the implementation (Lakshmikanthan 2006) to provide a well-balanced trade-off between high speed and leakage loss. The proposed technique facilitates the creation of an ultra-low-power standard cell library using components with embedded sleep circuitry. Figure 3.3 illustrates the topology of a generic CMOS complementary circuit with the Sleepy-pass gate. The inputs, input 1, input 2 ... input n, feed the Pull-Up Network (PUN) as well as the Pull-Down Network (PDN). The transistors in both the PUN and PDN are standard Vth devices. The sleep circuitry consists of two transistors, one PMOS device S1 and one NMOS device S2. Transistors S1 and S2 are high

Vth devices. The sleep transistors S1 and S2 are connected in parallel to form a pass gate (transmission gate) configuration between the PUN and PDN, as shown in Figure 3.3. The sleep and sleep_bar signals feed the gates of S1 and S2 respectively. The CMOS circuit output can be taken either between the PUN and the sleep circuit or between the sleep circuit and the PDN.

Figure 3.3 Block diagram of generic Sleepy-pass gate CMOS circuit

3.2.2 Sleepy-pass Gate Operation

The working of the Sleepy-pass gate CMOS circuit is discussed in this section. The sleep transistors S1 and S2 shown in Figure 3.3 are turned ON during active mode and turned OFF during sleep mode. During normal (active) operation, the sleep signal is held at Logic 0 and sleep_bar at Logic 1, causing transistors S1 and S2 to turn ON and act as a pass gate. The circuit behaves as a normal CMOS circuit without any hindrance from the sleep circuit. This can be seen from the DC characteristics obtained from HSPICE simulations.

Figure 3.4 shows a two input NAND gate with the Sleepy-pass gate embedded. Figure 3.5 shows the DC characteristics of the NAND gate with the proposed method (input A is fixed at 1 V and B is varied from 0 to 1 V).

Figure 3.4 Two input NAND gate with Sleepy-pass gate structure

Figure 3.5 DC characteristic of a two input NAND with Sleepy-pass gate

The ON resistance of the pass gate is nearly constant and far lower than its OFF resistance, allowing conduction between the PUN and PDN. Even though the ON resistance is not as high as the OFF-state resistance, it still increases the resistance of the VDD-to-ground path, limiting the flow of leakage currents and reducing leakage power in active mode. In standby mode, the sleep signal is held at Logic 1 and sleep_bar at Logic 0, causing S1 and S2 to turn OFF and forcing a high-impedance condition between the PUN and PDN nodes. Thus, the Sleepy-pass gate increases the resistance of the path from VDD to ground during the standby mode of operation, reducing the leakage current.

The leakage reduction of the Sleepy-pass gate structure occurs in two ways. First, leakage power is suppressed by the high Vth transistors, which are applied to the sleep transistors and the transistors parallel to the sleep transistors. Second, the OFF pass gate increases the resistance of the path from VDD to ground during standby mode, which also suppresses leakage power consumption. By combining these two effects, the Sleepy-pass gate structure achieves ultra-low leakage power consumption during sleep mode while retaining the exact logic state.

Figure 3.6 shows the input-output curves of the NAND gate with the proposed method, simulated for a 100nm technology at a 1 V supply voltage. The curves show that the proposed NAND gate produces exact output logic levels.

For any given process technology, the standard Vth transistors are unit-sized devices (the smallest width-to-length W/L ratio defined by the technology). However, the high Vth transistors S1 and S2 need to be sized appropriately for the Sleepy-pass gate embedded CMOS cells to have a propagation delay comparable to that of the standard cells.

Figure 3.6 Input-output waveforms of a two input NAND with proposed method

There is a nominal increase in both the area and the propagation delay of the Sleepy-pass gate embedded circuit compared to the standard CMOS circuit. This overhead is traded off against the large power savings over standard CMOS cells. In addition, the output logic state is not lost when the circuit goes from active mode to sleep mode and vice versa. This is attractive in comparison with existing approaches that use far lower VDD values and additional transistors to maintain logic state. Flautner (2002) also proposes that a reduced VDD value is sufficient to maintain the logic state.

As an alternative option, the Sleepy-pass gate could be used between the supply voltage VDD and the PUN and/or between the PDN and ground during the active mode of operation. This creates virtual supply and/or ground rails. During standby mode, all sleep transistors are off, thus the actual power and

ground paths are broken and the circuit experiences a lower voltage. A very high resistance path is established between VDD and ground due to the parallel combination of the OFF resistances of the sleep transistors, so the leakage current flowing through the circuit reduces significantly, giving the lowest power dissipation.

3.3 EVALUATION OF SLEEPY-PASS GATE TECHNIQUE

The Sleepy-pass gate technique is evaluated by applying it to logic circuits and benchmark circuits and by comparing it with other well-known techniques, using the HSPICE simulation/experimental setup shown in Figure 3.7.

Figure 3.7 Experimental setup

3.3.1 Simulation Setup

In order to compare the results of the Sleepy-pass gate method with the base case, an experiment was carried out with a set of combinational logic gates. The schematics are designed for all the mentioned techniques, and the netlists extracted from the schematics are modified with respect to the Berkeley

Predictive Technology Models. The modified netlists are simulated using Synopsys HSPICE for power and delay measurements.

3.3.2 Applying Sleepy-pass Gate for Logic Circuits

Various circuit applications of the Sleepy-pass gate technique are explored. Generic logic circuits including inverter, NAND2, NOR2, AND2, OR2, multiplexer and full adder gates are implemented using state-saving as well as state-destructive low-leakage techniques for evaluation. The detailed experimental methodology is explained here. All circuits were simulated at a temperature of 25 °C. Standard combinational CMOS library cells, such as NOR2, NAND2, OR2, AND2, XOR2, XNOR2 and MUX2x1, were implemented (Sahni 2006) and modified accordingly for the respective process technologies. Transistor sizes in all these circuits were fixed, with PMOS width W P = 2, NMOS width W N = 1 and L = 100nm. A supply voltage (VDD) of 1 V was used, and transient analysis was performed on all 7 cells listed above using HSPICE. The output load for each of the 7 cells was a capacitor of 1 pF.

3.3.2.1 Simulation results for logic gates

The total area of each standard cell in the CMOS base case and in the proposed method is listed in Table 3.1. There is a slight increase in area for the proposed technique compared with the base case due to the additional transistors. Figure 3.8 and Figure 3.9 show the layouts of the basic NAND and NOR gates as examples. The propagation delay of each cell was measured in order to compare the base case with the Sleepy-pass gate embedded cells. Next, the circuits were simulated at a temperature of 25 °C and their leakage power was measured. All possible input combinations were applied and the leakage power loss was measured in every case. Column 2 of Table 3.2 lists the average leakage power loss for each standard CMOS cell.

Figure 3.8 Two input CMOS NAND gate layout

Next, the Sleepy-pass gate circuitry was introduced for all 7 standard CMOS cells. The sleep transistors S1 and S2 are unit sized, with W S1 = W S2, for the 100nm process technology. For each cell, transient analysis was performed in the normal mode of operation, and the propagation delays were calculated and compared to the standard circuit values as shown in Figure 3.10.

Figure 3.9 Two input NOR gate layout

Figure 3.10 Propagation delay comparison @100nm with VDD = 1 V

The high Vth sleep transistors were sized such that the propagation delay of the Sleepy-pass gate cell was comparable to that of the standard cell. Figure 3.10 shows that the delay of the proposed technique is higher than that of the base case due to the additional transistors. Finally, the Sleepy-pass gate embedded cell was simulated in the sleep (standby) mode of operation. Column 3 of Table 3.2 lists the leakage power loss for all the Sleepy-pass gate embedded standard cells. Column 4 of Table 3.2 gives the leakage power savings, as a multiple of the base case, obtained by using the Sleepy-pass gate in combinational cells.

Table 3.1 Area measurements for combinational cells @ 100nm process

                      Area (µm²)
CMOS Gate             Base Case    Sleepy-pass Gate
2 input NAND          36.290       40.188
2 input NOR           42.835       46.456
2 input XOR           213.430      217.460
2 input AND           50.246       64.843
2 input OR            61.278       78.129
2 input MUX           202.109      218.213
1-bit Full Adder      623.765      651.340
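The "slight increase in area" can be quantified directly from Table 3.1. The following sketch simply computes the relative overhead per cell; the dictionary and function names are illustrative assumptions, while the numbers are copied from the table.

```python
# Areas in µm² copied from Table 3.1: (base case, Sleepy-pass gate).
AREA_UM2 = {
    "2 input NAND": (36.290, 40.188),
    "2 input NOR": (42.835, 46.456),
    "2 input XOR": (213.430, 217.460),
    "2 input AND": (50.246, 64.843),
    "2 input OR": (61.278, 78.129),
    "2 input MUX": (202.109, 218.213),
    "1-bit Full Adder": (623.765, 651.340),
}


def overhead_percent(base: float, proposed: float) -> float:
    """Relative area increase of the proposed cell over the base cell, in percent."""
    return (proposed - base) / base * 100.0


overheads = {cell: overhead_percent(b, s) for cell, (b, s) in AREA_UM2.items()}

for cell, pct in overheads.items():
    print(f"{cell}: +{pct:.2f}% area")
```

Since the sleep circuitry adds a fixed pair of transistors, the relative overhead would be expected to shrink for larger cells; the printed values let the reader check how far the tabulated cells follow that expectation.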

Table 3.2 Leakage power for combinational cells @ 100nm process

                      Average Leakage Power (W) with VDD = 1 Volt
CMOS Gate             Base Case       Sleepy-pass Gate    Leakage Savings
2 input NAND          5.1700 E-08     2.8710 E-10         180 X
2 input NOR           5.4830 E-08     3.1959 E-10         170 X
2 input XOR           12.3677 E-08    8.2780 E-10         148 X
2 input AND           6.3950 E-08     4.8477 E-10         140 X
2 input OR            6.9464 E-08     5.2567 E-10         131 X
2 input MUX           6.3748 E-07     7.3758 E-09         85 X
1-bit Full Adder      8.9000 E-07     3.5600 E-09         249 X

3.3.2.2 Increase in dynamic power dissipation

The main emphasis so far has been on the standby (sleep) mode leakage power loss of the Sleepy-pass gate embedded cells. The dynamic power loss of these circuits has not yet been explored. Dynamic power dissipation depends mainly on transient switching activity and frequency of operation, as well as on the square of the supply voltage. In this section, the effect of the additional sleep circuitry components on the dynamic power dissipation of standard cells is studied. The combinational standard library cells were used, and their dynamic power was measured. Table 3.3 gives the dynamic power dissipation comparison between standard CMOS cells (base case) and Sleepy-pass gate embedded cells. Column 2 of Table 3.3 gives the dynamic power loss of the standard CMOS cells. Column 3 of Table 3.3 lists the dynamic power dissipation of the Sleepy-pass gate embedded cells. Analysis of the results in Table 3.3 shows that the dynamic power penalty (increase) of the Sleepy-pass gate embedded cells, when compared to that of the standard cells, is due to the additional transistors introduced and the consequent increase in capacitance in the sleep-embedded circuits.
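The dependence described above is commonly summarized by the standard first-order switching-power expression. The symbols α (switching activity factor), C_L (switched capacitance) and f (operating frequency) are not defined in this chapter and are introduced here only as assumptions for illustration:

```latex
\[
P_{\mathrm{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f
\qquad\Longrightarrow\qquad
\frac{P_{\mathrm{dyn}}^{\text{sleepy}}}{P_{\mathrm{dyn}}^{\text{base}}}
  = \frac{C_L^{\text{sleepy}}}{C_L^{\text{base}}}
  \quad (\text{same } \alpha,\ V_{DD},\ f)
\]
```

Under this simplified model, with identical activity, supply voltage and frequency, any dynamic power penalty reduces to the ratio of switched capacitances, which is consistent with attributing the penalty to the capacitance added by the sleep transistors.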

Table 3.3 Dynamic power for combinational cells @100nm process

                      Dynamic Power (E-06 W) @ VDD = 1 V
Combinational Cells   Base Case    Sleepy-pass Gate
2 input NAND          7.7644       8.1902
2 input NOR           7.8190       8.2811
2 input XOR           12.0038      13.7445
2 input AND           9.0923       11.7016
2 input OR            8.9202       11.7391
2 input MUX           11.4912      12.0230
1-bit Full Adder      73.0032      90.1091

The literature detailing various methods to reduce dynamic power has been analyzed and can be summarized as follows:

Clock and Signal Gating

This is the simplest and most straightforward method to reduce the transient switching activity of the highly active nodes in a circuit. Control signal gating techniques, like those presented by Kapadia (1999), target reduction in switching power.

Operand Isolation Techniques

The input sharing problem is typically the cause of unnecessary switching activity in modules where there should be none. Consider a simple Arithmetic and Logic Unit (ALU) designed for 4 operations (add, subtract, multiply and shift), all sharing 2 input signals: when a subtraction is performed, the adder, multiplier and shifter units are simultaneously active along with the subtractor, thereby

wasting power. Operand isolation techniques, like using multiplexers or using multiple registers to drive different modules, solve the input-sharing problem. However, this increases the area and the delay, and adds other overheads.

Transistor Re-ordering Techniques

Hossain (1996) used a probability-based transistor re-ordering technique to reduce dynamic power dissipation in CMOS circuits.

Circuits Comprised of Independent Voltage Islands

Lackey (2002) presented a comprehensive background on methods used to design voltage islands. They present various voltage island scenarios, a system architecture and a chip implementation methodology, which are used to reduce active and static power consumption in SOC designs. The design implications of voltage islands are also evaluated.

Carballo (2003) proposed a semi-custom voltage island approach to build high-speed serial links. Their approach is a mixture of selective custom design and the transparent use of multiple supplies to reduce power. The digital circuitry on the chip runs at a low supply voltage, while the analog circuitry runs at a higher voltage level. An on-chip regulator converts low to high voltage, and vice versa. MTCMOS transistors are used in the custom design process.

Hillman (2005) focused on minimizing the operating voltage to reduce dynamic power. The library of components created was characterized for different voltages. Next, the whole SOC design was built with various components from this library, using voltage level-shifting circuits and voltage isolation cells.

Hung (2005) presented a voltage island partitioning and floor planning algorithm for architecting SOC designs. Their work explores the thermal impact of voltage islands. A hybrid optimization approach, consisting of a genetic algorithm based (GA-based) voltage island partitioning algorithm and a simulated annealing based (SA-based) floor-planning algorithm, is presented.

3.3.3 Applying Sleepy-pass Gate for Benchmark Circuits

The ISCAS'85 benchmark circuits are ten combinational networks provided to authors at the 1985 International Symposium on Circuits and Systems. They have subsequently been used by many researchers as a basis for comparing results in several areas of digital design, including test generation, timing analysis, and technology mapping. Experiments were conducted on a variety of combinational circuits implemented using various deep submicron process technologies. The HSPICE simulator, in conjunction with the BPTM deep submicron technology models, was used to simulate the circuits and to estimate leakage power dissipation. All circuits (unless specified otherwise) were simulated at a temperature of 25 °C. The Berkeley Predictive Technology Models (BPTM) contained process parameters and values only for standard Vth PMOS and NMOS transistors. No models are available for high Vth transistors. Except for the Sleepy-pass gate transistors, the widths of all transistors are taken as W p = 3 µm and W n = 1 µm for PMOS and NMOS respectively.

Experiments using some proprietary technology models obtained directly from foundries showed an interesting trend in the threshold voltage

value of high Vth transistors. For a variety of deep-submicron technologies, we observed that the threshold voltage value of a high Vth PMOS or a high Vth NMOS transistor was 25%-35% more than that of a standard Vth transistor. Hence, models for high Vth PMOS and NMOS transistors were incorporated into BPTM with threshold voltage values 25% more than those of standard Vth transistors. DC simulations were run using HSPICE to ensure that the threshold values of these high Vth transistors were only 25% more than those of standard Vth transistors.

Table 3.4 and Table 3.5 list the supply and threshold voltage values for the various BPTM models for PMOS and NMOS transistors respectively. The first columns in Tables 3.4 and 3.5 list the technology feature size. The supply voltage used for each feature size is listed in Column 2 of both Table 3.4 and Table 3.5. Column 3 of Table 3.4 gives the threshold voltage of a standard PMOS transistor, while Column 3 of Table 3.5 gives the threshold voltage of a standard NMOS transistor. The threshold voltage of a high Vth PMOS transistor is listed in Column 4 of Table 3.4 and the threshold voltage of a high Vth NMOS transistor is listed in Column 4 of Table 3.5.

Table 3.4 PMOS threshold voltage for BPTM models

BPTM Process    VDD      PMOS Standard Vth    PMOS High Vth
180nm           1.8V     -0.42V               -0.35V
130nm           1.3V     -0.35V               -0.32V
100nm           1.0V     -0.30V               -0.28V
70nm            0.85V    -0.22V               -0.18V

Table 3.5 NMOS threshold voltage for BPTM models

BPTM Process    VDD      NMOS Standard Vth    NMOS High Vth
180nm           1.8V     0.41V                0.55V
130nm           1.3V     0.33V                0.38V
100nm           1.0V     0.26V                0.34V
70nm            0.85V    0.21V                0.39V

3.3.3.1 ISCAS'85 leakage values

Ten benchmark circuits with the characteristics given in Table 3.6 were implemented with the Sleepy-pass gate embedded. They were sized appropriately for 4 different deep-submicron technologies: 180nm, 130nm, 100nm and 70nm. The PMOS and NMOS transistors are sized with width-to-length ratios of W/L = 6 and W/L = 3 respectively, except for the Sleepy-pass gates.

Table 3.6 ISCAS '85 benchmark circuit characteristics

Circuit Name    Circuit Function                           Total Gates      Input Lines    Output Lines
C432            Priority Decoder                           160 (18 XOR)     36             7
C499            32-Bit Single-Error-Correcting Circuit     202 (104 XOR)    41             32
C880            ALU and Control                            383              60             26
C1355           32-Bit Single-Error-Correcting Circuit     546              41             32
C1908           16-bit error detector/corrector            880              33             25
C2670           ALU and Control                            1193             233            140
C3540           ALU and Control                            1669             50             22
C5315           ALU and Selector                           2307             178            123
C6288           16-bit Multiplier                          2406             32             32
C7552           ALU and Control                            3512             207            108

The circuit C7552, containing approximately 3512 gates, is the largest design among all the benchmarks chosen, while circuit C432 is the smallest, with 160 gates. The supply voltages for the respective technologies are given in Column 2 of Table 3.4 and Table 3.5 for PMOS and NMOS respectively. Simulations were carried out using HSPICE in the standby mode of operation, and the leakage loss was measured. Since exhaustive testing was impossible for many of the benchmarks, a representative sample of randomly generated input vector combinations was applied to each of the circuits, and the leakage loss was measured in every case. The average leakage power dissipation values are listed in Table 3.7. Column 2 of Table 3.7 gives the leakage values of the benchmarks implemented using the 180nm BPTM. Similarly, Column 3 gives the leakage values for the 130nm BPTM, Column 4 for the 100nm BPTM, and Column 5 for the 70nm BPTM.

Table 3.7 Results of benchmark circuits with proposed method

            Leakage Power (nW)
Circuits    180-nm     130-nm    100-nm    70-nm
C432        8.385      3.816     1.034     0.374
C499        11.285     4.230     1.322     0.578
C880        22.127     8.503     1.673     1.376
C1355       35.634     13.214    3.428     1.245
C1908       52.460     19.409    5.160     2.736
C2670       73.519     28.016    5.916     3.178
C3540       102.835    38.285    10.271    4.119
C5315       147.277    56.291    13.157    7.432
C6288       157.268    60.172    14.049    7.163
C7552       213.497    78.642    21.580    11.039
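The random-sampling procedure used when exhaustive testing is impossible can be sketched as follows. This is a minimal illustration, not the script used for the reported results: `toy_leakage` is a hypothetical stand-in for an HSPICE leakage measurement, and the sample count and seed are arbitrary assumptions. Only the input count of 36 (C432, Table 3.6) comes from the text.

```python
import random
from statistics import mean
from typing import Callable, List, Tuple

Vector = Tuple[int, ...]


def sample_input_vectors(num_inputs: int, num_samples: int, seed: int = 0) -> List[Vector]:
    """Draw random binary input vectors; a fixed seed keeps the sample reproducible."""
    rng = random.Random(seed)
    return [tuple(rng.randint(0, 1) for _ in range(num_inputs)) for _ in range(num_samples)]


def average_leakage(vectors: List[Vector], measure: Callable[[Vector], float]) -> float:
    """Average the per-vector leakage returned by the measurement callback."""
    return mean(measure(v) for v in vectors)


def toy_leakage(vector: Vector) -> float:
    """Hypothetical placeholder for a simulator run; returns a made-up leakage in watts."""
    return 1e-9 * (1 + sum(vector))


# C432 has 36 input lines, i.e. 2**36 possible vectors, so only a sample is simulated.
vectors = sample_input_vectors(num_inputs=36, num_samples=100)
avg = average_leakage(vectors, toy_leakage)
print(f"average leakage over {len(vectors)} sampled vectors: {avg:.3e} W")
```

In a real flow the callback would apply the vector to the netlist, run the standby-mode simulation, and return the measured leakage.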

3.3.4 Prior Low Leakage Techniques Considered for Comparison Purposes

The Sleepy-pass gate technique is compared to a conventional CMOS approach, which is the base case, and four other well-known previous approaches, i.e., the forced stack, sleep, zigzag and LECTOR techniques. The four bit adder circuit is chosen for implementation and comparison.

3.3.4.1 Four bit adder

The four bit adder shown in Figure 3.12 is implemented using the one bit full adder shown in Figure 3.11. A full adder is an example of a typical complex CMOS gate. In Figure 3.11, a and b are the two inputs and c is the carry input. Carry and Sum are the outputs. The transistor sizing of the full adder is noted in Figure 3.11.

Figure 3.11 One bit full adder

Figure 3.12 Inputs of four bit adder

3.3.4.2 Base case

The base case CMOS technique is shown in Figure 3.13 and described in a classic textbook by Weste (2005). Figure 3.13 shows a pull-up network and a pull-down network using as few transistors as possible to implement the desired Boolean logic function. The base case of a four bit adder is sized as explained in Section 3.3.4.1.

Figure 3.13 Base case

3.3.4.3 Forced stack technique

Figure 3.14 shows the forced stack technique, which forces a stack structure by breaking down an existing transistor into two half-size transistors. When the forced stack technique is applied, each existing transistor is replaced with two half-sized transistors as shown in Figure 3.14.

Figure 3.14 Forced stack

3.3.4.4 Sleep transistor technique

The sleep transistor technique shown in Figure 3.15 uses sleep transistors between VDD and the pull-up network as well as between Gnd and the pull-down network. Generally, the width/length (W/L) ratio is sized based on a trade-off between area, leakage reduction and delay. For simplicity, the sleep transistor is sized to the size of the largest transistor in the network (pull-up or pull-down) connected to the sleep transistor. The PMOS and NMOS sleep transistors have W/L = 6 and W/L = 3, respectively.

Figure 3.15 Sleep technique

3.3.4.5 Zigzag

The zigzag technique in Figure 3.16 uses one sleep transistor in each logic stage, either in the pull-up or in the pull-down network, according to a particular input pattern. In this thesis, an input vector is used that can achieve the lowest possible leakage power consumption. Then, a sleep transistor is assigned either to the pull-down network if the stage output is '1', or to the pull-up network if the stage output is '0'. For Figure 3.16, it is assumed that the minimum leakage inputs are asserted. Therefore, a pull-down sleep transistor is assigned for the first stage and a pull-up sleep transistor for the second stage. Similar to the sleep transistor technique, the sleep transistors are sized to that of the largest transistor in the network (pull-up or pull-down) connected to the sleep transistor. The PMOS and NMOS sleep transistors have W/L = 6 and W/L = 3, respectively.
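The per-stage assignment can be sketched as below. The rule coded here is the commonly described zigzag rule (place the sleep transistor in the network that is OFF under the sleep-mode input vector), which is an assumption about what the text intends; the two-stage NAND-plus-inverter chain and the chosen input vector are hypothetical and are not the circuit of Figure 3.16.

```python
from typing import List


def zigzag_assignment(stage_outputs: List[int]) -> List[str]:
    """Choose, per logic stage, the network that receives the sleep transistor.

    Assumed rule: an output of 1 means the pull-down network is OFF, so the
    sleep transistor goes there; an output of 0 puts it in the pull-up network.
    """
    placement = []
    for out in stage_outputs:
        if out not in (0, 1):
            raise ValueError(f"stage output must be 0 or 1, got {out!r}")
        placement.append("pull-down" if out == 1 else "pull-up")
    return placement


def nand2(a: int, b: int) -> int:
    """Two input NAND."""
    return 0 if (a and b) else 1


def inverter(a: int) -> int:
    """Logical NOT."""
    return 1 - a


# Hypothetical sleep-mode input vector a = 0, b = 0 for a NAND2 -> inverter chain.
first_stage = nand2(0, 0)
second_stage = inverter(first_stage)
placement = zigzag_assignment([first_stage, second_stage])
print(placement)
```

For this toy chain the placement alternates between pull-down and pull-up, matching the first-stage/second-stage pattern described for Figure 3.16.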

Figure 3.16 Zigzag

3.3.4.6 LECTOR

LECTOR is an adaptation of the technique of effective stacking of transistors in order to reduce leakage power. Figure 3.17 shows the generic block diagram of a LECTOR CMOS circuit. Two Leakage Control Transistors (LTs), LT1 and LT2, are introduced between the PUN and PDN. These LTs act as self-controlled stacked transistors. The LECTOR structure is shown in Figure 3.17, where the LTs are unit sized with the ratio W/L = 6.

Figure 3.17 Generic block diagram of LECTOR

3.3.5 Experiments on Sleepy-pass Gate

3.3.5.1 Delay

The worst-case propagation delay of each benchmark is measured. Input vectors and input and output triggers are chosen so that delay is measured from the trigger input edge reaching 50% of the supply voltage to the circuit output edge reaching 50% of the supply voltage. Input waveforms have a 4ns period (i.e., a 250 MHz rate) and rise and fall times of 100ps.

3.3.5.2 Active power

Active power is measured by asserting semi-random input vectors and calculating the average power dissipation during this time. Input vectors are chosen so that a large number of possible input combinations are included in the set. The average power dissipation reported by HSPICE is taken as the estimate of active power consumption. This active power includes dynamic power as well as static power during the time measured. However, static power consumption is subtracted to calculate pure dynamic power consumption. All sleep transistors are turned on when active power is measured for the sleep, zigzag and Sleepy-pass gate techniques.

For the four bit adder, input vectors covering every possible input are asserted. The waveform in Figure 3.18 shows the input vectors asserted for each one bit adder, where the input vector changes every 4ns. The same signal timing is used while scaling technology from 0.18 µm down to 70nm (i.e., the signal timing is not customized)

because in this way the effect of technology scaling on a fixed clock can be observed. However, it is known that reducing cycle time along with technology feature size is possible and may reveal additional insights and trade-offs.

3.3.5.3 Static power

HSPICE is also used to measure static power consumption. Since static power varies according to input state, either the full combination of input vectors or a subset of the possible input combinations is considered. When static power is measured, an input vector is first asserted, and power consumption is measured after the signals become stable (e.g., after 30ns). The static power values measured over 30ns for the individual input vectors are averaged to derive the static power consumption of each circuit.

Figure 3.18 One bit adder input-output signals for dynamic power measurement

For the four bit adder, all possible input vectors of a full adder are considered for leakage power measurement. The sleep transistors of the sleep,

zigzag and Sleepy-pass gate techniques are turned OFF during sleep mode, in which the leakage power dissipation is measured.

3.3.5.4 Area

The area of each particular design style (e.g., base case) is measured using layout. For the four bit adder, an actual full layout of the adder is directly measured. To estimate area for the other technologies, the area of each benchmark layout for each particular design style is scaled, and around 10% area overhead is added in order to account for non-linear scaling. For example, the area for the 0.130 µm process is estimated as the area (in µm²) measured at 0.180 µm × (0.130/0.180)² × 1.1. The extra area needed to wire gates is not taken into account, but the absence of a wiring penalty equally affects all techniques considered (i.e., base case, sleep, forced stack, zigzag, LECTOR and Sleepy-pass gate). Figure 3.19 shows the layout of a full adder.

Figure 3.19 Layout of full adder
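The area estimate of Section 3.3.5.4 can be written as a one-line scaling function. This sketch assumes that the scaling expression in the text is a quadratic feature-size ratio multiplied by a 10% overhead factor; the exponent is a reading of a partly illegible expression, and the function name, parameter names and the 100 µm² input are hypothetical.

```python
def scale_area(area_um2: float, from_um: float, to_um: float, overhead: float = 1.1) -> float:
    """Estimate layout area at a new feature size.

    Assumes area shrinks with the square of the feature-size ratio and that a
    fixed overhead factor (10% by default) models non-linear scaling.
    """
    return area_um2 * (to_um / from_um) ** 2 * overhead


# Hypothetical 100 µm² layout measured at 0.180 µm, scaled to 0.130 µm.
example = scale_area(100.0, from_um=0.180, to_um=0.130)
print(f"estimated area at 0.130 µm: {example:.2f} µm²")
```

The overhead is kept as a parameter so that a different correction factor can be explored without changing the scaling rule itself.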

3.3.6 Comparative Results

First, the Sleepy-pass gate is compared to the base case and LECTOR (self-triggered sleep circuitry) techniques in terms of leakage power and delay using a two input NAND gate at a temperature of 25 °C. Second, it is compared with other well-known techniques, i.e., sleep, zigzag and forced stack, in terms of active power, leakage power, area and delay using the four bit full adder circuit explained in Section 3.3.4.1.

3.3.6.1 Results of two input NAND gate

Tables 3.8, 3.9 and 3.10 give the leakage power, leakage savings and delay penalty comparisons, respectively, for the base case, LECTOR and Sleepy-pass gate embedded NAND gates. For a fair comparison, the supply voltage was set to 1 V for the 100nm process. In Table 3.8, row 3 lists the leakage power values for the base case NAND gate, row 4 lists the leakage values for the LECTOR NAND gate, and row 5 gives the leakage values (in standby mode) for the Sleepy-pass gate embedded NAND gate, all using the 100nm BPTM. Columns 2 to 5 of Table 3.8 indicate the corresponding input vectors applied to the gates. The results in Table 3.8 show that the leakage power depends on the input vector applied to the circuit (Abdollahi 2004). Columns 2 and 3 of Table 3.9 present the average leakage power and leakage savings obtained for the base case, LECTOR and Sleepy-pass gate NAND gates. They show that the Sleepy-pass gate technique has the lowest leakage power dissipation and the largest leakage savings, 180X, when compared to the conventional CMOS NAND gate. Columns 2 and 3 of Table 3.10 give the delay obtained after introducing the additional transistors and the corresponding delay penalty, respectively. It shows that the

conventional NAND gate has the least propagation delay compared to the LECTOR and Sleepy-pass gate techniques. Figure 3.20 and Figure 3.21 also show that the Sleepy-pass gate technique has lower leakage power and a smaller delay penalty than LECTOR at 100nm.

Table 3.8 Leakage power comparison for two input NAND gate

100nm Process Technology with VDD = 1 Volt
                    Leakage Power (W) for Input Vectors
Method              00            01            10            11
Base CMOS           4.7335E-09    4.1158E-08    4.2515E-08    1.1841E-07
Lector              6.6976E-09    4.6137E-09    4.1881E-09    3.3269E-09
Sleepy-pass gate    3.8344E-10    3.7341E-10    3.5013E-09    3.8653E-10

Table 3.9 Leakage power savings for two input NAND gate

100nm Process Technology with VDD = 1 Volt
Method              Average Leakage (W)    Average Leakage Savings
Base CMOS           5.170E-08              -
Lector              4.7065E-09             10.98 X
Sleepy-pass gate    2.8710E-10             180 X

Table 3.10 Delay penalty for two input NAND gate

100nm Process Technology with VDD = 1 Volt
Method              Delay (s)    Delay Penalty %
Base CMOS           1.3E-10      -
Lector              1.8E-10      38.46
Sleepy-pass gate    1.5E-10      15.38
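The savings and penalty columns of Tables 3.9 and 3.10 follow from simple ratios against the base case. The sketch below recomputes them from the tabulated averages and delays; the function and variable names are illustrative assumptions.

```python
# Average leakage (W) from Table 3.9 and delay (s) from Table 3.10.
BASE_LEAKAGE_W = 5.170e-08
AVG_LEAKAGE_W = {"Lector": 4.7065e-09, "Sleepy-pass gate": 2.8710e-10}

BASE_DELAY_S = 1.3e-10
DELAY_S = {"Lector": 1.8e-10, "Sleepy-pass gate": 1.5e-10}


def leakage_savings(base_w: float, technique_w: float) -> float:
    """Leakage savings expressed as a multiple (X) of the base case."""
    return base_w / technique_w


def delay_penalty_percent(base_s: float, technique_s: float) -> float:
    """Extra delay relative to the base case, in percent."""
    return (technique_s - base_s) / base_s * 100.0


savings = {name: leakage_savings(BASE_LEAKAGE_W, w) for name, w in AVG_LEAKAGE_W.items()}
penalty = {name: delay_penalty_percent(BASE_DELAY_S, d) for name, d in DELAY_S.items()}

for name in AVG_LEAKAGE_W:
    print(f"{name}: {savings[name]:.2f} X savings, {penalty[name]:.2f}% delay penalty")
```

Expressing savings as a multiple and delay as a percentage mirrors the units used in the two tables, so the recomputed values can be compared with them directly.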

Figure 3.20 Average leakage power for two input NAND gate

Figure 3.21 Propagation delay for two input NAND gate

3.3.6.2 Results of four bit full adder

The impact of technology scaling is explored from the results of a four bit full adder in terms of static power, propagation delay, dynamic power

and area as tabulated in Table 3.11 to Table 3.14. Table 3.11 gives the static power dissipation for the 180nm, 130nm, 100nm and 70nm process technologies. Table 3.11 shows that the Sleepy-pass gate achieves a large leakage reduction over the base case and the other leakage reduction techniques compared. From Figures 3.22, 3.23, 3.24 and 3.25 it is observed that static power increases as technology feature size shrinks. Table 3.12 gives the propagation delay of a four bit full adder implemented in the base case as well as in the other leakage reduction techniques, including the Sleepy-pass gate, at the 180nm, 130nm, 100nm and 70nm process technologies. From Figure 3.26 it is observed that propagation delay decreases as technology feature size shrinks. Compared to the base case, the Sleepy-pass gate has increased delay, but its delay is smaller than that of the LECTOR technique.

Table 3.11 Static power dissipation for various process technologies

                    Static Power (W) of Four Bit Full Adder
4-bit adder         180nm       130nm       100nm       70nm
Base case           9.39E-10    9.34E-09    9.57E-08    9.31E-07
Forced stack        9.47E-11    8.36E-10    8.29E-09    7.28E-08
Sleep               7.28E-10    5.20E-09    6.40E-08    6.27E-07
Zigzag              4.28E-10    1.82E-09    4.09E-08    4.89E-08
LECTOR              8.11E-11    9.31E-11    9.87E-09    1.62E-08
Sleepy-pass gate    6.13E-11    1.52E-11    7.21E-09    2.84E-09
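The comparison across techniques in Table 3.11 can be checked mechanically. The following sketch, whose names are illustrative assumptions, finds the technique with the lowest static power at each process node and the reduction factor of a chosen technique relative to the base case, using only the tabulated values.

```python
NODES = ("180nm", "130nm", "100nm", "70nm")

# Static power (W) of the four bit full adder, copied from Table 3.11.
STATIC_POWER_W = {
    "Base case": (9.39e-10, 9.34e-09, 9.57e-08, 9.31e-07),
    "Forced stack": (9.47e-11, 8.36e-10, 8.29e-09, 7.28e-08),
    "Sleep": (7.28e-10, 5.20e-09, 6.40e-08, 6.27e-07),
    "Zigzag": (4.28e-10, 1.82e-09, 4.09e-08, 4.89e-08),
    "LECTOR": (8.11e-11, 9.31e-11, 9.87e-09, 1.62e-08),
    "Sleepy-pass gate": (6.13e-11, 1.52e-11, 7.21e-09, 2.84e-09),
}


def lowest_per_node(table: dict) -> dict:
    """Map each process node to the technique with the smallest static power."""
    result = {}
    for index, node in enumerate(NODES):
        result[node] = min(table, key=lambda name: table[name][index])
    return result


def reduction_vs_base(table: dict, technique: str) -> tuple:
    """Base-case static power divided by the technique's static power, per node."""
    return tuple(b / t for b, t in zip(table["Base case"], table[technique]))


best = lowest_per_node(STATIC_POWER_W)
reduction = reduction_vs_base(STATIC_POWER_W, "Sleepy-pass gate")

for node, factor in zip(NODES, reduction):
    print(f"{node}: lowest = {best[node]}, Sleepy-pass reduction = {factor:.1f} X")
```

Keeping the table as plain tuples indexed by node makes it easy to repeat the same check for the delay, dynamic power and area tables.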

Table 3.12 Propagation delay for various process technologies
Propagation delay (s) of four bit full adder

4-bit adder        180nm      130nm      100nm      70nm
Base case          7.21E-10   4.45E-10   3.71E-10   3.42E-10
Forced stack       1.71E-09   1.42E-09   1.03E-09   6.21E-10
Sleep              1.12E-09   6.10E-10   5.35E-10   4.98E-10
Zigzag             1.12E-09   6.10E-10   5.35E-10   4.98E-10
LECTOR             1.39E-09   1.15E-09   7.85E-10   7.51E-10
Sleepy-pass gate   1.09E-09   9.08E-10   7.71E-10   7.28E-10

Table 3.13 gives the dynamic power dissipation of the four bit full adder at the 180nm, 130nm, 100nm and 70nm process technologies. Figure 3.27 shows that dynamic power decreases as the technology feature size shrinks. The Sleepy-pass gate has higher dynamic power dissipation than the base case because of the additional sleep transistors used as sleep circuitry, but lower dynamic power dissipation than the zigzag and LECTOR techniques. Table 3.14 and Figure 3.28 show that the Sleepy-pass gate technique occupies more area than the base case, and that the area decreases as the technology feature size shrinks. Finally, compared to the other techniques, the overhead of increased delay and area is judged to be worthwhile. Therefore, our Sleepy-pass gate approach can be used where state preservation and ultra low leakage power consumption are needed and are judged to be worth the area overhead.

Table 3.13 Dynamic power dissipation for various process technologies
Dynamic power dissipation (W) of four bit full adder

4-bit adder        180nm      130nm      100nm      70nm
Base case          4.81E-04   1.20E-04   3.82E-05   1.86E-05
Forced stack       5.70E-04   1.42E-04   4.38E-05   2.31E-05
Sleep              5.53E-04   1.37E-04   4.25E-05   2.21E-05
Zigzag             6.54E-04   1.67E-04   5.18E-05   2.83E-05
LECTOR             5.74E-04   1.43E-04   4.42E-05   2.93E-05
Sleepy-pass gate   5.66E-04   1.41E-04   4.35E-05   2.82E-05

Table 3.14 Area measured for various process technologies
Area (µm²) of four bit full adder

4-bit adder        180nm   130nm   100nm   70nm
Base case          59.54   43.00   33.07   23.15
Forced stack       77.35   55.86   42.97   30.08
Sleep              74.15   53.55   41.18   28.8
Zigzag             69.89   50.47   38.83   27.18
LECTOR             75.13   54.26   41.73   29.2
Sleepy-pass gate   74.25   53.62   41.25   28.8
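The tradeoff discussed for the four bit full adder can be quantified by expressing the Sleepy-pass gate delay, dynamic power and area as a percentage overhead relative to the base case. The Python sketch below uses the base case and Sleepy-pass gate rows of Table 3.12, Table 3.13 and Table 3.14. The overhead formula (relative increase over the base case) and the names used are assumptions for illustration only.

```python
NODES = ("180nm", "130nm", "100nm", "70nm")

# (base case row, Sleepy-pass gate row) copied from Tables 3.12 to 3.14.
METRICS = {
    "Propagation delay (s)": (
        (7.21e-10, 4.45e-10, 3.71e-10, 3.42e-10),
        (1.09e-09, 9.08e-10, 7.71e-10, 7.28e-10),
    ),
    "Dynamic power (W)": (
        (4.81e-04, 1.20e-04, 3.82e-05, 1.86e-05),
        (5.66e-04, 1.41e-04, 4.35e-05, 2.82e-05),
    ),
    "Area (um^2)": (
        (59.54, 43.00, 33.07, 23.15),
        (74.25, 53.62, 41.25, 28.8),
    ),
}


def overhead_percent(base, value):
    """Relative increase of value over base, in percent (assumed metric)."""
    return (value - base) / base * 100.0


overheads = {
    name: tuple(overhead_percent(b, v) for b, v in zip(base_row, spg_row))
    for name, (base_row, spg_row) in METRICS.items()
}

for name, row in overheads.items():
    cells = "  ".join(f"{node}: {x:6.1f} %" for node, x in zip(NODES, row))
    print(f"{name:<22} {cells}")
```

On the tabulated data, the area overhead of the Sleepy-pass gate stays between roughly 24 % and 25 % at every process technology, while the delay and dynamic power overheads vary from node to node.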

Figure 3.22 Static power for four bit full adder at 180nm

Figure 3.23 Static power for four bit full adder at 130nm

Figure 3.24 Static power for four bit full adder at 100nm

Figure 3.25 Static power for four bit full adder at 70nm

Figure 3.26 Propagation delay for four bit full adder at various process technologies

Figure 3.27 Dynamic power dissipation for four bit full adder at various process technologies

Figure 3.28 Area measured for four bit full adder at various process technologies

3.4 SUMMARY

In this chapter, the Sleepy-pass gate structure was introduced and its operation for leakage power reduction was explained. The Sleepy-pass gate technique achieves a smaller propagation delay than LECTOR and larger leakage power savings than the other leakage reduction techniques compared. Scaling down the CMOS technology feature size and threshold voltage to achieve high performance has increased leakage power dissipation. This chapter presented an efficient methodology for reducing leakage power in CMOS VLSI design. The proposed method could be used throughout logic design to reduce the static power of CMOS circuits. Some of the implications of implementing this technique are as follows:

Additional circuitry, although minimal, is needed to modify the original logic design so that the circuit is forced into a low leakage state during the standby mode of operation.

The technique requires a controller / power management system to generate the sleep signals automatically during standby mode and to reactivate the circuit when necessary.

Using the technique for power savings involves a tradeoff between area, delay and power.

DT-LECTOR, proposed in the next chapter, could be used in applications that demand high speed and where sleep signal control circuitry is not available.
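The controller requirement noted in the implications above can be illustrated with a small behavioural model. The Python sketch below is purely hypothetical: the idle-cycle counting policy, the threshold value and the class name are assumptions and are not described in this chapter. It only follows the signal convention of Section 3.2.1, where sleep = 0 and sleep_bar = 1 turn the pass gate on (active mode) and sleep = 1 and sleep_bar = 0 turn it off (standby mode).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SleepController:
    """Hypothetical idle-counter policy; not taken from the source."""

    idle_threshold: int = 3  # assumed number of idle cycles before standby
    idle_cycles: int = 0

    def step(self, busy: bool) -> Tuple[int, int]:
        """Advance one clock cycle and return (sleep, sleep_bar)."""
        # Any activity resets the idle counter and wakes the circuit.
        self.idle_cycles = 0 if busy else self.idle_cycles + 1
        sleep = 1 if self.idle_cycles >= self.idle_threshold else 0
        return sleep, 1 - sleep


controller = SleepController(idle_threshold=3)
activity = [True, False, False, False, False, True]
trace = [controller.step(busy) for busy in activity]
print(trace)
```

In this model the sleep signal is asserted only after three consecutive idle cycles and is released in the same cycle in which activity resumes. A real power management system would also have to account for wake-up latency, which this sketch ignores.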