Dual-Threshold Voltage Assignment with Transistor Sizing for Low Power CMOS Circuits

390 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 9, NO. 2, APRIL 2001

Pankaj Pant, Rabindra K. Roy, and Abhijit Chatterjee

TABLE I
RESULTS FOR BENCHMARK CIRCUITS

Abstract: We demonstrate a novel algorithm for assigning the threshold voltage to the gates in a digital random logic complementary metal-oxide-semiconductor (CMOS) circuit for a dual-threshold voltage process. The tradeoff between static and dynamic power consumption has been explored. When used along with device sizing and supply voltage reduction techniques for low power, the proposed algorithm can reduce the total power dissipation of a circuit by as much as 50%.

I. INTRODUCTION

With the rapid growth of the portable electronics market in the last few years, the emphasis in VLSI design is shifting from high speed to low power. Portable applications like wireless communication and imaging systems (digital diaries, smart cards) demand high-speed computations, complex functionalities, and often real-time processing capabilities along with low power consumption.

Traditional approaches to minimizing the power consumption of static CMOS logic networks have advocated straightforward reduction in the supply voltage. Since the dynamic power is proportional to the square of the supply voltage, this is the most effective technique for power reduction. The resulting increase in delay can be effectively compensated through increased data-path parallelism in special-purpose signal-processing applications and by careful transistor sizing [1], [2]. In some situations it has been recommended that the threshold voltage of the transistors be reduced to improve the circuit performance [3], [4]. However, the static power component of the power dissipation has an inverse exponential dependency on the threshold voltage. 
This implies that reducing the threshold voltage could cause a significant increase in the static power component. Methodologies for minimizing the sum total of static and dynamic energy consumption in general-purpose CMOS circuits have been proposed in [5]–[7]. Total power is minimized through careful selection of supply and threshold voltage values and device sizes such that the leakage and switching components of the dissipation are equal.

The use of dual-threshold voltages for power reduction has been examined in [8] and [9]. In both methods, all the transistors in the circuit are initially set to have a low threshold voltage. Subsequently, using the algorithms developed, the threshold voltage of some of the gates that do not lie on critical paths is increased. The leakage power can be reduced by up to 50% without affecting the performance of the circuit. However, the authors of [8] and [9] have not considered the effects of such a dual-threshold voltage selection scheme on the dynamic power dissipation of a given circuit. In this paper, we develop algorithms that target the reduction of the total power dissipation of the circuit.

Manuscript received February 17, 1999; revised June 28, 2000. This work was supported by NSF under Grant MIP-9502575 at the Georgia Institute of Technology. P. Pant is with Compaq Computer Corporation, Shrewsbury, MA 01545 USA. R. K. Roy is with Mobilian Corporation, Hillsboro, OR USA. A. Chatterjee is with the Georgia Institute of Technology, Atlanta, GA 30332 USA. Publisher Item Identifier S 1063-8210(01)00804-6.

II. PROBLEM DESCRIPTION

We illustrate a novel algorithm for selecting the threshold voltage of the gates of a random logic circuit from a choice of two predefined voltage values. Unlike previous approaches [5], [8], we assume that the values of the threshold voltages are specified by the process technology and cannot be variables in the optimization process. 
This is usually true since the threshold voltage is very difficult to control accurately and designers are not at liberty to select a value that is not supported by the process. The goal of the selection process is to enable a subsequent power optimization of the circuit to reduce the dynamic component of the power dissipation. Henceforth, the low and high threshold voltage values will be called $V_{tL}$ and $V_{tH}$, respectively, and the corresponding gates will be referred to as low-$V_t$ and high-$V_t$ gates, respectively.

Since the threshold voltage of a gate is restricted to one of two possible choices, the optimization becomes a constrained 0-1 programming problem. An optimal method, such as the convex programming approach in [10], cannot be used to solve the problem because the constraint set is no longer convex. The complexity of the problem can be better understood with the aid of an example. Consider a circuit with $N$ gates. There are $2^N$ possible assignments of the threshold voltages of the gates from the set $\{V_{tL}, V_{tH}\}$. To obtain the globally optimal solution, we would theoretically have to apply an optimal sizing algorithm (such as in [10]) for each of these configurations.

In order to reduce the problem complexity, we have divided it into two parts. First, we heuristically assign the threshold voltages of the gates to either $V_{tL}$ or $V_{tH}$. Then a power minimization algorithm is applied that selects the gate sizes and the supply voltage for minimal power operation. The maximum percentage of low-$V_t$ gates could be specified by the designer based on leakage power considerations and noise characterization studies. Our approach reduces the complexity of the solution by requiring only one application of a global optimization algorithm as opposed to $2^N$ applications.

III. CIRCUIT MODELS

Usually the delay models selected for estimating the delay of a gate assume that the signals at the inputs of the gate consist of ideal step waveforms. 
This is not realistic, since practical waveforms always have nonzero transition times. In general [11], there are two distinct contributions to the delay of an inverter. The first is the intrinsic delay of the inverter under a step input, and the second is a correction factor that accounts for the transition time of the signal waveform at the input of the gate. It was demonstrated in [11] that this input-dependent factor could account for as much as 50% of the total delay. The delay of the inverter $G_{INV}$ for a nonstep (ramp) input with rise time $\tau$ is

$$T_{INV} = \Delta_{INV} + a_{INV}\,\tau \qquad (1)$$

where $\Delta_{INV}$ is the delay of $G_{INV}$ for a step input and the factor $a_{INV}$ indicates the extent to which the delay of the gate depends on the input slope [11]. Henceforth, we shall call it the slope factor of the gate. Equation (1) was extended to NAND (NOR) gates in [12] by constructing an equivalent inverter and using its slope factor. The input slope used is the rise time of the input that lies on the path being considered. However, it should be noted that this approach may not accurately model the delay of the gate in all cases and could lead to suboptimal solutions when used in an optimization algorithm.

It has been shown [12] that the slope of the output transition of a gate is linearly related to the delay of the gate. Hence, we can approximate the input slope in (1) by the delay of the gate driving the input. To calculate the step-input delay and the power dissipation of a static CMOS gate, we use the accurate analytical expressions described in [5].

1063-8210/01$10.00 © 2001 IEEE

Fig. 1. Power versus K ($V_{tH}$ = 400 mV, $V_{tL}$ = 100 mV).

IV. ALGORITHM

The overall algorithm design is motivated by the following observation. Consider a critical path in a circuit which consists of high-$V_t$ gates only. Suppose that we arbitrarily convert one of the gates in the critical path to a low-$V_t$ gate. 
The delay of this gate decreases, which creates a certain amount of slack in the path delay, where the slack is defined as the difference between the specified clock period and the path delay. Now we can afford to slow down the path by reducing the size of the gates in the path or lowering the supply voltage (or both) such that the path still meets the timing deadline. Both of these modifications lead to a reduction in the dynamic power dissipation, which is usually the dominant component of the power dissipation. Hence, the total power dissipation is also lowered. This idea is formalized in algorithm Optimize().

Algorithm: Optimize()
begin
    $\forall$ gates $g_i$: $g_i.V_t \leftarrow V_{tH}$
    S-I: Run power optimization step, i.e., select $V_{dd}$ and $g_i.W$ ($\forall g_i$) to minimize total power
    S-II: Heuristically select a subset $S$ of the gates
          $\forall$ gates $g_i \in S$: $g_i.V_t \leftarrow V_{tL}$
    S-III: Rerun power optimization step
end

All the gates in the circuit are initially set to the high threshold voltage $V_{tH}$. The gates are then sized and a supply voltage $V_{dd}$ is selected such that the circuit meets the timing criteria and the power dissipation is minimized. After the power optimization step, the critical paths of the circuit are studied to extract a subset of the gates which are set to the low threshold voltage $V_{tL}$. The designer can specify the number of low-$V_t$ gates ($K$) desired. The circuit is subsequently re-optimized for low power operation.
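The control flow of Optimize() can be sketched in Python as follows. This is only an illustration of the three steps, not the authors' implementation: the `Gate` class, the constants `VT_HIGH` and `VT_LOW`, and the two callables `power_opt` and `select_low_vt` are hypothetical names introduced here, and the power optimizer itself (taken from [5] in the paper) is left as a caller-supplied function.

```python
from dataclasses import dataclass
from typing import Callable, List

VT_HIGH = 0.4  # volts; placeholder for the process-defined V_tH (assumption)
VT_LOW = 0.2   # volts; placeholder for the process-defined V_tL (assumption)


@dataclass
class Gate:
    name: str
    vt: float = VT_HIGH   # threshold voltage assigned to the gate
    width: float = 1.0    # gate size, to be set by the power optimizer


def optimize(
    gates: List[Gate],
    power_opt: Callable[[List[Gate]], float],
    select_low_vt: Callable[[List[Gate], int], List[Gate]],
    k: int,
) -> float:
    """Run the three-step flow and return the supply voltage from S-III."""
    for g in gates:                    # initial state: every gate is high-Vt
        g.vt = VT_HIGH
    power_opt(gates)                   # S-I: size gates and select Vdd
    for g in select_low_vt(gates, k):  # S-II: choose K gates to make low-Vt
        g.vt = VT_LOW
    return power_opt(gates)            # S-III: re-optimize with Vt fixed


if __name__ == "__main__":
    # Toy stand-ins that only exercise the control flow; the numbers are made up.
    def toy_power_opt(gs: List[Gate]) -> float:
        low = sum(1 for g in gs if g.vt == VT_LOW)
        return 1.0 - 0.05 * low

    def toy_select(gs: List[Gate], k: int) -> List[Gate]:
        return gs[:k]

    circuit = [Gate(f"g{i}") for i in range(5)]
    print(optimize(circuit, toy_power_opt, toy_select, k=2))
```

Passing the optimizer and the selection heuristic as arguments keeps the sketch independent of any particular sizing method, which mirrors the way the paper treats the power optimization step as a separate procedure.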

Fig. 2. Power versus K ($V_{tH}$ = 400 mV, $V_{tL}$ = 200 mV).

A. Circuit Partitioning for Dual-Threshold Voltage

Clearly, the most significant part of the algorithm is the selection of the gates that are assigned a low threshold voltage. Given a critical path, if we were to select a single low-$V_t$ gate, then we should pick the gate that would lead to the largest reduction in the path delay. In other words, we should try to maximize the path delay slack that gets created. This would allow a significant reduction in the dynamic power component. In the rest of the section, we quantify which gate we consider to be the best candidate to have a low $V_t$ in a given critical path.

Consider a path $P$ consisting of $n$ gates $g_1, g_2, \ldots, g_n$ numbered sequentially from the primary input to the primary output. The delay of the path is $T_P = T_1 + T_2 + \cdots + T_n$, where the delay of $g_i$ is $T_i = \Delta_i + a_i T_{i-1}$, as in (1). Here, $\Delta_i$ is the delay of $g_i$ for a step input and $a_i$ is the slope factor of $g_i$. As discussed in Section III, the input slope has been replaced by $T_{i-1}$, the delay of $g_{i-1}$. If the threshold voltage of the $i$th gate is changed, then the delays of all the gates occurring after $g_i$ in the path are affected, due to the dependence of the delays on the input slope. The change in the delay of $g_k$ ($k > i$), for a small change in the threshold voltage of $g_i$, is given by

$$\frac{\partial T_k}{\partial V_t} = \frac{\partial}{\partial V_t}\left[\Delta_k + a_k T_{k-1}\right] = (a_k a_{k-1} \cdots a_{i+1})\,\frac{\partial T_i}{\partial V_t} \qquad (2)$$

where the first term $(a_k a_{k-1} \cdots a_{i+1})$ is a path-dependent constant which represents the cascading effect of the change in the delay of $g_i$ on $T_k$. Thus, if the delay of $g_i$ changes by $\Delta T_i$, then the change in the delay of $g_k$ is $\Delta T_k = (a_k a_{k-1} \cdots a_{i+1})\,\Delta T_i$. 
The change in the delay of path $P$ is then given as

$$\Delta T_P = \Delta T_1 + \Delta T_2 + \cdots + \Delta T_n = \left[1 + a_{i+1} + a_{i+1}a_{i+2} + \cdots + a_{i+1}a_{i+2}\cdots a_n\right]\Delta T_i = A_i\,\Delta T_i \qquad (3)$$

where $A_i$ is a constant indicating the sensitivity of the delay of the path $P$ to the delay of $g_i$. Suppose that $g_i$ and $g_{i+1}$ are successive gates in $P$. We can readily verify that $A_i = 1 + a_{i+1}A_{i+1}$. Thus, for a given path, we can calculate the $A$'s of the gates by starting from the output and proceeding toward the input.

Now, let us revisit our original problem of selecting a single low-$V_t$ gate from a circuit which has all high-$V_t$ gates. Ideally, we would like to find a gate that lowers the delay of many critical paths. Our experiments indicate that the gate with the largest value of $\Delta T_i$ is usually one that has a number of paths passing through it. This is due to the fact that a gate with many fanin or fanout nets has to drive a large capacitive load and hence has a large delay. It can be shown that if a gate has a large delay then it also has a large value of $\Delta T_i$. Hence, it follows that the gate with the largest value of the product $A_i\,\Delta T_i$ is likely to be a very suitable candidate. This forms the basis of the selection procedure SelectLowVtGates().

Procedure: SelectLowVtGates()
for K steps
    $P \leftarrow$ most critical path
    $\forall g_i \in P$: calculate $A_i$ and $\Delta T_i$
    Find $g_j$: $(A_j\,\Delta T_j)$ is maximum in $P$ and $g_j.V_t \neq V_{tL}$
    $g_j.V_t \leftarrow V_{tL}$
    Recompute delay of all gates
end for
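The recurrence $A_i = 1 + a_{i+1}A_{i+1}$ and the selection rule for one step of SelectLowVtGates() can be sketched in Python as below. The function names and argument layout are assumptions made for illustration; `delta_t[i]` is taken to be the magnitude of the delay reduction obtained by switching gate $i$ to low $V_t$, which the paper derives from its delay model and which is supplied here as plain input data.

```python
from typing import List, Optional, Sequence


def path_sensitivities(slope: Sequence[float]) -> List[float]:
    """Return A_i for each gate on a path, given the slope factors a_i.

    Index 0 is the gate nearest the primary input. The last gate has
    A_n = 1, and earlier gates follow A_i = 1 + a_{i+1} * A_{i+1},
    computed by walking from the output back toward the input.
    """
    n = len(slope)
    sens = [1.0] * n
    for i in range(n - 2, -1, -1):
        sens[i] = 1.0 + slope[i + 1] * sens[i + 1]
    return sens


def pick_low_vt_gate(
    slope: Sequence[float],
    delta_t: Sequence[float],
    is_low_vt: Sequence[bool],
) -> Optional[int]:
    """Return the index of the high-Vt gate maximizing A_i * delta_t[i].

    Gates that are already low-Vt are skipped. Returns None when no
    high-Vt gate on the path offers a positive delay reduction.
    """
    sens = path_sensitivities(slope)
    best, best_gain = None, 0.0
    for i, (a, dt) in enumerate(zip(sens, delta_t)):
        gain = a * dt
        if not is_low_vt[i] and gain > best_gain:
            best, best_gain = i, gain
    return best


if __name__ == "__main__":
    a = [0.3, 0.5, 0.2, 0.4]        # made-up slope factors for g1..g4
    dt = [10.0, 12.0, 20.0, 15.0]   # made-up delay reductions
    print(path_sensitivities(a))
    print(pick_low_vt_gate(a, dt, [False, False, False, False]))
```

For the made-up four-gate path in the usage block, the closed form in (3) gives $A_1 = 1 + 0.5 + 0.5 \cdot 0.2 + 0.5 \cdot 0.2 \cdot 0.4 = 1.64$, which agrees with the backward recurrence.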

From the most critical path, the gate that creates the maximum reduction in the delay is chosen and its threshold voltage is set to $V_{tL}$. The updated delays of all the gates are then recalculated. This process is repeated $K$ times, where $K$ is the prespecified number of low-$V_t$ gates that we desire. Note that the same path might be selected in more than one iteration if the reduction in its delay was not enough to make it subcritical. This ensures that we reduce the critical delay of the circuit in each iteration.

TABLE II
COMPARISON WITH RANDOM EXPERIMENTS

B. Optimization for Minimum Power

In steps S-I and S-III of algorithm Optimize(), we need a circuit optimization procedure that selects the supply voltage and the gate sizes in such a way as to minimize the power consumption. In [5], the authors have demonstrated a heuristic algorithm to perform optimization for low-power operation of digital random logic CMOS circuits. This technique has been shown to be fast and performs well over a wide range of circuits. We have used a modified version of this algorithm that performs the optimization assuming that the threshold voltages of the gates are fixed and cannot be used as optimization variables.

V. EXPERIMENTAL RESULTS

We ran the optimization algorithm described in the previous sections on the combinational portions of several ISCAS89 benchmark circuits. Table I provides the results obtained by running our algorithm on the benchmark circuits. The values of the power supply voltage ($V_{dd}$), the average transistor width ($w$), and the total power ($P_{opt}$) are tabulated for the single-$V_t$ optimization (all the gates are at the high $V_t$) and the dual-$V_t$ optimization. The values of the high and low threshold voltages were set to 400 and 200 mV, respectively, and 30 low-$V_t$ gates were selected for the dual-$V_t$ solution. 
It can be seen that in some instances the total power reduces by more than 50%. The CPU times for the three steps in algorithm Optimize() are also provided. Most of the time is spent in the two optimization steps and the dual-threshold partitioning takes only a small fraction of the total time.

To obtain solutions that were robust to noise and process fluctuations, the optimization was performed assuming a 5% clock skew and a 10% variation in the nominal value of the threshold voltages. The worst case values of the threshold voltages were assumed at all stages, i.e., the maximum value ($1.1 \times V_t$) for delay computations and the minimum value ($0.9 \times V_t$) for power calculations. All the optimizations were performed for a critical delay of 3.17 ns (corresponding to a 300-MHz clock with a 5% skew). Note that further constraints could be easily incorporated into the optimization. For instance, we could ensure that the $V_{dd}/V_t$ ratio does not become smaller than a specified limit.

Fig. 1 shows the effect of the number of low-$V_t$ gates, $K$, on the power savings achieved by our optimization algorithm. For this experiment, we have assumed that the values of $V_{tH}$ and $V_{tL}$, provided by the manufacturing process, are 400 and 100 mV, respectively. At this value of $V_{tL}$ there is a significant subthreshold current, and after a point the increasing static dissipation starts to dominate the total power dissipation. It is clear that the power dissipation with all the gates set to the same threshold voltage is not necessarily the best solution. We see that in the best case we can achieve about 40% reduction in the power dissipation. It should be noted that some of the curves in Fig. 1 are not smooth due to our use of a suboptimal algorithm [5] to perform the power optimization step.

In Fig. 2, we plot the same variation for a different process which provides a low $V_t$ of 200 mV. 
In this case, the static leakage power component is not a significant part of the total power consumption. Hence, the total power consumption tracks the dynamic component which, as discussed in Section IV, decreases with increasing $K$. For this situation, we can obtain up to 70% reduction in the total power consumption. However, after a point very little additional savings are obtained by increasing the number of low-$V_t$ gates. By judiciously selecting the value of $K$, even in this case, we could obtain a good solution without having a circuit with all low-$V_t$ gates. Note that it may not be desirable (due to noise considerations, etc.) to use the all low-$V_t$ solution.

It is difficult to predict the optimal value of $K$. In practice the designer should incrementally increase $K$ until the static power component becomes significant. Our experiments indicate that a value of $K$ of approximately 30–35% of the gates corresponds to a good tradeoff between the static and dynamic power components.

For a given value of $K$, we would like to find the difference in the total power consumption of the circuit with our choice of low-$V_t$ gates and that of the circuit with the optimal choice of $K$ low-$V_t$ gates. However, it is extremely difficult to obtain the optimal solution. For instance, let us consider s344, which is our smallest circuit with 207 gates. If we take $K$ to be 10, there are $\binom{207}{10}$ ($> 3 \times 10^{16}$) choices for the low-$V_t$ gate selection. It is obvious that we cannot enumerate all these possible combinations to find the best selection.

In order to evaluate the quality of the results obtained by selecting the gates according to our heuristic, we compared the results against random experiments on the circuits. Instead of the heuristic dual-$V_t$ selection step, we selected a random set of $K$ low-$V_t$ gates from the circuit. For each value of $K$, the random experiment was conducted 1000 times and the minimum, maximum and mean values of the total power

obtained over the 1000 experiments are presented in Table II for four of the benchmark circuits (similar trends were observed for the other circuits). We can see that our dual-threshold selection heuristic performs much better than a random selection of $K$ gates. By repeating the random experiments a large number of times (1000), we attain a high level of confidence that our algorithms can provide a very fast and near-optimal solution, as opposed to randomized optimization algorithms like simulated annealing.

VI. CONCLUSION

We have demonstrated a new approach to low power optimization of digital static CMOS circuits for dual-threshold voltage manufacturing processes. The algorithms developed allow the designer to assign one of two threshold voltages to each gate in the circuit. The assignment is performed in such a way that subsequent optimization for low power operation yields a significant reduction in the total power consumption of the circuit. Experiments were conducted on several ISCAS89 benchmark circuits and results indicate that significant improvement in power consumption, over single high-$V_t$ circuits, can be achieved. The algorithm is fast and typically completes in a few CPU seconds.

REFERENCES

[1] A. Chandrakasan and R. Brodersen, "Minimizing power consumption in digital CMOS circuits," Proc. IEEE, vol. 83, pp. 498–523, Apr. 1995.
[2] J. Cong and C.-K. Koh, "Simultaneous driver and wire sizing for performance and power optimization," IEEE Trans. VLSI Syst., vol. 2, pp. 408–425, Dec. 1994.
[3] D. Liu and C. Svensson, "Trading speed for low power by choice of supply and threshold voltages," IEEE J. Solid-State Circuits, vol. 28, pp. 10–17, Jan. 1993.
[4] Z. Chen and J. Plummer, "Low threshold voltage quarter micron MOSFETs for low power applications," in Proc. IEEE Symp. Low Power Electronics, 1995, pp. 78–79.
[5] P. Pant, V. De, and A. Chatterjee, "Simultaneous power supply, threshold voltage and transistor size optimization for low power operation of CMOS circuits," IEEE Trans. VLSI Syst., vol. 6, pp. 538–545, Dec. 1998.
[6] R. Gonzalez, B. M. Gordon, and M. Horowitz, "Supply and threshold voltage scaling for low power CMOS," IEEE J. Solid-State Circuits, vol. 32, pp. 1210–1216, Aug. 1997.
[7] J. Burr and J. Shott, "A 200 mV self-testing encoder-decoder circuit using Stanford ultra low power CMOS," in Proc. Int. Solid-State Circuits Conf., Feb. 1994, pp. 84–85.
[8] L. Wei, Z. Chen, M. Johnson, K. Roy, and V. De, "Design and optimization of low voltage high performance dual threshold CMOS circuits," in Proc. Design Automation Conf., 1998, pp. 489–494.
[9] Q. Wang and S. Vrudhula, "Static power optimization of deep submicron CMOS circuits for dual $V_T$ technology," in Proc. Int. Conf. Computer-Aided Design, 1998, pp. 490–494.
[10] S. S. Sapatnekar, V. B. Rao, P. M. Vaidya, and S. M. Kang, "An exact solution to the transistor sizing problem for CMOS circuits using convex optimization," IEEE Trans. Computer-Aided Design, vol. 12, pp. 1621–1632, Nov. 1993.
[11] N. Hedenstierna and K. Jeppson, "CMOS circuit speed and buffer optimization," IEEE Trans. Computer-Aided Design, vol. 6, pp. 270–281, Mar. 1987.
[12] B. Hoppe, G. Neuendorf, D. Schmitt-Landsiedel, and W. Specks, "Optimization of high-speed CMOS logic circuits with analytical models for signal delay, chip area and dynamic power dissipation," IEEE Trans. Computer-Aided Design, vol. 9, pp. 236–247, Mar. 1990.

Low-Power CMOS with Subvolt Supply Voltages

Mircea R. Stan

Abstract: We first present a circuit taxonomy along the space and time dimensions, which is useful for classifying generic low-power techniques, followed by an analysis of optimal power supply and threshold voltages and transistor sizing for minimizing the energy-delay product of a class of complementary metal-oxide-semiconductor (CMOS) digital circuits. 
Index Terms: Digital complementary metal-oxide-semiconductor (CMOS) VLSI, low-power design, low voltage, power consumption model.

I. INTRODUCTION

Power consumption in CMOS has two components: ac (dynamic) power that varies with operating frequency and dc (static) power that is independent of frequency [1]–[3]. The two major sources of dynamic power are the capacitive current for charging and discharging load capacitances and the short circuit (or overlap) current [4]. When the supply voltage is aggressively scaled down, the percentage of short circuit power becomes smaller and tends to zero as $V_{dd}$ gets close to $V_{th}$ [5]. The two major sources of static power are the subthreshold current [6], [7] and the junction leakage current. In deep submicron technologies the junction leakage becomes negligible compared to the subthreshold current, but other leakage phenomena like gate oxide tunneling and gate induced drain leakage (GIDL) are likely to become important [8], [9].

Although recognized as an important method to reduce power [1], scaling the power supply voltage has been historically driven by reliability concerns (gate oxide breakdown voltage and leakage) and not by power reduction strategies. The SIA Technology Roadmap [10], [11] predicts a $V_{dd}$ of 0.9–0.6 V in the year 2009 for a 70 nm technology, and a $V_{dd}$ of 0.6–0.5 V in the year 2012 for a 50 nm technology. In what follows we show that a $V_{dd}$ as low as 0.8 V should be used for low-power circuits even with current 0.25 and 0.18 μm processes, as it provides the optimum energy-delay product for the design.

A. Figures of Merit for Low-Power Design

The classic two-dimensional VLSI design space tries to minimize the circuit area ($A$) and delay ($T$) in order to reduce cost and improve performance, by using optimizations with objective functions such as $A$, $AT$, and $AT^2$ [12]. 
The new emphasis on low power adds a third dimension (power) to the previously two-dimensional design space [13], but, except for a few cases [14], most of the research in low-power design is still two-dimensional, with objective functions such as $P$ (power), $PT$ (energy), and $PT^2$ (energy-delay product).¹ The power $P$ itself is a poor candidate for optimization as it can always be lowered trivially by reducing the clock frequency. The energy $PT$ is an appropriate figure of merit for applications without stringent performance requirements, but, when performance is critical, the energy-delay product $PT^2$ is a good compromise between reducing power and still operating at reasonable speed [15].

Manuscript received February 20, 1999; revised September 23, 1999. This work was supported in part by NSF CAREER Award MIP-9703440. The author is with the Electrical Engineering Department, University of Virginia, Charlottesville, VA 22903 USA (e-mail: mircea@virginia.edu). Publisher Item Identifier S 1063-8210(01)00699-0.

¹ The notation $PT$ for energy and $PT^2$ for energy-delay product is used to underscore the replacement of area by power in the classic $AT$ and $AT^2$ figures of merit.
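A small numerical sketch can make the contrast between the three figures of merit concrete. It assumes a deliberately simplified model that is not taken from the paper: only switching power is counted, $P = C V_{dd}^2 f$, and the delay $T$ is taken as one clock period, $T = 1/f$. The function name and the sample values are hypothetical.

```python
from typing import Tuple


def figures_of_merit(c_load: float, vdd: float, freq: float) -> Tuple[float, float, float]:
    """Return (P, PT, PT^2) for switching power only, with T = 1 / freq.

    Simplifying assumptions: static and short-circuit power are ignored
    and every cycle charges the full load capacitance c_load.
    """
    power = c_load * vdd ** 2 * freq   # P = C * Vdd^2 * f
    period = 1.0 / freq                # T, one clock period
    return power, power * period, power * period ** 2


if __name__ == "__main__":
    # Same circuit and supply voltage, clocked at full and at half speed.
    for f in (100e6, 50e6):
        p, e, edp = figures_of_merit(c_load=1e-12, vdd=1.0, freq=f)
        print(f"f={f:.0e} Hz  P={p:.3e} W  PT={e:.3e} J  PT^2={edp:.3e} J*s")
```

Under these assumptions, halving the clock frequency halves $P$ but leaves the energy $PT$ unchanged and doubles $PT^2$, which is one way to see why $P$ alone is a poor optimization target.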