Active Decap Design Considerations for Optimal Supply Noise Reduction

Xiongfei Meng and Resve Saleh
Dept. of ECE, University of British Columbia, 2356 Main Mall, Vancouver, BC, V6T 1Z4, Canada
E-mail: {xmeng, res}@ece.ubc.ca

Abstract

Active decoupling capacitors (decaps) are more effective than passive decaps at reducing local IR-drop problems in the power distribution network. In the basic active decap, two parallel decaps are stacked in series whenever the voltage drop exceeds a given budget, boosting the local supply voltage. However, the effectiveness of stacking three or more decaps has not been previously explored. In this paper, we investigate these higher stack height configurations to assess the degree of improvement in supply noise reduction. Extensive simulation results correlated with a test chip indicate that an active decap with a stack height of three provides the best noise reduction if the supply noise level is between 7% and 14%, but a stack height of two is best if the noise level is between 14% and 16%.

Keywords

MOS integrated circuits, decoupling capacitors, power supply noise.

1. Introduction

The increase of clock frequency and on-chip current demand makes power grid design a challenging task. Decoupling capacitors (decaps) are generally used to reduce IR drop and Ldi/dt effects, and hence keep the power supply relatively constant. Starting from the 90nm node, simply placing passive decaps in the available open areas of the chip may not be sufficient [1]. Large power supply noise levels in localized regions (called hot-spot IR-drop violations) may unexpectedly be present in high-speed applications. These unresolved hot spots cause timing closure problems or, in extreme cases, functional failures. To remove them, active decaps [2][3][4] have been proposed as a drop-in replacement for the passive decaps. The use of active decaps saves time and effort near the tapeout deadline, and therefore provides an attractive solution.
The concept of the active decap is to switch two passive decaps from parallel to series to achieve a local voltage boost on the supply rail. This paper explores extensions of the basic design by increasing the number of parallel decaps (called the stack height, n) that are switched into series. Increasing the stack height ideally raises the boosted voltage to a higher level but, in practice, these higher levels cannot be reached because the non-idealities of the circuit limit the achievable boosted supply voltage. In fact, there exists a stack height in the active decap design that provides optimal supply noise reduction. The proper choice of the stack height for the active decap is the primary objective of this paper. Other associated design tradeoffs such as area and power will also be addressed.

2. Active decap concept and architecture

2.1. Basic active decap configuration

Figure 1: Active decap with n=2.

The basic idea of an active decap is to switch a pair of passive decaps, each of value $C_{decap}$, from parallel to series to provide a local boost in the supply voltage [2]. As illustrated in Fig. 1(a), the decaps are initially in a parallel configuration with a full charge developed across both capacitors. In this standby state, the equivalent capacitance is $2C_{decap}$. When placed in a series stack, as in Fig. 1(b), the boosted voltage is ideally $2V_{DD}$ although the equivalent capacitance is reduced to $C_{decap}/2$. When switched back in parallel, the voltage returns to the original value of $V_{DD}$. In this case, the stacking level n is 2.

The active decap circuit implemented using NMOS and PMOS transistors is depicted in Fig. 1(c) [2]. When the decaps are in parallel, both Mn1 and Mp1 are on while Mn2 and Mp2 are off (i.e., subthreshold). When the capacitors are in series, both Mn1 and Mp1 are off while Mn2 and Mp2 are on.
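The parallel-to-series trade-off described above can be sketched numerically for ideal switches and leakage-free capacitors. This is only an illustration: the function names and the normalized values ($C_{decap}$ = 1, $V_{DD}$ = 1) are assumptions made for the example and do not come from the paper.

```python
def parallel_state(n, c_decap, v_dd):
    """Standby: n identical decaps in parallel, each charged to v_dd."""
    return {"c_eq": n * c_decap, "voltage": v_dd}


def series_state(n, c_decap, v_dd):
    """Boost: the same n decaps stacked in series (ideal switches, no leakage)."""
    return {"c_eq": c_decap / n, "voltage": n * v_dd}


def stored_energy(state):
    """Energy 0.5 * C * V^2 held by the equivalent capacitor."""
    return 0.5 * state["c_eq"] * state["voltage"] ** 2


if __name__ == "__main__":
    # Hypothetical normalized values: C_decap = 1, V_DD = 1, stack height n = 2.
    standby = parallel_state(2, 1.0, 1.0)
    boosted = series_state(2, 1.0, 1.0)
    print("standby:", standby, "energy:", stored_energy(standby))
    print("boosted:", boosted, "energy:", stored_energy(boosted))
```

In this idealized model the stored energy is the same in both states; stacking only trades equivalent capacitance for voltage, which is why the practical loss mechanisms discussed next decide how much of the boost is usable.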
The switches exhibit finite on resistances, indicated as $R_{on}$, and there is also thin-oxide gate leakage through the decaps, $I_{leak}$, especially in 90nm and 65nm CMOS technologies. Both of these effects reduce the performance of the active decap, as described below.

For the general case of stacking n parallel decaps in a series chain, the maximum improvement can be characterized in terms of a gain, G [2]. If k is the voltage regulation tolerance, where $kV_{DD}$ is the permissible drop in voltage, then the charge delivered by n parallel capacitors is:

$$Q_{parallel} = kV_{DD} \, nC_{decap} \qquad (1)$$

When the capacitors are stacked in series, the charge delivered for the same voltage drop is:

$$Q_{series} = \left[ nV_{DD} - (1-k)V_{DD} \right] C_{decap}/n \qquad (2)$$

The overall charge gain is:

$$G = \frac{Q_{series}}{Q_{parallel}} = \frac{\left[ nV_{DD} - (1-k)V_{DD} \right] C_{decap}/n}{kV_{DD} \, nC_{decap}} \qquad (3)$$

Therefore, the gain is controlled by n and k [2]:

$$G = \frac{n - 1 + k}{n^2 k} \qquad (4)$$

There exists a value of k such that the regular decap outperforms the active decap. For example, setting G=1 and n=2, we find that k=33%. For values of k > 33%, the active decap is of no value. However, if k is below this value, the active decap is able to deliver more charge. For example, if k=10% and n=2, then G=2.75. This implies that 2.75 times more charge can be delivered by the active decap before its output voltage drops to the same level as the passive decap.

In practice, this level of improvement is not possible due to the switch resistances and leakage currents. In fact, the boosted voltage cannot reach $nV_{DD}$ but instead reaches a lower voltage of $bV_{DD}$. Therefore, the gain equation should be rewritten as:

$$G = \frac{\left[ bV_{DD} - (1-k)V_{DD} \right] C_{decap}/n}{kV_{DD} \, nC_{decap}} = \frac{b - 1 + k}{n^2 k} \qquad (5)$$

where

$$b = n \, f(R_{on}) \, g(C_{decap}) \qquad (6)$$

Figure 2: Reduction factors $f(R_{on})$ and $g(C_{decap})$. [Two panels: $f(R_{on})$ versus the on resistance of the switches $R_{on}$ (Ω), and $g(C_{decap})$ versus the decap value $C_{decap}$ (nF).]

The reduction factors, $f(R_{on})$ and $g(C_{decap})$, are both no greater than 1 and depend on the switch resistance, $R_{on}$, and the leakage current which, in turn, is proportional to $C_{decap}$. Using circuit simulation, normalized plots of $f(R_{on})$ and $g(C_{decap})$ are provided in Fig. 2. The switch resistance has a more pronounced effect on b than the leakage current. For example, with $R_{on}$=0Ω and $C_{decap}$=700pF, we obtain f=0.9 and g=0.95 from Fig. 2. If we combine the two effects, then b=2(0.9)(0.95)≈1.7 instead of 2. With k=10%, the achievable gain is now reduced to G=2.0.

The actual final voltage value, $aV_{DD}$, when the active decap supplies the same charge as the passive decap is determined by setting G=1 and solving for a in the following equation:

$$G = \frac{\left( bV_{DD} - aV_{DD} \right) C_{decap}/n}{kV_{DD} \, nC_{decap}} = \frac{b - a}{n^2 k} \qquad (7)$$

In this case, with b=1.7, k=10% and n=2, we obtain a=1.3, which implies that the active decap will be boosted initially to 1.7V (instead of 2V) and then falls back to 1.3V due to the charge demand of a nearby logic circuit. In the passive case, the initial voltage of 1V would be reduced to 0.9V, so the active decap is still superior even with the non-idealities included.

To design the sizes of the MOS switches, a number of issues must be considered. From the above analysis, a small resistance value is preferable to increase the voltage boosting capability of the active decap, and to improve transient response times. The on resistances also provide ESD protection because they are in series with the decaps. Thus, this resistance must be large enough to safely protect the thin-oxide gates. Considering the factors of boosted voltage level, decap performance, and ESD reliability, the on resistances should be designed to be in the range of 0-0Ω by proper selection of transistor widths.

978-1-4244-2953-0/09/$25.00 ©2009 IEEE, 10th Int'l Symposium on Quality Electronic Design

2.2. Active decap architecture

Figure 3: Active decap architecture.

Fig. 3 illustrates the complete active decap design containing four blocks [4]: a reference voltage generator, a pair of high-pass filters, two comparators, and the switched decaps. The user logic circuit block shown in the figure is considered to be the main cause of power supply noise violation. The switch control circuit for the active decap is realized using two comparators. The differential inputs of each comparator determine the standby voltage levels at the outputs of the comparators. In the standby mode, the top comparator has an output at $V_{DD}$, whereas the bottom comparator is set to $V_{SS}$. When the power grid discharges, $V_{DD}$ will drop and $V_{SS}$ will rise. The voltage variations are passed through the high-pass filters to the comparator inputs, causing the comparators to reverse their output values and switch the decaps from parallel to series. After the power grid charges up, the comparator inputs and outputs switch back to their original values. An enable signal is provided for testing purposes. In Fig.
3, the reference voltages are generated by a simple voltage divider and are set to roughly $V_{DD}/2$. A small resistor, R, inserted between the two transistors separates the reference voltages by approximately 30mV. Then, if the comparators are designed to switch when the voltage difference at the inputs is 10-15mV (plus an additional 5mV of hysteresis), the overall design will trigger at approximately 50mV.

The two comparators must be able to sense voltage variations that exceed the pre-specified sensitivity level (i.e., 10-15mV) and switch quickly enough to respond within a clock cycle [5]. When the decaps are in parallel, the subthreshold leakage from the switches consumes considerable power due to the large sizes of the switch transistors. To reduce leakage current, the outputs of the comparators should be as close as possible to either $V_{DD}$ or $V_{SS}$. The supply noise budget for a 1V power supply is normally less than 50-100mV and the output is full swing, indicating the need for high gain in the switching region. A latch-based comparator was selected for this application [4]. Complementary designs are used for the top and the bottom comparators to have roughly equal switching delays of 1ns. In this case, our design goal was set to a maximum clock speed of about 1GHz, which makes it suitable for today's high-end ASICs, and even for medium-speed custom designs. When the supply voltage drops to 0.9V (implying 100mV of noise, i.e., k=10%), the average switching delay for a full output swing was designed to be 1ns, which should allow proper operation up to 1GHz.

The bias voltages for the comparators are generated by simple current mirrors. Variations in process, voltage and temperature (PVT) on the comparator and the bias generation can cause delay differences in the comparator outputs. This delay difference acts to further reduce the boosted voltage. During the layout stage, great care was taken to ensure that the delay differences are within 00ps under all PVT variation simulations, which results in an additional 10% loss in the boosted voltage, i.e., 1.6V rather than 1.7V. The charge demand of the logic circuit itself will cause an additional voltage drop of 0.4V, resulting in an expected final voltage of 1.2V.
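The numbers quoted for the basic active decap (G=2.75, G=2.0, a=1.3, and the 1.2V final voltage) follow directly from (4), (5) and (7). A minimal sketch that reproduces them is given below; the function names are hypothetical and chosen only for this illustration, and the closed form used for the break-even tolerance, k = 1/(n+1), is obtained here by setting G=1 in (4).

```python
def ideal_gain(n, k):
    """Eq. (4): charge gain of an ideal n-high stack over n parallel decaps."""
    return (n - 1 + k) / (n ** 2 * k)


def practical_gain(b, n, k):
    """Eq. (5): gain when the boost only reaches b * V_DD instead of n * V_DD."""
    return (b - 1 + k) / (n ** 2 * k)


def final_voltage_factor(b, n, k):
    """Eq. (7) with G = 1, solved for a: a = b - n^2 * k."""
    return b - n ** 2 * k


def break_even_tolerance(n):
    """Tolerance k at which Eq. (4) gives G = 1, i.e. k = 1 / (n + 1)."""
    return 1.0 / (n + 1)


if __name__ == "__main__":
    print(f"ideal gain, n=2, k=10%:        {ideal_gain(2, 0.10):.2f}")                 # 2.75
    print(f"break-even k for n=2:          {break_even_tolerance(2):.2f}")             # 0.33
    print(f"practical gain, b=1.7:         {practical_gain(1.7, 2, 0.10):.2f}")        # 2.00
    print(f"final voltage factor, b=1.7:   {final_voltage_factor(1.7, 2, 0.10):.2f}")  # 1.30
    print(f"final voltage factor, b=1.6:   {final_voltage_factor(1.6, 2, 0.10):.2f}")  # 1.20
```

With $V_{DD}$=1V the last two lines correspond to the 1.3V and 1.2V final voltages discussed in the text; the 0.4V drop is the $n^2 k$ term for n=2 and k=10%.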
In addition, the current drive of the comparators will act to reduce the supply voltage further, but the value is still kept above 0.9V.

3. Extended active decap (n ≥ 3)

3.1. Design of extended active decap

The motivation for a larger stack height (n ≥ 3) is to generate a higher boosted voltage $nV_{DD}$ that is linearly proportional to the stack height n, and thereby potentially achieve a better improvement in supply noise reduction. For n=3, the ideal boosted voltage is $3V_{DD}$ and for n=4, it is $4V_{DD}$. Therefore, it seems that the stack height should be designed as large as possible to obtain a high enough local boost so that the supply noise can be reduced to an arbitrarily small level. However, this is not true in practice. In fact, a higher stack height requires more switches to turn the decaps from parallel to series. For example, the active decap circuit for n=3 is illustrated in Fig. 4. In the figure, it is assumed that the total area available for the decaps is fixed, so each of the three decaps occupies one third of that area. The case for n=4 is shown in Fig. 5. It has more switches and decaps than for n=3. The goal of this paper is to determine which configuration is best suited for a given level of supply noise, k.

Figure 4: Extended active decap with n=3.

Figure 5: Extended active decap with n=4.

The practical constraints of stacking the decaps can be illustrated by first examining the implementation of the extended active decap (n=3) in Fig. 4. The number of switches required is 3(n-1) for n ≥ 2. For n=3, we require 6 switches, and for n=4, we require 9 switches. Note that each horizontal switch is implemented with two transistors (NMOS and PMOS), while each vertical switch requires only one transistor (NMOS or PMOS).
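The switch count 3(n-1) grows linearly with the stack height, which is what drives the design tradeoff. The sketch below tabulates it together with a rough first-order estimate of how the per-switch on resistance would scale. That estimate is an assumption made here for illustration only: it supposes the total switch area is held constant, split equally among the switches, and that on resistance is inversely proportional to each switch's share. The function names are hypothetical.

```python
def switch_count(n):
    """Number of switches for stack height n, using 3 * (n - 1) for n >= 2."""
    if n < 2:
        raise ValueError("an active decap needs a stack height of at least 2")
    return 3 * (n - 1)


def relative_on_resistance(n, n_ref=2):
    """Per-switch R_on relative to the n_ref design.

    ASSUMPTION (not from the paper): the total switch area is fixed and shared
    equally, and R_on scales inversely with the area given to each switch.
    """
    return switch_count(n) / switch_count(n_ref)


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(f"n={n}: ideal boost {n} x V_DD, "
              f"{switch_count(n)} switches, "
              f"relative R_on ~ {relative_on_resistance(n):.1f}x")
```

Under these assumptions the ideal boost and the per-switch resistance both rise with n, so the gain from a taller stack is partly cancelled by the larger reduction factor applied to b.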
Because the extended active decap needs more switches, two design methods exist: one is to expand the area occupied by the switch transistors, and the other is to reduce the size of each switch transistor so that the total area stays relatively constant. The first method causes a longer comparator delay because of the additional loading capacitance. The longer delay results in a reduced bandwidth that makes the active decap suitable only at lower operating frequencies. The second method keeps the same area overhead by fixing the total area occupied by the switch transistors. The delay of the comparators remains roughly the same since there is only a minimal change in the loading capacitance. However, packing more switch transistors into the same area causes the on resistance $R_{on}$ of each transistor to increase, which reduces the boosted voltage $bV_{DD}$. The latter method was used to implement the extended active decap since the decap bandwidth should not be compromised for high-speed applications. In practice, the reduced $bV_{DD}$ limits the improvement of using extended active decaps, and an optimal n exists that produces the highest final voltage $aV_{DD}$.

3.2. Layout and fabrication of active decaps

The basic (n=2) and extended (n=3) active decaps were designed, and their layouts are illustrated in Fig. 6 and Fig. 7, respectively. Although not shown here, a similar layout would be obtained for n=4. The switching circuitry is located near the center, with the parallel decaps on either side. The

decaps on the left are PMOS, and the one on the right is NMOS. The total layout area for the active decap is about 600μm by roughly 140μm, or 0.085mm², in which the passive decaps on either side combine for an area of 0.077mm². Each switch transistor for the case n=3 was laid out with only half of the area used in the case of n=2, resulting in an almost doubling of the on resistance. The sensing and switching circuitry, including the switch transistors, accounts for 10% overhead of the total area.

Figure 6: Layout of n=2 active decap.

Figure 7: Layout of n=3 active decap.

The area overhead consumed by the sensing and switching circuitry should be considered as an additional penalty for the active decap performance. If the percentage area overhead is defined as x, then (7) should be rewritten as:

$$G = \frac{\left( bV_{DD} - aV_{DD} \right)(1-x) \, C_{decap}/n}{kV_{DD} \, nC_{decap}} = \frac{(b - a)(1 - x)}{n^2 k} \qquad (8)$$

The basic active decap (n=2) was fabricated in a 90nm process, and measurement results for the test chips were obtained [4].

3.3. Optimal stack height n from formula

The final voltage $aV_{DD}$ can be obtained by varying the stack height n according to (8), as illustrated in Fig. 8. In the figure, the final voltage $aV_{DD}$ with different supply noise levels, k, is plotted. Note that the case of n=1 indicates passive decaps. Clearly, for k=10%, the optimal n value that produces the highest final voltage is 3. For an increased level of k=15%, the optimal value reduces to n=2. As the supply noise k further increases, the use of n=2 provides no benefit, as in the case of k=20%. From the figure, one can conclude that a higher stack height n should be used if the supply noise k is small, and a lower value of n should be used if k is large. Therefore, the optimal choice of n should be based on the k range where the final voltage $aV_{DD}$ is the highest.

Figure 8: Optimal n generating the highest $aV_{DD}$. [Plot of $aV_{DD}$ (V) versus stack height n for k=5%, 10%, 15% and 20%.]

The supply noise crossover point, $k_{n_1,n_2}$, for two different stacking levels, $n=n_1$ and $n=n_2$, is defined as the noise level at which both cases produce the same final voltage. This can be used to identify suitable ranges for each stacking level. To obtain $k_{n_1,n_2}$, (8) can be rearranged into the following form:

$$k_{n_1,n_2} = \frac{(b_{n_1} - b_{n_2})(1 - x_{n_1})(1 - x_{n_2})}{(n_1^2 - n_2^2) + n_2^2 x_{n_1} - n_1^2 x_{n_2}} \qquad (9)$$

where $n_1 \neq n_2$ and $n_1 > n_2$. Plugging in numbers, the crossover noise value from n=2 to n=3 is $k_{2,3}$=12% and the crossover noise value from n=3 to n=4 is $k_{3,4}$=8%. Effectively, the crossover noise value of $k_{4,5}$=5% determines the boundary where a passive decap should be used since the case of n=5 is impractical. Similarly, $k_{1,2}$=17% produces the same final voltage for n=2 and n=1 (passive decap). When k is above 17%, the passive decap should be used. The results are summarized in Table 1.

Table 1: Optimal stack height n selection (from formula).

Condition          Optimal n
k < 5%             use passive decap
5% < k < 8%        n=4
8% < k < 12%       n=3
12% < k < 17%      n=2
k > 17%            no help

Figure 9: Final voltage $aV_{DD}$ with different n. [Plot of $aV_{DD}$ (V) versus supply noise k for the passive decap (n=1) and the active decaps with n=2, n=3 and n=4, with the crossover points $k_{3,4}$=8%, $k_{2,3}$=12% and $k_{1,2}$=17% marked.]

The results are also presented in graphical form in Fig. 9. The four lines represent n=1, 2, 3 and 4, respectively. The line with the highest value in each region is the optimal value for n. For low values of k, the best choice is n=4. Starting at the $k_{3,4}$ crossover point, the best choice is n=3. At the $k_{2,3}$ crossover point, the best choice becomes n=2. At the $k_{1,2}$ crossover point, the best choice is n=1 from that point onward. As described earlier, if the supply noise is above 17%, the use of any form of the active decap cannot boost the supply

voltage to a satisfactory level. Other design approaches to reduce the supply noise may have to be used in that situation. However, the more interesting range of k is from 8% to 17%, since this noise level is typically unacceptable. If the supply noise k is below 12% but above 8%, then the active decap should be designed to have a stack height of 3 to produce the minimum noise. Alternatively, if k is above 12% but below 17%, the basic active decap with n=2 is optimal.

3.4. Simulation results correlated from a test chip

To validate the results from the formula, the extended active decaps with n=3 and n=4 were simulated after correlation with measurement results for n=2 [4]. The supply voltage waveforms were obtained at different supply noise levels. The noise level k was varied by changing the size of the user logic buffer. The simulated average $V_{DD}$ voltage per clock cycle is important in determining the delay in a logic circuit critical path [5][6]. Therefore, this value is used to compare the improvement, and it should be close to the final voltage $aV_{DD}$ value obtained from the formula.

From simulation, the average $V_{DD}$ for the basic (n=2) and the extended active decaps with n=3 and n=4, as the supply noise k is varied up to 20%, is plotted in Fig. 10. The corresponding optimal stack height n as a function of the supply noise k is given in Table 2. Clearly, the crossover points are similar between the formula and simulation. Unlike the formula, when k is low (<5%), the active decaps do not switch due to the fixed triggering voltage at about 50mV, so the active decaps produce average $V_{DD}$ voltages similar to those of the passive decap. However, when k<5%, active decaps need not be applied since passive decaps are sufficient. The important crossover point between the basic (n=2) and the extended (n=3) active decaps from simulation is $k_{2,3}$=14%, somewhat higher than the calculated value of $k_{2,3}$=12%.
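The crossover expression (9) can be checked against (8) with a short sketch. The function names are hypothetical. The inputs for n=2 (b=1.6 and a 10% area overhead) are the values quoted in the text; treating the passive decap as n=1 with b=1 and no overhead is an assumption made here for the example. Boost factors for n=3 and n=4 are not stated in the text, so those crossovers are not evaluated.

```python
def final_voltage_factor(b, n, k, x=0.0):
    """Eq. (8) with G = 1, solved for a: a = b - n^2 * k / (1 - x)."""
    return b - n ** 2 * k / (1.0 - x)


def crossover_noise(b1, n1, x1, b2, n2, x2):
    """Eq. (9): noise level at which stack heights n1 > n2 give the same a."""
    if n1 <= n2:
        raise ValueError("expected n1 > n2")
    numerator = (b1 - b2) * (1.0 - x1) * (1.0 - x2)
    denominator = (n1 ** 2 - n2 ** 2) + n2 ** 2 * x1 - n1 ** 2 * x2
    return numerator / denominator


if __name__ == "__main__":
    # n=2: b=1.6, x=10% (from the text). n=1: b=1, x=0 (assumed passive case).
    k12 = crossover_noise(1.6, 2, 0.10, 1.0, 1, 0.0)
    print(f"k_1,2 = {k12:.1%}")
    # Both configurations should reach the same final voltage at the crossover.
    print(f"a (n=2) = {final_voltage_factor(1.6, 2, k12, 0.10):.4f}")
    print(f"a (n=1) = {final_voltage_factor(1.0, 1, k12, 0.0):.4f}")
```

With these inputs the sketch gives a crossover of about 17.4%, in line with the $k_{1,2}$=17% boundary obtained from the formula.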
When k is above 14%, no approach can raise the supply level back to above 900mV, making the use of active decaps ineffective. On the other hand, the lower bound of $k_{3,4}$=7% is slightly below the calculated level of 8%. Therefore, the wider k range of 7%-14% for the optimal use of the n=3 extended active decap makes it superior to the basic active decap. Although the n=4 active decap is the best when k is in the range of 5%-7%, it has only limited use since this k range is small and the improvement over the n=3 active decap is marginal. Thus, it can be concluded from simulation that the extended active decap (n=3) provides the optimal level of average supply voltage across a wide supply noise range below 14%. If the supply noise is above 14%, no drop-in replacement of active decaps is satisfactory and a larger area must be used.

Table 2: Optimal stack height n selection (from simulation).

Condition          Optimal n
k < 5%             use passive decap
5% < k < 7%        n=4
7% < k < 14%       n=3
14% < k < 16%      n=2
k > 16%            no help

Figure 10: Average $V_{DD}$ voltage with different n. [Plot of average $V_{DD}$ voltage (mV, 800 to 980) versus supply noise k (0% to 20%) for the passive decap (n=1) and the active decaps with n=2, n=3 and n=4, with the crossover points $k_{3,4}$=7%, $k_{2,3}$=14% and $k_{1,2}$=16% marked.]

4. Conclusions

This paper described the design of basic and extended active decaps. Practical limitations on design and layout were identified and design formulas were provided. By setting the stack height n to an optimal value, depending on the supply noise level k present at the local supply, a maximum supply boost can be achieved. It was found that the extended active decap with n=3 provides superior performance, delivering a higher average supply voltage than the n=2 and n=4 cases when the supply noise k is in the range of 7%-14%. When k is above 14%, n=2 must be used, and beyond 16%, the area for the drop-in replacement of active decaps must be expanded to produce satisfactory improvement over the passive decap.

5. Acknowledgments

The authors would like to thank NSERC and PMC-Sierra Inc. for financial support of this work.

6. References

[1] H. Yamamoto and J. A. Davis, "Decreased effectiveness of on-chip decoupling capacitance in high-frequency operation," IEEE Transactions on VLSI Systems, vol. 15, no. 6, pp. 649-659, Jun. 2007.
[2] M. Ang, R. Salem, and A. Taylor, "An on-chip voltage regulator using switched decoupling capacitors," IEEE International Solid-State Circuits Conference, pp. 438-439, Feb. 2000.
[3] J. Gu, H. Eom, and C. Kim, "A switched decoupling capacitor circuit for on-chip supply resonance damping," IEEE Symposium on VLSI Circuits, pp. 126-127, Jun. 2007.
[4] X. Meng and R. Saleh, "An improved active decoupling capacitor for hot-spot supply noise reduction in ASIC designs," IEEE Journal of Solid-State Circuits, to be published.
[5] K. Arabi, R. Saleh, and X. Meng, "Power supply noise in SoCs: metrics, management, and measurement," IEEE Design and Test of Computers, vol. 24, no. 3, pp. 236-244, May-Jun. 2007.
[6] E. Alon and M. Horowitz, "Integrated regulation for energy-efficient digital circuits," IEEE Journal of Solid-State Circuits, vol. 43, no. 8, pp. 1795-1807, Aug. 2008.