A New Method for Design of Robust Digital Circuits

Dinesh Patil, Sunghee Yun, Seung-Jean Kim, Alvin Cheung, Mark Horowitz and Stephen Boyd
Department of Electrical Engineering, Stanford University, Stanford, CA 94305-9505
{ddpatil, sunghee.yun, akcheung, sjkim, horowitz, boyd}@stanford.edu

Abstract

As technology continues to scale beyond 100 nm, there is a significant increase in performance uncertainty of CMOS logic due to process and environmental variations. Traditional circuit optimization methods assuming deterministic gate delays produce a flat "wall" of equally critical paths, resulting in variation-sensitive designs. This paper describes a new method for sizing of digital circuits with uncertain gate delays, to minimize their performance variation, leading to a higher parametric yield. The method is based on adding margins on each gate delay to account for variations and using a new "soft maximum" function to combine path delays at converging nodes. Using analytic models to predict the means and standard deviations of gate delays as posynomial functions of the device sizes, we create a simple, computationally efficient heuristic for uncertainty-aware sizing of digital circuits via Geometric Programming. Monte Carlo simulations on custom 32-bit adders and ISCAS'85 benchmarks show that about 10% to 20% delay reduction over deterministic sizing methods can be achieved, without any additional cost in area.

1. Introduction

Extensive research has been done on automatic circuit sizing for minimizing delay under an area or power constraint, using both model based [2] and simulation based [1] approaches. While these approaches have been very successful, they assume deterministic gate delay models and invariably result in a large number of equally critical paths that form a "wall" in the path delay histogram [13]. As we scale technology to the sub-100 nm feature size, both intrinsic device variations and process lithography control issues are increasing the statistical variability of each gate in a circuit [4].
This delay variation causes the expected delay for a circuit, which is the expected value of the maximum of all the path delays, to grow larger as the wall of critical paths gets taller. Statistical Static Timing Analysis (SSTA) using Monte Carlo simulations shows that such deterministic optimization drives the sizing into an extremely variation-sensitive corner. On the other hand, guard banding, i.e., designing the circuit to meet specs in the worst-case corner, results in an exceedingly pessimistic timing estimate, due to the inherent assumption of complete correlation among all the gates.

[Figure 1. Monte Carlo analysis on a deterministically sized 32-bit adder. Horizontal axis: circuit delay in ns; vertical axis: distribution. Annotations: delay predicted by deterministic design, cost of variation, 9% delay by Monte Carlo analysis, worst case guard band delay.]

Figure 1 compares the delay of a deterministically optimized 32-bit adder as estimated by Deterministic Static Timing Analysis (DSTA) using typical and worst case models to the delay distribution of the same circuit obtained after Monte Carlo SSTA. Clearly, the statistical variability causes a significant error, about 2%, if we look at the difference between the estimated deterministic delay and the mean of the expected delay with statistical variations. The error is even larger if we want the delay that, say, 9% of the distribution will meet. This large error has led a number of researchers to create statistical timing analyzers, using approaches that range from Monte Carlo analysis to propagating delay Probability Distribution Functions (PDFs) through the netlist [10, 11, 12].

-7695-23-35 $ 2. IEEE

While these techniques have been successfully integrated into SSTA, it is less clear how to extend them to solve the circuit sizing problem. Furthermore, while they are clearly needed to accurately estimate the timing of these circuits, it is less clear that this degree of fidelity is needed to optimize their device sizes.

Our approach is based on the intuition that circuit sizing problems tend to have large, relatively flat minima. The sizer mostly needs to avoid making bad choices (or having variation push the solution into a bad case) rather than choosing the precisely correct value. As a result we took a different approach to the problem. We asked what small changes we could add to the current sizing approach to improve its performance when dealing with circuits with statistical variation. Our goal was to see how well we could do by extending current optimization techniques.

Following this approach, the new uncertainty-aware sizing algorithm we present here is an extension of the deterministic method. We augment the gate delay models using margins related to the standard deviation. The path delays at converging nodes are combined using the soft maximum function in order to correctly capture the statistical behavior of the maximum of a set of random variables. The next section provides a quick overview of the sizing problem, and reviews the solution for deterministic circuits. These techniques are then extended in Section 3 to provide the optimizer some indication of the uncertainty of each gate delay. While the techniques in Section 3 can be used with any delay model, the models we used for the expected delay and the sigma of the delay are described in Section 4. These models are then used to produce the results that are shown in Section 5.

Throughout this paper, bold capital letters, e.g., $\mathbf{X}$, $\mathbf{D}$, and $\mathbf{T}$, denote random vectors or variables, while the corresponding lower case letters denote their particular realizations.
We use $\mu(\mathbf{X})$ to denote the expected value of a random variable $\mathbf{X}$, $\sigma(\mathbf{X})$ to denote its standard deviation, and $q_\alpha(\mathbf{X})$ to denote the $\alpha$ percentile point on its probability distribution curve. For a vector $x$, the $i$th component is denoted as $x_i$.

2. Circuit Sizing

Assuming the objective is minimizing the circuit delay $T_{\max}$ (i.e., the maximum of the set of signal arrival times at the circuit outputs), under constraints on total area $A$, the deterministic optimization problem can be formulated as shown below (1) [1].

    minimize    $T_{\max}$
    subject to  $A \le A_{\max}$,  $\mathcal{F}_i(w)$,  $i = 1, \dots, n$        (1)

Here $A_{\max}$ is a given limit on total circuit area, $w$ is the vector of transistor sizes (or cell sizes in case of standard cell design) and $\mathcal{F}_i(w)$ represent, for each gate $i$, a set of constraints on its device sizes, signal slopes, and delay propagation from its inputs to the output.

[Figure 2. Gate delay constraints for problem (1). The gate output is labeled as driving other gates.]

For instance, Figure 2 shows a typical gate for which we can write:

    $T_{\text{out}} = \max_{j} \, (T_j + d_j)$        (2)

where $T_j$ is the signal arrival time of input $j$ and $d_j$, the typical gate delay from input $j$ to the output, is a function of the load capacitance $C_{\text{load}}$, transistor sizes $w$, channel length $L$, supply voltage $V_{dd}$, threshold voltage $V_{th}$, oxide thickness $t_{ox}$, and mobility $\mu_0$:

    $d_j = f_j(C_{\text{load}}, w, L, V_{dd}, V_{th}, t_{ox}, \mu_0)$        (3)

Rising and falling delays are considered separately. The complexity of solving this optimization problem depends on the form of the $d_j$ and other constraints. In particular, if $d_j$ is a generalized posynomial [20], this becomes a geometric program, which can be efficiently solved using convex optimization techniques.

3. Sizing for Robust Design

In the presence of variations each gate delay $\mathbf{D}_j$ is a random variable with mean $\mu(\mathbf{D}_j)$ given by (3) and standard deviation $\sigma(\mathbf{D}_j)$ modeled as a function of $C_{\text{load}}$, $V_{dd}$, $V_{th}$, etc. The deterministic algorithm only considers $\mu(\mathbf{D}_j)$ and results in many equally critical paths, which is mainly responsible for the statistical delay spread.
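The arrival-time recursion behind (2) can be sketched in a few lines. This is a minimal illustration, not the authors' tool: the netlist encoding, the name `arrival_times`, and all delay values are hypothetical, and gate delays are treated as fixed numbers, as in the deterministic formulation.

```python
from typing import Dict, List, Tuple

# Each entry is (output_net, [(input_net, input_to_output_delay), ...]).
# Entries are assumed to be listed in topological order (hypothetical encoding).
Netlist = List[Tuple[str, List[Tuple[str, float]]]]


def arrival_times(netlist: Netlist, input_arrivals: Dict[str, float]) -> Dict[str, float]:
    """Propagate arrival times: T_out = max over inputs j of (T_j + d_j)."""
    t = dict(input_arrivals)
    for out_net, fanin in netlist:
        t[out_net] = max(t[src] + delay for src, delay in fanin)
    return t


# Hypothetical three-gate example with made-up delays (in ns).
EXAMPLE_NETLIST: Netlist = [
    ("n1", [("a", 1.0), ("b", 1.2)]),
    ("n2", [("b", 0.8), ("c", 1.0)]),
    ("out", [("n1", 1.5), ("n2", 1.6)]),
]

if __name__ == "__main__":
    times = arrival_times(EXAMPLE_NETLIST, {"a": 0.0, "b": 0.0, "c": 0.0})
    for net, value in times.items():
        print(f"{net}: {value:.2f} ns")
```

In this made-up example the two paths into `out` differ by only 0.1 ns, which is the kind of near-tie that a deterministic sizer tends to produce everywhere.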
The exact statistical sizing problem, considering detailed distribution and propagation of each gate delay, is computationally intractable. We want to achieve statistical tuning without having to propagate PDFs; instead, we propagate a delay number that represents the tail of the distribution.
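To see why a wall of equally critical paths is costly under variation, a small Monte Carlo experiment can estimate the expected value of the maximum of several path delays. This is only a sketch under stated assumptions: path delays are taken to be independent Gaussians with identical mean and standard deviation (real paths share gates and are correlated), and the name `expected_max_delay` and all numeric values are hypothetical.

```python
import random
import statistics


def expected_max_delay(n_paths: int, mean: float = 1.0, sigma: float = 0.05,
                       n_samples: int = 5000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[max of n_paths i.i.d. Gaussian path delays]."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    samples = [
        max(rng.gauss(mean, sigma) for _ in range(n_paths))
        for _ in range(n_samples)
    ]
    return statistics.fmean(samples)


if __name__ == "__main__":
    # The nominal delay of every path is 1.0 ns, yet the expected circuit
    # delay grows as more paths become equally critical.
    for n in (1, 4, 16, 64):
        print(f"{n:3d} equally critical paths: E[max] ~ {expected_max_delay(n):.3f} ns")
```

A deterministic timer would report the same delay for every row, since all paths have the same mean; the growth in the estimate is the right shift that the sizing heuristic tries to anticipate.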

3.1. Augmenting the mean delay

We propose to use gate delays $m(\mathbf{D}_i)$ defined as

    $m(\mathbf{D}_i) = \mu(\mathbf{D}_i) + \kappa_i \, \sigma(\mathbf{D}_i)$        (4)

in (2) in place of $\mu(\mathbf{D}_i)$. In other words, $m(\mathbf{D}_i)$ for the $i$th gate includes extra margins (scaled $\sigma(\mathbf{D}_i)$) to account for the variation and uncertainty in the gate delays. We call $\kappa_i$ margin coefficients. This can be interpreted as adding a delay penalty term to each gate that is proportional to its delay uncertainty.

3.2. Use of soft maximum

Since $T_{\text{out}}$ in (2) is a maximum of a set of input delays that are random, the distribution of $T_{\text{out}}$ is shifted to the right of all the input delay distributions. This shift is more pronounced when several of the input arrival time distributions are near the maximum, and negligible when, say, one of the inputs arrives much later than the others. To take into account the right shift caused by taking the maximum of a set of random variables, we propose to use a soft maximum function defined as

    $\mathrm{smax}_p(x_1, \dots, x_n) = \left( \sum_{k=1}^{n} x_k^{\,p} \right)^{1/p}$

where $p$ is the exponent that represents the penalty for closeness of arguments and the sum accounts for the increase in uncertainty with every extra input. This steers the optimizer away from making the paths equally critical. The soft max retains in spirit the fact that under variations even a path with smaller mean can contribute to the delay spread at the converging node, while it asymptotically approaches the max function.

Combining the two techniques we can write (2) for the gate in Figure 2 as:

    $T_{\text{out}} = \mathrm{smax}_p \{\, T_j + m(\mathbf{D}_j) \,\}_j$

where $j$ ranges over the inputs of the gate and $m(\mathbf{D}_j)$ is the margined delay from input $j$ to the output.

Since these techniques retain the computational merits of the deterministic sizing problem (like sparsity), the algorithm is easily scalable to larger circuits. Moreover, if the $\mu$ and $\sigma$ of gate delays are generalized posynomials (which is the case if we use the Elmore delay model [6], the velocity saturated delay model [8], or curve fit model [2]), then the problem can be cast as a generalized geometric program (GGP) [20], which can be solved globally with great efficiency.
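The two ingredients of Sections 3.1 and 3.2 can be tried out numerically. The sketch below takes the soft maximum to be the p-norm of its positive arguments and the margined delay to be mean plus a multiple of sigma; the names `margin_delay`, `soft_max`, `kappa` and `p`, and all numeric values, are assumptions made for illustration rather than quantities taken from the paper.

```python
from typing import Sequence


def margin_delay(mu: float, sigma: float, kappa: float) -> float:
    """Mean delay plus a margin proportional to the delay uncertainty."""
    return mu + kappa * sigma


def soft_max(values: Sequence[float], p: float) -> float:
    """p-norm soft maximum of positive values; tends to max(values) as p grows."""
    return sum(v ** p for v in values) ** (1.0 / p)


if __name__ == "__main__":
    p = 30.0  # illustrative exponent, not a recommended setting

    # Two equally critical arrival times: the soft max sits above the hard max.
    print("equal inputs   :", soft_max([1.0, 1.0], p), "vs max", max(1.0, 1.0))

    # One clearly dominant arrival time: the soft max is almost the hard max.
    print("dominant input :", soft_max([1.0, 0.7], p), "vs max", max(1.0, 0.7))

    # Arrival time at a gate output from two inputs with margined gate delays.
    arrivals = [0.60, 0.55]   # hypothetical input arrival times (ns)
    mus = [0.20, 0.25]        # hypothetical mean gate delays (ns)
    sigmas = [0.02, 0.03]     # hypothetical delay standard deviations (ns)
    kappa = 2.0               # hypothetical uniform margin coefficient
    terms = [t + margin_delay(m, s, kappa) for t, m, s in zip(arrivals, mus, sigmas)]
    print("output arrival :", soft_max(terms, p))
```

Because the penalty is largest when arguments are close together, an optimizer minimizing this quantity has an incentive to separate converging path delays instead of equalizing them.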
A crude search loop in the $(\kappa, p)$ space around the basic optimization routine can easily be implemented to obtain the best statistical sizing (as validated by SSTA).

3.3. Validation

Consider two (Gaussian for convenience) random variables $\mathbf{X}_1$ and $\mathbf{X}_2$. We hold the mean and standard deviation of $\mathbf{X}_1$ fixed, with $\mu(\mathbf{X}_1) = 1$, while we sweep $\mu(\mathbf{X}_2)$ from 0.7 to 1.3 for several values of $\sigma(\mathbf{X}_2)$. Let $\mathbf{Z} = \max(\mathbf{X}_1, \mathbf{X}_2)$. We find two percentile points of the random variable $\mathbf{Z}$ using Monte Carlo samples. These are plotted as solid curves of varying $\mu(\mathbf{X}_2)$ for three values of $\sigma(\mathbf{X}_2)$ in Figure 3. We then define estimates of these two points from the soft maximum of the margined values $\mu(\mathbf{X}_k) + \kappa\,\sigma(\mathbf{X}_k)$, $k = 1, 2$ (5), in order to fit the Monte Carlo values by choosing the right $\kappa$ and $p$ (plotted as dashed curves).

[Figure 3. Soft max with margins validation for the two percentile points.]

The plots in Figure 3 show that our soft max and margins give close estimates of the two percentile points for specific values of $\kappa$ and $p$. Here $\mathbf{X}_1$ and $\mathbf{X}_2$ represent the delays of two converging critical paths which can vary by 30% from each other in their mean and differ by 5% in their standard deviation. For simplicity, we consider a uniform $\kappa$ and $p$ for all gates. A value of $p$ between 3 and 5 and margin coefficient $\kappa$ between .5 and 2.5 give good statistical sizing in most circuits.

Of course, in a real netlist, the number we obtain for the signal arrival time at any net, using our heuristic, is certainly not the exact percentile point (for a specific $\alpha$) on its timing distribution. It just represents a measure of the criticality of the arrival time to the overall delay. The timing results we present are always from an SSTA done after the robust optimization; SSTA is the only trustworthy method for comparing results. We use our soft max function, and the simple augmented delay expression, only to design the circuit, and not to analyze it.

4. Statistical Delay Model

While the above techniques can be used with a number of different timing models, we have been using a simple analytical model to estimate delays. Ideal quadratic transistors can be nicely modeled as resistors [7], but all modern transistors are current velocity saturated. Although this makes the analysis a little more difficult, it is still possible to create simple, accurate analytical timing models that are compatible with GP solvers. Our model uses the Channel Connected Component (CCC) as the basic gate structure. This is a group of transistors that have their source/drain connected, with some transistors connecting to $V_{dd}$ and others to ground. For full custom designs, each transistor can be optimized individually, while in cell based designs, all the transistors in the cell are sized together. In ISCAS'85 benchmarks, each cell may contain one or more CCCs and cell sizes are the design variables.

4.1. Mean Delay Model

We have extended the Meyer velocity saturated current model described in [5, 8] to obtain the delay of CMOS CCCs. In this model, the current through a MOS transistor is:

    $I = W \, v_{sat} \, C_{ox} \, \dfrac{V_{od}^{2}}{V_{od} + E_c L}$

where $V_{od} = V_{dd} - V_{th}$, $v_{sat}$ is the saturation velocity and $E_c$ is the electric field that sets the onset of velocity saturation. To use this model for gates, we need to find the effective values of these parameters for a chain of NMOS transistors.
We estimate the current by creating an effective transistor whose width is derived from the widths $W_1, W_2, \dots$ of the transistors in the chain. Using this current equation, we estimate the fall delay (the time to discharge the output to a fixed fraction of $V_{dd}$) by an expression in $C_{\text{load}}$, $V_{dd}$ and the chain current, plus a term proportional to the input slope, where the proportionality constant is determined by $V_{dd}$ and $V_{th}$. While formulating the gate delay constraints, the added delay due to the input slope is absorbed in the delay of the fan-in gate. If the input is at the bottom of the chain, then it has to discharge all the intermediate nodes. In this case we decompose the fall delay as a sum of fall delays where each intermediate capacitor is discharged by the chain below it, just like in the Elmore delay calculation. The accuracy of the delay model remains well within 8% for chains of up to 4 transistors, for reasonable $C_{\text{load}}$ and signal slews. Similar expressions can be written for the rise delay through PMOS chains. The gate delay from an input to the output can be formed by considering all chains that contain the input and help to drive the output, and taking the maximum of these delays, for static problem formulation. The mean delay of a CCC thus obtained is a generalized posynomial [20] of its transistor widths. For ISCAS'85 circuits, the cell sizes are design variables. The cell delay models are obtained using posynomial fitting [2] on the cell library data.

4.2. Standard Deviation Model

We use Pelgrom's model [3] for the variation of a device current, which states that parameter variations tend to reduce as the area of the fabricated MOS structure increases. We extend this idea to the chain of transistors by expressing the relative $\sigma$ of the chain current $\mathbf{I}$ as

    $\dfrac{\sigma(\mathbf{I})}{\mu(\mathbf{I})} = \dfrac{k}{\sqrt{A_{\text{eff}}}}$

where $k$ is a constant depending on the fabrication process. Thus the variance of the drain current of a transistor chain is inversely proportional to the electrically effective area $A_{\text{eff}}$ of the chain. The effective area is a weighted sum of the device areas, so that the contribution to the variations is weighted according to the contribution to the current. From the relative $\sigma$ of the current, the standard deviation of the delay, $\sigma(\mathbf{D})$, is obtained. For ISCAS'85 benchmarks we use a model based on the footprint area of the cell's layout. For simplicity, we have not included the variation in delay due to wire width variation or variation in $C_{\text{load}}$ of fan-out gates. These can be easily included in the detailed framework. Also, correlation between gates can be incorporated by adding additional margins to $\kappa_i$ in (4).

5. Results

The optimization algorithm was tested on two custom 32-bit adders, a Kogge-Stone (KS) and a Ladner-Fischer (LF) [17], designed in TSMC 0.18 µm 1.8 V CMOS with an FO4 delay of 80 ps, and an ISCAS'85 benchmark in bulk TSMC.3.8V with an FO4 delay of 130 ps. For a chain of transistors we used a relative $\sigma$ of the current equivalent to that of a single minimum length transistor. For the ISCAS'85 cell based design the 5% variation was for the minimum sized cell. Internal wire capacitances, wherever significant, are also included in the optimization. The circuits were optimized under identical load, area and other constraints for deterministic and statistical cases. For custom circuits, the area is the sum of the widths of all devices while for ISCAS circuits, it is the sum of all cell areas modeled from the library as a function of cell sizes. Signal slew rate constraints are indirectly provided by constraining the delay per logic stage, which is GGP friendly. The optimizations are done using the MOSEK [21] convex optimization package. The results of Monte Carlo timing analysis are shown in Table 1.

Table 1. Monte Carlo timing analysis (percentile delay after deterministic and statistical sizing)

    circuit             det. sizing       stat. sizing      improvement
                        in ns (FO4)       in ns (FO4)       in timing
    32-bit LF adder     1.06 (13.3)       0.84 (10.5)       20.6%
    32-bit KS adder     0.98 (12.3)       0.81 (10.1)       17.7%
    ISCAS c880          2.20 (16.9)       1.99 (15.3)        9.5%

Figure 4 shows the PDFs of the delay of the 32-bit LF adder [17] along with the percentile points. The improvement in the percentile delay is quite significant after statistical sizing. The improvement in c880 is less significant due to the lack of freedom caused by having only one size represent the entire cell and one cell containing possibly multiple CCCs sized to a fixed ratio.
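The area dependence assumed by the standard deviation model of Section 4.2 can be illustrated numerically. The sketch below assumes that the relative sigma scales as the inverse square root of the (effective or footprint) area and is anchored at a stated value for the minimum-size device or cell. The anchor value, the function names `relative_sigma` and `delay_sigma`, and the step of multiplying the relative sigma by the mean delay are illustrative assumptions, not the paper's exact expressions.

```python
import math


def relative_sigma(area: float, area_min: float, sigma_rel_min: float) -> float:
    """Pelgrom-style scaling: relative sigma falls as 1/sqrt(area)."""
    return sigma_rel_min * math.sqrt(area_min / area)


def delay_sigma(mean_delay: float, area: float, area_min: float,
                sigma_rel_min: float) -> float:
    """Assumed delay sigma: mean delay times the relative sigma at this area."""
    return mean_delay * relative_sigma(area, area_min, sigma_rel_min)


if __name__ == "__main__":
    sigma_rel_min = 0.10  # hypothetical relative sigma at minimum size
    for scale in (1, 2, 4, 8):
        rel = relative_sigma(float(scale), 1.0, sigma_rel_min)
        print(f"area x{scale}: relative sigma = {rel:.4f}")
```

Under this scaling, quadrupling the area halves the relative sigma, which is why upsizing a gate buys variation tolerance as well as speed and why aggressive downsizing of non-critical gates can create new critical paths.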
We have observed that the results are very weakly dependent on the kind of distribution, but are slightly dependent on the model used for $\sigma$. The improvement increases as $\sigma$ depends more strongly on the effective device area than that provided by Pelgrom's model. Also, the improvement increases with an increasing number of parallel outputs.

[Figure 4. Deterministic versus robust sizing: delay PDF for a 32-bit LF adder. Horizontal axis: delay in ns; vertical axis: distribution. Curves: deterministic sizing and robust sizing.]

The region around the optimal $(\kappa, p)$ is largely flat, so that a change of 5 around the optimal $p$ or .5 around the optimal $\kappa$ results in a change of only a few ps in the percentile delay. So a crude search suffices, drastically reducing the optimization time. Each iteration typically consists of about 3 s for optimization and 3 s for sample Monte Carlo on a 2 GHz Pentium PC with 1 GB memory, for the presented circuits.

Figure 5 shows the $\sigma$ vs. $\mu$ scatter plots of all the path delays in the LF adder for deterministic and statistical sizing. Clearly, the wall in the deterministic case is broken and the variation reduced for the statistical case, at the expense of increased mean deterministic delay.

[Figure 5. $\sigma$ vs. $\mu$ scatter plot for all paths of the 32-bit LF adder. Panels: deterministic sizing and statistical sizing. Horizontal axis: µ(delay) in ns; vertical axis: σ(delay) in ns.]

6. Conclusions

Statistical variations in device parameters will likely continue to worsen as we scale technology. It will be critical to account for these variations in both analog and digital circuits. While accurately accounting for uncertainty while sizing a digital circuit is difficult, we have shown that a few simple heuristics improve the expected performance of the resulting circuit, and can be easily fit into today's optimization tools. Our method adds a penalty to each gate that is proportional to its uncertainty, and then changes the max function to account for the added delay that occurs when a set of uncertain inputs with similar expected times combine. Our method attempts to strike a balance between the goal of having the smallest delay for a given area or power, or vice versa, and preventing excessive downsizing of non-critical paths that leads to new critical paths. We are currently working on extending this framework to optimize other transistor and circuit parameters, like $V_{dd}$ and $V_{th}$, to better optimize our designs.

Acknowledgements

This work was funded in part by C2S2, the MARCO Focus Center for Circuit and System Solution, under MARCO contract 2003-CT-888 and by Philips Inc.

References

[1] A. Conn, I. Elfadel, W. Molzen, P. O'Brien, P. Strenski, C. Visweswariah, and C. Whan, "Gradient-based optimization of custom circuits using a static-timing formulation," Proc. Design Automation Conference (DAC), 1999, pp. 452-459.
[2] S. Sapatnekar, V. Rao, P. Vaidya, and S.-M. Kang, "An exact solution to the transistor sizing problem for CMOS circuits using convex optimization," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pp. 1621-1634, November 1993.
[3] M. Pelgrom, C. Duinmaijer, and A. Welbers, "Matching properties of MOS transistors," IEEE Journal of Solid-State Circuits, pp. 1433-1440, October 1989.
[4] S. Nassif, "Within-chip variability analysis," Proc. of IEDM, 1998, p. 283.
[5] K. Chen, C. Hu, P. Fang, M. Lin, and D. Wollesen, "Predicting CMOS speed with gate oxide and voltage scaling and interconnect loading effects," IEEE Transactions on Electron Devices, pp. 1951-1957, November 1997.
[6] J. Rubinstein, P. Penfield and M. Horowitz, "Signal delay in RC tree networks," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pp. 202-211, July 1983.
[7] M. A. Horowitz, "Timing Models for MOS circuits," Ph.D. Thesis, Stanford University, 1983.
[8] K.-Y. Toh, P.-K. Ko, and R. Meyer, "An engineering model for short channel MOS devices," IEEE Journal of Solid-State Circuits, pp. 950-958, August 1988.
[9] S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi and V.
De, "Parameter variations and impact on circuits and architecture," Proc. Design Automation Conference (DAC), 2002, pp. 58-63.
[10] M. Orshansky and A. Bandyopadhyay, "Fast statistical timing analysis handling arbitrary delay correlations," Proc. Design Automation Conference (DAC), 2004, pp. 337-342.
[11] A. Gattiker, S. Nassif, R. Dinakar, and C. Long, "Timing yield estimation from static timing analysis," Proc. International Symposium on Quality Electronic Design (ISQED), 2001, pp. 437-442.
[12] C. Visweswariah, K. Ravindran, K. Kalafala, S. Walker, and S. Narayan, "First-order incremental block-based statistical timing analysis," Proc. Design Automation Conference (DAC), 2004, pp. 331-336.
[13] X. Bai, C. Visweswariah, P. Strenski, and D. Hathaway, "Uncertainty-aware circuit tuning," Proc. Design Automation Conference (DAC), 2002, pp. 338-342.
[14] M. Hashimoto and H. Onodera, "A performance optimization method by gate sizing using statistical static timing analysis," Proc. ACM/SIGDA International Symposium on Physical Design, 2000, pp. 111-116.
[15] S. Raj, S. Vrudhula, and J. Wang, "A methodology to improve timing yield in the presence of process variations," Proc. Design Automation Conference (DAC), 2004, pp. 448-453.
[16] E. Jacobs and M. Berkelaar, "Gate sizing using a statistical delay model," Proc. of Design, Automation, and Test in Europe, 2000, pp. 283-290.
[17] S. Knowles, "A family of adders," Proc. 15th IEEE Symposium on Computer Arithmetic, 2001, pp. 77-82.
[18] S. Kim, S. Boyd, S. Yun, D. Patil, and M. Horowitz, "A heuristic for optimizing stochastic activity networks with applications to statistical digital circuit design," Technical report, Stanford University, Stanford, CA 94305, 2004. Available from www.stanford.edu/~boyd/heur_san_opt.html.
[19] S. Boyd, S. Kim, L. Vandenberghe, and A. Hassibi, "A tutorial on geometric programming," Technical report, Stanford University, Stanford, CA 94305, 2004. Available from www.stanford.edu/~boyd/gp_tutorial.html.
[20] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2003.
[21] MOSEK ApS, "The MOSEK Optimization Tools Version 2.5. User's Manual and Reference," 2002. Available from www.mosek.com.