Compact Variation-Aware Standard Cell Models for Statistical Static Timing Analysis


A Dissertation
Presented to
The Academic Faculty

By
Seyed-Abdollah Aftabjahani

In Partial Fulfillment of the Requirement for the Degree of
Doctor of Philosophy in Electrical Engineering

School of Electrical and Computer Engineering
Georgia Institute of Technology
August, 2011

Copyright 2011 by Seyed-Abdollah Aftabjahani

Approved by:

Dr. Linda S. Milor, Advisor
School of Electrical and Computer Engineering
Georgia Institute of Technology

Dr. Jeffrey A. Davis
School of Electrical and Computer Engineering
Georgia Institute of Technology

Dr. Sung-Kyu Lim
School of Electrical and Computer Engineering
Georgia Institute of Technology

Dr. Yorai Wardi
School of Electrical and Computer Engineering
Georgia Institute of Technology

Dr. Michael F. Schatz
School of Physics
Georgia Institute of Technology

Date Approved: June 01, 2011

DEDICATION

To my beloved parents

ACKNOWLEDGEMENTS

This dissertation would not have been possible without the help and support that I have received from many individuals. I would like to express my gratitude to all who assisted me in this endeavor to push my limits of knowledge even further. I will not forget their impact upon my life.

I would like to express my deep appreciation to my thesis advisor, Professor Linda Milor, for her continuous support, encouragement, and supervision of my research. Throughout the years that I have been her research assistant, I have had excellent opportunities to acquire many academic and research skills, which I will use throughout my life. I have learned many intricate details of how to conduct high-quality academic research: from the conception of a research idea, to conducting a literature review, to formulating a problem, to applying creative thinking to address it, to constructing models and prototypes for the analysis and evaluation of a solution, and to the publication and presentation of results.

I would also like to thank my committee members, Professor Jeffrey A. Davis, Professor Sung Kyu Lim, Professor Yorai Wardi, and Professor Michael Schatz, for spending their precious time on the guidance and review of my research.

I would like to thank the Semiconductor Research Corporation (SRC) for support of this research project under task [...]. I am grateful to the SRC for providing me with personal and professional development opportunities by funding my attendance at the related conferences, specifically TechCon, to network with experts in the field, to present the research to the leaders in academia and industry, and to receive appropriate feedback to improve the quality of the research.

I would like to acknowledge my dear colleagues, especially Fahad Ahmed and Muhammad Bashir, for all their constructive discussions on my research, and others who have contributed technical or editorial assistance, including Professor Azad Naimee, Dr. Reza Sarvari, Dr. Alireza Shapoori, and Alex Anderson.

I would like to thank Dr. Kevin Martin and all technical personnel and staff of the Microelectronics Research Center (MIRC) at the Georgia Institute of Technology for providing a superb environment conducive to my research.

Last but not least, I wish to thank my family, especially my parents, for all their love and support. They have provided a solid foundation for me to grow in all aspects of my life, including my education.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
SUMMARY

CHAPTER I: INTRODUCTION

CHAPTER II: BACKGROUND

CHAPTER III: MODELING AND ANALYSIS OF COMPACT VARIATION-AWARE STANDARD CELLS
  Experimental Platform and Model of Variation
  Construction of the Waveform Model
  Comparison of PCA Methods for Waveform Modeling
  Construction of the Cell Model and Timing Analysis
  Comparison of Experimental Design Methods for Cell Modeling
  Complexity Analysis
  Conclusions

CHAPTER IV: EXTENDING AND ENHANCING THE METHODOLOGY
  Constructing a Cell Model for Deep Submicron Technology
  More Accurate Transistor Models with More Parameters
  Non-binned Transistor Model vs. Binned Transistor Model
  Support of Symmetric Parameter Variation for all Parameters
  Variation Parameters Chosen for Cell Models
  Constructing a Cell Model for Very Large Parameter Variations
  Constructing a Cell Model for Resistive-Capacitive Loads
  Timing Characterization of Complex Loads
  Mapping RC-Interconnect Networks to Pi-Models
  Cell Characterization with a Pi-Model Load
  RC-Interconnect Network Characterization
  Test Circuit and its Pi-Model-Converted RC-Interconnect Networks
  Timing Analysis Engine for Our Cell Models and RC-Interconnect Models
  Timing Simulation and Simulation Results
  Conclusions and Future Work
  Investigating Accuracy Improvement Methods
  Accuracy Analysis of the PCA Waveform Model
  Accuracy Analysis of the PCA Waveform Model: Number and Location of Points
  Accuracy Analysis of the PCA Waveform Model for TSMC180RF: Waveform Dataset Selection, Range of Parameter Variations, and Model Subranging
  Accuracy Analysis of the PCA Waveform Model for FreePDK45: Waveform Dataset Selection, Design of Experiment, Discretization Level, Range of Parameter Variations, and Model Subranging
  Accuracy Analysis of the PCA Waveform Model for FreePDK45: The Iterative Method for Finding the Common PCs
  Accuracy Analysis of the Cell Models
  Conclusions

CHAPTER V: FAST VARIATION-AWARE STATISTICAL DYNAMIC TIMING ANALYSIS
  Introduction
  VVCCP: A compiled-code SDTA tool
  Fault simulation framework
  Transformation process of models
  Experiments
  Experimental results
  Conclusion and future work

CHAPTER VI: FUTURE RESEARCH DIRECTIONS
  Short-Term Research Plan
  Long-Term Research Plan

APPENDIX A: PRINCIPAL COMPONENTS ANALYSIS EQUATIONS
  A.1. Assumptions
  A.2. Singular value decomposition
  A.3. Properties of U
  A.4. Principal Component Transformation
  A.5. Generalized Measures of Variability
  A.6. Scaling of Characteristic Vectors
  A.7. Overall Measure of Variability
  A.8. Residual Analysis

APPENDIX B: THE STANDARD CELLS USED IN THE RESEARCH

APPENDIX C: COMPARING EXPERIMENTAL DESIGNS FOR GENERATING THE WAVEFORM AND CELL MODELS
  C.1. Full-factorial Designs
  C.2. Fractional-Factorial Designs
  C.3. Designs Based on Latin Hypercube Sampling
  C.4. Central Composite Designs

APPENDIX D: THE SPREAD OF C1, R, AND C2 OF THE PI-MODEL FOR RC-INTERCONNECT NETWORKS OF THE INVERTER CHAIN

APPENDIX E: COMPLEXITY ANALYSIS FOR CELLS THAT USE A SATURATED RAMP WAVEFORM MODEL

APPENDIX F: THE 3-DIMENSIONAL PLOTS FOR MODEL ACCURACY COMPARISON

APPENDIX G: RESOURCES AND FACILITIES USED IN OUR RESEARCH

RESEARCH CONTRIBUTIONS
REFERENCES
VITA

LIST OF TABLES

Table 3.1. Variation model parameters
Table 3.2. Residuals of PCA waveform models
Table 3.3. Fraction of outliers in PCA waveform models (%)
Table 3.4. Designs used for cell model construction
Table 3.5. Number of terms (number of operations) in cell models
Table 3.6. Adjusted coefficient of multiple determination for cell models (%)
Table 3.7. Sum of squares of residuals for cell models
Table 3.8. Comparing space complexity of methods for a cell (per delay/transition entry per input)
Table 3.9. Comparing simulation time complexity of methods for a cell (per delay/transition entry per input)
Table [...]. Comparing characterization time complexity of methods for a cell (per delay/transition entry per input)
Table 4.1. New or enhanced BSIM4.5.0 model parameters used in FreePDK45
Table 4.2. BSIM4.5.0 model selectors/controllers
Table 4.2. (cont.) BSIM4.5.0 model selectors/controllers
Table 4.3. Example BSIM4.5.0 model process parameters
Table 4.4. Variation model parameters for our VTCs (FreePDK45)
Table 4.5. Variation model parameters (FreePDK45)
Table 4.6. Variation model parameters (FreePDK45 - large variations)
Table 4.7. Variation model parameters (FreePDK45 - subrange)
Table 4.8. Variation model parameters (FreePDK45 - subrange & large variations)
Table 4.9. The 2-level full-factorial design variation parameters for verifying Pi-models and H'(s) models
Table [...]. Comparing output waveform transition time and delay errors (%) of our Pi-model and H'(s) model for all 11 RC-interconnect networks
Table [...]. Comparing the inverter output waveform transition timing point errors (%) at the 10%, 50%, and 90% points and equal resistance errors (%) of our Pi-model and H'(s) model for all 11 RC-interconnect networks
Table [...]. Variation model parameters for (resistive-capacitive) Pi-model loads
Table [...]. Designs used for cell model construction with a Pi-model load
Table [...]. Sum of squares of residuals for cell models
Table [...]. Comparing the model adequacy of the full-factorial models using coefficient of multiple determination (%) and adjusted coefficient of multiple determination (%) in parentheses
Table [...]. Comparing the model prediction accuracy of the full-factorial models using coefficient of multiple determination for prediction
Table [...]. Number of terms (number of operations) in cell models
Table 4.18. Comparing STA time, the characterization time, the total number of operations, the memory usage, and the accuracy of all the STA methods
Table [...]. Waveform models compared for accuracy - TSMC180RF
Table [...]. Waveform and cell model accuracy improvement methods
Table [...]. Waveform models compared for accuracy - FreePDK45
Table [...]. Which options improve waveform model accuracy - FreePDK45
Table [...]. Waveform models compared for adequacy and prediction accuracy - FreePDK45
Table 5.1. Run time for different experiments
Table 5.2. Number of inputs, outputs, gates, compile and run-times
Table E.1. Comparing space complexity of methods for a cell that uses a saturated ramp waveform model (per delay/transition entry per input)
Table E.2. Comparing simulation time complexity of methods for a cell that uses a saturated ramp waveform model (per delay/transition entry per input)
Table E.3. Comparing characterization time complexity of methods for a cell that uses a saturated ramp waveform model (per delay/transition entry per input)
Table E.4. Comparing the complexity of methods for a cell that uses a saturated ramp waveform model (per delay/transition entry per input)

LIST OF FIGURES

Figure 3.1. The dataset of time domain rising and falling waveforms generated using a full factorial experimental design
Figure 3.2. The waveforms corresponding to rising and falling transitions transformed to the PCA domain
Figure 3.3. (a) The acceptability region in the PCA domain, together with some points, labeled as A, B, C, and D, corresponding to corners of the PCA domain. (b) Time domain waveforms corresponding to the corner points A, B, C, and D in (a)
Figure 3.4. Data points corresponding to the waveforms in Figure 3.1 and the acceptability region
Figure 3.5. A common set of PC basis functions for all waveforms associated with all cells simplifies timing analysis by avoiding (a) conversions to and from the time domain and (b) storage of a unique set of basis functions for each cell in the library
Figure 3.6. The final acceptability region, including the limit imposed by the convergence requirement
Figure 3.7. The coefficients of the principal components basis functions, computed after each of the iterations: (a) PC1 and (b) PC2
Figure 3.8. Narrow tree of inverters used to evaluate the accuracy of the PCA method
Figure 3.9. Comparison of delay for the three methods for (a) a fast rising input transition and (b) a slow rising input transition
Figure [...]. Average relative error of delay for methods 1 (tabular) and 3 (PC) in comparison with Hspice using data from the outputs of each of the 21 stages
Figure [...]. Average relative error (a) and relative error variance (b) of the delay for models in comparison with Hspice using data from the outputs of each of the 21 stages of 12 samples at nominal values of parameters
Figure [...]. Average relative error (a) and relative error variance (b) of delay for models in comparison with Hspice using data from the outputs of each of the 21 stages of 12 samples at random values of parameters
Figure 4.1. VTCs are affected by variations of 20% and 5% in L and Vt, respectively
Figure 4.2. Output swings are affected by process parameter variations
Figure 4.3. Output transitions are affected by parameter variations (input transitions are the saturated ramps in blue)
Figure 4.4. VTCs are affected adversely by increasing the variation in L to 30% while keeping the variation in Vt at 5%
Figure 4.5. VTCs are affected adversely by increasing the variation in Vt to 90% while keeping the variation in L at 5%
Figure 4.6. VTCs are acceptable when the variations in L and Vt are set to 20%
Figure 4.7. For a logic gate, there are different levels for load modeling: (a) original RC-interconnect network, (b) Pi-model, and (c) Ceff-model
Figure 4.8. Two RC-interconnect networks with different topologies are mapped to a simple RC network (i.e. Pi-model) just with different values for the parameters
Figure 4.9. For an RC-interconnect network with fanout branches, the load is modeled by a Pi-model that incorporates all interconnect segments (e.g. INT0, INT5, and INT6); the voltage at the end of each output interconnect segment is determined by passing the input voltage of the interconnect network through a low-pass filter (e.g. R5C5 or R6C6) to account for the slope and delay change of the signal through each specific interconnect segment to the output (e.g. INT5 and INT6)
Figure [...]. Our H'(s) is a 2-pole and 1-zero reduced order model for H(s)
Figure [...]. Our test circuit is a JPEG2 Encoder clock tree
Figure [...]. Choosing one of the slowest critical paths of the JPEG2 Encoder clock tree
Figure [...]. The abstracted inverter chain used for our timing analysis. Each RC-interconnect network is made of one or more RC-interconnect segment(s) and each interconnect segment is made of one or more cascaded RC low-pass filters
Figure [...]. The interconnect network types reduced to Pi-models by our tool (GT_MOR)
Figure [...]. To test our Pi-model in the inverter chain at Spice level, the RC-interconnect networks were replaced by sets of Pi-model and H'(s)
Figure [...]. Timing simulation algorithm with RC-interconnect network support
Figure [...]. Comparison of delay for the three methods (Hspice, Pi-Model, and FF) at (a) nominal parameter values and (b) random parameter values
Figure [...]. Cell characterization using a saturated ramp could result in a delay error
Figure [...]. Comparison of delay errors, in percentage, using Pi-Model and FF-Model and its variations, i.e. FF3-Model and FF2-Model, at (a) nominal parameter values and (b) random parameter values
Figure [...]. PCA waveform discretization patterns for the voltage scale
Figure [...]. Increase in accuracy of the PCA waveform as a function of the number of discretization levels
Figure [...]. Waveform accuracy - Mahalanobis distance (TSMC180RF)
Figure [...]. Waveform accuracy - 50% point (TSMC180RF)
Figure [...]. Waveform model accuracy compared using Mahalanobis distance
Figure [...]. Waveform model accuracy compared using the absolute value of relative errors
Figure [...]. Waveform model accuracy compared using absolute value of relative errors of 50% points
Figure [...]. Waveform model accuracy compared using relative errors
Figure [...]. Waveform model accuracy compared using absolute errors
Figure [...]. The coefficients of the principal components basis functions for the inverter based on FreePDK45 technology, computed after each of the iterations: (a) PC1 and (b) PC2
Figure [...]. Waveforms for iteration 1 of the inverter based on FreePDK45 technology, (a) Time Domain and (b) PCA Domain
Figure [...]. Waveform accuracy for a waveform model based on a common set of PCs - Mahalanobis distance (FreePDK45 Iterations 0-4)
Figure [...]. Waveform accuracy for a waveform model based on a common set of PCs - Max, Average, and Max. Ave (FreePDK45 Iterations 0-4)
Figure [...]. Waveform accuracy for a waveform model based on a common set of PCs - 50% point (FreePDK45 Iterations 0-4)
Figure [...]. Waveform accuracy for a waveform model based on a common set of PCs - Mahalanobis distance (FreePDK45 Iterations 1-4)
Figure [...]. Waveform accuracy for a waveform model based on a common set of PCs - Max, Average, and Max. Ave (FreePDK45 Iterations 1-4)
Figure [...]. Waveform accuracy for a waveform model based on a common set of PCs - 50% point (FreePDK45 Iterations 1-4)
Figure 5.1. Fault simulation framework
Figure 5.2. VVCCP block diagram
Figure 5.3. Model transformations
Figure 5.4. Parametric and random experiments
Figure 5.5. Delay profile generation experiments
Figure 5.6. Delay profile generation pseudocode
Figure 5.7. Fault simulation framework
Figure C.1. Two-level full-factorial design
Figure C.2. Two-level fractional-factorial design
Figure C.3. Design based on Latin hypercube sampling
Figure C.4. Circumscribed central composite design
Figure C.5. Inscribed central composite design
Figure C.6. Faced central composite design
Figure F.1. Waveform model accuracy compared by the maximum of the absolute value of relative errors using 11 points of waveforms
Figure F.2. Waveform model accuracy compared by the average of the absolute value of relative errors using 11 points of waveforms
Figure F.3. Waveform model accuracy compared by the maximum of the average of absolute value of relative errors for each of 11 points of waveforms
Figure F.4. Waveform model accuracy compared by the maximum of the average of the absolute value of relative errors for each of 9 points of waveforms
Figure F.5. Waveform model accuracy compared by the maximum of the absolute value of relative errors using 9 points of waveforms
Figure F.6. Waveform model accuracy compared using the maximum Mahalanobis distance (11 points)
Figure F.7. Waveform model accuracy compared using the maximum Mahalanobis distance (9 points)
Figure F.8. Waveform model accuracy compared using the average Mahalanobis distance (11 points)
Figure F.9. Waveform model accuracy compared using the standard deviation of Mahalanobis distance (11 points)
Figure F.10. Waveform model accuracy compared using the average Mahalanobis distance (9 points)
Figure F.11. Waveform model accuracy compared using the standard deviation of Mahalanobis distance (9 points)
Figure F.12. Waveform model accuracy compared using the maximum of the absolute value of relative errors of 50% points
Figure F.13. Waveform model accuracy compared using the average of the absolute value of relative errors of 50% points
Figure F.14. Waveform model accuracy compared using the average of relative errors of all points (11 points)
Figure F.15. Waveform model accuracy compared using the standard deviation of relative errors of all points (11 points)
Figure F.16. Waveform model accuracy compared using the average of relative errors of 50% points (11 points)
Figure F.17. Waveform model accuracy compared using the standard deviation of relative errors of 50% points (11 points)
Figure F.18. Waveform model accuracy compared using the average of absolute errors of all points (11 points)
Figure F.19. Waveform model accuracy compared using the standard deviation of absolute errors of all points (11 points)
Figure F.20. Waveform model accuracy compared using the average of absolute errors of 50% points (11 points)
Figure F.21. Waveform model accuracy compared using the standard deviation of absolute errors of 50% points (11 points)

LIST OF SYMBOLS AND ABBREVIATIONS

ASM         Asymmetric Standardized Model
AWE         Asymptotic Waveform Evaluation
BSIM        Berkeley Short Channel IGFET Model
BSIM3v3     Version 3 of BSIM3
CAD         Computer Aided Design
C_eff       Effective Capacitance
DC          Direct Current
DRC         Design Rule Check
DSPF        Detailed Standard Parasitic Format
Fanout      The fanout of a gate (used as an approximation of the load)
FreePDK45   NCSU 45 nm Process Technology
IBM         International Business Machines
ITRS        International Technology Roadmap for Semiconductors
k_n         Process Transconductance (for an n-channel transistor)
L           Transistor Channel Length
LMIN        Minimum Channel Length of a Transistor
LMAX        Maximum Channel Length of a Transistor
LVS         Layout Versus Schematic
MNA         Modified Nodal Analysis
MOR         Model Order Reduction
MOS         Metal Oxide Semiconductor
MOSFET      MOS Field-Effect Transistor
NIMO        Nanoscale Integration and Modeling
NCSU        North Carolina State University
NM_H        Noise Margin High
NM_L        Noise Margin Low
NMOS        N-Channel MOS
OSU         Oklahoma State University
PC          Principal Component
PC1         First Principal Component
PC2         Second Principal Component
PCA         Principal Components Analysis
PCM         PCA model transformation matrix
PCMI        The Inverse of PCM
PCS         Principal Components Score
PMOS        P-Channel MOS
PT          Path Tracing
PTM         Predictive Technology Model
RF          Radio Frequency
ROM         Reduced Order Model
RSPF        Reduced Standard Parasitic Format
S/D         Source/Drain
Slope       The slope of a waveform transition (we used the transition time)
SNM         Symmetric Non-standardized Model
SOS         Sum of Squares of Error
SPEF        Standard Parasitic Exchange Format
SSM         Symmetric Standardized Model
SSTA        Statistical Static Timing Analysis
STA         Static Timing Analysis
SVD         Singular Value Decomposition
T           Temperature
TPHL        Waveform Propagation Time for a High-to-Low Output Transition
TPLH        Waveform Propagation Time for a Low-to-High Output Transition
TSMC        Taiwan Semiconductor Manufacturing Company Limited
TSMC180RF   TSMC 180 nm Process Technology for RF
Vdd         Supply Voltage of a gate
V_IL        Maximum acceptable Low Voltage for an Input of a gate
V_IH        Minimum acceptable High Voltage for an Input of a gate
V_OL        Maximum acceptable Low Voltage at the Output of a gate
V_OH        Minimum acceptable High Voltage at the Output of a gate
Vt          Threshold Voltage of a Transistor
VTC         Voltage Transfer Curve
WMIN        Minimum Transistor Width
WMAX        Maximum Transistor Width

SUMMARY

This dissertation reports on a new methodology to characterize and simulate a standard cell library to be used for statistical static timing analysis. A compact variation-aware timing model for a standard cell in a cell library has been developed. The model incorporates variations in the input waveform and loading, process parameters, and the environment into the cell timing model. Principal component analysis (PCA) has been used to form a compact model of a set of waveforms impacted by these sources of variation. Cell characterization involves determining equations describing how waveforms are transformed by a cell as a function of the input waveforms, process parameters, and the environment. Different versions of factorial designs and Latin hypercube sampling have been explored to model cells, and their complexity and accuracy have been compared. The models have been evaluated by calculating the delay of paths. The results demonstrate improved accuracy in comparison with table-based static timing analysis at comparable computational cost.

Our methodology has been expanded to adapt to interconnect-dominant circuits by including a resistive-capacitive load model. The results show the feasibility of using the new load model in our methodology. We have explored comprehensive accuracy improvement methods to tune the methodology for the best possible results.

The following is a summary of the main contributions of this work to statistical static timing analysis:

(a) accurate waveform modeling for standard cells using statistical waveform models based on principal components;

(b) compact performance modeling of standard cells using experimental design statistical techniques;

(c) variation-aware performance modeling of standard cells considering the effect of variation parameters on performance, where variation parameters include loading, waveform shape, process parameters (gate length and threshold voltage of NMOS and PMOS transistors), and environmental parameters (supply voltage and temperature);

(d) extending our methodology to support resistive-capacitive loads to be applicable to interconnect-dominant circuits;

(e) classifying the sources of error for our variational waveform model and cell models and introducing the related accuracy improvement methods; and

(f) introducing our fast block-based variation-aware statistical dynamic timing analysis framework and showing that (i) using compiler-compiler techniques, we can generate our timing models, test benches, and data analysis for each circuit, which are compiled to machine code to reduce the overhead of dynamic timing simulation, and (ii) using the simulation engine, we can perform statistical timing analysis to measure the performance distribution of a circuit using a high-level model for gate delay changes, which can be linked to their parameter variation.

CHAPTER I

INTRODUCTION

Dealing with variability of semiconductor process and environmental parameters has become a major challenge for designing modern high-performance complex system-on-chip integrated circuits. Hence, it is essential to the electronic design automation industry that the next generation of computer-aided-design (CAD) tools be variation-aware to maintain a high manufacturing yield for profitability. Variability is a significant issue because controlling process and environmental variations, such as temperature and supply voltage, has become very difficult or even impossible for modern nanometer technologies. Therefore, innovative tools, methodologies, and algorithms are required to implement the concept of variation awareness in CAD tools for at least the next two decades.

The large number of state-of-the-art papers that address various aspects of statistical performance analysis demonstrates the significance of this research area for both industry and academia. More than 200 publications were cross-referenced in two recent literature survey journal publications on statistical static timing analysis authored by Dr. David Blaauw et al. [1] and Dr. Cristiano Forzan et al. [2]. Considerable research effort is still required to advance the concept of statistical static performance analysis to maturity.

Toward making the current generation of performance analysis tools variation-aware, specifically the timing analysis tools that use standard cells, we propose a methodology based on design of experiments, analysis of variance, and principal component analysis to create a realistic variational waveform model to deal with variation systematically and build compact variation-aware timing models for a standard cell library. Moreover, we present the necessary timing analysis framework capable of using variation-aware cell models for statistical static timing analysis.

Traditional timing analysis tools use tabular timing models that can effectively capture the nonlinear interactions of waveform transitions and loading to perform static timing analysis while waveform transitions are abstracted by slopes. Intuitively, it is possible to build a tabular timing model that takes variability into account by adding a dimension per variable; however, the characterization time and memory requirement of a cell grow exponentially with the dimensionality of the table. We use experimental design techniques to characterize cells with a minimum number of simulations that cover the whole multi-dimensional variation space, and we take advantage of the analysis of variance to make our variation-aware models compact.

Historically, the accuracy required of CAD tools for digital systems has driven the evolution of waveform shape models from a vertical transition to a slope. The continuous trend of downscaling transistors has required downscaling voltages to keep the power tolerable for transistors; therefore, the transition distance of a waveform from one logic level to another has become smaller. This shorter voltage swing, along with higher transistor switching frequency, has made the shape of waveforms more important and, consequently, a transition can no longer be accurately modeled by a slope.

Chapter II presents a literature review on waveform modeling and its connection to statistical static timing analysis, static timing analysis, and variation-aware cell modeling. Moreover, a statistical waveform model based on principal component analysis has been proposed in the literature to model a variational waveform. In Chapter III, we propose a waveform model based on such a model; we formalize and generalize the methodology to make the models practical and able to capture a realistic waveform shape accurately. We also propose a methodology to construct compact variation-aware standard cell models for timing analysis using our variational waveform model. The tabular cell models that are still in use in industry are not variation-aware. A multi-dimensional tabular model that accounts for variability is not practical despite being more accurate theoretically; therefore, we allow for the trade-off between accuracy and practicality by reducing the memory requirement and characterization time of the cell.

To generalize the methodology, we propose some extensions to it in Chapter IV and present our guidelines and techniques on how to handle: (1) finding and covering the largest possible range of parameter variations, (2) adopting a resistive-capacitive load model to extend our methodology to interconnect-dominated circuits, and (3) improving the accuracy of our statistical models by classifying our possible accuracy improvement methods. We also show that our methodology gives reasonable approximations for timing analysis even if the de facto industry standard saturated ramp waveform, instead of our variational waveform model, is used in constructing our compact variation-aware timing models.

In Chapter V, we present our proposed high-level fast variation-aware statistical dynamic timing analysis engine, in contrast with the static timing analysis engines that we built to verify our methodology for our compact variation-aware models. In general, dynamic timing analysis is more time intensive than static timing analysis; however, we improve the simulation times by using compiled-code techniques, which makes it possible to build our fast variation-aware dynamic statistical timing engine. Our dynamic simulation engine suffers from neither the pessimism of block-based static timing analysis nor the possibility of missing critical paths in path-oriented static timing analysis. Block-based static timing analysis engines use the maximum input-to-output gate delays without considering the logic values of the gate input and output ports that affect the delay; therefore, they can be pessimistic. Path-oriented static timing analysis engines require a preselected set of critical paths, while parameter variations can make a non-critical path critical or vice versa; consequently, they can fail to include all the necessary critical paths in the analysis. These shortcomings are not present in our dynamic statistical timing analysis engine, since the paths are selected automatically during the simulation depending on the logic values of the gate input and output ports. In our dynamic statistical timing analysis engine, the delay variation of a gate is a linear function of a few delay increase parameters sampled from a pre-defined set of distributions, whereas in our compact variation-aware timing models, the delay variation is an arbitrary function of the parameter variation distributions as well as the gate output loading and the input waveform shape parameters.

Finally, in Chapter VI, we present several paths to continue this research or start some related research in the short term as well as the long term.
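The exponential growth of tabular models mentioned earlier in this chapter can be illustrated with simple arithmetic. The sketch below is only an illustration under stated assumptions: the function name `table_entries` and the choice of 7 sample points per axis are hypothetical and do not come from this work. The variable count assumes the two conventional table axes (input slope and load) plus the six variation parameters named in the Summary (gate length and threshold voltage for NMOS and PMOS, supply voltage, and temperature).

```python
def table_entries(points_per_axis: int, num_variables: int) -> int:
    """Number of entries in a lookup table with one axis per variable.

    Each entry corresponds to one characterization simulation and one
    stored value, so this count tracks both characterization time and
    memory. The uniform points_per_axis is an illustrative assumption.
    """
    return points_per_axis ** num_variables


if __name__ == "__main__":
    POINTS = 7        # hypothetical sampling density per axis
    BASE_AXES = 2     # input slope and output load
    for extra in range(0, 7):  # 0 to 6 added variation parameters
        n = table_entries(POINTS, BASE_AXES + extra)
        print(f"{BASE_AXES + extra} axes: {n} entries")
```

Each added variation parameter multiplies the entry count by the number of points per axis, which is the growth that experimental design techniques are meant to avoid.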

CHAPTER II

BACKGROUND

Circuit timing analysis is needed to ascertain if a design meets timing requirements before manufacturing. The standard approach to estimate circuit timing is through static timing analysis (STA). STA involves converting a circuit into a timing graph, where each edge represents the delay of a gate between its inputs and outputs. STA then performs a graph traversal to find the longest path, based on a project planning technique, called the Critical Path Method [3].

The delay through gates is a function of the slope of the input signals. Hence, the traditional approach to accounting for the input slope is to characterize cell delay through tables, which pre-compute delay and output slew as a function of input slew for each gate in a standard cell library. To account for slew, STA requires an additional step, a preliminary backward traversal through the timing graph to determine the relationship between slew and delay to the output for each node in the network [4].

Circuit timing is increasingly impacted by variation due to the manufacturing process and the operating environment. The standard approach to account for variation is through worst-case analysis [5]. Worst-case analysis assumes that parameters are constant within a chip, but vary between chips. Designers ensure that their design satisfies specifications for all process corners by simulating the circuit with a small set of corner models that represent process extremes. The corner models consist of tables relating delay and output slew to input slew and loading for these process extremes.
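The longest-path traversal that STA performs on a timing graph can be sketched as follows. This is a minimal illustration, not the timing engine used in this work: the edge format `(from_node, to_node, delay)`, the function name `worst_arrival_times`, and the fixed scalar delays are assumptions made for the example, and slew dependence is ignored.

```python
from collections import defaultdict, deque


def worst_arrival_times(edges, sources):
    """Latest arrival time at every node of an acyclic timing graph.

    edges   : iterable of (from_node, to_node, delay) tuples (assumed format)
    sources : primary inputs, whose arrival time is taken as 0
    """
    successors = defaultdict(list)
    in_degree = defaultdict(int)
    nodes = set(sources)
    for u, v, delay in edges:
        successors[u].append((v, delay))
        in_degree[v] += 1
        nodes.update((u, v))

    arrival = {n: float("-inf") for n in nodes}
    for s in sources:
        arrival[s] = 0.0

    # Visit nodes in topological order so that every fan-in arrival time
    # is final before a node's own arrival time is propagated forward.
    ready = deque(n for n in nodes if in_degree[n] == 0)
    while ready:
        u = ready.popleft()
        for v, delay in successors[u]:
            arrival[v] = max(arrival[v], arrival[u] + delay)
            in_degree[v] -= 1
            if in_degree[v] == 0:
                ready.append(v)
    return arrival


if __name__ == "__main__":
    # Hypothetical two-input circuit: gate g1 drives the output, and
    # input b also reaches the output through a shorter arc.
    example_edges = [
        ("a", "g1", 1.0),
        ("b", "g1", 2.0),
        ("g1", "out", 3.0),
        ("b", "out", 1.5),
    ]
    times = worst_arrival_times(example_edges, sources=["a", "b"])
    print(times["out"])
```

The maximum taken at each node is what makes block-based analysis independent of the number of paths; a table-based flow would replace the fixed delays with lookups indexed by input slew and load.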

Circuit timing, however, has become increasingly susceptible to within-die variation due to both the manufacturing process and the operating environment. Hence, it has become imperative to take into account these variations in device and interconnect characteristics during design. Worst-case design does not take into account within-die variation. To account for within-die variation, we need to perform statistical static timing analysis (SSTA) at corners that define die-to-die variation [1],[6]-[13]. SSTA can determine the variation in critical path delays as a function of random and systematic variation within paths.

SSTA resembles STA, except that gates are characterized by delay distributions. The gate delay and arrival time distributions result in distributions of output delays and correlations among these delays. Graph traversal involves applying the statistical sum to arrival time distributions and the delay distribution for each gate, and the statistical maximum operation to the resulting gate delay distributions.

Clearly, for SSTA we need compact models of standard cells that are accurate over parameter and environmental variations, not just at process extremes, as in worst-case design. Our proposed models can be used to generate the delay distribution functions, which can account for spatial correlations, as needed, using methods as in [1],[8]-[13]. Our models can also be used directly in Monte-Carlo-based SSTA, which involves path enumeration, Monte Carlo analysis of critical paths, and the statistical maximum operation on the resulting path delays, as described in [1],[14]-[18].

We must take into account two delay components for timing analysis: gate delay and interconnect delay [19]-[21]; however, we narrow the scope of our research just to deal with the complexity of modeling variation-aware standard cells, while modeling

variation-aware interconnects has its own significance as discussed in [22]-[24]. Hence, the goal of this work is to develop a methodology to construct compact variation-aware timing models for standard cells in a cell library that are accurate over process and environmental variations.

The cell models utilize our compact waveform models. We show that these compact waveform and cell models, when used for static timing analysis, are almost as accurate as the models based on the well-known tabular method [25] and comparable in terms of computational cost. The tabular cell models, still in use in industry, are neither variation-aware nor compact, and a multi-dimensional tabular model that accounts for variability is not practical despite being more accurate theoretically; therefore, we allow for the trade-off between accuracy and practicality by reducing the memory requirement and characterization time of a cell, which are of exponential order for the tabular method.

The compact waveform models are constructed via principal component analysis (PCA) [26] of waveforms, where the waveforms are described by principal component scores (PCSs), which can reconstruct the waveforms. Moreover, since the principal component basis functions are shared among all waveforms, cell library characterization requires that we only store the equations that describe the transformations of the principal component scores as the waveform passes through the cell. The equations also describe changes in cell performance as a function of variations in the process and operating environment.

This method differs from traditional static timing analysis (a) by working with waveforms with realistic shapes,

(b) by storing the waveform transformation through a cell as an equation rather than a table, and (c) by including equations that describe any changes in cell performance as a function of variations in the process and operating environment.

This is not the first attempt to accurately model waveforms for timing analysis. Recent work has considered accurate modeling of waveform propagation through standard cells. In [27],[28], it is shown that realistic waveforms do not resemble the idealized ramp, and in [29] it is shown that realistic waveform modeling results in more accurate timing analysis. Examples of waveform modeling include [30], where a Weibull shape parameter is added to waveform characterization to account for the differences between real waveforms and their approximation by a ramp. Other work has aimed to model realistic waveforms with a set of basis functions [31]-[34]. The basis functions have been selected in a variety of ways, including an error minimization heuristic involving shifting and scaling of waveforms [31],[32], PCA [33], and singular value decomposition (SVD) [34]. All prior work has shown that a few basis functions can be used to approximate realistic waveforms.

Like [32],[35], the proposed work considers the impact of process and environmental variations on waveforms. In the proposed work, the basis functions are derived by PCA. Hence, the proposed approach extends prior work in [33],[34] by including in PCA waveform model construction for large variations in parameters related to the process and environment. This work formalizes, generalizes, and specifies restrictions for the approach and proposes methods to make the waveform models practical.

The cell models differ from prior work on modeling cells as equations [11],[12],[36]-[39], since the cell models operate on parameters that describe waveforms, not just process parameters, waveform slew, and environmental parameters. The parameters are not required to be independent, and the compact model consists of multivariate polynomials with a minimum number of terms, which are selected based on analysis of variance and accuracy.

Since cells operate on waveforms in the PCA domain, several new problems arise. First, we need to determine the set of PCSs that correspond to realistic waveforms, i.e., PCSs that can be transformed back to the time domain. Second, we need common principal component basis functions for both the inputs and outputs of cells. This is because PCA is a data-driven methodology. Hence, each set of input waveforms and each standard cell can generate a unique set of principal component basis functions describing the output waveforms. Therefore, some additional steps are needed to generate a common set of basis functions for all inputs and cells.

Additionally, for our model involving PCA waveform modeling and cell characterization with equations, we show that, unlike the tabular static timing analysis method, where memory usage increases exponentially as a function of accuracy in the discretization of parameters that characterize the input and output waveforms (slope and fanout), our proposed method is typically quadratic in memory usage as a function of the parameters describing the waveforms, process, and environmental variations. Finally, we apply the PCA model to static timing analysis and examine the accuracy of delay calculations for long chains of gates.

Chapter III explains our proposed waveform model and cell model, and it explores several other options to construct both types of models; the chapter ends with the complexity analysis of the cell model considering characterization time, simulation time, and memory requirement. Chapter IV focuses on enhancing our methodology by constructing a cell model for very large variations, extending the methodology to a cell with a resistive-capacitive load, and investigating accuracy improvement methods; we conclude this chapter with possible paths for continuing this research. Chapter V elaborates on our high-level variation-aware statistical dynamic timing engine, which uses compiled-code techniques to reduce the overhead of dynamic timing analysis. Chapter VI describes our future research directions. Although we have included the most important PCA equations that we used in this chapter, Appendix A can be referenced for more information.

CHAPTER III
MODELING AND ANALYSIS OF COMPACT VARIATION-AWARE STANDARD CELLS

This chapter is organized as follows. Section 3.1 describes the experimental platform and the parameters modeling variability for cells and waveforms. Sections 3.2 and 3.3 discuss waveform model construction and accuracy analysis, respectively. Section 3.4 describes cell model construction and evaluates its accuracy for a path delay in comparison with Hspice [40] and tabular static timing analysis. Section 3.5 explores several alternative experimental designs to construct a cell model to study the trade-off between accuracy and experimental design sampling strategies because each strategy affects the memory requirement, characterization time, and simulation time of the cell. Section 3.6 elaborates on memory usage and the computational complexity of all the presented cell models and compares their order of complexity. Section 3.7 concludes with a summary of the research results.

3.1. Experimental Platform and Model of Variation

Traditionally, input waveforms have been represented by delay-slope pairs. In this work, the slope is replaced by a set of PCSs. The number of PCSs determines the accuracy of the model. In one extreme, if all the scores are used, the model can reconstruct the exact waveform.

An inverter, designed and laid out with TSMC 180 nm technology, was used to develop the methodology. This technology was the most advanced one available for our

CAD tools. After design rule check (DRC) and layout versus schematic (LVS), parasitics were included in the model through parasitic extraction [41]. Advanced features of Hspice automated the large number of simulation runs, which included generating input waveforms based on a model and capturing the data points of the output waveforms at predetermined relative voltage intervals. The dataset was imported and manipulated using Matlab [42] to construct the 2-level full-factorial model [43] for each output parameter. The significant effects were determined to form the compact models.

Timing characteristics of standard cells are primarily a function of loading capacitance (fanout) [44], the input waveform [44], variations of device parameters, i.e., the channel lengths and the threshold voltages of transistors [45]-[47], and the environment, i.e., the power supply voltage and temperature [47],[48]. The ranges of parameters in the model are listed in Table 3.1. These parameters include the fanout, parameters that describe the input waveform (either slope or principal components, [PC1,PC2] or [L,Θ], described in Section 3.2), the gate length and threshold voltage of the NMOS and PMOS transistors, temperature, and supply voltage.

The ranges for process parameters were chosen to be small relative to realistic die-to-die process parameter variations, which are on the order of ±30%. This is because die-to-die variation is effectively handled with corner models, and the focus of this work is to supplement these models with variation-aware compact models at each corner that can account for within-die variation, whose range is smaller than die-to-die variation.

A set of models describes the stage delay and output waveform shape, characterized by its principal components ([PC1,PC2] or [L,Θ]), as a function of all parameters in Table 3.1. The models were designed to be valid over a wide range of variations by using

a full factorial experimental design covering all extreme corners of the experimental space.

Table 3.1. Variation model parameters.

Variable | Variation
Lp | 0% to 5%
Ln | 0% to 5%
Vtp | -5% to 5%
Vtn | -5% to 5%
T | 0 °C to 70 °C
Fanout | 1 to 64
Vdd | -10% to 10%
Slope, [PC1, PC2], or [L, Θ] | 0.4 to 8.0 ns (for slope); dataset range (otherwise)

3.2. Construction of the Waveform Model

To develop the waveform models, a dataset of 256 falling and 256 rising waveforms was generated by running a 2-level full factorial experiment varying the parameters in Table 3.1, i.e., fanout, parameters that describe the input waveform (either slope during the first iteration or principal components for other iterations, [PC1,PC2] or [L,Θ]), the gate length and threshold voltage of the NMOS and PMOS transistors, temperature, and supply voltage. The datasets for rising and falling waveforms were merged by converting the falling waveforms to rising waveforms, i.e., by subtracting the voltages of each falling waveform from the maximum voltage. A set of waveforms is shown in Figure 3.1.

Figure 3.1. The dataset of time domain rising and falling waveforms generated using a full factorial experimental design.

The resulting 512 timing waveforms were discretized by partitioning the voltage scale into equal intervals to form 19 voltage and time point pairs. This discretization was chosen to have enough points to accurately capture the waveform shapes and match the minor scale of our supply voltage. A comprehensive analysis of the impact of discretization on accuracy is presented in Chapter IV.

Analysis of the inverter waveforms revealed that two PCSs cover 99.8% of variation for both rising and falling transitions. Hence, only two PCSs (PC1 and PC2) serve as weights for the two waveforms, whose linear combination is used to reconstruct the time domain transition waveform. Moreover, each transition maps to a single point in the two-dimensional PCA domain. The points in the PCA domain that correspond to the waveforms in Figure 3.1 are shown in Figure 3.2. The figures indicate how the use of a full-factorial experimental design to explore a wide range of parametric variations can create clusters of waveforms in the time domain in Figure 3.1 that are mapped into clusters of points in the PCA domain in Figure 3.2.

Figure 3.2. The waveforms corresponding to rising and falling transitions transformed to the PCA domain.
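The discretization step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the 19 levels are taken to be 5% to 95% of the supply voltage in 5% steps, crossing times are found by linear interpolation between simulator samples, and the function names and the 1.8 V ramp are hypothetical.

```python
def falling_to_rising(volts, vdd):
    """Mirror a falling waveform into a rising one: v' = vdd - v."""
    return [vdd - v for v in volts]


def crossing_times(times, volts, vdd, n_points=19):
    """Times at which a rising waveform crosses n_points equally spaced levels.

    Assumption: the levels are k/(n_points + 1) of vdd, k = 1..n_points,
    i.e. 5%, 10%, ..., 95% for the default of 19 points.
    """
    result = []
    for k in range(1, n_points + 1):
        level = vdd * k / (n_points + 1)
        for i in range(1, len(volts)):
            v0, v1 = volts[i - 1], volts[i]
            if v0 <= level <= v1 and v1 > v0:
                # Linear interpolation between the two bracketing samples.
                frac = (level - v0) / (v1 - v0)
                result.append(times[i - 1] + frac * (times[i] - times[i - 1]))
                break
        else:
            raise ValueError("waveform never crosses level %g" % level)
    return result


# Hypothetical ideal ramp from 0 V to 1.8 V over 2 ns.
ramp_points = crossing_times([0.0, 2.0], [0.0, 1.8], vdd=1.8)
mirrored = falling_to_rising([1.8, 0.0], vdd=1.8)
```

Each waveform then becomes a 19-element vector of time points, which is the representation the PCA operates on.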

Mapping from the time domain to the PCA domain and vice versa can be represented by a pair of transformations. If the data are not standardized, the transformation equations are as follows:

\[ PCS = PCM \cdot (T - U) \tag{3.1} \]
\[ T = U + PCMI \cdot PCS \tag{3.2} \]

where PCS is a 19-element vector of scores, T is a 19-element vector of time points describing the waveform, U is a 19-element vector that is the average of all T vectors in the dataset, PCM is the PCA model transformation matrix from the time domain to the PCA domain, and PCMI is the inverse of PCM. For a 19-element vector, PCM and PCMI are 19x19-dimensional matrices. PCM is found by computing the eigenvectors of the 19x19 covariance matrix from the dataset. The rows of PCM are the normalized eigenvectors of this covariance matrix.

Based on (3.1), for a 19-element vector, there are 19 mapping functions (3.3); each maps the 19 time points describing a waveform to a point in the 19-dimensional PCA space.

\[
\begin{aligned}
pc_1 &= pcm(1,1)\,(t_1 - u_1) + pcm(1,2)\,(t_2 - u_2) + \cdots \\
pc_2 &= pcm(2,1)\,(t_1 - u_1) + pcm(2,2)\,(t_2 - u_2) + \cdots \\
&\;\;\vdots \\
pc_{19} &= pcm(19,1)\,(t_1 - u_1) + pcm(19,2)\,(t_2 - u_2) + \cdots
\end{aligned}
\tag{3.3}
\]

The elements of the PCM matrix are coefficients of the linear equations for the transformation.
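The forward and inverse mappings of equations (3.1) and (3.2) can be sketched in a few lines of Python. The sketch assumes that the rows of PCM are orthonormal, so that the inverse is simply the transpose, and it uses a hypothetical two-point waveform instead of the 19-point vectors of this work; the function names and numbers are illustrative only.

```python
def to_pca(pcm, t, u):
    """Equation (3.1): PCS = PCM * (T - U)."""
    centered = [ti - ui for ti, ui in zip(t, u)]
    return [sum(p * c for p, c in zip(row, centered)) for row in pcm]


def from_pca(pcm, pcs, u):
    """Equation (3.2): T = U + PCMI * PCS, with PCMI taken as transpose(PCM).

    Passing only the first k scores reconstructs the waveform from the
    k leading principal components (dimensional reduction).
    """
    return [
        u[j] + sum(pcm[i][j] * pcs[i] for i in range(len(pcs)))
        for j in range(len(u))
    ]


# Hypothetical orthonormal 2x2 model (rows are unit-length and orthogonal).
pcm_example = [[0.6, 0.8], [-0.8, 0.6]]
u_example = [1.0, 2.0]
t_example = [1.5, 2.5]

scores = to_pca(pcm_example, t_example, u_example)       # about [0.7, -0.1]
t_exact = from_pca(pcm_example, scores, u_example)       # round trip
t_reduced = from_pca(pcm_example, scores[:1], u_example) # PC1 only
```

Reconstructing from all scores returns the original time points, while dropping the second score leaves a small residual, which is the trade-off the compact waveform model accepts.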

If the data are standardized, equations (3.1) and (3.2) are replaced by equations (3.4) and (3.5):

\[ PCS = PCM \cdot (T - U) \cdot D^{-1} \tag{3.4} \]
\[ T = U + PCMI \cdot PCS \cdot D \tag{3.5} \]

where D is a diagonal matrix of standard deviations associated with each of the 19 elements of the dataset.

The significant PCSs are found through determining the eigenvalues of the covariance matrix. Small eigenvalues correspond to insignificant PCSs. Dimensional reduction is achieved by setting the coefficients of PCM that correspond to the eigenvectors associated with insignificant PCSs to zero. It is worth mentioning that the sum of the eigenvalues corresponding to the eigenvectors selected for the model determines the variance coverage.

The inverse of the PCM matrix, PCMI, is used to reconstruct waveforms. PCMI is the transpose of PCM. For non-standardized data, the significant PCSs weight the waveforms stored in PCMI to generate time domain transition waveforms as follows.

\[
\begin{aligned}
t_1 &= u_1 + pcmi(1,1)\,pc_1 + pcmi(1,2)\,pc_2 + \cdots \\
t_2 &= u_2 + pcmi(2,1)\,pc_1 + pcmi(2,2)\,pc_2 + \cdots \\
&\;\;\vdots \\
t_{19} &= u_{19} + pcmi(19,1)\,pc_1 + pcmi(19,2)\,pc_2 + \cdots
\end{aligned}
\tag{3.6}
\]

Not all of the points in the PCA domain necessarily map to valid transition waveforms. Valid transitions require that the waveform does not move backwards in time. Accordingly, it is required that

\[ t_{19} > t_{18} > \cdots > t_1. \tag{3.7} \]

This creates an acceptability region restriction on the PCA space, which is obtained by substituting equation (3.2) or (3.5) into (3.7) to create 18 linear relationships, as follows for the case with non-standardized data.

\[
\begin{aligned}
u_1 + pcmi(1,1)\,pc_1 + pcmi(1,2)\,pc_2 &< u_2 + pcmi(2,1)\,pc_1 + pcmi(2,2)\,pc_2 \\
u_2 + pcmi(2,1)\,pc_1 + pcmi(2,2)\,pc_2 &< u_3 + pcmi(3,1)\,pc_1 + pcmi(3,2)\,pc_2 \\
&\;\;\vdots
\end{aligned}
\tag{3.8}
\]

The acceptability region is also restricted by the maximum and minimum of the PCSs from the dataset. Linear programming is used to find the acceptability region. The resulting acceptability region is shown in Figure 3.3(a).

Figure 3.3. (a) The acceptability region in the PCA domain, together with some points, labeled as A, B, C, and D, corresponding to corners of the PCA domain. (b) Time domain waveforms corresponding to the corner points A, B, C, and D in (a).
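The monotonicity requirement of equations (3.7) and (3.8) can be checked for any single point in the PCA domain with a short Python sketch. This is a point test only, not the linear-programming construction of the region used in this work; the three-point mean waveform and the two basis waveforms below are hypothetical and are not normalized.

```python
def reconstruct(u, basis, pcs):
    """t_j = u_j + sum_i basis[i][j] * pcs[i], as in equation (3.6).

    basis[i] holds the time-domain waveform weighted by the i-th score
    (the i-th column of PCMI).
    """
    return [
        u[j] + sum(b[j] * s for b, s in zip(basis, pcs))
        for j in range(len(u))
    ]


def is_valid_transition(u, basis, pcs):
    """True if the time points strictly increase with voltage level (3.7)."""
    t = reconstruct(u, basis, pcs)
    return all(later > earlier for earlier, later in zip(t, t[1:]))


# Hypothetical 3-point model: a mean waveform and two basis waveforms.
u_demo = [1.0, 2.0, 3.0]
basis_demo = [[1.0, 1.0, 1.0], [-1.0, 0.0, 1.0]]

inside = is_valid_transition(u_demo, basis_demo, (0.0, 0.5))     # stretched
on_edge = is_valid_transition(u_demo, basis_demo, (0.0, -1.0))   # flat
outside = is_valid_transition(u_demo, basis_demo, (0.0, -2.0))   # reversed
```

Because each inequality in (3.8) is linear in the scores, the set of points that pass this test is an intersection of half-planes, which is why the region can be found by linear programming.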

Figure 3.3(a) also contains some points, marked by A, B, C, and D in the PCA domain. They correspond to the waveforms in the time domain in Figure 3.3(b). Waveforms A and B in Figure 3.3(b) are not valid waveforms because they contain segments where time moves backward. They correspond to points A and B in Figure 3.3(a), which are outside the acceptability region. Waveforms C and D in Figure 3.3(b) are monotonic and valid. They are inside the acceptability region illustrated in Figure 3.3(a).

Some of the original data points in Figure 3.2 lie outside the acceptability region, as can be seen in Figure 3.4. This is because of dimensional reduction. These waveforms can be reconstructed by augmenting the original dataset with waveforms containing negative time points by reflecting the transitions across the voltage axis. This process is similar to what is done in Fourier analysis, where negative frequencies are used to help construct a model. The addition of these waveforms with negative time points for model construction widens the acceptability region. It does not invalidate the model because the PCA model uses only the positive time points. Additionally, the PCA model generated from the resulting dataset has the property that U=0. As a result, the line segments bounding the acceptability region determined by equation (3.7) always pass through the origin.

Initial analysis modeled the input waveforms with a slope. However, it is desirable to determine a set of universal PCA basis functions for both input and output transitions to avoid extra mapping steps. This is because, if we do not have a common set of basis functions, we would need to store a separate set of PC basis functions for all outputs of all cells in the cell library, and consequently we would increase the memory requirement

of the cell models. Also, timing simulation would require numerous transformations between basis functions, as illustrated in Figure 3.5, and hence we would increase the simulation time of the cell models during timing analysis.

Figure 3.4. Data points corresponding to the waveforms in Figure 3.1 and the acceptability region.

To find a common set of principal component basis functions for all waveforms associated with all cells, the corners of the PCA space that define the extreme waveforms must be determined for 2-level factorial analysis. However, as can be seen from Figure 3.3(b), two of the PCSs that correspond to corners of the PCA space lie outside the acceptability region and correspond to invalid waveforms.

[Figure 3.5 diagram: input waveform parameters, cell, and output waveform parameters in the PCA domain, with and without a conversion through the time domain.]

Figure 3.5. A common set of PC basis functions for all waveforms associated with all cells simplifies timing analysis by avoiding (a) conversions to and from the time domain and (b) storage of a unique set of basis functions for each cell in the library.

This problem was tackled by using a polar coordinate system, instead of a Cartesian coordinate system, for defining the corners of the PCA space for full factorial experimental design. Note that the shape of the acceptability region in Figure 3.4 is triangular rather than square. To map PC1 and PC2 to polar coordinates, one finds the magnitude (L) and angle (Θ) of a vector from the origin, as follows:

\[
L = \sqrt{PC1^2 + PC2^2}, \qquad
\Theta = \arctan\!\left(\frac{PC2}{PC1}\right)
\tag{3.9}
\]

This coordinate conversion assumes only two significant PCSs. If more than two PCSs are significant, pairs of PCSs can be converted to polar coordinates with the same transformation. The acceptability region in the polar domain is then determined to guarantee valid waveforms, and a rectangle of maximum size is fit into the acceptability region to define the corners for full factorial experimental design, denoted as L(min), L(max), Θ(min), and Θ(max).

A common set of principal components for the input and output waveforms of a cell is generated by running the following iterations:
(a) find the principal components of the output waveforms;
(b) determine the acceptability region in the PCA space in terms of polar coordinates;
(c) fit a rectangle into the acceptability region to find the corners for full factorial experimental design;
(d) generate the waveforms corresponding to these corners and apply these waveforms as inputs to the cell;
(e) simulate the cell to determine the corresponding output waveforms;

(f) find the principal components of the output waveforms; and
(g) go to (a) if the coefficients of the significant principal component basis functions in equation (3.3), e.g., pcm(1,*) and pcm(2,*), have not converged yet.

If the coefficients of the input waveform principal component basis functions match the coefficients of the output waveform principal component basis functions, then the principal components have converged to a set of waveforms appropriate for both the input and output of the cell.

Convergence is only possible by restricting the time window for valid PCs because a slow-rising input for the cell will create an output with a slower transition. In our example, we have restricted the time window to be from 0.4 to 8 ns. This window size impacts the acceptability region. A larger window size creates a larger acceptability region but reduces model accuracy and reduces the speed of convergence. This time restriction imposes an additional limit on the acceptability region, illustrated by the diagonal line in Figure 3.6. With this limit, convergence was achieved in two iterations.

Figure 3.6. The final acceptability region, including the limit imposed by the convergence requirement.
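The polar mapping of equation (3.9) and the generation of the four experimental-design corners used in steps (c) and (d) can be sketched as follows. The sketch uses `math.atan2` rather than a plain arctangent of PC2/PC1, which is a substitution made here so that the angle stays defined when PC1 is zero; the two agree whenever PC1 is positive. The function names and the rectangle limits are hypothetical.

```python
import math


def to_polar(pc1, pc2):
    """Magnitude L and angle Theta of the vector (PC1, PC2), as in (3.9)."""
    return math.hypot(pc1, pc2), math.atan2(pc2, pc1)


def to_cartesian(length, theta):
    """Inverse mapping back to the (PC1, PC2) plane."""
    return length * math.cos(theta), length * math.sin(theta)


def polar_corners(l_min, l_max, theta_min, theta_max):
    """The four (PC1, PC2) corners of a rectangle in the (L, Theta) plane."""
    return [
        to_cartesian(length, theta)
        for length in (l_min, l_max)
        for theta in (theta_min, theta_max)
    ]


length_demo, theta_demo = to_polar(3.0, 4.0)
back_demo = to_cartesian(length_demo, theta_demo)
corners_demo = polar_corners(1.0, 2.0, 0.0, math.pi / 2)  # illustrative limits
```

Each corner can then be turned into a time-domain input waveform with equation (3.6) and applied to the cell, as in step (d).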

The resulting principal components are shown in Figure 3.7. Principal component basis functions related to the original 512 waveforms are labeled as GN (0th iteration), where the input was a ramp. The following iterations are designated as IT(i), where i is the iteration number. For these iterations, the input had a realistic shape and was defined by the extremes of the PCA space in Figure 3.6. Principal component basis functions from the two iterations using realistic input waveform shapes were almost indistinguishable, and hence the model has converged.

3.3. Comparison of PCA Methods for Waveform Modeling

PCA waveform models can be constructed in a variety of ways, including (a) the symmetric non-standardized model (SNM), obtained from a dataset formed by augmenting the original dataset with waveforms with negative time points, (b) the symmetric standardized model (SSM), obtained like the SNM method, but with a standardized dataset (equations (3.4) and (3.5)), and (c) the asymmetric standardized model (ASM), obtained with the standardized dataset, but without augmenting the dataset with waveforms containing negative time points. Note that the asymmetric non-standardized model was not considered because a large number of the original data points are outside the acceptability region.

[Figure 3.7 plots: (a) "Comparing PC1s (Trise)", coefficients of PC1s versus data points; (b) "Comparing PC2s (Trise)", coefficients of PC2s versus data points; curves GN, IT(1), and IT(2).]

Figure 3.7. The coefficients of the principal component basis functions, computed after each of the iterations: (a) PC1 and (b) PC2.

Several criteria have been suggested to select the appropriate number of PCs for a model [26]. They include the following methods: the Broken Stick, the Average Root, Variability Explained by PCs, the Scree Plot, the Residual Trace, the Velicer Partial Correlation Function, the Index of Correlation Matrix, Imbedded Error, and the Indicator Function. These criteria recommend very different numbers of principal component basis functions, ranging from one to 17. To keep our models compact, we have selected two principal component basis functions. Two principal component basis functions cover 99.8% of variation for both rising and falling transitions for all models.

The accuracy of the standard cell model is dependent on the accuracy of (a) the mapping of a waveform from the time domain to the PCA domain, (b) the mapping of input PCSs to output PCSs through a cell, and (c) the mapping of output PCSs back to the time domain. We analyzed the PCA modeling accuracy by determining the residuals at each voltage level for all 512 transition waveforms used to construct the model. Residuals are expressed as time domain errors for a fixed voltage level. Table 3.2 summarizes the results for all of the models. The table shows the maximum, average, and maximum of the averages of the residuals for each voltage point. It also includes the maximum of the average of the residuals for the 15 middle voltage points, which correspond to the 10% to 90% range of the transition, which is more critical for accurate timing analysis [49]. It was found that larger errors are associated with longer transitions and the tails of the waveforms. Specifically, it can be seen that residuals associated with the center of the waveform are close to zero. The symmetric models appear to be more accurate.
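The "variability explained" criterion, which underlies the 99.8% coverage figure quoted above, can be illustrated with a small Python sketch. The eigenvalues below are invented purely for illustration and are not the eigenvalues of the dataset in this work; the function name is likewise hypothetical.

```python
def components_for_coverage(eigenvalues, target):
    """Smallest number of leading components whose eigenvalues sum to at
    least `target` (a fraction between 0 and 1) of the total variance.

    Returns (number_of_components, achieved_coverage).
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return count, running / total
    return len(ordered), 1.0


# Hypothetical eigenvalues of a covariance matrix (illustrative numbers only).
demo_eigenvalues = [90.0, 9.8, 0.15, 0.05]
demo_count, demo_coverage = components_for_coverage(demo_eigenvalues, 0.99)
```

With these made-up values, two components are enough to exceed 99% coverage, mirroring the choice of two basis functions made here.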

Table 3.2. Residuals of PCA waveform models (columns: SNM, SSM, ASM; rows: Max. (19-pt), Average (19-pt), Max. Ave. (19-pt), Max. Ave. (15-pt)).

The Q-Statistic [26] and T²-Statistic [26] were used to analyze the adequacy of the models by determining the number of outliers in the original dataset. Outliers correspond to waveforms in the original dataset that are not accurately modeled by PCA. Table 3.3 shows the fraction of outliers considering each of the screening statistics. The table indicates that the number of outliers for all three models is very similar.

Table 3.3. Fraction of outliers in PCA waveform models (%) (columns: SNM, SSM, ASM; rows: Significant Level, T²-Statistic, Q-Statistic, Both).

3.4. Construction of the Cell Model and Timing Analysis

We have applied the SNM waveform model with two significant principal component basis functions to our dataset to find a relationship between the input parameters (listed in Table 3.1) and output parameters (L, Θ, and Stage_Delay) for the inverter cell. L and Θ characterize the shape of the output waveform. Stage_Delay is the delay from input to output measured at 50% of the supply voltage [49]. The relationship between the input parameters and output parameters is computed using Yates's algorithm [43] to determine

all 511 effects (linear coefficients and interactions) and the average. Because of the lack of experimental error, significant effects were found using normal probability plots [43].

The resulting model indicates how the shape of the output waveform and delay vary as a function of the shape of the input waveform, process parameters, and variations in the operating environment (temperature and supply voltage). The output waveform is characterized by L, Θ, and Stage_Delay. L was found to be a function of the input waveform L and Θ, fanout, supply voltage, temperature, n-channel threshold voltage, and p-channel length. Θ was found to be a function of the input waveform L and Θ and fanout. Stage_Delay was found to be a function of the input waveform L and Θ, fanout, supply voltage, temperature, n- and p-channel threshold voltages, and p-channel length.

In evaluating the accuracy of the model, we do not consider the presence of variation in process parameters, supply voltage, and temperature, which is a function of the number of terms in the model equations. Instead, we just consider accuracy in the presence of variations in the shape of the input waveform and fanout. This enables us to make a direct comparison with tabular static timing analysis.

The accuracy of the PCA model is evaluated by estimating the delay of a narrow tree of inverters, with a depth of 20 and fanout ranging from two to five, as shown in Figure 3.8. This provides a way to determine the accuracy of the model for timing analysis of paths in large circuits, with the only simplification being that the same cell is used for all stages. The number of fanouts at each stage and the slope of the input to the first gate were varied. The total delay from the input to the output of each stage was determined using the following three methods.

Figure 3.8. Narrow tree of inverters used to evaluate the accuracy of the PCA method.

Method 1: Tabular (Slope, Fanout) propagation. The inverter timing is characterized for combinations of (Slope, Fanout) in tables. Delay is estimated through linear and bilinear interpolation from the tables. This method requires the following functions, where i is the index for the stage.

\[
\begin{aligned}
Slope(i+1) &= Slope\_Function(Slope(i),\, Fanout(i)) \\
Stage\_Delay(i+1) &= Delay\_Function(Slope(i),\, Fanout(i)) \\
Total\_Delay(i+1) &= Total\_Delay(i) + Stage\_Delay(i+1)
\end{aligned}
\tag{3.10}
\]

Our implementation included 595 elements in the table: 35 slopes and 17 fanouts. We need four tables for the inverter, namely the output slope and delay tables for each of the rising and falling transitions of the output; therefore, we need storage for 2380 elements.

Method 2: Simulation using Hspice. This method numerically solves differential equations to find delay.

Method 3: PCA for delay propagation. Delay is calculated as follows, where i is the index for the stage.

\[
\begin{aligned}
L(i+1) &= Length\_Function(L(i),\, \Theta(i),\, Fanout(i)) \\
\Theta(i+1) &= Angle\_Function(L(i),\, \Theta(i),\, Fanout(i)) \\
Stage\_Delay(i+1) &= Delay\_Function(L(i),\, \Theta(i),\, Fanout(i)) \\
Total\_Delay(i+1) &= Total\_Delay(i) + Stage\_Delay(i+1)
\end{aligned}
\tag{3.11}
\]

Our implementation requires the storage of 281 coefficients for all six equations, which are two sets of three equations considering both rising and falling outputs; therefore, we need storage for 281 elements or even fewer (e.g., 102 elements when considering only the 2nd order terms of the full-factorial design) based on Table 3.5 in Section 3.5.

The input to the first stage for all methods was a ramp. Therefore, the input to the first gate must be mapped to the PCA domain for Method 3; this introduces some error at the beginning of the chain.

The delays obtained using the three methods are compared in Figure 3.9. For a fast rising input transition with low fanout, as shown in Figure 3.9(a), we see that Method 3 tracks Hspice and Method 1 at the early stages of the chain, but it diverges from Hspice at the later stages even faster than Method 1. For a slow rising input transition with high fanout, as shown in Figure 3.9(b), we see that both Method 3 and Method 1 have some error at the early stages, but Method 3 makes up for the error at later stages.

Delays from Hspice are used as the basis of comparison to obtain errors for each method. The average relative errors are compared in Figure 3.10, using data from the outputs of each of the stages, i.e., from stage 2 to 21 (20 points), which shows that Method 3 (PC) is more accurate than Method 1 (tabular) in most cases.
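The stage-by-stage recurrence of equation (3.11) can be written as a short Python loop. The three model functions are passed in as callables; the toy functions used in the example are invented for illustration and are not the fitted polynomials of this work. The sketch also ignores that the real model keeps separate equations for rising and falling outputs.

```python
def propagate(l0, theta0, fanouts, length_fn, angle_fn, delay_fn):
    """Accumulate delay along a chain of stages, following equation (3.11).

    Returns the total delay observed at the output of every stage.
    """
    length, theta, total = l0, theta0, 0.0
    totals = []
    for fanout in fanouts:
        # All three models are evaluated on the shape at the stage *input*.
        stage_delay = delay_fn(length, theta, fanout)
        length, theta = (
            length_fn(length, theta, fanout),
            angle_fn(length, theta, fanout),
        )
        total += stage_delay
        totals.append(total)
    return totals


# Hypothetical stand-ins for the fitted cell-model equations.
def toy_length(length, theta, fanout):
    return 0.5 * length + 0.1 * fanout


def toy_angle(length, theta, fanout):
    return theta


def toy_delay(length, theta, fanout):
    return 0.1 * length + 0.05 * fanout


chain_totals = propagate(1.0, 0.0, [2, 2], toy_length, toy_angle, toy_delay)
```

Replacing the (L, Θ) pair by a single slope value and the callables by table lookups with interpolation gives the tabular recurrence of equation (3.10).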

The simulations of the circuit were performed on a 4-CPU UltraSPARC II 400 MHz server with a Sun Solaris operating system to compare the three methods. The simulation time for Methods 1 and 3 was 0.2 s, while the simulation time for Method 2 was 21.8 s.

We constructed our cell model to be both compact and variation-aware, although we compared the accuracy with a variation-unaware cell model. We conclude this section by listing the sources of error and explaining why an alternative variation-aware tabular cell model is not practical even if it is more accurate.

The errors could originate from (a) modeling error for the input waveform, (b) modeling error for the cell model, and (c) accumulation of errors through the stages. The waveform input to the first gate must be mapped to the PCA domain for Method 3; this introduces some error at the beginning of the chain. Moreover, we construct our compact cell models using just the corner values of the parameters and not the values at the center points, while we construct the tabular models using a range of values for parameters including the center points. Therefore, we reduce cell characterization time dramatically at the cost of some modeling error. We discuss some accuracy improvement methods in Chapter IV.

Constructing a variation-aware tabular cell model that even uses slope instead of PCs requires storage for 2380 × 10^6 elements, assuming 10 levels for the six remaining parameters in Table 3.1, and hence the resulting model needs one million times more memory than the original tabular method. Also, the cell characterization time increases at the same rate. As a result, this makes the variation-aware tabular model impractical. Section 3.7 discusses this matter further and shows that even a tabular model based on a sensitivity analysis is not as scalable as our model.
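The storage figures quoted above follow from simple counting, which the following Python lines reproduce. All numbers come from the text (35 slopes, 17 fanouts, four tables, 10 levels for each of six additional parameters, 281 polynomial coefficients); only the variable names are introduced here.

```python
slopes, fanouts = 35, 17
tables = 4  # output slope and delay, for rising and falling outputs

entries_per_table = slopes * fanouts           # 595
tabular_elements = entries_per_table * tables  # 2380

levels, extra_parameters = 10, 6  # Lp, Ln, Vtp, Vtn, T, Vdd
variation_aware_elements = tabular_elements * levels ** extra_parameters

pca_coefficients = 281
growth_factor = variation_aware_elements // tabular_elements
```

The growth factor of 10^6 is the "one million times more memory" mentioned above, whereas the equation-based model stays at a few hundred coefficients.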

[Figure 3.9 plots: (a) Comparison of Total Delay (fast rising transition, fanout = 2); (b) Comparison of Total Delay (slow rising transition, fanout = 5). Axes: Total Delay (ns) versus Stage Node. Curves: Hspice, Tabular, PC.]

Figure 3.9. Comparison of delay for the three methods for (a) a fast rising input transition and (b) a slow rising input transition.

[Figure 3.10 plot: Average Relative Error versus (Slope, fanout) for slopes 0.32, 1.76, and 3.2 and fanouts 2 to 5. Series: Tabular, PC.]

Figure 3.10. Average relative error of delay for methods 1 (tabular) and 3 (PC) in comparison with Hspice using data from the outputs of each of the 21 stages.

Comparison of Experimental Design Methods for Cell Modeling

A cell model indicates how the shape of the output waveform and delay vary as a function of the shape of the input waveform, process parameters, and variations in the operating environment (temperature and supply voltage). We use the waveform model with two significant principal component basis functions to find a relationship between the input parameters (listed in Table 3.1) and output parameters ([L(Rise),Θ(Rise)], Tplh, [L(Fall),Θ(Fall)], and Tphl). [L(Rise),Θ(Rise)] and [L(Fall),Θ(Fall)] characterize the shape of the output waveform for rising and falling output transitions, respectively. Tphl and Tplh are the high-to-low and low-to-high propagation times, respectively. The propagation times are the delays from input to output measured at 50% of the supply voltage [49].
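As an illustration of the 50% measurement convention, the following Python sketch extracts a propagation time from sampled input and output waveforms by linear interpolation between samples. The function names and the sample waveforms are assumptions made for this example; they are not taken from the characterization flow described here.

```python
def crossing_time(times, volts, level):
    """Return the first time at which the sampled waveform crosses `level`.

    Linear interpolation is used between adjacent samples.
    """
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        v0, v1 = volts[i], volts[i + 1]
        # A crossing occurs when the two samples lie on opposite sides of
        # the level (or one of them sits exactly on it).
        if v0 != v1 and (v0 - level) * (v1 - level) <= 0:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(times, v_in, v_out, vdd):
    """Delay from input to output, both measured at 50% of the supply."""
    half = 0.5 * vdd
    return crossing_time(times, v_out, half) - crossing_time(times, v_in, half)


# Illustrative data: a rising input ramp and a falling inverter output.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
v_in = [0.0, 0.9, 1.8, 1.8, 1.8]
v_out = [1.8, 1.8, 1.8, 0.0, 0.0]
tphl = propagation_delay(times, v_in, v_out, vdd=1.8)
```

With these made-up samples the input crosses 0.9 V at t = 1.0 and the output at t = 2.5, so the high-to-low propagation time is 1.5 time units.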

The data to generate the relationship between the input and output parameters was found using several experimental design techniques: a 2-level full factorial [43], a 2-level fractional factorial [43], and Latin hypercube sampling [50],[51]. The factorial and fractional-factorial sampling plans build models based on data points at the corners of the sampling domain (all combinations of the maximum and minimum values of all variables in Table 3.1). Since we have nine variables in Table 3.1, full-factorial experimental designs require $2^9$ samples. The number of samples for fractional-factorial experimental designs is a function of the desired model complexity. We have chosen $2^7$ samples in this experiment. Latin hypercube sampling generates data based on quasi-uniform sampling of the sampling domain. The number of samples used for Latin hypercube designs was set to match the number of samples used for the full-factorial and fractional-factorial experimental designs. Table 3.4 shows all the designs used for cell model construction along with their number of sampling points, term selection criteria, and abbreviations.

The relationship between the input parameters and output parameters for a cell was determined using a variety of methods. For full-factorial experimental designs, the relationship was computed using Yates's algorithm [43] to determine all 511 effects (linear coefficients and interactions) and the average. Because of the lack of experimental error, significant effects were chosen automatically using analysis of variance [52]. Models FF2 and FF3 included only significant effects, up to 2nd and 3rd order terms, respectively. Similarly, model FRF included only significant effects, up to 3rd order terms. The models for Latin hypercube experimental designs were found by polynomial regression [53], with at most quadratic terms.
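The following Python sketch shows how a 2-level full-factorial design and Yates's algorithm fit together. It is a minimal illustration and not the implementation used in this work: the function names are assumptions, factor levels are coded as -1 and +1, and the result is returned as regression coefficients for the coded factors (half of the classical "effect" for every entry except the average).

```python
from itertools import product


def full_factorial(k):
    """All 2**k corner points in standard (Yates) order.

    Levels are coded -1/+1 and the first factor varies fastest.
    """
    return [tuple(reversed(levels)) for levels in product((-1, 1), repeat=k)]


def yates(responses):
    """Yates's algorithm for a 2-level full factorial in standard order.

    Entry 0 of the result is the average. Entry m is the coefficient of the
    product of the factors whose bits are set in m (bit 0 = first factor).
    """
    n = len(responses)
    k = n.bit_length() - 1
    if n == 0 or n != 1 << k:
        raise ValueError("number of responses must be a power of two")
    column = list(responses)
    for _ in range(k):
        sums = [column[i] + column[i + 1] for i in range(0, n, 2)]
        diffs = [column[i + 1] - column[i] for i in range(0, n, 2)]
        column = sums + diffs
    return [value / n for value in column]


# Toy response with known coefficients (illustrative only).
design = full_factorial(3)
response = [10 + 2 * x1 - 3 * x2 + 0.5 * x1 * x3 for x1, x2, x3 in design]
coeffs = yates(response)
```

For nine factors, full_factorial(9) yields the 512 corner points, and yates returns the average plus 511 coefficients, matching the counts quoted above.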

Table 3.5 shows the number of terms and the number of operations for each of the output parameters ([L(Rise),Θ(Rise)], Tplh, [L(Fall),Θ(Fall)], and Tphl), where operations consist of addition and multiplication. FF2 is the most compact model, and FRF is the next, while both are almost as accurate as FF. The more compact the cell model, the lower its memory requirement and simulation time.

Table 3.4. Designs used for cell model construction.
Design | Points | Model Terms | Abbr.
Full Factorial | $2^9$ | All factors | FF
Full Factorial | $2^9$ | Significant up to 3 factors | FF3
Full Factorial | $2^9$ | Significant up to 2 factors | FF2
Fractional Factorial | $\frac{1}{4} \cdot 2^9$ | Significant up to 3 factors | FRF
Latin Hypercube Sampling | $2^9$ | All quadratic factors | LHC
Latin Hypercube Sampling | $\frac{1}{4} \cdot 2^9$ | All quadratic factors | LHCQ

Table 3.5. Number of terms (number of operations) in cell models.
Model | L(Rise) | Θ(Rise) | Tplh | L(Fall) | Θ(Fall) | Tphl
FF | 512(3328) | 512(3328) | 512(3328) | 512(3328) | 512(3328) | 512(3328)
FF3 | 42(131) | 34(112) | 42(132) | 31(95) | 24(72) | 45(142)
FF2 | 20(50) | 17(44) | 17(41) | 17(41) | 14(35) | 17(43)
FRF | 20(50) | 13(36) | 19(50) | 21(59) | 12(32) | 21(59)
LHC | 55(154) | 55(154) | 55(154) | 55(154) | 55(154) | 55(154)
LHCQ | 55(154) | 55(154) | 55(154) | 55(154) | 55(154) | 55(154)

The coefficient of multiple determination, $R^2$, provides a measure of the adequacy of models in explaining variation in a dataset. The coefficient of multiple determination approaches one when the model explains most of the variation in the dataset. To penalize overfitting a model, the adjusted coefficient of multiple determination, $\bar{R}^2$, is often used. It is defined as follows:

$$\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p} \qquad (3.12)$$

where n is the number of observations in the dataset and p is the number of coefficients in the regression equation. Table 3.6 shows $\bar{R}^2$ for each of the output parameters ([L(Rise),Θ(Rise)], Tplh, [L(Fall),Θ(Fall)], and Tphl). The results show that most of the models fit the data well. The full-factorial model for Tphl with at most 2nd order terms and the Latin hypercube models for Θ(Rise), Θ(Fall), and Tphl fit the dataset least well. Note that the values for $\bar{R}^2$ were computed based on the dataset used to develop the models. Hence, the lower values for $\bar{R}^2$ indicate that the responses display nonlinearity not captured by the models. As a result, we can see that the models with 3rd order terms achieve a higher $\bar{R}^2$ than models with just 2nd order terms for Tphl and Tplh. This is similarly the case for the model for Θ(Rise). However, the model indicates the existence of nonlinearity for the Latin hypercube dataset, but not the fractional-factorial dataset. The Latin hypercube dataset is based on a quasi-uniform sampling throughout the input domain, specified in Table 3.1, while the fractional-factorial dataset only samples the corners of the input domain.

Table 3.6. Adjusted coefficient of multiple determination for cell models (%).
Model | L(Rise) | Θ(Rise) | Tplh | L(Fall) | Θ(Fall) | Tphl
Rows: FF, FF3, FF2, FRF, LHC, LHCQ [numeric entries missing in this transcription]
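Equation (3.12) is straightforward to evaluate. The short Python sketch below computes $R^2$ from observed and predicted responses and then applies the adjustment; the function names and the sample numbers are illustrative assumptions and do not reproduce the entries of Table 3.6.

```python
def r_squared(observed, predicted):
    """Coefficient of multiple determination, 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    ss_tot = sum((y - mean) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 of Eq. (3.12).

    n is the number of observations and p the number of coefficients
    in the regression equation.
    """
    if n <= p:
        raise ValueError("need more observations than coefficients")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p)


# Illustrative numbers only: 512 observations, a 55-term quadratic model.
r2 = r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
adj = adjusted_r_squared(0.90, n=512, p=55)
```

The adjustment always lowers the score when p > 1, and the penalty grows as the number of coefficients approaches the number of observations.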

Table 3.7 provides an indication of predictive cell model accuracy by comparing the sum of squares of residuals, which incorporates the effects of both the residual errors and their standard deviation, for each experimental design methodology and each of the output parameters ([L(Rise),Θ(Rise)], Tplh, [L(Fall),Θ(Fall)], and Tphl). The residuals were not computed with the original dataset used to develop the model, but instead with a separate checking dataset containing data randomly distributed throughout the parameter space (Table 3.1). The results show that error bounds are a function of sample size and are tighter for Latin hypercube experimental designs.

Table 3.7. Sum of squares of residuals for cell models.
Model | L(Rise) | Θ(Rise) | Tplh | L(Fall) | Θ(Fall) | Tphl
Rows: FF, FF3, FF2, FRF, LHC, LHCQ [numeric entries missing in this transcription]

We compared the number of terms, adequacy, and prediction accuracy of each output parameter for all types of our cell models; we found that the accuracy of a cell model is a function of the number of terms in the model equations and of the experimental design method. However, we should take into account all sources of error during timing simulation, as mentioned in Section 3.4, when evaluating the overall accuracy of the cell models. The overall accuracy of all the models was evaluated using the narrow tree of inverters (Figure 3.8) in the presence of variations in the shape of the input waveform and fanout for two cases: (1) in the absence of variation in process parameters, supply voltage, and temperature, in comparison with tabular STA similar to Section 3.5, and (2) in the presence

of variation in process parameters, supply voltage, and temperature using random samples in the parameter space. We used equation (3.11) to perform the simulations. In the equation, Stage_Delay is equal to Tplh or Tphl for rising and falling output transitions, respectively, and [L,Θ] is equal to [L(Rise),Θ(Rise)] or [L(Fall),Θ(Fall)] accordingly. The input to the first stage for all methods was a ramp and therefore had to be mapped to the PC domain for our models. This is an additional source of error because the ramp is mapped to a non-ramp waveform. We now explain each case.

Case 1: In the absence of variations in process parameters, supply voltage, and temperature, compared with tabular STA. Delays from Hspice are used as the basis of comparison for each method. Figure 3.11(a) shows that the average relative errors for the chain are very close to each other for the factorial models and are 5% to 7% higher than for the tabular method, while relative variances are less than 5% for both factorial and Latin hypercube models. This is the price to pay for compactness of the models. Latin hypercube models can offer better or worse results because these models are dependent on the location of samples as well as the number of samples. This is in agreement with their poor prediction capability in Table 3.6. We had invalid waveforms at the early stages of the interconnected cells and stopped the simulation. Figure 3.11(b) shows that the relative error variance for all models is less than 5%. The simulation times for Method 1 and the worst case for Method 3 (FF) were 0.2 s, while the simulation time for Method 2 was 21.8 s.
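The two accuracy metrics used in this comparison can be computed as in the sketch below. It assumes signed relative errors with the Hspice delay as the reference and a population variance over all stage outputs; the text does not state these conventions explicitly, so both are assumptions of this example, as are the function name and the sample delays.

```python
from statistics import mean, pvariance


def relative_errors(model_delays, reference_delays):
    """Signed relative error of each model delay against its reference."""
    return [(m - r) / r for m, r in zip(model_delays, reference_delays)]


# Illustrative delays in ns; the reference plays the role of Hspice.
reference = [1.0, 2.0, 4.0]
model = [1.1, 1.8, 4.0]

errors = relative_errors(model, reference)
average_relative_error = mean(errors)
relative_error_variance = pvariance(errors)
```

Using signed errors lets over- and under-estimates cancel in the average, which is why the variance is reported alongside it.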

[Figure 3.11 plots: (a) Average Relative Error and (b) Relative Error Variance for models FF, FF3, FF2, FRF, LHC, LHCQ, and Tabular.]

Figure 3.11. Average relative error (a) and relative error variance (b) of the delay for models in comparison with Hspice using data from the outputs of each of the 21 stages of 12 samples at nominal values of parameters.

Case 2: In the presence of variation in process parameters, supply voltage, and temperature using random samples in the parameter space. Figure 3.12(a) presents the average relative errors for the delay of each stage of the inverter chain. It indicates errors of under 5% for the chains for the factorial models. This is comparable to prior work that does not take into account process variations. For example, in [18], it is shown that the relative errors for one cell using an input ramp are 14% for delay and 19% for transition time. The results in [18] improve errors to 1.5% for delay and 5% for transition time. Our results in Figure 3.12 are not just for a single cell, but for a chain of cells, where

errors can accumulate. Figure 3.12(b) shows that the relative error variance for all models but LHCQ is around 5%. The simulation time for FF was 0.19 s, while the simulation time for FF3, FF2, FRF, LHC, and LHCQ was 0.3 s.

[Figure 3.12 plots: (a) Average Relative Error and (b) Relative Error Variance for models FF, FF3, FF2, FRF, LHC, and LHCQ.]

Figure 3.12. Average relative error (a) and relative error variance (b) of delay for models in comparison with Hspice using data from the outputs of each of the 21 stages of 12 samples at random values of parameters.

A comparison between Figures 3.11 and 3.12 shows that, in general, factorial models offer similar accuracy relative to each other (10% average relative error, and 5% relative error variance). Latin hypercube models have similar relative error for each case while

being at a different level, but the case involving variation has a much higher variance for a smaller sample size. We want our models to have both good model adequacy and prediction accuracy. The model adequacy of the factorial models is superior to that of the Latin hypercube models (Table 3.6), but the model prediction capability of the factorial models is inferior to that of the Latin hypercube models (Table 3.7). However, our simulation results show better stability in accuracy for both cases of the factorial models because the average and variance of relative errors are at almost the same level. The overall accuracy of the Latin hypercube models was better or worse than the factorial models; this is in agreement with our findings that Latin hypercube models offer less model adequacy but better prediction capability.

Complexity Analysis

Table 3.8 compares the estimated space complexity per transition entry per input for each cell for Methods 1 and 3. Let p be the number of parameters characterizing a cell. For Method 1, p = 2, i.e., slope and fanout. For Method 3, p = 3, since this method requires a pair of PCSs to characterize the waveform shape plus a value. Let us also suppose that we take into account q sources of variation, deriving from the process and environment (temperature and supply voltage). Method 1 requires a p-dimensional table of numbers with k levels in each dimension. If we take into account q sources of variation by computing sensitivities to each of these parameters for each of the table entries (i.e., we are postulating a linear model), we then require q+1 tables with $k^p$ entries. Otherwise, a table with $k^{p+q}$ entries is needed for a model with all interactions.
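To make the two table sizes concrete, the following Python sketch counts the entries of a tabular model for both cases. The function names and the example values of k, p, and q are assumptions chosen for illustration; they are not the settings of our experiments.

```python
def tabular_entries_general(k, p, q):
    """One (p+q)-dimensional table with k levels per dimension."""
    return k ** (p + q)


def tabular_entries_linear(k, p, q):
    """A nominal p-dimensional table plus one sensitivity table per
    variation source, i.e., q + 1 tables of k**p entries each."""
    return (q + 1) * k ** p


# Illustrative values: 10 levels, 2 cell parameters, 6 variation sources.
general = tabular_entries_general(k=10, p=2, q=6)
linear = tabular_entries_linear(k=10, p=2, q=6)
ratio = general // linear
```

With these example values the fully interacting table is several orders of magnitude larger than the sensitivity-based one, which is the gap the complexity expressions capture.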

Hence the space complexity of Method 1 is $O(k^{p+q})$, which is reduced to $O(qk^{p})$ if we assume a linear model.

Table 3.8. Comparing space complexity of methods for a cell (per delay/transition entry per input).
Method | Complexity
Tabular (general case) | $O(k^{p+q})$
Tabular (linear case) | $O(qk^{p})$
FF | $O(pw + 2^{p+q}p)$
FF2, LHC, LHCQ (2nd order case) | $O(p(w + (p+q)^2))$
FF, FRF, LHC (linear case) | $O(p(w + p + q))$

Method 3 discretizes waveforms into w voltage steps. A model with p parameters has p-1 significant eigenvalues. Consequently, (p-1)w numbers must be stored. In addition, for a factorial experimental design, the model produces a maximum of $2^{p+q}$ coefficients for each of the p expressions. The resulting worst-case model space complexity for Method 3 is $O(pw + 2^{p+q}p)$. For models with at most 2nd order terms, the model will have $O((p+q)^2)$ terms, resulting in a space complexity of $O(p(w + (p+q)^2))$. On the other hand, if only linear terms are significant, each of the p expressions has $O(p+q)$ coefficients, resulting in a space complexity of $O(p(w + p + q))$.

Table 3.9 compares the estimated simulation time complexity per transition entry per input for each cell for Methods 1 and 3. The simulation time complexity of Method 1 is proportional to the table lookup time and the number of stages (s). A table lookup requires a search in each of the p dimensions among the k entries, which has complexity

$O(pk)$. Once the appropriate entry is selected, the delay is computed, which takes into account the q sensitivities and has complexity $O(q)$ if the model is linear. This process is repeated for each of the s stages, resulting in a simulation time complexity of $O(s(pk + q))$. For a nonlinear model, the time complexity is $O(sk(p+q))$. Method 3 requires the evaluation of p expressions, each with at most $2^{p+q}$ terms, for each of the s stages. This results in a computational complexity of $O(sp2^{p+q})$. However, typical expressions contain at most (p+q) linear terms, corresponding to a simulation time complexity of $O(sp(p+q))$.

Table 3.9. Comparing simulation time complexity of methods for a cell (per delay/transition entry per input).
Method | Complexity
Tabular (general case) | $O(sk(p+q))$
Tabular (linear case) | $O(s(pk + q))$
FF | $O(sp2^{p+q})$
FF2, LHC, LHCQ (2nd order case) | $O(sp(p+q)^2)$
FF, FRF, LHC (linear case) | $O(sp(p+q))$

Table 3.10 compares the estimated characterization time complexity per transition entry per input for each cell for Methods 1 and 3. The characterization time complexity of Method 1 is proportional to the number of simulations needed to obtain each number in its lookup table and hence is the same as the model's space complexity.

Table 3.10. Comparing characterization time complexity of methods for a cell (per delay/transition entry per input).
Method | Complexity
Tabular (general case) | $O(k^{p+q})$
Tabular (linear case) | $O(qk^{p})$
FF | $O(w^3 + w^2 n + pn \ln n)$
FF2, LHC, LHCQ (2nd order case) | $O(w^3 + w^2 n + n(p+q)^4 + p(p+q)^6)$
FF, FRF, LHC (linear case) | $O(w^3 + w^2 n + n(p+q)^2 + p(p+q)^3)$

Method 3 requires several steps. First, n simulations are performed. For Latin hypercube experimental designs, n is arbitrary. For full-factorial and fractional-factorial experimental designs, $n = 2^{p+q}$. The number of simulations for a fractional-factorial experimental design is a function of resolution. For a model with up to 3rd order terms, we require $n = O((p+q)^3)$ simulations. The simplest fractional-factorial design with only linear terms requires $O(p+q)$ simulations. The simulations result in wn points to be analyzed by PCA. The generation of the appropriate w-dimensional covariance matrix and its eigendecomposition using SVD have computational costs of $O(w^2 n)$ and $O(w^3)$, respectively. Iterative methods exist that avoid finding the covariance matrix. They reduce the computational cost to $O(rw)$ per iteration, where r < w and where an iteration involves sequentially inputting each of the $2^{p+q}$ w-dimensional vectors. One such method is Sanger's generalized Hebbian algorithm [54]. Next, all of the waveforms are converted to the PC domain at a cost of $O(w2^{p+q})$.
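The covariance-based PCA step and the conversion of waveforms to the PC domain can be sketched as follows. To keep the example free of external libraries, the eigendecomposition is replaced by plain power iteration, which recovers only the leading principal component; this is a substitute chosen for illustration, not the SVD or the generalized Hebbian algorithm mentioned above. All function names and the sample waveforms are assumptions.

```python
import math


def mean_center(waveforms):
    """Subtract the per-sample mean waveform from every waveform."""
    n, w = len(waveforms), len(waveforms[0])
    mean = [sum(row[j] for row in waveforms) / n for j in range(w)]
    return mean, [[row[j] - mean[j] for j in range(w)] for row in waveforms]


def covariance(centered):
    """w-by-w sample covariance matrix; costs O(w^2 n) operations."""
    n, w = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(w)]
            for i in range(w)]


def leading_component(cov, iterations=200):
    """Power iteration for the dominant eigenpair of a covariance matrix.

    The fixed start vector is arbitrary; it must not be orthogonal to the
    dominant eigenvector for the iteration to converge to it.
    """
    w = len(cov)
    vec = [1.0 + 0.01 * i for i in range(w)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(w)) for i in range(w)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            raise ValueError("covariance matrix maps the vector to zero")
        vec = [x / norm for x in nxt]
    cov_vec = [sum(cov[i][j] * vec[j] for j in range(w)) for i in range(w)]
    eigenvalue = sum(a * b for a, b in zip(vec, cov_vec))
    return eigenvalue, vec


def project(centered, component):
    """Score of each centered waveform along one principal component."""
    return [sum(a * b for a, b in zip(row, component)) for row in centered]


# Illustrative waveforms: a mean shape plus multiples of one basis shape.
basis = [1 / 3, 2 / 3, 2 / 3]
waveforms = [[0.5 + s * basis[0], 1.0 + s * basis[1], 1.5 + s * basis[2]]
             for s in (-2.0, -1.0, 0.0, 1.0, 2.0)]
mean_wave, centered = mean_center(waveforms)
eigenvalue, component = leading_component(covariance(centered))
scores = project(centered, component)
```

Further components could be obtained by deflating the covariance matrix and repeating the iteration, at which point a library eigensolver is usually the more practical choice.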

To develop p expressions, we analyze the resulting PC domain data. The factorial models require that we determine the model for each of the p expressions at a cost of $O(pn)$, sort the resulting dataset to find the significant factors at a cost of $O(pn \ln n)$, and select the significant factors at a cost of $O(pn)$. The dominant terms are shown in Table 3.10. Latin hypercube experimental designs use regression to find the models for each of the p expressions. Assembly of the regression matrices with at most 2nd order terms requires $O(n(p+q)^4)$ operations, and solving for p models involves $O(p(p+q)^6)$ operations. On the other hand, if we had limited models to include only linear terms, the characterization cost to set up and solve the regression matrices would be $O(n(p+q)^2 + p(p+q)^3)$. The resulting overall computational cost is shown in Table 3.10.

It can be seen from Table 3.10 that, as a function of q, Method 1 (in the linear case) is linear in characterization time complexity, while Method 3 is exponential. However, characterization is done only once for a cell library. Model users are only impacted by space and simulation time complexity. In addition, if a fractional experimental design [43], rather than a full-factorial experimental design, were performed to generate the dataset, characterization would be polynomial in q. Method 1 is exponential in model space and characterization time complexity as a function of p, which limits the discretization of the space that describes the input waveforms (slope and fanout), while Method 3 is not. As a result, for Method 3, memory usage does not increase rapidly with increasingly accurate waveforms.

Moreover, as we add more parameters, q, Method 1 requires $k^p$ more entries for each additional parameter, while Method 3 requires only p additional entries for the linear case. Therefore, memory usage does not increase as rapidly for Method 3 as the number of parameters increases.

Conclusions

We proposed a method to develop compact models of standard cells for static timing analysis, enabling accurate characterization over variations in input waveform characteristics, output loading, process parameters, and the environment (temperature and power supply voltage). Our approach, involving equation-based cell characterization in combination with PCA waveform modeling, offers improved handling of a high-dimensional parameter space by reducing the memory requirement. The compact models enable the performance of a variety of statistical experiments, including efficient Monte Carlo analysis of the impact of within-die variation on delay and of the impact of various temperature profiles and variations in the power supply voltage on delay. To illustrate the impact of our variational waveform modeling, in combination with equation-based cell characterization incorporating parameter variations, the accuracy and efficiency of the method have been evaluated in comparison with Hspice and the tabular method. Run-times and accuracy are comparable with the tabular method, while memory usage is improved. We explored several possible strategies to construct a variational waveform model and a cell model based on our methodology. For a variational waveform model, three approaches to PCA have been compared; the results indicate that the SSM and SNM

methods offer better accuracy in modeling a variational waveform. For a cell model, several choices of experimental designs were considered because the accuracy of a cell model depends on the sampling method employed for cell characterization, and we want to develop a cell model that is accurate over parameter variations. Simulation of the inverter tree showed that the accuracy for Latin hypercube designs depends on the sample size and the locations of samples, while factorial designs were in general more stable and performed better because such models are built to be accurate at the corners of the parameter space. In contrast, Latin hypercube designs explore just a subset of the parameter space, and it is possible that a combination of parameters falls outside the range of the original quasi-random dataset that was used to build the model. This can result in inaccurate extrapolation. Moreover, the results showed that fractional-factorial designs with up to two-factor interactions offer better memory usage, reduce the computational cost, and provide accuracy that is almost as good as that of full-factorial designs with all interactions.

CHAPTER IV

EXTENDING AND ENHANCING THE METHODOLOGY

We devised a methodology for constructing a compact variational waveform model and for using it to build a compact variation-aware timing model for a basic cell in a cell library, i.e., an inverter. We showed the feasibility of our methodology in [55]-[59]. With minor modifications, the methodology can be extended to other cells. We want to extend and enhance the methodology to push it to its limits. First, we used the most recent technology accessible to us at the beginning, but we want to show that the methodology is applicable to newer technologies as well. Second, we started with a moderate range of variation for process parameters, but we want to demonstrate that we can build the models to work within the maximum range of variation supported by their circuit-level models. Third, we assumed purely capacitive loads for the models, which makes our models applicable to resistive-capacitive loads using the effective capacitance method; however, since the effective capacitance method loses its accuracy for interconnect-dominant circuits, we want to extend the methodology to support resistive-capacitive loads. Finally, we want to look at all possible ways to increase the accuracy of our methodology by categorizing the accuracy improvement methods.

This chapter is organized as follows. Section 4.1 explains how to construct a cell model for a deep-submicron technology. Section 4.2 demonstrates how to construct a cell model for very large parameter variations. Section 4.3 describes constructing our cell model for resistive-capacitive loads. Section 4.4 lists and briefly discusses our accuracy improvement methods.

4.1. Constructing a Cell Model for Deep Submicron Technology

Our waveform and cell models are statistical models obtained by statistical analysis of the waveform data generated for standard cells. The accuracy of our models therefore depends on the accuracy of the transistor model used to generate the waveform data. We developed our methodology to construct our waveform and cell models using TSMC180RF technology, which was the submicron technology that we had access to and that was supported by our CAD tools. The minimum transistor channel length for TSMC180RF is 180 nm. We want to make sure the methodology is applicable to deep-submicron technologies, where the minimum transistor channel length is 100 nm or lower [60]. The transistor models for deep-submicron technologies need to better model and address the physical effects present in the sub-micron regime. Since our high-level statistical waveform and cell models are built based on the behavior of the circuit-level models of the transistors, our cell models are affected indirectly, as we use the samples of an output variable response surface to construct them. On the other hand, process variation is more pronounced as the technology scales down because it becomes more difficult for semiconductor process control to keep pace with the accuracy needed to keep the process variation within the tighter limits required.

In this section, we discuss the support for parameter variation of the circuit-level transistor models for our high-level statistical cell models and present our parameter selection criteria for our cell models. We explain how to determine the largest possible range of variations for our cell models to maintain their functionality as a logic gate in

Section 4.2. Then, we construct our cell models for the largest possible range of variations of a deep-submicron technology supporting resistive-capacitive loads in Section 4.3.

More Accurate Transistor Models with More Parameters

We use FreePDK45 (NCSU 45 nm) technology [61], with 45 nm minimum transistor channel length, as our deep-submicron technology to assess our methodology because we did not have access to any commercial 45 nm technology files. FreePDK45 is available for academic purposes from NanGate Inc. [62]. In the technology files that we use, transistors are modeled by BSIM (Berkeley Short-channel IGFET Model) [63]. The values of the transistor parameters in FreePDK45 technology files are based on the Predictive Technology Model (PTM) [64]. BSIM contains more than 200 parameters; however, most of them are related to secondary effects. TSMC180RF uses BSIM3v3 (Hspice Level 49) while FreePDK45 [61] uses BSIM4 (Hspice Level 54). BSIM3v3 is an industry-wide standard for the modeling of deep-submicron Metal Oxide Semiconductor Field Effect Transistors (MOSFETs) [63]. BSIM4 is an extension of the BSIM3 model and addresses MOSFET physical effects into the sub-100 nm regime [65]. We list just the enhancements from the reference that can affect timing analysis:
(a) An accurate new model of the intrinsic input resistance (bias-dependent gate resistance model), for RF, high-frequency analog, and high-speed digital applications;
(b) A comprehensive and versatile geometry-dependent parasitics model for various source/drain connections and multi-finger devices;

(c) Asymmetrical and bias-dependent source/drain resistance, either internal or external to the intrinsic MOSFET at the user's discretion;
(d) Acceptance of either the electrical or physical gate oxide thickness as the model input at the user's choice in a physically accurate manner;
(e) The quantum mechanical charge-layer thickness model for both IV and CV;
(f) A more accurate mobility model for predictive modeling;
(g) Different diode IV and CV characteristics for source and drain junctions;
(h) Dielectric constant of the gate dielectric as a model parameter;
(i) A new scalable stress effect model for the process-induced stress effect; device performance thus becomes a function of the active area geometry and the location of the device in the active area;
(j) A unified current-saturation model that includes all mechanisms of current saturation: velocity saturation, velocity overshoot, and source end velocity limit;
(k) A new temperature model format that allows convenient prediction of temperature effects on saturation velocity, mobility, and S/D resistances.

We investigated which of the above enhancements have been used in FreePDK45 for NMOS_VTL and PMOS_VTL, which are the NMOS and PMOS transistor models, respectively. Enhancements (b), (d), (e), (h), and (k) are used; the rest are not employed. Consequently, the model should behave like an older model such as BSIM3v3.2 with respect to the absent parameters [65]. We checked for the presence of related parameters in the technology files to determine which enhancements are used. Table 4.1 shows the new or enhanced BSIM4.5.0 parameters used in FreePDK45. The value used for the parameter inside parentheses for the 2nd

column shows its similarity to older models. The enhancement column has three subcolumns showing whether the enhancement is used in the FreePDK45 models, the letter of the enhancement as listed before, and how it affects the accuracy of timing analysis (the category column). We know that parasitics, transistor parameters, and temperature influence the accuracy of timing analysis.

Table 4.1. New or enhanced BSIM4.5.0 model parameters used in FreePDK45.
Parameter Name | Description (Value used [Similar to]) | Enhancement: Used | Enhancement: Letter | Enhancement: Category
GEOMOD | Geometry-dependent parasitics model | Yes | b | Parasitics
TOXE | Electrical gate equivalent oxide thickness | Yes | d, e | Parasitics
TOXP | Physical gate equivalent oxide thickness | Yes | d, e | Parasitics
EPSROX | Gate dielectric constant relative to vacuum | Yes | h | Parasitics
UA | Coefficient of first-order mobility degradation due to vertical field | Yes | k | Mobility
UB | Coefficient of second-order mobility degradation due to vertical field | Yes | k | Mobility
UC | Coefficient of mobility degradation due to body-bias effect | Yes | k | Mobility
UA1 | Temperature coefficient for UA | Yes | k | Mobility
UB1 | Temperature coefficient for UB | Yes | k | Mobility
UC1 | Temperature coefficient for UC | Yes | k | Mobility
RGATEMOD | Gate resistance model selector (1 [BSIM3; Constant resistance]) | No | a | Parasitics
RDSMOD | Bias-dependent source/drain resistance model selector (0 [BSIM3; Rds modeled internally through IV equations]) | No | c, k | Parasitics
KU0 | Process induced stress effect | No | i | Mobility
MOBMOD | Mobility model selector (0 [BSIM3v3.2]) | No | f | Mobility
DIOMOD | Source/drain junction diode selector (1 [BSIM3v3.2]) | No | g | I-V curve
LAMBDA | Velocity overshoot coefficient | No | j | Velocity
VTL | Thermal velocity | No | l | Velocity
TEMPMOD | Temperature mode (0 [BSIM3v3.2]) | No | k | Temperature

We did not use oxide thickness as one of the parameters in our models, and we present our reason later, but we note that its changes can affect the process transconductance ($k_n$), while mobility has a weaker effect [63]. BSIM4 better captures the effect of mobility-related variations on circuit performance, as they are affected by temperature, which is one of the parameters that we use for characterizing our statistical cell models. Table 4.2 shows how these parameters are listed in the BSIM4.5.0 user manual. Of the more than 200 parameters listed in the manual, we list just the parameters from Table 4.1. Please note that some of the parameters are binnable, which is explained in the next section.

Table 4.2. BSIM4.5.0 model selectors/controllers.
Parameter Name | Description | Default Value | Binnable | Note
GEOMOD | Geometry-dependent parasitics model selector, specifying how the end S/D diffusions are connected | 0 (isolated) | NA | -
TOXE | Electrical gate equivalent oxide thickness | 3.0e-9 m | No | Fatal error if not positive
TOXP | Physical gate equivalent oxide thickness | TOXE | No | Fatal error if not positive
EPSROX | Gate dielectric constant relative to vacuum | 3.9 (SiO2) | No | Typically greater than or equal to [value missing in this transcription]
UA | Coefficient of first-order mobility degradation due to vertical field | [value missing] m/V for MOBMOD=0 and 1; 1.0e-15 m/V for MOBMOD=2 | Yes | -
UB | Coefficient of second-order mobility degradation due to vertical field | 1.0e-19 m^2/V^2 | Yes | -
UC | Coefficient of mobility degradation due to body-bias effect | [value missing] V^-1 for MOBMOD=1; [value missing]e-9 m/V^2 for MOBMOD=0 and 2 | Yes | -

Table 4.2. (cont.) BSIM4.5.0 model selectors/controllers.
Parameter Name | Description | Default Value | Binnable | Note
KU0 | Mobility degradation/enhancement coefficient for stress effect | 0.0 [m] | No | -
UA1 | Temperature coefficient for UA | 1.0e-9 m/V | Yes | -
UB1 | Temperature coefficient for UB | [value missing]e-18 (m/V)^2 | Yes | -
UC1 | Temperature coefficient for UC | [value missing] V^-1 for MOBMOD=1; 0.056e-9 m/V^2 for MOBMOD=0 and 2 | Yes | -
KT1 | Temperature coefficient for threshold voltage | [value missing] V | Yes | -
RGATEMOD | Gate resistance model selector | 0 (no gate resistance) | [entry missing] | -
RDSMOD | Bias-dependent source/drain resistance model selector | 0 | NA | Rds(V) modeled internally through IV equation
MOBMOD | Mobility model selector | 0 | NA | -
DIOMOD | Source/drain junction diode IV model selector | 1 | NA | -
LAMBDA | Velocity overshoot coefficient | 0 | Yes | If not given or (<=0.0), velocity overshoot will be turned off
VTL | Thermal velocity | 2.05e5 [m/s] | Yes | If not given or (<=0.0), source end thermal velocity will be turned off
TEMPMOD | Temperature mode selector | 0 | No | If 0, original model is used; if 1, new format used

Non-binned Transistor Model vs. Binned Transistor Model

FreePDK45 uses BSIM4, which models transistors more accurately than BSIM3v3, the model used in TSMC180RF; however, TSMC180RF uses a binned transistor model while FreePDK45 does not. Binning divides the two-dimensional space of acceptable W (transistor width) and L (transistor length) values into several regions. The number of regions is the product of the number of divisions chosen for W and for L. Transistor parameter values are determined separately for each region, and the Spice simulator automatically selects the proper region depending on the values of W and L. This makes the transistor model more accurate because it is like having a unique transistor model in each region. Each transistor model is valid for a limited region determined by the following parameters in the transistor model: LMIN, LMAX, WMIN, and WMAX, which are the minimum and maximum acceptable transistor channel length and width.

Support of Symmetric Parameter Variation for all Parameters

TSMC180RF does not let the transistor model be simulated for channel length values less than 180 nm; therefore, we considered only the +5% variation in L and ignored its -5% variation in constructing our models. Consequently, the range of channel length values is from 180 nm to 189 nm and its center is 184.5 nm. Thus, the range of variation in channel length is not symmetric with respect to the default value. If we could have used the symmetric range, the channel length values would have been from 171 nm to 189 nm and the center would have been 180 nm.

FreePDK45 allows reducing the channel length down to 35 nm, while the standard cells use 50 nm for the channel length (instead of 45 nm). We can therefore reduce the channel length by 30%, so the range of variation in channel length is from -30% (35 nm) to +30% (65 nm) with its center at 50 nm; that is, the range of variation in channel length is symmetric. To make sure of the accuracy of simulations for values under 45 nm, we contacted Prof. Yu Cao of the Nanoscale Integration and Modeling (NIMO) Group, because FreePDK45 uses a Predictive Technology Model based on BSIM4. According to him, the transistor model will be accurate to the first order when we get closer to 35 nm, which was mentioned as one of our process extremes. That means the model can still be used for channel length values under 45 nm, although the accuracy is not going to be as good as for channel length values over 45 nm.

Variation Parameters Chosen for Cell Models

The transistor models link the transistor-level parameter variations to our statistical waveform and cell models. Our models take into account variation in both process parameters and environmental parameters. We have included just the channel length and threshold voltage of transistors to represent process variation parameters in our models, in order to keep the total number of variation parameters minimal; however, there are many more parameters at the transistor level. Reference [63] categorizes these parameters into two groups.

(a) Variations in the process parameters, which include impurity concentration densities, oxide thickness, and diffusion depths. They originate from nonuniform conditions during deposition and/or diffusion of impurities, and they affect transistor parameters such as the threshold voltage and sheet resistances. Table 4.3 lists some of these parameters for BSIM4.5.0 [65]. The last row shows the zero-body-bias threshold voltage, which we used to vary the threshold voltage, although other parameters can be used for the same purpose.

(b) Variations in the dimensions of devices. They result mainly from the limited resolution of the photolithographic process. They affect the W and L of the transistors and the width of the interconnect wires. We have included L in the list of our parameters, while W is a constant for each cell. The variation in W does not affect the switching speed of a transistor in the same way as the variation in L; however, the variation in W can affect the load and the parasitics. It is worth mentioning that the variations in W and L are totally uncorrelated because the first is related to the field oxide step while the second is determined by polysilicon definition and the source and drain diffusion processes.

Table 4.3. Example BSIM4.5.0 model process parameters.
Parameter Name | Description | Default Value | Binnable | Note
TOXE | Electrical gate equivalent oxide thickness | 3.0e-9 m | No | Fatal error if not positive
TOXP | Physical gate equivalent oxide thickness | TOXE | No | Fatal error if not positive
NDEP | Channel doping concentration at depletion edge for zero body bias | 1.7e17 cm^-3 | Yes | -
NSUB | Substrate doping concentration | 6.0e16 cm^-3 | Yes | -
NGATE | Poly Si gate doping concentration | 0.0 cm^-3 | Yes | -
NSD | Source/drain doping concentration | 1.0e20 cm^-3 | Yes | Fatal error if not positive
VTH0 or VTHO | Long-channel threshold voltage at Vbs=0 | 0.7 V (NMOS); -0.7 V (PMOS) | Yes | -

We have chosen channel length and threshold voltage to represent the process parameters in our statistical models. Our reasons for this selection are as follows.

(a) From the process parameter perspective, variations in the channel length, threshold voltage, and oxide thickness of transistors affect the timing of a circuit [63]; however, variations in channel length and threshold voltage play a more significant role in the performance of a circuit.
(b) The variation in channel length is independent of the variation in threshold voltage because they are determined by different process steps [63].
(c) We need to keep the number of parameters in our statistical models small to make them computationally efficient.

We close this section by presenting the reasons for the variation in channel length and threshold voltage. Channel length variation results from optical exposure and process control limitations. Threshold voltage variation originates from process control limitations, since the number of dopants in the channel cannot be controlled well enough.

Constructing a Cell Model for Very Large Parameter Variations

We assumed a reasonable range of variation for all variables at the beginning of the research, as mentioned in the previous chapter. It is instructive to study the effect of very large parameter variations on our variational waveform model and cell model; therefore, we try to construct a cell model compliant with the range of variations predicted by the International Technology Roadmap for Semiconductors (ITRS) [66] for the channel lengths and threshold voltages of the transistors.

We construct our models using an inverter from the Oklahoma State University standard cell library, which is based on FreePDK45 technology. We used Calibre [67] for DRC, LVS, and parasitic extraction of the inverter layout because the FreePDK45 technology files did not support Assura. We used Assura for parasitic extraction for our inverter model based on TSMC180RF. For more information about the cell models that we used, please refer to Appendix B.

To determine the maximum range of variation in Vt (Vtn or Vtp in Table 3.1) and L (Ln or Lp in Table 3.1), we swept the range of L and the range of Vt and observed the changes in the voltage transfer curve (VTC) [63] of the inverter. The VTC is based on DC analysis; therefore, the curves for all four combinations of Fanout and Slope map onto a single curve. We nevertheless included these two variables in our experiment design parameters table in order to use exactly the same combinations of variables that we use for cell characterization with transient analysis simulations. Moreover, the inclusion of the two variables makes this table consistent with our other tables of variation model parameters.

Figure 4.1 shows the VTCs for our inverter based on FreePDK45 technology when we varied all the parameters using a 2-level full-factorial design based on Table 4.4. The horizontal and vertical axes are input voltage and output voltage in volts, respectively. We observe that the VTCs are affected by parameter variations. We will show later that increasing the range of variation in the parameters can make the VTC unacceptable for operation as an inverter.

Table 4.4. Variation model parameters for our VTCs (FreePDK45).
Variable | Variation | Variable | Variation
Lp | -20% to 20% | Ln | -20% to 20%
Vtp | -5% to 5% | Vtn | -5% to 5%
T | 0 °C to 70 °C | Fanout | 1 to 9
Vdd | -10% to 10% | Slope | 10 ps to 3 ns
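The 2-level full-factorial design over the variables of Table 4.4 can be sketched as follows. This is only an illustration: the dictionary layout and the function names are assumptions made for this example, percentage ranges are stored as relative offsets rather than absolute values, and nothing here reproduces the actual characterization scripts.

```python
from itertools import product

# Low/high levels per variable, taken from Table 4.4.
# Percent ranges are relative offsets from nominal (assumed encoding).
LEVELS = {
    "Lp": (-0.20, 0.20),
    "Ln": (-0.20, 0.20),
    "Vtp": (-0.05, 0.05),
    "Vtn": (-0.05, 0.05),
    "T": (0.0, 70.0),          # degrees Celsius
    "Vdd": (-0.10, 0.10),
    "Fanout": (1, 9),
    "Slope": (10e-12, 3e-9),   # seconds
}

# Variables that influence a DC analysis; Fanout and Slope do not.
DC_VARS = ("Lp", "Ln", "Vtp", "Vtn", "T", "Vdd")


def full_factorial(levels):
    """Return every low/high combination as a list of dicts."""
    names = list(levels)
    return [dict(zip(names, combo))
            for combo in product(*(levels[n] for n in names))]


def distinct_dc_corners(corners):
    """Collapse corners that differ only in Fanout or Slope."""
    return {tuple(c[name] for name in DC_VARS) for c in corners}


corners = full_factorial(LEVELS)
dc_corners = distinct_dc_corners(corners)
```

With eight two-level variables the design has 2^8 corners, and because Fanout and Slope do not influence a DC sweep, every group of four corners collapses onto one VTC, which matches the observation in the text.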

On a VTC, as we increase Vin, the first point at which the slope of the curve is -1 defines VIL (the acceptable low voltage for the input) and the second point at which the slope is -1 defines VIH (the acceptable high voltage for the input); the signal is undefined in the interval between VIL and VIH. VOL is the minimum value of Vout (output voltage) on the VTC, and VOH is the maximum value of Vout on the VTC. Reference [63] provides more details on the relationship among VIL, VIH, VOL, and VOH.

Figure 4.1. VTCs are affected by variations of 20% and 5% in L and Vt, respectively.

We used a 2-level full-factorial design to plot the set of voltage transfer curves, taking into account the variations in Lp, Ln, Vtp, Vtn, Vdd (supply voltage), and T (temperature). When we increase the variation in Vt and/or L beyond a limit, the inverter does not function properly because the output level cannot come within an acceptable distance of the supply voltage (Vdd) or ground (zero). We observe that VOL (VOH) is far from 0 (Vdd) for some cases with parameter variations. We see that process variation decreases the noise-margin-low (NML = VIL - VOL) and the noise-margin-high (NMH = VOH - VIH).

In a good design, our goal is to preserve a full rail-to-rail swing and to keep VOH = Vdd and VOL = 0; however, we choose the acceptable output level to be within 10% of the target value to take into account the effect of process and environmental variations. We therefore consider a voltage swing of 80% of the supply voltage acceptable for the output in the presence of process and environmental variations, although the ideal swing would have been 100% of the supply voltage, which guarantees the correct operation of the inverter. The inverter does not switch properly for some combinations in the factorial design; therefore, the range of the parameters must be adjusted to make all the VTCs acceptable.

We increased the variation in L in 5% increments from 0% to 30% and set the variation in Vt to 5%. The results indicated that the maximum acceptable variation in L is 20% for correct operation of the inverter. Similarly, when we set the variation in L to 20% and increased the variation in Vt from 5% to 20% and then to 30%, the maximum acceptable range of variation in Vt was 20%. Our original target range of variation was 30% for both Vt and L, but the inverter does not function properly when we pass the 20% limit of the range of variation in both Vt and L at the same time; therefore, to be practical, we have to change our target from the ideal 30% predicted by ITRS to 20%.

We illustrate with an example how process variation can interfere with the correct operation of an inverter. Figure 4.2 shows the output response of the inverter, based on a transient analysis using Hspice, for all possible combinations of the parameters of Table 4.4 with one exception: we swept the variation in L from 5% to 10% and then to 20%, 30%, and 40%. The output waveforms for these cases of the variation in L are shown in cyan, yellow, pink, green, and red, respectively. In the plot, the vertical axis is the input or output voltage in volts and the horizontal axis is time in nanoseconds. The input waveform is a saturated ramp with both of the extreme values for the slope based on the table. The supply voltage also exercises the two extreme cases, i.e., 0.99 and 1.21 volts, based on the table. The actual input waveform (in blue) cannot be seen clearly because it is covered by the output waveforms; we want just to show how the output waveforms are affected when we sweep the variation in L.

In the plot, we observe that the level of the output is affected for the cases with 30% and 40% variation in L. This makes the levels of the output swing unacceptable for the correct operation of the inverter for each of the two supply voltage levels. For all other cases, with the range of variation in L equal to or less than 20%, the output swing covers at least the 10%-90% range of the supply voltage, which is acceptable for our cell models although a good design targets a 100% rail-to-rail swing.

Figure 4.3 shows the input transitions and the corresponding output transitions for the first set of transitions resulting from applying the first pulse in Figure 4.2. In the figure, we observe that the output transitions do not cover the full range of the supply voltage, for each of the two supply voltage levels, in the cases where the range of variation in L is more than 20%. Figure 4.4 depicts a plot similar to the plot in Figure 4.1, but it shows the VTCs when the variations in L and Vt are 30% and 5%, respectively. In this case, the inverter does not function correctly.
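The acceptance criterion used above (output levels within 10% of the rails) and the noise-margin definitions can be checked on a sampled VTC with a few lines of code. The sketch below is a minimal illustration under stated assumptions: the function names are invented for this example, the VTC is a synthetic tanh-shaped curve rather than simulator output, and VIL and VIH are approximated by the midpoints of the first and last sample intervals whose finite-difference slope is -1 or steeper.

```python
import math


def vtc_metrics(vin, vout, vdd, tol=0.10):
    """Extract VOL, VOH, VIL, VIH and noise margins from a sampled VTC."""
    vol, voh = min(vout), max(vout)
    # Midpoints of the intervals whose finite-difference slope is <= -1.
    steep = [
        0.5 * (vin[i] + vin[i + 1])
        for i in range(len(vin) - 1)
        if (vout[i + 1] - vout[i]) / (vin[i + 1] - vin[i]) <= -1.0
    ]
    if not steep:
        raise ValueError("slope never reaches -1; not an inverter-like VTC")
    vil, vih = steep[0], steep[-1]
    return {
        "VOL": vol, "VOH": voh, "VIL": vil, "VIH": vih,
        "NML": vil - vol,   # noise-margin-low
        "NMH": voh - vih,   # noise-margin-high
        # Output levels must come within tol*Vdd of ground and of Vdd.
        "swing_ok": vol <= tol * vdd and voh >= (1.0 - tol) * vdd,
    }


def synthetic_vtc(vdd, gain, lo=0.0, hi=1.0, n=1101):
    """Hypothetical tanh-shaped VTC; lo/hi compress the output swing."""
    vin = [vdd * i / (n - 1) for i in range(n)]
    vout = [
        vdd * (lo + (hi - lo) * 0.5 * (1.0 - math.tanh(gain * (v - 0.5 * vdd))))
        for v in vin
    ]
    return vin, vout


good = vtc_metrics(*synthetic_vtc(1.1, 10.0), vdd=1.1)
bad = vtc_metrics(*synthetic_vtc(1.1, 10.0, lo=0.2, hi=0.8), vdd=1.1)
```

The first curve swings rail to rail and passes the check; the second, whose output is confined to 20%-80% of Vdd, fails it, which mirrors the behavior described for variations in L above 20%.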

Figure 4.2. Output swings are affected by process parameter variations.

Figure 4.3. Output transitions are affected by parameter variations (input transitions are the saturated ramps in blue).

After showing the case in which the increase of the range of variation in L is the major reason for the problem, we show in Figure 4.5 another plot, in which the variations in L and Vt are 5% and 90%, respectively. In this case, the increase of the range of variation in Vt is the major source of the problem, and the inverter does not function correctly. Based on a series of simulations changing the range of variation in L and Vt, we determined that the safe range of variation in L and Vt is 20%. Figure 4.6 depicts the VTCs for this case.

Figure 4.4. VTCs are affected adversely by increasing the variation in L to 30% while keeping the variation in Vt at 5%.

Similar to our inverter model based on TSMC180RF, we use 5% as the range of variation in Vt and L to construct our original models for FreePDK45. We choose 20% as the range of variation in Vt and L to build our models with a large range of variation. We will compare the accuracy of the two PC waveform models in Section

Figure 4.5. VTCs are affected adversely by increasing the variation in Vt to 90% while keeping the variation in L at 5%.

Figure 4.6. VTCs are acceptable when the variations in L and Vt are set to 20%.

Tables 4.5, 4.6, 4.7, and 4.8 show the range of variation in all the parameters for an inverter based on FreePDK45. They will be used in Section 4.4 when we explore the effect of subranging and large variations on the accuracy of our models, but we have included them here to introduce the basic idea. In Table 4.5, the range of variation in all the parameters except Slope is chosen to be the same as what we used for the inverter based on TSMC180RF. This helps us evaluate our waveform and cell construction methodology, which was discussed in Chapter III, for FreePDK45 using the same variation model parameters that we used for TSMC180RF. In Table 4.6, the range of variation in threshold voltages and channel lengths is increased from (-5% to 5%) to (-20% to 20%), which is denoted by "large variations" in its caption. In Tables 4.7 and 4.8, the range of Fanout and Slope has been reduced, i.e., subranged, to evaluate whether using a subrange of these two parameters can improve the accuracy of our waveform and cell models.

Table 4.5. Variation model parameters (FreePDK45).
Variable | Variation | Variable | Variation
Lp | -5% to 5% | Ln | -5% to 5%
Vtp | -5% to 5% | Vtn | -5% to 5%
T | 0 °C to 70 °C | Fanout | 1 to 65
Vdd | -10% to 10% | Slope, [PC1, PC2], [L, Θ] | 10 ps to 3 ns (for Slope); dataset range (otherwise)

Table 4.6. Variation model parameters (FreePDK45 - large variations).
Variable | Variation | Variable | Variation
Lp | -20% to 20% | Ln | -20% to 20%
Vtp | -20% to 20% | Vtn | -20% to 20%
T | 0 °C to 70 °C | Fanout | 1 to 65
Vdd | -10% to 10% | Slope, [PC1, PC2], [L, Θ] | 10 ps to 3 ns (for Slope); dataset range (otherwise)

Table 4.7. Variation model parameters (FreePDK45 - subrange).
Variable | Variation | Variable | Variation
Lp | -5% to 5% | Ln | -5% to 5%
Vtp | -5% to 5% | Vtn | -5% to 5%
T | 0 °C to 70 °C | Fanout | 1 to 9
Vdd | -10% to 10% | Slope, [PC1, PC2], [L, Θ] | 10 ps to 300 ps (for Slope); dataset range (otherwise)

Table 4.8. Variation model parameters (FreePDK45 - subrange & large variations).
Variable | Variation | Variable | Variation
Lp | -20% to 20% | Ln | -20% to 20%
Vtp | -20% to 20% | Vtn | -20% to 20%
T | 0 °C to 70 °C | Fanout | 1 to 9
Vdd | -10% to 10% | Slope, [PC1, PC2], [L, Θ] | 10 ps to 3 ns (for Slope); dataset range (otherwise)

4.3. Constructing a Cell Model for Resistive-Capacitive Loads

We want to extend our methodology to support resistive-capacitive loads. The methodology that we developed for our compact variation-aware cell models [55]-[59] in Chapter III is based on the assumption of purely capacitive loads for cells, while interconnects can make the loads resistive-capacitive. For interconnect-dominant logic circuits, this assumption can reduce the accuracy of the timing analysis because (a) it takes into account just the effect of the capacitive component of the cell load and ignores the effect of its resistive component, or (b) it does not provide very accurate delay estimates even when the effective capacitance [68] concept is used to adjust the capacitive component to reflect the existence of the resistive component to some degree. While interconnect networks have inductive components, for our timing analysis purposes, with a focus on high frequency signals, we limit our discussion to RC-interconnect networks [69] instead of RLC-interconnect networks [69].

We represent voltage transfer functions and impedances (or admittances) in the Laplace domain throughout this section. Moreover, the terms cell and gate are used interchangeably.

This section is organized as follows. Section describes how a resistive-capacitive load can model each resistive-capacitive interconnect network to be included in our cell characterization. Section shows how a resistive-capacitive interconnect network can be mapped into a Pi-model [69]. Section covers the process of our cell characterization with a Pi-model load and the RC-interconnect networks. Section summarizes the RC-interconnect network characterization methods and presents our choice. Section describes our test circuit and its Pi-model-converted RC-interconnect networks. Our timing analysis engine is discussed in Section. Section is dedicated to the timing simulation and analysis of the results. Section concludes Section

Timing Characterization of Complex Loads

The signal propagation delay from one gate to another has two components: (a) the delay of the gate driving the RC-interconnect network at the gate output and (b) the delay of the interconnect network to its own output, which can be the input of the next-stage gate. The delay of a gate is a function of its input waveform transition and its output load; the resistance of the interconnect at its output causes the gate to see a smaller capacitance, which is called the effective capacitance [68]. To find the propagation delay from one gate to another, the delay of the gate, which is itself a function of the RC-interconnect, must be added to the delay of the interconnect [69].

Resistive-capacitive loads are actually complex loads. Since the resistance is the real part of the (complex) impedance, it can be ignored when it is very small in comparison with the imaginary part, which makes the load almost purely capacitive. The concept of the effective capacitance is based on this rule: it treats the load as almost purely capacitive, but the value of the capacitance seen is adjusted to make a "good" approximation of the complex impedance by considering just its "adjusted" imaginary part. However, this approximation loses its accuracy as the value of the resistance, the real part of the impedance, increases.

There are different levels for modeling the load of a logic gate, as shown in Figure 4.7. The load of a gate is actually the complex load of the RC-interconnect network connected to it, and it includes the input capacitance of the next-level logic, as shown in Figure 4.7.a. We can use: (a) the original RC-interconnect network, (b) a Pi-model [69],[70] approximation, as shown in Figure 4.7.b, or (c) an effective-capacitance model (Ceff) approximation. We call the input admittances of the networks (a), (b), and (c) Y(s), Y'(s), and Y''(s), respectively. The input voltages of the RC-interconnect networks in the figure, which are the output voltages of the gate, are shown as Vin(s), Vin'(s), and Vin''(s), respectively. The output voltages of the RC-interconnect network, which are the voltages across the input capacitance of the next-stage logic, are shown as Vout(s), Vout'(s), and Vout''(s), respectively. Since Y'(s) and Y''(s) are approximations of Y(s), the input admittance of the RC-interconnect network, Vin'(s) and Vin''(s) are approximations of Vin(s), and Vout'(s) and Vout''(s) are approximations of Vout(s). Depending on the level of accuracy needed for timing analysis, different models can be used, which will affect the simulation time and accuracy. The models, from the highest to the lowest level of accuracy (even at circuit level), are (a), (b), and (c).

Figure 4.7. For a logic gate, there are different levels for load modeling: (a) original RC-interconnect network, (b) Pi-model, and (c) Ceff-model.

Level (a) models can be described in Detailed Standard Parasitic Format (DSPF) [71], in which the detailed parasitics are represented in Spice format. Level (b) models can be described in Reduced Standard Parasitic Format (RSPF) [71], in which the parasitics are represented in a reduced format. Both levels (a) and (b) can be simulated using Spice-like simulators. On the other hand, the Standard Parasitic Exchange Format (SPEF) [71] allows representation of only the detailed parasitics (and not the gates). Level (c) models can be used in high-level timing analysis tools by adjusting the value of the capacitive load to compensate for the existence of a small resistance in resistive-capacitive loads. The Ceff capacitance clearly becomes a poor approximation for interconnect-dominant networks, in which the resistance in the complex impedance is significant.

In the original RC-interconnect network (a), the load of the next-stage logic is considered a part of the interconnect itself. Since the input capacitive load of a gate is variable and is a function of the transition time, an average value must be used as an approximation of the input capacitance of the gate when the gate input capacitance is replaced by a capacitor. In circuit-level simulation, we do not necessarily need to do this, but it is required if we want to map the RC-interconnect network to an approximate model and calculate an input admittance and a voltage transfer function.

In the Pi-model (b), C1 is connected directly to the output of a gate, C2 is connected to C1 through R, and the second terminals of both C1 and C2 are grounded. Vin'(s), the voltage across C1 (and not C2, which is connected to the terminal node in the Pi-model), is applied through a unit-gain voltage-controlled voltage source and then through a single-stage low-pass RC filter (R3C3) to produce Vout'(s) at the original output of the interconnect. The values of R3 and C3 are determined in such a way as to reflect the necessary delay and slope at the output node of the RC-interconnect network. The reason for using this technique will be better understood when we describe the handling of fanout branches in RC-interconnect networks.
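For reference, the input admittance of the Pi-model load and the transfer function of its output shaping filter follow from elementary circuit analysis. The expressions below are a sketch that assumes the element names C1, R, C2, R3, and C3 used in the text and an ideal unit-gain source; the series expansion about s = 0 is included because it shows how the three Pi-model parameters appear in the low-order terms of the admittance.

```latex
\[
Y'(s) \;=\; sC_1 + \frac{sC_2}{1 + sRC_2}
      \;=\; s\,(C_1 + C_2) \;-\; s^2 R C_2^2 \;+\; s^3 R^2 C_2^3 \;-\; \cdots
\]
\[
\frac{V_{out}'(s)}{V_{in}'(s)} \;=\; \frac{1}{1 + sR_3C_3}
\]
```

The first-order term is the total capacitance, so for a slowly varying signal the Pi-model load behaves like a lumped capacitor C1 + C2, and the resistance enters only through the higher-order terms.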

In the Ceff-model (c), the gate output voltage, i.e., Vin''(s), is the same as the RC-interconnect network output voltage, i.e., Vout''(s), because the resistance of the complex load is considered negligible after adjusting the value of the load to Ceff; hence we see only one capacitor in the model and no resistors at all.

It is theoretically possible to include the effect of a complex impedance in our compact models by adding as many parameters as are needed to represent an RC-interconnect network for cell characterization, but this method has two major practical problems: (a) it increases the number of dimensions in our models by the total number of capacitors and resistors in the interconnect network, and consequently increases the characterization time complexity exponentially, and (b) it makes the characterization of different RC-interconnect network topologies necessary, which further increases the characterization time complexity.

Theoretically, the interconnect complex impedance can be modeled as a set of topologies, represented by a set of complex functions T = {T1(.), T2(.), ..., Tk(.)}, each with a set of resistances and capacitances as its parameters, if we restrict our models to RC-interconnect networks. For example, as shown in Figure 4.8, T3(C0, R1, C1, R2, C2) represents an RC-interconnect network with a capacitance C0 cascaded with two low-pass RC filters, R1C1 and R2C2, while T4(C0, R1, C1, R2, C2), with the same set of parameters, represents another RC-interconnect network with a capacitance C0 cascaded with a fanout node with two branches, in which the first branch is a low-pass filter R1C1, the second branch is a low-pass filter R2C2, and the output is taken from the non-ground terminal of C2. Please note that the other output is not important for us, but the existence of its circuit elements, R1 and C1, affects the input admittance function.
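To make the difference between two topologies with the same parameter list concrete, the sketch below evaluates the input admittance of a T3-style ladder and a T4-style two-branch network at a given frequency. The function names and the element values are hypothetical and chosen only for illustration; the expressions follow from series and parallel combination of the elements as described in the text.

```python
import math


def t3_admittance(s, c0, r1, c1, r2, c2):
    """C0 in shunt, followed by two cascaded low-pass sections."""
    y_far = 1.0 / (r2 + 1.0 / (s * c2))          # R2 in series with C2
    y_mid = 1.0 / (r1 + 1.0 / (s * c1 + y_far))  # R1, then C1 parallel to y_far
    return s * c0 + y_mid


def t4_admittance(s, c0, r1, c1, r2, c2):
    """C0 in shunt, followed by two parallel R-C branches (fanout node)."""
    branch1 = 1.0 / (r1 + 1.0 / (s * c1))
    branch2 = 1.0 / (r2 + 1.0 / (s * c2))
    return s * c0 + branch1 + branch2


def relative_gap(freq_hz, params):
    """Relative difference between the two admittances at freq_hz."""
    s = 1j * 2.0 * math.pi * freq_hz
    y3 = t3_admittance(s, *params)
    y4 = t4_admittance(s, *params)
    return abs(y3 - y4) / abs(y4)


# Hypothetical values: C0 = 1 fF, R1 = R2 = 100 ohm, C1 = C2 = 10 fF.
PARAMS = (1e-15, 100.0, 10e-15, 100.0, 10e-15)
gap_low = relative_gap(1e6, PARAMS)     # 1 MHz
gap_high = relative_gap(100e9, PARAMS)  # 100 GHz
```

At low frequency both networks look like the total capacitance C0 + C1 + C2 and the gap is negligible; at high frequency the resistances separate the two topologies, which is why a single parameter list is not enough without the topology information.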

T5 is similar to T4, but the output is taken from the non-ground terminal of C1. While it would be possible to categorize a large number of the most commonly used topologies for a finite number of resistors and capacitors, it would be very time consuming.

Figure 4.8. Two RC-interconnect networks with different topologies are mapped to a simple RC network (i.e., a Pi-model) just with different values for the parameters.

In practice, instead of using the set of interconnect complex functions with many parameters that we mentioned, we need a function with just a few parameters. If we can find a way to map the parameters of all the complicated complex functions to a simple function with a few parameters, we have solved the problem. Such a function should be an approximation of all the functions for the different topologies, while its parameters capture the related topology information of each original function. Different topologies of the interconnect can be represented by multi-port networks, but for our timing analysis purposes, we need to know only the impedance (or admittance) that we can see from each port; therefore, a 1-port model should be constructed with its impedance (or admittance) ideally equal to the impedance (or admittance) seen from the specific port of the multi-port network. Fortunately, there are standard methods to reduce the order of such RC networks, which are referred to as model order reduction (MOR) methods [69], but with them we can find only a good approximation of the original interconnect complex impedance.

For example, we can map the set T into a set of reduced-order models (ROM), which we represent as M = {M1(.), M2(.), ..., Mk(.)}. We define the M(.) functions with just three parameters, C1, R, and C2, which correspond to the capacitances and resistance of a Pi-model [70]. Hence, we will have a function of the form M(C1, R, C2). Here, the actual load is C1 cascaded with a low-pass RC2 filter, which is called a Pi-model and is shown on the right side of Figure 4.8. The values of C1, C2, and R of the Pi-model (i.e., C1', C2', and R' or C1'', C2'', and R'' in the figure) are determined based on the original network topology and the values of the capacitances and resistances of the network. The Pi-model can be used in a circuit-level simulator for timing analysis. The reduced standard parasitic format, which is one of the standard formats for representation of parasitics for post-layout timing analysis, is based on a Pi-model, in which the parasitics extracted from a layout are mapped into the C1, R, and C2 parameters [71].

For example, Figure 4.9 shows an RC-interconnect network comprised of three RC-interconnect segments, in which each segment is zero or one capacitor cascaded with one or more low-pass RC filters. This is an example of an RC network with two fanout branches. Here, the load is modeled by a Pi-model that incorporates all interconnect segments (e.g., INT0, INT5, and INT6); however, the voltage at the end of each output interconnect segment is determined by passing the input voltage of the interconnect network through a low-pass filter (e.g., R5C5 or R6C6) that accounts for the slope and delay changes of the signal through each specific interconnect segment to the output (e.g., INT5 and INT6). Such a model described in RSPF can be mapped easily to a Spice netlist by including the unit-gain voltage-controlled voltage sources. That means the timing simulation can be performed at a higher level of abstraction of the circuit but still at circuit level.

Figure 4.9. For an RC-interconnect network with fanout branches, the load is modeled by a Pi-model that incorporates all interconnect segments (e.g., INT0, INT5, and INT6); the voltage at the end of each output interconnect segment is determined by passing the input voltage of the interconnect network through a low-pass filter (e.g., R5C5 or R6C6) that accounts for the slope and delay change of the signal through each specific interconnect segment to the output (e.g., INT5 and INT6).
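The mapping from a Pi-model load with per-output shaping filters to a Spice netlist, as described above, can be sketched as a small generator. This is an illustration under assumptions: the function name, node names, and element values are invented for the example, the output uses generic Spice element cards (R, C, and the E-type voltage-controlled voltage source) rather than RSPF syntax, and no particular simulator dialect is implied.

```python
def pi_load_netlist(drv, c1, r, c2, outputs):
    """Return Spice cards for a Pi-model load on node `drv`.

    `outputs` maps each output node name to the (R, C) pair of the
    low-pass filter that shapes the signal delivered to that output.
    """
    lines = [
        f"CPI1 {drv} 0 {c1:g}",        # near-end capacitor at the gate output
        f"RPI {drv} {drv}_pi {r:g}",   # Pi-model resistor
        f"CPI2 {drv}_pi 0 {c2:g}",     # far-end capacitor
    ]
    for k, (node, (rk, ck)) in enumerate(outputs.items(), start=1):
        buf = f"{node}_buf"
        # Unit-gain VCVS: copies V(drv) without loading the Pi-model.
        lines.append(f"E{k} {buf} 0 {drv} 0 1")
        lines.append(f"RO{k} {buf} {node} {rk:g}")
        lines.append(f"CO{k} {node} 0 {ck:g}")
    return lines


# Hypothetical values for a load with two fanout branches.
cards = pi_load_netlist(
    "out", 2e-15, 135.0, 19e-15,
    {"in5": (80.0, 5e-15), "in6": (120.0, 7e-15)},
)
```

Because each output is driven through its own unit-gain source, the shaping filters add no load to the gate output, so the gate sees only the Pi-model while each branch still receives its own delay and slope.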

We want to do the timing simulation at a higher level of abstraction than circuit-level simulators such as Spice. Therefore, we need to characterize the cells for a complex load. Fortunately, using a Pi-model increases the number of parameters in our models by just 2 in comparison with the Ceff model that we used in Chapter III. This makes the characterization possible, since many parameters (the resistances and capacitances of each interconnect segment) are mapped into just 3 parameters, which are C1, R, and C2.

Mapping RC-Interconnect Networks to Pi-Models

There are several standard methods to map each RC network of T, with transfer function H(s), to a 2-pole 1-zero approximate transfer function H'(s). We refer just to the fundamentals of Asymptotic Waveform Evaluation (AWE) [72]. AWE is a general method based on state-space equations and the circuit response. It has the following two steps:
(a) Moment generation: we generate the moments from a circuit.
(b) Moment matching: we match the moments to the simpler model.

Assume H(s) has the following general form:

\[ H(s) = \frac{c_0 + c_1 s + \cdots + c_{n-1} s^{n-1}}{a_0 + a_1 s + \cdots + a_n s^n} \tag{4.1} \]

The numerator and denominator are polynomials in s, and c_0 to c_{n-1} and a_0 to a_n are the coefficients of the polynomials. In the first step, we describe H(s) in terms of its moments by a Taylor expansion about s = 0, with a sufficient number of moments, in the following form:

99 2 H ( s) = m + m s +. m s... (4.2) H(s) is polynomials of s and m 0, m 1, m 2, are called the moments of H(s). On the 2nd step, we map the above function back to an H'(s) with lower order polynomials in the nominator and denominator to obtain the MOR. For example, H'(s) can be the MOR of H(s) if we can determine the coefficients of the polynomials in the nominator and denominator so that H (s) has the same moments as H(s). H '( s) c' + c' s 0 1 = 2 (4.3) a' 0 + a' 1 s+ a' 2 s Since we did not have access to a MOR engine, we built our own MOR engine using Matlab's symbolic math toolbox and implemented the following three methods: (a) Two-pole approximation with explicit moment matching [73], (b) Stable 2-pole model based on first three moments [74], and (c) Stable 2-pole method (S2P) [75]. These methods are not as general as AWE, but they are suitable for our purpose of obtaining the H'(s) and Y'(s) with minimum necessary implementation time since we need a 2-pole system for a Pi-model. We used the S2P as the main method because of its guaranteed stability of the resulting model and used the two other methods to cross-check the transfer functions. Also, S2P gives us Y(s) the ROM of the (input) admittance of the RC network as Y'(s), which is a 2-pole 2-zero system and different from the voltage transfer function of the RC 75

network, H'(s), which is a 2-pole 1-zero system. The general forms of the described approximate Y'(s) and H'(s) are as follows:

\[ Y'(s) = \frac{b_0 + b_1 s + b_2 s^2}{a_0 + a_1 s + a_2 s^2} \tag{4.4} \]

\[ H'(s) = \frac{c_0 + c_1 s}{a_0 + a_1 s + a_2 s^2} \tag{4.5} \]

Please note that Y'(s) and H'(s) are related through their denominator polynomials. We will use this information to say that the parameters of the circuit models (the Pi-model and our H'(s) circuit model) that implement these two functions are related and consequently cannot be treated as independent parameters in our 2-level full-factorial designs for generating our target timing models. However, we could not find a straightforward relationship between them. Further research is needed to determine if such a relationship can be expressed in closed form for all types of RC-interconnect networks.

To generate the moments, as our first attempt, we used Modified Nodal Analysis (MNA) [76] to solve the circuit. MNA is very general, but it suffers from poor scalability because of the large matrices it needs. Then, we decided to use a method similar to Path Tracing (PT) [69], which is suitable for an interconnect network with a tree topology. A similar method that explicitly builds the impedance (Z) matrix of the circuit was used in RICE [77]; it has a scalability problem because it is expensive to find the inverse of Z, i.e. the admittance matrix, which is what we need. Because all the interconnect networks in our test circuit were trees, we could have used this method, but we needed the admittance (Y) matrix; therefore, we built our own tree traversal method suitable for the specific interconnect tree types in our test circuit.
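As an illustration of moment generation by traversal and of moment matching to a Pi-model, the following is a minimal Python sketch. It is not the Matlab-based engine described here and it does not implement S2P; it assumes the simplest case of a single RC ladder (a chain of low-pass RC filters) and uses the standard matching of the first three driving-point admittance moments to C1, R, and C2. The function names and the ladder representation are hypothetical.

```python
def ladder_admittance_moments(c_near, segments):
    """Return (y1, y2, y3) such that Y(s) ~ y1*s + y2*s**2 + y3*s**3.

    c_near   : capacitance at the driving node, in farads (assumed input)
    segments : list of (R, C) pairs ordered from the driver to the far end;
               each pair is a series resistor followed by a shunt capacitor.
    """
    y1 = y2 = y3 = 0.0
    # Walk from the far end back toward the driver.
    for r, c in reversed(segments):
        y1 += c  # shunt capacitor at the far node of this segment
        # Looking through the series resistor: Y_up = Y / (1 + r*Y),
        # expanded as a power series in s and truncated after s**3.
        y1, y2, y3 = (y1,
                      y2 - r * y1 * y1,
                      y3 - 2.0 * r * y1 * y2 + r * r * y1 ** 3)
    return y1 + c_near, y2, y3


def pi_model_from_moments(y1, y2, y3):
    """Match three admittance moments to a Pi-model (C1, R, C2)."""
    c2 = y2 * y2 / y3          # far-end capacitance
    c1 = y1 - c2               # near-end capacitance
    r = -(y3 * y3) / (y2 ** 3) # series resistance
    return c1, r, c2


if __name__ == "__main__":
    # A ladder that is already a Pi-model must map back to itself.
    moments = ladder_admittance_moments(1e-15, [(100.0, 2e-15)])
    print(pi_model_from_moments(*moments))
```

A ladder with a single segment is itself a Pi-model, so the reduction should return the original C1, R, and C2; this is a convenient sanity check before trying longer ladders. Extending the traversal to trees with branches would require summing the downstream admittance series at each branching node.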

101 There are some advanced methods for MOR with much better scalability. We include this one just for reference. For example, PRIMA [78]-[80] is a Krylov-based [81] projection MOR method that guarantees generating stable models. In projection MOR methods, the equations of a circuit (e.g. MNA equations) are mapped to a set of simpler equations with the fewer number of dimensions and the new set of equations is solved instead. As a summary, our goal was to find all the T's in symbolic form as the most general solutions. Building such a library could reduce the problem of circuit analysis to a simple function evaluation. We used the first approach at the beginning to find MOR's using (MNA) using the symbolic calculation engine of Matlab versions (2010a-2010b); however, the matrix operations were prohibitive considering our limited computing resources. Therefore, we came up to a solution based on equivalent admittance as symbolic expression, which means the R s and C s in the expression are not substituted with their values. We started finding the equivalent admittance and performing MOR for a case with a capacitance connected to a low-pass RC filter, which is a Pi-model, to verify its feasibility and we later extend our method to perform MOR for a RCinterconnect segment with many low-pass RC filters in series; however, this time, the complexity of the complete complex admittance functions impacted scalability of the solution because of using symbolic expressions although we could find a general solution for each interconnect segment as a symbolic expression. Such general solutions enable us to build a library of MOR models for interconnect segments with different lengths. Using the general solutions, you can find a MOR model just by plugging the parameters of the interconnect segments (i.e. R's and C's) into each symbolic expression. The 77

symbolic expressions can grow very large as we increase the number of parameters of the interconnect segments, and this makes the computations slow and resource intensive. Since simplifying the complex admittance functions makes their complexity manageable, we substituted the R's and C's with their values in the data structures for each of our RC-interconnect segments (and, later, our RC-interconnect networks) prior to deriving a complex admittance function. Then, we used the moment matching techniques that we mentioned to obtain the equivalent Pi-models.

We categorized the interconnect networks that we needed to reduce in order to solve the problem for our specific cases of interconnect tree networks; however, the tool was designed to be expandable for future cases. Although we did not plan to make a tool to characterize interconnects based on their parameter variations, we built the foundation necessary to make variation-aware interconnect models. That means expanding our compact variation-aware methodology from covering only gates to covering RC-interconnect networks as well.

We built a tool to generate our ROM models for RC-interconnect networks because we did not have access to a MOR engine. To verify the correctness of our MOR models and our MOR engine, we compared the output voltage signals of each RC-interconnect network with those of the corresponding Pi-model. In an effort to estimate the output voltage of each RC-interconnect network, we constructed H'(s) as a 2-pole ROM for H(s) as follows:

\[ H'(s) = \frac{k_1}{s - p_1} + \frac{k_2}{s - p_2} \tag{4.6} \]

However, the voltage transfer function, H'(s), cannot be directly included in a netlist for circuit-level simulation; therefore, we designed a 2-port network with the same transfer function, as shown in Figure 4.10. We describe the circuit-level model details later and show the transfer function at this point. H'(s) is as follows:

\[ H'(s) = \frac{\left(R_x C_x\, x + R_y C_y (1 - x)\right) s + 1}{\left(R_x C_x R_y C_y\right) s^2 + \left(R_x C_x + R_y C_y\right) s + 1} \tag{4.7} \]

We used partial fraction expansion to determine the values of Rx, Cx, Ry, Cy, and x using the values of k1, k2, p1, and p2. We know that the time constant equations of H'(s) are as follows:

\[ T_1 = -1/p_1 \tag{4.8} \]
\[ T_2 = -1/p_2 \tag{4.9} \]

We used an arbitrary value for Cx and Cy and set the values of Rx, Ry, and x so that Equation (4.6) matches Equation (4.7). The corresponding equations to do this are as follows:

\[ C_x = \text{constant value (e.g. 1 fF)} \tag{4.10} \]
\[ C_y = \text{constant value (e.g. 1 fF)} \tag{4.11} \]
\[ R_x = T_1 / C_x \tag{4.12} \]
\[ R_y = T_2 / C_y \tag{4.13} \]
\[ x = \frac{(k_1 + k_2) + p_1}{p_1 - p_2} \tag{4.14} \]
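The mapping from the pole-residue form to the circuit parameters can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the default capacitances are the 1 fF example values, and the sum-of-two-filters form used for checking (with the weight x on the branch whose time constant is T2) is inferred from Equation (4.14) together with a unity DC gain rather than taken from the Hspice model.

```python
def hprime_circuit_params(k1, k2, p1, p2, cx=1e-15, cy=1e-15):
    """Compute (Rx, Ry, x) from residues k1, k2 and poles p1, p2.

    cx and cy are arbitrary constants (1 fF here, as in the example values).
    """
    t1 = -1.0 / p1                    # Equation (4.8)
    t2 = -1.0 / p2                    # Equation (4.9)
    rx = t1 / cx                      # Equation (4.12)
    ry = t2 / cy                      # Equation (4.13)
    x = ((k1 + k2) + p1) / (p1 - p2)  # Equation (4.14)
    return rx, ry, x


def h_pole_residue(s, k1, k2, p1, p2):
    """Evaluate the pole-residue form of Equation (4.6) at complex s."""
    return k1 / (s - p1) + k2 / (s - p2)


def h_two_filters(s, rx, cx, ry, cy, x):
    """Weighted sum of two first-order low-pass responses.

    Assumption: the weight x multiplies the (Ry, Cy) branch; this pairing is
    inferred from Equation (4.14), not read from the circuit schematic.
    """
    return (1.0 - x) / (1.0 + s * rx * cx) + x / (1.0 + s * ry * cy)


if __name__ == "__main__":
    # Hypothetical poles (rad/s) and residues chosen so the DC gain is 1.
    p1, p2 = -1e10, -5e10
    k1, k2 = 0.7e10, 1.5e10
    rx, ry, x = hprime_circuit_params(k1, k2, p1, p2)
    s = 2e10j
    print(rx, ry, x)
    print(h_pole_residue(s, k1, k2, p1, p2), h_two_filters(s, rx, 1e-15, ry, 1e-15, x))
```

Evaluating both forms at a few complex frequencies is a quick way to confirm that a given set of circuit values reproduces the reduced-order transfer function before it is placed in a netlist.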

104 With 5 parameters, the general form of H'(s) could be characterized in our compact variation-aware cell methodology to find the output voltage of each RC-interconnect network, while the Pi-model provides us with the input voltage of the RC-interconnect network. Since these 5 parameters are related to (and not independent from) the 3 parameters of the Pi-model and the relationship could be very complicated and different for different topologies of the interconnect, we decided to incorporate only the Pi-model in our cell characterization and to include the H'(s) models just for our circuit-level simulations to verify the correctness of our MOR models and our MOR engine. That means our cell model will provide us with the voltage at the input of the RC-interconnect network and we propagate the signals through the interconnect using our timing characterized models just for our specific type of RC-interconnect networks in our test circuit. We emphasize that our research goal is cell characterization and not necessarily interconnect characterization. Therefore, we characterized our RC-interconnect networks connecting each two gates in our test circuit by just changing the input waveform slope using Hspice considering the fact that we did not have access to an interconnect timing simulation engine. Such an engine could complement our cell models in a commercial tool implementation. Moreover, the Pi-models could be generated using a MOR engine from a commercial tool although we built our own engine to show the feasibility of our compact variation-aware models methodology for resistive-capacitive loads. Figure 4.10 shows the Hspice model that we made to verify the correctness of our H'(s) model. Here, Rx and Cx constitute the lower low-pass RC filter and Ry and Cy form the upper low-pass RC filter. The lower low-pass RC filter is fed by a voltage- 80

105 controlled voltage source with the gain x while the upper low-pass RC filter is fed by a voltage-controlled voltage source with the gain (1-x). The output of the two filters are added algebraically using a unit gain voltage-controlled voltage source. H'(s) will convert the input voltage of the RC-interconnect network to its output. This conversion can be performed by an interconnect simulation engine in a CAD tool internally, but we made our own engine to generate H'(s) functions for the RC-interconnect networks in our test circuit for us. It is important to mention that the voltage-controlled voltage sources prevent the 2-port model to load Vin'(s) or to be loaded by the next level logic; consequently, we do not distort the waveform shapes because of loading. Figure Our H'(s) is a 2-pole and 1-zero reduced order model for H(s). We verified the correctness of our Pi-model and H'(s) model using a test bench that include both of the following circuits: (a) an inverter connected to each RC-interconnect network (e.g. the top picture of Figure 4.9) and 81

(b) another instance of the inverter with the same set of parameters connected to the equivalent Pi-model and H'(s) model of the circuit (a), as shown in Figure

We changed the parameters of the circuit using the 2-level full-factorial design of Table 4.9, performed transient analysis on both circuits, and captured the inverter (and the interconnect) output waveform transition time and delay errors (with respect to the original circuit), which are displayed in Table 4.10, and the inverter output waveform transition timing points of 10%, 50%, and 90% plus the equivalent resistance errors, as shown in Table 4.11. All the errors are relative to the value of the corresponding parameters of the circuit (a).

Table 4.9. The 2-level full-factorial design variation parameters for verifying Pi-models and H'(s) models.

Variable   Variation                    Variable   Variation
Lp         -20% to 20%                  Ln         -20% to 20%
Vtp        -20% to 20%                  Vtn        -20% to 20%
T          0 °C to 70 °C                C1         Fixed for each network
Vdd        -10% to 10%                  R          Fixed for each network
Slope      10 ps to 3 ns (for slope)    C2         Fixed for each network

According to Table 4.10, all the output waveform transition time errors are less than 1% for both the inverter and the RC-interconnect network. Please note that the Tplh and Tphl columns for the RC-interconnect networks actually refer to their input-to-output delay because the transition directions of the input and the output of RC-interconnect networks are the same. For the RC-interconnect networks the average errors are almost 0%. In Table 4.11, we compare the inverter output waveform transitions at their 50% and 90% points for a rising output waveform transition and at their 10% and 50% points for a falling output transition. All the average errors are less than 0.07%. All the maximum errors but

one are less than 1%. We also compare the equivalent resistance for both rising and falling output waveform transitions of the inverter as a secondary measure to know if the inverter output waveform transition for the circuit (a) matches that of the circuit (b). This resistance is the average output resistance over 10-90% of the inverter output voltage that the inverter sees when it is connected to the original RC-interconnect network in the circuit (a) or to its Pi-model in the circuit (b). The average errors are less than 1%; however, the maximum error is about 91%. Since the average is very small, the error(s) could appear in one or a few cases of the $2^7 = 128$ cases for each of the 11 RC-interconnect networks.

Table 4.10. Comparing output waveform transition time and delay errors (%) of our Pi-model and H'(s) model for all 11 RC-interconnect networks.

Output of                RiseTime   Tplh (delay)   FallTime   Tphl (delay)
Inverter       Average
               Max
Interconnect   Average
               Max

We also observed that our Hspice simulator decreases the resolution in an effort to minimize the simulation time. This could put small spikes, as computation noise, on the current through the interconnect or our Pi-model. The spikes were comparable in magnitude to the current, and consequently made the instantaneous current very small and caused very large instantaneous resistance values, making the average resistance much larger than it should be. By increasing the resolution of the simulation, we could avoid most of such wrong values; however, we allowed Hspice to set its optimum resolution for all 1408 (128*11) transient analysis simulations, and we did not review all the 128 inverter output waveform transitions for all the 11 RC-interconnect networks because the average error was very small.

Table 4.11. Comparing the inverter output waveform transition timing point errors (%) at the 10%, 50%, and 90% points and equivalent resistance errors (%) of our Pi-model and H'(s) model for all 11 RC-interconnect networks.

Output Transition      T(10%)   T(50%)   T(90%)   Req(10-90%) (Ω)
Rising     Average
           Max
Falling    Average
           Max

We showed that our Pi-models and H'(s) are very accurate; therefore, we use our Pi-model to characterize a cell in the next section.

Cell Characterization with a Pi-Model Load

We solved the problem of mapping the interconnect segments to their corresponding Pi-models. We now show how to perform the cell characterization for an inverter with a complex load. Using a Pi-model for the load enables representing a resistive-capacitive load with only 3 parameters. In our previous compact variation-aware cell models in Chapter III, we included only one parameter for loading in the variation model parameters, which is Fanout, as we showed in Table 3.1. We also decided to use just a 1-parameter waveform transition shape, i.e. Slope, instead of our 2-parameter waveform transition shape. This allows us to reduce the dimension of the new model by one and to show that our methodology for building a compact variation-aware cell model does not necessarily have to use a multi-parameter compact waveform model such as our PCA waveform model.

Table 4.12 lists the new set of parameters along with their ranges of variation. Lp and Ln are the channel lengths of the PMOS and NMOS transistors. Vtp and Vtn are the variations in the threshold voltages of the PMOS and NMOS transistors. Vdd is the supply voltage. T is the temperature variation parameter. C1, C2, and R are the capacitances and the resistance of

a Pi-model, which approximates the resistive-capacitive load of an RC-interconnect network. Slope is the 0% to 100% transition time of the saturated-ramp input waveform, while we measure the 10% to 90% transition time of the output waveform and adjust it to 0% to 100% when we use it.

Table 4.12. Variation model parameters for (resistive-capacitive) Pi-model loads.

Variable   Variation                      Variable   Variation
Lp         -20% to 20%                    Ln         -20% to 20%
Vtp        -20% to 20%                    Vtn        -20% to 20%
T          0 °C to 70 °C                  C1         Range of our samples *
Vdd        -10% to 10%                    R          Range of our samples *
Slope      10 ps to 300 ps (for slope)    C2         Range of our samples *

* C1, R, and C2 span the ranges (in fF, Ω, and fF, respectively) of our samples. The spread of C1, R, and C2 is available in Appendix D.

We use a method similar to the standard cell characterization method that we described in Chapter III. Using a 2-level 10-parameter full-factorial design, we define the relationship between the 10 input parameters and each of the output parameters:

(a) cell output-transition time, i.e. RiseTime or FallTime, which is the 10% to 90% output waveform transition time, and
(b) cell delay, i.e. TPLH or TPHL, which is the time difference between the 50% time of the input and the 50% time of the output.

The relationship between the input parameters and output parameters is computed using Yates's algorithm [43], which is very efficient compared to linear regression. We have two sets of the mentioned output parameters: one for the rising output transition, which are GateRiseTime(.) and GateTPLH(.), and the other one for the falling output transition, which are GateFallTime(.) and GateTPHL(.). We generate the following functions as multivariable polynomials with 1024 terms (an average and 1023 effects):

GateRiseTime(Vdd, Slope, C1, R, C2, T, Ln, Lp, Vtn, Vtp) = ...    (4.15)
GateTPLH(Vdd, Slope, C1, R, C2, T, Ln, Lp, Vtn, Vtp) = ...        (4.16)
GateFallTime(Vdd, Slope, C1, R, C2, T, Ln, Lp, Vtn, Vtp) = ...    (4.17)
GateTPHL(Vdd, Slope, C1, R, C2, T, Ln, Lp, Vtn, Vtp) = ...        (4.18)

We call these our FF models, but we do not show the actual polynomials here for brevity because they are very long. We simplify the multi-variable polynomials using analysis of variance [43] to include the most significant terms up to 2 and 3 factors to build our FF2 and FF3 models. Table 4.13 shows all of our full-factorial models.

Table 4.13. Designs used for cell model construction with a Pi-model load.

Design           Points       Model Terms                    Abbr.
Full Factorial   ($2^{10}$)   All factors                    FF
                 ($2^{10}$)   Significant up to 3 factors    FF3
                 ($2^{10}$)   Significant up to 2 factors    FF2

We compared the adequacy of the models using the following criteria:

(a) The sum of squares of the residuals [43], $S_R^2$: They are compared in Table 4.14. The smaller the sum of squares of the residuals, the better the fit of the model; therefore, the FF models are the best and the FF2 models the worst.

(b) The coefficient of multiple determination [53], $R^2$, and the adjusted coefficient of multiple determination [53], $R^2_{adj}$: They are compared in Table 4.15. The closer the numbers are to 100%, the more adequate the models. While the values for all the models are more

than 93%, the values for FF3 models are at least 98%, which is very close to the original FF models.

(c) The coefficient of multiple determination for prediction [53], $R^2_{predict}$: They are compared in Table 4.16. The closer the values are to 100%, the better the prediction capability of the models. This criterion should be used when at least one of the terms is not present in the polynomials of the models; therefore, it is not applicable for assessing the original FF models because all the terms have been used. We indicated this with an * in the table just to make the table consistent with the other tables. The values for all the models based on FF2 and FF3 are more than 92%; however, the values for FF3 models are at least 98%.

Table 4.14. Sum of squares of residuals for cell models.

Model   RiseTime   Tplh   FallTime   Tphl
FF
FF3
FF2

Table 4.15. Comparing the model adequacy of the full-factorial models using the coefficient of multiple determination (%) and the adjusted coefficient of multiple determination (%) in parentheses.

Model   RiseTime   Tplh      FallTime   Tphl
FF      100 (-)    100 (-)   100 (-)    100 (-)
FF3     99 (99)    99 (99)   98 (98)    98 (98)
FF2     95 (95)    93 (93)   96 (96)    95 (95)

Table 4.16. Comparing the model prediction accuracy of the full-factorial models using the coefficient of multiple determination for prediction.

Model   RiseTime   Tplh   FallTime   Tphl
FF      *          *      *          *
FF3
FF2
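For readers who want to reproduce the adequacy measures, the sketch below computes the sum of squares of the residuals, the coefficient of multiple determination, and its adjusted version using their textbook definitions. It is a minimal Python illustration, not the code used to produce the tables; the function name and the toy data are hypothetical. When a model has as many terms as there are runs (as with a saturated full-factorial model), no residual degrees of freedom remain and the adjusted coefficient is undefined, which is presumably why a dash appears for the FF rows.

```python
def adequacy_measures(y, y_hat, n_terms):
    """Return (S_R, R2, R2_adj) for observed y and model predictions y_hat.

    n_terms counts all fitted coefficients, including the average.
    R2_adj is None when no residual degrees of freedom remain.
    """
    n = len(y)
    mean = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((a - mean) ** 2 for a in y)              # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    dof = n - n_terms
    if dof <= 0:
        r2_adj = None
    else:
        r2_adj = 1.0 - (ss_res / dof) / (ss_tot / (n - 1))
    return ss_res, r2, r2_adj


if __name__ == "__main__":
    # Toy data, for illustration only.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.1, 3.9]
    print(adequacy_measures(observed, predicted, n_terms=2))
```

The adjusted coefficient penalizes the number of retained terms, so it is the more informative of the two when comparing reduced models of different sizes.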

We can list all the full-factorial models based on adequacy and prediction accuracy from the best to the worst as: FF, FF3, and FF2. Please note that the values for FF3 are much closer to FF than the values for FF2 in general.

Accuracy is not the only factor for choosing the models. We need to consider the size of the models (as a measure of space complexity) as well as their execution times (as a measure of time complexity). Table 4.17 shows the number of terms (i.e. the number of coefficients) in each equation for each function as well as the number of operations (total additions and multiplications) needed for their execution. By studying the table, we observe that the FF model is the poorest in both space complexity and time complexity, while FF2 is the best in both respects. We see that FF3 is placed somewhere between FF2 and FF, but it is much closer to FF2.

Table 4.17. Number of terms (number of operations) in cell models.

Model   RiseTime      Tplh          FallTime      Tphl          Total
FF      1024 (6144)   1024 (6144)   1024 (6144)   1024 (6144)   4096 (24576)
FF3     87 (296)      71 (273)      81 (274)      59 (196)      298 (1003)
FF2     37 (101)      32 (85)       35 (94)       24 (63)       128 (143)

Therefore, there is a trade-off between accuracy and both space complexity and time complexity. FF3 seems to be the optimal choice because it exhibits an accuracy very close to that of FF and a space complexity and time complexity very close to those of FF2. However, all three models can be used depending on our priorities.

Although we demonstrated the cell characterization for an inverter with one input, the methodology can be generalized to multi-input cells by adjusting the number of input parameters and building the models for each timing arc, which is defined as the path of signal change propagation from each input to the output.
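The full-factorial models compared above are built from a 2-level design with Yates's algorithm. The following Python sketch shows the algorithm on a small design; it is an illustration under assumptions rather than the characterization code itself. The responses are assumed to be listed in standard (Yates) order with the first factor varying fastest, the factors are coded as -1 and +1, and the function names are hypothetical. The returned values are polynomial coefficients (the average followed by half-effects), so a 10-factor design yields 1024 of them.

```python
def yates_coefficients(responses):
    """Return the 2**k polynomial coefficients of a 2-level full-factorial fit.

    responses : list of 2**k measured values in standard order.
    Index 0 of the result is the average; bit j of any other index marks
    whether factor j takes part in that term.
    """
    n = len(responses)
    k = n.bit_length() - 1
    if n == 0 or (1 << k) != n:
        raise ValueError("the number of responses must be a power of two")
    col = list(responses)
    for _ in range(k):
        sums = [col[i] + col[i + 1] for i in range(0, n, 2)]
        diffs = [col[i + 1] - col[i] for i in range(0, n, 2)]
        col = sums + diffs
    return [c / n for c in col]


def evaluate_model(coeffs, levels):
    """Evaluate the fitted polynomial at coded factor levels (each in [-1, 1])."""
    total = 0.0
    for idx, c in enumerate(coeffs):
        term = c
        for j, level in enumerate(levels):
            if (idx >> j) & 1:
                term *= level
        total += term
    return total


if __name__ == "__main__":
    # Two factors A and B with runs in the order (1), a, b, ab.
    coeffs = yates_coefficients([5.5, 8.5, 10.5, 15.5])
    print(coeffs)                            # average, A, B, AB
    print(evaluate_model(coeffs, [1, -1]))   # reproduces run "a"
```

Dropping the coefficients judged insignificant (for example, all terms involving more than 2 or 3 factors) before calling the evaluation function gives reduced models in the spirit of FF2 and FF3, with correspondingly fewer operations per evaluation.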

RC-Interconnect Network Characterization

We showed previously how we can find H'(s), which is the MOR model for an interconnect transfer function H(s). In this section, we describe possible approaches to perform RC-interconnect characterization and our selected approach. In general, having a 2-pole 1-zero H'(s), we can find h'(t) in the time domain as an equation in the following form:

\[ h'(t) = k_1 e^{p_1 t} + k_2 e^{p_2 t} \tag{4.19} \]

Here, p1 and p2 are the poles of H'(s), and k1 and k2 are the constants for each component. Using this equation, it is possible to obtain the delay of the RC-interconnect network using the analytical step response and ramp response of h'(t); however, the delay equation must be solved using numeric methods since there is no closed-form formula for it [69]. The step and ramp response equations as functions of time are as follows:

\[ v_{step}(t) = 1 + \frac{k_1}{p_1} e^{p_1 t} + \frac{k_2}{p_2} e^{p_2 t} \tag{4.20} \]

\[ v_{ramp}(t) = \sum_{i=1}^{2} \frac{k_i}{t_r\, p_i^2} \left[ \left(e^{p_i t} - 1 - p_i t\right) u(t) - \left(e^{p_i (t - t_r)} - 1 - p_i (t - t_r)\right) u(t - t_r) \right] \tag{4.21} \]

Here, $t_r$ is the input-transition time of the saturated ramp used as the input, and the approximate delay of an RC-interconnect network for an arbitrary waveform transition can be found by mapping the input waveform transition to a saturated ramp.

For example, to find the delay using one of the equations, we should solve the equation for the time when the value of the voltage is at 50%. We can find the delay by

subtracting the time for 50% of the input voltage from the calculated time, i.e. the time for 50% of the output voltage. Moreover, it is possible to find the slope of the output voltage similarly by subtracting the time for 90% (10%) of the output voltage from the time for 10% (90%) of the output voltage for a falling (rising) waveform transition. Therefore, the following formulas can be used to find the 50% input to 50% output delay and the 10% to 90% output slope for falling and rising output waveform transitions:

\[ delay = T(V_{out} = 50\%) - T(V_{in} = 50\%) \tag{4.22} \]
\[ slope_{falling} = T(V_{out} = 10\%) - T(V_{out} = 90\%) \tag{4.23} \]
\[ slope_{rising} = T(V_{out} = 90\%) - T(V_{out} = 10\%) \tag{4.24} \]

Because there are no such simple closed-form solutions for an arbitrary input waveform transition, we decided to characterize our interconnect segments using Hspice. If we could have found a method to characterize RC-interconnect networks using our compact statistical waveform model, we could have included it in our cell characterization methodology to determine the approximate output waveform transition of an RC-interconnect network for its input waveform transition, which is the output waveform transition of the gate feeding the RC-interconnect network. As we explained before, we decided to include this as an item in our future research direction and to use a 1-parameter saturated ramp waveform transition model to examine if our methodology could still be applicable and, if so, what modifications were necessary, if any.

Therefore, instead of solving the ramp response equation numerically to find the output transition times and delays, our RC-interconnect characterization was done by

sweeping the input slope to obtain the output slope and delay of each RC-interconnect network in our test circuit, which we will describe in the next section, and storing the response surfaces in tables. We use linear interpolation to estimate the response points not recorded in the tables using the following functions:

InterconnectTransitionTime(C1(i), R(i), C2(i), Slope)    (4.25)
InterconnectDelay(C1(i), R(i), C2(i), Slope)             (4.26)

By comparing the output transition time and delay functions with those of the cells that we developed before, we observe the following differences:

(a) We have only one output transition function that covers both the rising and falling transitions.
(b) We have just one delay function that is good for both types of delay for each type of input transition.
(c) Here, the functions are not dependent on supply voltage.
(d) Here, the functions are not dependent on temperature.

The reason for all of the above is that our RC-interconnect networks are linear systems consisting of ideal resistors and capacitors. Although real resistors and capacitors are temperature dependent, we did not include this dependence in the characterization because we did not have temperature-dependent resistor and capacitor models.

Since our goal was just to show the effectiveness of our method for cell characterization and not interconnect characterization, we characterized our specific interconnects just for Slope. At the beginning, we were looking for a compact model

116 applicable to all types of RC-interconnect networks. To form such a model, we need a few independent parameters to represent an arbitrary RC-interconnect network. We can use the parameters p1, p2 and one of k1 or k2 (or Rx, Ry, and x) and add a few parameters for the waveform transition, or just one parameter as Slope in its simplest form, to build a set of 2-level full-factorial compact models for the output transition time and delay of an arbitrary RC-interconnect network. Please note that k1 and k2 are interdependent because of initial conditions; therefore, we should use one of them. To build reasonably accurate models, we need to determine the optimum range of the parameters in the model because the accuracy of our statistical models is dependent on the range of the data that we build the models for. We did not verify if such models could be built with good accuracy and left answering to this question as an item for our future research direction, i.e. building compact interconnect timing models Test Circuit and its Pi-Model-Converted RC-Interconnect Networks Our test circuit is the clock tree of a JPEG2 Encoder which was synthesized and mapped into NCSU 45 nm standard cells at one of the laboratories of the Georgia Institute of the Technology. Figure 4.11 depicts the 10-level tree that we used to verify our compact variation-aware cell models. Although the original tree is not a binary tree, we have shown a binary subtree with 1024 leaves for simplification. The tree consists of minimum sized inverters connected by RC-interconnect networks. We have expanded just the first level of the tree and have showed several paths of the tree from the starting inverter to the final load, which is shown as a box. The first RC-interconnect network is fed by another inverter, which is not part of the tree. The tree has about 500 inverters and 500 RC-interconnect networks with about 6200 resistors and 6500 capacitors. We have 92

117 marked one of the paths to show a sample path that can be chosen to perform a pathoriented static timing analysis. It starts from the root of the tree and ends at one of the final leaves. Figure Our test circuit is a JPEG2 Encoder clock tree. Usually several critical paths should be included in static timing analysis, but we can use just the slowest path without losing generality. The slowest path was determined using Silicon Encounter [82] by applying our slowest transition to the input of the clock tree. The tool sorts the clock tree branches according to their node delays. In Figure 4.12 the slowest path is the topmost one and the fastest path is at the bottommost one. Supposing the highlighted path in Figure 4.11 is the slowest inverter chain, we show its details in Figure We have 11 inverters, out of which, only the first one is not part of the tree, but has been included to drive the first interconnect network. Each 93

118 inverter is connected to the next one through an RC-interconnect network. Here, we just show the input and the specific output of the RC-interconnect network that connects to the next inverter while the other interconnect network outputs are terminated with the input capacitance of the cell connected to them. Figure Choosing one of slowest critical paths of the JPEG2 Encoder clock tree. Each RC-interconnect network consists of one or more RC-interconnect segment(s) connected to each other. We define an RC-interconnect segment as a series of RC lowpass filters with one input and one output. We had access to the Spice netlist of our JPEG2 Encoder as well as its SPEF file. The total number of capacitors and resistors in the all RC-interconnect networks in the 94

119 selected critical path is 220 and 200, respectively. We could partition the RCinterconnect networks using the SPEF file and mapped them into a set of interconnect segments that we encountered in the selected critical path. We categorized them into 5 types, which are shown in Figure While the academic MOR tool that we made using Matlab, and named it GT_MOR, supports only these interconnect network types, support for other type of networks can be implemented in future versions. Figure The abstracted inverter chain used for our timing analysis. Each RCinterconnect network is made of one or more RC-interconnect segment(s) and each interconnect segment is made of one or more cascaded RC low-pass filters. We used the SPEF file and Spice netlist to extract the critical path netlist and its RCinterconnect networks to create the inverter chain Hspice netlist and our Hspice cell characterization test bench. Moreover, we stored the interconnect networks as Hspice subcircuits to be able to compare the accuracy of our Pi-models with the original RC- 95

120 interconnect networks. It is important to remember not to include the load capacitance of the next gate in the each interconnect subcircuit when we instantiate them in the inverterchain netlist to avoid having duplicated loads because the input capacitance of the gate is already present. It is necessary to include this capacitance for finding the equivalent Pimodel as well as Y'(s) and H'(s). For each RC-interconnect network, we replace it with its equivalent Pi-model with Y'(s) admittance. Since we cannot get the interconnect output waveform at the other end of the Pi-model, we had to include our voltage transfer model, H'(s). That means we replaced each interconnect network with a Pi-model and connected the gate output voltage to the input of our H'(s) Spice network, which is shown in Figure This transformation is depicted in Figure 4.15, which makes our Pimodel-converted interconnect networks. Figure The interconnect network types reduced to a Pi-models by our tool (GT_MOR). 96

In the next section, we will describe the timing analysis engine that we built to assess our methodology.

Figure 4.15. To test our Pi-model in the inverter chain at Spice level, the RC-interconnect networks were replaced by sets of Pi-model and H'(s).

Timing Analysis Engine for Our Cell Models and RC-Interconnect Models

We have built our cell models and RC-interconnect models. We can use them to perform static timing analysis. We find the total delay for our 11-stage inverter chain, which is shown in Figure 4.13, after replacing each RC-interconnect network by its Pi-model.

Our static timing analysis engine was coded in the C programming language using the algorithm shown in Figure 4.16. The algorithm uses a two-step delay approximation [69], in which a delay has two components added together: a gate delay and a Pi-interconnect model delay. Please note that the algorithm is shown for the falling input waveform transition; the algorithm for the rising input transition is similar, being the dual of this one. The algorithm is based on the following principles:

(a) The gate delays and output transition times for even (odd) stages are given by GateTPLH (GateTPHL) and GateRisetime (GateFalltime), respectively.

(b) The RC-interconnect network Pi-model delays and output transition times, regardless of the transition direction, are given by InterconnectDelay and InterconnectTransitionTime, respectively. The parameters (Cipi, Rpi, and Cjpi) are fixed for each stage.
(c) Propagation of an input waveform transition time through a gate (RC-interconnect network Pi-model) determines its output waveform transition time using GateTPHL and GateTPLH (InterconnectDelay).
(d) Each gate (RC-interconnect network Pi-model) input transition time is equal to its previous RC-interconnect network Pi-model (gate) transition time. This determines all the gate and RC-interconnect network Pi-model input transition times.
(e) Having all the input waveform transition times, all gate delays and all RC-interconnect network Pi-model delays can be determined.
(f) Adding each gate delay to its next RC-interconnect network Pi-model delay determines its stage delay.
(g) The total delay is the sum of all the stage delays.

A waveform transition time, Tr, is the time between the 10% and 90% points of the supply-voltage transition, which has been mapped into a saturated ramp. For a fixed supply voltage, there is a one-to-one correspondence between a 10-90% transition time and its slope as follows:

Slope = 1.25 * Tr(10-90%)    (4.27)

In general, Tr can be substituted with a multi-parameter waveform model, like our PCA waveform model with 2 parameters, and the waveform parameters propagate

through the chain. Delays must then be a function of the parameters of the waveform model. In this algorithm, Tr has been mapped into a 1-parameter saturated ramp.

TotalDelay(0) = 0;
Tr(0) = Falltime10to90Percent(0);
for all stages (i = 0 to 10) {
    if (i is even) {
        GateDelay(i) = GateTPLH(C1(i), R(i), C2(i), Tr(i));
        GateOutputTransitionTime(i) = GateRisetime(C1(i), R(i), C2(i), Tr(i));
    } else {
        GateDelay(i) = GateTPHL(C1(i), R(i), C2(i), Tr(i));
        GateOutputTransitionTime(i) = GateFalltime(C1(i), R(i), C2(i), Tr(i));
    }
    InterconnectDelay(i) = InterconnectDelay(C1(i), R(i), C2(i), Tr(i));
    InterconnectOutputTransitionTime(i) = InterconnectTransitionTime(C1(i), R(i), C2(i), Tr(i));
    StageDelay(i) = GateDelay(i) + InterconnectDelay(i);
    TotalDelay(i+1) = TotalDelay(i) + StageDelay(i);
    Tr(i+1) = InterconnectOutputTransitionTime(i);
}

Figure Timing simulation algorithm with RC-interconnect networks support.

This algorithm is a generalization of Equations 3.10 that we used for our tabular static timing analysis engine, but our cells are characterized in a way that is very similar to what we did for our cell models when we used our PCA waveform model. The only

simplification is the replacement of the PCA waveform model with a saturated ramp waveform model. We may lose some accuracy by using a simpler waveform model, but it allows us to concentrate on interconnect modeling for cell characterization within the framework of an academic research project. As a future plan, a multi-parameter waveform model could be used in conjunction with the Pi-model for cell characterization and timing analysis; however, the waveform model should be capable of covering the large range of waveform shape variations caused by interconnects. The authors of [83] emphasized the need for an effective waveform for cell characterization and introduced a moment-based solution that fits the best saturated waveform to a sample of input waveforms by solving an optimization problem.

Timing Simulation and Simulation Results

We performed static timing analysis (STA) on our test circuit, the inverter chain, and compared the results for the following methods:

Method 1 (Hspice): Circuit-level STA with the original RC-interconnect networks. In this method, the original inverter chain is simulated at the circuit level using Hspice. The results of this method serve as our golden model results and are used to measure the accuracy of Method 2 and Method 3.

Method 2 (Pi-Model): Circuit-level STA with our Pi-model-based interconnect loads and our H'(s) circuit model. In this method, the RC-interconnect networks in the inverter chain are replaced by their corresponding Pi-models, which were built using our Pi-model generation engine. As mentioned before, Pi-models cannot provide the output waveform transition at the end of each RC-interconnect network; however, they can provide the output waveform transition of the inverter feeding the RC-interconnect network. Hence, our H'(s) model, implemented as a circuit as in Figure 4.10, is used to generate the output waveform of each RC-interconnect network. The simulations, performed at the circuit level, enable us to evaluate the accuracy of our Pi-models and our H'(s) by comparing the results with those of Method 1.

Method 3 (FF-Model and its family, i.e. FF3-Model and FF2-Model): High-level STA with our compact variation-aware cell models and interconnect models. In this method, our high-level models are used. The inverters are replaced with our compact variation-aware models and the RC-interconnect networks are replaced with our interconnect-characterized timing models. Since our compact variation-aware cell model is the high-level timing model of a cell with a Pi-model load, comparing the simulation results with those of Method 2 reveals how much inaccuracy originates from the abstraction of our circuit-level model by our timing characterization methodology, which is based on 2-level full-factorial designs and the analysis of variance. Comparing the results of this method with those of Method 1 determines the accuracy of our methodology considering both the high-level cell modeling and the Pi-model RC-interconnect network abstraction used in it.

Using all 3 methods, we performed a set of 24 STA simulations, i.e. 12 for the rising input waveform transition and 12 for the falling input waveform transition, for the following cases.

Case a: Nominal parameter values condition: The process parameters (Lp, Ln, Vtp, Vtn) and environment parameters (T and Vdd) are kept at their nominal values and only the waveform transition shape parameter (Slope) varies linearly over its range in Table. The nominal parameter values for our 2-level full-factorial designs are the center

values of parameters. The Pi-model parameters (C1, R, and C2) are fixed for each RC-interconnect network.

Case b: Random parameter values condition: All process parameters (Lp, Ln, Vtp, Vtn), the environment parameters (T and Vdd), and the waveform transition shape parameter (Slope) vary randomly over their ranges in Table. The Pi-model parameters (C1, R, and C2) are fixed for each RC-interconnect network.

Figure 4.17 compares the total delays at each stage through the inverter chain for our three methods for a falling input transition under (a) nominal parameter values conditions and (b) random parameter values conditions. Although we have included only one plot per case from the set of 24 STA simulations in this document, we studied all the plots and concluded that both the Pi-model-based and FF-model-based simulations track the results of the Hspice simulations reasonably well; however, all the Pi-model-based simulations overestimated the delays and all the FF-model-based simulations underestimated the delays. Since we characterize our cells using a saturated ramp, the cell delay can be underestimated when an actual input waveform transition is abstracted as a saturated ramp. Figure 4.18 shows a situation in which characterizing a cell using a saturated ramp waveform model can be troublesome. We use the 10% and 90% points of the waveform transition to find the transition time and slope; the 50% point of a saturated ramp always lies midway between the 10% and 90% points, while the 50% point of a real waveform shape does not necessarily do so. Consequently, the delay estimates have some error due to the imperfection of the cell characterization used at the circuit level.
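
The mismatch between the 50% point of a fitted ramp and that of a real waveform can be illustrated with a small numerical sketch. It assumes a hypothetical single-pole exponential waveform, v(t) = 1 - exp(-t/tau), which is not taken from our simulations, and it reads "Slope" in Equation 4.27 as the full 0-to-100% duration of the fitted ramp; both are assumptions made only for illustration.

```python
import math

def ramp_full_transition_time(tr_10_90):
    """Equation 4.27: full 0-100% ramp duration from the 10-90% transition time."""
    return 1.25 * tr_10_90

def exp_crossing_time(fraction, tau):
    """Time at which v(t) = 1 - exp(-t/tau) reaches `fraction` of its final value."""
    return -tau * math.log(1.0 - fraction)

tau = 1.0  # hypothetical RC time constant (arbitrary units)
t10 = exp_crossing_time(0.10, tau)
t50 = exp_crossing_time(0.50, tau)
t90 = exp_crossing_time(0.90, tau)

tr = t90 - t10                  # measured 10-90% transition time
ramp_t50 = 0.5 * (t10 + t90)    # a ramp through the 10% and 90% points crosses 50% midway
shift = ramp_t50 - t50          # about 0.51 * tau for this waveform

print(f"Tr(10-90%) = {tr:.4f}, ramp duration = {ramp_full_transition_time(tr):.4f}")
print(f"actual 50% point = {t50:.4f}, ramp 50% point = {ramp_t50:.4f}, shift = {shift:.4f}")
```

For this waveform shape the fitted ramp places the 50% point later than the actual waveform does, so any delay measured from the ramp's 50% point is shifted by the same amount.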

Figure Comparison of delay for the three methods (Hspice, Pi-Model, and FF) at (a) nominal parameter values and (b) random parameter values.
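
The accumulated per-stage delays compared in the figure above follow from stepping through the chain as in our timing simulation algorithm. The following is a minimal Python sketch of that loop, not our C engine: the function `chain_timing`, the dictionary keys, and the toy delay models are hypothetical placeholders, and the Pi-model values are arbitrary. As in the algorithm figure, every model is evaluated with the stage's Pi-model parameters and the stage input transition time.

```python
def chain_timing(pi_params, tr0, models):
    """Accumulate stage delays along an inverter chain for a falling input.

    pi_params: list of (c1, r, c2) Pi-model tuples, one per stage.
    tr0: 10-90% transition time of the input to stage 0.
    models: dict of callables f(c1, r, c2, tr) -> float with the keys
            'tplh', 'tphl', 'rise', 'fall', 'wire_delay', 'wire_tr'.
    """
    total = 0.0
    tr = tr0
    stages = []
    for i, (c1, r, c2) in enumerate(pi_params):
        if i % 2 == 0:   # even stage: falling input, rising output
            gate_delay = models["tplh"](c1, r, c2, tr)
            gate_tr = models["rise"](c1, r, c2, tr)
        else:            # odd stage: rising input, falling output
            gate_delay = models["tphl"](c1, r, c2, tr)
            gate_tr = models["fall"](c1, r, c2, tr)
        wire_delay = models["wire_delay"](c1, r, c2, tr)
        wire_tr = models["wire_tr"](c1, r, c2, tr)
        stage_delay = gate_delay + wire_delay
        total += stage_delay
        stages.append({"gate_tr": gate_tr,
                       "stage_delay": stage_delay,
                       "total_delay": total})
        tr = wire_tr     # the next gate sees the interconnect output transition
    return stages

def toy_model(base, k_rc, k_tr):
    """Hypothetical linear stand-in for a characterized timing model."""
    return lambda c1, r, c2, tr: base + k_rc * r * (c1 + c2) + k_tr * tr

toy_models = {
    "tplh": toy_model(1.0, 0.10, 0.20),
    "tphl": toy_model(0.9, 0.10, 0.20),
    "rise": toy_model(0.5, 0.20, 0.10),
    "fall": toy_model(0.4, 0.20, 0.10),
    "wire_delay": toy_model(0.1, 0.30, 0.00),
    "wire_tr": toy_model(0.2, 0.40, 0.50),
}

chain = chain_timing([(1.0, 2.0, 1.5)] * 11, tr0=0.1, models=toy_models)
print(f"total delay after {len(chain)} stages: {chain[-1]['total_delay']:.3f}")
```

In practice the placeholder callables would be replaced by the characterized cell and interconnect timing models; the loop structure is unchanged.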

Figure Cell characterization using a saturated ramp could result in a delay error.

In STA, overestimated delays are more desirable than underestimated delays. According to a recent conference publication by well-known timing analysis researchers from IBM [83], fitting a saturated ramp waveform for cell characterization can result in up to 16% delay variation for one cell. In our work, which uses a standard saturated ramp waveform for cell characterization, the maximum per-stage delay error is 2.2% for the nominal parameter values case and 2.9% for the random parameter values case. While the authors of [83] provide their own solution for better fitting of a saturated ramp toward better cell characterization, they reported that using the 20% to 80% waveform transition points makes it possible to overestimate the transition time and slope. Moreover, it is possible to overestimate the delay in cell characterization by adding some margin to the delays, e.g. by applying a falling input waveform transition to an inverter and measuring the delay as the time from the 60% point (instead of the 50% point) of the input waveform to the 40% point (instead of the 50%

point) of the output waveform transition. We believe that using a better waveform model for characterization can improve the accuracy of timing models, and we proposed our PCA waveform models for this reason; however, within our academic research framework we had to limit the scope of our research. Although our methodology gives reasonable delay estimates, they can be improved using some of the techniques mentioned above.

To better understand the delay error trends, we plotted the relative delay error per stage, in percent, for Method 2 (Pi-model) and Method 3 (FF-model and its family, i.e. FF3-model and FF2-model) for the inverter chain. Note that the delays obtained from Method 1 (Hspice) were used as the reference for the relative delay errors. Figure 4.19 shows the error per stage, in percent, for (a) the nominal parameter values condition and (b) the random parameter values condition. We have included one input transition waveform direction for each case, i.e. a low-to-high (high-to-low) output transition of the first gate for the first (second) case. The plots for the other input transition waveforms are similar. We can draw the following conclusions from studying the plots.
(a) The errors per stage for the Pi-model are positive while the errors per stage for the FF-models are negative. This means that the Pi-Model overestimates the total delay while the FF-Model and its family underestimate the total delay.
(b) The errors per stage for the Pi-model are less than 0.5% for most cases and less than 0.8% for all the cases; therefore, our Pi-models are very accurate.
(c) The errors per stage for the FF-model and its family are less than 3%. Our FF-model and its family are less accurate than our Pi-model, but the errors are much smaller

Figure Comparison of delay errors, in percentage, using Pi-Model and FF-Model and its variations, i.e. FF3-Model and FF2-Model, at (a) nominal parameter values and (b) random parameter values (y-axis: per-stage error in %; series: FF, FF3, FF2, and Pi-Model, plotted for the HL transition in (a) and the LH transition in (b)).

than the 16% delay variation per cell that we mentioned before; moreover, we will show that timing analysis using our FF-models is about 2880 times faster than Hspice simulation.
(d) The errors per stage for the FF-model and its family are very close to one another. This means that we can make the models smaller and faster and get almost the same results. The FF2 models are the smallest and fastest, but also the least accurate, models of the family.

All our STA simulations were performed on a Dell PowerEdge 2970 with a dual-core Intel Xeon E5440 CPU (2.83 GHz) and 16 GB of RAM running Red Hat Enterprise Linux Server. The FF-model and its family were generated using Matlab R bit on a Dell Optiplex 980 with a dual-core Intel Core i5 CPU (3.19 GHz) and 8 GB of RAM. Table 4.18 compares the simulation time, the characterization time, the total number of operations, the memory usage, and the per-stage accuracy of all the STA methods. The characterization time has two components: the Hspice simulation time and the FF-Model generation time. The number-of-operations column does not provide a meaningful comparison between the first two circuit-level simulations and the rest of the methods. The memory usage for the Hspice and Pi-Model methods is the memory reported by our Hspice simulator and cannot be compared with the memory usage of our FF-model family. The table shows the trade-off between time, memory, and accuracy for all the methods. The accuracy column shows the average accuracy over all 24 simulations. In Appendix E, we compare the space complexity, the simulation time complexity, and the characterization time complexity of the FF family models and the tabular models.

Table 4.18 Comparing STA time, the characterization time, the total number of operations, the memory usage and the accuracy of all the STA methods.

Method | STA Time (s) | Characterization Time (s) | Number of Operations | Memory Usage (kB) * | Per-stage Accuracy (%)
Hspice | | | | ** | 100
Pi-Model | | | | ** |
FF-Model | | | | |
FF3-Model | | | | |
FF2-Model | | | | |

* The estimate was found by adding the total number of operations to the total number of terms and multiplying the result by 8/1024, supposing 8 bytes per double-precision floating-point number and 8 bytes of machine code per operation.
** This number is the total memory usage reported by our Hspice simulator.

By studying Table 4.18, we can draw the following conclusions:
(a) Using our FF2-model can speed up the simulation time by 2880 times while losing about 2.03% per-stage accuracy.
(b) The major part of the characterization time for the FF-Model and its family is the Hspice simulation of the Pi-Models; therefore, our methodology is not very expensive. The times for generation of our FF-Models are about 0.7% of the total characterization time and are included under the characterization time after the plus sign.
(c) Our FF2-Model lookup is 171 times faster than that of the FF-Model considering the number of operations.
(d) The memory usage of our FF2-Model is 112 times smaller than that of the FF-Model.
(e) As we described before, although we did not use an ideal cell characterization method, the per-stage accuracy of our FF-Model and its family is about 97.95% while the accuracy of the Pi-Model is about 99.46%.
(f) Using Pi-Models for RC-interconnect networks can speed up the simulation at the circuit level by 2 times while losing about 0.64% per-stage accuracy. The memory usage of the Pi-Model method was about half of that of the Hspice method.

(g) Although FF2 offers much better simulation time and memory usage, all the FF-Models are statistical models whose adequacy and prediction accuracy we have checked, so using the FF3-model is preferred to be on the safe side. The FF3-Model speeds up the simulations by up to 1440 times while its models are 25 times smaller than the FF-model.

Conclusions and Future Work

Our compact variation-aware cell methodology allows generating cell timing models that incorporate the variations in semiconductor process parameters and in environment parameters such as supply voltage and temperature. The timing models, i.e. cell models, can be used to perform statistical static timing analysis using a static timing analysis engine that runs Monte-Carlo-based path-oriented static timing simulations. They can also be used to characterize cells to build a statistical timing library for block-based statistical static timing analysis engines. The simulations can be on the order of a couple of thousand times faster than their circuit-level equivalents, while some accuracy is lost due to model abstraction. We showed that our compact variation-aware cell methodology can be extended to interconnect-dominant circuits using a Pi-model to represent the RC-interconnect networks, and we verified our Pi-model conversion engine. We also showed that our methodology is applicable even to a single-parameter waveform transition model, although we used a two-parameter waveform transition model, our PCA waveform model, in Chapter III. We used a methodology very similar to the one used for our previous compact variation-aware cell models, which did not support RC-interconnect networks, and developed the family of FF-models. Using our static timing simulation

engine, we simulated an inverter chain from a clock tree network of a JPEG2 encoder, covering both (a) nominal parameter values conditions and (b) random parameter values conditions. Then, we compared the simulation time, the memory usage, the characterization time, and the accuracy of all our STA simulation methods, including (a) circuit-level STA for the original circuit with RC-interconnect networks, (b) circuit-level STA for the chain with our Pi-model-converted RC-interconnect network circuits, and (c) high-level STA using our full-factorial cell models.

We found that the Pi-model conversion for RC-interconnect networks at the circuit level has a per-stage accuracy of 99.46% while the memory usage and the simulation times were reduced to half. Moreover, we learned that our full-factorial models can offer a per-stage accuracy of 97.95% while the simulation times were improved by a factor of. We pointed to the imperfection of the cell characterization using the saturated ramp that we used and reviewed some suggested accuracy improvement solutions.

We had to limit the scope of this endeavor; however, based on the insight we gained from this experience, we believe this research could be continued in the following directions with a good chance of success.
(a) Extending the methodology to multi-input cells: Applying the methodology to multi-input cells is straightforward; however, the number of parameters increases, which increases the characterization time. It would be interesting to learn how the size and accuracy of the models are affected by building variation-aware cell models for several multi-input cells and evaluating them.
(b) Finding more general compact waveform models: Building alternative compact waveform models for cell characterization is necessary since the 1-parameter waveform

model using a saturated ramp makes the cell characterization inaccurate and our PCA waveform model is accurate only in a finite range and for specific shape types.
(c) Including a multi-parameter compact waveform model in our improved cell characterization with RC-interconnect network support: Improving the cell characterization accuracy using multi-parameter waveform models should be possible while keeping the support for Pi-model-converted RC-interconnect networks.
(d) Building compact variation-aware interconnect network models: Building compact interconnect network timing models using our cell models methodology should be possible; however, a good compact waveform model capable of handling the various waveform shapes at the output of the interconnect networks is necessary.
(e) Improving the accuracy of our cell models: Although 3-level full-factorial designs are more expensive than 2-level full-factorial designs, they should improve the accuracy of the cell models; however, we may need to use linear regression to build our cell models, which will be more expensive than the combination of the Yates algorithm and the analysis of variance that we used.

Investigating Accuracy Improvement Methods

We want to know how we can improve the accuracy of our variational waveform model and our cell models. When our cell models use our variational waveform model, increasing the accuracy of the waveform model can also improve the accuracy of the cell models. Moreover, the accuracy of our waveform models and cell models depends on the

accuracy of the circuit simulation and the accuracy of our statistical models. We categorized the sources of error in our waveform model and our cell models as follows.
(a) Circuit-level simulation errors: We use transient analysis to obtain the set of waveforms for both the waveform and cell models. Automatic selection of the simulation time resolution makes the simulations needed for cell characterization much faster but reduces the accuracy by introducing some error. Therefore, an optimum simulation time resolution must be selected that neither loses accuracy nor wastes simulation time, because increasing the simulation resolution beyond a certain limit increases the simulation time significantly while the accuracy of our captured parameters hardly increases for our purpose.
(b) Experimental design errors: We sample a subset of the possible waveform space to keep the simulation time for waveform model generation and cell characterization manageable; therefore, some error is introduced in the model because we cannot afford to cover the entire output waveform space. For example, using full-factorial models ignores all the remaining points. In general, including more sampling points at strategic locations in the designs can improve the accuracy of the models; therefore, building models using designs with more sampling points, such as central composite designs [52] and 3-level full-factorial designs [52], should improve the accuracy of the models but increases the characterization time. It would be interesting to see whether the increase in accuracy is worth the characterization time overhead.
(c) Statistical modeling errors: For our variational waveform model, we used principal component analysis to represent waveforms with just a few principal components with a very high variance coverage (more than 95%), but the residual errors

resulting from ignoring the low-order principal components are still present. Moreover, when we use linear regression models in our cell models, we introduce some residual errors to the model due to the lack of fit. With full-factorial designs, we can reduce the lack of fit to 0, but when we use just a fraction of the terms to obtain a compact model, we introduce some lack-of-fit error.

We describe the accuracy improvement analysis for our waveform model in Section. In Section 4.4.2, we briefly explain the accuracy improvement analysis for our cell models.

Accuracy Analysis of the PCA Waveform Model

The accuracy of our waveform model depends on the accuracy of the PCA model. In Chapter III, we presented a qualitative factor for choosing the most accurate of the PCA model construction methods (SNM, SSM, and ASM). We extend the accuracy analysis discussion by presenting some quantitative factors. Since we discretized voltage waveforms along the voltage axis to find the corresponding time vector, the discretization level (number of samples) and the locations of the samples affect the accuracy of the waveform and its corresponding PCA model. We want to find the minimum number of samples and the best locations for the samples.

Accuracy Analysis of the PCA Waveform Model: Number and Location of Points

The discretization level for waveform modelling was chosen in order to have straightforward voltage levels for transistors in the technology that was used in Chapter III. However, the accuracy of the PCA waveform model is a function of

(a) the number of discretization levels along the voltage axis, and
(b) the choice of voltage levels at which the waveform is discretized on the voltage axis.

To analyze waveform model accuracy with respect to the number of discretization levels along the voltage axis, seven discretization patterns were compared. These are summarized in Figure 4.18, with between three and 19 levels.

Figure PCA waveform discretization patterns for the voltage scale (patterns: 19 levels, 15 levels, 10 levels, 5 levels case 1, 5 levels case 2, 5 levels case 3, and 3 levels).

Figure 4.19 compares the accuracy of the uniform discretization plans using the Sum of Squares of Error (SOS). It can be seen that at least 10 points are needed to achieve high accuracy, and that increasing beyond 10 points does not increase accuracy much. However, it should be noted that as few as five points can achieve high accuracy if they are appropriately placed. Additionally, fewer discretization levels result in fewer principal component basis functions. However, even in the worst case that we studied, with 19 discretization levels, two principal component basis functions covered 99.8% of the variation.
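
The two quantities used in this comparison, the variance coverage of the retained principal components and the sum of squares of error of the truncated reconstruction, can be sketched with NumPy as follows. The waveform family below is synthetic and purely hypothetical (a blend of an exponential and a logistic shape with random delay, time scale, and blend factor), and the 19 uniformly spaced levels are an assumption for illustration; they are not our characterization data or our discretization plan.

```python
import numpy as np

def synthetic_time_vectors(levels, n_waveforms=200, seed=0):
    """Crossing times at each voltage level for a hypothetical waveform family.

    Returns an array of shape (n_waveforms, len(levels)).
    """
    rng = np.random.default_rng(seed)
    delay = rng.uniform(0.0, 1.0, size=(n_waveforms, 1))
    scale = rng.uniform(0.5, 2.0, size=(n_waveforms, 1))
    blend = rng.uniform(0.0, 0.3, size=(n_waveforms, 1))
    g_exp = -np.log(1.0 - levels)            # exponential-shaped transition
    g_log = np.log(levels / (1.0 - levels))  # logistic-shaped transition
    return delay + scale * ((1.0 - blend) * g_exp + blend * g_log)

def pca_truncation(times, k):
    """Variance coverage and sum of squares of error when keeping k components."""
    mean = times.mean(axis=0)
    centered = times - mean
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    coverage = float((s[:k] ** 2).sum() / (s ** 2).sum())
    basis = vt[:k]                           # principal component basis functions
    recon = mean + centered @ basis.T @ basis
    sos = float(((times - recon) ** 2).sum())
    return coverage, sos

levels = np.linspace(0.05, 0.95, 19)         # hypothetical normalized voltage levels
times = synthetic_time_vectors(levels)
results = {k: pca_truncation(times, k) for k in (1, 2, 3)}
for k, (coverage, sos) in results.items():
    print(f"{k} component(s): coverage = {coverage:.4f}, SOS = {sos:.3e}")
```

Repeating the computation with a different `levels` array gives a rough way to explore how the number and placement of discretization levels change the coverage, although sums of squares computed on different numbers of points are not directly comparable without a common evaluation grid.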

Figure Increase in accuracy of the PCA waveform as a function of the number of discretization levels (y-axis: sum of squares of error; series: uniform discretization, 5 levels case 1, 5 levels case 3).

Accuracy Analysis of the PCA Waveform Model for TSMC180RF: Waveform Dataset Selection, Range of Parameter Variations, and Model Subranging

In this section, we explain how the accuracy of the PCA waveform model is affected by the selection of the waveform dataset, the range of parameter variation, and the model subranging. This discussion relates to the variational waveform model that we developed in Chapter III for our cells based on TSMC180RF technology.

We could simplify cell characterization and reduce the memory requirement for cells by using a single variational waveform model for both rising and falling transitions and for both input and output waveforms of the inverter cell. This was possible by

(a) generating the rising and falling output waveforms by using a 2-level full-factorial design to cover the space of all parameters, i.e. slope, fanout, and process and environment parameters,
(b) combining the dataset of the rising waveforms with that of the falling waveforms to obtain a general waveform model using PCA on the combined datasets, and
(c) modifying the PCA basis functions of the general waveform model iteratively to find a common set of basis functions for both the inputs and outputs of a cell.

However, all of these actions can adversely affect waveform accuracy. A full-factorial design only samples the response space at the combinations of its extreme input parameter levels; therefore, it can introduce inaccuracy if it cannot capture the nonlinearity of the response space in the regions where it has no samples. For example, delay is a nonlinear function of slope and fanout; consequently, a full-factorial design can introduce more inaccuracy for a larger range of slope and fanout. It seems logical that constructing the models for a subrange of these variables can increase accuracy; we call this technique model subranging. Model subranging should be an effective accuracy improvement method when the response surface in the selected subranges has less nonlinearity than in the original ranges. It is similar to the concept of binned transistor models introduced in Section, in that binned models are, in general, more accurate in each bin than their non-binned counterparts.

In Chapter III, we constructed the waveform model for the small range of variations (5%) of the process parameters, i.e. channel length and threshold voltage of transistors, while we used the full range of both slope and fanout. We want to know how the waveform model is affected by

(a) using a subrange for both slope and capacitance, and
(b) using a large range of variation (30%) instead of the small range of variation (5%).

We constructed 5 waveform models, as shown in Table 4.19, for an inverter based on TSMC 180 nm technology. Each row shows the model number, the model name, whether it uses rise time and/or fall time waveforms, the subrange of slope and fanout used in percent, and the range of process parameter variations in percent.

Table Waveform Models Compared for Accuracy TSMC180RF.

# | Model Name | Waveforms used: Trise | Waveforms used: Tfall | Subrange (%): slope | Subrange (%): Fanout | Range of Process Parameters Variation (%)
1 | tr_gn | | | | |
2 | tr_sub_gn | | | | |
3 | tr_sub_large_gn | | | | |
4 | tr_sub_large_sn | | | | |
5 | tf_sub_large_sn | | | | |

We use several criteria to compare the error introduced in a transition by a PCA model. The errors are normalized to the range of the original waveform transition; therefore, we can compare the accuracy of all models although they are based on different sets of waveforms.

Waveform model 1 is based on both rising and falling waveforms, uses the whole range of slope and fanout, and keeps the process parameter variation at 5%. The settings for waveform model 2 are similar to those of waveform model 1, but we use just a 25% range of fanout. We increase the range of process parameter variation from 5% to 30% to construct waveform model 3. These three models are used

to evaluate the effect of subranging fanout and increasing the range of process parameter variations. Waveform models 4 and 5 use only the rising waveforms or only the falling waveforms, respectively, that we used to construct waveform model 3. We use waveform models 3-5 to compare the accuracy of the waveform model obtained by combining the rising and falling waveforms with that of the waveform models based on only the rising or falling waveforms.

Figure 4.20 compares the accuracy of all 5 waveform models. It shows the maximum, the average, and the maximum of the average relative errors for (a) a complete transition (19 points) and (b) the 10% to 90% part of the transition (15 points).

Figure Waveform accuracy Max, Average, and Max. Ave. (TSMC180RF) (y-axis: relative error; x-axis: waveform model, i.e. tr.gn, tr_sub.gn, tr_sub_large.gn, tr_sub_large.sn, tf_sub_large.sn; series: Max., Average, and Max. Ave. for the 19-point case, and Max. and Max. Ave. for the 15-point case).
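
The reasoning behind model subranging can be illustrated in one dimension. The sketch below assumes a hypothetical quadratic response (not a delay model from our cells) and fits a straight line through the two extreme levels of a single factor, which is all that a 2-level design can resolve for one factor; it then compares the worst-case fit error over the full range with that over two half-width subranges.

```python
import numpy as np

def two_level_fit_max_error(f, lo, hi, n_check=1001):
    """Max abs error of the line through (lo, f(lo)) and (hi, f(hi)) on [lo, hi]."""
    x = np.linspace(lo, hi, n_check)
    line = f(lo) + (f(hi) - f(lo)) * (x - lo) / (hi - lo)
    return float(np.max(np.abs(f(x) - line)))

def response(x):
    """Hypothetical nonlinear response surface (illustration only)."""
    return x ** 2

full_range_error = two_level_fit_max_error(response, 0.0, 1.0)
subrange_error = max(two_level_fit_max_error(response, 0.0, 0.5),
                     two_level_fit_max_error(response, 0.5, 1.0))

print(f"full range: {full_range_error:.4f}, subranges: {subrange_error:.4f}")
```

For a quadratic response the worst-case error of the two-point fit scales with the square of the range width, so halving the range reduces it by a factor of four; a real response surface need not behave this cleanly.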

In general, subranging increases the maximum and the average relative errors; however, combining subranging with a large range of process parameter variation (30%) does not increase the average relative errors much (4%). Moreover, combining the rising and falling waveforms does not decrease the accuracy much when we look at the last three waveform models. The points outside the 10% to 90% part of a waveform transition should not affect a gate delay much because one of the transistors is almost off; therefore, if the 15-point and 19-point cases disagree on whether the error increases or decreases, we should accept the trend of the 15-point case. We have included all the points just to show the maximum relative error at all 19 points of a transition.

We use the Mahalanobis distance [84] to measure how similar a PCA waveform model, using just the first two principal components, is to the original waveform. The Mahalanobis distance takes into account the correlation among timing points at different voltage levels, while the maximum relative errors and the maximum average relative errors presented earlier do not consider this correlation. It is the square root of the product of three factors: the transposed difference vector (the difference between the actual time points of the waveform transition and their values estimated by our PCA waveform model), the inverse of the covariance matrix of the two sets, and the difference vector itself. The Mahalanobis distance between the two vectors $\vec{x}$ and $\vec{y}$ is defined as:

$d(\vec{x}, \vec{y}) = \sqrt{(\vec{x} - \vec{y})^{T} S^{-1} (\vec{x} - \vec{y})}$    (4.28)

Here, $S$ is the matrix of the covariance between $\vec{x}$ and $\vec{y}$. If the covariance matrix is an identity matrix, the Mahalanobis distance reduces to the Euclidean distance [42]; if the covariance matrix is diagonal, it reduces to the normalized Euclidean distance, which is defined as follows for N-dimensional vectors:

$d(\vec{x}, \vec{y}) = \sqrt{\sum_{i=1}^{N} \frac{(x_i - y_i)^2}{\sigma_i^2}}$    (4.29)

In our PCA waveform model, the timing points at different voltage levels are correlated; therefore, we need a measure of distance that takes this correlation into account, which is the Mahalanobis distance. We could use the normalized Euclidean distance only if the timing points at different voltage levels were independent. The Euclidean distance is not a good choice, either.

Figure 4.21 shows the maximum and the average of the Mahalanobis distance for all 19 points and for the 15 middle points (10% to 90% points) of a transition. We see that the 10% to 90% parts of the transitions are more similar to the original waveforms than the whole transitions, whether we look at the maximum or the average Mahalanobis distance. Comparing the first three waveform models, we observe that subranging increases the accuracy, while increasing the range of process parameter variation decreases the accuracy. Comparison of the last three waveform models shows that combining the rising and falling waveform models increases the overall accuracy of the waveform model for the rise time and decreases the waveform accuracy for the fall time.

Figure 4.22 compares the waveform model accuracy for all the waveform models just for the 50% point that we use for delay calculation. We observe the maximum

introduced error is less than 5% and the average error is less than 2% for all the models. Moreover, we see that the error is the smallest (i.e., maximum error < 3% and average error < 0.5%) for the waveform model based on falling waveforms when we use just a subrange of fanout and a 30% range of process parameter variation. The error for the waveform model based on the rise times is the next smallest. The error for the waveform model based on combining the rising and falling waveforms, using only a subrange of fanout with a 30% range of process variation, is lower than that of the first two waveform models. It seems that waveform model 3 performs the best if we want to use a single waveform model for both rising and falling waveforms, with a subrange for fanout and a 30% range of process parameter variation.

Figure. Waveform accuracy - Mahalanobis distance (TSMC180RF). [Vertical axis: Mahalanobis distance; horizontal axis: waveform model (tr.gn, tr_sub.gn, tr_sub_large.gn, tr_sub_large.sn, tf_sub_large.sn); series: maximum and average Mahalanobis distance for the 19-point and 15-point cases.]
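The Mahalanobis distance of Equation (4.28) and its two special cases can be illustrated with a minimal numerical sketch. The three timing points and the covariance matrix `S` below are hypothetical values chosen only for illustration; they are not taken from the TSMC180RF data, and the helper name `mahalanobis` is ours.

```python
import numpy as np

def mahalanobis(x, y, S):
    """Mahalanobis distance between vectors x and y for covariance matrix S."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve S z = diff instead of forming the explicit inverse of S.
    z = np.linalg.solve(np.asarray(S, dtype=float), diff)
    return float(np.sqrt(diff @ z))

# Hypothetical example: actual vs. estimated crossing times (ns) at 3 levels.
t_actual = np.array([0.10, 0.20, 0.30])
t_model = np.array([0.11, 0.19, 0.32])

# Hypothetical covariance matrix with correlated timing points.
S = np.array([[4.0e-4, 2.0e-4, 1.0e-4],
              [2.0e-4, 4.0e-4, 2.0e-4],
              [1.0e-4, 2.0e-4, 4.0e-4]])

d_full = mahalanobis(t_actual, t_model, S)                    # Equation (4.28)
d_identity = mahalanobis(t_actual, t_model, np.eye(3))        # Euclidean distance
d_diag = mahalanobis(t_actual, t_model, np.diag(np.diag(S)))  # Equation (4.29)

print(d_full, d_identity, d_diag)
```

With the identity matrix the function returns the ordinary Euclidean norm of the difference, and with only the diagonal of `S` it returns the normalized Euclidean distance, which ignores the off-diagonal correlation that the full matrix captures.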

Figure. Waveform accuracy - 50% point (TSMC180RF). [Vertical axis: relative error; horizontal axis: waveform model (tr.gn, tr_sub.gn, tr_sub_large.gn, tr_sub_large.sn, tf_sub_large.sn); series: maximum and average relative error at the 50% point.]

Accuracy Analysis of the PCA Waveform Model for FreePDK45: Waveform Dataset Selection, Design of Experiment, Discretization Level, Range of Parameter Variations, and Model Subranging

In this section, we describe how the accuracy of the PCA waveform model is affected by the selection of the waveform data, the range of parameter variation, and the model subranging. This discussion applies to a variational waveform model for FreePDK45 technology built using our methodology, which was described in Chapter III for our cells based on TSMC180RF technology. We want to know how these options impact the accuracy of the resulting waveform and cell models. Table 4.20 shows a list of options for improving the accuracy of waveform and cell models. We describe it in more detail later.

We have made a list of our options, each tagged with the code of the corresponding method in our table of general waveform and cell model improvement methods. The list of our options is as follows:

(a) Selection of the variational waveform model basis functions (w.1): Using common basis functions for the variational waveform model reduces the memory requirement for each cell and simplifies timing simulation; however, it affects the accuracy. We want to know how much accuracy improvement is possible using a dedicated basis function for each cell versus having to map from one basis function to another. The basis function can be constructed for output waveforms from rising waveforms, falling waveforms, or both rising and falling waveforms. The basis function resulting from the convergence of the basis functions through an iterative approach is another alternative.

(b) Experimental design (wc.1): One of the factors affecting the accuracy of a PCA waveform model, and consequently the accuracy of the cell model, is the choice of the experimental design used to construct the mapping equations. We want to compare the accuracy of models obtained from a factorial design, a fractional-factorial design, a central composite design (faced), and Latin hypercube sampling.

(c) Fanout load model vs. effective-capacitance load model (wc.3): We characterized our cells using the number of fanouts as a simplification to help us concentrate on our modeling methodology; however, better modeling of the loading parameters can improve the accuracy of the cell models. Therefore, the fanout parameter can be replaced by a complex load, which takes into account the resistive, capacitive, and inductive components of the interconnect networks, as we did when we incorporated a resistive-capacitive load model in Section 4.3. As a better single-parameter load model, we can use the effective capacitance.

Table. Waveform and cell model accuracy improvement methods.

Type | Category | Options
c.1 | Cell model implementation method | 1) Equations; 2) Tables
c.2 | Waveform model used for cell model | 1) Slope; 2) Variational waveform model
w.1 | Type of waveform basis function | 1) Based on rising output waveforms (Tr); 2) Based on falling output waveforms (Tf); 3) Based on both falling and rising output waveforms (Tg); 4) Based on converged waveform model (Tc)
w.2 | PCA model construction methods | 1) Symmetric Nonstandardized Model (SNM); 2) Symmetric Standardized Model (SSM); 3) Asymmetric Standardized Model (ASM)
wc.1 | Design of experiment used (Appendix C) | 1) Full-factorial design; 2) Fractional-factorial design; 3) Central composite design (faced); 4) Latin hypercube design
wc.2 | Sampling points | Number and location of sampling points
wc.3 | Fanout load model | 1) Number of fanouts for minimum-size transistors; 2) Effective capacitance load model; 3) Resistive-capacitive load model
wc.4 | Range of variables covered by model | 1) Single model (whole range); 2) SR (subrange (slope, load)); 3) SR2 (subrange (load))
wc.5 | Variation covered by model | 1) 5%; 2) 20%; 3) 30% (not possible for the circuit-level model)
wc.6 | Load assumptions | 1) No variations (used for our models based on FreePDK45); 2) The same variation for supply voltage and temperature; 3) The same variation in threshold voltages, channel lengths, supply voltage, and temperature (used for our models based on TSMC 180 nm)
wc.7 | Rail-to-rail input assumptions | 1) The same swing as the supply voltage of the cell; 2) Fixed voltage swing
wc.8 | Circuit model used | 1) RC_CC for cell and load; 2) RC_CC for cell and C for load; 3) RC for cell and load; 4) NP (no parasitics for cell and load)

The effective capacitance load model takes into account, to some extent, (i) the nonlinearity of the internal capacitance of the loading transistor and (ii) the resistive shielding of the capacitance. In our characterization in Chapter III, we included the effect of the nonlinearity of the internal capacitance of the transistors indirectly. We want to know the impact on accuracy of using our simple fanout load model vs. the effective capacitance load model.

(d) Model subranging (wc.4): Since our cell timing parameters depend nonlinearly on variables such as fanout and waveform shape, we expect to have a more accurate model if we generate the models for subranges of the variables. Subranging the variables will increase characterization time, simulation time, and the memory requirement by a factor of (number_of_subranges+1)/2 for each dimension; however, we expect to see an accuracy improvement. The resulting models will be a hybrid, using tables for selected nonlinear variables and equations for the other variables. We can subrange both load and slope (our sr models) or subrange just the load (our sr2 models).

(e) Variation covered by the model (wc.5): We want to know how the range of variation covered by the model affects the accuracy of the waveform and cell models. We use the following two ranges for threshold voltages and transistor channel lengths:
(i) -5% to 5%
(ii) -20% to 20%
Please note that the large range of variation, -30% to 30%, will not be compared since the cell will not function properly.

(f) Variational vs. fixed load (wc.6): The set of waveforms that we used to construct our waveform and cell models is generated by the circuit-level simulation of a netlist

representing an inverter connected to a load using Hspice. If we use an inverter or multiple inverters with a common input as a load, we need to know how the load can be affected by process and environment variations. For our waveform and cell models based on TSMC 180 nm, we assumed the same variation for the load as for the cell. For our waveform and cell models based on FreePDK45, we assumed no variations in the load. The other option is to use the same range of variation in the supply voltage, Vdd, since the cell and the load are usually very close.

(g) Voltage-dependent vs. fixed waveform swing (wc.7): The rail-to-rail input voltage swing affects the shapes of the set of output waveforms. We can use the same swing for the input as the supply voltage of the cell or a fixed-voltage-range swing for the input. We used the same swing for the input as the supply voltage because the input swing is generated by another cell, whose rail-to-rail swing is a function of the supply voltage, and the supply voltage pins of adjacent cells are usually connected to the same branch of the power grid.

(h) Circuit-level model accuracy (wc.8): The accuracy of the circuit-level models representing a cell in a netlist affects the shape of the set of waveforms we use to obtain the basis functions for the variational waveform model. A selection of our cases of interest is as follows:
(i) RC_CC: We use the extracted resistance, capacitance, and coupling capacitance for the cell and the load; the load is an inverter or multiple inverters with a common input.
(ii) RC_CC+C: We use the extracted resistance, capacitance, and coupling capacitance for the cell, while we use a capacitor for the load; this is for characterizing the cell using the effective capacitance.

(iii) RC: We use the extracted resistance and capacitance for the cell and the load; the load is an inverter or multiple inverters with a common input.
(iv) NP: We use just transistor models with no parasitics for the cell and the load.

Table 4.20 shows the generalization of all the accuracy improvement methods for the waveform and cell models. The columns of the table are the type code of the improvement method, the category name, and our options for each category. The improvement methods are coded as c, w, or wc, followed by a number, for cell models, waveform models, and both, respectively. We evaluated category w.2 in Chapter III, and we have categories c.1 and c.2 as improvement methods for cell models, which will be described later in the section on the accuracy analysis of the cell models. We designed another experiment to explore most of our other options.

We use a discretization level of 11 for cells based on FreePDK45 to have straightforward numbering for the 1.1 V supply voltage, which sets the distance between subsequent levels to 0.11 V. In contrast, we used a discretization level of 19 for TSMC 180 nm technology with its 1.8 V supply voltage, which set the distance between subsequent levels to 0.10 V.

We want to know how the accuracy of the variational PCA waveform model, based on the variation model parameters of Table 4.5, is affected when we make these changes:
(a) Increasing the discretization level from 11 to 19 while the Vt variation is 5% and the L variation is 20% (wc.2),
(b) Using different cell and load models of Table 4.20 (wc.8),

(c) Using a central composite design instead of a full-factorial design while the Vt variation is 20% and the L variation is 20% (wc.1),
(d) Using a subrange model by subranging only the load, only the slope, and both while the Vt variation is 20% and the L variation is 20%, as shown in Table 4.7 (wc.4),
(e) Increasing the range of variations from 5% to 20% in both threshold voltages and channel lengths, as shown in Table 4.6 (wc.5), and
(f) Using different types of waveform basis functions based on: rising output waveforms, falling output waveforms, both rising and falling output waveforms, and the converged waveform model (w.1).

We used option 1 for categories wc.6 and wc.7, which are "no variations for load" and "fixed rail-to-rail voltage swing". Please note that the variation model parameters for the combination of cases (d) and (e) are given in a separate table.

Table 4.21 shows a series of waveform models obtained by varying the options listed. This table is for the case in which we extract the resistance, capacitance, and coupling capacitance (tagged as RCCC) for the cell model. It is similar to Table 4.19 with the addition of two columns for the DOE (Design of Experiment) and the number of points (discretization level). We have three more sets of these tables for (a) extracted parasitics for both resistance and capacitance (tagged as RC), (b) extracted parasitics for only capacitance (tagged as C; capacitance is used instead of fanout), and (c) no extracted parasitics (tagged as NP).

We use an experimental design to evaluate how our PCA waveform accuracy is affected by the options in categories wc.1-wc.8 listed earlier. We compare the accuracy of all methods using these criteria:

(a) Mahalanobis distance: We compare the original output waveforms obtained from the selected experimental design with their 2-PC PCA waveform approximations. We plotted the maximum, the average, and the standard deviation of the Mahalanobis distance for all the cases in the table of waveform models.

Table. Waveform models compared for accuracy - FreePDK45.
Columns: #, Model Name, DOE, No. of Points, Waveforms Used (Trise, Tfall), Subrange of these Parameters (slope, fanout), Range of Process Parameters Variation (%) (L, Vt).

# | Model Name | DOE
1 | 11Pts-L05-Vt05-SR-F-Tr-RCCC | FF
2 | Pts-L20-Vt05-SR-F-Tr-RCCC | FF
3 | Pts-L20-Vt05-SR-F-Tr-RCCC | FF
4 | Pts-L20-Vt20-SR-FT-Tr-RCCC | FF
5 | Pts-L20-Vt20-SR-F-Tr-RCCC | FF
6 | Pts-L20-Vt20-SR-F-CCF-Tr-RCCC | CCF
7 | Pts-L20-Vt20-Tr-RCCC | FF
8 | Pts-L05-Vt05-SR-F-Tf-RCCC | FF
9 | Pts-L20-Vt05-SR-F-Tf-RCCC | FF
10 | Pts-L20-Vt05-SR-F-Tf-RCCC | FF
11 | Pts-L20-Vt20-SR-FT-Tf-RCCC | FF
12 | Pts-L20-Vt20-SR-F-Tf-RCCC | FF
13 | Pts-L20-Vt20-SR-F-CCF-Tf-RCCC | CCF
14 | Pts-L20-Vt20-Tf-RCCC | FF
15 | Pts-L05-Vt05-SR-F-Tg-RCCC | FF
16 | Pts-L20-Vt05-SR-F-Tg-RCCC | FF
17 | Pts-L20-Vt05-SR-F-Tg-RCCC | FF
18 | Pts-L20-Vt20-SR-FT-Tg-RCCC | FF
19 | Pts-L20-Vt20-SR-F-Tg-RCCC | FF
20 | Pts-L20-Vt20-SR-F-CCF-Tg-RCCC | CCF
21 | Pts-L20-Vt20-Tg-RCCC | FF

(b) The absolute value of error: This gives us an idea of the gap between the original waveform and the estimated waveform. We plotted the maximum of this parameter for each discretization point, the maximum of the average of this parameter for each discretization point, and the average of this parameter for all points.

(c) The absolute value of relative error at the middle point (50% point) of the waveform: We use the 50% point of the waveforms as the basis for delay propagation of a waveform. We plotted the average and the maximum of this parameter.

(d) Relative errors: The relative error is calculated by dividing the absolute error (not the absolute value of error) by the original value of each timing point. We used the absolute value of this parameter for the previous criterion. We plotted the average and the standard deviation of this parameter for all points and for just the 50% points.

(e) Absolute errors: An absolute error is calculated by subtracting the original value of a timing point from its estimate. We plotted the average and the standard deviation of this parameter for all points and for just the 50% points.

The first of the figures that follow are similar to plots presented earlier; we included the remaining figures to show how the other criteria compare. The plots are for models 1-7 for the case when the load is an inverter and the parasitics extracted for the inverter include resistance, capacitance, and coupling capacitance. In general, the plots are similar, but there are some exceptions. Studying the plots shows:

(a) The results of the comparison using the Mahalanobis distance are not consistent with the results from using relative errors.

(b) Increasing the discretization level should increase accuracy, but the accuracy can decrease for some cases. This can be explained by the data dependency of the PCA and

the fact that we change the datasets of discretized points when we have to extrapolate to obtain data points that we do not have in our simulations.

(c) The results of using RCCC are almost the same as those of using RC. The results of using RCCC, RC, NP, and C are consistent for most of the cases; however, using only a capacitance instead of an inverter as the load seems to give the best results, although it is not as realistic as using the inverter with a nonlinear time-varying capacitance.

(d) Using a faced central composite design improves the accuracy in comparison with a 2-level full-factorial design, but the improvement is possibly not worth the characterization overhead.

(e) Subranging load and slope can improve the accuracy in many cases, but the results are very data dependent. It seems that the waveforms for a 2-level full-factorial design dataset resulting from subranged parameters form a more uniform cluster, and are probably more similar, in comparison with such waveforms for the whole range of parameters. The waveform models obtained by subranging both load and slope were, in general, more accurate than the waveform models obtained by subranging just the load. This means subranging should be done for both dimensions for the best results, which we call symmetric subranging.

(f) It seems that increasing the range of variation reduces the accuracy of the models, but this is not a general rule because the results are very data dependent. It seems that the waveforms for a dataset resulting from a larger range of parameter variation form a more nonuniform cluster, and are probably less similar because of the nonlinearity of the parameters, in comparison with such waveforms for a smaller range of parameter variations. The waveform models obtained by increasing the range of variations in both

Vt and L were, in general, more accurate than the waveform models obtained from increasing just the range of variation in L. This means increasing the range of variation should be done for both L and Vt for the best results, which we call the symmetric increase of the range of variation.

(g) A PCA waveform model obtained from combining rising and falling datasets is almost as accurate as the one obtained from using only a rising dataset. The PCA waveform model obtained using the rising (falling) dataset is the most (least) accurate. The accuracy of PCA waveform models obtained using both the falling and rising datasets falls between the two others.

Figure. Waveform model accuracy compared using Mahalanobis distance. [Vertical axis: accuracy criteria; horizontal axis: waveform model (models 1-7 of the RCCC set); series: maximum, average, and standard deviation of the Mahalanobis distance for the 11-point and 9-point cases.]
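Finding (d) above weighs the accuracy gain of a faced central composite design against its characterization overhead. The following minimal sketch counts the simulation points of both designs in coded units (-1, 0, +1). It assumes a single center point for the central composite design; the function names are illustrative and do not come from our characterization flow.

```python
from itertools import product

def full_factorial_2level(k):
    """All 2**k corner points of k factors with coded levels -1 and +1."""
    return [tuple(p) for p in product((-1, 1), repeat=k)]

def face_centered_ccd(k):
    """Corner points, plus 2*k face-center (axial) points, plus one center point."""
    points = full_factorial_2level(k)
    for i in range(k):
        for level in (-1, 1):
            axial = [0] * k       # all factors at their nominal value...
            axial[i] = level      # ...except one factor at a face of the cube
            points.append(tuple(axial))
    points.append((0,) * k)       # assumed: a single center point
    return points

for k in (2, 3, 4):
    print(k, len(full_factorial_2level(k)), len(face_centered_ccd(k)))
```

Under these assumptions the faced design adds 2k + 1 simulations to the 2^k corners, so the relative overhead is largest for a small number of factors and shrinks as the number of factors grows.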

Figure. Waveform model accuracy compared using the absolute value of relative errors. [Vertical axis: relative error; horizontal axis: waveform model (models 1-7 of the RCCC set); series: Max. (11-pt), Average (11-pt), Max. Ave. (11-pt), Max. Ave. (9-pt), Max. (9-pt).]

Figure. Waveform model accuracy compared using absolute value of relative errors of 50% points. [Vertical axis: relative error; horizontal axis: waveform model (models 1-7 of the RCCC set); series: Max. (50% point), Average (50% point).]

Figure. Waveform model accuracy compared using relative errors. [Vertical axis: relative error; horizontal axis: waveform model (models 1-7 of the RCCC set); series: average and standard deviation for all points (11-pt) and for the 50% point.]

Figure. Waveform model accuracy compared using absolute errors. [Vertical axis: absolute error (ns); horizontal axis: waveform model (models 1-7 of the RCCC set); series: average and standard deviation for all points (11-pt) and for the 50% point.]
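The error criteria (b) through (e) used in these plots can be written down compactly. The sketch below is one possible reading of those definitions, not the scripts used to produce the figures: the signed absolute error is the estimate minus the original, the relative error divides it by the original value, and the statistics are taken over the absolute value of the relative error. The function name, the dictionary keys, and the sample numbers are hypothetical, and the original crossing times are assumed to be nonzero.

```python
def error_criteria(original, estimate, mid_index):
    """original, estimate: lists of waveforms, each a list of crossing times.

    mid_index selects the 50% point of each waveform.
    """
    # Signed absolute error: estimate minus original, per timing point.
    abs_err = [[e - o for o, e in zip(ow, ew)]
               for ow, ew in zip(original, estimate)]
    # Relative error: signed absolute error divided by the original value.
    rel_err = [[a / o for a, o in zip(aw, ow)]
               for aw, ow in zip(abs_err, original)]
    abs_rel = [[abs(r) for r in row] for row in rel_err]

    n_wave, n_pts = len(abs_rel), len(abs_rel[0])
    flat = [v for row in abs_rel for v in row]
    # Average over the waveforms at each discretization point.
    per_point_avg = [sum(row[j] for row in abs_rel) / n_wave
                     for j in range(n_pts)]
    mid = [row[mid_index] for row in abs_rel]
    return {
        "max": max(flat),
        "average": sum(flat) / len(flat),
        "max_of_point_averages": max(per_point_avg),
        "max_50pct": max(mid),
        "average_50pct": sum(mid) / len(mid),
    }

# Hypothetical crossing times (ns) for two waveforms with three points each.
original = [[1.0, 2.0, 4.0], [2.0, 4.0, 8.0]]
estimate = [[1.1, 2.0, 3.8], [1.9, 4.2, 8.0]]
criteria = error_criteria(original, estimate, mid_index=1)
print(criteria)
```

Keeping the signed errors separate from their absolute values mirrors the distinction made between criteria (d) and (e), where averages of signed errors can cancel, and criteria (b) and (c), where they cannot.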

We created 3-dimensional plots similar to the preceding figures to compare models 1-7 for changes in the options of the categories w.1-w.2 and wc.1-wc.8 of Table 4.21 for the experimental design. The plots are available in Appendix F. We learned from studying the plots that the results are similar in general, but there are some exceptions. Table 4.22 summarizes the results of all the plots. They agree with the results we mentioned earlier.

Table. Which options improve waveform model accuracy - FreePDK45.
Criteria covered by the sets: Mahalanobis distance, Max, Average, Std, Max of Avg, and Max at the 50% point. A greater-than sign means better.

Does the option improve accuracy based on the criteria? | Set 1 | Set 2 | Set 3 | Set 4 | Set 5 | Overall (#)
Model Subranging | Yes | Yes | Yes | Yes | Yes* | Yes(5)
Central Composite Design (Faced) | Yes | Yes | Yes | Yes | - | Yes(4)
Larger Range of Variation | No | Yes | No | - | No* | Yes(1)
Higher Discretization Level | No | Yes | No | No | Yes | Yes(2)
Using C as Load | Yes | Yes | - | - | Yes | Yes(3)
Tr, Tg, Tf | Tr>Tf>Tg | Tr>Tg>Tf * | Tf>Tg>Tr * | Tg>Tf>Tr | Tr>(Tf=Tg) | Tr>Tg>Tf (Average), Tg>Tf>Tr (Std), Tr>Tf>Tg (Max)
# of plots in group | | | | | |

Accuracy Analysis of the PCA Waveform Model for FreePDK45: The Iterative Method for Finding the Common PCs

We combined the rising and falling waveform datasets to construct a common waveform model for both rise and fall transitions and investigated the effect of this combination. We wanted to know how the PCA waveform model accuracy is affected by using a common set of PCs for both the input and output waveforms of a cell. Having a

common set of PCs reduces the memory requirement and simulation time for a cell, but it can affect the accuracy, as we described in Section 3.2 of Chapter III. We used the inverter based on FreePDK45 technology to create the PCA waveform model. The basis functions were based on a common set of principal components. We used a methodology similar to the one we used for TSMC180RF in Section 3.2 of Chapter III to construct the PCA waveform model with a unique set of basis functions based on a set of common principal components, but we made two changes for improvement. We used a capacitor as the load instead of an inverter, and we used the combined dataset of rising and falling waveforms for PC model construction for the next iteration. For TSMC180RF, we used the dataset of rising waveforms for the next iteration, and we assumed the resulting PCA waveform model could be used for both rising and falling transitions.

We followed the iterative method of finding the set of common PCs described in Chapter III, and we got convergence on the second iteration. Figure 4.28 shows the convergence of the coefficients of the principal component basis functions. Two PCs cover 99.7% of the variance for iteration 0, and they cover 99.4% of the variance for iterations 1-2. We performed two more iterations (3, 4) after observing convergence on iteration 2 to make sure the convergence was stable.

We evaluated the model adequacy (fitting accuracy) and prediction accuracy of two PCA waveform models, for iterations 1 and 4. Waveform model adequacy is based on the statistics of residual errors. Waveform model prediction accuracy was evaluated by cross-checking a model (e.g., the waveform PCA model of iteration 4) with the waveform dataset of the other model (e.g., the waveform dataset of iteration 1) and evaluating the residual errors. Moreover, we used

the dataset of iteration 0 based on Slope instead of [L, Θ] to cross-check the dataset with the PCA waveform model of iteration 4.

Figure. The coefficients of the principal components basis functions for the inverter based on FreePDK45 technology, computed after each of the iterations: (a) PC1 and (b) PC2. [Panel titles: "Comparing PC1s (Trise)" and "Comparing PC2s (Trise)"; vertical axes: coefficients of PC1s and PC2s; horizontal axis: data points; series: IT(0), IT(1), IT(2).]
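The statement that two PCs cover more than 99% of the variance can be checked for any waveform dataset with a few lines of linear algebra. The sketch below uses a plain PCA on mean-centered data, which is an assumption and not necessarily the SNM, SSM, or ASM construction of Chapter III. The synthetic waveforms (a delay term, a slew term, and a small curvature term at 11 levels) are hypothetical stand-ins for Hspice data, and the name `pca_basis` is ours.

```python
import numpy as np

def pca_basis(waveforms, n_pc=2):
    """Return the mean waveform, the first n_pc principal components, and the
    fraction of the total variance that those components cover."""
    X = np.asarray(waveforms, dtype=float)
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    covered = float((s[:n_pc] ** 2).sum() / (s ** 2).sum())
    return mean, Vt[:n_pc], covered

# Synthetic dataset: 64 waveforms, each a vector of crossing times at 11 levels.
rng = np.random.default_rng(0)
u = np.linspace(0.0, 1.0, 11)               # normalized voltage levels
delay = rng.uniform(0.1, 0.5, size=64)      # hypothetical delay term (ns)
slew = rng.uniform(0.2, 2.0, size=64)       # hypothetical slew term (ns)
curve = rng.uniform(0.0, 0.05, size=64)     # small nonlinear term (ns)
waveforms = delay[:, None] + slew[:, None] * u + curve[:, None] * u ** 3

mean, pcs, covered = pca_basis(waveforms)
print(pcs.shape, covered)
```

Comparing the rows of `pcs` obtained from successive iterations, up to a possible sign flip of each component, is one simple way to judge convergence of the kind plotted in the figure above.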

Figure 4.29 shows the waveforms for iteration 1 of the inverter based on FreePDK45 technology in both the time domain and the PCA domain. The plots are similar to those presented earlier. They show the original corners of the experimental design as if we could have used Cartesian coordinates similar to Figure 3.3, which helped us to explain the concept of invalid waveforms and the acceptability region in Figure 3.4.

Figure. Waveforms for iteration 1 of the inverter based on FreePDK45 technology, (a) Time Domain and (b) PCA Domain.

The pink line is the limit line, similar to what we had in Figure 3.6, that imposes the convergence requirement. We can also observe the triangular-shaped region that led us to use polar coordinates to map each data point back to the time domain. Some of the data points are mapped close to the pink limit line because we had to extrapolate some of the end

points of the transitions, since some end points of the output transitions could not be captured in our Hspice simulations. We also applied the 8 ns limit on the waveforms by cutting the upper tail of the transitions longer than 8 ns. We can do that because the tails of the transitions falling outside the range of 10% to 90% do not have a significant effect on the switching timing of the NMOS and PMOS transistors of the inverter, because one of them should be off at each tail.

We used the same criteria to assess both model adequacy and prediction accuracy. The models are listed in the table that follows; its columns are self-descriptive. All the models are similar to item 21 (i.e., 11Pts-L20-Vt20-Tg-RCCC) of Table 4.11 with the one exception that only capacitance is used as the load in the simulations. The figures that follow show accuracy comparison plots similar to the earlier ones. We observe that the waveform dataset of iteration 0 cross-checked with the waveform model of iteration 4 results in the largest errors in all three plots. This implies that a saturated ramp transition cannot be mapped perfectly to our PCA waveform model. This is a source of error when we compared the timing results for Tabular STA and our PCA method in Chapter III. We constructed our PCA waveform model for the extreme cases of the largest range of process variation and the largest range of load and capacitance, which makes the cluster of output waveforms very nonlinear. It is obvious that mapping a line to a nonlinear curve cannot be done perfectly. Based on our simulation results, the waveform shapes of the cell are very nonlinear in reality. This fact suggests that using a waveform model with better support for nonlinearity, like our PCA waveform models, can improve the accuracy of our cell models.

Table. Waveform models compared for adequacy and prediction accuracy - FreePDK45.

# | Case name: PCA waveform model, or the combination of waveform dataset plus waveform model, used as name | Checking adequacy: waveform model iteration number | Checking prediction accuracy: waveform model iteration number | Cross-checked waveform dataset iteration number
1 | 11Pts-L20-Vt20-Tg-IT1 | 1 | |
2 | 11Pts-L20-Vt20-Tg-IT4 | 4 | |
3 | 11Pts-L20-Vt20-Tg-IT4inIT1 | | 1 | 4
4 | 11Pts-L20-Vt20-Tg-IT1inIT4 | | 4 | 1
5 | 11Pts-L20-Vt20-Tg-IT0inIT4 | | 4 | 0

Figure. Waveform accuracy for a waveform model based on a common set of PCs - Mahalanobis distance (FreePDK45 Iterations 0-4). [Vertical axis: accuracy criteria; horizontal axis: waveform model (the five cases above); series: maximum, average, and standard deviation of the Mahalanobis distance for the 11-point and 9-point cases.]

Figure. Waveform accuracy for a waveform model based on a common set of PCs - Max, Average, and Max. Ave. (FreePDK45 Iterations 0-4). [Vertical axis: relative error; horizontal axis: waveform model (IT1, IT4, IT4inIT1, IT1inIT4, IT0inIT4); series: Max. (11-pt), Average (11-pt), Max. Ave. (11-pt), Max. Ave. (9-pt), Max. (9-pt).]

Figure. Waveform accuracy for a waveform model based on a common set of PCs - 50% point (FreePDK45 Iterations 0-4). [Vertical axis: relative error; horizontal axis: waveform model (IT1, IT4, IT1inIT4, IT4inIT1, IT0inIT4); series: Max. (50% point), Average (50% point).]

We dropped the last model from the preceding plots to obtain a second set of plots. These plots give a better comparison of the model adequacy and prediction accuracy of the models. We observe that the PCA waveform model of iteration 4 is as adequate as the one from iteration 1. Moreover, we see that the prediction accuracy of the PCA waveform model based on iteration 4 is as good as its adequacy (fitting accuracy) when we cross-check the waveform model for iteration 4 with the dataset of iteration 1. This statement remains valid when we exchange iterations 1 and 4. The Mahalanobis distance in Figure 4.34 shows how similar the shapes of the waveform estimates are to the shapes of the original waveforms for all four models. The maximum errors for all the points are less than 7% according to the corresponding plot. The average error for all the points of our waveform estimates in comparison with the original waveforms is about 5%, and the average error of the 50% point of the waveform estimates is about 3%. There are other plots, similar to the earlier ones, for the PCA waveform model based on a common set of PCs, but we do not include them in this document.
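The distinction between model adequacy and prediction accuracy used above can be sketched as follows: a 2-PC model is fitted to one dataset, and the residual errors are then evaluated both on that dataset (adequacy) and on a second dataset the model has not seen (prediction accuracy). This is a minimal illustration using plain mean-centered PCA and synthetic waveforms; the generator `make_dataset`, its parameter ranges, and the function names are hypothetical and do not reproduce the FreePDK45 data or the iteration procedure of Chapter III.

```python
import numpy as np

def make_dataset(seed, n=64, n_pts=11):
    """Synthetic crossing times (ns): delay + slew * u + small curvature."""
    rng = np.random.default_rng(seed)
    u = np.linspace(0.0, 1.0, n_pts)
    delay = rng.uniform(0.1, 0.5, size=(n, 1))
    slew = rng.uniform(0.2, 2.0, size=(n, 1))
    curve = rng.uniform(0.0, 0.05, size=(n, 1))
    return delay + slew * u + curve * u ** 3

def fit_pca(X, n_pc=2):
    """Fit a model: the mean waveform and the first n_pc principal components."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_pc]

def reconstruct(X, model):
    """Project waveforms onto the model's PCs and map them back to times."""
    mean, pcs = model
    scores = (X - mean) @ pcs.T
    return mean + scores @ pcs

def mean_abs_rel_error(X, model):
    """Average absolute relative residual over all waveforms and points."""
    return float(np.mean(np.abs((reconstruct(X, model) - X) / X)))

data_a = make_dataset(seed=1)   # dataset used to build the model
data_b = make_dataset(seed=2)   # independent dataset for cross-checking
model_a = fit_pca(data_a)

adequacy = mean_abs_rel_error(data_a, model_a)     # fitting accuracy
prediction = mean_abs_rel_error(data_b, model_a)   # cross-check accuracy
print(adequacy, prediction)
```

When the two numbers are close, as reported above for iterations 1 and 4, the model generalizes to waveforms outside its own construction dataset about as well as it fits that dataset.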

Figure. Waveform accuracy for a waveform model based on a common set of PCs - Mahalanobis distance (FreePDK45 Iterations 1-4). [Vertical axis: accuracy criteria; horizontal axis: waveform model (IT1, IT4, IT4inIT1, IT1inIT4); series: maximum, average, and standard deviation of the Mahalanobis distance for the 11-point and 9-point cases.]

Figure. Waveform accuracy for a waveform model based on a common set of PCs - Max, Average, and Max. Ave. (FreePDK45 Iterations 1-4). [Vertical axis: relative error; horizontal axis: waveform model (IT1, IT4, IT4inIT1, IT1inIT4); series: Max. (11-pt), Average (11-pt), Max. Ave. (11-pt), Max. Ave. (9-pt), Max. (9-pt).]

Figure. Waveform accuracy for a waveform model based on a common set of PCs - 50% point (FreePDK45 Iterations 1-4). [Vertical axis: relative error; horizontal axis: waveform model (IT1, IT4, IT4inIT1, IT1inIT4); series: Max. (50% point), Average (50% point).]

Accuracy Analysis of the Cell Models

The accuracy of cell models is affected by the accuracy of the waveform model that the cells use. In this section, we investigate the accuracy improvement methods specific to our cell models and not to our waveform models.

Although it is possible to construct a variation-aware timing model based only on tables, the characterization time and memory grow exponentially with the number of dimensions (variables), with the table resolution as the base, which makes such models impractical. Such models could be very accurate given enough resolution for each dimension, but they remain impractical for the reason described. They can be compacted using linear regression to find their corresponding multivariable equations; however, they still suffer from exponential characterization time. Our compact variation-aware models use just a small fraction of all the points, selected based on analysis of variance or by sampling the whole space; therefore, the characterization time will be much less, while still being exponential with a base of 2. The table-based model will be more accurate because, in our models, the existence of residuals means that the equations do not pass through all the points in the tables; therefore, the residuals reduce accuracy in favor of the compactness of the models.

We used a more accurate waveform model to increase the accuracy of our cell modeling. We want to know how much accuracy improvement can be achieved by building a compact variation-aware model using the variational waveform model instead of slope. In Section 4.4, we built our models using slope instead of the variational waveform model. Although those models could give us reasonable accuracy, it is obvious that a more accurate waveform model can increase the accuracy of the cell models. We did not construct both kinds of cell models for the same technology; therefore, we do not have any data on the level of accuracy improvement.

In general, including more sampling points at strategic locations in the designs can improve the accuracy of our cell models; therefore, building models using designs with more sampling points, such as central composite designs [52] and 3-level full-factorial designs [52], should improve the accuracy of the models but increases the characterization time. It would be interesting to see whether the increase in accuracy is worth the characterization time overhead; therefore, we included this topic as an item in our future research list.

Conclusions

The tabular cell models can be very accurate, but their time and space complexity prohibits using them for a large number of parameters; therefore, our compact variation-aware cell models can be useful considering their superior time and space complexity. However, the accuracy is impacted. We listed the sources of the errors in our waveform and cell models as circuit-level simulation errors, experimental design sampling errors, and model fitting errors. We categorized our accuracy improvement methods as (a) waveform model improvement and (b) cell model improvement. Since we use a waveform model in a cell model, the accuracy of a cell model is affected by the accuracy of the waveform model used. Consequently, our variational waveform model can improve the accuracy without imposing a large overhead on cell characterization.

We used TSMC180RF and FreePDK45 technologies in developing our waveform and cell models. We drew several conclusions from a set of experiments to evaluate the accuracy of our waveform and cell models. First, our methodology is not technology dependent. Second, we can use a common set of basis functions for our variational waveform model without much impact on its accuracy. Third, the accuracy of our variational waveform model increases with the number of points used to represent the waveform, but it saturates beyond a limit. Fourth, using the effective capacitance load model increases the accuracy of cell modeling in comparison with the number-of-fanouts load model that we used before, although using a resistive-capacitive load model is more accurate. Fifth, symmetric subranging for load and slope can increase the accuracy of the models in general, but not always. Sixth, increasing the range of parameter variation can reduce the accuracy of the models in general, but not always. Seventh, our methodology is applicable to cell characterization with loads that can themselves be affected by variation, but the accuracy depends on whether we characterize our cells for a variational or a fixed load. Eighth, the accuracy of our

171 models is affected by the difference between the supply voltage and the total swing of the input waveforms that can happen by the variation in the supply voltage. Ninth, the experimental design used can affect both our waveform and cell models and using more samples and better sample locations. Finally, the inaccuracies of the circuit-level model used as well as the simulator are reflected in high-level cell models. While we tried to explore as many options as we could to find ways to improve the accuracy of our waveform and cell model, we had to narrow down our selection options. We leave exploring more options with using the resistive-capacitive load model as a few items in our future research directions that we mentioned in
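The characterization-cost argument of this chapter can be made concrete by counting simulation points. The sketch below is illustrative only: the function names and the table resolution of 7 points per variable are assumptions, and the central composite count uses the standard textbook form (factorial corners, two axial points per variable, and center points) rather than anything specific to this work.

```python
def table_points(resolution: int, num_vars: int) -> int:
    """Grid points in a full table with `resolution` entries per variable."""
    return resolution ** num_vars


def two_level_factorial(num_vars: int) -> int:
    """Runs in a 2-level full-factorial design."""
    return 2 ** num_vars


def three_level_factorial(num_vars: int) -> int:
    """Runs in a 3-level full-factorial design."""
    return 3 ** num_vars


def central_composite(num_vars: int, center_points: int = 1) -> int:
    """Runs in a central composite design: corners + axial points + centers."""
    return 2 ** num_vars + 2 * num_vars + center_points


if __name__ == "__main__":
    # The resolution of 7 is an assumed value chosen only for illustration.
    print("vars  table(7)  2-level  CCD  3-level")
    for k in (2, 4, 6, 8):
        print(k, table_points(7, k), two_level_factorial(k),
              central_composite(k), three_level_factorial(k))
```

All four counts grow exponentially with the number of variables, but the base differs, which is the trade-off between accuracy and characterization time discussed above.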

CHAPTER V

FAST VARIATION-AWARE STATISTICAL DYNAMIC TIMING ANALYSIS

A statistical dynamic timing analysis framework is presented to study the impact of catastrophic defects and process variation on the delay behavior of a digital circuit, considering the effect of gate switching on delays. It uses object-oriented programming and levelized code generation techniques to achieve fast runtimes with linear time complexity in the number of gates. The generated functional delay model, along with the experiment and statistical modules, is compiled to machine code before execution, and random transition vectors are used to approximate the delay profiles, which are useful for virtual speed grading and yield estimation. The methodology was published in [85].

Introduction

Yield of digital circuits is reduced when a fraction of circuits do not meet timing constraints. Major sources of yield loss include process variability and random defects. Statistical timing analysis enables estimation of yield loss from failures of timing specification tests. Timing characteristics of individual circuits are primarily a function of process variations, especially in channel length [86]-[88] and threshold voltage [89]; however, manufacturing defects can also significantly degrade timing characteristics [90]. Hence, it is important to determine timing sensitivity both to process variations and to major sources of defects, such as resistive vias.

Timing analysis can be static or dynamic. Dynamic Timing Analysis (DTA) and Static Timing Analysis (STA) are not alternatives to each other. STA estimates the delay of paths (from inputs to outputs), supposing that the other inputs of gates, those not on the path, have fixed logic values. DTA verifies the functionality of the design by applying input vectors and checking for correct output vectors. The quality of DTA increases with the number of input test vectors, at the expense of simulation time.

While the STA approach is very pessimistic, statistical STA methods are more realistic [6],[91]-[92]. Path-oriented statistical STA tools perform a similar analysis for a set of predetermined critical (longest delay) paths and form the corresponding delay probability density functions considering parameter variations, which are then combined to form their joint probability density function. The run time is a linear function of the number of paths, although the total number of paths is an exponential function of the number of gates. In high performance designs, such as pipelines, all paths are designed to have very close delays; therefore, all paths must be included in the analysis, which effectively makes the run time an exponential function of the number of gates [1]. Moreover, process variation can make a critical path non-critical or vice versa, so all paths must be selected to make sure that they are covered. In such situations, block-based statistical static timing simulation has much better performance [1]; however, statistical timing simulation with a proper set of vectors is much more accurate when the switching of gates is considered [93].
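The contrast between the number of paths and the number of gates can be illustrated with a small counting experiment. The following is a minimal sketch under stated assumptions: `diamond_chain` builds a hypothetical netlist in which every stage fans out to two gates that reconverge, and `count_paths` is an ordinary dynamic-programming path count over a directed acyclic graph, not part of any tool described here.

```python
from collections import defaultdict


def count_paths(edges, sources, sinks):
    """Count source-to-sink paths in a DAG by accumulating counts in topological order."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set(sources) | set(sinks)
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    paths = {n: 0 for n in nodes}
    for s in sources:
        paths[s] = 1
    ready = [n for n in nodes if indeg[n] == 0]
    while ready:
        u = ready.pop()
        for v in succ[u]:
            paths[v] += paths[u]      # every path reaching u extends to v
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return sum(paths[t] for t in sinks)


def diamond_chain(stages):
    """Hypothetical netlist: each stage splits into two gates that reconverge."""
    edges = []
    for i in range(stages):
        a, top, bot, b = f"n{i}", f"t{i}", f"b{i}", f"n{i + 1}"
        edges += [(a, top), (a, bot), (top, b), (bot, b)]
    return edges, ["n0"], [f"n{stages}"]


if __name__ == "__main__":
    for stages in (5, 10, 20):
        edges, src, dst = diamond_chain(stages)
        print(stages, "stages:", 3 * stages, "gates,", count_paths(edges, src, dst), "paths")
```

In this construction the gate count grows linearly with the number of stages while the path count doubles with every stage, which is why enumerating all paths becomes impractical.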

We have designed a statistical dynamic timing simulation framework to study the impact of catastrophic defects and process variations on the delay behavior of a digital circuit. The tool can be used for virtual speed grading, yield estimation, and delay fault diagnosis. To the best of our knowledge, this is the first statistical DTA tool that considers statistical distributions of parameters, not just corners as in [98],[99], and it has almost linear time complexity in the number of gates.

Based on the classification of statistical performance simulation methodologies in [96], statistical timing simulation can be Monte Carlo [93], Quasi-Monte Carlo [97], or analytical block-based using joint probability density functions [6],[98]. While Monte Carlo implementations suffer from intense computation requirements, analytical approaches have exponential worst case run times. We used a Monte Carlo approach.

Our approach uses gate level simulation to achieve faster speed in comparison with circuit-level or switch-level simulation, combined with a levelized compiled-code approach to achieve higher performance. While statically-scheduled levelized code evaluates logic gates based on the partial order of causality, dynamically-scheduled code schedules evaluations only as needed [99].

Digital circuit simulators are based on either interpreted or compiled code. Circuit compilation increases the efficiency and speed of simulation at the cost of preprocessing time and larger code. Circuit compilation is essentially a pre-processing step that symbolically executes the simulation to uncover data structures that can usually be statically allocated. The circuit graph traversal is eliminated by hard-coding it in the simulator kernel. Moreover, most indirect memory references are replaced with direct ones. Compilation unrolls most loops and embeds most function calls; therefore, it reduces the context switching overhead and increases instruction level parallelism in parallel or superscalar processors [100].

Texam [101] makes compiled-code cycle simulation more efficient by separating timing and function. Boolean operations are executed in one cycle. Multi-value logic code generation was developed to eliminate the need for event-driven simulations. We enhanced this approach by including the effects of parameter variations on timing and by expressing delays with equations in our framework.

Fault simulators use fault models to simulate faults. In this chapter, only defects resulting in delay faults are considered. Such defects may result from resistive vias, which are common in deep submicron technologies, and from random process variations.

Section 5.2 gives more information about the statistical dynamic timing analysis (SDTA) framework and its implementation. Section 5.3 presents the experimental results with their comparison and interpretation. Section 5.4 is dedicated to the conclusion and future work.

VVCCP: A compiled-code SDTA tool

A framework has been created for statistical dynamic timing analysis, based on Verilog/VHDL Compiler-Compiler Programs (VVCCP), which extends VCCP [101] with statistical dynamic timing simulation code generation modules.

Fault simulation framework

C++ code generation allows building flexible experiments with integrated test benches for statistical analysis and fast simulation runs. Fast simulation is achieved by running machine level code, and no external simulation engine is required.
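The idea of levelized compiled-code simulation can be sketched in a few lines. This is not the VVCCP generator: the framework described here emits C++, whereas the illustration below emits Python source as a stand-in, and the names `levelize`, `generate_source`, and `compile_circuit`, as well as the netlist format, are assumptions made for the example.

```python
OPS = {"AND": " & ", "OR": " | ", "XOR": " ^ "}


def levelize(gates, primary_inputs):
    """Order gates so that each gate appears after every gate that drives it."""
    level = {name: 0 for name in primary_inputs}
    ordered = []
    pending = dict(gates)
    while pending:
        ready = [g for g, (_, ins) in pending.items() if all(i in level for i in ins)]
        if not ready:
            raise ValueError("combinational loop or undriven net")
        for g in sorted(ready):
            _, ins = pending.pop(g)
            level[g] = 1 + max(level[i] for i in ins)
            ordered.append(g)
    return ordered, level


def generate_source(gates, primary_inputs, outputs):
    """Emit straight-line code: one assignment per gate, in levelized order."""
    ordered, _ = levelize(gates, primary_inputs)
    lines = [f"def simulate({', '.join(primary_inputs)}):"]
    for g in ordered:
        op, ins = gates[g]
        expr = f"1 - {ins[0]}" if op == "NOT" else OPS[op].join(ins)
        lines.append(f"    {g} = {expr}")
    lines.append(f"    return ({', '.join(outputs)},)")
    return "\n".join(lines)


def compile_circuit(gates, primary_inputs, outputs):
    """Compile the generated source once; no graph traversal remains at run time."""
    namespace = {}
    exec(generate_source(gates, primary_inputs, outputs), namespace)
    return namespace["simulate"]


if __name__ == "__main__":
    # Hypothetical three-gate netlist, deliberately listed out of order.
    netlist = {"nc": ("NOT", ["c"]), "c": ("AND", ["a", "b"]), "s": ("XOR", ["a", "b"])}
    print(generate_source(netlist, ["a", "b"], ["s", "nc"]))
    print(compile_circuit(netlist, ["a", "b"], ["s", "nc"])(1, 0))
```

Because the gate order is fixed when the code is generated, the simulation itself is a single pass over straight-line assignments, which is consistent with the linear growth in run time with gate count claimed for the levelized approach.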

Using the framework, a faultable functional delay model is created. Delay faults are injected as variations of gate delays, which include global process shifts and random delay faults, defined in the corresponding generated test benches. The integrated statistical analysis in the test bench determines the simulation results.

Figure 5.1 shows the simulation framework. Verilog [102] and VHDL [103] are used as the front-end for the input. The fault simulation engine and the critical path functional delay model are generated in C++ for the input model. The generated code is run after compilation.

Figure 5.1. Fault simulation framework.

Here, the experiment of interest is the delay profile generation and comparison. A delay profile is the cumulative distribution function of the critical path delays of a circuit. Examples of these delay profiles are shown in Figure 5.6 in Section which provides the delay as a function of a set of 4096 random patterns. The minimum and maximum delays in Figure 5.6(b) are 0 and 1400 a.u., respectively. It can be seen that some patterns produce significantly larger delays than the others.
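A delay profile of this kind can be approximated from per-pattern critical path delays as an empirical cumulative distribution. The sketch below is a generic illustration under that assumption; the function names are hypothetical, and reading a point of the profile as the fraction of patterns that meet a given clock period is one possible use for speed grading, not a description of the tool's statistical module.

```python
def delay_profile(delays):
    """Return sorted (delay, fraction of patterns with delay <= that value) pairs."""
    ordered = sorted(delays)
    n = len(ordered)
    profile = []
    for i, d in enumerate(ordered, start=1):
        if profile and profile[-1][0] == d:
            profile[-1] = (d, i / n)      # merge repeated delay values
        else:
            profile.append((d, i / n))
    return profile


def fraction_meeting(profile, clock_period):
    """Fraction of patterns whose critical path delay fits in the clock period."""
    fraction = 0.0
    for d, f in profile:
        if d <= clock_period:
            fraction = f
    return fraction


if __name__ == "__main__":
    # Made-up per-pattern delays in arbitrary units, for illustration only.
    sample = [300, 0, 1400, 300, 750, 0, 900, 300]
    profile = delay_profile(sample)
    print(profile)
    print(fraction_meeting(profile, 800))
```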

The framework supports experiments such as path delay as a function of a set of test patterns in the presence of global parametric shifts, random within-die variation, and single and multiple delay faults. From these simulations, we obtain the impact on the critical path delay and the impact on the delay distribution for all test patterns.

Figure 5.2 shows the block diagram of VVCCP. After lexical analysis of the gate level model, intermediate code is generated and converted to an abstract model used by the code generators. Special code generators were designed and implemented to generate the functional delay model, the statistical module, and the automated experiments. The code generator for the critical path delay functional model is based on the functional delay model. Other code generator modules can be plugged into the suite.

Figure 5.2. VVCCP block diagram.

Transformation process of models

The transformation needs two mappings, as shown in Figure 5.3. The first one maps the input model into the functional delay model. This is symbolically shown by replacing the input/output port labels with delay_ prefixed labels, creating an input delay vector and an output delay vector. Gate delays are shown as vectors. The next mapping creates the critical path functional delay model by converting the output delay vector into a scalar delay value. If logic is not considered, the critical path functional delay model gives an upper bound on the logic dependent critical path delays. The input/output and gate delay initializations are essential to the correct functionality of the model. The output delay vector ($\vec{D}_{Outputs}$) is a function of the input delay vector ($\vec{D}_{Inputs}$) and the gate delay vector ($\vec{D}_{Gates}$):

$$\vec{D}_{Outputs} = DF(\vec{D}_{Inputs}, \vec{D}_{Gates}) \tag{5.1}$$

The functional critical path delay model is the sub-graph of the delay graph based on the activity of the logic signals. The three main steps are the initialization of the logic and delay elements, the logic network evaluation, and the delay network evaluation.

Figure 5.3. Model transformations.
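Equation (5.1) and the logic-independent upper bound can be illustrated with a small delay network evaluation. This is a minimal sketch, assuming that each gate's output delay is its own delay plus the latest delay among its inputs and that the gates are supplied in levelized order; the netlist and the names `delay_function` and `upper_bound` are hypothetical.

```python
def delay_function(gates, order, input_delays, gate_delays):
    """Evaluate the delay network: output delay = gate delay + latest input delay."""
    arrival = dict(input_delays)
    for g in order:                       # `order` must list drivers before loads
        arrival[g] = gate_delays[g] + max(arrival[i] for i in gates[g])
    return arrival


def upper_bound(arrival, outputs):
    """Scalar critical path delay when logic values are ignored."""
    return max(arrival[o] for o in outputs)


if __name__ == "__main__":
    # Hypothetical four-gate circuit: gate name -> list of driving nets.
    gates = {"g1": ["a", "b"], "g2": ["b", "c"], "y": ["g1", "g2"], "z": ["g2"]}
    order = ["g1", "g2", "y", "z"]
    d_inputs = {"a": 0.0, "b": 0.0, "c": 0.0}
    d_gates = {"g1": 2.0, "g2": 3.0, "y": 1.0, "z": 4.0}
    arrival = delay_function(gates, order, d_inputs, d_gates)
    print([arrival[o] for o in ("y", "z")], upper_bound(arrival, ("y", "z")))
```

Here the dictionary restricted to the outputs plays the role of the output delay vector, and taking its maximum corresponds to the conversion of that vector into a scalar delay value.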

In static timing analysis, the critical path delay function is a scalar value that is the maximum delay considering all outputs, input delays, and gate delays.

$$d = CPDF(\vec{D}_{Inputs}, \vec{D}_{Gates}) \tag{5.2}$$

Dynamic timing analysis accounts for the sensitizability of paths. The critical path delay function is the vectorized form of the former function, where each element of the vector corresponds to one applied logic transition input vector ($\vec{L}_{Inputs}$).

$$\vec{d} = CPDF(\vec{L}_{Inputs}, \vec{D}_{Inputs}, \vec{D}_{Gates}) \tag{5.3}$$

A six value system [104] is used for the logic manipulation. Faulty delays are computed through fault insertion.

$$\vec{d}_{Good} = CPDF(\vec{L}_{Inputs}, \vec{0}, \vec{D}_{GoodGates}) \tag{5.4}$$

$$\vec{d}_{Faulty} = CPDF(\vec{L}_{Inputs}, \vec{0}, \vec{D}_{FaultyGates}) \tag{5.5}$$

Experiments

Four types of experiments on input models were implemented: a parametric shift experiment, a random within-die variation experiment, fault-free circuit delay profile generation, and faulty circuit delay profile generation. The gate level delay model of Section is used in this implementation; however, the experiments can be modified to use actual delays derived from back annotation of a design extracted from a layout.
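The per-vector evaluation in equations (5.3)-(5.5) can be sketched with a deliberately simplified model. The code below uses two-valued logic with (before, after) input pairs instead of the six value system of [104], and it assumes that a switching gate output is delayed by the gate delay plus the latest switching input; both are simplifications made for illustration, and all names and the example netlist are hypothetical.

```python
GATE_FUNCS = {
    "AND": lambda v: int(all(v)),
    "OR": lambda v: int(any(v)),
    "NOT": lambda v: 1 - v[0],
}


def cpdf(gates, order, outputs, transition, gate_delays):
    """Critical path delay for one input transition, with zero input delays."""
    before, after = dict(transition[0]), dict(transition[1])
    arrival = {net: 0.0 for net in before}
    for g in order:                                   # levelized gate order
        kind, ins = gates[g]
        before[g] = GATE_FUNCS[kind]([before[i] for i in ins])
        after[g] = GATE_FUNCS[kind]([after[i] for i in ins])
        if before[g] != after[g]:
            # Only inputs that actually switch can propagate a delay.
            switching = [arrival[i] for i in ins if before[i] != after[i]]
            arrival[g] = gate_delays[g] + max(switching)
        else:
            arrival[g] = 0.0                          # no transition, no delay
    return max(arrival[o] for o in outputs)


def delay_vector(gates, order, outputs, transitions, gate_delays):
    """One delay per applied transition vector, as in the vectorized CPDF."""
    return [cpdf(gates, order, outputs, t, gate_delays) for t in transitions]


if __name__ == "__main__":
    gates = {"g1": ("AND", ["a", "b"]), "g2": ("NOT", ["c"]), "y": ("OR", ["g1", "g2"])}
    order, outputs = ["g1", "g2", "y"], ["y"]
    vectors = [
        ({"a": 0, "b": 1, "c": 1}, {"a": 1, "b": 1, "c": 1}),
        ({"a": 1, "b": 1, "c": 1}, {"a": 1, "b": 1, "c": 0}),
    ]
    good = {"g1": 2.0, "g2": 1.0, "y": 1.0}
    faulty = dict(good, g1=4.0)                       # single +100% delay fault on g1
    print(delay_vector(gates, order, outputs, vectors, good))
    print(delay_vector(gates, order, outputs, vectors, faulty))
```

In this toy case the second vector never toggles the output, so its delay is zero in both the good and faulty circuits, which shows how logic activity selects the sub-graph of the delay graph that matters for each vector.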

Figure 5.4 shows the graphical representation of the first two experiments (a, b). Red dots correspond to faulty gates, and green dots are fault-free gates.

Figure 5.4. Parametric and random experiments.

The parametric shift experiment increases the delay of all of the gates by 0% to 100% as the fault injection method. This corresponds to global process variation, which is analyzed for the worst case or in corner analysis. A random within-die variation experiment adds random gate delay variation of 0% to 100% of the nominal delay, with an injection probability of 0% to 100%, to the parametric experiment. This models variation due to random variation in channel doping and channel length. The delay distributions are compared after normalizing the maximum delay of each circuit instance.

Figure 5.5(a) shows the experimental design for the fault-free and faulty delay profile generation experiments. The light green dot in the middle of the front box corresponds to (b); the dark green dots in the middle of the sides of the front box correspond to (c); the red dot in the middle of the back box corresponds to (d); the blue dots at the corners of the front box correspond to (e); and the pink dots at the corners of the back box correspond to (f). The experiments that were performed are: (b) generation of the ideal fault-free delay profile (DP) using equation (5.4); (c) generation of the fault-free DP in the presence of process shifts (±30% shift); (d) generation of the faulty DP with a single delay fault with a +100% increase in delay; (e) generation of the good DP with ±5% random noise and ±30% shift in the process, where "good" refers to not ideal and possibly faulty; and (f) generation of the faulty DP with a single delay fault with a +100% increase in delay, ±5% noise, and ±30% shift in the process. We generate the delay profiles using the appropriate fault injection for each experiment and apply the same set of input transitions. Experiments (c)-(f) use equation (5.5). Figure 5.5 shows the resulting delay distribution functions.

Details of the implementation are different for each experiment; however, the general pseudo-code is shown in Figure 5.6. The path delay function first determines the logic and delay of each output; then it obtains the path delay by evaluating a function in terms of the delay and logic of the output nodes. The delay profile generator initializes all experiment parameters and then applies the set of transition vectors. In the loop, it first resets the delay of all gates and signals of the circuit; then it calls the path delay evaluation function and updates the statistical tables.
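The general structure of the delay profile generator described above can be sketched as follows. This is not the pseudo-code of Figure 5.6: the function names, the choice to draw one set of gate delays per circuit instance, and the toy `path_delay` stand-in (which simply sums the delays of the gates named in each vector) are all assumptions made to keep the example self-contained.

```python
import random


def inject(nominal, rng, shift=0.0, noise=0.0, fault_gate=None, fault_increase=1.0):
    """Gate delays after a global shift, random noise, and an optional single fault."""
    delays = {}
    for gate, d in nominal.items():
        scale = 1.0 + shift + rng.uniform(-noise, noise)
        if gate == fault_gate:
            scale += fault_increase               # e.g. 1.0 means +100% delay
        delays[gate] = d * scale
    return delays


def generate_profile(path_delay, vectors, nominal, seed=0, **fault_args):
    """Initialize the experiment, apply every vector, and collect sorted delays."""
    rng = random.Random(seed)
    gate_delays = inject(nominal, rng, **fault_args)  # one circuit instance (assumed)
    results = []
    for vec in vectors:
        # A real model would also reset signal delays here before evaluating.
        results.append(path_delay(vec, gate_delays))
    return sorted(results)


def toy_path_delay(vec, gate_delays):
    """Hypothetical stand-in: the vector lists the gates on its sensitized path."""
    return sum(gate_delays[g] for g in vec)


if __name__ == "__main__":
    nominal = {"g1": 2.0, "g2": 3.0, "g3": 1.0}
    vectors = [("g1", "g3"), ("g2", "g3"), ("g3",)]
    print("ideal      ", generate_profile(toy_path_delay, vectors, nominal))
    print("+30% shift ", generate_profile(toy_path_delay, vectors, nominal, shift=0.3))
    print("fault on g2", generate_profile(toy_path_delay, vectors, nominal, fault_gate="g2"))
    print("noisy fault", generate_profile(toy_path_delay, vectors, nominal,
                                          shift=-0.3, noise=0.05, fault_gate="g2"))
```

Applying the same vector set to each injected configuration, as the text describes, is what makes the resulting profiles directly comparable.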

Figure 5.5. Delay profile generation experiments.


More information

PROCESS and environment parameter variations in scaled

PROCESS and environment parameter variations in scaled 1078 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 53, NO. 10, OCTOBER 2006 Reversed Temperature-Dependent Propagation Delay Characteristics in Nanometer CMOS Circuits Ranjith Kumar

More information

5. CMOS Gates: DC and Transient Behavior

5. CMOS Gates: DC and Transient Behavior 5. CMOS Gates: DC and Transient Behavior Jacob Abraham Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017 September 18, 2017 ECE Department, University

More information

DAT175: Topics in Electronic System Design

DAT175: Topics in Electronic System Design DAT175: Topics in Electronic System Design Analog Readout Circuitry for Hearing Aid in STM90nm 21 February 2010 Remzi Yagiz Mungan v1.10 1. Introduction In this project, the aim is to design an adjustable

More information

Digital Integrated Circuits Designing Combinational Logic Circuits. Fuyuzhuo

Digital Integrated Circuits Designing Combinational Logic Circuits. Fuyuzhuo Digital Integrated Circuits Designing Combinational Logic Circuits Fuyuzhuo Introduction Digital IC Combinational vs. Sequential Logic In Combinational Logic Circuit Out In Combinational Logic Circuit

More information

A SIGNAL DRIVEN LARGE MOS-CAPACITOR CIRCUIT SIMULATOR

A SIGNAL DRIVEN LARGE MOS-CAPACITOR CIRCUIT SIMULATOR A SIGNAL DRIVEN LARGE MOS-CAPACITOR CIRCUIT SIMULATOR Janusz A. Starzyk and Ying-Wei Jan Electrical Engineering and Computer Science, Ohio University, Athens Ohio, 45701 A designated contact person Prof.

More information

Design and Performance Analysis of SOI and Conventional MOSFET based CMOS Inverter

Design and Performance Analysis of SOI and Conventional MOSFET based CMOS Inverter I J E E E C International Journal of Electrical, Electronics ISSN No. (Online): 2277-2626 and Computer Engineering 3(2): 138-143(2014) Design and Performance Analysis of SOI and Conventional MOSFET based

More information

EEC 216 Lecture #10: Ultra Low Voltage and Subthreshold Circuit Design. Rajeevan Amirtharajah University of California, Davis

EEC 216 Lecture #10: Ultra Low Voltage and Subthreshold Circuit Design. Rajeevan Amirtharajah University of California, Davis EEC 216 Lecture #1: Ultra Low Voltage and Subthreshold Circuit Design Rajeevan Amirtharajah University of California, Davis Opportunities for Ultra Low Voltage Battery Operated and Mobile Systems Wireless

More information

Process-sensitive Monitor Circuits for Estimation of Die-to-Die Process Variability

Process-sensitive Monitor Circuits for Estimation of Die-to-Die Process Variability Process-sensitive Monitor Circuits for Estimation of Die-to-Die Process Variability Islam A.K.M Mahfuzul Department of Communications and Computer Engineering Kyoto University mahfuz@vlsi.kuee.kyotou.ac.jp

More information

Circuit Simulation with SPICE OPUS

Circuit Simulation with SPICE OPUS Circuit Simulation with SPICE OPUS Theory and Practice Tadej Tuma Arpäd Bürmen Birkhäuser Boston Basel Berlin Contents Abbreviations About SPICE OPUS and This Book xiii xv 1 Introduction to Circuit Simulation

More information

Jack Keil Wolf Lecture. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. Lecture Outline. MOSFET N-Type, P-Type.

Jack Keil Wolf Lecture. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. Lecture Outline. MOSFET N-Type, P-Type. ESE 570: Digital Integrated Circuits and VLSI Fundamentals Jack Keil Wolf Lecture Lec 3: January 24, 2019 MOS Fabrication pt. 2: Design Rules and Layout http://www.ese.upenn.edu/about-ese/events/wolf.php

More information

STATISTICAL DESIGN AND YIELD ENHANCEMENT OF LOW VOLTAGE CMOS ANALOG VLSI CIRCUITS

STATISTICAL DESIGN AND YIELD ENHANCEMENT OF LOW VOLTAGE CMOS ANALOG VLSI CIRCUITS STATISTICAL DESIGN AND YIELD ENHANCEMENT OF LOW VOLTAGE CMOS ANALOG VLSI CIRCUITS Istanbul Technical University Electronics and Communications Engineering Department Tuna B. Tarim Prof. Dr. Hakan Kuntman

More information

Power Estimation. Naehyuck Chang Dept. of EECS/CSE Seoul National University

Power Estimation. Naehyuck Chang Dept. of EECS/CSE Seoul National University Power Estimation Naehyuck Chang Dept. of EECS/CSE Seoul National University naehyuck@snu.ac.kr 1 Contents Embedded Low-Power ELPL Laboratory SPICE power analysis Power estimation basics Signal probability

More information

TECHNOLOGY scaling, aided by innovative circuit techniques,

TECHNOLOGY scaling, aided by innovative circuit techniques, 122 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 14, NO. 2, FEBRUARY 2006 Energy Optimization of Pipelined Digital Systems Using Circuit Sizing and Supply Scaling Hoang Q. Dao,

More information

Total reduction of leakage power through combined effect of Sleep stack and variable body biasing technique

Total reduction of leakage power through combined effect of Sleep stack and variable body biasing technique Total reduction of leakage power through combined effect of Sleep and variable body biasing technique Anjana R 1, Ajay kumar somkuwar 2 Abstract Leakage power consumption has become a major concern for

More information

Chapter 3 DESIGN OF ADIABATIC CIRCUIT. 3.1 Introduction

Chapter 3 DESIGN OF ADIABATIC CIRCUIT. 3.1 Introduction Chapter 3 DESIGN OF ADIABATIC CIRCUIT 3.1 Introduction The details of the initial experimental work carried out to understand the energy recovery adiabatic principle are presented in this section. This

More information

Fast Statistical Timing Analysis By Probabilistic Event Propagation

Fast Statistical Timing Analysis By Probabilistic Event Propagation Fast Statistical Timing Analysis By Probabilistic Event Propagation Jing-Jia Liou, Kwang-Ting Cheng, Sandip Kundu, and Angela Krstić Electrical and Computer Engineering Department, University of California,

More information

Lecture 12 - Digital Circuits (I) The inverter. October 20, 2005

Lecture 12 - Digital Circuits (I) The inverter. October 20, 2005 6.12 - Microelectronic Devices and Circuits - Fall 25 Lecture 12-1 Lecture 12 - Digital Circuits (I) The inverter October 2, 25 Contents: 1. Introduction to digital electronics: the inverter 2. NMOS inverter

More information

An Improved Bandgap Reference (BGR) Circuit with Constant Voltage and Current Outputs

An Improved Bandgap Reference (BGR) Circuit with Constant Voltage and Current Outputs International Journal of Research in Engineering and Innovation Vol-1, Issue-6 (2017), 60-64 International Journal of Research in Engineering and Innovation (IJREI) journal home page: http://www.ijrei.com

More information

On-Chip Transistor Characterization Arrays with Digital Interfaces for Variability Characterization *

On-Chip Transistor Characterization Arrays with Digital Interfaces for Variability Characterization * On-Chip Transistor Characterization Arrays with Digital Interfaces for Variability Characterization * Simeon Realov, William McLaughlin, K. L. Shepard Department of Electrical Engineering, Columbia University

More information

Body Voltage Estimation in Digital PD-SOI Circuits and Its Application to Static Timing Analysis

Body Voltage Estimation in Digital PD-SOI Circuits and Its Application to Static Timing Analysis 888 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 20, NO. 7, JULY 2001 Body Voltage Estimation in Digital PD-SOI Circuits and Its Application to Static Timing Analysis

More information

LOW VOLTAGE / LOW POWER RAIL-TO-RAIL CMOS OPERATIONAL AMPLIFIER FOR PORTABLE ECG

LOW VOLTAGE / LOW POWER RAIL-TO-RAIL CMOS OPERATIONAL AMPLIFIER FOR PORTABLE ECG LOW VOLTAGE / LOW POWER RAIL-TO-RAIL CMOS OPERATIONAL AMPLIFIER FOR PORTABLE ECG A DISSERTATION SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY BORAM LEE IN PARTIAL FULFILLMENT

More information

I. Digital Integrated Circuits - Logic Concepts

I. Digital Integrated Circuits - Logic Concepts I. Digital Integrated Circuits - Logic Concepts. Logic Fundamentals: binary mathematics: only operate on and (oolean algebra) simplest function -- inversion = symbol for the inverter INPUT OUTPUT EECS

More information

Separation and Extraction of Short-Circuit Power Consumption in Digital CMOS VLSI Circuits

Separation and Extraction of Short-Circuit Power Consumption in Digital CMOS VLSI Circuits Separation and Extraction of Short-Circuit Power Consumption in Digital CMOS VLSI Circuits Atila Alvandpour, Per Larsson-Edefors, and Christer Svensson Div of Electronic Devices, Dept of Physics, Linköping

More information

EE584 (Fall 2006) Introduction to VLSI CAD Project. Design of Ring Oscillator using NOR gates

EE584 (Fall 2006) Introduction to VLSI CAD Project. Design of Ring Oscillator using NOR gates EE584 (Fall 2006) Introduction to VLSI CAD Project Design of Ring Oscillator using NOR gates By, Veerandra Alluri Vijai Raghunathan Archana Jagarlamudi Gokulnaraiyn Ramaswami Instructor: Dr. Joseph Elias

More information

Digital Integrated Circuits Designing Combinational Logic Circuits. Fuyuzhuo

Digital Integrated Circuits Designing Combinational Logic Circuits. Fuyuzhuo Digital Integrated Circuits Designing Combinational Logic Circuits Fuyuzhuo Introduction Digital IC Combinational vs. Sequential Logic In Combinational Logic Circuit Out In Combinational Logic Circuit

More information

VLSI Design I; A. Milenkovic 1

VLSI Design I; A. Milenkovic 1 CPE/EE 427, CPE 527 VLSI Design I L02: Design Metrics Department of Electrical and Computer Engineering University of Alabama in Huntsville Aleksandar Milenkovic ( www.ece.uah.edu/~milenka ) www.ece.uah.edu/~milenka/cpe527-03f

More information

Introduction. Timing Verification

Introduction. Timing Verification Timing Verification Sungho Kang Yonsei University YONSEI UNIVERSITY Outline Introduction Timing Simulation Static Timing Verification PITA Conclusion 2 1 Introduction Introduction Variations in component

More information

DESIGN OF MULTI-BIT DELTA-SIGMA A/D CONVERTERS

DESIGN OF MULTI-BIT DELTA-SIGMA A/D CONVERTERS DESIGN OF MULTI-BIT DELTA-SIGMA A/D CONVERTERS DESIGN OF MULTI-BIT DELTA-SIGMA A/D CONVERTERS by Yves Geerts Alcatel Microelectronics, Belgium Michiel Steyaert KU Leuven, Belgium and Willy Sansen KU Leuven,

More information

CHAPTER 5 DESIGN AND ANALYSIS OF COMPLEMENTARY PASS- TRANSISTOR WITH ASYNCHRONOUS ADIABATIC LOGIC CIRCUITS

CHAPTER 5 DESIGN AND ANALYSIS OF COMPLEMENTARY PASS- TRANSISTOR WITH ASYNCHRONOUS ADIABATIC LOGIC CIRCUITS 70 CHAPTER 5 DESIGN AND ANALYSIS OF COMPLEMENTARY PASS- TRANSISTOR WITH ASYNCHRONOUS ADIABATIC LOGIC CIRCUITS A novel approach of full adder and multipliers circuits using Complementary Pass Transistor

More information

Novel Buffer Design for Low Power and Less Delay in 45nm and 90nm Technology

Novel Buffer Design for Low Power and Less Delay in 45nm and 90nm Technology Novel Buffer Design for Low Power and Less Delay in 45nm and 90nm Technology 1 Mahesha NB #1 #1 Lecturer Department of Electronics & Communication Engineering, Rai Technology University nbmahesh512@gmail.com

More information

Electronic Circuits EE359A

Electronic Circuits EE359A Electronic Circuits EE359A Bruce McNair B206 bmcnair@stevens.edu 201-216-5549 1 Memory and Advanced Digital Circuits - 2 Chapter 11 2 Figure 11.1 (a) Basic latch. (b) The latch with the feedback loop opened.

More information

Digital Systems Power, Speed and Packages II CMPE 650

Digital Systems Power, Speed and Packages II CMPE 650 Speed VLSI focuses on propagation delay, in contrast to digital systems design which focuses on switching time: A B A B rise time propagation delay Faster switching times introduce problems independent

More information

A Novel Radiation Tolerant SRAM Design Based on Synergetic Functional Component Separation for Nanoscale CMOS.

A Novel Radiation Tolerant SRAM Design Based on Synergetic Functional Component Separation for Nanoscale CMOS. A Novel Radiation Tolerant SRAM Design Based on Synergetic Functional Component Separation for Nanoscale CMOS. Abstract This paper presents a novel SRAM design for nanoscale CMOS. The new design addresses

More information

A Survey of the Low Power Design Techniques at the Circuit Level

A Survey of the Low Power Design Techniques at the Circuit Level A Survey of the Low Power Design Techniques at the Circuit Level Hari Krishna B Assistant Professor, Department of Electronics and Communication Engineering, Vagdevi Engineering College, Warangal, India

More information

MOS TRANSISTOR THEORY

MOS TRANSISTOR THEORY MOS TRANSISTOR THEORY Introduction A MOS transistor is a majority-carrier device, in which the current in a conducting channel between the source and the drain is modulated by a voltage applied to the

More information

THERE is a growing need for high-performance and. Static Leakage Reduction Through Simultaneous V t /T ox and State Assignment

THERE is a growing need for high-performance and. Static Leakage Reduction Through Simultaneous V t /T ox and State Assignment 1014 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 24, NO. 7, JULY 2005 Static Leakage Reduction Through Simultaneous V t /T ox and State Assignment Dongwoo Lee, Student

More information

UNIT-1 Fundamentals of Low Power VLSI Design

UNIT-1 Fundamentals of Low Power VLSI Design UNIT-1 Fundamentals of Low Power VLSI Design Need for Low Power Circuit Design: The increasing prominence of portable systems and the need to limit power consumption (and hence, heat dissipation) in very-high

More information

Domino Static Gates Final Design Report

Domino Static Gates Final Design Report Domino Static Gates Final Design Report Krishna Santhanam bstract Static circuit gates are the standard circuit devices used to build the major parts of digital circuits. Dynamic gates, such as domino

More information

Power Supply Networks: Analysis and Synthesis. What is Power Supply Noise?

Power Supply Networks: Analysis and Synthesis. What is Power Supply Noise? Power Supply Networs: Analysis and Synthesis What is Power Supply Noise? Problem: Degraded voltage level at the delivery point of the power/ground grid causes performance and/or functional failure Lower

More information

Technology, Jabalpur, India 1 2

Technology, Jabalpur, India 1 2 1181 LAYOUT DESIGNING AND OPTIMIZATION TECHNIQUES USED FOR DIFFERENT FULL ADDER TOPOLOGIES ARPAN SINGH RAJPUT 1, RAJESH PARASHAR 2 1 M.Tech. Scholar, 2 Assistant professor, Department of Electronics and

More information

Andrew Clinton, Matt Liberty, Ian Kuon

Andrew Clinton, Matt Liberty, Ian Kuon Andrew Clinton, Matt Liberty, Ian Kuon FPGA Routing (Interconnect) FPGA routing consists of a network of wires and programmable switches Wire is modeled with a reduced RC network Drivers are modeled as

More information

An Interconnect-Centric Approach to Cyclic Shifter Design

An Interconnect-Centric Approach to Cyclic Shifter Design An Interconnect-Centric Approach to Cyclic Shifter Design Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College. David M. Harris Harvey Mudd College. 1 Outline Motivation Previous Work Approaches Fanout-Splitting

More information

1 Digital EE141 Integrated Circuits 2nd Introduction

1 Digital EE141 Integrated Circuits 2nd Introduction Digital Integrated Circuits Introduction 1 What is this lecture about? Introduction to digital integrated circuits + low power circuits Issues in digital design The CMOS inverter Combinational logic structures

More information

ECE520 VLSI Design. Lecture 5: Basic CMOS Inverter. Payman Zarkesh-Ha

ECE520 VLSI Design. Lecture 5: Basic CMOS Inverter. Payman Zarkesh-Ha ECE520 VLSI Design Lecture 5: Basic CMOS Inverter Payman Zarkesh-Ha Office: ECE Bldg. 230B Office hours: Wednesday 2:00-3:00PM or by appointment E-mail: pzarkesh@unm.edu Slide: 1 Review of Last Lecture

More information

ELEC Digital Logic Circuits Fall 2015 Delay and Power

ELEC Digital Logic Circuits Fall 2015 Delay and Power ELEC - Digital Logic Circuits Fall 5 Delay and Power Vishwani D. Agrawal James J. Danaher Professor Department of Electrical and Computer Engineering Auburn University, Auburn, AL 36849 http://www.eng.auburn.edu/~vagrawal

More information

A New Model for Thermal Channel Noise of Deep-Submicron MOSFETS and its Application in RF-CMOS Design

A New Model for Thermal Channel Noise of Deep-Submicron MOSFETS and its Application in RF-CMOS Design IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 5, MAY 2001 831 A New Model for Thermal Channel Noise of Deep-Submicron MOSFETS and its Application in RF-CMOS Design Gerhard Knoblinger, Member, IEEE,

More information

Minimizing the Sub Threshold Leakage for High Performance CMOS Circuits Using Stacked Sleep Technique

Minimizing the Sub Threshold Leakage for High Performance CMOS Circuits Using Stacked Sleep Technique International Journal of Electrical Engineering. ISSN 0974-2158 Volume 10, Number 3 (2017), pp. 323-335 International Research Publication House http://www.irphouse.com Minimizing the Sub Threshold Leakage

More information

Design of a High Speed Mixed Signal CMOS Mutliplying Circuit

Design of a High Speed Mixed Signal CMOS Mutliplying Circuit Brigham Young University BYU ScholarsArchive All Theses and Dissertations 2004-03-12 Design of a High Speed Mixed Signal CMOS Mutliplying Circuit David Ray Bartholomew Brigham Young University - Provo

More information

Abstract of PhD Thesis

Abstract of PhD Thesis FACULTY OF ELECTRONICS, TELECOMMUNICATION AND INFORMATION TECHNOLOGY Irina DORNEAN, Eng. Abstract of PhD Thesis Contribution to the Design and Implementation of Adaptive Algorithms Using Multirate Signal

More information

Digital Microelectronic Circuits ( ) Terminology and Design Metrics. Lecture 2: Presented by: Adam Teman

Digital Microelectronic Circuits ( ) Terminology and Design Metrics. Lecture 2: Presented by: Adam Teman Digital Microelectronic Circuits (361-1-3021 ) Presented by: Adam Teman Lecture 2: Terminology and Design Metrics 1 Last Week Introduction» Moore s Law» History of Computers Circuit analysis review» Thevenin,

More information

Lab 7 (Hands-On Experiment): CMOS Inverter, NAND Gate, and NOR Gate

Lab 7 (Hands-On Experiment): CMOS Inverter, NAND Gate, and NOR Gate Lab 7 (Hands-On Experiment): CMOS Inverter, NAND Gate, and NOR Gate EECS 170LB, Wed. 5:00 PM TA: Elsharkasy, Wael Ryan Morrison Buu Truong Jonathan Lam 03/05/14 Introduction The purpose of this lab is

More information

STATISTICAL MODELING FOR COMPUTER-AIDED DESIGN OF MOS VLSI CIRCUITS

STATISTICAL MODELING FOR COMPUTER-AIDED DESIGN OF MOS VLSI CIRCUITS STATISTICAL MODELING FOR COMPUTER-AIDED DESIGN OF MOS VLSI CIRCUITS THE KLUWER INTERNATIONAL SERIES IN ENGINEERING AND COMPUTER SCIENCE ANALOG CIRCUITS AND SIGNAL PROCESSING Consulting Editor Related titles:

More information

Analysis and Design of Low Power Ring Oscillators with Frequency ~ khz

Analysis and Design of Low Power Ring Oscillators with Frequency ~ khz Analysis and Design of Low Power Ring Oscillators with Frequency ~10-100 khz PRESENTED BY: PIYUSH KESHRI 3 rd year Undergraduate Student Indian Institute Of Technology, Kanpur, India University Of Michigan

More information

Design of Analog CMOS Integrated Circuits

Design of Analog CMOS Integrated Circuits Design of Analog CMOS Integrated Circuits Behzad Razavi Professor of Electrical Engineering University of California, Los Angeles H Boston Burr Ridge, IL Dubuque, IA Madison, WI New York San Francisco

More information