
High-Speed VLSI Circuit Simulator
Final Report
Spring Semester 2014

Prepared to partially fulfill the requirements for ECE402

By
Qi Chen
Pan Zhang

Department of Electrical and Computer Engineering
Colorado State University
Fort Collins, Colorado 80523

Project Advisor: Prof. Sourajeet Roy
Approved by: Prof. Sourajeet Roy

Abstract

With the rapid increase in processing speed and the scaling of integrated circuit feature sizes below 45 nm, the analysis and simulation of high-speed interconnects has become a critical prerequisite for electrical engineers. In the past the effects of interconnects were negligible. Today interconnects are responsible for non-ideal effects such as signal delay and attenuation, which can render circuits inoperable. Interconnect analysis becomes even more important with new technologies such as carbon-based interconnects.

This project aims to create a general-purpose circuit simulator for the computer-aided design (CAD) of high-speed interconnects that will allow us not only to explore the current CAD paradigms but also to test novel and advanced strategies capable of far greater accuracy and computational efficiency. The work requires understanding the advanced mathematics and numerical algorithms behind circuit simulation, the stability, accuracy and computational cost of these algorithms, the programming skills to execute them efficiently, and the use of commercial circuit solver tools to validate them.

This project will provide important insight into the methods of simulating high-speed copper and carbon nano-tube interconnects located either on-chip or at the PCB level. The project will study the mathematical representation of these circuits, which originates as partial differential equations. Such equations have no exact solution in the time domain, and the corresponding approximations result in very high CPU cost. Thus, there is a need for an accurate, yet fast interconnect solver. Circuit designers can use it to better understand high-speed interconnects and their effect on signal degradation.

The project will address the current computational constraints of complex high-speed interconnect networks by exploring the model order reduction method. The ability to do so will contribute to developing an efficient and robust solver that will help change the current state of the art of circuit simulation.

Table of Contents

Abstract
Table of Contents
I. Introduction
II. Review of Previous Work
III. Applied Engineering Methods
    A. Equivalent Circuit Formulation
    B. Mathematical Models
        B.1 Frequency Domain Representation
        B.2 Time Domain Representation
        B.3 Interconnect Modeling
    C. Engineering Tools
IV. Objectives and Constraints
    A. Goals
        A.1 Fall Semester Objectives
        A.2 Spring Semester Objectives
    B. Technical Performance Measurements
        B.1 Accuracy
        B.2 Computation Time
    C. Risks and Constraints
V. Testing and Evaluation Plan
    A. Frequency Domain
    B. Time Domain
VI. Current Design Process
    A. Items Completed
    B. Test Simulation Results
        B.1 Frequency Domain Simulation Results
        B.2 Time Domain Simulation Results
    C. Re-Evaluation and Simulation Improvement
        C.1 Frequency Domain Simulation Improvement
        C.2 Time Domain Un-Coupled Simulation Improvement
        C.3 Time Domain Coupled Simulation Improvement
VII. Model Order Reduction
    A. Methods and Algorithms
    B. MOR Implemented Simulation Results
VIII. User Manual
    A. Input Format
        A.1 Frequency Domain Input Format
        A.2 Time Domain Input Format
        A.3 Coupled Model Input Format
    B. Compiling and Extracting Results
IX. Conclusion and Future Work
    A. Future Development
        A.1 Parallel Simulation
        A.2 User Interface plus Feedback
        A.3 Carbon Nano Tube Characterization
    B. Conclusion
X. References
XI. Appendix A Stamps
XII. Appendix B Abbreviations
XIII. Appendix C Budget
XIV. Appendix D LAPACK Code & Matlab Script
XV. Appendix E Timeline

List of Figures

Figure 1: Interconnect Hierarchy
Figure 2: HSI distortions
Figure 3: Sample Circuit Formulation
Figure 4: Resistor Stamp
Figure 5: Inductor Stamp
Figure 6: Backward Euler Method
Figure 7: Trapezoidal Method
Figure 8: Telegrapher's Equations
Figure 9: Lumped Model Derivation
Figure 10: model for Single-conductor transmission line
Figure 11: model for Multi-conductor transmission line
Figure 12: Test Template 1
Figure 13: Test Template 2
Figure 14: Test Template 3
Figure 15: Magnitude Comparison
Figure 16: Phase Comparison
Figure 17: V1 for d=0.5cm
Figure 18: CPU cost vs Length
Figure 19: V2 for d=0.5cm
Figure 20: Improved Frequency Domain Result V3 Magnitude
Figure 21: Improved Frequency Domain Result V3 Phase
Figure 22: Improved Time Domain Result V3
Figure 23: Improved Time Domain Coupled Result V2
Figure 24: MOR based on congruent transformation
Figure 25: MOR Frequency Domain V3 Magnitude
Figure 26: MOR Frequency Domain V3 Phase
Figure 27: MOR Time Domain V3
Figure 28: Frequency Domain CPU Time Reduction with MOR
Figure 29: Time Domain CPU Time Reduction with MOR
Figure 30: Frequency Domain Sample Input
Figure 31: Time Domain Sample Input
Figure 32: Coupled Time Domain Sample Input
Figure 33: Waveform Relaxation Partitioning

List of Tables

Table 1: Current Progress

I. INTRODUCTION

In Very Large Scale Integration (VLSI) and integrated circuit design, interconnects are the structures that provide the electrical connections between elements and allow signals to propagate. Interconnects are classified by the distance the signal travels, such as board-to-board and chip-to-chip, and can range from simple copper wires to carbon nano-tubes and transmission lines. [6]

Figure 1-Interconnect Hierarchy [6]

Over the past decades, the effects of VLSI high-speed interconnects have become impossible to ignore in the design of high-speed integrated systems. Current state-of-the-art VLSI circuits have very high density and can operate at multi-gigahertz clock rates, hence the term high-speed. The rapid increase in operating frequency degrades the accuracy of circuit modeling and simulation and also complicates physical optimization. Therefore, merely decreasing the size of the chip and its components is no longer enough to increase circuit performance. Attention must also be paid to the signal distortion and delay caused by interconnects. The current challenges of simulating interconnect effects can be attributed to the distributed model, which depends on physical dimensions, and to the tradeoff between rigorous computation cost and accurate results. [6]

Figure 2-HSI distortions [6]

The goal of the High-Speed VLSI Simulator (HSVS) is to develop a VLSI circuit solver that can be used not only as a general circuit simulator, but also as a tool to simulate and analyze the effects of high-speed interconnects, in an attempt to address the current predicament and industry need.
The procedure of this project includes understanding the advanced mathematics and numerical algorithms behind circuit simulation, the stability, accuracy and computational cost of these algorithms, the programming skills to execute them efficiently, and the use of commercial circuit solver tools to validate them. This project will provide important insight into the methods of simulating high-speed copper and carbon nano-tube interconnects located either on-chip or at the PCB level. Circuit designers can use it to better understand high-speed interconnects and their effect on signal degradation.

The project will address the current computational constraints of complex high-speed interconnect networks by exploring the model order reduction (MOR) method and the parallel waveform relaxation method. The ability to do so will contribute to developing an efficient and robust solver that will help change the current state of the art of circuit simulation.

This report describes the ongoing work and accomplished tasks in the fall 2014 semester as well as the plan for future work. Chapter II briefly reviews previous work by researchers and industry leaders on the simulation of interconnects and its impact. Chapter III describes in detail the applied engineering methods implemented in this solver. Chapter IV

II. REVIEW OF PREVIOUS WORK

Prior to delving into software development for the VLSI interconnect solver, a thorough investigation of previous work is needed to strengthen the understanding of the engineering problem and of the methods that deliver the solution. This chapter briefly describes approaches used by researchers that laid the foundation for interconnect analysis.

A significant constraint that hinders the timely and accurate analysis of high-speed interconnects is the nature of their distributed model. In traditional VLSI circuit solvers, circuit schematics are translated into an equivalent mathematical representation through a method called Modified Nodal Analysis (MNA), or Stamp. Each circuit element has a unique stamp derived from MNA. The stamps combine into a system of ordinary differential equations that can be easily solved in both the time and frequency domains. This fundamental method is suitable for HSVS and will be discussed in detail in later chapters.
However, unlike the lumped circuit model, where circuits are represented by lumped elements such as resistors, capacitors, and inductors that depend only on time, the distributed model of interconnects depends not only on time but also on geometry and shape. Mathematically, the distributed model is characterized by the Telegrapher's partial differential equations, which create a mismatch between the time and frequency domains: the frequency domain solution does not have an exact counterpart in the time domain. To resolve this dilemma, different types of integration methods are needed to approximate a solution.

General VLSI circuit simulators such as Cadence and HSPICE already have the capability to simulate VLSI circuits with interconnects. In these tools the circuit representation is delivered to the mathematical model through the net-list. The net-list details the specifications of the circuit, such as each element and its value, as well as the requirements for the simulation. The net-list effectively provides a user interface for the program, and serves as a general guideline for the user interface of HSVS.

III. APPLIED ENGINEERING METHODS

A. Equivalent Circuit Formulation

The HSVS is designed as a general VLSI circuit solver, so correctly transforming the net-list circuit representation into its equivalent mathematical representation is the first and foremost step. The method used for equivalent circuit formulation is called Stamp, or Stamping. The stamp method is based on Kirchhoff's Current Law (KCL): the KCL equations of a circuit network are organized in matrix form.

Figure 3-Sample Circuit Formulation [5]

Figure 3 depicts the sample equivalent circuit formulation of a resistive network using KCL. The stamp of the equivalent circuit appears in the left-hand-side matrix.
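To make the stamping idea concrete, the following is a minimal Python sketch (the solver itself is written in C++) that stamps resistors and a current source into a nodal matrix and solves the result. The function names, the node numbering with 0 as ground, and the two-resistor test network are illustrative assumptions and are not taken from the HSVS code.

```python
def stamp_resistor(G, n1, n2, r):
    """Add the nodal stamp of a resistor r between nodes n1 and n2 (0 = ground)."""
    g = 1.0 / r
    if n1 > 0:
        G[n1 - 1][n1 - 1] += g
    if n2 > 0:
        G[n2 - 1][n2 - 1] += g
    if n1 > 0 and n2 > 0:
        G[n1 - 1][n2 - 1] -= g
        G[n2 - 1][n1 - 1] -= g


def stamp_current_source(rhs, n_from, n_to, i):
    """Stamp a source that drives current i from node n_from into node n_to."""
    if n_from > 0:
        rhs[n_from - 1] -= i
    if n_to > 0:
        rhs[n_to - 1] += i


def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        s = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x


# Hypothetical test network: 1 A injected into node 1,
# 2 ohm from node 1 to node 2, 3 ohm from node 2 to ground.
G = [[0.0, 0.0], [0.0, 0.0]]
rhs = [0.0, 0.0]
stamp_resistor(G, 1, 2, 2.0)
stamp_resistor(G, 2, 0, 3.0)
stamp_current_source(rhs, 0, 1, 1.0)
v = solve(G, rhs)  # node voltages [v1, v2]
```

Each element only touches the rows and columns of its own nodes, which is why a net-list can be processed one line at a time.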

Figure 4-Resistor Stamp [4]

Each circuit element has its own equivalent circuit stamp. Figure 4 shows the stamp for the resistor. Ordinary nodal analysis, although sufficient for most circuit elements, is not suitable for stamping elements such as the inductor and the independent voltage source, because of complications in solving the differential equations in the frequency domain. The HSVS therefore uses modified nodal analysis (MNA), which introduces a new current variable for the stamp of the inductor and of the independent voltage source. Each such variable adds a new row and column to the left-hand-side matrix.

Figure 5-Inductor Stamp [4]

The complete stamps used in HSVS are shown in Appendix A.

B. Mathematical Models

The mathematical representations of the equivalent circuit after transforming the elements into their respective stamps can be summarized as follows.

B.1 Frequency Domain Representation

In frequency domain simulation, the equivalent circuit is represented by ordinary differential equations written in the Laplace domain in the form

$G x + s C x = B$, $s = j 2\pi f$    (1)

where $s$ is the Laplace transform equivalent of $\frac{d}{dt}$, $G$ and $C$ are the left-hand-side matrices, and $B$ is the right-hand-side vector. Equation 1 can be solved exactly in the frequency domain by matrix inversion. For large matrices, the inversion is carried out by LU decomposition followed by forward and backward substitution. The LU decomposition is performed by an existing LAPACK routine applied to the system matrix

$A_n = G + s C$

The computational cost of LU decomposition is $O(n^a)$, where $n$ is the size of the matrices and $a$ is between 1.5 and 2.
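The frequency domain solve can be sketched as follows. This is a small dense Python illustration of LU decomposition with partial pivoting followed by forward and backward substitution. It is not the LAPACK-based C++ code used in HSVS, and the RC low-pass test circuit, the function names, and the element values are assumptions chosen so that the answer is known analytically ($V_2 = E/(1 + sRC)$).

```python
import cmath
import math


def lu_decompose(A):
    """LU factorization with partial pivoting; returns (LU, perm) with P*A = L*U."""
    n = len(A)
    LU = [row[:] for row in A]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(LU[r][k]))
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for r in range(k + 1, n):
            LU[r][k] /= LU[k][k]          # multiplier stored in the L part
            for c in range(k + 1, n):
                LU[r][c] -= LU[r][k] * LU[k][c]
    return LU, perm


def lu_solve(LU, perm, b):
    """Solve A x = b from the factors: forward then backward substitution."""
    n = len(LU)
    y = [b[perm[i]] for i in range(n)]
    for i in range(n):                    # forward substitution (unit lower L)
        for j in range(i):
            y[i] -= LU[i][j] * y[j]
    for i in reversed(range(n)):          # backward substitution (upper U)
        for j in range(i + 1, n):
            y[i] -= LU[i][j] * y[j]
        y[i] /= LU[i][i]
    return y


def solve_frequency(G, C, B, f):
    """Solve (G + sC) x = B at a single frequency f, with s = j*2*pi*f."""
    s = 2j * math.pi * f
    n = len(G)
    A = [[G[i][j] + s * C[i][j] for j in range(n)] for i in range(n)]
    LU, perm = lu_decompose(A)
    return lu_solve(LU, perm, B)


# Hypothetical RC low-pass in MNA form, unknowns x = [v1, v2, iE]:
# source E = 1 V at node 1, R between nodes 1 and 2, Cap from node 2 to ground.
R, Cap = 1e3, 1e-9
G = [[1 / R, -1 / R, 1.0],
     [-1 / R, 1 / R, 0.0],
     [1.0, 0.0, 0.0]]
C = [[0.0, 0.0, 0.0],
     [0.0, Cap, 0.0],
     [0.0, 0.0, 0.0]]
B = [0.0, 0.0, 1.0]
f_corner = 1.0 / (2 * math.pi * R * Cap)
x = solve_frequency(G, C, B, f_corner)
v2_magnitude = abs(x[1])
v2_phase_deg = math.degrees(cmath.phase(x[1]))
```

The third row and column of G come from the extra current variable that MNA introduces for the voltage source. A frequency sweep repeats `solve_frequency` once per sample point, which is why the cost of each factorization matters.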

B.2 Time Domain Representation

In time domain simulation, the equivalent circuit can be expressed as the ordinary differential equation

$G x + C \frac{dx}{dt} = B$    (2)

The time domain results are obtained with numerical integration methods. For electrical networks, linear multi-step numerical integration methods are more suitable than others for approximating a solution.

Figure 6-Backward Euler method [5]

Figure 7-Trapezoidal method [5]

The linear multi-step numerical integration methods used in HSVS are implicit methods, namely the Backward Euler and Trapezoidal methods shown in Figures 6 and 7.

B.3 Interconnect Modeling

To simulate interconnects in a circuit network, appropriate electrical models are required to characterize the distributed nature of interconnects. Many models exist in the electrical engineering literature for this purpose. For the design of HSVS, the Quasi-Transverse Electromagnetic (TEM) model is used, since the approximation is valid for most practical interconnect structures and offers relative ease of use and low CPU cost compared to full-wave approaches. [6] Using the TEM distributed model, the voltages and currents of single-conductor and, most importantly, multi-conductor interconnects are expressed by the Telegrapher's equations.
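As an illustration of the implicit integration methods of Section B.2, the following Python sketch applies Backward Euler and the Trapezoidal method to a scalar instance of equation 2. The update formulas are the standard ones obtained by discretizing $G x + C \frac{dx}{dt} = b(t)$ with step $h$. The function name, the RC test circuit, and the step size are assumptions for illustration and are not taken from the HSVS code.

```python
import math


def integrate(G, C, b, h, steps, method="trapezoidal", x0=0.0):
    """Integrate the scalar equation G*x + C*dx/dt = b(t) with a fixed step h."""
    x = x0
    out = [x]
    for k in range(steps):
        t0, t1 = k * h, (k + 1) * h
        if method == "backward_euler":
            # (G + C/h) x1 = b(t1) + (C/h) x0
            x = (b(t1) + C / h * x) / (G + C / h)
        else:
            # (G + 2C/h) x1 = b(t1) + b(t0) - (G - 2C/h) x0
            x = (b(t1) + b(t0) - (G - 2 * C / h) * x) / (G + 2 * C / h)
        out.append(x)
    return out


# Hypothetical RC charging circuit: 1 V step through R into C (Norton form).
R, Cap = 1e3, 1e-9
tau = R * Cap
h, steps = tau / 100, 300


def source(t):
    return 1.0 / R


be = integrate(1 / R, Cap, source, h, steps, "backward_euler")
tr = integrate(1 / R, Cap, source, h, steps, "trapezoidal")
exact = 1.0 - math.exp(-steps * h / tau)
err_be = abs(be[-1] - exact)
err_tr = abs(tr[-1] - exact)
```

For the matrix case the divisions become linear solves with $G + C/h$ or $G + 2C/h$, which stay constant for a fixed step and can therefore be factored once. With the same step size the Trapezoidal method (second order) is expected to give a smaller error than Backward Euler (first order).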

Figure 8-Telegrapher's equations [6]

Figure 8 shows the Telegrapher's equations for a multi-conductor interconnect network in both the time and frequency domains. Since these are partial differential equations, numerical techniques are needed to convert the distributed model into ordinary differential equations. The conventional lumped model is a linear ordinary differential model derived from the distributed model of the Telegrapher's equations. In HSVS, this model is used for single-conductor frequency and time domain simulations, as well as for multi-conductor simulations with inductive and capacitive coupling.

Figure 9-Lumped Model Derivation [6]
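A minimal Python sketch of how a single-conductor line could be split into lumped sections is given below. It assumes the section-count rule $n = 20\, d \sqrt{LC} / T_r$ (the square root is an assumption about the report's equation for n), uniform per-unit-length parameters, and SI units. The function names and the example values are hypothetical.

```python
import math


def section_count(d, L, C, tr, factor=20):
    """Number of lumped sections for a line of length d and source rise time tr."""
    return max(1, math.ceil(factor * d * math.sqrt(L * C) / tr))


def lumped_sections(d, R, L, G, C, n):
    """Split per-unit-length R, L, G, C of a line of length d into n equal sections."""
    dx = d / n
    return [{"R": R * dx, "L": L * dx, "G": G * dx, "C": C * dx} for _ in range(n)]


# Hypothetical 10 cm line: L = 250 nH/m, C = 100 pF/m, R = 5 ohm/m, G = 0,
# driven by a source with a 0.1 ns rise time.
d, R, L, G, C, tr = 0.1, 5.0, 2.5e-7, 0.0, 1e-10, 0.1e-9
n = section_count(d, L, C, tr)
sections = lumped_sections(d, R, L, G, C, n)
total_capacitance = sum(s["C"] for s in sections)
```

Each dictionary holds the element values of one cascaded section. The series R and L and the shunt G and C of every section can then be stamped like ordinary lumped elements, so n directly sets the size of the system matrices.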

Figure 10- model for single-conductor transmission line

Figure 11- model for multi-conductor transmission line [6]

C. Engineering Tools

The development of the HSVS requires a number of engineering tools to write and test the programs.

Primary tools:
- CSU CRAY Supercomputer
- C++ programming platform and GNU C++ compiler
- HSPICE for result reference

Secondary tools:
- Matlab
- LAPACK routine for matrix operations

IV. OBJECTIVES AND CONSTRAINTS

A. Goals

The goal of the senior design project is to develop a robust VLSI circuit solver that is able to simulate single and multi-conductor interconnects in both the time and frequency domains with sufficient accuracy and speed. The success of this project depends on experimentation, namely on how accurately the simulation matches an industry-standard reference. The following sections describe the steps necessary to meet the expectations by the end of the fall 2014 semester, as well as the plan of action for the spring 2015 semester.

A.1 Fall semester objectives

- Introduction to the C++ programming language, the HSPICE solver infrastructure, and the CRAY high performance computing platform
- Create a program that can generate an equivalent circuit from any given interconnect layout
- Create a program that can translate the input equivalent circuit into a mathematical model in the form of the ordinary differential equations (ODEs) shown in equation 2
- Solve the equation in the frequency domain by using LU decomposition
- Solve equation 2 in the time domain using the numerical integration methods Backward Euler and trapezoidal rule
- Validate the frequency and time domain simulations using results from HSPICE

A.2 Spring semester objectives

- Investigate the computing accuracy, CPU time, and stability of these integration methods
- Address the computation time and memory costs for large and complex interconnect networks via parallel simulation using the waveform relaxation algorithm and/or the model order reduction method

B. Technical Performance Measurements

As mentioned in previous chapters, the HSVS simulation is validated by comparison with HSPICE, and the validation can be characterized by the following technical performance measurements (TPMs).

B.1 Accuracy

The accuracy of the simulation in the frequency and time domains can be assessed visually by plotting the results from HSVS and HSPICE together. For frequency domain comparison, both magnitude and phase results need to be considered. Beyond visual observation, and more importantly, the accuracy is quantified by the error norm

$e = \frac{1}{N} \sum_{1}^{N} \lVert S - H \rVert_{2}$    (3)

Where h represents the step size of the frequency sampling, and S and H represent the node voltages from the solver and HSPICE respectively. The performance specification of the solver requires the error norm to be less than 5%. This allows the number of cascaded sections n to be optimized when converting the interconnect layout to its mathematical representation, since n is expected to be the main factor determining accuracy. Thus, once the error norm is known, n can be increased accordingly to meet the performance spec.

B.2 Computation Time

The computation time, also known as CPU time, is a TPM that is characterized for optimization. The CPU time cannot be compared directly with the computation time of HSPICE, but it can be used to set a perspective for increasing the speed of simulation.

$n = \frac{20\, d \sqrt{LC}}{T_r}$    (4)

Equation 4 defines the number of cascaded sections n. Increasing n is expected to increase the accuracy of the solver, but at the same time the computation time also increases. Thus, the evaluation of $T_r$ (rise time) and $d$ (interconnect length) is necessary to optimize the computational cost. The computational cost of frequency analysis of interconnect networks is expected to be $O(n^a)$, where $a$ ranges from 1.5 to 2. As a validation method, n can be varied over a number of simulations to tabulate the respective $a$. The factor $a$ can then be compared with the expected value to show the solver's performance.

C. Risks and Constraints

The potential risks of this project, besides the technical difficulties of the code, consist of the following:

1. One of the challenges is that the per-unit-length parameters of carbon nano-tubes are difficult to obtain. Although we can choose to use per-unit-length values from test cases in published literature or to use analytic approximations, the challenge then simply shifts to mathematical proficiency.

2. Another challenge is the solver's robustness in terms of its stability, CPU time, and accuracy. The choice of integration method results in different stability, CPU time, and accuracy characteristics. Therefore the system needs to be configured so that the user has the flexibility to choose the appropriate combination of these characteristics for his/her simulation.

For this particular project, the structural procedure of creating a VLSI solver is the main frame and therefore cannot be changed. However, two alternatives are available to address the computing time problem for large interconnect networks: parallel simulation using the waveform relaxation algorithm, and the model order reduction method. Parallel simulation using the waveform relaxation algorithm can reduce the computation time of multilevel interconnect simulation, but it has a convergence issue that will require an investigation of hybrid iteration methods. This plan requires coding algorithms with rigorous mathematics and can be very challenging, but nonetheless very rewarding. The model order reduction method is the other route, to be taken if we choose to or if the first option fails. This method reduces the number of unknown variables, and thus the computation time. However, it trades this for a less accurate simulation. [8]

V. TESTING AND EVALUATION PLAN

The test and evaluation plan for this software-based project is organized around the performance requirements of each scheduled deliverable. The fall semester deliverable consists of the general interconnect simulator with the ability to perform frequency domain and time domain analysis. The spring semester deliverable will primarily consist of the investigation of a model order reduction module to improve the performance of the existing interconnect solver.
To test the performance of the solver in a reliable and timely fashion, good reference results are necessary. HSPICE is a commercial circuit simulator and an industry standard. [3] The testing procedure will compare the results of HSPICE with those of this project and quantify the comparison in terms of accuracy and CPU computation time. The detailed testing and evaluation plan is as follows. [9]

A. Frequency Domain

Prior to the comparison with HSPICE, the code needs to be validated through debugging of the modules used as well as the interactions between them. The debugging procedures ensure correct operation from input to output. To compare the simulation results of an interconnect circuit with the simulation results from HSPICE, a test schematic is needed, such as the one shown in Figure 12.
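The accuracy comparison against HSPICE could be scripted along the following lines. This Python sketch uses a relative L2 error norm, which is one common choice and is an assumption here, since it may differ in normalization from the error norm defined in the report. The function name and the sample data are hypothetical, and both waveforms are assumed to be sampled at the same points.

```python
import math


def relative_l2_error(solver, reference):
    """Relative L2 error between solver samples and reference samples.

    Works for real (time domain) and complex (frequency domain) samples.
    """
    if len(solver) != len(reference):
        raise ValueError("solver and reference must have the same sample count")
    num = sum(abs(s - h) ** 2 for s, h in zip(solver, reference))
    den = sum(abs(h) ** 2 for h in reference)
    return math.sqrt(num / den)


# Hypothetical node-voltage samples (not measured data).
hspice = [3.0, 4.0]
hsvs = [3.0, 4.2]
err = relative_l2_error(hsvs, hspice)
meets_spec = err < 0.05   # the 5% performance specification
```

Because `abs` handles complex values, the same function can compare frequency domain phasors, which accounts for magnitude and phase errors together.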

Figure 12-Testing template 1

Figure 13-Test template 2

- Compare accuracy using the L2 norm of the error, and optimize accordingly
- CPU time using C++ functions

B. Time Domain

The time domain analysis mainly focuses on using the Backward Euler and Trapezoidal integration methods to approximate the time domain differential equations, which are the mathematical representation of the interconnect circuit in the time domain. Since the program allows either of the two integration methods to be selected for the time domain simulation, the performance of both should be examined separately. Figure 14 shows the multi-conductor testing schematic primarily designed for time domain simulations.

Figure 14-Testing Template 3

- Compare accuracy using the L2 norm of the error, and optimize accordingly
- CPU time using C++ functions

VI. CURRENT DESIGN PROCESS

A. Items Completed

At the end of the fall 2014 semester, the expected progress of designing the HSVS is listed in Appendix E, and the details of current progress are summarized in the following table.

Table 1-Current Progress

Completed Items:
- C++ module for equivalent circuit transformation from input net-list
- C++ module for transforming input interconnect component to equivalent circuit representation
- C++ function to call LAPACK function to perform complex matrix LU decomposition and forward, backward substitution
- C++ code to solve circuit in Time Domain
- C++ code to solve circuit in Frequency Domain
- HSPICE results comparison via Matlab modules and HSPICE-Matlab toolbox

Delayed Progress:
- Inaccurate simulation results in frequency domain simulations have delayed progress on optimizing the CPU time and accuracy
- Accuracy calculations, due to errors in data collection from HSPICE

The above table lists the items completed this fall against the scheduled expectations, as well as the setbacks that delayed progress. The results from the current design of HSVS are presented in the next section, where the analysis of both the good and the failed trials validates the work and justifies tasks for future improvement.

B. Initial Test Simulation Results

B.1 Frequency Domain Simulation Results

The frequency domain simulation started by simulating test template number 2 in Figure 13, which gave the following results.

Figure 15-Magnitude Comparison

Figure 16-Phase Comparison

The above figures show the frequency domain result as magnitude and phase comparisons. The blue curves denote the HSPICE result, whereas the red curves show the results of the HSVS. Clearly, the result is not as accurate as expected, as seen from the noise in the phase comparison and the spike in the magnitude. In fact, this was the simulation result from the second trial of testing, which reduced the amount of noise compared with the first trial shown in Appendix F. The failed trials, although they exposed flaws in the program, can lead us in the correct direction to increase the accuracy in the future.

B.2 Time Domain Simulation Results

The following time domain simulations show the data gathered from simulating test template number 3. The output points are the nodes at the end of the first and second interconnects, labeled V1 and V2. The length of the interconnects in this case is 0.5cm. The complete set of simulations includes d=0.5cm, d=2cm, and d=10cm, and is used for initial characterization of CPU complexity. As before, the red curve represents the HSVS result and the blue curve represents the HSPICE result.

Figure 17-V1 for d=0.5cm

Figure 18-CPU cost vs Length

Figure 19-V2 for d=0.5cm

From visual comparison of the two simulated voltage outputs, several things can be concluded. Accuracy is a challenge for the coupled interconnect effects shown in Figure 19. The magnitude of V2 is significantly lower than the one from HSPICE, despite the resemblance of the coupled transient effects. The time domain comparison of V1, on the other hand, is very promising. The result overlaps the HSPICE result most of the time with little deficiency.

The initial computation time can be observed in Figure 18. The computation time increases linearly with interconnect length, which is expected. Overall, the time domain results reflect the current capabilities of the simulator. With future improvements and optimizations of accuracy and CPU time using the methods discussed in the previous chapters, the solver can become more complete and robust.
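The exponent $a$ of the $O(n^a)$ cost model discussed under the technical performance measurements could be estimated from measured CPU times with a least-squares fit on a log-log scale. The Python sketch below shows one way to do this. The function name is hypothetical and the timing values are synthetic numbers generated from a known exponent, not measurements from HSVS.

```python
import math


def fit_exponent(sizes, times):
    """Least-squares slope of log(time) versus log(size), i.e. a in time ~ n^a."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Synthetic timings that follow t = 3e-6 * n^1.5 exactly (illustration only).
sizes = [100, 200, 400, 800]
times = [3e-6 * n ** 1.5 for n in sizes]
a = fit_exponent(sizes, times)
```

With real measurements the fitted value would be compared against the expected range of 1.5 to 2, and several runs per size would help average out timing noise.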

C. Re-evaluation and Simulation Improvement

Knowing where the HSVS solver falls short in simulation accuracy, the next step for the team was an extensive re-evaluation of the code through debugging and failure-mode inducement. The re-evaluation of the time and frequency domain codes, for both coupled and un-coupled interconnects, was distributed evenly between the two team members to increase efficiency. The methods used to debug the code and to induce different failure modes include direct output of variables with breaks in case of error, and the use of Visual Studio to detect syntax errors during or before the debugging process. Segmentation faults, which are invalid memory accesses, make the process more challenging.

C.1 Frequency Domain Simulation Improvement

The first change introduced in the frequency domain code is the replacement of some size_t variables with int, so that no discrepancies arise from the need for an unsigned int when values are passed between different loops. The accuracy of the frequency domain simulation depends directly on the number of sections of each interconnect, represented by the value n. Therefore, extensive testing was needed to find a value of n that gives a sufficiently accurate model for our purpose. The tested values of n ranged from 1.5 to 5 times the original value. This helped the team determine that 1.9 to 2 times the original number of sections is useful for increasing the accuracy. Furthermore, the function named LU, which contains the code that passes the parameters of the final equation to perform sparse matrix inversion using LU decomposition, has been modified and implemented directly in main() instead of being called as a function with passed variables. The simulations in this case use testing module 1 in Figure 12, which contains 6 un-coupled interconnects with a length of 10cm.

The improved results, taken across the 100 ohm resistor, are shown below.

Figure 20-Improved Frequency Domain Result V3 Magnitude

Figure 21-Improved Frequency Domain Result V3 Phase

C.2 Time Domain Un-Coupled Simulation Improvement

Like the un-coupled frequency domain code, the un-coupled time domain program had issues that were fixed using the same methods. The simulation uses testing template 1 for a total time of 30ns and a trapezoidal voltage source pulse with a rise time of 0.1ns. The improved result is shown below.

Figure 22-Improved Time Domain Result V3

C.3 Time Domain Coupled Simulation Improvement

The coupled time domain simulation has been a challenge because of its substantial error in characterizing the inductive and capacitive coupling effects on the second interconnect of testing module 3. The error was identified through explicit error detection methods such as variable outputs, and through long periods of memory adjustment. The error came from the part of the program that partitions the interconnect in order to characterize the coupled effects. By manually turning off either inductive or capacitive coupling, the error was narrowed down to a couple of lines of code dealing with node partitioning. As it turned out, the code was written incorrectly: the coupling node was accessed at the wrong place for both capacitive and inductive coupling. Fixing this drastically increased the accuracy of the simulation.

Figure 23-Improved Time Domain Coupled Result V2

VII. MODEL ORDER REDUCTION

A. Methods and Algorithms

The model order reduction (MOR) method essentially reduces the number of unknowns, such as the node voltages and branch currents. It is used to significantly decrease computation time for both time and frequency domain analysis. The following expressions illustrate the model order reduction approach, where $X$ is the vector of unknown variables of the original system and $x$ is the reduced vector of unknowns. This type of order reduction uses a process called implicit moment matching. In this case, it uses Krylov subspace approaches to project large matrices onto their dominant eigen-space. [7]

$G X + C \frac{dX}{dt} = B u$    (original solver)
$X = Q x$    (MOR)
$G_r x + C_r \frac{dx}{dt} = B_r u$, $\dim(x) \ll \dim(X)$    (5)

The above equations illustrate the basic steps of reducing the number of unknowns in a system of equations by finding the matrix $Q$, which is an ortho-normalized matrix constructed through matrix operations on the Krylov subspace matrix. The reduced order system is then obtained by congruent transformation: substituting $X = Q x$ and pre-multiplying each side of the original equation by $Q^T$.

Figure 24-MOR based on congruent transformation [7]
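A minimal Python sketch of the congruent transformation is shown below, assuming that the projection matrix $Q$ has already been computed and has orthonormal columns. The reduced matrices follow from substituting $X = Q x$ and pre-multiplying by $Q^T$, which gives $G_r = Q^T G Q$, $C_r = Q^T C Q$ and $B_r = Q^T B$. The helper names and the small test matrices are hypothetical.

```python
def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Dense matrix product of A (m x k) and B (k x n)."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def reduce_system(G, C, B, Q):
    """Congruent transformation: G_r = Q^T G Q, C_r = Q^T C Q, B_r = Q^T B."""
    Qt = transpose(Q)
    Gr = matmul(matmul(Qt, G), Q)
    Cr = matmul(matmul(Qt, C), Q)
    Br = matmul(Qt, B)
    return Gr, Cr, Br


# Hypothetical 3-unknown system reduced to 2 unknowns.
G = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
C = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
B = [[1.0], [0.0], [0.0]]
Q = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]   # orthonormal columns (assumed given)
Gr, Cr, Br = reduce_system(G, C, B, Q)
```

The reduced system has as many unknowns as $Q$ has columns, and a solution $x$ of the reduced system is mapped back to the original unknowns through $X = Q x$.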

The algorithm used to compute the matrix $Q$ is called the Arnoldi algorithm, and is used as follows.

$K = [k_0, k_1, k_2, k_3, \ldots] = [z, Az, A^2 z, A^3 z, \ldots]$
$Q = [q_0, q_1, q_2, \ldots]$

Finding $q_0$:
$q_0 = \frac{k_0}{\lVert k_0 \rVert}$

Finding $q_1$:
$v_1 = A q_0$
$v_1 = v_1 - (q_0^T v_1)\, q_0$
$q_1 = \frac{v_1}{\lVert v_1 \rVert}$

Finding $q_2$:
$v_2 = A q_1$
$v_2 = v_2 - (q_1^T v_2)\, q_1 - (q_0^T v_2)\, q_0$
$q_2 = \frac{v_2}{\lVert v_2 \rVert}$

Finding $q_3$:
$v_3 = A q_2$
$v_3 = v_3 - (q_2^T v_3)\, q_2 - (q_1^T v_3)\, q_1 - (q_0^T v_3)\, q_0$
$q_3 = \frac{v_3}{\lVert v_3 \rVert}$

The Arnoldi algorithm is a recursive algorithm that is easy to implement as a function. It can be reapplied to the matrix $Q$ if $Q$ is not truly an ortho-normalized matrix of the Krylov matrix, in order to obtain a better result when projecting back to the original $x$ vector. [7]

The results from model order reduction need to be tested and evaluated through comparison with both the original solver results and the HSPICE results.

- Accuracy
    o Visual observation
    o Evaluate the new n for the solver with MOR implemented, and optimize the error norm to be less than 5% compared to HSPICE for both time and frequency domain simulations.
- CPU time
    o Compare the CPU time between all three methods. Since the CPU time of the MOR solver is expected to be lower than the original, a validation at the beginning is necessary. Then the new n, d, and Tr, from equation 4, of the MOR solver can be evaluated appropriately to achieve the computational complexity $O(p^a)$, where $p$ is the dimension of $x$ and $a$ is from 1.5 to 2
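The orthonormalization steps can be sketched in Python as follows. The sketch is written as a loop rather than recursively and uses the modified Gram-Schmidt form of the Arnoldi process. The function names, the tolerance, and the test matrix are assumptions for illustration. How $A$ and $z$ are built from $G$, $C$ and $B$ is not shown here.

```python
import math


def matvec(A, v):
    """Product of a dense matrix (list of rows) and a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    """Inner product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))


def arnoldi(A, z, order, tol=1e-12):
    """Orthonormal basis q0, q1, ... of span{z, Az, A^2 z, ...} up to `order` vectors."""
    norm = math.sqrt(dot(z, z))
    qs = [[x / norm for x in z]]              # q0 = k0 / ||k0||
    while len(qs) < order:
        v = matvec(A, qs[-1])                 # v = A * q_(j-1)
        for q in qs:                          # remove components along earlier q's
            h = dot(q, v)
            v = [vi - h * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        if norm < tol:                        # Krylov subspace is exhausted
            break
        qs.append([x / norm for x in v])
    return qs


# Hypothetical 3x3 example.
A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
z = [1.0, 0.0, 0.0]
qs = arnoldi(A, z, 3)
```

The returned vectors are the columns of $Q$. Because each new vector is orthogonalized against all previous ones, $Q^T Q$ should be the identity up to rounding, which is an easy sanity check before the congruent transformation is applied.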

B. MOR Implemented Simulation Results

MOR-based simulations were performed in both the time and frequency domains for testing module 1, as shown in the following figures.

Figure 25-MOR Frequency Domain V3 Magnitude

Figure 26-MOR Frequency Domain V3 Phase

Figure 27-MOR Time Domain V3

Figure 28-Frequency Domain CPU Time Reduction with MOR

Figure 29-Time Domain CPU Time Reduction with MOR

The simulation results in the time and frequency domains are similar to those of the original solver without MOR. However, as shown in Figure 26, there are some discrepancies in the phase of the same output compared to HSPICE. This is due to the inevitable trade-off between accuracy and computation time: the computation time has been greatly reduced, but at the cost of some accuracy. One way to improve the result is to increase the user-determined rank of the Q matrix; we started with a rank of 400. Overall, the goal of using MOR to achieve a significant decrease in computation time has been met, as shown by Figures 28 and 29.

D. USER MANUAL

A. Input format

All input files for all of the VLSI solvers are in .txt format. The basic circuit elements that the solver currently supports are: R (resistor), L (inductor), G (conductor, 1/R), C (capacitor), E (independent voltage source), J (current source), T (transmission line), and P (voltage-controlled voltage source). This section describes the netlist-style input file format for the frequency domain, time domain, and coupled solvers.

A.1 Frequency Domain Input Format

The frequency domain HSVS reads input files containing the following circuit elements in their respective input formats.

R Node1 Node2 Rval
L Node1 Node2 Lval
G Node1 Node2 Gval
C Node1 Node2 Cval
E Node1 Node2 Eval
J Node1 Node2 Jval
P Node1 Node2 Node3 Node4 Pval
T Node1 Node2 L val C val G val R val Tr Length

Figure 30-Frequency Domain Sample Input

A.2 Time Domain Input Format

R Node1 Node2 Rval
L Node1 Node2 Lval
G Node1 Node2 Gval
C Node1 Node2 Cval
E Node1 Node2 Eval Tr Pw Tf Offset Tmax
T Node1 Node2 L val C val G val R val Tr Length

Figure 31-Time Domain Sample Input

A.3 Coupled Model Input Format

Lt Lt[1][1] Lt[1][2] ... Lt[1][TL] Lt[2][1] ... Lt[TL-1][TL] Lt[TL][1] ... Lt[TL][TL]
Ct Ct[1][1] Ct[1][2] ... Ct[1][TL] Ct[2][1] ... Ct[TL-1][TL] Ct[TL][1] ... Ct[TL][TL]
R Node1 Node2 Rval
L Node1 Node2 Lval
G Node1 Node2 Gval
C Node1 Node2 Cval
T Node1 Node2 G val R val Tr Length
E Node1 Node2 Eval Tr Pw Tf Offset Tmax (time domain)
E Node1 Node2 Eval (frequency domain)

Figure 32-Coupled Time Domain Sample Input

Notes:

Node1 and Node2 are the two nodes of a single element; Node3 and Node4 are the two additional nodes of a dependent source.

Rval, Gval, Cval, Lval, and Eval are the values of resistance, conductance, capacitance, inductance, and voltage source, respectively.

L val, C val, G val, and R val are the per-unit-length parameters of a transmission line.

Tr, Pw, Tf, Offset, and Tmax are the rise time, pulse width, fall time, offset, and total time of the trapezoidal pulse in the time domain.

The units of Rval, Gval, Cval, Lval, Eval, time, and length are ohm, mho, F, H, V, s, and cm, respectively. The per-unit-length parameters L val, C val, G val, and R val are in H/cm, F/cm, mho/cm, and ohm/cm, respectively.

Lt and Ct are the per-unit-length parameter matrices for the Coupled Model only, and they must be listed at the top of the input file (before T).

T must be listed at the bottom of the input file.

In the time domain, the VLSI solver supports voltage sources with a constant value or a trapezoidal pulse.

B. Compiling and Extracting Results

The HSVS was designed to run on Linux on CSU's Cray supercomputer, because the Cray environment provides the required libraries, notably the open-source LAPACK routine package, which the solver relies on when solving the ordinary differential equations. Users who have the LAPACK package installed on their own system do not need to run the HSVS on the Linux-based server as described below.

Compile: CC time.c or CC frequency.c
Run: aprun a.out (wait until the computation is done)
Output: stored in final.txt
Transfer: use SFTP to copy the result file to a Windows machine
Plot: use MATLAB to plot the results (download the HSPICE toolbox)

Results are written to the designated result file, final.txt, which appears in the current directory after a successful simulation. Using any available SFTP client, the result file can be copied to a Windows environment for plotting and processing in the desired programs. The computation time of each simulation is printed on the screen once the computation finishes.

E. CONCLUSION AND FUTURE WORK

A. Future Development

A.1 Parallel Simulation

Parallel simulation of interconnects mainly consists of using the waveform relaxation method to solve coupled interconnects. To correctly characterize the high-speed interconnect behavior and parallelize the simulation at the same time, transverse partitioning of the multi-conductor interconnects is needed, as shown in Figure 33. The circuit is then solved with waveform relaxation iteration methods such as the Gauss-Jacobi and hybrid Gauss-Jacobi methods. [1]

Figure 33-Waveform relaxation partitioning [1]

A.2 User Interface plus Feedback

As user-facing software, the HSVS needs to provide more feedback on a user-friendly platform, so that the user can not only customize the desired simulation but also review it for errors that occurred internally. Once the software code of the HSVS has been improved and optimized, designing and developing a user interface application is the natural next step. This interface should allow the user to interact with the solver in real time and change parameters with ease. This will make the HSVS more functional and, at the same time, more marketable and more appealing to users who are not electrical engineers. It would expand the originally intended target users, engineers in chip manufacturing companies, and thus make the product more merchantable.

A.3 Carbon Nano Tube Characterization

If needed, future development of the HSVS can include the ability to perform CAD of advanced interconnects such as carbon nanotubes. To do so, a more sophisticated interconnect model than the lumped model may be needed for better characterization. The CAD of carbon nanotubes should cover lengths within the millimeter range. This capability would further increase the marketability of the HSVS.

B. Conclusion

The purpose of developing a general circuit solver with HSI CAD capabilities has been fulfilled. The current semester serves as preparation for the spring semester. This means that, prior to the investigation of MOR and parallel simulation, the HSVS needs to be optimized according to the analysis of the current results so that it solves circuits with appropriate accuracy and CPU time. During the design and development of this project, background reading on the mathematical algorithms and on C++ coding allowed for increased efficiency. Furthermore, growing familiarity with code modularity and debugging assisted the design process. Overall, this project has been successful in terms of completion and, more importantly, in providing valuable insight into interconnect effects for chip designers and manufacturers.

REFERENCES

[1] E. Lelarasmee, A. E. Ruehli, and A. L. Sangiovanni-Vincentelli, "The waveform relaxation method for time-domain analysis of large-scale integrated circuits," IEEE Trans. CAD Integr. Circuits Syst., vol. 1, no. 3, pp. 131-145, Jul. 1982.
[2] J. White and A. L. Sangiovanni-Vincentelli, Relaxation Techniques for the Simulation of VLSI Circuits. Norwell, MA: Kluwer, 1987.
[3] A. Odabasioglu, M. Celik, and L. T. Pileggi, "PRIMA: Passive reduced-order interconnect macromodeling algorithm," IEEE Trans. Computer-Aided Design, vol. 17, no. 8, pp. 645-653, Aug. 1998.
[4] S. Roy, Chapter 1: Formulation of Network Equations, 2014.
[5] S. Roy, Chapter 3: Numerical Integration Techniques of Differential Equations, 2014.
[6] S. Roy, Chapter 5: High Speed Interconnects, 2014.
[7] S. Roy, Chapter 5: High Speed Interconnects, 2014.
[8] S. Roy, A. Dounavis, and A. Beygi, "Longitudinal-partitioning-based waveform relaxation algorithm for efficient analysis of distributed transmission-line networks," IEEE Trans. Microwave Theory and Techniques, vol. 60, no. 3, Mar. 2012.
[9] Star-HSPICE Manual, Release 2001.2, Synopsys Inc., Santa Clara, CA, 2001.

APPENDIX-A STAMP

Capacitor Stamp

Independent Current Source Stamp

Resistor Stamp

Voltage-Controlled Current Source Stamp

Voltage-Controlled Voltage Source Stamp

Independent Current Source Stamp

Inductor Stamp

APPENDIX-B ABBREVIATIONS

HSVS = High-Speed VLSI Simulator
KCL = Kirchhoff's Current Law
KVL = Kirchhoff's Voltage Law
MNA = Modified Nodal Analysis
MOR = Model Order Reduction
ODE = Ordinary Differential Equation
PCB = Printed Circuit Board
TEM = Quasi-Transverse Electromagnetic Model
VLSI = Very Large Scale Integration
HSI = High-Speed Interconnect
SFTP = SSH (Secure) File Transfer Protocol

APPENDIX-C BUDGET

This project consists entirely of software design, so no expenses have been incurred or are expected so far.

APPENDIX-D LAPACK CODE & MATLAB SCRIPT

LAPACK code for Time Domain (Coupled HSI d=10cm)

clear all;
addpath('u:\vlsi\hspicetoolbox')
% data.mat is expected to provide F, vr and vi from our solver
load('u:\vlsi\frequncy domain data\data.mat');
y = loadsig('u:\vlsi\frequncy domain data\failed trail\hpice output\one port hspice netlist l=10cm.ac0');
lssig(y);
v_3 = evalsig(y,'v_3');
f = evalsig(y,'hertz');
%plot(f,angle(v_3),'b', F, angle(vr+1i*vi), '-.r');
plot(f,abs(v_3),'b', F, abs(vr+1i*vi), '-.r');
%title('Frequency Domain Phase Analysis for V3 (d=10cm)');
title('Frequency Domain Simulation for V3 (d=10cm)');
xlabel('frequency / Hz');
ylabel('voltage / V');
legend('hspice','our solver');

Matlab code for Frequency Domain (single transmission line d=10cm)

clear all;
addpath('u:\vlsi\hspicetoolbox')
% 10cm.mat is expected to provide v3_10cm and v6_10cm from our solver
load('u:\vlsi\time domain data\matrices\10cm.mat');
y = loadsig('u:\vlsi\time domain data\hspice data\hspice code for Coupled HSI for d=10.tr0');
lssig(y);
% MATLAB is case-sensitive, so the names match the plot calls below
v_3 = evalsig(y,'v_3');
v_6 = evalsig(y,'v_6');
t = evalsig(y,'time');
%plot(t,abs(v_3),'-.b',t,abs(v3_10cm),'r');
%title('Time Domain Simulation for V1 (d=10cm)');
plot(t,abs(v_6),'-.b',t,abs(v6_10cm),'r');
title('Time Domain Simulation for V2 (d=10cm)');
xlabel('time / s');
ylabel('voltage / V');
legend('hspice','our solver');

Matlab code for Time Domain (Coupled HSI d=10cm)

% length and time must already hold the interconnect lengths and CPU times
plot(length,time,'r');
title('CPU time vs Interconnect length');
xlabel('length / cm');
ylabel('time / s');

Matlab code for Time vs Length

APPENDIX-E SAMPLE CODES FOR STAMP

C++ code for Resistor Stamp

C++ code for Independent Voltage Source Stamp

C++ code for Voltage-Controlled Voltage Source Stamp

APPENDIX-E TIMELINE