Standardization of Interconnects: Towards an Interconnect Library in VLSI Design

Standardization of Interconnects: Towards an Interconnect Library in VLSI Design

Submitted in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
by
P. Vani Prasad

Supervisor: Prof. Madhav P. Desai

DEPARTMENT OF ELECTRICAL ENGINEERING
INDIAN INSTITUTE OF TECHNOLOGY - BOMBAY
MUMBAI
May 2006

STANDARDIZATION OF INTERCONNECTS: TOWARDS AN INTERCONNECT LIBRARY IN VLSI DESIGN

By
P. Vani Prasad

A Thesis Submitted to Indian Institute of Technology Bombay in Partial Fulfillment of the Requirements for the Degree of
DOCTOR OF PHILOSOPHY

Approved:
Prof. Madhav P. Desai
Thesis Advisor

DEPARTMENT OF ELECTRICAL ENGINEERING
INDIAN INSTITUTE OF TECHNOLOGY - BOMBAY
MUMBAI
May 2006

To my parents

Approval Sheet

Dissertation entitled "Standardization of Interconnects: Towards an Interconnect Library in VLSI Design" by Ms. P. Vani Prasad is approved for the degree of Doctor of Philosophy.

Examiners:
Supervisor:
Chairman:
Date:
Place:

Indian Institute of Technology-Bombay, India

Certificate of Course Work

This is to certify that Ms. P. Vani Prasad was admitted to the candidacy of the Ph.D. degree on 1/1/2001, after successfully completing all the courses required for the Ph.D. programme. The details of the course work are given below:

Serial no.  Course no.  Course name                             Credits
1.          EE721       Hardware Description Languages
2.          EE634       Simulation of Devices and Circuits
3.          EES801      Ph.D. Seminar
4.          CS213       Data Structures and Algorithms          Audit
5.          SI414       Optimization                            Audit
6.          MA401       Linear Algebra                          Audit

I.I.T. Bombay
Dy. Registrar (Academic)
Dated:

Abstract

In this thesis, we address the interconnect problem during system-on-chip (SoC) integration in the deep sub-micron (DSM) regime. An SoC is assembled by interconnecting intellectual property (IP) cores with long global wires, which play a critical role in determining the correctness, performance and reliability of a chip. Traditionally, IP cores are first placed and the global interconnections between the cores are then routed. Accurate wire estimates are not available at this stage of the design flow; hence the entire circuit is extracted and simulated to determine the exact behavior of the SoC. If the SoC does not exhibit the desired correctness and performance, it undergoes several time-consuming design iterations before finally reaching closure. Our goal is to establish an alternative approach to these interconnect-associated difficulties. In this approach, we propose a simple library of standardized interconnects, with library elements designed for every routing layer of a chip. Each library element is implemented using a regular pre-mid-post (pmp) buffered architecture and shield lines. We characterize the behavior of these elements in terms of various performance metrics, in the same way that a standard communication cable is characterized. We show that the library elements remain near-optimal as technology scales, and that the models for these elements closely approximate their post-layout behavior. We demonstrate that, despite the lower routing density caused by the shield lines, the library elements achieve higher bandwidth per unit channel-width than unshielded architectures in the presence of random external disturbances, and that this property holds as technology scales. To construct complex connections, we propose a simple cut-and-splice (cas) routing model based on the library elements. We prove that these cas connections are near-optimal at all technology nodes. We also show that the behavior of such connections can be characterized in terms of the behavior of their constituent library elements. Further, we demonstrate that the performance characterization of the library aids global routing at the floorplan stage as well as detailed routing. In conclusion, this thesis is a step towards an interconnect-aware design methodology in the DSM regime: we demonstrate that a standardized approach to designing global interconnects in SoCs can guarantee predictability without compromising circuit performance.

Contents

Abstract
List of Figures
List of Tables

1 Introduction
  1.1 Impact of scaling on VLSI interconnects
  1.2 Traditional design flow in the DSM regime
  1.3 Difficulties in the traditional design flow and the existing solutions
      An overview of the existing solutions
  1.4 Scope of this thesis: A standard solution to tackle the interconnect difficulties
  1.5 Organization of the report
2 Standardization of VLSI Interconnects: An Alternative Solution to the Interconnect Problem
  2.1 The interconnect problem in sub-100nm SoC design
  2.2 Interconnect issues
      Performance issues
      Un-predictability in interconnect behavior
      Expensive wire parasitic extraction
  2.3 Traditional solutions that address interconnect issues
      Performance optimization
      Controlling the unpredictability in interconnect behavior
      Parasitic extraction strategies
  Emerging solutions to the interconnect problem
  Comments on the traditional solutions
  Standardization as a solution to the interconnect design problem
      The standardization concept
      Proposed standardized VLSI interconnects
3 Construction of the Standard Interconnect Library
  The basic library element: a point-to-point connection
  The target performance of the basic element
  Circuit-level implementation of the point-to-point elements
      3.2.1 Observations about the optimal solution
  Approximation of the optimal solution using the pre-mid-post (pmp) buffering strategy
  Layout of the point-to-point elements
  Neighborhood configurations
  Pre-characterization of the neighborhood
  Performance characterization of the point-to-point elements
      Delay characterization
      Delay variation characterization
      Validation of the delay variation model
      Throughput characterization
      Energy characterization
  Towards a throughput-optimal library element
  Impact of the standard library elements on VLSI circuit design
      Experimental set-up
      Results and observations
      Analysis of the results
      Inferences
      Bandwidth projections for lower technology nodes
  Conclusions
4 Routing With the Standard Library Elements
  Construction of complex point-to-point connections
  Optimal implementation of the complex connections
      Trends for the optimal solution
  Approximate spliced implementations
  Cut-and-splice routing model for complex connections
  Performance characterization of the complex connections
      Delay characterization
      Impact of via parameters on delay model
      On the near-optimality of the cut-and-splice solution
      Delay-variation characterization
      Energy characterization
  Impact of the standard library elements on VLSI layout
      Floorplan model
      Post-layout behavior of the pmp elements and cas connections
  Discussion
  Expectations from the standard library during SoC design
  Conclusions
5 Conclusions and Future Directions
  Conclusions
  Contributions of this thesis
  Open issues
  5.3 Closing Remarks
References
Acknowledgement

List of Figures

1.1 A typical SoC
1.2 Wire scaling in the SoC era [5]
1.3 Simplified SoC design flow [3]
1.4 Traditional chip-level design flow in SoCs
1.5 Proposed interconnect-aware SoC design flow (chip-level design)
The basic library element
Communication between two modules
Sample eye diagram
Scenario where the Throughput/Delay metric is of importance
Interconnect density calculation
A communication channel
Input for unconstrained optimization
Output of the unconstrained optimization
Trends for the number of buffers
Trends for the unconstrained optimization
Wire-width trends
The pre-mid-post (pmp) buffered architecture
Buffer widths as a function of element-length
PMP solution for $L_i = m\,l_{crit,i} + x_i$
A pmp element for interconnecting two logic blocks
Capacitive coupling (courtesy: TSMC)
Neighborhood definition
Trends for the victim capacitance
Delay through an M4 pmp element
(Total delay and error) vs wire-length for M4 pmp elements
Cascaded stage in unconstrained and pmp solution
Box model for the process corners
Nominal delay of the M3 element as a function of length, for a load of 1 pF
Distributed five-section RC π model for a section of the pmp element
Delay variation due to crosstalk vs element-length for an M3 element
Simulation set-up: noise sources
Delay variation of the M3 element across the process corners
Simulation set-up for validating the delay variation model
3.29 Validation of delay variation model
Throughput vs length plots for an M3 element and different processes
Energy comparison for M3 element with 10 fF load and TT process
Throughput of the M3 element as a function of length, for F = 1.8 and different processes
Throughput/delay of the M3 element, TT process point
A communication channel connecting two blocks in a chip
Regions in a VLSI chip
The pmp and unshielded elements
Layout of the pmp and unshielded elements on the M3 layer
Staggered buffer arrangement for the unshielded elements
First bits of the pseudo-random input sequence applied to the elements
Delay variation vs length plots for staggered buffer configurations on an unshielded element with 9λ pitch
Delay variation vs length plots for the unshielded staggered elements
Bandwidth per unit channel-width vs element-length for the pmp elements (pitch: 18λ), unshielded staggered elements (pitch: 20λ) and unshielded unstaggered elements (pitch: 8λ)
Intra-layer coupling in the pmp and unshielded elements
View of a single-discontinuity connection
View of a single-discontinuity cut-and-splice connection
Circuit-level view of a single-discontinuity complex connection
Unconstrained architecture for single-discontinuity connections
Buffer and wire width trends across the origin
Spliced architecture: Scheme
Spliced architecture: Scheme
Spliced architecture: Scheme
Spliced architecture: Scheme
Cut-and-splice (cas) routing model
Constructing complex connections using the standard library elements
Delay vs element-length plots for cas connections
Impact of via parameters on the delay through M1-M2 cas connections
Nominal delay of an M3-M4 cas connection as a function of length
Delay variation of M3-M4 cas connections at different process points
Simulation set-up for validating the delay variation model for the cas connections
Delay variation model vs simulation results for the M3-M4 cas connections
Energy comparison for M1-M2 connections, 1 pF load and TT process
A typical floorplan and routing regions in a chip
PMP and CAS connections in a routing region

List of Tables

3.1 Buffer and wire-widths for different routing layers, relative to the minimum size defined for 0.18 µm technology
Wire pitch for different layers (λ = 90 nm)
Coefficients for the delay model in Equation
Technology impact on the pmp delay model of an M4 element
Coefficients for the nominal delay model (Equation 3.21), for different process points
Coefficients for the delay variation model (Equation 3.28), for different process points for an M3 element
Coefficients for the delay variation model (Equation 3.28), TT process and different routing layers
Coefficients for the energy model (Equation 3.32), different routing layers
Nominal delay and delay variation in ideal element u
Delay variation and bandwidth per unit width for pmp elements ($e_i$)
Delay variation and bandwidth per unit width vs length for $u_i$-original elements
Nominal delay and bandwidth per unit width of $u_i$ elements with $P_{umax}$ pitch
The splice penalty table for complex cas connections (delay in ps)
Standard models vs extracted results

Chapter 1

Introduction

In this chapter, we examine the ever-increasing dominance of interconnects in the deep sub-micron (DSM) era and the resulting problems in the traditional VLSI design flow. We also identify the interconnect issues that we address as part of this thesis. We briefly discuss the drawbacks of the existing solutions to these issues and motivate the need for an alternative standard solution.

1.1 Impact of scaling on VLSI interconnects

Scaling¹ of CMOS technology improves device performance, increases transistor density and reduces power consumption [1]. Even as on-chip computation gets faster, interconnects make circuit behavior increasingly difficult to control at DSM nodes. An interconnect introduces capacitive, resistive and inductive parasitic effects [2], which degrade signal integrity and increase propagation delay. Further, improved process technology has increased wafer and chip sizes, and has led to the evolution of systems-on-chip (SoCs). SoCs are built from pre-designed intellectual property (IP) cores (Figure 1.1), the basic idea being to integrate all the components of a board onto a single compact chip [3]. A typical SoC consists of several IP cores that are interconnected using long global wires. Such long wires are a major bottleneck in determining precise circuit behavior, because their parasitic effects are pronounced and unpredictable at sub-100nm nodes, as we shall see in Section 1.2.

Figure 1.1: A typical SoC

In a typical VLSI chip, interconnects are routed on several metal layers and are categorized by length as local, semi-global and global interconnects. Local interconnects are used for intra-block communication and generally have lengths on the order of micrometres [4]. Semi-global interconnects have a slightly wider pitch, are used for both short and long routes, and are routed on the intermediate layers. The top layers are predominantly used for global communication between IP cores, and for power and ground connections. In Figure 1.2 we show the wires in an SoC that scale in length and those that do not. This figure illustrates that scaling leads to shorter local interconnects and longer global interconnects. Typically, the number of global interconnects in a chip is much smaller than the number of local interconnects. However, the performance of the global interconnects plays a dominant role in determining the behavior of the chip. In the next section we will see why the traditional design flow cannot cope with such interconnect-associated problems.

Figure 1.2: Wire scaling in the SoC era [5]

¹ The most cited law of the semiconductor industry is Moore's law, which states that the number of devices on a single chip doubles roughly every two years (a period often quoted as 18 months). Each technology generation typically shrinks linear device dimensions by a factor of about 0.7, and the law has, remarkably, held true over the years.

1.2 Traditional design flow in the DSM regime

We show a simplified version of the SoC design flow in Figure 1.3. From the specifications, a detailed architecture design is obtained. This is followed by mapping to a platform, where the soft, firm and hard IP cores are assembled, and then by chip-level design. The final step is to release the design for manufacture.

Figure 1.3: Simplified SoC design flow [3]

We show the top-down chip-level design flow in Figure 1.4. It starts with an RTL description that includes the platform HDL, the user-defined HDL and the IP core HDL. The RTL netlist is then synthesized to obtain a gate-level netlist. This netlist undergoes floorplanning, in which the approximate location of each module is determined [6]; typical cost functions at this stage are area, signal delay and signal integrity. The next step is placement, during which the best position of each module is determined and the shape and terminals of the module are fixed; wire-length and area are the cost functions for this stage. Once the blocks are placed, the global router finds a rough path connecting two points on the blocks (cores) such that congestion and delay are minimized, and the detailed router completes the final connections.

It is well established that in DSM designs, interconnects, and not only gates, are key factors in determining circuit performance. However, wire parasitics are unknown until a full-chip post-layout extraction is performed, in which the parasitics of the long global interconnects and their neighborhood are estimated. Note that local interconnects do not affect circuit behavior significantly. The next step is to verify the extracted netlist. If the desired circuit behavior, in terms of both correctness and performance, is not achieved, the circuit is re-designed at the RTL, synthesis, or placement and routing (P&R) stage.

1.3 Difficulties in the traditional design flow and the existing solutions

Exact circuit behavior cannot be known until the entire circuit is extracted, with its wire parasitics, from the layout. This implies that design errors can be identified only late in the design cycle. Logic gates, in contrast, are well characterized, and their extracted behavior can be captured at the synthesis stage itself; hence they can be chosen directly from the gate library during logic synthesis to implement the functional description. However, no such pre-characterized library presently exists for global interconnects, which results in difficulties in the SoC design flow.
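The claim in Section 1.1 that long global wires, rather than gates, increasingly limit performance can be made concrete with a first-order delay estimate. The sketch below is illustrative only: it assumes Bakoglu's textbook approximation for a driver of output resistance `r_driver` feeding a uniform RC wire and a load `c_load`, and the per-unit-length values are assumed round numbers, not data from this thesis. The distributed term grows with the square of the length, so the total delay grows faster than linearly.

```python
def unbuffered_wire_delay(length_um, r_per_um, c_per_um, r_driver, c_load):
    """First-order 50% delay (s) of a driver, a uniform RC wire and a load.

    Bakoglu's approximation: 0.7*Rd*(Cw + CL) + Rw*(0.4*Cw + 0.7*CL).
    Units: ohm/um, F/um, ohm, F.
    """
    r_wire = r_per_um * length_um      # total wire resistance (ohm)
    c_wire = c_per_um * length_um      # total wire capacitance (F)
    driver_term = 0.7 * r_driver * (c_wire + c_load)
    wire_term = r_wire * (0.4 * c_wire + 0.7 * c_load)
    return driver_term + wire_term


def distributed_rc_term(length_um, r_per_um, c_per_um):
    """Intrinsic distributed-RC part of the delay, 0.4*r*c*L^2 (quadratic in L)."""
    return 0.4 * r_per_um * c_per_um * length_um ** 2


if __name__ == "__main__":
    # Assumed illustrative values (not from this thesis): 0.1 ohm/um, 0.2 fF/um.
    R_PER_UM, C_PER_UM = 0.1, 0.2e-15
    for length in (1000, 2000, 4000, 8000):
        t = unbuffered_wire_delay(length, R_PER_UM, C_PER_UM,
                                  r_driver=1e3, c_load=10e-15)
        print(f"{length:5d} um: {t * 1e12:7.1f} ps")
```

Doubling the length doubles the driver's load but quadruples the distributed term, which is why long unbuffered wires quickly dominate the delay budget.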
Our goal is to achieve an interconnect-aware design methodology in which these interconnect difficulties are eliminated. Towards this end, in this thesis we address the interconnect problem in SoC integration. We identify three major interconnect difficulties that we tackle as part of this thesis: unpredictable interconnect behavior, expensive parasitic extraction and unoptimized performance. The interconnect design problem is to control these difficulties so that the circuit behaves correctly and its performance is both optimized and predictable. We first briefly present an overview of the traditional solutions to the interconnect difficulties, and then propose an alternative, standardization-based approach.

Figure 1.4: Traditional chip-level design flow in SoCs

An overview of the existing solutions

In this section, we present an overview of the existing solutions to the interconnect difficulties; we discuss these techniques in detail in Chapter 2. Uncertainty in estimating interconnect behavior makes it hard to identify design errors until the post-layout phase. Once the errors are identified, they can be tackled using error-fixing solutions at various stages of the design flow. Generally, buffers are inserted and the wire width is adjusted at the post-layout stage for interconnect delay minimization [7]-[10], crosstalk mitigation [11] and power minimization [12]. Other techniques include rip-up and re-route of wires [13], timing-driven placement [14], timing-driven floorplanning [15], physical re-synthesis [16] and state-machine recoding at the RTL stage. If the design errors still persist, the original specifications can be relaxed. We observe that the higher in the design flow these post-layout error-fixing techniques must be applied, the longer the resulting loop in the flow. Further, even with such techniques, there is no guarantee that the errors can be eliminated.

Unknown interconnect behavior leads to excessive design iterations before closure is reached. Curtailing these iterations is a fundamental problem, and researchers have tried to solve it with predictive approaches and by planning for interconnect performance. Stochastic wire-length predictors such as the ones described in [17]-[21] can be used up to the floorplan stage to predict interconnect length. However, even when the wire-length is known, it is hard to determine interconnect behavior unless the entire wire neighborhood is characterized and accurate parasitic estimates are available. Hence interconnects are planned at an early stage of the design flow, such as the floorplan stage, by incorporating interconnect models [22], [23]. Most of the existing solutions to the interconnect difficulties are based on buffer insertion [24], so buffer space must also be planned before the circuit is laid out.
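Because buffer planning must reserve space for repeaters before layout, it helps to see how many repeaters a long line needs. A minimal sketch under stated assumptions: it uses Bakoglu's closed-form optimum for a uniform two-pin RC line, where `r0` and `c0` are the output resistance and input capacitance of a minimum-size inverter. This is the classic textbook model, not the pmp scheme of this thesis and not the tree-based algorithms cited in this chapter, and all numeric values are hypothetical.

```python
import math


def bakoglu_repeaters(r0, c0, r_int, c_int):
    """Continuous optimum number k and size h of repeaters for a uniform RC line.

    r0, c0: output resistance (ohm) and input capacitance (F) of a
    minimum-size inverter; r_int, c_int: total wire resistance and capacitance.
    """
    k = math.sqrt(0.4 * r_int * c_int / (0.7 * r0 * c0))
    h = math.sqrt(r0 * c_int / (r_int * c0))
    return k, h


def repeated_line_delay(k, h, r0, c0, r_int, c_int):
    """50% delay (s) of k identical stages; each size-h repeater drives
    1/k of the wire plus the next repeater's input capacitance h*c0."""
    r_seg, c_seg = r_int / k, c_int / k
    stage = 0.7 * (r0 / h) * (c_seg + h * c0) + r_seg * (0.4 * c_seg + 0.7 * h * c0)
    return k * stage


if __name__ == "__main__":
    # Hypothetical 10 mm line: 0.1 ohm/um and 0.2 fF/um.
    R0, C0 = 5e3, 1e-15
    R_INT, C_INT = 1e3, 2e-12
    k, h = bakoglu_repeaters(R0, C0, R_INT, C_INT)
    k_int = max(1, round(k))  # a layout needs a whole number of repeaters
    print(f"k_opt = {k:.1f} (use {k_int}), h_opt = {h:.0f}x minimum size")
    print(f"repeated line: {repeated_line_delay(k_int, h, R0, C0, R_INT, C_INT) * 1e12:.0f} ps")
    print(f"single min-size driver: {repeated_line_delay(1, 1, R0, C0, R_INT, C_INT) * 1e12:.0f} ps")
```

Setting the partial derivatives of the stage-delay expression with respect to `k` and `h` to zero yields the two square-root formulas; the delay is separable and convex in each variable, so the continuous optimum is unique.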
Buffer block planning is an active area of research, and [25]-[27] suggest that clustered buffer placement be performed during the floorplanning stage itself. From this discussion of the existing solutions, we observe that adopting post-layout strategies to tackle the interconnect problem leads to multiple design iterations before the desired circuit behavior is achieved. Hence it is essential to obtain accurate wire estimates a priori, so that the design converges faster.

1.4 Scope of this thesis: A standard solution to tackle the interconnect difficulties

In this thesis we propose an alternative class of solution to the interconnect difficulties in the sub-100nm regime, based on a library of standardized interconnects. We construct a compact library of parameterized point-to-point interconnects for each routing layer in a chip. We design two kinds of library elements: delay-optimal and throughput-optimal elements. Ideally, we would like the library elements to have the characteristics of a standard communication cable, such as quality, predictability and ease of use. Towards this target, we first implement the library elements using a simple and regular pre-mid-post (pmp) buffered architecture [28] and shield lines. We characterize the behavior of these elements in terms of performance metrics such as delay, delay variation in the presence of noise, energy and area. The geometry of the standard library element is chosen so that its electrical properties can be predicted without extracting the entire circuit. Further, we prove the following important property for the pmp element: the delay of the point-to-point library element for a given length is at most two inverter delays greater than the best delay for a point-to-point connection of the same length, assuming similar interconnect geometries. We term the delay gap between the two kinds of point-to-point connections the standardization penalty. We show that this property of the library elements holds even as technology is scaled. Further, we demonstrate that, despite the lower routing density caused by the shield lines, the pmp elements achieve higher bandwidth per unit channel-width than unshielded architectures, and that this bandwidth advantage persists as technology is scaled. Thus, the standard library is a collection of interconnects that can be treated as near-optimal, predictable and regular design components with minimal usage costs. These design components can therefore be viewed as being similar to standard communication cables.

Next, we use the library elements to construct complex connections. We propose a simple cut-and-splice (cas) routing model for making such connections [29] and observe that the behavior of these connections can be expressed in terms of the behavior of their constituent elements. We justify this by demonstrating that the estimated behavior of the connections is close to their post-layout behavior. We also prove that the penalty incurred by a complex connection is the sum of a constant penalty of one inverter delay at every splice and the standardization penalty. We demonstrate that this remains true as technology is scaled.

Figure 1.5: Proposed interconnect-aware SoC design flow (chip-level design)
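The two penalty statements of Section 1.4 combine into a simple worst-case bound. A minimal sketch, reading the text as: a cas connection with n splices is slower than the best connection of the same length by at most the standardization penalty (two inverter delays) plus one inverter delay per splice. The function names and the numbers in the usage lines are hypothetical.

```python
def cas_delay_upper_bound(d_opt_ps: float, t_inv_ps: float, n_splices: int) -> float:
    """Worst-case delay (ps) of a cut-and-splice connection.

    d_opt_ps:  best achievable delay for a connection of the same length (ps)
    t_inv_ps:  delay of one inverter (ps)
    n_splices: number of splices in the connection
    """
    if n_splices < 0:
        raise ValueError("n_splices must be non-negative")
    standardization_penalty = 2.0 * t_inv_ps   # pmp element vs. best point-to-point
    splice_penalty = n_splices * t_inv_ps      # one inverter delay per splice
    return d_opt_ps + standardization_penalty + splice_penalty


def relative_overhead(d_opt_ps: float, t_inv_ps: float, n_splices: int) -> float:
    """Bound on the fractional slowdown relative to the best connection."""
    bound = cas_delay_upper_bound(d_opt_ps, t_inv_ps, n_splices)
    return (bound - d_opt_ps) / d_opt_ps


if __name__ == "__main__":
    # Hypothetical numbers: 500 ps best delay, 20 ps inverter delay, two splices.
    print(cas_delay_upper_bound(500.0, 20.0, 2))
    print(relative_overhead(500.0, 20.0, 2))
```

Because the bound depends on the number of splices rather than on length, the relative overhead shrinks for long connections with few layer changes.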

Such a standard library can easily be incorporated into the existing CAD flow. In Figure 1.5, we show the proposed interconnect-aware design flow, in which the standard elements can be used at different stages. Using this methodology, we can tackle the three interconnect difficulties listed in the previous section. In this thesis, we show that the first interconnect difficulty (unpredictability in performance) can be controlled, while the other two (full-chip extraction and multiple optimizations) can be eliminated from the design flow. Further, we show that standardization aids global routing at the floorplan stage as well as detailed routing in a chip. We conclude this thesis by showing that the library of standardized interconnects alleviates the interconnect difficulties in SoC design without compromising system performance.

1.5 Organization of the report

This thesis is composed of five chapters. In Chapter 2 we discuss the existing solutions to the interconnect problem and suggest an alternative standardization approach. In Chapter 3 we describe the construction of the standard interconnect library. In Chapter 4 we construct complex connections using the library elements, characterize these connections, and study the impact of the library elements on VLSI layout. Finally, we discuss the conclusions and future directions of this work in Chapter 5.
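Figure 1.5 places the standard interconnect library beside the gate library, so that a floorplanner can estimate a connection's delay from pre-characterized models instead of full-chip extraction. The sketch below illustrates that idea only: the linear delay-versus-length form, the per-layer coefficients, the class `LibraryElement` and the splice-penalty values are all hypothetical placeholders, not the models or tables characterized in this thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryElement:
    """Hypothetical pre-characterized element: delay = offset + slope * length."""
    layer: str
    delay_offset_ps: float
    delay_ps_per_mm: float

    def delay_ps(self, length_mm: float) -> float:
        return self.delay_offset_ps + self.delay_ps_per_mm * length_mm


# Placeholder coefficients, NOT values from this thesis.
LIBRARY = {
    "M3": LibraryElement("M3", delay_offset_ps=30.0, delay_ps_per_mm=100.0),
    "M4": LibraryElement("M4", delay_offset_ps=30.0, delay_ps_per_mm=80.0),
}
# Placeholder delay added at each layer change (splice), in ps.
SPLICE_PENALTY_PS = {("M3", "M4"): 20.0, ("M4", "M3"): 20.0}


def estimate_connection_delay(segments):
    """Estimate the delay (ps) of a connection given as [(layer, length_mm), ...]."""
    total = sum(LIBRARY[layer].delay_ps(length) for layer, length in segments)
    # Add a splice penalty wherever consecutive segments change layer.
    for (a, _), (b, _) in zip(segments, segments[1:]):
        if a != b:
            total += SPLICE_PENALTY_PS[(a, b)]
    return total


if __name__ == "__main__":
    route = [("M3", 1.0), ("M4", 2.0)]   # 1 mm on M3, then 2 mm on M4
    print(f"estimated delay: {estimate_connection_delay(route):.0f} ps")
```

The point of the design is that every quantity comes from a table lookup, so the estimate is available at the floorplan stage, before any layout exists.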

Chapter 2

Standardization of VLSI Interconnects: An Alternative Solution to the Interconnect Problem

In this chapter, we look at the interconnect problem in the DSM era and study the solutions currently in use. We then suggest an alternative standardization approach to tackle this problem.

2.1 The interconnect problem in sub-100nm SoC design

In the sub-100nm regime, interconnects tend to dominate circuit correctness, performance and reliability. Hence the exact behavior of circuits at these nodes can be known only after the entire circuit, including the interconnects, is extracted and simulated. If the circuit does not exhibit the desired behavior, it is re-designed at different stages of the design flow, which results in time-consuming design iterations. Further, post-layout optimization techniques have to be adopted if the circuit performance is unoptimized. Hence the interconnect problem in SoC design poses three major difficulties:

1. performance optimization
2. the need for accurate parasitic extraction to calculate the expected circuit behavior
3. unpredictability, and the need for iteration to reach design convergence

To ensure correct, predictable and optimized circuit operation, we have to address these difficulties effectively. This thesis is a step towards such an interconnect-aware design methodology using the technique of standardization. In the course of this thesis, we show that a library of standardized interconnects can eliminate the multiple optimization and extraction steps from the SoC design flow, and reduce the number of design iterations. In the rest of this chapter, we first discuss the interconnect difficulties and look at the traditional techniques used to address them. We then discuss the proposed standardization approach as an alternative to the traditional solutions.

2.2 Interconnect issues

In this section we look at the main interconnect design and reliability issues and see why they are difficult to handle during the design of SoCs.

Performance issues

As technology scales, the length of the long interconnects tends to increase. Further, the wire-to-wire spacing decreases and the wire aspect ratio increases. This results in the following performance issues:

1. Delay: Delay is an important performance metric for SoCs, where long interconnects connect the IP cores, because interconnect delay does not scale with gate delay as technology scales.
2. Crosstalk: Both capacitive and inductive crosstalk affect signal activity in DSM circuits. Capacitive coupling can result in delay variation and timing errors. It can also lead to glitches, resulting in logical errors in the circuit. Line inductance can cause reflections and ringing [30], and is especially important for power and clock lines.
3. Power: The interconnect capacitance is significant and has to be considered when calculating the total switching power of the circuit.
4. Variations: Variations can be of different types: between wafers, between dies, within a die, and temporal. Wafer-to-wafer and die-to-die variations affect the yield, while intra-die variations affect the yield as well as the efficiency and performance of the chip [31], [32]. Environmental changes such as power-supply fluctuations and capacitive coupling lead to delay variation in on-chip wires. Such environmental and process variations can make the actual behavior of a chip differ drastically from its estimated behavior.
5. Power supply integrity: Increasing IR drop and L di/dt noise in the power supply network make it difficult to preserve signal integrity in circuits [33], [34].
6. Reliability: As technology scales, interconnect reliability under electromigration (EM) and thermal effects is becoming a serious issue [35]. Increased current densities cause electromigration, which is a particularly severe problem for contacts [36].

Un-predictability in interconnect behavior

Interconnect behavior cannot be accurately predicted unless the circuit is extracted and simulated. Traditionally, wire-load models are used to estimate wire characteristics at early stages of the design flow, such as during synthesis. For a wire with a given fanout, the wire-load model specifies the wire capacitance, resistance and area [37]. These models suffer from inaccuracies, mainly for three reasons: unknown wire geometry, lack of good extraction strategies and environmental (neighborhood) effects. To predict wire-lengths, several stochastic estimators [17]-[21] are available that can be used up to the floorplan stage. However, even if the wire-length is known, precise extraction of the wire RLC parameters and analysis of the resulting network are difficult and time-consuming. Furthermore, the environment affects interconnect behavior in three ways:

1. through coupling via the supply and ground network, and through substrate coupling
2. through capacitive crosstalk, which depends on the switching activity in the neighborhood [38]
3. through inductive coupling and its uncertainty, which depends on the current return path [39]

Hence the actual behavior of the circuit is known only after the entire circuit is extracted, and the design may have to be iterated several times to meet the specifications.

Expensive wire parasitic extraction

Wire parasitic extraction is one of the most expensive steps in the design flow. The extracted network is used to verify the functionality and performance of the circuit through simulations of timing, signal integrity, reliability, and the power and clock grids [40]. Some tools can extract the interconnect resistance as well as the area, fringe and coupling capacitance (distributed model) [41]. However, inductance calculation for on-chip interconnects is a fundamental problem, and very few tools, such as the ones described in [42] and [43], support inductance extraction. Further, even after the RLC network is extracted, its complex representation is computationally intensive for circuit simulators to handle.
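To illustrate why wire-load models are coarse, here is a minimal sketch of a fanout-based lookup in the style of a Liberty `wire_load` group, which lists `fanout_length` pairs, a `slope` for extrapolating beyond the largest tabulated fanout, and per-unit-length `resistance`, `capacitance` and `area`. Linear interpolation between tabulated fanouts is assumed here, the class name `WireLoadModel` is hypothetical, and all numbers are hypothetical. The estimate depends only on fanout: the wire's geometry and its neighbors never enter, which is exactly the source of inaccuracy described above.

```python
from __future__ import annotations

from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class WireLoadModel:
    """Fanout-based wire estimate in the style of a Liberty wire_load group."""
    fanout_length: dict[int, float]  # fanout -> estimated wire length
    slope: float                     # extra length per fanout beyond the table
    resistance: float                # resistance per unit length
    capacitance: float               # capacitance per unit length
    area: float                      # area per unit length

    def length(self, fanout: int) -> float:
        fanouts = sorted(self.fanout_length)
        if fanout < fanouts[0]:
            raise ValueError("fanout below the smallest tabulated value")
        if fanout in self.fanout_length:
            return self.fanout_length[fanout]
        if fanout > fanouts[-1]:
            # Extrapolate linearly from the last entry using the slope.
            last = fanouts[-1]
            return self.fanout_length[last] + (fanout - last) * self.slope
        # Interpolate between the neighbouring entries f0 < fanout < f1.
        i = bisect_left(fanouts, fanout)
        f0, f1 = fanouts[i - 1], fanouts[i]
        l0, l1 = self.fanout_length[f0], self.fanout_length[f1]
        return l0 + (l1 - l0) * (fanout - f0) / (f1 - f0)

    def estimate(self, fanout: int) -> tuple[float, float, float]:
        """(resistance, capacitance, area) of a net with the given fanout."""
        length = self.length(fanout)
        return (length * self.resistance, length * self.capacitance,
                length * self.area)


if __name__ == "__main__":
    wlm = WireLoadModel({1: 20.0, 2: 45.0, 4: 100.0}, slope=30.0,
                        resistance=0.002, capacitance=0.0003, area=0.5)
    for fo in (1, 3, 6):
        print(fo, wlm.length(fo), wlm.estimate(fo))
```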
2.3 Traditional solutions that address interconnect issues

In this section we enumerate the existing solutions that address the interconnect difficulties.

Performance optimization

Generally, solutions to the interconnect performance optimization problem are based on buffer insertion and wire sizing.

1. Delay minimization strategies: Cascaded buffer insertion [44] is widely adopted for minimizing the delay of a point-to-point connection modeled as a capacitive load. For simultaneously constructing and buffering an RC tree, heuristics such as the one described in [8] have been suggested. However, Van Ginneken's algorithm considers only a given set of candidate buffer locations and places at most one buffer at each, so a long wire between candidate locations may receive too few buffers. Later works, including that of Alpert et al., modified this approach and proposed cutting the wire into segments [9] so that a pre-calculated number of buffers can be inserted. Wire sizing, both discrete and continuous, is another widely studied technique [45]-[47], in which the wire width is varied along its length to optimize delay. Simultaneous buffer insertion and wire sizing has also been suggested for minimizing delay [48], [10].
2. Crosstalk mitigation techniques: Capacitive coupling can be reduced by increasing the wire-to-wire spacing and by inserting shield lines [49]. Shield lines also provide a controlled current return path, which makes inductance calculation easier. Further, capacitive crosstalk effects can be minimized by staggering the buffers on adjacent lines [11]. Other techniques that reduce the effects of crosstalk are low-swing differential signaling [50], wire swizzling [51] and bus encoding [38].
3. Power minimization: The delay and crosstalk mitigation techniques usually involve buffer insertion, and the growing number of buffers leads to a power-delay trade-off in the circuits. To minimize the power in global interconnects, [12] suggests a repeater insertion strategy in which buffer sizes are selected to achieve low-power operation at a small latency penalty.

Controlling the unpredictability in interconnect behavior

Unpredictable interconnect behavior can lead to design errors. Two types of solutions address this difficulty: design error-fixing solutions and predictive (planning) solutions.

1. Design error-fixing solutions: To fix design errors identified at the post-layout verification stage, the literature describes several error-fixing solutions that are used at different stages of the design flow. Buffer insertion is widely used to meet timing constraints [7]-[10] as well as for crosstalk reduction [11] and power minimization [12]. A rip-up and re-route strategy is used for routing nets that were previously unroutable due to a lack of routing resources [13].
Techniques such as timing-driven placement [14] and simultaneous gate-sizing and placement with delay constraints [52] are adopted at the placement stage to meet timing requirements. Floorplanning techniques such as the one suggested in [15] ensure that timing and area constraints are met during circuit design. At a higher level, a physical synthesis methodology that incorporates interconnect delay models has been proposed in [53] and [16]. Architectural solutions have also been proposed, such as bus encoding [38] for minimizing crosstalk effects and delay-insensitive circuits [54] for making designs functionally insensitive to the latency of long wires. At the RTL stage, retiming [55], state machine recoding and pipelining [56] are used to meet the timing, area and throughput requirements of the circuits. If design errors still persist, the original specifications are relaxed.
2. Predictive solutions: To minimize design errors, interconnect behavior should be predictable without going through the physical design stages. This can be achieved by accurately estimating the wire geometry, its RLC parameters and the impact of the wire neighborhood. Accurate wire models can then be incorporated in the early design stages themselves. Given the geometry of a wire and its neighborhood, several extraction tools extract the wire parasitics precisely. Planning for interconnect effects in the early design stages can alleviate the unpredictability problem, and interconnect planning involves incorporating interconnect models at different stages of the design flow. To improve predictability at the layout level, Khatri et al. [49] proposed a layout methodology in which every signal line in a chip is shielded by power and ground lines on both sides. Such a layout fabric characterizes the wire neighborhood and eliminates the uncertainty in estimating wire capacitances. Integrated floorplanning and interconnect planning has been suggested in [22] to include wire length in the cost function during floorplanning. An interconnect-centric design style has also been suggested in [23] to incorporate interconnect performance estimation models at the floorplan stage. Among statistical approaches, [57] estimates crosstalk noise, and [58] estimates both crosstalk delay and noise using input data correlation. Further, to deal with the increasing number of on-chip buffers, a new class of solutions pre-allocates regions of the chip for buffer placement. Such a buffer block planning scheme avoids layout changes and places the buffers so that timing constraints are satisfied [25]. Several extensions to this buffer block planning idea have been proposed recently in [26] and [27]. Ongoing research is directed towards fitting the buffer planning idea into the interconnect-centric flow [23].

Parasitic extraction strategies
Accuracy and speed are the two most important criteria for a good parasitic extractor. Several three-dimensional extractors compute capacitance accurately [41], [42]. Further, there are four main techniques for speeding up the parasitic extraction process.
First, instead of traditional full-chip extraction, a region-based extraction is performed. The most critical part of the chip is identified, and a complete parasitic representation is derived only for the logic and interconnect present there. Such critical-area extraction reduces both the run time and the circuit complexity [59]. Second, parallelization techniques can be adopted for analyzing the layout, which reduces computation time and storage requirements [60]. Third, to reduce the complexity of the extracted network, model order reduction techniques such as the one described in [61] are used. Fourth, the dense layout fabric suggested in [49] can be used to pre-characterize the entire neighborhood of the signal line.

Emerging solutions to the interconnect problem
Researchers are exploring options such as 3D ICs, where buffering can be built into the wiring planes [62]. Optical interconnects are another alternative to expensive metal wires [63], but the hardware cost of the optical transmitter and receiver is exorbitant. Current-mode signaling has also been proposed recently [64], [65], but concrete evidence of its benefits is not yet available. Despite the emergence of such solutions, most present tools still use modified versions of the traditional solutions [66].

Comments on the traditional solutions
The solutions to the interconnect performance optimization problem are post-layout strategies, and two difficulties are associated with them. First, there is no guarantee that enough space is available for introducing changes in the layout. Second, even if layout changes are possible, each change at the placement and routing, floorplan or synthesis stage needs to be back-extracted and simulated during verification. This is time-consuming, since every long global wire and its neighborhood are critical for determining circuit behavior. Hence it is essential to obtain accurate wire estimates a priori so that the design converges faster. The traditional solutions are not pre-planned; they construct circuits by correcting design errors. Most importantly, each interconnect needs to be individually optimized every time a design change is introduced. Unlike these ad-hoc traditional solutions, we suggest an alternative, correct-by-construction approach that uses standardized interconnects as a solution to the interconnect problem. Such a solution will be acceptable if it provides predictability without compromising circuit performance.

2.4 Standardization as a solution to the interconnect design problem
In this section, we introduce the notion of standardization as a means of addressing the interconnect problem and explain why it can be considered an alternative to the traditional methods. We first examine the meaning of standardization before applying the concept at the VLSI scale to build standard interconnects.

The standardization concept
Consider a well-known real-life example of standardization: the communication cable.
Standard cables are constructed in a specific manner and used in a specific manner. They exhibit a specified behavior that is close to their actual behavior, and they have a specific cost. Further, by using standard cables, we eliminate the need to construct customized cables for different performance criteria. Hence, to establish communication between two modules in a network, we can select a pre-characterized cable that meets the requirements of the network. Multiple standard cables can be used selectively without introducing changes in the system in which they are used. In other words, quality, predictability and ease of use are the characteristics of a standard cable. We intend to bring this kind of predictability, performance and usability to VLSI interconnects, treating interconnects as design components that can be selectively used without affecting the high-level operations.

However, cables and interconnects differ in many ways, and achieving on-chip interconnect standardization is not easy. First, cables and interconnects work at altogether different scales, and interconnects must be handled at very small dimensions. Performance issues such as delay, throughput, energy and area are more pronounced in interconnects, and a larger number of repeaters is needed to control them. Second, interconnects are constructed differently from cables: VLSI interconnects are characterized by their R, C and L parameters and incur higher losses. Third, and most worrisome, interconnects suffer interference (crosstalk) from signal lines in close proximity, whereas cables are well shielded. Fourth, if the interconnect is to be treated as a design component, its layout should be characterized. Fifth, a routing model is needed to use the standard interconnect components for constructing complex connections. Finally, we would expect interconnects to be as predictable and performance-driven as cables, and their impact on the CAD flow should be quantified. Hence we believe that bringing such a high degree of standardization to VLSI interconnects is in itself a significant contribution to CAD research.

Proposed standardized VLSI interconnects
We propose to use a library of standardized interconnects to tackle the interconnect problem, wherein the interconnects behave like standard cables. This library can be viewed as a collection of design components. However, three issues need to be settled before the library can be put to use: the complexity of the library, the characteristics of the library elements, and the usage of the library in the design flow. Hence our approach in this thesis is as follows:
1. Construct a compact library of standardized interconnects.
2. Provide a regular and near-optimal implementation for the library element, and choose its geometry so that its electrical characteristics can be predicted without having to extract the entire circuit.
3. Characterize the behavior of the element in terms of various performance metrics, and selectively use the elements for either a latency-aware or a throughput-aware design without affecting the high-level operations.
4. Use the library elements to route and characterize complex connections in a chip.
As a first step towards this goal, we identify the simplest possible building block of the library that can be standardized. We select uni-directional point-to-point connections, one of the most important classes of connections in modern and future VLSI circuits, as the basic building blocks. We view these elements as standard cables and design them for every routing layer of the chip. Ideally, we expect the standard elements to exhibit three properties: predictability, near-optimality that is preserved as technology scales, and regularity. Standardization should also cover the architecture of the interconnect, its performance, the surroundings that cause interference with its signal activity, and the method used to construct the routes. As with cables, the standard interconnects are to be designed for different performance metrics and used selectively, depending on the target application.
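A library organized this way maps naturally onto a small lookup structure: each routing layer has a set of characterized elements, and one is picked according to a latency-aware or a throughput-aware objective. The Python sketch below is hypothetical. The field names, the layer label "M6" and all numeric values are placeholders chosen for illustration; they are not characterization data from this thesis.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LibraryElement:
    """One characterized point-to-point element (hypothetical fields)."""
    name: str
    layer: str
    delay_ps: float            # characterized end-to-end delay
    delay_variation_ps: float  # spread due to neighborhood activity
    throughput_gbps: float     # data rate the element can sustain
    energy_pj_per_bit: float
    width_tracks: int          # signal plus shield tracks occupied


def select_element(library: List[LibraryElement], layer: str,
                   latency_aware: bool) -> LibraryElement:
    """Pick the element on `layer` that best serves the design objective."""
    candidates = [e for e in library if e.layer == layer]
    if not candidates:
        raise KeyError(f"no library element characterized for layer {layer!r}")
    if latency_aware:
        return min(candidates, key=lambda e: e.delay_ps)
    return max(candidates, key=lambda e: e.throughput_gbps)


# Placeholder entries only; real values would come from characterization.
LIBRARY = [
    LibraryElement("M6_fast", "M6", 100.0, 5.0, 2.0, 0.5, 3),
    LibraryElement("M6_wide", "M6", 150.0, 6.0, 4.0, 0.4, 3),
]

if __name__ == "__main__":
    print(select_element(LIBRARY, "M6", latency_aware=True).name)   # M6_fast
    print(select_element(LIBRARY, "M6", latency_aware=False).name)  # M6_wide
```

Because every element is characterized up front, switching between a latency-aware and a throughput-aware variant becomes a table lookup rather than a fresh optimization run.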

Towards this target, we implement the library elements using a regular buffered architecture with shielding. We design these elements for minimum delay and maximum throughput, and characterize their delay, delay variation, energy and layout requirements. Perhaps the most important property of the library elements is near-optimality, which holds even as technology is scaled. Further, despite the area overhead due to the shield lines, system performance in terms of bandwidth per unit channel-width is not compromised. We also demonstrate that this bandwidth property remains valid as technology scales.

To construct complex connections, we provide a simple routing model that uses the standard library elements. We prove that such complex connections are near-optimal and that this property remains true as technology is scaled. Further, the behavior of these connections is characterized in terms of the behavior of the library elements. We also demonstrate that the standard interconnect models can be incorporated into the present CAD flow: the global route interconnecting two modules in a given chip floorplan can be constructed and fully characterized using the library elements. Currently, our routing strategy requires buffer space at the locations pre-decided by the standard solution, and it detours around any blockage encountered while routing point-to-point connections. Since the library elements and the routing model are near-optimal and predictable, the iterative extraction and optimization steps in the design flow can be eliminated. Most existing solutions to the interconnect difficulties are based on buffer insertion, so buffer space must be planned even before the circuit is laid out.
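A complex connection can be characterized directly from its library elements. The sketch below is not the thesis's cut-and-splice algorithm; it is a simplified, hypothetical illustration built on two assumptions:
- An L-shaped route is covered by whole library elements, with one element type per leg on that leg's layer, rounding up to a whole number of elements.
- Each element is buffered at its ends, so the elements are electrically decoupled and route delay and energy are simply the sums of the element values.

All names and numbers are placeholders.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Element:
    """A characterized library element (hypothetical fields and units)."""
    layer: str
    length_um: float
    delay_ps: float
    energy_pj: float


def cover_leg(leg_length_um: float, element: Element) -> List[Element]:
    """Cover one straight leg with whole copies of `element` (rounding up)."""
    if leg_length_um <= 0:
        return []
    return [element] * math.ceil(leg_length_um / element.length_um)


def route_l_shape(dx_um: float, dy_um: float,
                  h_element: Element, v_element: Element) -> List[Element]:
    """Horizontal leg on h_element's layer, then vertical leg on v_element's."""
    return cover_leg(abs(dx_um), h_element) + cover_leg(abs(dy_um), v_element)


def characterize(route: List[Element]) -> Tuple[float, float]:
    """Route delay and energy as sums over decoupled (buffered) elements."""
    return (sum(e.delay_ps for e in route), sum(e.energy_pj for e in route))


if __name__ == "__main__":
    # Placeholder element values, not characterization data.
    h = Element("M5", 1000.0, 50.0, 0.2)
    v = Element("M6", 1000.0, 60.0, 0.25)
    route = route_l_shape(2500.0, -1000.0, h, v)
    print(len(route), characterize(route))
```

With these placeholders the route uses four elements and has a total delay of 210 ps. In this simplified model, a detour around a blockage would only lengthen the element list, and the route would still be characterized by the same sums.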
As echoed in [24], we show in Section 4.4 that designs at future technology nodes such as 70nm tend to be buffer-limited, which necessitates pre-planning of buffers. We believe that the predictable yet performance-driven strategy of interconnect standardization is an ideal solution for such a scenario of on-chip buffer explosion. A few iterations may still be needed at the placement and detailed routing stages to meet the design requirements. However, unlike the long, iterative traditional flow, our approach keeps the number of design iterations under control while guaranteeing close-to-optimal performance. Thus, standardized interconnects can form the basis of an approach to designing correct, predictable and optimized VLSI interconnects in SoCs.
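The "buffer explosion" argument can be illustrated with the classical closed-form repeater model of Bakoglu and Meindl. It is swapped in here purely for illustration and is not necessarily the model used in Section 4.4. The model considers a uniform RC wire with total resistance `r_int` and capacitance `c_int`, split into `k` stages by repeaters of size `h`. Repeater size is measured relative to a minimum inverter with output resistance `r0` and input capacitance `c0`. The function names are hypothetical.

```python
import math


def repeated_wire_delay(k, h, r_int, c_int, r0, c0):
    """50% delay of a wire split into k stages by repeaters of size h
    (Bakoglu-Meindl RC model; r0, c0 describe a minimum-sized inverter)."""
    return (0.7 * r0 * c_int / h + 0.7 * k * r0 * c0
            + 0.4 * r_int * c_int / k + 0.7 * h * r_int * c0)


def optimal_repeaters(r_per_len, c_per_len, length, r0, c0):
    """Closed-form optimal stage count k and repeater size h for the model above."""
    r_int = r_per_len * length
    c_int = c_per_len * length
    # Setting d(delay)/dk = 0 and d(delay)/dh = 0 gives these expressions.
    k_opt = math.sqrt(0.4 * r_int * c_int / (0.7 * r0 * c0))
    h_opt = math.sqrt(r0 * c_int / (r_int * c0))
    return k_opt, h_opt


if __name__ == "__main__":
    # Normalized units: doubling the wire length doubles k_opt, h_opt is unchanged.
    for length in (1.0, 2.0, 4.0):
        print(length, optimal_repeaters(1.0, 1.0, length, 1.0, 1.0))
```

In this model, `k_opt` grows linearly with wire length while `h_opt` does not depend on it. The total repeater count therefore grows with the total length of global wiring. It also grows per unit length under the usual scaling trend, in which wire resistance per unit length rises while the intrinsic gate delay `r0 * c0` falls. This is the kind of pressure that motivates pre-planning buffer space.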
