The Pennsylvania State University The Graduate School A RELIABLE DESIGN FLOW FOR PLATFORM FPGAS


The Pennsylvania State University
The Graduate School

A RELIABLE DESIGN FLOW FOR PLATFORM FPGAS

A Dissertation in Computer Science and Engineering
by Prasanth Mangalagiri

© 2010 Prasanth Mangalagiri

Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

May 2010

The thesis of Prasanth Mangalagiri was reviewed and approved by the following:

Vijaykrishnan Narayanan, Professor of Computer Science & Engineering, Dissertation Co-Advisor, Co-Chair of the Committee
Yuan Xie, Associate Professor of Computer Science & Engineering, Dissertation Co-Advisor, Co-Chair of the Committee
Mary Jane Irwin, Professor of Computer Science & Engineering, Robert E. Noll Professor
Raj Acharya, Professor of Computer Science & Engineering, Department Head, Computer Science & Engineering
Vittal Prabhu, Professor of Industrial Engineering

Signatures are on file in the Graduate School.

Abstract

Aggressive technology scaling over the years has led to increased levels of integration and heterogeneity in the design fabric of Field Programmable Gate Arrays (FPGAs). Platform FPGAs today have evolved from mere prototyping devices to powerful domain-specific reconfigurable processors. Traditionally, FPGA design flows have been designed to optimize the resulting design for area, performance, and power consumption. Such deterministic optimization techniques are oblivious to the changes in device characteristics caused by variations during the manufacturing process and to their subsequent degradation over time due to various operational stress phenomena. Additionally, as device feature sizes shrink, the impact of operating conditions such as temperature and supply voltage on the lifetime degradation of components increases exponentially. Consequently, the resulting designs are sub-optimal in performance and result in a low mean time to failure (MTTF) of the target FPGA platform.

In this work we address the impact of various aging-based failure mechanisms and process variations on the reliability, performance, and power consumption of the resulting design. We present a tool flow that models the device degradation characteristics and incorporates heuristics based on such an analysis into various key stages of the FPGA design flow. We also studied the impact of process variations on various routing elements of an FPGA and developed a statistically intelligent routing algorithm (SIRA) to improve the timing and power yields of the target design. We then studied the temperature variations both across and within designs by developing a thermal estimation tool, Tprof. The thermal map generated by Tprof was used to analyze the impact of temperature variations on the lifetime reliability of platform FPGAs. We then explored the impact of voltage variations in dual-V_dd based FPGA architectures on the lifetime reliability and power consumption of platform FPGAs. The insights acquired by analyzing the device degradation and lifetime characteristics were used to incorporate reliability and degradation awareness into various stages of the design flow.

We plan to extend the reliability framework to study the impact of process variations on thermal- and voltage-dependent intrinsic failure mechanisms. We also plan to validate the reliability framework by integrating it into FANTOM, an Algorithm Architecture Co-design framework.

Table of Contents

List of Figures
List of Tables
Acknowledgments

Chapter 1 Introduction
  Overview of Reliability
  Challenges to FPGA Reliability
  Impact of Process Variations
  Contributions
  Organization

Chapter 2 Dynamic Reliability Modeling
  Introduction
  FPGA Architectures
  Heterogeneous FPGA
  Dual-VDD FPGA Architecture
  Modeling Intrinsic Failures
  Electromigration
  Negative Bias Temperature Instability
  Time-Dependent Dielectric Breakdown
  Hot Carrier Instability
  DRM Framework
  Thermal Simulation
  2.4.2 Static Probability Estimation
  Validation of DRM Framework
  Analysis of Thermal Dependent Failure Mechanisms
  Electromigration
  Negative Bias Temperature Instability
  Time-Dependent Dielectric Breakdown

Chapter 3 Reliability Enhancement Techniques
  Introduction
  Reliability Aware Design Flow
  Voltage Assignment and Reliability Evaluation
  Reliability Aware Placement Algorithm
  Reliability Aware Routing Algorithm
  Experimental Setup and Results

Chapter 4 Reliability Analysis Under Process Variations and Aging
  Introduction
  Related Work
  Variation Modeling
  Timing Yield of a Route
  Leakage Yield of a Route
  Routing Architecture Evaluation
  Statistically Intelligent Routing Algorithm (SIRA)
  Experimental Setup and Results
  Impact of Process Variation on Aging

Chapter 5 Conclusions & Future Directions
  Conclusions
  Future Directions

Bibliography

List of Figures

  2.1 Snapshot of Xilinx Virtex-5 FPGA architecture
  Dual VDD FPGA Architecture
  CLB Detailed Routing Architecture
  FPGA Routing Wire Segments
  Thermal Profile of an architectural template of Virtex-5 XC5VLX50Tff
  Impact of Temperature and switching activity on Electromigration of single (len = 1) wires
  Impact of Voltage on MTTF due to Electromigration
  Delay Degradation Due to NBTI
  Delay Degradation of Input Multiplexors Due to NBTI (t=10yrs)
  Impact of temperature on TDDB
  Impact of voltage on TDDB
  V_th Degradation due to CHC
  Delay Degradation in Routing MUXs with V_th
  DRM Framework
  Thermal Profile of PWRD
  Thermal Validation Setup
  Variation in Ring Oscillator Frequency with Temperature
  DRM Thermal Validation
  Thermal Profile of TripleDes
  MTTF due to Electromigration (a) and Current Density (b) Characteristics for TripleDes
  Failures in wires due to EM
  NBTI Delay Degradation using PTM
  NBTI Delay Degradation using CPTM
  MTTF of LUTs due to TDDB (PTM)
  MTTF of LUTs due to TDDB (CPTM)
  Reliability Aware Design Flow
  Temperature Variations in rs decoder
  3.3 Pseudo-code Voltage Assignment Algorithm
  V_dd Assignments for IIR
  Improvement in MTTF of LUTs
  Improvement in MTTF of Interconnects (due to EM)
  Improvement in Delay Degradation (due to NBTI and CHC)
  Delay distribution of different segments in FPGA
  FPGA Routing Architecture
  Leakage distribution of different segments in FPGA
  Variation Characteristics of Circuit Elements
  Leakage distribution of LONG wire with completely random and completely correlated variations
  FPGA Leakage distribution of different switch block types in FPGA
  Timing, leakage and total chip yield improvement/degradation using a Timing yield Aware Routing approach (TAR)
  Timing yield improvement using TAR measured w.r.t. T_nom of TAR
  Frequency degradation using TAR
  Total chip yield improvement using SIRA
  Threshold Voltage Shift due to NBTI
  Rise Delay of 32x1 Multiplexor in the presence of V_th variations and NBTI Aging
  Timing Yield Comparison Under Process Variations and NBTI (%)
  Timing Yield Improvements obtained using SIRA

List of Tables

  2.1 Technology Process Parameters
  Technology Parameters
  Benchmark Characteristics
  Power Consumption Improvement using MVdd

Nomenclature

  CHC     Channel Hot Carrier
  CMOS    Complementary Metal-Oxide-Semiconductor
  FPGA    Field Programmable Gate Array
  HSPICE  H-Simulation Program With Integrated Circuit Emphasis
  NBTI    Negative Bias Temperature Instability
  SIRA    Statistically Intelligent Routing Algorithm
  TDDB    Time Dependent Dielectric Breakdown

Acknowledgments

I would like to thank my doctoral advisors, Dr. Vijaykrishnan and Dr. Yuan Xie, for always challenging and inspiring me to explore new ideas in research and life. Their enthusiasm in teaching, openness to accept new challenges, and ravenous work ethic have been a great source of inspiration and guidance throughout my graduate life at Penn State. I have had a great mentor in Dr. V. Kamakoti, my undergraduate advisor, who ignited my penchant for research and the fun in problem solving. In the past four years I have had the pleasure to work with a great set of individuals in my research group: Suresh, Sungmin, Karthik, Jungsub, Ramki, Kevin, Mike, Srinidhi, Ahmed, Aman, and Priya. It has been a great learning experience and helped me gain some of the key insights in my research in the Microsystems Design Laboratory (MDL). I would also like to thank my managers at Intel, Sridhar Boinapally and Jose Del Cano, for giving me the opportunity to test myself in an industrial environment during my co-op. I am extremely grateful for having had the opportunity to be taught and honed by some great teachers throughout my academic life. Finally, I would like to thank my elder brother Praveen for being a role model in life that I can always look up to. Last but not least, I am grateful to have Mahrya in my life, for her constant support and companionship in my journey.

Dedication

To my beloved parents Surya Kala and Chandra Mouli and their unconditional love.

Chapter 1 Introduction

Technological advances in various facets of the semiconductor manufacturing industry have contributed to the production of devices with smaller feature sizes and increasing levels of on-chip integration. This trend, most commonly observed in application-specific integrated circuits (ASICs), has also been seen in reconfigurable platforms, especially Field Programmable Gate Arrays (FPGAs). Today FPGAs have evolved from prototyping programmable arrays into high-speed heterogeneous processors integrated into domains such as military, aerospace, and medicine [1] [2].

1.1 Overview of Reliability

In spite of the advances made in the semiconductor manufacturing process, scaling device sizes into the nanometer regime poses new challenges to device reliability.

Devices are increasingly vulnerable to errors during the manufacturing process, transient faults, permanent faults, and lifetime reliability concerns. Design For Reliability (DFR) [3] has become a key step in the design process. Without reliability modeling during the component design stage, or reliability analysis in the circuit design flow, designers are forced to resort to an overly conservative design style to achieve the target product yield. Using methods such as guard-banding [?] reduces the chip's performance and runs the risk of early failures. By performing a thorough reliability assessment at the design stage, a designer can avoid problems that in the past may only have surfaced during the chip burn-in process. Considering the resources and cost of a re-spin and its impact on time to market, it is extremely valuable to have a design-time reliability evaluation methodology.

1.2 Challenges to FPGA Reliability

Over the last decade, industrial research has largely focused on improving performance and reducing power consumption. FPGA design flows today support optimizations to meet designers' performance and power consumption targets. Techniques like Dynamic Voltage Scaling (DVS) [4] and threshold voltage scaling have been employed in FPGA architectures to reduce dynamic and leakage power consumption. Since the power consumption (both dynamic and static) of a design is strongly dependent on the choice of supply voltage, various low power

techniques such as supply and threshold voltage scaling and programmable V_dd/V_th have been successfully employed by leading commercial FPGA manufacturers to reduce the power consumption of a design while meeting performance targets [5]. Over the years, several dual-V_dd and V_dd-programmable architectures have been proposed to lower FPGA power consumption [6] [7]. A programmable dual-V_dd architecture and V_dd assignment algorithms effective in reducing power consumption were presented in [8]. V_dd programmability has also been applied to reduce interconnect power consumption in [9] [10]. However, supply and threshold voltages are not scaling at the same rate as device feature size due to performance and leakage power concerns [11]. This results in increased on-chip power densities, which in turn manifest as high on-chip temperatures. Feature size scaling also leads to increased gate leakage power consumption, which in turn increases on-chip temperatures. This circular dependency between on-chip power densities and leakage power consumption leads to very high on-chip temperatures.

Platform FPGAs today contain customized hard blocks such as Digital Signal Processing (DSP) blocks, high-speed multi-gigabit transceivers (MGTs), and processor blocks such as PowerPC, which facilitate the use of FPGAs in various domain-specific applications. Such heterogeneity in the FPGA fabric results not only in higher temperatures but also in variation in the power density of FPGA components. This disparity in power density results in a variation of on-chip temperatures, creating thermal

hotspots. An intra-die temperature variation of up to 20 °C was observed on a Virtex-4 FPGA in [12]. It has been demonstrated that the Mean Time To Failure (MTTF) due to various intrinsic reliability factors (such as Time Dependent Dielectric Breakdown (TDDB), Electromigration [11], and NBTI [13][14]) varies exponentially with on-chip temperature. In addition to temperature and supply voltage, the lifetime reliability of a device also depends on design-specific factors such as switching activity and the electric field across the gate oxide. Hence, there is a need to tackle reliability concerns by introducing design-for-reliability techniques in the early stages of the design cycle. However, because major technology changes are introduced within short periods of time, the development of such techniques is becoming increasingly challenging [15]. Therefore, to meet these constantly changing challenges it is essential to introduce reliability-aware techniques in the design flow of an application. Such an approach is more practical in FPGAs than in ASICs, because FPGAs offer greater flexibility in the choice of hardware mapping and placement.

In [16] we presented several techniques to enhance the lifetime reliability of FPGAs. However, the analysis was presented for a traditional CLB-array-based architecture and did not take into consideration the temperature and voltage variations common in heterogeneous platform FPGAs. Also, selective alternate routing techniques (SART) performed at the post place-and-route stage are constrained by the placement of the design.

1.3 Impact of Process Variations

Variations in process parameters during semiconductor manufacturing are commonly known as process variations. With device sizes scaling into the nanometer regime, process variations have become unavoidable and affect key device parameters such as gate length and oxide thickness. Such variations, and their resultant effect on the device threshold voltage (V_th) and on-current (I_on), affect both the performance and power consumption of key FPGA components [17] [18]. In order to analyze the lifetime reliability of an FPGA, it is important to be aware of the underlying manufacturing variations. Inconsistencies in device characteristics due to manufacturing defects influence the rate of aging due to most of the aging-based failure mechanisms. To incorporate the effects of process variations on device lifetime, we first model the variations in the device characteristics of an FPGA. Device degradation characteristics due to variations in process parameters such as effective channel length, oxide thickness, and device threshold were obtained using statistical modeling techniques. The device characteristics were in turn used to model the delay degradation and leakage power variations in various routing components such as wire segments, switch blocks, and routing multiplexors.

1.4 Contributions

This dissertation makes the following three key contributions:

- DTRM, a dynamic thermal-aware reliability modeling framework to analyze and evaluate the impact of various lifetime and performance degradation phenomena in platform FPGAs [19].
- Modeling the impact of environmental stress due to temperature and voltage variations, and integrating reliability awareness into the placement and routing stages of the design flow [20][21].
- A detailed analysis of the impact of process variations on the routing architecture of an FPGA and their impact on NBTI aging. We propose SIRA, a Statistically Intelligent Routing Algorithm, to improve the timing and leakage yield of mapped designs [22].

1.5 Organization

The dissertation is organized as follows. In chapter 2 we present a Dynamic Reliability Modeling (DRM) framework that models the MTTF due to various intrinsic failure mechanisms. The modeling techniques used in DRM compute the lifetime reliability of a design from design-specific characteristics such as switching activity and gate leakage, while taking into account within-design variations in temperature and voltage. In chapter 3 we present a Reliability-Aware FPGA design flow, in which the reliability metrics derived from DRM are used in the placement and routing stages of the design flow to improve the average lifetime of a design. In chapter

4, we present SIRA, a statistically intelligent routing algorithm that improves the timing and power yield of a design in the presence of process variations. We also explore the impact of process variations on NBTI aging and propose solutions to improve timing yield. Chapter 5 summarizes the key contributions and discusses future directions.

Chapter 2 Dynamic Reliability Modeling

2.1 Introduction

Design flows for Application Specific Integrated Circuits (ASICs) have traditionally been designed to optimize the resulting layout of a given design to meet performance, power, and reliability targets. However, in reconfigurable platforms the choices made in the mapping, placement, and routing stages of the design flow are constrained by the available resources on the target platform. Hence, the static pre-layout optimizations that are effective in ASICs fail to satisfy the design-specific dynamic optimizations required in FPGAs. Additionally, adapting to the variations in device degradation due to design-specific stress conditions requires dynamic reliability evaluation techniques integrated into the FPGA design flow. In this chapter we present our dynamic reliability modeling (DRM) framework, used to evaluate the reliability metrics of various components in an FPGA fabric. We briefly discuss various intrinsic failure phenomena causing lifetime degradation in FPGAs. We present the modeling techniques used to evaluate the impact of such failures on the lifetime of an FPGA. In the next section, we present a brief overview of the FPGA architectures explored and the design tools used in our study.

2.2 FPGA Architectures

In our study, we explore two different FPGA architectures, chosen to demonstrate the impact of temperature and voltage variations on the reliability of platform FPGAs and Dual-V_dd based architectures respectively.

Heterogeneous FPGA

A Xilinx Virtex-5-like FPGA architecture (HET) at the 65nm technology node was chosen to demonstrate within-design temperature variations. Each Configurable Logic Block (CLB) in this architecture consists of four 6-input LUTs. Apart from the logic blocks, this architecture contains high-power heterogeneous blocks, such as Multi-Gigabit Transceivers (MGTs), Clock Management Tiles (CMTs), 25x18 Digital Signal Processor (DSP) slices, and 36Kb Block RAMs. Figure 2.1 depicts a snapshot of a Xilinx Virtex-5 FPGA. The high heterogeneity of this architecture enables us to analyze design mappings that not only have a high peak temperature but also contain considerable variation in on-chip

temperatures.

Figure 2.1. Snapshot of Xilinx Virtex-5 FPGA architecture

[Figure labels: VDDL, VDDH, CLB, interconnect switch matrix, level converters (LC), BLEs, I/O, multiplier; routing legend: 1: OMUX, 2: IMUX, 3: DOUBLE, 4: HEX, 5: LONGH, 6: LONGV]

Figure 2.2. Dual VDD FPGA Architecture

Dual-VDD FPGA Architecture

We have chosen an SRAM-based FPGA architecture with Configurable Logic Blocks (CLBs), I/O blocks, and 18x18 multiplier blocks for our study. The basic logic element (BLE) consists of a 4-input LUT and a flip-flop. Each CLB consists of ten such BLEs. In order to show the impact of voltage assignments on the reliability of LUTs and routing logic, we use a dual-V_dd architecture (DVDD) similar to [8]. Figure 2.4 shows an abstract view of a CLB in this architecture. Each CLB contains an interconnect switch matrix, which holds the multiplexors needed to route the selected input and output signals through or from the CLB. The size of the multiplexors and level-restoring buffers used depends on the size of the routing wire segment [23]. Figure 2.3 depicts a more detailed view of both the global and local routing architecture of a CLB. It also depicts the various timing sub-components that constitute the delay of a routed net through a CLB. The logic delay of a CLB mainly consists of the delay of the chain of multiplexed pass transistors triggered by the configuration SRAM cells. Net routing delay, on the other hand, is a combination of various components such as the routing multiplexors, buffers, and level restorers. It can be observed that the net routing delay dominates the total delay of a routed net when compared to the logic delay of a CLB. Multiplexors ranging from 4 inputs to 64 inputs were used in this study. The architecture also includes a set of level converters to enable the selection of two different supply voltages (VDDL and VDDH). In this architecture the level conversion takes place

only at CLB input pins. This choice was motivated by the fact that having level converters (LCs) at the inputs provides higher flexibility in V_dd assignments than having them at the outputs [8].

[Figure labels: Net Routing Delay (Te Buf + Te Mux, Ti Buf + Ti Mux), Logic Delay, Input MUXs, LUT, FF, Routing Wire Segment, Local Buffers, Local Routing MUXs, BLE, N BLEs]

Figure 2.3. CLB Detailed Routing Architecture

To demonstrate the impact of process parameters on various intrinsic failures, we also consider a conservative process technology model known as CPTM. Table 2.1 depicts the key process parameters of PTM for 65nm technology [24] and CPTM.

Table 2.1. Technology Process Parameters (columns: Device Parameter, PTM, CPTM; rows: V_dd (V), V_th, T_ox (nm))

[Figure labels: SINGLE, DOUBLE, HEX, LONG]

Figure 2.4. FPGA Routing Wire Segments

2.3 Modeling Intrinsic Failures

Failures in integrated circuits can be broadly classified into transient and permanent faults. Transient faults such as soft errors or single-event upsets (SEUs) occur due to particle strikes during exposure to external radiation. Extensive research has been done to develop soft-error resilient architectures and circuit design techniques [25] [26] [27]. Permanent faults, on the other hand, are caused by defects in the manufacturing process and can be exacerbated by stress induced by operating conditions. These can be further classified into extrinsic and intrinsic failures. Extrinsic failures are caused by process and manufacturing defects and are usually detected during the burn-in process. Intrinsic failures, however, are caused by the aging and wear-out of devices under specified operating conditions. Such failures

occur frequently over the years and result in performance degradation or complete failure of the device. Because such failures depend strongly on operating conditions such as on-chip temperature and supply voltage, it is essential to estimate the occurrence and rate of intrinsic failures to ensure the lifetime reliability of FPGAs.

Figure 2.5. Thermal Profile of an architectural template of Virtex-5 XC5VLX50Tff1136 (device 5vLX50t; axes: X location (cm), Y location (cm), Temp (C))

Figure 2.5 depicts the variation of on-chip temperature of an architectural template that mimics a Xilinx Virtex-5 FPGA during a scenario of peak power consumption. Note that the dimensions used are architectural estimates and do not reflect the actual parameters used by Xilinx. The peak power numbers of various components provided in Xpower Estimator (XPE) were used in the calculation of

such variations. A toggle rate of 12.5% was assumed for all the blocks. The temperatures were obtained using the thermal simulation flow described later in this chapter. The occurrence of within-die temperature variations and temperature hotspots is evident from the figure.

Table 2.2. Technology Parameters (columns: Technology Node (nm), V_dd (V), V_th (V), T_ox (nm))

Electromigration

Electromigration is a phenomenon caused by the gradual movement of the conductor metal atoms of an interconnect. Large current densities in the interconnects cause the conducting electrons to transfer their momentum to diffusing metal atoms. As the feature size of metal interconnects decreases, such migration of metal atoms can either cause voids (open circuits) or result in shorts due to metal atom pile-up between wires. Derived from the widely adopted Black's equation for electromigration and the models in [28], the mean time to failure (MTTF) due to electromigration is shown in equation 2.1.

\[ MTTF_{em} \propto \frac{(J)^{-n}\, e^{\frac{E_{aEM}}{kT}}}{L_{eff}} \tag{2.1} \]

Here J is the current density in the interconnect, and n and E_aEM are constants that depend on the type of metal interconnect used. L_eff is the effective length of the wire across which the current density is measured, k is Boltzmann's constant, and T is the absolute temperature of the wire in kelvin. The current density of the wire (J) depends on the cross-sectional area of the wire perpendicular to the current flow and on the transition density. Equation 2.2 depicts this relation.

\[ J = \frac{C V_{dd}}{W H} \times f \times p \tag{2.2} \]

Here V_dd is the supply voltage, and C, W, and H are the capacitance, width, and thickness of the wire respectively. The transition density is expressed as the product of the operating clock frequency f and the switching probability p of the wire. It can be observed that the MTTF of a wire due to electromigration depends exponentially on the operating temperature and as a power law on the current density, and decreases linearly with the effective length of the interconnect. The current density (J) in turn increases linearly with the supply voltage (V_dd). In [29] the authors demonstrate the impact of scaling interconnect dimensions on the mean time to failure. When a scaling factor of k is applied, the lifetime of a device due to electromigration reduces by k^2. Using a system-level IC simulator called SysRel [30], we obtained an MTTF close to 3.8 yrs for a 90nm technology copper (Cu) interconnect of length 100um. Using this as a base case, and the scaling factors for 65nm technology, we

calculate the mean time to failure of wires in 65nm platform FPGAs.

Figure 2.6. Impact of Temperature and switching activity on Electromigration of single (len = 1) wires (axes: Temperature (K), EM MTTF (yrs))

FPGAs contain interconnects of various lengths to obtain high routing flexibility while improving performance. The choice of routing resources is usually timing driven and is made during the routing stage of the design flow. In our study we consider the effective length (L_eff) of a wire to be the longest run of the wire between two vias. Earlier studies show that for wire lengths ranging from 5 to 25um the impact of electromigration is mitigated by the stress-induced back flow of Cu metal ions [31]. For each net in the design we calculate the wire lengths of the different segments it is composed of. Each wire is assigned the average temperature of the basic blocks in its proximity and the voltage of its drivers. This information is obtained from the thermal floorplan of the design, the generation of which is explained in section 2.4. Figure 2.6 depicts the relationship between the MTTF of the design due to


electromigration and the factors affecting it, namely temperature and switching activity. It can be observed that the MTTF at various switching activities decreases exponentially with increasing temperature. Note that in this study we do not model the temperature-dependent variations in device currents. Switching the supply voltage from a high value to a low value decreases the current density, which in turn reduces the extent of momentum transfer between the charge-carrying electrons and the Cu metal ions. In a Dual-V_dd FPGA fabric this translates into low EM stress on wires driven by low-V_dd blocks and vice versa. Figure 2.7 depicts the MTTF_EM under temperature and voltage variation in iir, one of the benchmark designs mapped to the DVDD architecture.

Figure 2.7. Impact of Voltage on MTTF due to Electromigration (axis: MTTF_EM (yrs))

Low-power driven voltage assignment techniques that are oblivious to EM-induced stress will always favor the assignment of low V_dd to nets on non-critical paths. However, this could result in a relatively early failure of the nets both on the critical paths and in the proximity of temperature hotspots. Additionally, the availability

of routing resources in a channel during routing is based on timing slack or congestion cost in timing-driven and routability-driven routing algorithms respectively. Such heuristics, which do not take into consideration factors critical to the lifetime reliability of interconnects (such as temperature and supply voltage), will result in worst-case EM stress scenarios. Hence, there is a strong need to incorporate lifetime reliability metrics into the criticality heuristics of routing algorithms.

Figure 2.8. using PTM (a), and CPTM (b)
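As a rough illustration of the electromigration model in equations 2.1 and 2.2, the following Python sketch evaluates the current density of a wire and scales a known baseline MTTF to different operating conditions. It is a minimal sketch under stated assumptions, not the DRM implementation: the function names are hypothetical, the default values of n and the activation energy are illustrative placeholders rather than values taken from this text, and the MTTF is assumed to be inversely proportional to L_eff. Because equation 2.1 is a proportionality, only ratios between two operating points are meaningful.

```python
import math

K_BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def current_density(cap, vdd, width, thickness, freq, switch_prob):
    """Equation 2.2: J = (C * V_dd) / (W * H) * f * p."""
    return (cap * vdd) / (width * thickness) * freq * switch_prob


def em_mttf_factor(j, temp_k, l_eff, n=1.1, ea_ev=0.9):
    """Un-normalised right-hand side of equation 2.1.

    Assumes MTTF is inversely proportional to l_eff. The defaults for n and
    ea_ev are illustrative placeholders, not values from the text.
    """
    return j ** (-n) * math.exp(ea_ev / (K_BOLTZMANN_EV * temp_k)) / l_eff


def scaled_mttf(base_mttf_yrs, base, new, **model):
    """Scale a known baseline MTTF to new (j, temp_k, l_eff) conditions."""
    return base_mttf_yrs * em_mttf_factor(*new, **model) / em_mttf_factor(*base, **model)


if __name__ == "__main__":
    # Hypothetical wire parameters; only ratios matter, so units just need
    # to be consistent between the baseline and the new operating point.
    j_high = current_density(cap=2e-15, vdd=1.0, width=1e-7, thickness=2e-7,
                             freq=2e8, switch_prob=0.125)
    j_low = current_density(cap=2e-15, vdd=0.8, width=1e-7, thickness=2e-7,
                            freq=2e8, switch_prob=0.125)
    base = (j_high, 350.0, 1.0)
    for label, cond in [("low V_dd", (j_low, 350.0, 1.0)),
                        ("hotter", (j_high, 370.0, 1.0))]:
        print(label, scaled_mttf(1.0, base, cond))
```

With a baseline MTTF of 1.0, the printed values are lifetime ratios relative to the baseline: lowering V_dd reduces J and lengthens the estimated lifetime, while raising the temperature shortens it.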

Negative Bias Temperature Instability

With continued scaling, Negative Bias Temperature Instability (NBTI) has become a major reliability concern in high-performance digital IC design. The physics of failure due to NBTI has been extensively investigated, and attempts at modeling it have been made in [32] [33] [34] [35]. NBTI occurs when a PMOS transistor is biased in inversion (V_gs = -V_dd). Interface traps are generated by the dissociation of silicon-hydrogen (Si-H) bonds along the silicon-oxide interface. The rate of generation of these traps depends on factors such as operating temperature (T), static signal probability (α), supply voltage (V_dd), and the period of stress. The generation of these traps results in an increase of the threshold voltage (ΔV_th) and a reduction in the on-current (I_on) of the PMOS transistors in the design. Due to its strong dependence on the electric field across the gate oxide and on operating temperature, this phenomenon accelerates with device scaling. Research has shown that removal of the applied negative bias (V_gs = 0V) reverses the phenomenon due to annealing of interface traps [36] [13]. The increase in threshold voltage due to NBTI results in performance degradation of PMOS transistors. We adapt a long-term NBTI prediction model from [13] to evaluate the ΔV_th degradation. Equation 2.3 depicts this model, where K_v is a function of carrier concentration, temperature, and electric field, and α is the static signal probability, which reflects the stress period relative to clock period

T_clk, n (= 0.16) is the time exponential constant.

\[ \Delta V_{th} = \left( \frac{\sqrt{K_v^{2}\, T_{clk}\, \alpha}}{1 - \beta_t^{1/2n}} \right)^{2n} \tag{2.3} \]

where,

\[ \beta_t = 1 - \frac{2\xi_1 t_e + \sqrt{\xi_2\, C\, (1 - \alpha)\, T_{clk}}}{2 t_{ox} + \sqrt{C\, t}} \tag{2.4} \]

Figure 2.9. Delay Degradation of Input Multiplexors Due to NBTI (t=10yrs)

Figure 2.8(a) depicts the NBTI delay degradation observed in buffers driving wire segments of varying lengths in an FPGA. The sizing of the buffers was based on the sizing analysis for heterogeneous architectures presented in [37]. The duty cycle α of the wires was 0.5 for this analysis. It can be observed from figure 2.8(a),

that the percentage delay degradation decreases as the length of the wire driven by the buffer increases. This can be attributed to the smaller change in V_th as the size of the devices in buffers driving longer wires increases. Figure 2.8(b) demonstrates the delay degradation of the same buffers using CPTM, a conservative model shown in table 2.1. More importantly, it is evident that the delay degradation has a strong dependence on temperature, and is a result of the change in threshold voltage of the transistor due to NBTI. With supply voltages scaling below 0.9V, such an increase in threshold voltage can prove detrimental to the operation of devices beyond 65nm. In particular, it affects the stability of configuration SRAMs, and can lead to delay degradation in the circuit elements of the routing network, such as routing multiplexers, buffers, and level restorers.

Time-Dependent Dielectric Breakdown

One of the CMOS device parameters that has been rapidly scaling along with technology is the thickness of the gate oxide, which improves performance at the expense of increased gate leakage. However, due to non-ideal scaling of the supply voltage, such ultra-thin gate oxides are subject to high electric fields. Due to such high channel currents, charges are trapped in the gate oxide over time, creating an electric field followed by charge flow through the oxide. Such a phenomenon, which causes the breakdown of the gate oxide over a period of time, is termed Time

Dependent Dielectric Breakdown (TDDB). The lifetime due to TDDB is highly dependent on the thickness and area of the oxide and on the supply voltage, and has a larger exponential dependence on temperature [38]. The MTTF due to TDDB is directly proportional to the gate leakage current I_leak. As the thickness of the gate oxide reduces with technology scaling, there is an exponential increase in the gate leakage and tunneling current. Equation 2.5 depicts the relationship between the mean time to oxide breakdown and the scaling trends in supply voltage, oxide thickness, area, and temperature as discussed in [29].

$$MTTF_{tddb} \propto \frac{10^{t_{ox}/0.22}}{A_{ox}} \left( \frac{1}{V} \right)^{(a - bT)} e^{\frac{X + Y/T + ZT}{kT}} \qquad (2.5)$$

Figure 2.10 demonstrates the impact of temperature on MTTF due to TDDB for various technology nodes. It is evident from figure 2.10 that with every generation of technology scaling the MTTF due to TDDB decreases rapidly. The increase of on-chip temperatures with technology scaling has an adverse exponential impact on MTTF due to TDDB. In addition to increasing temperatures, in a dual-V_dd architecture the varying supply voltage across components creates non-uniform aging due to TDDB. Figure 2.11 demonstrates the impact of supply voltage on MTTF due to TDDB at various temperatures. The strong dependence between supply voltage and lifetime degradation due to TDDB can be observed. In a dual-V_dd FPGA, by controlling the supply voltage of LUT inputs, the aging stress due to TDDB can be varied. Hence, by analyzing the voltage assignments and temperature variations at the post place-and-route phase of the design, we generate a new placement that aids in uniform aging of LUTs. In this work we consider the impact of TDDB on FPGA circuit elements like Look-Up Tables (LUTs) and routing buffers.

Figure 2.10. Impact of temperature on TDDB (MTTF in years versus temperature in C, one curve per technology node)

Hot Carrier Instability

Channel Hot Carrier Injection occurs when charge carriers in a CMOS transistor gain enough energy to be trapped in the SiO2 gate oxide. The trapped charges generate defects that can permanently change the electrical characteristics of the transistor. This phenomenon occurs in NMOS transistors when the gate voltage V_gs is comparable to the drain voltage V_ds. The impact of trapped charges manifests itself as an increase in threshold voltage V_th. In a dual-V_dd architecture, such

dependence of V_th degradation on gate voltage translates into voltage-dependent delay degradation. In our analysis, we use the HCI degradation model from [39] to estimate V_th degradation with time. The observed V_th degradation at different supply voltages is shown in the V_th degradation figure below.

Figure 2.11. Impact of voltage on TDDB (MTTF in years versus temperature in K)

Figure: V_th Degradation due to CHC (V_th degradation versus time in years)
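As a rough numerical companion to Equation 2.5 and the voltage and temperature trends of Figures 2.10 and 2.11, the sketch below evaluates only the voltage and temperature factors of the TDDB model, normalized to a reference operating point. The fitting constants a, b, X, Y, and Z are not listed in this chapter; the values used here are assumed, illustrative constants, and the function names are hypothetical. The oxide thickness and area terms are held fixed, so only ratios are meaningful.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

# Assumed, illustrative fitting constants (not taken from this document).
A_FIT, B_FIT = 78.0, -0.081
X_FIT, Y_FIT, Z_FIT = 0.759, -66.8, -8.37e-4


def tddb_log_factor(vdd, temp_k):
    """Natural log of (1/V)^(a - b*T) * exp((X + Y/T + Z*T) / (k*T))."""
    voltage_term = -(A_FIT - B_FIT * temp_k) * math.log(vdd)
    thermal_term = (X_FIT + Y_FIT / temp_k + Z_FIT * temp_k) / (K_BOLTZMANN_EV * temp_k)
    return voltage_term + thermal_term


def tddb_relative_mttf(vdd, temp_k, ref_vdd=1.0, ref_temp_k=300.0):
    """MTTF at (vdd, temp_k) divided by MTTF at the reference operating point."""
    return math.exp(tddb_log_factor(vdd, temp_k) - tddb_log_factor(ref_vdd, ref_temp_k))


if __name__ == "__main__":
    for vdd in (0.9, 1.1):
        for temp_k in (330.0, 360.0, 390.0):
            ratio = tddb_relative_mttf(vdd, temp_k)
            print(f"Vdd={vdd:.1f} V, T={temp_k:.0f} K -> relative MTTF {ratio:.3e}")
```

Working in the log domain keeps the large exponents manageable; under these assumed constants the relative MTTF falls both when the supply voltage is raised and when the temperature increases.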

The routing resources in an FPGA consist of programmable routing multiplexers that establish interconnections between wire segments of various lengths, input muxes (IMUX) that connect the routing fabric to CLB inputs, and output muxes (OMUX) that connect CLB outputs to the routing fabric [40]. The size and logic depth of these routing multiplexers depend on the length of the driving/driven wire segments, and they are usually implemented as a tree of NMOS transistors. Hence the delay degradation of routing muxes also depends on the size of the connecting wire segments. Figure 2.13 depicts the delay degradation due to the increase in V_th for different routing multiplexers. Multiplexers of size 4, 10, 16, 24, and 32 were used to model routing multiplexers driving OMUX, SINGLE, DOUBLE, HEX, and LONG wires respectively. It can be observed that the delay degradation increases exponentially with the size of the routing multiplexers. We capture this dependence of delay on the size of the interconnect and the age of the routing multiplexers in the heuristics of our reliability aware routing algorithm.

DRM Framework

It is evident from the description of failure models in section 2.3 that, along with temperature and supply voltage, factors like switching activity, power density, and static probability play a crucial role in determining the lifetime of an FPGA.

Figure 2.13. Delay Degradation in Routing MUXs with V_th (delay in seconds versus threshold voltage V_th, for OMUX, SINGLE, DOUBLE, HEX, and LONG multiplexers)

Consequently, to analyze the impact of these factors on lifetime reliability, it is essential to accurately measure these parameters for every design mapped on the FPGA. The power density, and hence the temperature, of an FPGA is highly dependent on the configured applications. More importantly, due to the variation in on-chip temperatures in platform FPGAs, the placement of different components of a design relative to the thermal hotspots plays a crucial role in estimating the lifetime. Figure 2.14 depicts the DRM framework used in our experiments to estimate lifetime reliability. The flow starts with the synthesis and implementation of the design under consideration using Xilinx ISE 10.1i tools. We then use the post place-and-route data of the design to generate a thermal profile of the design and estimate its lifetime.

Figure 2.14. DRM Framework (flowchart; block labels: Design; DVDD? (NO/YES); ISE Design Flow (HET); VPR(5.0) (DVDD); Power Report (.pwr); Design.xdl; DRM with GenSP, Thermal Simulator, and ActGen; Static Probability; Thermal Floorplan; Switching Activity; NBTI, TDDB, EM, CHC; MTTF_NBTI, MTTF_TDDB, MTTF_EM, MTTF_CHC; Lifetime MAP)

Thermal Simulation

To calculate the lifetime degradation due to the various intrinsic failures described in section 2.3, it is essential to accurately estimate on-chip temperatures. On-chip temperatures of an FPGA, unlike those of ASICs, are highly dependent on the configured application. Hence, post-layout thermal estimation techniques used in ASICs are inapplicable to FPGAs. The thermal simulator module in the DRM framework

is used to generate a design-specific thermal profile capturing the temperatures at various regions of an FPGA. Xilinx Virtex-5 devices are equipped with a System Monitor to measure the temperature of the device [41]. However, this is located at the center of the die and fails to capture the intra-die temperature variations common in platform FPGAs. To capture such variations, we augment Hotspot4.0 [42], a widely used temperature simulator. The high granularity of basic block sizes used in Hotspot provides a viable option to compute intra-die temperature variations within an FPGA. Several package- and die-specific parameters can be fine-tuned to achieve accurate temperature estimates. Xilinx provides estimates of the worst-case power consumption of different components, like CLBs, BRAM, IO, DSP48E, etc., through Xilinx Power Spreadsheets. Using these values as a reference, we estimated thermal parameters like package-to-air thermal resistance, substrate thickness, and spreader thickness.

Hotspot takes as input the floorplan of the device along with the distributed power consumption. From the layout of the Virtex-5 XCV5LX50t-ff1136 device, we created a floorplan that has blocks with granularity as small as a CLB. We use an enhanced Xilinx Xpower Analyzer (XPA) tool to compute the power consumption of the various blocks of the design mapped onto the FPGA. The ActGen module depicted in the framework first generates a routing graph (Rgraph) that models the connectivity of nets and logic blocks, based on the post place-and-route description of the design obtained from the XDL file. The transition probabilities of signals, obtained by parsing the XPA settings (XML) file, are then propagated by traversing the Rgraph. The updated settings file is input to XPA to compute the final power distribution. XPA reports the power consumption values in terms of Signals, Logic, and IO. The thermal simulation model calculates the total power consumed by a block in the floorplan by accumulating the power consumed by the individual FPGA components that are mapped to that floorplan site. This association is achieved by parsing the XPA power report file (.pwr) and traversing the Rgraph. Using this information, Hotspot generates a thermal floorplan that captures the intra-die thermal variations of the entire design. Figure 2.15 shows the thermal profile and the corresponding post-route floorplan generated for the PWRD benchmark.

Figure 2.15. Thermal Profile of PWRD (temperature in C versus X and Y location in cm)
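The association between component-level power and floorplan blocks described above can be sketched as a simple aggregation. The record layout (component name, floorplan site, power in watts), the site name `DSP_X0Y2`, and the function name are hypothetical; the actual .pwr report format and the Rgraph traversal are not reproduced here.

```python
from collections import defaultdict


def aggregate_block_power(component_power, site_of_component):
    """Sum per-component power (W) into the floorplan site each component maps to.

    component_power: dict mapping component name -> power in watts
    site_of_component: dict mapping component name -> floorplan site name
    Components without a known site are collected under "UNPLACED".
    """
    block_power = defaultdict(float)
    for name, watts in component_power.items():
        site = site_of_component.get(name, "UNPLACED")
        block_power[site] += watts
    return dict(block_power)


if __name__ == "__main__":
    # Made-up power values for a handful of logic and signal components.
    power = {"lut_a": 0.002, "net_a": 0.001, "dsp_0": 0.050, "net_b": 0.004}
    sites = {"lut_a": "SL_X0Y18", "net_a": "SL_X0Y18", "dsp_0": "DSP_X0Y2"}
    for site, watts in sorted(aggregate_block_power(power, sites).items()):
        print(f"{site}: {watts:.3f} W")
```

The resulting per-site totals are the kind of distributed power input that a block-based thermal simulator expects alongside the floorplan.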

Static Probability Estimation

One of the major factors affecting the NBTI delay degradation of a PMOS transistor is the static probability, or duty cycle, of its input. Hence, to measure the impact of such degradation on the timing of a design, it is essential to obtain the static probabilities of all the PMOS inputs of the FPGA. The GenSP module of the DRM framework computes this information for all the signals of a design mapped onto the FPGA. It has been observed that the delay in the routing elements of an FPGA accounts for 60-80% of the critical path delay [43]. Configuration SRAM cells are used to store the logic information in the Look-Up Tables (LUTs) of a CLB. However, in order to reduce leakage power consumption, these are designed today using triple-oxide dielectrics. As a result, the impact of NBTI on the configuration logic is subdued. Also, the SRAM configuration cells do not affect the critical path delay of a design. Hence in this work, we only analyze the delay degradation of PMOS transistors in the routing fabric, which includes buffers, level restorers, routing multiplexers, and switch blocks. Since the metal wires belonging to a net all share the same static probability, we estimate the static probabilities of all the nets in a design. Similar to the method used for estimating switching activity, we create a logic graph (Lgraph) of the design by parsing the XDL description of the design. The XDL description of a logic slice of the design captures the stored logic in the form of a configuration string cfg. Each node of the Lgraph stores the logic of the corresponding LUT


that it represents. This information is used to hierarchically propagate the static probabilities of input signals.

Validation of DRM Framework

In this section, we present the methodology used to validate the results predicted by the thermal simulator. In order to get real-time estimates of the on-chip temperatures, we introduce ring-oscillator-based probes into different regions of the FPGA floorplan. The dependence of a ring oscillator's delay on temperature provides a convenient means to measure temperature variations on an FPGA. We used a Xilinx ML550 board with an XCV5LX50T, a Virtex-5 65nm FPGA, for our experiments, with the aid of a Sun Systems EC11 thermal chamber to control the ambient temperature and an HP 5386A frequency counter to measure the ring oscillator frequency. The experimental setup used is depicted in the figure below.

Figure: Thermal Validation Setup
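Using a ring oscillator as a temperature probe amounts to a calibration step followed by an inversion. The sketch below assumes an approximately linear frequency-temperature relation over the calibrated range and uses made-up calibration points; the function names are hypothetical and this is not the measurement code used in the experiments.

```python
def fit_line(temps_c, freqs_mhz):
    """Least-squares fit freq = slope * temp + intercept; returns (slope, intercept)."""
    n = len(temps_c)
    mean_t = sum(temps_c) / n
    mean_f = sum(freqs_mhz) / n
    sxx = sum((t - mean_t) ** 2 for t in temps_c)
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(temps_c, freqs_mhz))
    slope = sxy / sxx
    return slope, mean_f - slope * mean_t


def temperature_from_frequency(freq_mhz, slope, intercept):
    """Invert the calibration line to estimate temperature from a measured frequency."""
    return (freq_mhz - intercept) / slope


if __name__ == "__main__":
    # Made-up calibration data for one probe site:
    # thermal chamber temperature (C) and measured oscillator frequency (MHz).
    chamber_temps = [30.0, 40.0, 50.0, 60.0, 70.0]
    measured_freqs = [200.0, 199.0, 198.0, 197.0, 196.0]
    slope, intercept = fit_line(chamber_temps, measured_freqs)
    print(temperature_from_frequency(197.5, slope, intercept))
```

Each probe site would need its own calibration line, since oscillators placed in different regions of the fabric do not run at exactly the same frequency.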

Xilinx Virtex-5 FPGAs are equipped with a System Monitor, which aids in thermal management and measurement of on-chip power supply voltages. However, the System Monitor is located at the center of the die and hence cannot account for the temperature variations caused by the heterogeneity of components, which results in different power densities. To accurately measure the temperatures at various thermal hotspots on a chip, we first analyze the relationship between the delay, and hence frequency, of the ring oscillator and temperature. To achieve this, we configure various regions of the FPGA fabric to emulate a ring oscillator module, and measure the frequency of oscillators in different locations while varying the ambient temperature in gradual increments. At any given temperature the oscillator exhibits, to some degree, a fixed frequency of oscillation. The measured frequency provides the required relationship between the ring oscillator frequency and on-chip temperature. Note that the frequency counter used to measure the ring oscillator frequency can detect changes at a granularity of 10^-3 MHz. Figure 2.17 depicts the observed relationship between temperature and oscillator frequency. The floorplan sites corresponding to the ring oscillator placement are represented in the format SL_XCYR, where C and R stand for the column and row values of the floorplan grid. It can be observed that there is a linear relationship between temperature and the frequency of the ring oscillators within the measured temperature range. In order to validate the temperatures calculated by the thermal simulator, we augment the PWRD design with ring oscillator modules. These modules act as

thermal sensors and are placed in regions previously identified as hotspots by the thermal simulator. Such a placement is achieved by introducing LOC placement constraints in the user constraints file (.ucf) for each of the ring oscillator modules of the design.

Figure 2.17. Variation in Ring Oscillator Frequency with Temperature (frequency in MHz versus temperature in C, one curve per floorplan site)

However, we observed a supply voltage drop from 0.996V to 0.955V when the PWRD circuit is switching. In order to minimize the dependence of the frequency on supply voltage variations, we designed clock control logic. This logic disconnects the clock from the circuit while the oscillator frequencies are being measured using the frequency counter. The disconnection duration was about 0.1sec, which is the minimum time required by the frequency counter to capture a signal. Figure 2.18 demonstrates the difference between the actual temperatures measured using the ring oscillator methodology and the results of temperature simulation. The thermal simulator parameters, like airflow and thermal resistivity, were adjusted to be similar to those of the thermal chamber. An average error of 0.64 C in temperature

measurements was observed over 6 floorplan sites.

Figure 2.18. DRM Thermal Validation (ring oscillator versus Hotspot temperature in C at floorplan sites X0Y8, X0Y48, X8Y18, X24Y8, X24Y57, and X24Y100)

2.5 Analysis of Thermal Dependent Failure Mechanisms

In order to demonstrate the detrimental impact of on-chip temperatures on the intrinsic failures discussed in section 2.3, we study the failure rates in a selected set of benchmarks. The benchmark suite used in our analysis consists of 6 different designs, most of them generated using Xilinx tools. PWRD is a Xilinx benchmark used specifically to demonstrate the maximum power consumption characteristics of the chosen Virtex-5 FPGA. FFT Coregen and DDS Compiler are generated using Coregen. XAPP867 is a Xilinx Virtex-5 benchmark that implements a DDR3 memory controller interface. AuroraV5 and TripleDes are generated using

the Virtex-5 Aurora and Data Encryption Standard (DES) IPs of Coregen. The benchmarks were chosen to demonstrate a good range of on-chip thermal variations. Table 2.3 shows the resource utilization and thermal characteristics of each of the benchmarks.

Table 2.3. Benchmark Characteristics

Design    | Resource Utilization                          | Temp (C) Peak | Temp (C) Variation | Hotspot | Freq (MHz)
FFT       | Slice: 10%, IOB: 4%, DSP48E: 95%, BRAM: 100%  |               |                    | BRAM    |
DDS       | Slice: 18%, DSP48E: 68%, BRAM: 10%            |               |                    | DCM     |
XAPP867   | Slice: 13%, IOB: 20%, DCM: 8%                 |               |                    | DCM     |
TripleDES | Slice: 38%, IOB: 62%                          |               |                    | IOB     |
AuroraV5  | Slice: 27%, IOB: 6%, GTP: 60%                 |               |                    | GTP     |
PWRD      | Slice: 5%, DSP48E: 95%, IOB: 4%, BRAM: 100%   |               |                    | BRAM    |

Electromigration

As observed in section 2.3.1, the lifetime of wires due to electromigration is inversely proportional to the length of the wires. Hence, to measure the MTTF due to electromigration for each of the nets in a design, we have to extract the lengths of the different metal wires that connect its source and destination terminals. This is accomplished by parsing the post-place description of the mapped design in Xilinx Development Language (XDL). The switching activity of each net, critical in estimating the current density (J), is extracted from the updated

power report by the ActGen module. Due to intra-die temperature variations, it is essential to accurately estimate the temperature of the wire segments of a net. To achieve this, we first determine the floorplan sites close to the wire by examining its pin connections in the Rgraph. The temperature of the wire is then obtained by correlating the position of the wire with the temperatures in the thermal profile of the design.

Figure 2.19. Thermal Profile of TripleDes (plotted against X and Y location in cm)

Figure 2.19 depicts the temperature variations observed for TripleDes. Figures 2.20(a) and 2.20(b) depict the evaluated Mean Time To Failure and the current density characteristics of the interconnect respectively. The positive correlation between the current density and mean time to failure is evident from the figures.
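One way to picture the wire-temperature lookup described above is to map each pin of a wire to the cell of the thermal grid that contains it and average the cell temperatures. The grid layout, the cell size, and the function names below are assumptions made for illustration; they are not the DRM implementation.

```python
def cell_index(x_cm, y_cm, cell_cm, n_cols, n_rows):
    """Map a physical location (cm) to a (row, col) grid cell, clamped to the grid."""
    col = min(max(int(x_cm / cell_cm), 0), n_cols - 1)
    row = min(max(int(y_cm / cell_cm), 0), n_rows - 1)
    return row, col


def wire_temperature(pin_locations_cm, thermal_grid, cell_cm):
    """Average temperature (C) of the grid cells that contain the wire's pins."""
    n_rows, n_cols = len(thermal_grid), len(thermal_grid[0])
    temps = []
    for x_cm, y_cm in pin_locations_cm:
        row, col = cell_index(x_cm, y_cm, cell_cm, n_cols, n_rows)
        temps.append(thermal_grid[row][col])
    return sum(temps) / len(temps)


if __name__ == "__main__":
    # Hypothetical 2x3 thermal profile in degrees C; each cell is 0.4 cm on a side.
    grid = [[60.0, 65.0, 70.0],
            [62.0, 80.0, 75.0]]
    print(wire_temperature([(0.1, 0.1), (0.5, 0.5)], grid, cell_cm=0.4))
```

Averaging over pin locations is only one possible choice; taking the maximum over the cells would give a more pessimistic estimate for long wires that cross a hotspot.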


Figure 2.20. MTTF due to Electromigration (a) and Current Density (b) Characteristics for TripleDes (plotted against X and Y location in cm)

Figure 2.21 shows the cumulative failure rates of wires for each of the benchmarks. It can be observed that in the case of FFT, AuroraV5, and PWRD, where the chip temperatures were in the range of C, more than 18% of wires fail in less than 5yrs. The relatively low failure rate of TripleDes can be attributed to the high baseline resource utilization (refer to Slice/IOB utilization in table 2.3).
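A cumulative failure curve of this kind can be derived from per-wire MTTF estimates by counting, for each point in time, the fraction of wires whose MTTF has already elapsed. The sketch below treats each wire's MTTF as its failure time, which is a simplifying assumption, and uses made-up MTTF values; the function name is hypothetical.

```python
def cumulative_failure_percent(mttf_years, horizon_years):
    """Percentage of components whose MTTF is at or below each time in horizon_years."""
    total = len(mttf_years)
    return [100.0 * sum(1 for m in mttf_years if m <= t) / total
            for t in horizon_years]


if __name__ == "__main__":
    # Made-up per-wire MTTF values in years.
    wire_mttf = [2.5, 4.0, 4.8, 7.0, 9.5, 12.0, 15.0, 20.0, 30.0, 45.0]
    years = [1, 3, 5, 10]
    for t, pct in zip(years, cumulative_failure_percent(wire_mttf, years)):
        print(f"{t:>2} yrs: {pct:.0f}% of wires failed")
```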

Figure 2.21. Failures in wires due to EM (cumulative failure in % versus time in years, one curve per benchmark)

Figure 2.22. NBTI Delay Degradation using PTM (delay degradation in % for each design at several ages up to 10yrs)

Negative Bias Temperature Instability

The static probabilities generated using the GenSP module are given as input to measure the delay degradation caused by NBTI. The buffer sizing and the number of buffer stages used in the routing fabric are highly dependent on the length of the wire and its fan-out [37]. The XDL description of a net is parsed to generate the lengths of the different wires it comprises, and the size of the routing multiplexers


used is determined by examining whether the wire spans a channel or is constrained to a specific logic slice. Using the PTM [24] model for 65nm, we model the NBTI degradation of various routing elements, and use these values to determine the delay degradation between any two nodes of the Rgraph. Nodes closer to thermal hotspots or belonging to the critical path of the design are of particular interest, due to the extent of their impact on the timing of the device. The routing elements in the critical path of a design, and their operating temperatures, are obtained by correlating the critical-path logic resources, obtained from the detailed post place-and-route static timing report of the Xilinx ISE10.1 tools, with the Rgraph.

Figure 2.23. NBTI Delay Degradation using CPTM

During place and route, the switch and interconnect matrices in the FPGA fabric are used to connect the inputs and outputs of different FPGA circuit elements (CLBs, BRAMs, etc.). The configuration of the switch matrix and the set of routing elements used in modeling a net in the routing network is obtained

by parsing the XDL file. The length of the wires and the number of pass transistor switches used in modeling a net are obtained from the PIP (Programmable Interconnect Point) constructs used in the XDL file. Based on the obtained lengths of the wire segments and their fanout, we model the sizing of buffers. For each of the benchmarks, we compute the delay degradation in the routing elements (such as buffers and level restorers) and determine the percentage increase in the delay of the critical path. Figure 2.22 shows the observed percentage delay degradation in the top ten critical paths of the designs. Over all six benchmarks, an average delay degradation of 6.8% was observed over 3 yrs. On average, from 3 to 10 yrs, the degradation due to NBTI increases by a further 4.1%. In benchmarks like PWRD and AuroraV5, a relatively high rate of NBTI degradation was observed due to high temperatures in the nodes of the critical path. Figure 2.23 shows the corresponding percentage delay degradation when using CPTM parameters. The impact of process parameters on the delay degradation is evident from the results.

Time-Dependent Dielectric Breakdown

As depicted in section 2.3.3, TDDB is a strong function of temperature and leakage power. In this work, we evaluate the mean time to failure of the LUTs and routing buffers of an FPGA fabric. LUTs are primarily composed of multiplexer circuits designed using pass transistor logic. The gate inputs are driven by nets in the design, but the multiplexer inputs are stored in configuration SRAMs. Due to the

use of 6-input LUTs in the Virtex-5 device family, we model a LUT by creating a layout of a 64:1 multiplexer in 65nm technology using BSIM4 models [24]. In order to compute the gate leakage of an LUT, it is essential to determine the static probability of the inputs and of the configuration SRAM bits. However, the actual configuration bits of an LUT are highly dependent on the coding styles used by a vendor. Hence, in this work we use the average leakage value computed over all the input combinations. As mentioned earlier, we model the sizing of routing buffers based on the length of the driven wire segment. Leakage values obtained from such measurements are used in modeling the impact of TDDB on lifetime using equation 2.5.

Figure 2.24. MTTF of LUTs due to TDDB (PTM) (failure in LUTs in % versus time in years, one curve per benchmark)

Figure 2.24 demonstrates the percentage of failed LUTs due

to TDDB over a span of 9 years. It can be observed that in benchmarks with high peak temperatures, like FFT and TripleDes, more than 50% of the failures occur in less than 3 years, whereas in designs with relatively lower temperatures, like DDS and XAPP867, the majority of the failures occur late. The dependence of MTTF due to TDDB on supply voltage can be exploited by the use of multi-V_dd FPGAs. By lowering the voltages of non-timing-critical blocks in the vicinity of a hotspot, we can decrease the impact of TDDB on both logic and routing elements of an FPGA. The conservative process assumes a slight increase in threshold voltage and oxide thickness. Figure 2.25 demonstrates the increase in lifetime achieved using such a process with V_th = 0.28V and T_ox = 1.66nm. It can be observed that in both cases the failure rate is concentrated towards the low and high extremes. This is in accordance with the observed trend in static probability distribution in LUTs.

Figure 2.25. MTTF of LUTs due to TDDB (CPTM)
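The static probabilities of LUT signals referred to in this chapter are obtained by propagating input probabilities through the logic stored in each LUT. For a single LUT this can be illustrated as follows: given the stored truth table and the static probabilities of the inputs, the output probability is the total probability of the input combinations that produce a 1. The sketch assumes statistically independent inputs (a common simplification that the text does not state) and a hypothetical truth-table encoding; it is not the GenSP implementation.

```python
def lut_output_probability(truth_table, input_probs):
    """Static probability that a LUT output is 1.

    truth_table: list of 0/1 outputs, indexed by the input pattern read as a
                 binary number with input 0 as the least significant bit.
    input_probs: probability that each input is 1 (inputs assumed independent).
    """
    n = len(input_probs)
    assert len(truth_table) == 2 ** n, "truth table size must be 2^n"
    prob_one = 0.0
    for pattern, out in enumerate(truth_table):
        if not out:
            continue
        p = 1.0
        for bit, p_in in enumerate(input_probs):
            p *= p_in if (pattern >> bit) & 1 else (1.0 - p_in)
        prob_one += p
    return prob_one


if __name__ == "__main__":
    and2 = [0, 0, 0, 1]
    xor2 = [0, 1, 1, 0]
    p_and = lut_output_probability(and2, [0.5, 0.5])
    # Hierarchical propagation: feed the AND output into an XOR together with
    # a second signal whose static probability is 0.2.
    p_xor = lut_output_probability(xor2, [p_and, 0.2])
    print(p_and, p_xor)
```

Applying the same function node by node in topological order over a logic graph gives the probability of every net, which is the quantity the NBTI and leakage analyses consume.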

Chapter 3 Reliability Enhancement Techniques

3.1 Introduction

The impact of intrinsic failure mechanisms (mentioned in section 2.3) on the lifetime reliability of components on an FPGA is highly dependent on operating conditions such as temperature and voltage, in addition to design-specific attributes. For instance, the power density (and hence the temperature) of a design module mapped onto an FPGA depends on the decisions made in the mapping, placement, and routing stages of the design flow. A multiplier module in a design can be implemented using Configurable Logic Blocks (CLBs) or can be mapped to hard multiplier blocks. The placement of the module relative to its neighboring circuit elements, and the power consumption of the routing elements used, will in turn have an impact on its power density. Such a non-uniform distribution of these factors, due to their design dependence and the heterogeneity of the FPGA fabric,

results in a rather non-uniform aging phenomenon. Components closer to design hotspots or on the design critical path (assigned high V_dd) will age aggressively compared to the rest of the fabric.

3.2 Reliability Aware Design Flow

Figure 3.1. Reliability Aware Design Flow (flowchart; block labels: ABC, Blif netlist, ACE_HET, TV-PACK, Placement (RA-VPR), Routing (RA-VPR), VDD Assignment, PT-SIM, Tprof, DRM, lifetime map, Configure .bit file, Overflow(AgeCounter)?)

By analyzing the aging characteristics of the components used in a design at the post place-and-route phase of the design flow, we can obtain a lifetime map of the FPGA. A lifetime map of an FPGA captures the impact of various intrinsic failure mechanisms on the lifetime reliability of the used components. Today, FPGA design flows can typically be optimized for both power and performance.

Using the notion of a lifetime map during reconfiguration, we can now introduce aging characteristics into the criticality heuristics in various stages of the design flow. This can also be used to vary the placement and routing of a single design over time to ensure uniform aging of its components. Figure 3.1 depicts the key stages of our reliability aware design flow. In order to incorporate the impact of temperature variations in our reliability analysis, we augmented the heterogeneous version of VPR with the power model [44] [45]. The activity estimation module (ACE) was modified to handle the propagation of activities and static signal probabilities across heterogeneous components such as multiplier blocks. The power model itself was modified to include the effects of gate leakage and sub-threshold leakage (due to Drain Induced Barrier Lowering). The power consumption due to level converters (used in the dual-V_dd architecture) and multiplier blocks was obtained from circuit simulations of custom SPICE layouts. The PT-SIM module integrates the power evaluation and thermal simulation phases of the design flow. Once the power consumption of the design is estimated using the enhanced VPR power model, a thermal simulator is used to calculate the on-chip temperature and its variations. TSIM [46], a Hotspot [42] based thermal simulator, was used to generate the temperature profile (Tprof). Figure 3.2 depicts the temperature profile of the rs decoder 1 benchmark. Thermal hotspots across the columns containing the multiplier blocks can be observed.

Figure 3.2. Temperature Variations in rs decoder (temperature in C versus X and Y location in cm)

Voltage Assignment and Reliability Evaluation

The voltage assignment stage of the design flow uses the post place-and-route timing estimates to determine the critical paths in the design. In this study we used 0.9V and 1.1V as the low (VDDL) and high (VDDH) voltages respectively. A criticality-based voltage assignment algorithm similar to the one proposed in [8] is used. However, during the V_dd assignment phase we use a heuristic that takes into account not only the slack information of paths passing through CLBs but also their aging information from the lifetime map. The cost function used in determining the order of V_dd assignments is shown in equations 3.1 & 3.2.
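The criticality-driven V_dd assignment of Figure 3.3, together with the cost terms of equations 3.1 and 3.2, can be sketched as follows. The sketch assumes a deliberately simple delay model (a path's delay is the sum of its blocks' delays, each block having one delay at VDDL and one at VDDH), ignores the routing multiplexers, and stops upgrading a path once it meets the target or has no VDDL blocks left. All names and numbers are hypothetical, and the two cost functions follow one reading of equations 3.1 and 3.2.

```python
def aggregate_aging_cost(mttf_by_failure, previous_cost, history_weight):
    """Equation 3.1 as read here: sum of 1/MTTF over failure mechanisms plus
    the history-weighted cost from the previous aging interval."""
    return sum(1.0 / m for m in mttf_by_failure) + history_weight * previous_cost


def block_criticality(aging_cost, n_critical_paths, t_exp):
    """Equation 3.2 as read here: aging cost plus N_i raised to 1/T_exp."""
    return aging_cost + n_critical_paths ** (1.0 / t_exp)


def assign_vdd(paths, delay_low, delay_high, criticality, t_crit, alpha=1.0):
    """Return the set of blocks raised to VDDH so paths meet alpha * t_crit.

    paths: list of paths, each a list of block names
    delay_low / delay_high: block delay at VDDL / VDDH
    criticality: per-block criticality (larger values are upgraded first)
    """
    high = set()                      # every block starts at VDDL
    target = alpha * t_crit

    def path_delay(path):
        return sum(delay_high[b] if b in high else delay_low[b] for b in path)

    critical = [p for p in paths if path_delay(p) > target]
    while critical:
        path = max(critical, key=path_delay)          # worst path first
        candidates = sorted((b for b in path if b not in high),
                            key=lambda b: criticality[b], reverse=True)
        while path_delay(path) > target and candidates:
            high.add(candidates.pop(0))               # upgrade most critical block
        critical.remove(path)
    return high


if __name__ == "__main__":
    paths = [["A", "B", "C"], ["B", "D"], ["E"]]
    d_low = {"A": 3.0, "B": 4.0, "C": 3.0, "D": 2.0, "E": 1.0}
    d_high = {"A": 2.0, "B": 2.5, "C": 2.0, "D": 1.5, "E": 0.8}
    crit = {"A": 0.4, "B": 0.9, "C": 0.6, "D": 0.2, "E": 0.1}
    print(sorted(assign_vdd(paths, d_low, d_high, crit, t_crit=6.5, alpha=1.2)))
```

Because path delays are recomputed from the current assignment, a block upgraded for one path automatically speeds up every other path that passes through it, which mirrors the delay-update step of the pseudo-code.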

    Assign VDDL to all routing multiplexers and logic blocks in the design
    P_L = list of all paths in the design
    T_crit = original delay of the critical path of the design
    T_α = T_crit · α, where α ≥ 1 is a user-defined metric
    CP_L = { p_i ∈ P_L | delay(p_i) > T_α }
    For each logic block B_i:
        Criticality(B_i) = Crit(B_i)    (equation 3.2)
    While CP_L is not empty {
        P_j = path in CP_L with maximum delay
        B_j = set of all blocks on path P_j
        Sort B_j in decreasing order of Criticality
        While delay(P_j) > T_α {
            B_ij = first(B_j)
            B_j = B_j - B_ij
            Assign VDDH to B_ij and to all the MUXs driving inputs of B_ij
            Update the delay of all paths passing through B_ij
        }
        CP_L = CP_L - P_j
    }

Figure 3.3. Pseudo-code of the Voltage Assignment Algorithm

$$AACost_T(B_i) = \sum_{j \in failures} \frac{1}{MTTF_j(B_i)} + \lambda_h \cdot AACost_{T-1}(B_i) \qquad (3.1)$$

$$Crit(B_i) = AACost_T(B_i) + (N_i)^{1/T_{exp}} \qquad (3.2)$$

Figure 3.4 depicts the V_dd assignments for the IIR benchmark (mapped to the DVDD architecture) using the proposed metrics. Using assumptions similar to the sum-of-failure-rates model [11], we compute the current aggregate aging cost AACost_T of components as the sum of lifetime


acceleration factors due to all relevant failure mechanisms, where T denotes the current iteration of the reliability evaluation. The history of stress due to failure mechanisms is captured as the product of the history parameter λ_h and the aging cost in the previous aging interval. The criticality of block B_i is computed as the sum of its AACost and a term in N_i, the number of timing-critical paths passing through it. The timing exponent term T_exp controls the trade-off between timing and lifetime when determining logic block criticality.

Figure 3.4. V_dd Assignments for IIR (legend: V_dd L, V_dd H)

The reliability aware placement algorithm takes as input a lifetime map of the FPGA along with the technology-mapped netlist of the design. During the first placement we assume that the components are stress free and are practically immortal. However, after every subsequent aging interval, the

updated values of component lifetimes are obtained from the lifetime map created by DRM during the previous run. We observed empirically that an aging interval of 1.5 years was effective for reliability evaluation. The reliability evaluation module takes as input design-specific attributes (such as the transition density of nets and signal probabilities) and dynamic information about operating conditions (such as the assigned supply voltage and temperature). The placement of logic blocks and the routing information of interconnect nets are also passed to DRM after place-and-route. The temperature of different components is obtained by correlating the placement and routing information with the Tprof of the design. A component-based lifetime analysis is performed on LUTs, routing multiplexers, and interconnects for the failure mechanisms discussed earlier.

Reliability Aware Placement Algorithm

The placement phase of contemporary FPGA CAD flows employs simulated-annealing-based placement algorithms. An initial placement is generated by randomly assigning each logic block in the design to a legal FPGA location. The placement is then improved by iterative moves that involve swapping the positions of logic blocks. The effectiveness of each move is evaluated using a timing-driven cost function. A move is accepted if it reduces the total cost of the placement. Moves that increase the cost are accepted with a probability based on the current temperature of the annealing schedule, to avoid local minima. In timing-driven

placement algorithms the cost function is targeted to optimize the routing of timing-critical nets and to minimize the average wirelength needed to route each net. To optimize the lifetime of components, we introduce metrics that capture aging characteristics due to variations in operating conditions.

\begin{align}
Cost ={}& (1-\lambda-\beta) \sum_{i \in Nets} q(i) \left[ \frac{bb_x(i)}{C_{av,x}(i)} + \frac{bb_y(i)}{C_{av,y}(i)} \right] \tag{3.3} \\
& + \lambda \sum_{j \in allconn} criticality(j) \cdot delay(j) + \beta \sum_{i \in blocks} criticality(b_i) \cdot AACost(b_i) \tag{3.4}
\end{align}

Equation 3.3 shows the modified cost function used in our reliability aware placement algorithm. The first two terms in the equation are the same as in timing-driven placement. The first term captures the wirelength needed to route each net i by estimating the bounding box span (bb_x and bb_y) in each direction, where q(i) is the fan-out correction factor that compensates for the error in wirelength estimation for high fan-out nets. The second term favors placements in which timing-critical nets have a low routing delay. The terms λ and β denote the proportionality constants used to control the trade-off between timing optimization and lifetime reliability of the design. The third term captures

the lifetime characteristics of logic blocks, where AACost(b_i) and criticality(b_i) (derived from equations 3.1 & 3.2) denote the aggregated aging cost of each block b_i and the timing criticality of logic blocks, respectively. After every aging interval, this term favors the placement of blocks on timing-critical paths in CLBs that have a relatively high MTTF due to intrinsic failures. Such a placement is in correlation with the V_dd assignment heuristic that assigns a high V_dd to blocks on timing-critical paths, thus ensuring a uniform distribution of aging stress.

Reliability Aware Routing Algorithm

We propose a reliability aware routing algorithm to integrate the delay degradation of routing multiplexers and the aging characteristics of electromigration in wires (presented in section 2.3) in the design flow. The routing phase of the FPGA design flow employs a routing algorithm to establish connections between the inputs and outputs of various logic blocks by configuring programmable routing switches. We use the Pathfinder negotiated congestion-delay algorithm [44], a timing and congestion driven iterative maze router, in our study. The algorithm works in a greedy manner, controlling the congestion-delay trade-off of each connection based on the timing criticality of the connecting net. Backtracking is employed when the greedy strategy results in congestion or over-use of the routing resources. The nets are iteratively ripped up and re-routed until all the congestion of routing resources is resolved. The cost functions used in choosing the routing resources connecting

two pins on a net are depicted in equations 3.5 and 3.6. Equation 3.5 shows the original cost function used in Pathfinder, while equation 3.6 shows the modified reliability aware cost. The cost due to a pin selection is computed as the sum of the cost of the connections made so far (backpathcost), the cost of the current pin (currentpincost), and the estimated cost of the remaining connections (expectedpathcost). The parameter α controls the aggressiveness of the expected cost evaluation metrics.

\[ Cost = backpathcost + currentpincost + \alpha \cdot expectedpathcost \tag{3.5} \]

\[ newcost = AACost(BPC) \cdot backpathcost + AACost(CPC) \cdot currentpincost + AACost(PPC) \cdot \alpha \cdot expectedpathcost \tag{3.6} \]

\[ AACost(P_i) = \sum_{i}^{wires} Deg(Mux(C_i)) \cdot MTTF_{em}(C_i) \tag{3.7} \]

The aggregated aging cost of a path shown in equation 3.7 is computed as the average sum of the delay-lifetime product of the routing multiplexers (Mux(C_i)) and

wire segments C_i in the path. From the analysis in section 2.3 it is evident that the delay degradation of routing multiplexers is dependent on the length of the connecting wire segments. Also, the MTTF due to electromigration depends directly on the assigned supply voltage and transition probability. Hence, in a dual-V_dd FPGA architecture the length of the wire segments chosen during the routing stage can be varied to achieve low delay degradation and high lifetime based on factors such as the assigned voltage and the transition density of the routed net.

3.3 Experimental Setup and Results

To demonstrate the impact of operating conditions on the variation of lifetime reliability of components, we considered three variants of the DVDD architecture in our experiments. First we consider novar, a single-V_dd architecture agnostic to detailed temperature and voltage variations. The reliability aware design flow is applied excluding the V_dd assignment and thermal simulation phases. In this case, a single supply voltage (VDDH) and the design's peak temperature were used during reliability evaluation. The MTTF variations observed in the case of novar are governed only by variations in design-specific characteristics such as switching activity, static probability, gate leakage etc. In the second case, to demonstrate the impact of voltage variations, we consider a multi-V_dd architecture, MVdd, that does not take temperature variations into consideration. We use 1.1V for VDDH and 0.9V for VDDL

similar to the voltages used in commercial FPGAs that support programmable V_dd [5]. Similar to novar, we use the peak temperature of the design during reliability evaluation of MVdd. Finally, we consider the scenario Tvar-MVdd, where both temperature and voltage variations were considered during reliability evaluation. Table 3.1 demonstrates improvements in power consumption achieved by the MVdd architecture across 11 benchmarks with heterogeneous components.

Figure 3.5. Improvement in MTTF of LUTs (series: novar, MVdd, Tvar + MVdd; x-axis: Benchmarks; y-axis: (%) improvement in MTTF)

Figure 3.5 depicts the improvement in MTTF of LUTs achieved using our reliability-aware design flow. An average of 21.2%, 41.9% and 65.8% improvement in MTTF of LUTs was observed using our reliability aware placement techniques for novar, MVdd, and Tvar-MVdd respectively. In the case of interconnects, our reliability aware routing techniques achieved an average improvement of 28.2%, 54.7%, and 75.5% for novar, MVdd, and Tvar-MVdd respectively, as shown in Figure 3.6. It can be observed that our reliability aware techniques achieve lifetime improvements even in the case of novar.

Figure 3.6. Improvement in MTTF of Interconnects (due to EM) (series: novar, MVdd, Tvar + MVdd; x-axis: Benchmarks; y-axis: (%) improvement in MTTF)

We also evaluate the delay degradation in routing multiplexers over a period of 5 years. The top 15% of the critical paths were used to estimate the delay degradation. The observed differences in delay degradation were dependent on both the utilization of different routing MUXs and the total number of nodes in the

critical path. As depicted in Figure 3.7, our reliability aware design flow resulted in only 5.6% delay degradation in the presence of voltage and temperature variations. This is a considerable improvement compared to the 11.9% delay degradation observed in the case of novar.

Table 3.1. Power Consumption Improvement using MVdd

Design          Power (MVdd): baseline (W)   dynamic (%)   leakage (%)
cf fir 3 8 8    0.00949                      34.2          65.8
CRC22 D264      0.20315                      37.8          43.2
des area
des perf
iir
iir
mac
oc54 cpu
mac
rs decoder
cf cordic
Average

Figure 3.7. Improvement in Delay Degradation (due to NBTI and CHC) (series: novar, Tvar-MVdd; x-axis: Benchmarks; y-axis: (%) Delay Degradation)

The following key observations can be inferred from the achieved results. The

proposed reliability aware techniques were effective in achieving improvements in all three configurations. The considerable improvements observed in the MVdd and Tvar-MVdd cases compared to novar demonstrate the importance of considering voltage and temperature variations in enhancing lifetime reliability. Furthermore, we minimize the effective delay degradation of routing multiplexers using reliability aware voltage assignment and routing techniques.
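As an illustration of the reliability aware placement cost of equation 3.3, the following minimal Python sketch evaluates the three terms for a candidate placement. It is not the implementation used in this work. The division of each bounding-box span by the average channel capacity follows the usual timing-driven placement formulation and is an assumption here, as are all names (`Net`, `placement_cost`) and the way inputs are passed in.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Net:
    """Hypothetical per-net data needed by the wirelength term."""
    bb_x: float    # bounding-box span in x
    bb_y: float    # bounding-box span in y
    q: float       # fan-out correction factor q(i)
    c_av_x: float  # average channel capacity in x (assumed divisor)
    c_av_y: float  # average channel capacity in y (assumed divisor)


def placement_cost(nets: List[Net],
                   conns: List[Tuple[float, float]],
                   blocks: List[Tuple[float, float]],
                   lam: float, beta: float) -> float:
    """Weighted sum of wirelength, timing and aging terms.

    conns  holds (criticality, delay) per connection.
    blocks holds (criticality, AACost) per logic block.
    """
    wiring = sum(n.q * (n.bb_x / n.c_av_x + n.bb_y / n.c_av_y) for n in nets)
    timing = sum(crit * delay for crit, delay in conns)
    aging = sum(crit * aa_cost for crit, aa_cost in blocks)
    return (1.0 - lam - beta) * wiring + lam * timing + beta * aging
```

With everything else fixed, moving a timing-critical block to a location with a lower aggregated aging cost lowers the total cost, which is the behavior the third term is meant to encourage.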

Chapter 4 Reliability Analysis Under Process Variations and Aging

4.1 Introduction

The reliability modeling techniques presented in Chapter 2 evaluate the changes in device characteristics due to operational stress and aging. It is evident from the modeling techniques that the impact of aging stress due to failures is highly dependent on device process parameters such as threshold voltage (V_th), oxide thickness (T_ox) etc. Consequently, variations in the device parameters due to manufacturing defects influence the rate of aging due to failure mechanisms. Hence, to accurately model the lifetime of an FPGA, it is essential to model the nature and impact of such variations on the resulting FPGA fabric. Resources in FPGAs are primarily dominated by the interconnect fabric. Variations in the interconnect that impact the lifetime degradation due to intrinsic failures, in addition to the timing and leakage yields, were considered in our study. The programmable interconnect in FPGAs typically comprises wire segments of different lengths to provide flexibility in the form of faster connections between distant blocks and vice versa. To establish a connection for a given net, the segments are chosen by the routing algorithm based upon the timing criticality and congestion. To obtain the statistical delay and leakage of any proposed routing, we perform a comprehensive study on the impact of variations in process parameters for each of the different wire segments. Our observations show a significant difference in the timing and leakage yield characteristics of the various segments used in the FPGA. Wire segments are typically connected across the X-Y axes using a switch block. The most commonly used switch blocks are the Subset, Universal and Wilton switch blocks. Each of these switch blocks has a different connection flexibility, with the universal switch being the most flexible among all. We provide an architectural analysis of the impact of variations on the timing and leakage of the different types of switch blocks. The estimated variations in the timing and leakage yields of the different wire segments and switch blocks are introduced into the routing algorithm to design a Statistically Intelligent Routing Algorithm (SIRA). SIRA is targeted towards optimizing both the timing and leakage power yield of any proposed routing, using the detailed information on the yields of the individual routing resources. We demonstrate the importance of considering both leakage and timing yields together by comparing the proposed algorithm with a purely timing yield optimization based routing scheme. Finally, we analyze the combined impact of process variations and aging phenomena on the performance degradation of designs. We also augment SIRA to take into account delay degradation due to aging phenomena to improve the timing yield of the resulting routing.

4.2 Related Work

FPGA architectures have been evaluated for manufacturing variations in [47], where the impact of variations on leakage power and timing is shown to be close to 3X and 2X respectively. Such unpredictability is expected to worsen with technology scaling due to various stringent manufacturing constraints [48]. The problem of variations has been tackled to some extent at the design automation front by variation aware algorithms at the synthesis, placement and routing phases [49] [50]. Various variation models to precisely capture the systematic and random components of variations have been presented in [51]. Based on such models, statistical timing based optimizations and analysis have been studied in great detail, from circuit optimizations [52] to architectural solutions [53]. Unlike ASICs, FPGA devices can be tested at a post-silicon stage for obtaining

a complete device map using additional testing circuitry as presented in [54]. Variation aware placement techniques based on such post-silicon testing are presented in [23]. The additional overheads of testing, along with the overhead of device tuning based on body biasing as presented in [54], add significant cost both in terms of the inherent hardware used and the techniques employed for achieving them. Consequently, statistical algorithms that do not rely on the ability to monitor an exact variation map have drawn increasing attention. [55] presents a routing algorithm aware of process variations that reduces the congestion in switches and improves the statistical timing yield of the mapped application. However, given that FPGA architectures are typically dominated by the routing resources w.r.t. timing, power and area, a comprehensive analysis of the timing/leakage yield of routing resources is essential before proposing statistically optimal routing algorithms. The statistical routing scheme presented in [55] optimizes the timing yield of the routing switches used; however, leakage variation is not studied in that work. We perform an in-depth study of the impact of process variations on various routing structures of an FPGA and provide solutions to the problem.

4.3 Variation Modeling

We perform a statistical analysis to obtain the leakage and performance yield of applications when mapped onto FPGAs. The analysis of variations is a two stage

process. In the first stage, we perform a SPICE based analysis of the different circuit elements present in the FPGA. During such an analysis, we assume a normal distribution for the parametric variations of threshold voltage (V_th), effective channel length (L_eff), and oxide thickness (t_ox). We then perform a Monte Carlo simulation to obtain the delay and leakage spread of the individual circuit elements. We use the Predictive Technology Model (PTM) [56] for 65nm technology simulations. The variations in V_th (15%), L_eff (10%), and t_ox (10%) were assumed to be the same for all the transistors within individual circuits under analysis. Following the first stage SPICE analysis, we perform a statistical estimation of the yield of different routes in the net-list. Due to the inherent difference in the nature of the distributions of leakage power consumption and delay, different strategies are employed for their variation modeling.

Timing Yield of a Route

Statistical Static Timing Analysis (SSTA) techniques have been effective in accurately predicting the critical paths in a circuit that might violate the timing constraints of a given design [57]. The individual arrival times of the different inputs of a logic element are modeled as Gaussian distributions during SSTA. To obtain the statistically critical input of every logic element under variations, SSTA operations like MAX and MIN are applied to compute the extremum of the arrival times in the presence of variations. However, the statistical analysis to estimate the delay of a

single route in the presence of variations is simpler than that for the entire logic network. A route in a design mapped to an FPGA typically consists of a connection between two pins established by configuring a set of pass-transistor switches. This eliminates the scenario where there is a contention for arrival times and therefore simplifies the problem of estimating statistical route delay to a statistical SUM operation [55]. Figure 4.1 demonstrates the delay distribution of a 4x1 multiplexer in the FPGA interconnect, obtained from the SPICE analysis. It can be observed from the figure that the delay distribution of the multiplexer follows a normal distribution. This trend is observed for all the basic circuit elements. We use the statistical SUM operator to obtain the effective mean and variance of the delay distribution of any given route using equations 4.1 and 4.2.

\[ \mu_{path} = \sum_{i} \mu_i \tag{4.1} \]

\[ \sigma^2_{path} = \sum_{i} \sigma_i^2 + \sum_{i} \sum_{j \neq i} Cov_{ij} \tag{4.2} \]

\[ Cov_{ij} = \sigma_i \, \sigma_j \, C_{ij} \tag{4.3} \]

\[ C_{ij} = basecorr^{D_{ij}} \tag{4.4} \]
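The statistical SUM of equations 4.1 to 4.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the correlation coefficient is taken to decay as `basecorr ** D_ij` with Euclidean distance (one reading of equation 4.4), and the function name `path_delay_stats` and its argument layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def path_delay_stats(mu: Sequence[float],
                     sigma: Sequence[float],
                     pos: Sequence[Tuple[float, float]],
                     basecorr: float) -> Tuple[float, float]:
    """Return (mean, variance) of the summed delay of the route components.

    mu[i], sigma[i] are the delay mean and standard deviation of component i,
    pos[i] is its (x, y) location in units of CLB pitch.
    """
    n = len(mu)
    mean = sum(mu)                      # equation 4.1
    var = 0.0
    for i in range(n):
        var += sigma[i] ** 2            # variance of component i itself
        for j in range(n):
            if i == j:
                continue
            d_ij = math.dist(pos[i], pos[j])
            c_ij = basecorr ** d_ij     # assumed distance-based correlation
            var += sigma[i] * sigma[j] * c_ij   # Cov_ij, equation 4.3
    return mean, var
```

Setting `basecorr` to 0 reproduces the purely random case (variances simply add), while values close to 1 approach the fully correlated case in which the standard deviations add.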

\[ Cov_{(n \times n)} = \begin{bmatrix} Cov_{11} & Cov_{12} & \cdots \\ \vdots & C_{ij}\,\sigma_i\,\sigma_j & \vdots \\ Cov_{n1} & Cov_{n2} & \cdots \end{bmatrix} \]

Here i and j are iterated over all the components in a given path. C_ij is the correlation coefficient that captures the spatial correlation between the components due to systematic variations. This correlation coefficient is used to obtain the covariance, denoted by Cov_ij. The correlation matrix is formed using a distance based metric, as depicted in equation 4.4. D_ij, the Euclidean distance between components i and j, has a direct impact on the degree of correlation between them. The basecorr factor is the degree of systematic relationship in the distribution of two neighboring components at a unit distance (in our case the unit distance is the distance between two CLBs). The results are obtained for different values of this basecorr factor in order to capture the varying impact of the systematic and random components of the system. Once the delay distribution is obtained using the computed mean and variance, we compute the timing yield of a given route using equation 4.5

\[ T_{yield} = \int_{-\infty}^{T_{nom}+\delta} PDF_{(\mu,\sigma)}(t) \, dt \tag{4.5} \]

T_nom denotes the nominal delay of a route in the normal (no process variation) case, and δ denotes the % cut-off around the nominal. We have chosen the value of δ to be 5% for our study, similar to the cut-off used in [47].

Leakage Yield of a Route

The leakage power consumption of a design is computed as the sum of the active and standby leakage power of the different resources, as shown in equation 4.6. The factor λ defines the degree of leakage power savings obtained using common leakage optimization schemes.

\[ Leak_{chip} = \sum_{used\ resource} Leak_{active} + \lambda \sum_{unused\ resource} Leak_{standby} \tag{4.6} \]

Consequently, to maximize the chip leakage yield we should analyze both the used and unused resources created in the whole system by a given route. The standby leakage power consumption of unused components, however, is typically

negligibly low due to the employment of various leakage aware schemes. Hence a low value of λ directs our focus to the active leakage power consumed by the used resources in the FPGA device while implementing a given application. To achieve that, we first obtain the individual distributions for the different components using HSPICE, similar to the delay analysis. Figure 4.3 demonstrates the Monte Carlo simulation results for the leakage power consumption of a single wire length segment. It is evident from the figure that the leakage variation follows a lognormal distribution. We use a model similar to [47] for estimating the total leakage power consumed by a given set of components under analysis. The leakage power consumption is first formulated as a value exponentially dependent upon the device threshold voltage (V_th), oxide thickness (t_ox) and effective channel length (L_eff). After obtaining the curve fitting parameters, we sum the lognormal variables using the equations presented in [47]. The chip mean and variance are finally used to estimate the yield based on a leakage threshold of 10% over the nominal leakage power consumption. A metric to evaluate the chip yield is obtained as depicted in equation 4.7, which is computed as the product of the leakage and the timing yield, similar to [47]. Note that the timing and leakage yields are themselves dependent parameters; hence their product does not give the actual chip yield but only a measure to evaluate it.
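The yield computations described above can be illustrated with a short Python sketch. It assumes a Gaussian route delay (equation 4.5), treats δ as a fraction of T_nom, and assumes the total leakage is already summarized as a lognormal variable with log-domain parameters `mu_ln` and `sigma_ln`; the summation of lognormals from [47] is not reproduced here. All function names are hypothetical.

```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative distribution function of a normal variable."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def timing_yield(mu: float, sigma: float, t_nom: float, delta: float = 0.05) -> float:
    """Probability that the route delay stays below T_nom plus a delta cut-off."""
    cutoff = t_nom * (1.0 + delta)      # delta interpreted as a fraction of T_nom
    return normal_cdf(cutoff, mu, sigma)


def leakage_yield(mu_ln: float, sigma_ln: float, leak_nom: float,
                  margin: float = 0.10) -> float:
    """Probability that lognormal leakage stays within margin over nominal."""
    limit = leak_nom * (1.0 + margin)
    # ln(leakage) is normal, so the lognormal CDF is a normal CDF of ln(limit)
    return normal_cdf(math.log(limit), mu_ln, sigma_ln)


def chip_yield_metric(y_timing: float, y_leak: float) -> float:
    """Product of timing and leakage yield (a quality metric, not the true yield)."""
    return y_timing * y_leak
```

Because the two yields are correlated through the same process parameters, the product is only a comparative figure of merit, in line with the caveat in the text.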

\[ Y_{chip} = Y_{timing} \times Y_{leak} \tag{4.7} \]

4.4 Routing Architecture Evaluation

In this section we provide a detailed analysis of the timing and leakage yield of the various circuit components used in an FPGA routing fabric. The models used for process variation are first employed in isolation, without correlations, to analyze each of the individual circuits. Following this standalone analysis, the distributions are used in the STA based route analyzer discussed in section 4.3 to estimate the yield of different applications mapped onto the FPGA device. The FPGA routing structure primarily comprises routing multiplexers of different sizes and switch boxes to connect the vertical and horizontal tracks. In our analysis we used multiplexers of sizes 2, 4, 10, 16, 24 and 32. These multiplexers are used to connect the CLB pins with routing segments of different lengths. A larger multiplexer is required to connect to a longer segment, primarily due to the greater number of options that it provides to connect the inputs. The multiplexers were custom designed with appropriate buffer insertion to optimize the delay of the routing segments. A Monte Carlo analysis of delay and leakage power consumption was performed in HSPICE using PTM device models.
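To give a feel for why such a Monte Carlo analysis yields a roughly Gaussian delay spread but a skewed leakage spread, the toy Python sketch below samples V_th from a normal distribution and pushes it through two textbook device approximations. It is a stand-in for intuition only, not the HSPICE/PTM setup used in this work: the alpha-power-law delay, the exponential subthreshold leakage expression, and every numeric parameter are assumptions.

```python
import math
import random
import statistics
from typing import List, Tuple


def monte_carlo(n_samples: int = 5000, vth_nom: float = 0.3,
                vth_sigma: float = 0.015, vdd: float = 1.1,
                alpha: float = 1.3, n_vt: float = 0.039,
                seed: int = 1) -> Tuple[List[float], List[float]]:
    """Sample V_th and return (delays, leakages) in arbitrary units."""
    rng = random.Random(seed)
    delays, leaks = [], []
    for _ in range(n_samples):
        vth = rng.gauss(vth_nom, vth_sigma)
        delays.append(vdd / (vdd - vth) ** alpha)   # alpha-power-law delay
        leaks.append(math.exp(-vth / n_vt))         # subthreshold leakage
    return delays, leaks


def skewness(xs: List[float]) -> float:
    """Sample skewness (third standardized moment)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)
```

In this toy model the logarithm of the leakage is exactly normal, so the leakage itself is lognormal with a clear positive skew, while the delay stays close to symmetric because it depends only weakly and almost linearly on V_th over the sampled range.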

Figure 4.1 demonstrates the timing spread of the different segments used in the FPGA routing fabric for 65nm gate length technology. An important motivating observation in this figure is that larger wire segments have a wider spread of delays, implying a lower timing yield. This observation may be attributed to the larger logic depths in the multiplexers driving longer wires. In our experiments, since we have assumed a completely systematic behavior of all the transistors within a wire, increasing the logic depth should ideally not increase the standard deviations of the distributions. However, the increase in logic depth in this completely correlated system affects the distribution in the form of a second-order effect. The second-order effect is primarily due to the non-linear increase in the rise and fall times of each of the stages, as explained in [58]. A similar analysis for the leakage power consumption is depicted in figure 4.3. Once again a skewed behavior is observed across the different multiplexers. The two main observations from this plot are, firstly, the nature of the curve itself and, secondly, the trend in the standard deviation for different wires. It is clear from the plot that the leakage distribution of the wires shows a lognormal behavior, which can be attributed to the exponential dependence of leakage power on the parameters varied. The other interesting trend in the figure is the reduction in deviation with increasing wire lengths. The reason for this behavior is not trivial, since unlike the addition of Gaussian variables in the case of delays, the addition of lognormal leakage values has an exponential dependence on the total number

of samples and their actual means and variances. We also estimated the impact of random variations in our circuit analysis using HSPICE.

Figure 4.1. Delay distribution of different segments in FPGA (series: Double, Single, HEX, Long; x-axis: Normalized Delay Bins; y-axis: # of occurrences)

Figure 4.5 demonstrates how the introduction of completely random variations within a wire may impact the leakage yield. The value LONGRAND is obtained by performing a Monte Carlo analysis with randomly varying process parameters during each iteration. It is interesting to observe that the leakage spread improves with increasing random variations; however, it is still lower than the spread of a perfectly correlated leakage distribution of a double wire segment. Another circuit element analyzed for leakage power yield is the switch block of an FPGA. The most common types of switch blocks are the Disjoint, Wilton and

Universal Switch Block [59].

Figure 4.2. FPGA Routing Architecture

Figure 4.2 depicts some of the key components of an FPGA's routing architecture. Routing multiplexers are used to connect different logic block outputs to channel wire segments, and switch and connection blocks control the routing of signals within and across channels respectively. The architecture of each of the switch blocks differs in terms of the number and location of the output pins that each input pin of the switch block connects to. Such a difference in the architecture of the switch blocks leads to differing leakage power variations. An analysis of the distribution of leakage power consumption of each of the switch boxes is presented in figure 4.6. Apart from the lognormal behavior, it can also be observed that the leakage power of a disjoint switch block varies less when compared to wilton or

USB architectures. Once again, an increase in the number of transistors leads to a wider spread of leakage power among the switch blocks. Note that the wilton and USB switch blocks have almost the same distributions, since they use almost the same number of transistors for providing connectivity, and these transistors are also sized similarly due to the same fan-out properties.

Figure 4.3. Leakage distribution of different segments in FPGA (series: DOUBLE, HEX, LONG, SINGLE; x-axis: Normalized leakage power consumption; y-axis: # of occurrences)

Figure 4.4. Variation Characteristics of Circuit Elements: (a) leakage, (b) delay (series: IMUX, OMUX, LUT; y-axis: # of occurrences)

Similarly, figure 4.4 shows the distribution of variations in delay and leakage

current for the basic circuit elements of a CLB, viz. the LUT and the input and output multiplexers. All of the above analyses motivate our statistical route optimization, since a probabilistic delay/power analysis of any path based upon the statistical distributions of the embedded circuitry may reveal interesting contrasts when choosing between different routes. For example, a long wire may provide faster connections at the cost of a possibly lower timing yield. We computed similar distributions for the delay and leakage power consumption of the CLBs, which were required for the statistical timing analysis of the designs presented in section

Figure 4.5. Leakage distribution of LONG wire with completely random and completely correlated variations FPGA (series: DOUBLE, LONGRAND, LONG; x-axis: Normalized leakage; y-axis: # of occurrences)

Figure 4.6. Leakage distribution of different switch block types in FPGA (series: Disjoint, Wilton, USB; x-axis: Normalized leakage power distribution; y-axis: # of occurrences)

4.5 Statistically Intelligent Routing Algorithm (SIRA)

Using the characterization of the different circuit elements presented in section 4.4, we propose a statistically intelligent routing algorithm for FPGAs. Contemporary FPGA design automation tools employ the Pathfinder [59] algorithm for establishing the routes between various pins. The algorithm works in a greedy manner, optimizing the cost of the total route based on the timing and congestion of any given route. Backtracking is employed in cases where no route could be established due to previous incorrect decisions. The decision at each step evaluates the cost of establishing a route as demonstrated in equations 4.8 and 4.9. Equation 4.8 shows the original cost computation while equation 4.9 shows the modified cost computation. The cost is typically estimated as the sum of the cost of the path chosen so far (backpathcost), the current pin cost (currentpincost) and the

predicted cost (predictedpathcost) due to a particular pin selection. We associate a yield cost factor with each of the costs (shown in equation 4.10), where the yield_exp factor is determined based upon the weightage associated with timing as compared to the yield of the system. We observed empirically that a very high yield_exp may lead to the selection of routes in a yield-greedy fashion, which may cause timing constraint violations or even resource unavailability. Another factor, yield_path, provides the tunable portion of the algorithm, as demonstrated in equation The parameters W_leak and W_timing are the weights, which may be adjusted based on the designer's leakage and timing budgets. In case W_leak is set to 0, the algorithm becomes purely a Timing yield Aware Routing (TAR) approach. We demonstrate using experiments in section 4.6 how a purely timing yield aware approach significantly deteriorates the leakage yield.

\[ Cost = backpathcost + currentpincost + predictedpathcost \tag{4.8} \]

\[ NewCost = Y_{bp} \cdot backpathcost + Y_{cp} \cdot currentpincost + Y_{pp} \cdot predictedpathcost \tag{4.9} \]

\[ Y_{path} = (1 - yield_{path})^{yield_{exp}} \tag{4.10} \]
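The yield-weighted cost of equations 4.8 to 4.11 can be sketched as below. This is a hedged illustration rather than the code integrated into the router: the function names are hypothetical, each path portion is assumed to come with its own (leakage yield, timing yield) pair, and the weights are assumed to sum to 1 so that yield_path stays within [0, 1].

```python
from typing import Tuple

YieldPair = Tuple[float, float]  # (leakage yield, timing yield) of a path portion


def yield_path(y_leak: float, y_timing: float,
               w_leak: float, w_timing: float) -> float:
    """Weighted combination of leakage and timing yield (equation 4.11)."""
    return y_leak * w_leak + y_timing * w_timing


def yield_cost_factor(y_path: float, yield_exp: float) -> float:
    """Cost multiplier of equation 4.10: low yield gives a large factor."""
    return (1.0 - y_path) ** yield_exp


def sira_cost(back_cost: float, cur_cost: float, pred_cost: float,
              back_y: YieldPair, cur_y: YieldPair, pred_y: YieldPair,
              w_leak: float, w_timing: float, yield_exp: float) -> float:
    """Yield-weighted routing cost (equation 4.9)."""
    total = 0.0
    for cost, (y_leak, y_timing) in ((back_cost, back_y),
                                     (cur_cost, cur_y),
                                     (pred_cost, pred_y)):
        factor = yield_cost_factor(
            yield_path(y_leak, y_timing, w_leak, w_timing), yield_exp)
        total += factor * cost
    return total
```

Calling `sira_cost` with `w_leak=0.0` and `w_timing=1.0` gives the purely timing yield aware (TAR) variant, since the leakage yield then no longer influences the cost factor.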

\[ yield_{path} = Y_{leak} \cdot W_{leak} + Y_{timing} \cdot W_{timing} \tag{4.11} \]

Y_bp, Y_cp and Y_pp are the yield factors of the back path, the current pin and the predicted forward path respectively, where the yield of each of them may be computed as depicted in equation The computation of the new cost once again adds to the greedy strategy employed in the problem. This methodology therefore does not impact the correctness of the algorithm, but only enhances the quality of its results. During the selection of the routing architecture, we use the statistical distributions of leakage and delay of wire segments, and perform a normalized assignment of routing resources at each track.

4.6 Experimental Setup and Results

We use the Versatile Place and Route (VPR) tool to model the proposed study and test our new yield optimization algorithm. We use 10 different MCNC benchmarks, which are mapped onto minimal sized devices. The statistical timing and leakage analyzer was integrated into VPR to provide estimates for different routes. We first demonstrate the effectiveness of our Timing yield Aware Routing (TAR) algorithm in improving the timing yield of the system. In this case we assume no costs associated with the leakage power consumed by the routing resources used. The yield is evaluated based on T_nom values set as the no PV case, with the

effective channel length set to 65nm.

Figure 4.7. Timing, leakage and total chip yield improvement/degradation using a Timing yield Aware Routing approach (TAR)

Figure 4.7 demonstrates the improvement in yield obtained with our algorithm for different benchmarks. As can be observed from the figure, the timing yield of the designs improves by close to 21%. The second bar in the plot demonstrates an average of 20% reduction in the leakage yield of the resources. An important observation of this study is reflected by this leakage yield comparison of the original and TAR based implementations. As plotted in figure 4.7, the leakage yield of the routing resources drops significantly in the presence of a purely timing yield driven approach. An average leakage yield drop of 20% in the routing resources motivates the next experiment, which includes leakage yield awareness in our algorithm and optimizes the routes for both leakage

and timing yield. Finally, the product of leakage and timing yield, which is used to evaluate the quality of our results, shows both improvement and degradation for different applications. On average the total yield shows an increase of 5%. Figure 4.9 shows a marginal reduction in the operating frequency of 4% using the TAR approach. This is because of sub-optimal timing decisions made by the algorithm while finding a route between different cells. To demonstrate the effectiveness of TAR we measured the timing yield of the system at the same frequency threshold. Figure 4.8 demonstrates the improvement in timing yield using TAR when measured at the same frequency (the higher of the two approaches for each application).

Figure 4.8. Timing yield improvement using TAR measured w.r.t. T_nom of TAR

As can be observed from the figure, we still obtain an average improvement

of close to 16% for the benchmarks.

Figure 4.9. Frequency degradation using TAR

Following the timing analysis, we tested our combined leakage and timing driven approach to analyze the average chip yield with respect to both the leakage and timing metrics. We assigned equal weights to the timing and leakage yield components used in equation 4.7. Figure 4.10 demonstrates the percentage improvement in the timing, leakage and total yield of the routing resources using our final algorithm. We observe an average improvement of 19% in the total chip yield considering the leakage and timing yields together. The individual timing and leakage yields improved in all the benchmarks, and on average improved by 11% and 9%, respectively. The achieved results and modeling techniques have

been presented in detail in [22].

Figure 4.10. Total chip yield improvement using SIRA

4.7 Impact of Process Variation on Aging

In section 2.3 we studied the modeling of the threshold voltage (V_th) shift in PMOS transistors due to NBTI. Such a model assumes an initial value for key device parameters such as threshold voltage, gate oxide thickness etc. However, the initial value of V_th suffers from variations introduced in such key device parameters during fabrication. In this section, we analyze the combined impact of aging and variations on the timing of a design mapped to an FPGA. We also augment SIRA to model the combined impact of NBTI and process variations to achieve yield
