Performance-Driven Dual-Rail Routing Architecture for Structured ASIC Design Style Fu-Wei Chen and Yi-Yu Liu, Member, IEEE


2046 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 29, NO. 12, DECEMBER 2010

Performance-Driven Dual-Rail Routing Architecture for Structured ASIC Design Style
Fu-Wei Chen and Yi-Yu Liu, Member, IEEE

Abstract

In recent years, structured application-specific integrated circuit (ASIC) design style has lessened the importance of mask cost. Multiple structured ASIC chip designs share the same pre-fabricated device and wire masks. Nevertheless, the interconnection delay in a pre-fabricated wire slows down circuit performance as a result of high capacitive load. We propose a dual-rail routing architecture that reduces wire delay by 10% to 15% compared to the original routing architecture. Furthermore, we propose a dual-rail insertion algorithm to reduce routing area overhead. The experimental results demonstrate that our dual-rail technique reduces wire delay by 9.8% with 4.8% routing area overhead and improves overall circuit performance by 7.0%.

Index Terms: Crossbar switch, crosstalk, dual-rail, routing, structured application-specific integrated circuit (ASIC).

I. Introduction

With the advances of very-large-scale integration (VLSI) manufacturing technology, lithographic difficulties have dramatically increased mask cost. The one-time-use mask cost is no longer affordable for small and medium volume application-specific integrated circuit (ASIC) designs. This results in a higher threshold for conventional standard cell design style. To reduce mask cost, structured ASIC is proposed to share multiple masks among different designs [1], [2].

In this paper, we are going to discuss interconnection issues of structured ASIC. For pre-fabricated metal layers, we propose a uniform dual-rail routing architecture as a wire tapering alternative to improve circuit performance. Furthermore, we propose a dual-rail insertion algorithm to reduce routing area overhead caused by extra wire segments.

The rest of this paper is organized as follows.
The background knowledge of structured ASIC and performance-driven routing architectures is given in Section II. Section III presents our SPICE simulations of different routing architectures. The dual-rail insertion technique and algorithm for structured ASIC design style are proposed in Section IV. Section V reports the experimental results. Section VI concludes this paper with future research directions.

Manuscript received December 3, 2008; revised November 30, 2009 and April 20, 2010; accepted July 1, 2010. Date of current version November 19, 2010. This work was supported in part by the National Science Council of Taiwan, under Grants NSC-96-2221-E-155-070 and NSC-97-2221-E-155-071-MY2. This paper was recommended by Associate Editor L. Scheffer.
F.-W. Chen is with the Department of Computer Science, National Tsing Hua University, Hsinchu 30013, Taiwan (e-mail: d9762812@oz.nthu.edu.tw).
Y.-Y. Liu is with the Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan (e-mail: yyliu@saturn.yzu.edu.tw).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCAD.2010.2063111
0278-0070/$26.00 © 2010 IEEE

II. Preliminary

A. Structured ASIC Design Style

Structured ASIC is proposed to balance the problems of mask cost and circuit performance between the field programmable gate array (FPGA) and standard cell. In contrast to the FPGA design style, the structured ASIC design style replaces both SRAMs and transmission gates with mask-programmable vias. Therefore, a structured ASIC trades field programmability for circuit performance. Several previous works utilize island-style pre-fabrication in their structured ASIC design [1]–[3]. The island-style pre-fabrication is one of the well-known chip-level pre-fabricated design styles [4], [5].
By properly configuring mask-programmable vias, various VLSI circuits can be implemented in island-style pre-fabrications. The structured ASIC design style is also known as via patterned gate array.

In order to understand resource utilization of pre-fabricated design styles, we conduct a profiling experiment. First, benchmark circuits from MCNC are optimized, mapped, and packed by using SIS [6], FlowMap [7], and T-Vpack [4], respectively. After that, we perform placement and routing by using VPR [4]. Table I shows the results. In Table I, columns CLB and Net are the numbers of CLBs and nets, respectively. Columns U_CLB and U_Channel are the utilizations of CLBs and routing channels, respectively. From Table I, we can see that the channel utilization is relatively low compared to the CLB utilization. Many unused routing tracks are wasted in a pre-fabricated design style.

B. Performance-Driven Routing Architectures

Conventionally, non-uniform wire tapering architecture is used to reduce interconnection delay. Since there are many unused tracks in a pre-fabricated design style, we propose a uniform dual-rail routing architecture for wire delay optimization. Our dual-rail merges two adjacent wires into one signal propagation channel. According to the anti-Miller effect, coupling capacitance between the merged wires can be greatly reduced [8]. Additionally, the wire resistance of this new routing channel is halved. Hence, wire delay can be reduced by using our dual-rail routing architecture. The schematics of tapering and dual-rail routing architectures are drawn in Fig. 1(a) and (b), respectively.

III. Simulations

A. Simulation Setup

The SPICE netlists of tapering and dual-rail routing architectures are drawn in Fig. 2(a) and (b), respectively. In Fig. 2, R, C_c, C_g, L, and M represent wire resistance, coupling capacitance, ground capacitance, self inductance, and mutual inductance, respectively.
The suffixes w and n indicate thick (wide) wire and thin (narrow) wire, respectively. For simplicity, we use victim and aggressor to indicate the middle wire and the upper/lower wire, respectively. Our simulations target a 65 nm process technology. The wire segment length is 25 µm for each distributed π-model. The source is a 16x inverter and the sink is a 2x inverter. All technology parameters are adopted from public documents, the Predictive Technology Model [9] and the International Technology Roadmap for Semiconductors 2007 [10]. Table II lists all parameters used in our simulations.
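
The entries of Table II can be cross-checked against one another. The short Python sketch below does so under assumptions that are inferred from the numbers rather than stated in the paper: cross-section area is width times thickness, resistance is inversely proportional to cross-section area, and total capacitance is twice the sum of ground and coupling capacitance. The dictionary keys and helper names are illustrative only.

```python
import math

# Per-segment values copied from Table II (thin = narrow wire, thick = wide wire).
thin = {"width": 0.1, "thickness": 0.2, "area": 0.02, "r_ohm": 27.499,
        "cg_ff": 0.617875, "cc_ff": 1.7081, "ctot_ff": 4.65195}
thick = {"width": 0.14, "thickness": 0.35, "area": 0.049, "r_ohm": 11.224,
         "cg_ff": 0.855525, "cc_ff": 1.834275, "ctot_ff": 5.3796}


def check_column(wire):
    """Return (area_ok, cap_ok) for one column of Table II."""
    area_ok = math.isclose(wire["width"] * wire["thickness"], wire["area"],
                           rel_tol=1e-9)
    # Assumption: total = 2 * (ground + coupling), i.e. one of each per side.
    cap_ok = math.isclose(2 * (wire["cg_ff"] + wire["cc_ff"]), wire["ctot_ff"],
                          rel_tol=1e-9)
    return area_ok, cap_ok


# Assumption: R is proportional to 1 / area, so R * area should agree
# between the two wire types up to rounding in the table.
resistance_ok = math.isclose(thin["r_ohm"] * thin["area"],
                             thick["r_ohm"] * thick["area"], rel_tol=1e-3)

# Two thin wires in parallel (a dual-rail pair) compared with one thick wire.
dual_rail_r = thin["r_ohm"] / 2
ratio = dual_rail_r / thick["r_ohm"]

print(check_column(thin), check_column(thick), resistance_ok, round(ratio, 3))
```

Under these assumptions the resistance of a dual-rail pair comes out larger than that of a single thick wire, which is the comparison the property simulations return to.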

TABLE I
Resource Utilization of Pre-Fabricated Design Style

Circuit    CLB    Net    U_CLB (%)   U_Channel (%)
apex4      1262   1271   97.8        65.6
bigkey     1707   1936   80.0        47.9
C6288      2416   2448   97.4        66.1
des        1591   1847   76.1        38.7
diffeq     1497   1561   93.6        67.8
dsip       1370   1599   76.3        36.8
elliptic   3604   3735   93.6        53.1
ex1010     4598   4608   99.6        64.4
ex5p       1064   1072   93.7        65.1
frisc      3556   3576   96.3        63.9
misex3     1397   1411   98.0        67.3
s298       1931   1935   99.5        76.0
seq        1750   1791   95.8        66.1
spla       3299   3315   98.2        58.5
tseng      1047   1099   85.7        61.0
average                  92.1        59.9

Fig. 1. Schematics of tapering and dual-rail routing architectures. (a) Tapering routing architecture. (b) Dual-rail routing architecture.

Fig. 2. SPICE netlists of the tapering and dual-rail routing architectures. (a) Tapering routing architecture. (b) Dual-rail routing architecture.

B. Property Simulations

We first observe basic properties of different routing architectures in terms of delay, noise, and power. The total wire length is set to 1000 µm. We incrementally insert one-tenth of the total wire length (i.e., 100 µm) of tapering wire and dual-rail wire from the source and then record the simulation results. Fig. 3(a) shows the worst case signal propagation delay. The worst case condition occurs when the victim signal switches in one direction and both aggressor signals switch in the opposite direction. The resulting crosstalk noise is drawn in Fig. 3(b). The crosstalk noise is the maximum voltage fluctuation of a stable victim when both aggressor signals switch in the same direction. Fig. 3(c) shows the power consumption under both the worst and the best propagation delay conditions.
TABLE II
Technology Parameters

Type                             Thin Wire   Thick Wire
Wire width (µm)                  0.1         0.14
Wire spacing (µm)                0.1         0.14
Wire thickness (µm)              0.2         0.35
Wire cross-section area (µm²)    0.02        0.049
Dielectric constant (k)          2.8         2.8
Wire resistance (Ω)              27.499      11.224
Self inductance (nH)             0.028       0.025
Mutual inductance (nH)           0.022       0.02
Ground capacitance (fF)          0.617875    0.855525
Coupling capacitance (fF)        1.7081      1.834275
Total capacitance (fF)           4.65195     5.3796

According to the simulation results, dual-rail routing architecture outperforms tapering and original routing architectures in delay and noise. For power consumption, dual-rail routing architecture is better than tapering routing architecture and similar to original routing architecture. Furthermore, we notice that the wire resistance of a dual-rail wire is even 22.5% greater than that of a tapering wire, since the cross-section area of a tapering wire is $\frac{0.049}{0.02 \times 2} = 1.225$ times larger than that of a dual-rail wire in our simulation. Hence, the aforementioned anti-Miller effect in dual-rail routing architecture more than compensates for the resistance penalty relative to the tapering routing architecture.

C. Scalability Simulations

We conduct simulations for five conditions with total wire lengths from 500 µm to 2500 µm. For each of the five conditions, we incrementally insert one-tenth of the total wire length of tapering wire and dual-rail wire from the source and then perform the worst case simulation. After the wire ratio with the best signal propagation delay is found, we record crosstalk noise and power consumption accordingly. Fig. 4(a) shows the best signal propagation delay for different total wire lengths in the worst case condition. Fig. 4(b) and (c) show the results for crosstalk noise and power consumption, respectively. The simulation results indicate that dual-rail routing architecture outperforms tapering routing architecture in terms of crosstalk noise and power consumption.
For total wire length below 2000 µm, both tapering and dual-rail routing architectures obtain similar signal propagation delay. Therefore, dual-rail is a promising routing architecture for structured ASIC design style with 10% to 15% wire delay improvement compared to the original routing architecture.

Fig. 3. Property simulation results in terms of (a) signal propagation delay, (b) crosstalk noise, and (c) power consumption.

Fig. 4. Scalability simulation results in terms of (a) signal propagation delay, (b) crosstalk noise, and (c) power consumption.

D. Resistance and Capacitance Characterizations

In order to efficiently apply the dual-rail technique in wire routing, we need accurate resistance and capacitance (RC) values for dual-rail routing architecture. The RC values can be used for fast timing analysis with the Elmore delay model. The dual-rail wire resistance is halved since there are two identical wire segments in parallel. For dual-rail wire capacitance, it is difficult to obtain a closed-form analytical formula. Hence, we perform effective capacitance fitting for both original and dual-rail wires. We first remove coupling capacitance in our SPICE netlists to prevent interference from the crosstalk effect. After that, we fine-tune the effective values of ground capacitance for both original wire and dual-rail wire in order to obtain similar worst case simulation results within 1 ps delay deviation. Finally, the capacitance factor is the ratio of the effective capacitance in dual-rail wire to that in original wire. Fig. 5 summarizes our fitting results with 25 µm unit wire length. The fitting results indicate that the effective capacitance of dual-rail wire is about 16.8% larger than that of the original wire.

Fig. 5. Effective wire capacitance characterization per 25 µm.

Given an n-segment transmission line, assume that the driver resistance, load capacitance, unit wire resistance, and unit wire capacitance are $R_d$, $C_L$, $r_w$, and $c_w$, respectively. According to the Elmore delay model, the signal propagation delay of the transmission line can be derived as

$$\mathrm{Delay} = R_d C_L + n R_d c_w + n r_w C_L + \frac{1}{2} n^2 r_w c_w. \quad (1)$$

In (1), the first term is a constant in both original and dual-rail routing architectures. The second term will be increased by 16.8% in dual-rail routing architecture. The last two terms will be decreased by 50% and $1 - 0.5 \times 1.168 = 41.6\%$ in dual-rail routing architecture, respectively. Therefore, our dual-rail routing architecture can be utilized for interconnection performance improvement unless the second term dominates the last two terms.

IV. Dual-Rail for Structured ASIC Design Style

A. Dual-Rail Insertion Technique

To enable dual-rail routing, we need to insert a dual-rail wire segment next to the original wire segment. After that, we need to modify the corresponding connection-blocks and switch-blocks to complete dual-rail routing.
For layout regularity, we take the crossbar switch-block as an example to illustrate dual-rail insertion for structured ASIC design style [1]. Fig. 6 draws the schematic of dual-rail insertion within a connection-block and a switch-block. By enabling dual-rail insertion capabilities within both connection-blocks and switch-blocks, we are capable of reducing wire delay at the cost of routing area overhead. Therefore, we need to effectively select a small set of wire segments for dual-rail insertion.

Fig. 6. Dual-rail insertion in connection-block and switch-block.
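
Because wire selection relies on the Elmore delay of (1), it can help to see how the four terms of (1) shift when a wire is converted to dual-rail. The following Python sketch is only an illustration: the driver resistance and load capacitance are hypothetical values chosen for the example, the Table II total capacitance of the thin wire is used as a stand-in for the fitted effective capacitance (which the paper reports only in Fig. 5), and the function names are not from the paper.

```python
def elmore_terms(r_d, c_l, r_w, c_w, n):
    """Return the four terms of (1) for an n-segment line, in seconds."""
    return (
        r_d * c_l,               # driver charging the load
        n * r_d * c_w,           # driver charging the wire capacitance
        n * r_w * c_l,           # wire resistance charging the load
        0.5 * n**2 * r_w * c_w,  # distributed wire RC
    )


def dual_rail(r_w, c_w, cap_factor=1.168):
    """Unit RC of a dual-rail wire: half the resistance and 16.8% more
    effective capacitance, as characterized in Section III-D."""
    return r_w / 2.0, c_w * cap_factor


# Hypothetical operating point; R_D and C_L are NOT taken from the paper.
R_D = 300.0        # ohms, assumed driver resistance
C_L = 2e-15        # farads, assumed load capacitance
R_W = 27.499       # ohms per 25 um segment (thin wire, Table II)
C_W = 4.65195e-15  # farads per segment, stand-in for the effective value
N = 40             # 40 segments of 25 um = 1000 um

orig = elmore_terms(R_D, C_L, R_W, C_W, N)
r_dual, c_dual = dual_rail(R_W, C_W)
dual = elmore_terms(R_D, C_L, r_dual, c_dual, N)

labels = ("Rd*CL", "n*Rd*cw", "n*rw*CL", "n^2*rw*cw/2")
for label, before, after in zip(labels, orig, dual):
    print(f"{label:>12}: {before * 1e12:8.2f} ps -> {after * 1e12:8.2f} ps")
print(f"{'total':>12}: {sum(orig) * 1e12:8.2f} ps -> {sum(dual) * 1e12:8.2f} ps")
```

The per-term ratios are independent of the assumed driver and load, while the total improvement is not. That dependence is why the gain disappears when the second term dominates.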

B. Dual-Rail Insertion Algorithm

The dual-rail insertion algorithm is performed after timing-driven routing by using original wires. Once the initial routing is completed, we use the Elmore delay model to compute wire delay. Then, we use Steiner points to decompose all nets into several 2-pin wire segments. After that, static timing analysis is performed to compute the criticality of each wire segment. Wire segments with small slack are candidates for dual-rail insertion.

We define some notations before illustrating the cost function of our dual-rail insertion algorithm. For simplicity, we set the required time of a circuit to the maximum arrival time (i.e., circuit delay). The wire slack is the required time minus the arrival time. The maximum slack is the maximum wire slack among all wires.

Definition 1: The wire criticality of wire i can be formulated as

$$\mathrm{Crit}(i) \equiv 1 - \frac{\text{wire slack}_i}{\text{maximum slack}}.$$

Definition 2: The delay significance of wire i can be formulated as

$$\mathrm{Sig}(i) \equiv \min\left(\frac{\text{delay of wire } i}{W \times \text{circuit delay}},\ 1\right)$$

where W is a user-specified constant to weight each wire according to the wire delay. In this paper, we set W = 15%.

Definition 3: The channel congestion of wire i in routing channel x can be formulated as

$$c_i^x \equiv \frac{\text{number of routed wires}}{\text{width of the channel}}.$$

Definition 4: The wire congestion of a wire i can be formulated as

$$\mathrm{Cong}(i) \equiv \max(c_i^x), \quad \forall \text{ routing channel } x \text{ of wire } i.$$

Fig. 7. Dual-rail insertion algorithm.

According to the simulation results in Section III, the dual-rail technique achieves 10% to 15% wire delay reduction. Hence, we select wires with slack less than or equal to 15% of circuit delay as candidate wires for dual-rail insertion. After that, to reduce routing area overhead, we use the following heuristic to decide whether a dual-rail wire should be inserted:

$$\mathrm{Crit}(i) \times \mathrm{Sig}(i) \geq \mathrm{Cong}(i) - k \quad (2)$$

where k is a user-specified constant.
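
Definitions 1 to 4 and the test in (2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' VPR implementation: the `Wire` record and all function names are hypothetical, and condition (2) is read here as Crit(i) × Sig(i) ≥ Cong(i) − k, a reading inferred from the surrounding description of the product and of "congestion minus k" rather than taken from a verified copy of the equation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Wire:
    """Hypothetical record for one 2-pin wire segment."""
    slack: float                     # required time minus arrival time
    delay: float                     # Elmore delay of the segment
    channels: List[Tuple[int, int]]  # (routed wires, channel width) per channel


def crit(wire, max_slack):
    # Definition 1. If the maximum slack is zero, every wire is critical.
    return 1.0 if max_slack == 0 else 1.0 - wire.slack / max_slack


def sig(wire, circuit_delay, w=0.15):
    # Definition 2 with W = 15% as in the paper.
    return min(wire.delay / (w * circuit_delay), 1.0)


def cong(wire):
    # Definitions 3 and 4: worst channel utilization along the wire.
    return max(routed / width for routed, width in wire.channels)


def is_candidate(wire, circuit_delay, threshold=0.15):
    # Candidate wires have slack of at most 15% of the circuit delay.
    return wire.slack <= threshold * circuit_delay


def should_insert(wire, max_slack, circuit_delay, k):
    # Condition (2), under the assumed reading Crit * Sig >= Cong - k.
    gain = crit(wire, max_slack) * sig(wire, circuit_delay)
    return is_candidate(wire, circuit_delay) and gain >= cong(wire) - k


# Made-up numbers in arbitrary time units, for illustration only.
CIRCUIT_DELAY, MAX_SLACK, K = 10.0, 8.0, 0.15
critical_long = Wire(slack=0.0, delay=2.0, channels=[(30, 37), (35, 37)])
critical_short = Wire(slack=1.0, delay=0.3, channels=[(35, 37)])
relaxed = Wire(slack=6.0, delay=2.0, channels=[(20, 37)])

for name, wire in (("critical_long", critical_long),
                   ("critical_short", critical_short),
                   ("relaxed", relaxed)):
    print(name, should_insert(wire, MAX_SLACK, CIRCUIT_DELAY, K))
```

In this toy setup only the long wire on the critical path passes. The short near-critical wire is rejected because its delay significance is too small to justify the track it would consume in a congested channel, and the relaxed wire is not even a candidate.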
The dual-rail wire is inserted if condition (2) is satisfied. In (2), the product of wire criticality and delay significance implies the potential gain from dual-rail insertion. The wire congestion minus k implies the tolerance of routing area overhead. Hence, a dual-rail wire can be inserted only if both the wire delay is large and the wire is on a critical or near-critical path under a given congestion constraint.

Finally, a safe removal procedure is performed to reduce routing area overhead without increasing circuit delay. We iteratively remove dual-rail wire segments whose slack is larger than the delay improvement obtained from dual-rail insertion. The algorithm of dual-rail insertion is summarized in Fig. 7.

Fig. 8 draws an example to illustrate our dual-rail insertion algorithm. A netlist with wire criticality, delay significance, and wire congestion is drawn in Fig. 8(a). Assume k = 0.1; edges (A1, A25), (B1, G27), (G27, A25), and (B25, PO2) satisfy (2). Hence, we insert dual-rail wire segments on these edges, as shown in Fig. 8(b), in order to reduce wire delay without incurring too much routing area overhead.

Fig. 8. Example of dual-rail insertion. (a) A netlist example. (b) Dual-rail insertion result.

V. Experimental Results

We use an FPGA physical design automation environment, VPR [4], as our simulation platform. Both the crossbar switch-block and the dual-rail insertion algorithm proposed in Section IV are implemented in VPR and denoted as VPDR. In our experiments, all benchmark circuits are optimized by SIS using the rugged script [6]. Then, we use FlowMap [7] to perform technology mapping on the optimized circuits. All the circuits are mapped into 4-input LUTs and flip-flops. After that, we use T-Vpack to pack 4-LUTs and flip-flops into CLBs. Finally, we perform placement and dual-rail routing with our VPDR.

TABLE III
Results of Dual-Rail Insertion

          |              Original                 |               VPR+D                |               VPDR
Circuit   | w_h  w_v  d_logic   d_wire    d_total | Ow_h     Ow_v    Id_wire  Id_total | Ow_h    Ow_v    Id_wire  Id_total
apex4     | 37   37   3.28E-09  1.03E-08  1.36E-08 | 29.73%   10.81%  13.45%   10.21%   | 2.70%   2.70%   1.49%    1.13%
bigkey    | 37   37   1.94E-09  1.46E-08  1.66E-08 | 100.00%  32.43%  13.12%   11.59%   | 8.11%   5.41%   4.08%    3.61%
C6288     | 34   34   6.72E-08  1.42E-08  8.14E-08 | 23.53%   5.88%   20.44%   3.58%    | 2.94%   2.94%   2.34%    0.41%
des       | 30   30   2.73E-09  2.03E-08  2.30E-08 | 76.67%   26.67%  25.33%   22.32%   | 3.33%   3.33%   22.32%   19.67%
diffeq    | 37   37   7.88E-09  4.95E-09  1.28E-08 | 8.11%    5.41%   9.75%    3.77%    | 2.70%   2.70%   10.17%   3.93%
dsip      | 26   26   1.94E-09  1.24E-08  1.43E-08 | 92.31%   30.77%  12.75%   11.02%   | 3.85%   3.85%   11.47%   9.92%
elliptic  | 31   31   1.94E-09  1.81E-08  2.01E-08 | 80.65%   22.58%  8.85%    8.00%    | 9.68%   6.45%   17.27%   15.61%
ex1010    | 46   46   8.19E-09  5.78E-09  1.40E-08 | 10.87%   10.87%  8.42%    3.49%    | 4.35%   2.17%   8.04%    3.33%
ex5p      | 31   31   3.82E-09  1.11E-08  1.49E-08 | 22.58%   16.13%  19.13%   10.56%   | 3.23%   3.23%   9.34%    3.28%
frisc     | 33   33   6.24E-09  2.06E-08  2.68E-08 | 90.91%   45.45%  15.19%   11.65%   | 6.06%   6.06%   14.66%   11.25%
misex3    | 29   29   3.28E-09  3.28E-08  3.61E-08 | 44.83%   37.93%  19.71%   17.92%   | 3.45%   3.45%   19.70%   17.91%
s298      | 29   29   8.49E-09  1.86E-08  2.71E-08 | 100.00%  37.93%  8.77%    4.01%    | 6.90%   3.45%   6.02%    2.12%
seq       | 32   32   3.28E-09  1.82E-08  2.15E-08 | 34.38%   21.88%  16.85%   14.28%   | 3.13%   6.25%   3.29%    2.79%
spla      | 34   34   8.19E-09  1.83E-08  2.65E-08 | 14.71%   17.65%  11.77%   10.19%   | 5.88%   5.88%   14.35%   9.91%
tseng     | 17   17   7.88E-09  2.19E-09  1.01E-08 | 76.47%   29.41%  2.86%    0.62%    | 5.88%   5.88%   1.97%    0.43%
Average   |                                        | 53.72%   23.45%  13.76%   9.55%    | 4.81%   4.25%   9.77%    7.02%

In Table III, we compare performance improvement and area overhead of different approaches. Column Original represents the placement and routing results from VPR. We use the results of Original as a baseline.
Column VPR+D represents the results of dual-rail insertion for all wires with slack less than or equal to 15% of circuit delay, that is, for all candidate wires described in Fig. 7. Column VPDR represents the results of our proposed algorithm, where we use k = 0.15 in our experiment. Columns w_h and w_v are channel widths in horizontal and vertical channels, respectively. Columns d_logic, d_wire, and d_total are logic delay, wire delay, and total delay, respectively. Columns Ow_h, Ow_v, Id_wire, and Id_total are the horizontal routing area overhead, vertical routing area overhead, wire delay improvement, and circuit delay improvement compared to the baseline, respectively. From Table III, dual-rail routing architecture reduces wire delay by 9.8% with 4.8% routing area overhead and improves circuit performance by 7.0%.

In Table IV, we report the number of dual-rail 2-pin wires. Columns 2-pin, Cand, Dual, and Removal represent the numbers of all 2-pin wires, candidate wires, inserted dual-rail wires, and dual-rail wires after safe removal, respectively. Columns R_c, R_d, and R_r are the ratios of candidate wires, inserted dual-rail wires, and dual-rail wires after safe removal to the total number of 2-pin wires. From Table IV, we can see that our proposed algorithm effectively selects a small portion of wire segments for dual-rail insertion.

VI. Conclusion and Future Work

We have proposed a dual-rail routing architecture to improve circuit performance of structured ASIC design style. By exploiting the anti-Miller effect between two adjacent wires with the same signal source, this dual-rail routing architecture achieves 10% to 15% wire delay reduction. Additionally, we present a technique to insert dual-rail wires for structured ASIC design style. Furthermore, we propose an effective dual-rail insertion algorithm to reduce routing area overhead.
The experimental results demonstrate that our proposed algorithm improves overall circuit performance by 7.0% with about 4.8% routing area overhead.

There are some practical issues that can be further addressed. The concept of dual-rail insertion can be further integrated into initial global routing in order to reserve enough routing tracks for important and critical wires. Moreover, it is worthwhile to use dual-rail as a new routing alternative in different metal layers for ASIC routing resource management.

TABLE IV
Number of Dual-Rail 2-Pin Wires

Circuit    2-pin   Cand   R_c      Dual   R_d     Removal   R_r
apex4      5757    165    2.87%    55     0.96%   44        0.76%
bigkey     8473    822    9.70%    467    5.51%   459       5.42%
C6288      7280    2878   39.53%   45     0.62%   41        0.56%
des        7959    932    11.71%   172    2.16%   157       1.97%
diffeq     7234    53     0.73%    30     0.41%   28        0.39%
dsip       7468    122    1.63%    87     1.16%   83        1.11%
elliptic   17491   269    1.54%    193    1.10%   176       1.01%
ex1010     20686   197    0.95%    129    0.62%   127       0.61%
ex5p       5074    135    2.66%    75     1.48%   73        1.44%
frisc      17234   854    4.96%    381    2.21%   380       2.20%
misex3     6397    1187   18.56%   36     0.56%   33        0.52%
s298       8894    946    10.64%   112    1.26%   104       1.17%
seq        7984    481    6.02%    92     1.15%   89        1.11%
spla       11475   447    3.90%    67     0.58%   56        0.49%
tseng      5244    227    4.33%    20     0.38%   19        0.36%
Average                   7.98%           1.35%             1.28%

References

[1] C. Patel, A. Cozzie, H. Schmit, and L. Pileggi, "An architectural exploration of via patterned gate arrays," in Proc. Int. Symp. Phys. Des., Apr. 2003, pp. 184–189.
[2] Y. Ran and M. Marek-Sadowska, "Via-configurable routing architectures and fast design mappability estimation for regular fabrics," in Proc. Int. Conf. Comput.-Aided Des., May 2005, pp. 25–32.
[3] T. Zhang and S. S. Sapatnekar, "Buffering global interconnects in structured ASIC design," in Proc. Conf. Asia South Pacific Des. Autom., Jan. 2005, pp. 23–26.
[4] V. Betz, J. Rose, and A. Marquardt, Architecture and CAD for Deep-Submicron FPGAs. Boston, MA: Kluwer, 1999.
[5] D. Chen, J. Cong, and P. Pan, FPGA Design Automation. Delft, The Netherlands: Now Publishers, 2006.
[6] E. M. Sentovich, K. J. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj, P. R. Stephan, R. K. Brayton, and A. Sangiovanni-Vincentelli, "SIS: A system for sequential circuit synthesis," Electron. Res. Lab., Univ. California, Berkeley, Mem. UCB/ERL M92/41, May 4, 1992.
[7] J. Cong and Y. Ding, "Flowmap: An optimal technology mapping algorithm for delay optimization in lookup-table based FPGA designs," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 13, no. 1, pp. 1–12, Jan. 1994.
[8] A. B. Kahng, S. Muddu, and E. Sarto, "On switch factor based analysis of coupled RC interconnects," in Proc. Des. Autom. Conf., Jun. 2000, pp. 79–84.

[9] The Predictive Technology Model. (2007) [Online]. Available: http://www.eas.asu.edu/~ptm
[10] The International Technology Roadmap for Semiconductors. (2007) [Online]. Available: http://www.itrs.net/links/2007itrs/home2007.htm

DiSC: A New Diagnosis Method for Multiple Scan Chain Failures
Sunghoon Chun and Alex Orailoglu

Abstract

In scan-based testing environments, identifying the scan chain failures can be of significant help in guiding the failure analysis process for yield improvement. In this paper, we propose an efficient scan chain diagnosis method using a symbolic fault simulation to achieve high diagnostic resolution and a small candidate list for single and multiple defects in scan chains. The main ideas of the proposed scan chain diagnosis method are twofold: 1) the reduction of the candidate scan cells through the analysis of the symbolic simulation responses, and 2) the identification of final candidate scan cells using the backward tracing method with the symbolic simulation responses. Experimental results show the effectiveness of the proposed method.

Index Terms: Diagnosis, fault simulation, scan-based test.

I. Introduction

Functional scan chains are critical to the testing and diagnosis of the logic portions of the circuit. Hence, it is necessary to identify the fault locations in faulty scan chains. Scan chain diagnosis is the first step in the fault isolation process. Increasingly, more attention is being paid to it as it helps to accelerate product yield improvement and to guarantee scan-based logic diagnosis.

Previous scan chain fault diagnosis methodologies are classified into two categories: 1) hardware-based scan chain diagnosis [1], [2], and 2) software-based scan chain diagnosis [3]–[10].
Hardware-based scan chain diagnosis [1], [2] may not be acceptable in practice because of the extra hardware overhead it requires, the performance penalties, and the possibility of faults in the additional circuits themselves. Software-based scan chain diagnosis techniques [3]–[10] are algorithmic approaches used to identify failing scan cells. Since these techniques do not require any modification of the original scan chain design, they are more popular than the hardware-based approaches. However, previous software-based scan chain diagnosis methods [3]–[10] suffer from large candidate lists and low diagnostic resolution (DR). Some methods [3]–[6] cannot handle the diagnosis problem if multiple faults exist in the same scan chain. Although several research efforts in the literature deal with this problem [7], [8], they are time consuming and cannot achieve high DR for multiple-fault scan cells.

Manuscript received September 28, 2009; revised January 29, 2010, April 18, 2010, and June 28, 2010; accepted July 1, 2010. Date of current version November 19, 2010. This work was supported in part by the Korea Research Foundation, under Grant KRF-2008-357-D00194, funded by the Korean Government. This paper was recommended by Associate Editor F. Lombardi.
S. Chun is with the Flash Solution Development Team, Samsung Electronics, Hwasung City, Gyeonggi-Do 445-701, Korea (e-mail: sh77.chun@samsung.com).
A. Orailoglu is with the Department of Computer Science Engineering, University of California at San Diego, La Jolla, CA 534-2230 USA (e-mail: alex@cs.ucsd.edu).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCAD.2010.2061110
0278-0070/$26.00 © 2010 IEEE
In this paper, we propose a new scan cell fault simulation-based scan chain diagnosis method called diagnosis for scan chain failures (DiSC) to address the scan chain DR problem for single as well as multiple faults. The main idea of DiSC is to propagate a symbol representing the fault effect of the faulty candidate scan cell through a symbolic scan cell fault simulation, and to identify the exact defect candidates by analyzing the information of the symbolic fault simulation. The major contributions of this paper are the following: 1) the reduction of the candidate scan cells through the analysis of the symbolic simulation responses, and 2) the identification of the final candidate scan cells using the backward tracing method with the symbolic simulation responses.

II. Preliminaries

A. Definitions

We provide the following definitions to simplify the explanation of the proposed diagnosis method.

Definition 5 (Simulation Response): The simulation response of a scan chain is the value combination of the flip-flops after a logic simulation using a given pattern.

Definition 6 (Observed Response): The observed response of a scan chain is the scanned-out version of a simulation response. In a fault-free circuit, it is equivalent to the simulation response.

Definition 7 (Faillog Response): The faillog response of a scan chain is the unloaded response of a faulty scan chain in a chip.

Definition 8 (Capture Image): The capture image of a scan chain is the value combination of the flip-flops after a capture procedure in the circuit under test (CUT). However, the capture image of a scan chain in a failing chip is not actually available because we do not know the real defect location in the scan chain.
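The relationship among Definitions 5 to 7 can be illustrated with a small sketch. It is not the DiSC algorithm; it assumes a single stuck-at fault in one scan cell, assumes the capture image equals the simulation response, and models only the unload (shift-out) phase. The function name `unload`, the cell ordering (index 0 nearest scan-in), and the example values are hypothetical choices made for illustration.

```python
from typing import List, Optional


def unload(capture_image: List[int],
           stuck_cell: Optional[int] = None,
           stuck_value: int = 0) -> List[int]:
    """Shift a scan chain out and return the response, indexed by source cell.

    Assumed model: index 0 is nearest scan-in, the last index is nearest
    scan-out. A value held in cell j passes through cells j+1 .. n-1 on its
    way out, so a cell stuck at `stuck_value` corrupts its own value and
    every value that starts upstream of it.
    """
    response = []
    for cell, value in enumerate(capture_image):
        if stuck_cell is not None and cell <= stuck_cell:
            response.append(stuck_value)  # overwritten while passing the fault
        else:
            response.append(value)        # downstream of the fault: unaffected
    return response


# Simulation response: flip-flop values after logic simulation (Definition 5).
simulation_response = [1, 0, 1, 1, 0, 1]

# Observed response of a fault-free chain equals the simulation response
# (Definition 6).
observed_response = unload(simulation_response)

# Faillog response: unloaded response of a chain with cell 2 stuck-at-0
# (Definition 7), under the assumptions stated above.
faillog_response = unload(simulation_response, stuck_cell=2, stuck_value=0)

# Positions where the faillog response disagrees with the simulation response.
mismatches = [i for i, (s, f) in enumerate(zip(simulation_response,
                                               faillog_response)) if s != f]
```

In this toy model the mismatches can only appear at or upstream of the faulty cell, which is why comparing the faillog response against the simulation response narrows the candidate list. In a real failing chip the load phase is also corrupted, so the capture image itself is unknown, as Definition 8 notes.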
Definition 9 (Scan Cell Fault (SF) State Value): To consider the effect of faults and to distinguish the location of the faulty scan cell, we define a new symbolic logic state value SF_i, which denotes the location of a fault candidate scan cell, i.e., the ith scan cell in the faulty scan chain.

Definition 10 (SF Response): The SF response of a scan chain is the value combination of the simulated outputs after the proposed symbolic simulation.

Definition 11 (Propagated Fault Effects): The effects of the scan cell faults can be propagated to the faulty scan chain as well as to the good scan chains. The locations of these effects are defined as the propagated fault effects (PFEs).

Definition 12 (Single Propagated SF): In an SF response, if only one SF_i value, rather than a combination of two or more SF values, is propagated to a good scan chain, its location is defined as the location of a single propagated SF value (SPSF); the state value of the SPSF is called the SPSF value.

Definition 13 (Multiple Propagated SF Value): In an SF response, a location of a multiple propagated SF value (MPSF)