A Novel On-chip Measurement Hardware for Efficient Speed-Binning

A. Raychowdhury, S. Ghosh, and K. Roy
Department of ECE, Purdue University, IN
{araycho, ghosh3, kaushik}@ecn.purdue.edu

Abstract

With the aggressive scaling of CMOS technology, parametric variation of the transistor threshold voltage causes significant spread in the circuit delay as well as in the leakage spectrum. Consequently, speed binning of high performance VLSI chips is essential, and it costs a significant amount of test application time. Further, knowledge of the actual delay of the critical path of the circuit enables efficient use of typical low power methodologies, e.g., voltage scaling and adaptive body biasing. In this paper, we propose a novel on-chip, low overhead and process tolerant delay measurement circuit which can estimate the critical path delay in a single clock period. This has the advantage of efficient on-chip speed binning.

Keywords: Speed binning, delay measurement hardware, process variation.

I. Introduction

Systematic as well as random variations in the process parameters pose serious challenges to future high performance microprocessor design. Variations in length, oxide thickness and random dopant effects in nano-scaled transistors result in significant fluctuations in the transistor threshold voltage (V_T). These fluctuations can occur from one die to another (inter-die) as well as within a die (intra-die). Consequently, the spread in delay is considerable and a 30% frequency distribution is typically estimated [1]. This variation in frequency has introduced the concept of frequency binning. On one hand, some of the chips are faster (low V_T) than the nominal and they tend to support higher clock frequencies at a system level. These chips add significantly to the profit margin. On the other hand, some of the high-V_T chips are slower than the nominal but they can be used at lower clock speeds. Thus it is essential to perform speed binning efficiently, not only to earn extra profit from the higher performance chips but also to salvage the slower but non-faulty chips in a possible go/no-go situation.
Speed binning has thus emerged as an indispensable part of delay fault testing. Traditionally, speed binning is performed by increasing the clock frequency of the critical path section (or its replica) of the circuit till it fails. This process is expensive in terms of test application time and design complexity of the test hardware. In this paper, we propose a novel circuit to efficiently speed bin a high performance processor in a single clock cycle by directly measuring the delay of its critical path. We have designed a low-overhead, process tolerant delay measurement hardware (DMH) that can detect the bin that a particular chip belongs to. Conventionally, speed binning as well as adaptive techniques (e.g., body biasing, dynamic voltage scaling) are performed on critical path replicas [2]. In our methodology, we have used a similar technique. Consequently, the insertion of the DMH does not load the critical path of the circuit. The replica circuit tracks inter-die variations efficiently, and simulation results show that even under high intra-die variations the technique can correctly bin the circuit with more than 96% confidence. The output of the DMH is a digital word that represents the bin that the circuit belongs to. The novelty lies in the fact that the DMH can detect the speed bin in a single clock cycle, thereby saving valuable test application time.

The organization of this paper is as follows. Section II describes the operation of the DMH for efficient speed binning. The individual blocks of the DMH are described in Section III. In Section IV, we demonstrate the need for speed binning due to process variation. The experimental setup for performing speed binning with the proposed DMH is explained in Section V. The effect of parametric variation on speed binning is studied in Section VI. Finally, conclusions are drawn in Section VII.

II. Methodology

Before going into the detailed description of the delay measurement hardware, it is worthwhile to state the principle of operation of the DMH.
Let us assume that the critical path is a combinational logic block with a state input coming from flip-flop FF1 and the output going to FF2, as shown in Fig. 1. First, we replicate the critical path of the circuit and, instead of a flip-flop, we place the DMH at its end. Secondly, the replicated critical path is sensitized using test patterns applied by the Built-in Self-Test (BIST) logic. The BIST is clocked by the system clock and the test pattern is launched at T1. Let us assume that node X (the output of the critical path) makes a falling transition from logic one to logic zero, and that T_D is the time interval between the clock edge T1 and the time when the voltage at node X makes the falling transition (Fig. 2).

Proceedings of the 11th IEEE International On-Line Testing Symposium (IOLTS'05)

We propose the system illustrated in Fig. 1 for estimating the delay T_D. A sawtooth generator is designed so that the sawtooth waveform is extracted from the reference clock itself and has a pulse duration equal to the time period (T) of the reference clock. The output of the sawtooth generator goes into a track-and-hold circuit (TAH), and the sampling switch of the TAH is controlled by the observation node (X). As long as node X is high, the TAH switch is on and the output of the TAH tracks the sawtooth waveform. When X makes a falling transition, it turns the TAH switch off and the output capacitor of the TAH holds its value (say, V_TAH). The greater the delay T_D, the lower V_TAH is. The value of V_TAH can therefore be used to estimate the speed bin of the circuit.

[Fig. 1: Speed binning architecture using the proposed DMH.]

[Fig. 2: Timing diagram of the DMH.]

The TAH drives a flash analog-to-digital converter (ADC). The flash ADC consists of three comparators C1, C2 and C3. The outputs of the comparators, i.e., bin1, bin2 and bin3, indicate the speed bin of the circuit. As evident from Fig. 3, we have divided the speed into four bins, namely fast, medium, slow and slowest. The chips belonging to the slowest bin are discarded. Table 1 shows the outputs of the flash ADC corresponding to each speed bin.

Table 1: Flash ADC outputs and corresponding speed bins

        Fast   Medium   Slow   Slowest
bin1     1       0       0       0
bin2     1       1       0       0
bin3     1       1       1       0

[Fig. 3: Speed bins and corresponding T_max.]

The reference voltages (V_ref^slow, V_ref^medium and V_ref^fast) are the three inputs to the flash ADC. Considering a linearly falling sawtooth waveform, the reference voltages can be estimated as:

$$V_{ref}^{i} = V_{DD}\left(1 - \frac{T_i}{T}\right) + V_{OL}\,\frac{T_i}{T} \qquad (1)$$

where T is the clock period and i represents the bin, i.e., slow, medium or fast. Here, the sawtooth waveform is assumed to fall from V_DD to V_OL, and T_i represents the maximum allowable delay of the ith bin.
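Equation (1) is a linear interpolation along the falling sawtooth, so the comparator references follow directly from the bin boundaries. The Python sketch below is illustrative only: the function name and the supply, clock and boundary values are assumptions chosen for the example, not values from the paper. The lookup table mirrors Table 1.

```python
def reference_voltage(t_boundary, t_clock, v_dd, v_ol):
    """Eq. (1): sawtooth level at the bin boundary t_boundary (0 <= t_boundary <= t_clock)."""
    if not 0.0 <= t_boundary <= t_clock:
        raise ValueError("bin boundary must lie within one clock period")
    frac = t_boundary / t_clock
    return v_dd * (1.0 - frac) + v_ol * frac


# Table 1 read column-wise: (bin1, bin2, bin3) -> speed bin.
BIN_BY_CODE = {
    (1, 1, 1): "fast",
    (0, 1, 1): "medium",
    (0, 0, 1): "slow",
    (0, 0, 0): "slowest",
}

# Hypothetical operating point (assumed for illustration only).
V_DD, V_OL, T_CLOCK = 1.0, 0.2, 1.6e-9
BOUNDARIES = {"fast": 0.75e-9, "medium": 1.1e-9, "slow": 1.3e-9}

V_REF = {name: reference_voltage(t, T_CLOCK, V_DD, V_OL)
         for name, t in BOUNDARIES.items()}
```

A later boundary gives a lower reference, so with these assumed numbers V_REF["fast"] > V_REF["medium"] > V_REF["slow"], which matches the statement that a larger delay yields a lower held voltage.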
Thus T_i also represents the lower delay threshold of the next slower bin. For example, T_medium represents the maximum delay that a circuit may have in order to be placed in the medium frequency bin. Thus, it represents the boundary between the medium and the slow bins. Mathematically:

$$T_i = \max(\text{delay of bin } i) \qquad (2)$$

The concept of T_max determination for a particular bin is illustrated in Fig. 3, where the boundaries between different bins are shown with bold lines. The ADC evaluates at the next clock cycle, when the Comparator-Enable (CE) signal goes high (Fig. 2). For the critical path replica to belong to the ith bin, we require:

$$V_{TAH} \geq V_{ref}^{i} \qquad (3)$$

When (3) is true, the comparator output goes HIGH (logical 1). Here, we have discussed the case when X makes a falling transition. Since we perform the speed binning in the test cycle, we can decide a priori which input vector excites a high-to-low transition at node X and apply it accordingly. Also, note that the proposed
speed binning methodology can be easily extended to N bins.

[Fig. 4: Schematic diagrams: (a) sawtooth generator; (b) track-and-hold (TAH) circuit.]

[Fig. 5: Schematic diagrams of (a) the comparator; (b) the reference voltage generator.]

III. Design of the individual DMH blocks

Sawtooth generator: The sawtooth generator is based on the principle of constant current discharge. The schematic diagram of the sawtooth generator is shown in Fig. 4a. A T flip-flop is used to generate a clock (_2T) with a period equal to twice the period of the reference system clock. Consider that the node OUT in Fig. 4a is precharged to V_DD when _2T is low. When _2T goes high (_2TZ goes low), the constant supply voltage (V_DD) provides a constant current through the NMOS M1. This current discharges the capacitor C linearly as long as M1 is in saturation. During this phase the PMOS M2 remains off and the output node shows a linearly falling waveform. At the end of the clock period, the _2TZ signal goes high. This creates a low resistance path across the capacitor through M2 and thus helps to charge OUT back to V_DD. The gate voltage V_bias of M1 provides the required current in the saturation region. The discharge is linear (ignoring the Early effect) as long as M1 is in the saturation region. Hence we require V_ds >= V_bias - V_t. To ensure this, the output node is allowed to discharge only down to V_OL = V_bias - V_t (chosen to be 18mV in this case) in a single clock period.

Track-and-Hold network: The track-and-hold network for the circuit is a complementary pass transistor switch with a capacitive load (Fig. 4b). The voltage at the observation point X is the input signal, S, to the TAH. As long as S is high it charges the capacitor C_HOLD. The value of the capacitor in our design is 1fF. To discharge the capacitor before the next delay measurement, an NMOS switch triggered by a RESET signal is used in parallel with C_HOLD.
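The constant-current discharge and the hold action described above can be modeled behaviorally in a few lines. This is a minimal Python sketch under stated assumptions: the function names, the clamp at V_OL and all numeric values are hypothetical and serve only to illustrate the relation I = C (V_DD - V_OL) / T and the fact that the held voltage falls as T_D grows.

```python
def required_discharge_current(c_out, v_dd, v_ol, t_clock):
    """Constant current that ramps OUT from V_DD down to V_OL in exactly one clock period."""
    return c_out * (v_dd - v_ol) / t_clock


def sawtooth_voltage(t, i_discharge, c_out, v_dd, v_ol):
    """Linear ramp V(t) = V_DD - (I / C) * t while M1 is assumed to stay in saturation."""
    v = v_dd - (i_discharge / c_out) * t
    # Modeling assumption: below V_OL = V_bias - V_t the linear model no longer
    # holds, so the ramp is simply clamped at V_OL here.
    return max(v, v_ol)


def held_voltage(t_d, i_discharge, c_out, v_dd, v_ol):
    """V_TAH: the sawtooth value frozen on the hold capacitor when X falls at t = T_D."""
    return sawtooth_voltage(t_d, i_discharge, c_out, v_dd, v_ol)


# Hypothetical values, assumed for illustration only.
C_OUT, V_DD, V_OL, T_CLOCK = 1e-12, 1.0, 0.2, 1e-9
I_DIS = required_discharge_current(C_OUT, V_DD, V_OL, T_CLOCK)
V_TAH_EARLY = held_voltage(0.25e-9, I_DIS, C_OUT, V_DD, V_OL)
V_TAH_LATE = held_voltage(0.75e-9, I_DIS, C_OUT, V_DD, V_OL)
```

With these assumed numbers the later transition holds the lower voltage (V_TAH_LATE < V_TAH_EARLY), which is the property the flash ADC relies on.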
The RESET pulse is generated after the comparison between V_ref and V_TAH has been made. It is also worth mentioning that the sampling switch has been made a complementary one to avoid charge injection and clock feed-through [4].

Flash ADC: The flash ADC is the fastest ADC architecture; it compares the input to a set of reference voltages in parallel. In the proposed design, we have used a 3-bit ADC. The ADC comprises three comparators. The comparator used here is a latch-based sense amplifier, as illustrated in Fig. 5a. The value of the signal at node TAH_OUT goes into the comparator input. After the track-and-hold phase, CE goes high and the output of the comparator is noted in the next clock cycle. The three
comparator outputs form the ADC output word, which represents the speed bin of the circuit as given in Table 1. The reference voltages of the flash ADC are calculated by using (1) and assuming

$$T_{fast} = N_1 D_m; \quad T_{medium} = N_2 D_m; \quad T_{slow} = N_3 D_m; \quad T = N_4 D_m \qquad (4)$$

where D_m is the mean delay of the critical path of the circuit and the N_i determine the timing boundaries between bins. In our simulations, we have chosen reasonable values of N_i, namely N_1 = 0.75, N_2 = 1.1, N_3 = 1.3 and N_4 = 1.6. In other words, N_1 = 0.75 means that all chips whose maximum critical path delays are less than 75% of the nominal design are called fast chips. Similarly, all chips whose maximum delay is between 75% (N_1) and 110% (N_2) of the nominal delay are called medium frequency chips. Finally, if a chip has a delay of more than 1.3 times the nominal delay (N_3 D_m), we discard the chip as a faulty one. Note that in the test phase the test clock has a time period which is 60% more than the nominal clock period. This ensures that chips that are non-faulty but slower than the nominal design are properly binned and can be used.

Generation of reference voltages: In the proposed DMH, the stable voltage references (V_ref) have been designed based on a band-gap reference voltage circuit. This is illustrated in Fig. 5b. The principle of operation of the sub-1V bandgap reference circuit is described in [5]. It has been suitably modified and used in the proposed DMH. The op-amp equalizes the voltage between nodes A and B. Hence all the PMOS transistors at the top have the same V_gs and mirror the same current:

$$I_1 = I_2 = I_3 = I_{ref1} = I_{ref2} = \dots \qquad (5)$$

Let V_d be the voltage across the diode D_1. It has a negative temperature coefficient (NTC). dV is the voltage difference between diode D_1 and diode D_2 (of area N times that of D_1), and hence dV = V_T ln(N), where V_T is the thermal voltage, which has a positive temperature coefficient (PTC).
The total current I_2 is:

$$I_2 = \frac{V_B}{R} + \frac{dV}{R_1} = \frac{V_d}{R} + \frac{dV}{R_1} \qquad (6)$$

Due to the current mirror, the same current is pumped into each reference voltage generator arm. The output reference voltage of the ith arm is thus:

$$V_{ref}(i) = R_{ref}(i)\left(\frac{V_d}{R} + \frac{V_T \ln(N)}{R_1}\right) \qquad (7)$$

N is chosen such that the net temperature coefficient is zero. Note that any voltage can be generated by changing the value of R_ref. Several different arms are shown in the circuit of Fig. 5b. Further, the output voltage does not depend on transistor parameters. Even under die-to-die parameter variations, the reference generator delivers a stable reference voltage. A bandgap reference usually forms an integral part of mixed-signal and digital circuits. We can use the bandgap reference already present in the circuit and add the reference generator arms to it to obtain a wide range of temperature insensitive and stable voltage references. This reduces the area overhead involved in generating stable reference voltages. If a bandgap reference is not already present, we can use one bandgap reference (as in Fig. 5b) and share it for all reference voltages.

Impact of process variation: As mentioned above, we generate process variation tolerant reference voltages using a modified bandgap reference. The other important DMH block that can be affected by process variation is the sawtooth generator. Process variation changes the discharge rate of the capacitor C and hence impacts V_OL and the choice of V_ref. To compensate for this, we propose an initial calibration cycle in which the capacitor C is trimmed depending on the process corner, thereby ensuring a linear discharge from V_DD to V_OL across all dies. The TAH circuit is a transmission gate with large-sized transistors. Hence, process variation cannot considerably impact the functionality or performance of this block. Finally, the comparator in the ADC is differential in nature and die-to-die variation cannot impact its functionality.
Further, latch-based comparators tolerant to within-die variation have been reported in [6] and were extensively studied in the design of our proposed DMH.

IV. Process variations: Necessity of speed binning

It has already been mentioned that process variation manifests itself as chip speed variation in nanometer designs [1]. We have studied a number of benchmark circuits and they show significant spread in critical path delays. Fig. 6 shows the critical path delay distribution of some ISCAS'89 benchmarks under process variation. All the benchmarks have been modeled in HSPICE using the BPTM [7] 70nm technology node. In our study we have assumed an inter-die V_T variation with σ = 25% and an intra-die variation with σ = 15%. From Fig. 6 we can note that, on average, the standard deviation (σ) of the critical path delay is approximately 27% of the nominal delay. This consolidates the argument that speed binning in nano-scaled designs is absolutely necessary.

V. Experimental setup

To explain the experimental setup, let us consider one of the benchmark circuits, namely s838. First, we extracted its critical path using the Synopsys PrimeTime tool. Next, the test pattern to sensitize the critical path is obtained using the
Synopsys TetraMAX [8] ATPG tool. The entire DMH and benchmark have then been simulated in HSPICE at the 70nm technology node. We have applied the test pattern obtained from TetraMAX to both the original circuit and the replicated critical path. The SPICE simulation result is depicted in Fig. 7. The bin information output of the DMH is observed at the end of the test cycle and verified for correctness against the exact delay of the original critical path. It can be noticed that there is a falling transition at the output node X of the replicated critical path. The delay of the critical path falls into the medium speed category, which is verified by the DMH outputs, i.e., bin1, bin2 and bin3.

[Fig. 6: Delay spread of ISCAS'89 benchmark circuits w.r.t. number of chips due to parametric variations. Panels: (a) s838, (b) s1196, (c) s5378, (d) s13207, (e) s15850, (f) s35932. Vertical axis: number of chips.]

[Fig. 7: SPICE simulation of s838 using the DMH and bin determination. DMH outputs (bin1, bin2, bin3) = (0, 1, 1) indicate that the circuit falls under the medium speed category.]

VI. Effect of variation

If there were no intra-die variations, the original critical path and the critical path replica would have identical delays and the speed binning would be perfect. However, intra-die variations tend to produce delay skew between the actual critical path and the critical path replica. Therefore, there is a chance of bin misprediction if the replicated critical path and the original critical path suffer severely from intra-die variation.

Table 2: Bin prediction using DMH for 1000 runs

Circuit   Bin (correct)   Bin (mispredicted)   Correct prediction (%)
s838          962               38                   96.2
s1196         907               93                   90.7
s5378         970               30                   97.0
s13207        985               15                   98.5
s15850        991                9                   99.1
s35932        986               14                   98.6
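The percentages in Table 2 follow directly from the correct and mispredicted counts, and so does the average quoted for the method. The short Python sketch below recomputes them. The counts are the Table 2 entries read on the assumption of 1000 runs per circuit, and the variable and function names are illustrative.

```python
# (correct, mispredicted) bin predictions per benchmark, as read from Table 2.
TABLE_2 = {
    "s838": (962, 38),
    "s1196": (907, 93),
    "s5378": (970, 30),
    "s13207": (985, 15),
    "s15850": (991, 9),
    "s35932": (986, 14),
}


def correct_prediction_rate(correct, mispredicted):
    """Percentage of runs in which the DMH reported the right bin."""
    return 100.0 * correct / (correct + mispredicted)


RATES = {name: correct_prediction_rate(c, m) for name, (c, m) in TABLE_2.items()}
AVERAGE_RATE = sum(RATES.values()) / len(RATES)
```

The per-circuit rates reproduce the last column of the table, and the mean over the six circuits comes out slightly above 96%.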

In situations where the circuit speed is at the boundary of two neighboring bins, the chance of misprediction increases due to variation in circuit delay owing to process fluctuations. To study the effect of process fluctuations on bin prediction, we simulated each of the benchmark circuits for 1000 different process conditions. Simulation results are shown in Table 2. It can be observed that the chance of correct bin prediction under severe inter- and intra-chip variations is approximately 96% on average. Note that the DMH presents two gate capacitance loads at the end of the critical path replica, instead of the two diffusion capacitance loads of the flip-flop at the end of the actual critical path. Further, we provide some extra threshold while estimating the reference voltages V_ref for bin boundary determination. Hence, our bin prediction is pessimistic due to the extra loading of the DMH. Therefore, faulty chips (in the slowest category) cannot pass through to consumers. Further, misprediction occurs when the chip under consideration is at the boundary of two bins. This is illustrated in Table 3, where we have simulated benchmark s838 for 10 different process conditions and compared the correct and predicted bins. The mispredicted chip was found to be at the boundary of the slowest and slow bins.

Table 3: Bin prediction of s838 for 10 process conditions

Simulation #    1  2  3  4  5  6  7  8  9  10
Actual bin      2  1  3  3  1  1  3  2  3  3
Predicted bin   2  1  3  3  1  1  3  1  3  3

bin=1 is slowest, bin=2 is slow, bin=3 is medium and bin=4 is fast.

VII. Conclusions

In this paper, we have proposed a novel on-chip, low overhead and process tolerant delay measurement circuit which can estimate the critical path delay in a single clock period. This has the advantage of efficient on-chip speed binning. Simulation results have shown an average of 96% correct bin prediction even under severe inter- and intra-chip variations.

References:

1. S. Borkar et al., Parameter variations and impact on circuits and microarchitecture, DAC, pp. 338-342, 2003.
2. J. W. Tschanz et al., Adaptive body bias for reducing die-to-die and within-die parameter variations on microprocessor frequency and leakage, IEEE JSSC, pp. 1396-1402, 2002.
3. N. Dragone et al., An adaptive on-chip voltage regulation technique for low-power applications, ISLPED, pp. 20-24, 2000.
4. B. Razavi, Design of Analog CMOS Integrated Circuits, McGraw Hill, USA, 2000.
5. Banba et al., A CMOS bandgap reference circuit with sub-1-V operation, IEEE JSSC, pp. 670-674, 1999.
6. Sarpeshkar et al., Mismatch sensitivity of a simultaneously latched CMOS sense amplifier, IEEE JSSC, pp. 1413-1422, 1991.
7. Berkeley Predictive Technology Models: http://www-device.eecs.berkeley.edu/~ptm/
8. Synopsys Inc., TetraMAX ATPG, www.synopsys.com/products.