An Analysis of Injection Locked Clocking with Ring Oscillators

Suchit Bhattarai and Rachel Nancollas

Abstract: In recent years, injection-locked clocking (ILC) has been proposed as a solution to the power and skew problems of high-speed clocking in mixed-signal VLSI systems and microprocessors. A number of ILC schemes have been proposed that use local oscillators based on LC tanks and complex digital feedback systems, but these have proved impractical for area-constrained digital design. Here we explore the possibility of using current-starved ring oscillators for ILC. We develop a simple analytic model for optimally sizing conventional and injection-locked ring oscillator (ILRO) clock trees. Through analytic and Cadence simulations, we find that ILROs consume about 4% more energy and have potentially 23% more delay than conventional clock trees.

Index Terms: Injection Locked Clocking, Injection Locked Oscillators, Ring Oscillators, Skew

1 INTRODUCTION

In the face of shrinking feature size, one of the major challenges faced by digital designers is the distribution of clocks in highly integrated systems such as microprocessors. In particular, on-chip clocks must drive large capacitive gains (often on the order of $10^5$ or higher) while minimizing skew and jitter and meeting tight dynamic power specifications [8]. In conventional clocking schemes that employ a global PLL-locked reference distributed through buffer chains and clock grids, the power required to constantly switch the large capacitive loads can consume 40% of the chip's total power budget [2]. This is a significant problem in modern low-power multi-core systems. Furthermore, with increasing clock speeds, compensating for skew and jitter requires an increasingly large percentage of the clock period, which reduces the time allowed for the critical path [2]. In recent years, injection-locked clocking has been proposed as a solution for reducing the skew, jitter, and power consumption of the clock network.
However, the existing injection-locked clocks are too area intensive, as their oscillators rely mostly on analog circuit blocks such as LC tanks. In this study we investigate the feasibility of injection-locked ring oscillators as a low-power, low-area, and easily integrable solution for clock distribution. Our major focus is a comparative analysis between the conventional clocking approach through buffer chains and the injection-locking based distribution scheme in the presence of interconnect parasitics and their associated process variations. In section 2, we present a brief analysis of the current state of the art in injection-locked clocking, followed in section 3 by an analytical derivation of the considerations in designing injection-locked ring oscillator systems. In section 4, we describe our circuit-level simulations, followed in section 5 by a summary of our results at both the MATLAB design level and the circuit level.

2 EXISTING ILO SCHEMES

2.1 Injection Locking

In the past decade, several research groups have contributed significant effort towards advancing the concept of injection-locked clocking. Injection locking involves injecting a global clock into local voltage-controlled oscillators (VCOs), known as injection-locked oscillators (ILOs). If the relative power of the injected signal is sufficiently large compared to the local oscillation and their frequency difference is small, the ILO will lock to the frequency of the injected signal [2][3][4]. This is illustrated in the block diagram in figure 1. Although there are a number of injection schemes that could be represented by the summation block, current summing is both the simplest and most common. Using ILOs reduces the load seen by the global clock, which can reduce the number of buffers in the global clock tree. Compared to conventional clocking, ILC has been shown to reduce the total dynamic power by about 25% due to this reduced load.
The ability to injection-lock to harmonics of a signal also makes ILOs ideal for frequency multiplication [1], which can further reduce the clock's power consumption by allowing for a lower-frequency global clock. Finally, if the total number of gates is reduced in an injection-locked clocking scheme, the skew decreases compared to conventional clocking because of the smaller probability of buffer mismatch [1]. This also has a positive impact on the net accumulated jitter, which emerges primarily from power supply noise experienced by buffers in the clock path [1][5][8].

While all these advantages of ILC can be exploited for clocking the next generation of multi-core systems, injection locking does come with a few challenges, the most prominent of which is the frequency locking range. Locking range is defined as the range of injection frequencies ($\omega_L$) around the desired oscillation frequency ($\omega_o$) over which an ILO can lock [1][2][3][4][5]. A locking range of 17% was observed by [2][3][4], where a 10 GHz global clock was used to lock an H-tree of divide-by-2 ILOs to 5 GHz, with LC tanks serving as the ILOs. Using a ring oscillator instead of an LC tank, [5] observed a much lower locking range of 2.5% around 1 GHz.

Fig. 1. Block diagram showing ring oscillator VCO locking to injection frequency with some phase offset [6].

Fig. 2. MDLL with Injection Locked Slave Oscillator for a CDR [9]. The slave oscillator is injected from the MDLL through Wm.

2.2 Oscillator Implementation: LC-Tank

While injection locking offers the advantages outlined above, developing an efficient and robust clocking scheme suitable for digital circuits depends directly on the type of oscillator used. As mentioned, [2], [3], and [4] demonstrated a single H-tree distribution scheme relying on injection-locked LC tanks. Extremely low power consumption of 7.3 mW in the ILOs and 53 mW for the entire chip was observed. However, the shortcoming of their clocking scheme is that the oscillators are made up of inductors and tunable capacitors, which occupy significant chip area and are prone to mismatch.

2.3 Oscillator Implementation: Clock Data Recovery with MDLLs

To make ILC practical in large digital circuitry, several groups have implemented ring oscillators with complex digital feedback strategies. Specifically, [9] developed a clock data recovery (CDR) circuit that uses injection-locked ring oscillators (ILROs) as part of a multiplying delay-locked loop (MDLL) to filter high-frequency incoming data jitter as well as low-frequency power supply distortions. While the data recovery process described in that paper is of little relevance to our goals in injection-locked clocking, the techniques for producing clocks with ultra-low jitter are of importance. In particular, we are interested in the MDLL circuit shown in figure 2.
The MDLL portion of the circuit functions like a PLL in that it detects the phase of the oscillation, filters high-frequency jitter, and adjusts the bias voltage of the current-starved ring oscillator to control the frequency. However, it also uses a reference clock that is injected every N cycles to clear out accumulated jitter. It then uses injection locking to drive a slave oscillator, which benefits from similar low-jitter performance because its inverters are controlled by the same bias. Using this control loop (as well as another loop in the CDR to filter out lower-frequency noise), the circuit offers a very low jitter coupling of 2.65 ps (for a 1 ps input reference clock jitter) at the output clock.

Although this system provides excellent phase jitter performance and is purely digital (thus removing the need for special process features to create inductors), it is both power and area intensive. The CDR circuit consumes a total of 80 mW, which is somewhat higher than the 53 mW reported for the LC-tank based ILO design in [1][2][3][4]. Moreover, the CDR occupied 0.16 mm², several times larger than the LC tank, which was 0.05 mm² [9][3]. Therefore, given that a clock tree would need multiple MDLLs, this CDR is not a feasible strategy for local oscillator generation in a digital system.

2.4 Current Starved Ring Oscillators

Given the large area and power consumed by digital feedback oscillators such as MDLLs, it seems that a simple digital oscillator such as a current-starved ring oscillator could provide the benefits of ILC yet be feasible in digital systems. Based on our literature review, it appears that ILC has not been implemented with simple ring oscillators. This may be because they also present several challenges, such as high phase noise.
Ring oscillators are much more susceptible to power supply variations than LC oscillators [7], which directly translates into phase noise and jitter, as their oscillation frequency is inversely proportional to $V_{DD}$ ($f_{osc} \propto 1/V_{DD}$). Although these challenges make it difficult to use ILROs on large chips, smaller chips with proper supply filtering might see smaller frequency and supply fluctuations, which could make ring oscillators (ROs) a good choice for ILC. Therefore, in this paper we explore three questions regarding the feasibility of ring oscillators in ILC: (1) is it possible to do injection locking with current-starved ROs, (2) how do the energy and skew of an ILRO clock tree compare with those of a conventional clock tree, and (3) how is skew on an ILRO clock tree affected by interconnect variations?

3 METHODS: ANALYTIC APPROACH

3.1 Clock Tree Models

To explore these questions, we simulated both a conventional clock tree and an injection-locked ring oscillator clock tree with the same load and wire capacitance. The models we used for the conventional clock tree and the ILRO clock tree are shown in figure 3. Although a typical clock tree would likely involve branching, we chose to model one branch because it produces a more tractable analytic model.

To model a conventional clock tree, the key variables are the clock load ($C_L$), which represents local distribution of the clock signal, and the wire parasitics. For simplicity, we modeled the wire parasitics as a simple wire capacitance ($C_W$) evenly distributed between buffers. The total interconnect wire capacitance was estimated by assuming that the longest clock distribution path across the chip is 1 mm long. Then, considering 200 fF/mm for a 65 nm technology [10], and scaling rules with S = 2, we estimated a wire capacitance of 150 fF/mm. Thus, the conventional clock tree is simply a buffered clock signal that is loaded by wire and output load capacitance.

The ILRO clock tree is similar in that the first stage is simply a buffered oscillator driving an equal wire capacitance. However, instead of driving $C_L$, this stage drives a slave oscillator that is further buffered to drive the final load $C_L$. We assume the second stage sees no wire capacitance. This slave oscillator represents the ILRO. The effective load on the Nth buffer in the first stage of the ILRO clock tree ($C_{inj}$) is determined by the size of the slave oscillator $C_{slave}$ and the injection strength $S$:

$$C_{inj} = \frac{C_{slave}}{S}$$

3.2 Energy and Skew Optimization

In the models described above, there are several variables that need to be chosen to optimize the clock network. In the conventional clock network, we need to choose $N$, the number of buffers, and $f$, the electrical fanout of each buffer, which we assume is equal in each stage.
In the ILRO clock network, we need to choose $N$ and $M$, the number of buffers in the first and second stages; $f_N$ and $f_M$, the electrical fanout of buffers in the first and second stages; and $C_{inj}$, the effective load on the Nth buffer. To calculate the optimal values for these variables, we aim to minimize the energy and skew of the clock trees. Energy is minimized by reducing the total network capacitance, while skew is effectively optimized by making the total delay smaller. Parameters including the wire capacitance $C_W = 150$ fF, the injection strength $S = 5$, the load capacitance $C_L = 250$ fF, and the input capacitance to the clock network $C_{in} = 0.81$ fF were determined through simulation and review of prior works.

To minimize energy and skew, we begin by formulating equations for the energy and delay of the clock trees shown in figure 3. For the first stage of the ILRO and for the conventional clock tree, we use the model shown in the bottom right of figure 3 to represent the energy and delay due to the buffers and wire capacitance. Using an Elmore delay approximation, we find that the delay in the conventional clock tree is given by:

$$t_d = (N-1)\,R_{inv} C_{in} (1+f) + R_{inv}\,\frac{C_W}{N}\left(\frac{1 - f^{\,1-N}}{1 - 1/f}\right) + R_{inv}\left(C_{in} + \frac{C_L}{f^{\,N-1}}\right)$$

where $\gamma$, the ratio of a buffer's parasitic capacitance to its input capacitance, is assumed to be 1. Likewise, the energy is given by:

$$E = \alpha_{0\to1}\,V_{DD}^2\,C_{tot}$$

$$C_{tot} = C_W + C_{in}(1+f)\left[\frac{1 - f^{N}}{1 - f} - f^{\,N-1}\right] + C_{in} f^{\,N-1} + C_L$$

These equations also apply to the first stage of the ILRO clock tree, except that $C_L$ is replaced by $C_{inj}$ and $f$ is replaced by $f_N$. For the second stage of the ILRO, there is no wire capacitance, so energy and delay are calculated for a simple buffer chain driving a load capacitance $C_L$. The only difference is that extra capacitance was added to the energy calculation to model the energy consumed by the slave oscillator:

$$t_d = M\,t_{inv}\,(1 + f_M), \qquad f_M = \left(\frac{C_L}{S\,C_{inj}}\right)^{1/M}$$

$$E = \alpha_{0\to1}\,V_{DD}^2\,S\,C_{inj}\left[\frac{1 - f_M^{M}}{1 - f_M}\,(1 + f_M) + 10\right]$$

To find the optimal $N$, $M$, $f_N$, $f_M$, and $C_{inj}$, we performed two optimization loops in MATLAB.
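The conventional-tree expressions above can be evaluated numerically with a short script. The following is a minimal Python sketch, not the authors' MATLAB code: the function names, the search grid, and the value of `ALPHA_VDD2` (the product of activity factor and $V_{DD}^2$) are assumptions made for illustration. Because $R_{inv}$ is not given in the text, delay is reported divided by $R_{inv}$; this does not affect which sizing minimizes the energy-delay product, since $R_{inv}$ and `ALPHA_VDD2` only scale it.

```python
"""Sketch of the conventional clock-tree model with gamma = 1."""

C_IN = 0.81        # fF, input capacitance of the clock network (from the text)
C_W = 150.0        # fF, total wire capacitance (from the text)
C_L = 250.0        # fF, load capacitance (from the text)
ALPHA_VDD2 = 0.5   # V^2, ASSUMPTION: activity factor times V_DD^2, not stated in the text


def total_cap(n, f, c_load=C_L):
    """Total switched capacitance C_tot in fF for n buffers with fanout f."""
    # Explicit sum of f^0 .. f^(n-2); equal to (1 - f^n)/(1 - f) - f^(n-1).
    inner = sum(f ** i for i in range(n - 1))
    return C_W + C_IN * (1 + f) * inner + C_IN * f ** (n - 1) + c_load


def delay_over_rinv(n, f, c_load=C_L):
    """Elmore delay divided by R_inv, in fF."""
    # Explicit sum of f^0 .. f^-(n-2); equal to (1 - f^(1-n))/(1 - 1/f).
    wire = (C_W / n) * sum(f ** (-i) for i in range(n - 1))
    return (n - 1) * C_IN * (1 + f) + wire + C_IN + c_load / f ** (n - 1)


def best_sizing(n_values=range(1, 9), f_min=1.5, f_max=6.0, step=0.01):
    """Exhaustive search for the (N, f) pair with the lowest normalised EDP."""
    steps = int(round((f_max - f_min) / step))
    candidates = ((n, f_min + k * step) for n in n_values for k in range(steps + 1))
    return min(candidates, key=lambda nf: total_cap(*nf) * delay_over_rinv(*nf))


if __name__ == "__main__":
    n_opt, f_opt = best_sizing()
    print("EDP-optimal sizing: N =", n_opt, "f =", round(f_opt, 2))
    print("Energy at N = 5, f = 2.5:", round(ALPHA_VDD2 * total_cap(5, 2.5), 1), "fJ")
```

Under these assumptions the search should land close to the N = 5, f = 2.5 sizing reported for the conventional tree in the results. The same two functions can be reused for the first ILRO stage by passing $C_{inj}$ as `c_load`.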
Specifically, we optimized the energy-delay product (EDP) on each stage of the ILRO as a function of $C_{inj}$ to find the optimal $N$, $M$, $f_N$, and $f_M$ at each $C_{inj}$. We then calculated the total energy-delay product and selected the $C_{inj}$ that produced the lowest EDP. We chose to optimize EDP rather than $ED^2$ because we wanted to weight the energy and skew of the clock tree equally. In this way, we obtained a system of equations for optimally sizing both the conventional and ILRO clock trees based on our models.

4 METHODS: CIRCUIT-LEVEL SIMULATIONS

In addition to this analytic treatment, we wanted to compare our results to circuit-level simulations, so we designed a current-starved ring oscillator and determined how large the injection buffer needed to be to provide sufficient frequency lock.

4.1 Current-Starved Ring Oscillator Design

Our current-starved ring oscillator is the 5-stage voltage-controlled ring oscillator shown in figure 4. In this circuit, the PMOS M2 serves to mirror the same bias current in all the stages of the ring oscillator, while the voltage on the NMOS M1 controls the voltage on the tail NMOS transistors in the 5 stages. This control voltage determines the propagation delay of each inverter stage, which sets the oscillation frequency. In figure 5, we plot the frequency of the 5-stage oscillator as a function of the transistor sizing and the control voltage for a 32 nm PTM technology model. As the width and control voltage increase, the frequency increases. Our design goal was a 2 GHz clock frequency, which the plot shows occurs at a width of about 1.1 µm and a control voltage of about 800 mV.

Fig. 3. Schematics of the ILRO clock tree (top), the conventional clock tree (bottom left), and the model for approximating the energy and delay of a buffered clock network with wire capacitance (bottom right).

Fig. 4. 5-Stage Current Starved Ring Oscillator.

Fig. 5. 5-stage current starved ring oscillator frequency vs. oscillator size and control voltage.

4.2 Injection Buffer Sizing

In a real ILC system, the control voltage would deviate somewhat between the global and slave oscillators. Without an injection signal from the global oscillator, this would cause a frequency offset between the global and local clocks. One measure of the effectiveness of ILC is how large the injection buffer must be in relation to the slave oscillator to produce frequency locking when the global and slave oscillators have different control voltages. To test this, we offset the control voltages by 5% and swept the width of the injection buffer until the slave was able to lock to the global frequency within 5%, as shown in figure 6. From this figure, we concluded that the slave oscillator could be five times larger than the injection buffer, corresponding to an injection strength S = 5.

Fig. 6. % Frequency lock as a function of the injection buffer size.
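As a rough sanity check on the oscillator design point in section 4.1, the standard first-order relation for an N-stage ring oscillator, $f_{osc} = 1/(2 N t_p)$, gives the per-stage propagation delay that the control voltage must produce. The sketch below is illustrative only: the relation is the textbook approximation rather than something taken from the authors' simulations, and the helper names, including the 5% lock check modeled on the criterion in section 4.2, are hypothetical.

```python
def stage_delay(f_osc_hz, n_stages):
    """Per-stage propagation delay implied by f_osc = 1 / (2 * N * t_p)."""
    return 1.0 / (2.0 * n_stages * f_osc_hz)


def within_lock(f_slave_hz, f_global_hz, tolerance=0.05):
    """True if the slave frequency is within `tolerance` of the global frequency."""
    return abs(f_slave_hz - f_global_hz) <= tolerance * f_global_hz


if __name__ == "__main__":
    # 5-stage oscillator targeting 2 GHz, as in section 4.1
    t_p = stage_delay(2e9, 5)
    print("required per-stage delay:", t_p * 1e12, "ps")
    print("1.95 GHz slave within 5% of 2 GHz:", within_lock(1.95e9, 2e9))
```

For the 5-stage, 2 GHz target this works out to a per-stage delay of 50 ps.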
5 RESULTS

5.1 Optimization Results

Before simulating this ring oscillator as part of an ILRO clock tree, we used our optimization scheme to compare the energy and delay of an ILRO clock tree to those of a conventional clock tree. The results are shown in the table below. Significantly, with the parameters listed in section 3.2, both the energy and delay of the ILRO are worse than those of the conventional clock tree. Not surprisingly, to reduce the total EDP of the ILRO, most of the delay happens in the first stage, while most of the energy is spent in the second stage, which reduces the total capacitance driven by the network at the expense of delay. However, even this shifting of fanout to the last stages is unable to offset the significant increase in capacitance (and thus energy) due to the slave oscillator.

Conventional Clock Tree
  Energy          0.25 pJ
  Delay           446 ps
  N               5
  f               2.5

ILRO Clock Tree
  Total Energy    0.26 pJ
  Total Delay     548 ps
  N               4
  f_N             2.34
  E (stage 1)     92 fJ
  D (stage 1)     474 ps
  M               2
  f_M             9.21
  E (stage 2)     0.17 pJ
  D (stage 2)     75 ps
  C_inj           0.58 fF

Although the ILRO was worse than a conventional clock tree with a relatively small injection strength (5) and load capacitance (250 fF), it seemed possible that increasing either of these variables might improve the ILRO's performance. By sweeping the injection strength, we analyzed the energy and delay of the ILRO and conventional clock trees, as shown in figure 7. Although increasing the injection strength slightly reduces the energy in the first stage (because the buffers are driving a smaller load capacitance), $C_{inj} = 0.58$ fF was already quite small in comparison to $C_W = 150$ fF, so changing the injection strength has very little effect.

We also explored increasing the total load capacitance, as shown in figure 8. As $C_L$ increased, $C_{inj}$ was fairly constant ($C_{inj}/C_L = 0.01$). Therefore, the dominant delay of the first stage remained constant. Since $C_{inj}$ was fairly constant, most of the extra capacitance was shifted to the second stage, so the energy in the second stage increased linearly and mirrored the energy increase in the conventional clock tree. However, since delay did not decrease, the ILRO consistently had worse energy and delay than the conventional clock tree.

Fig. 8. Energy and delay in an ILRO and conventional clock tree as the load capacitance $C_L$ is varied.

5.2 Circuit Simulation Results

To confirm our analytic results, we simulated the ILRO and conventional clock trees in figure 3.

                    Energy                Delay
Clock Scheme        MATLAB    Cadence     MATLAB    Cadence
Inverter Chain      251 fJ    712 fJ      446 ps    314 ps
ILOs                261 fJ    1.1 pJ      549 ps    689 ps
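The relative overheads implied by the tabulated results above can be checked with a few lines of arithmetic. This is a small Python sketch using only the numbers from the MATLAB/Cadence comparison table; the dictionary layout and function name are illustrative choices, not part of the original work.

```python
# Values copied from the results table: (energy in fJ, delay in ps)
RESULTS = {
    "matlab": {"inverter_chain": (251.0, 446.0), "ilro": (261.0, 549.0)},
    "cadence": {"inverter_chain": (712.0, 314.0), "ilro": (1100.0, 689.0)},
}


def overhead_percent(source):
    """Return (energy, delay) overhead of the ILRO tree over the inverter chain, in %."""
    e_conv, d_conv = RESULTS[source]["inverter_chain"]
    e_ilro, d_ilro = RESULTS[source]["ilro"]
    energy = 100.0 * (e_ilro - e_conv) / e_conv
    delay = 100.0 * (d_ilro - d_conv) / d_conv
    return round(energy, 1), round(delay, 1)


if __name__ == "__main__":
    for source in RESULTS:
        energy, delay = overhead_percent(source)
        print(f"{source}: {energy}% more energy, {delay}% more delay")
```

The MATLAB row reproduces the roughly 4% energy and 23% delay overheads quoted in the abstract; the Cadence values give a noticeably larger gap between the two schemes.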
Figure 9 shows simulation results for the clock distributed through buffers and through injection-locked ring oscillators. As expected, the injection-locked clock has a larger delay than the buffered clock, because the clock propagates through more buffers in the injection-locked case than in the conventional clocking scheme, as derived above.

Fig. 7. Energy and delay in an ILRO and conventional clock tree with changing injection strength (S).

Fig. 9. Simulated Clock Signals; top: Global Master Clock; middle: Injection-Locked Ring Oscillator Clock; bottom: Conventional Clock Buffered with an Inverter Chain.

In the table above, we summarize the energy and delay results obtained through the MATLAB optimizations and the circuit-level simulations. The total clock delay from the master global clock to the final output nodes is in relatively close agreement with the MATLAB predictions. For example, we predicted an optimal delay of 549 ps for the injection-locked system, while the delay in the actual circuit was 689 ps. Similarly, for inverter-chain based clocking, we predicted an optimal delay of 446 ps, while the circuit simulations indicated a delay of 314 ps, which is also in relatively close agreement with the predictions. The energy values, however, were quite different between the MATLAB predictions and the circuit simulations. Some of the discrepancy between the predicted and measured energy values for the injection-locked clock can be attributed to the difference between the predicted and actual capacitance of the injecting buffer. In spite of these differences, our simulations suggest that ILRO-based clocking is less energy efficient and requires more delay (which suggests it would be more prone to skew) than conventional clocking.

Although this discourages the use of ROs in ILC, ILROs offer an important redeeming feature: a lower percent change in skew due to interconnect variations. Parasitic mismatches play a significant role in determining clock skew at advanced, high-density technology nodes, as clock paths may not always be laid out to match each other exactly. Similarly, with shrinking feature sizes, the variations in inter-layer dielectric thickness and interconnect thickness become more significant. These variations directly affect the parasitic interconnect capacitances, and thus the clock skew in data paths.
We simulated the impact of perturbing the wire capacitance by 10% around its nominal approximated value of 150 fF, and measured the relative % difference between the nominal delay and the additional delay due to the capacitance perturbation, as shown in figure 10. From this plot we observe that the relative variation in the two ILO clock paths as a function of interconnect variations is 2.6%, while the variation in the buffered clock paths is 4.2%. This result suggests that injection-locked ring oscillators are more tolerant to interconnect mismatches, achieving a 1.6x better skew performance.

Fig. 10. % delay change as a function of perturbations on $C_W$.

6 CONCLUSION

This study has presented a comparative analysis between a conventional clock distribution scheme and an injection-locking based scheme. We developed an analytical framework for designing a ring-oscillator based injection-locked system (i.e., a methodology for choosing an appropriate fanout and number of buffering stages around the local oscillators in the presence of interconnect parasitics), and optimized our design for minimum EDP. Our preliminary findings indicate that ILROs are not as energy efficient as conventional clocking approaches and require more delay, which makes them more prone to skew. However, ILROs were found to have 1.6x better skew variation in the presence of interconnect capacitance variations spanning 10% from the nominal. Therefore, we conclude that ILROs are not a good clocking strategy. Although ILROs offer better interconnect variation tolerance, their higher skew and energy consumption outweigh this advantage.

REFERENCES

[1] L. Zhang, "Low-Power, Gigahertz Clock Generation and Distribution using Injection-Locked Oscillators," Ph.D. dissertation, 2010.
[2] L. Zhang, B. Ciftcioglu, M. Huang, H. Wu, "Injection-Locked Clocking: A New GHz Clock Distribution Scheme," in IEEE 2006 Custom Integrated Circuits Conference, pp. 785-788, 2006.
[3] L. Zhang, B. Ciftcioglu, M. Huang, H. Wu, "A 1V, 1mW, 4GHz Injection-Locked Oscillator for High-Performance Clocking," in IEEE 2007 Custom Integrated Circuits Conference, pp. 309-312, 2007.
[4] L. Zhang, A. Carpenter, B. Ciftcioglu, A. Garg, M. Huang, H. Wu, "Injection Locked Clocking: A Low-Power Clock Distribution Scheme for High-Performance Microprocessors," in IEEE Trans. VLSI, pp. 1251-1256, 2008.
[5] B. Mesgarzadeh, A. Alvandpour, "First-Harmonic Injection-Locked Ring Oscillators," in IEEE 2006 Custom Integrated Circuits Conference, pp. 733-736, 2006.
[6] B. Mesgarzadeh, A. Alvandpour, "A Study of Injection Locking in Ring Oscillators," in IEEE 2005 International Symposium on Circuits and Systems, pp. 5465-5468, 2005.
[7] T. Lee, A. Hajimiri, "Oscillator Phase Noise: A Tutorial," in IEEE J. Solid-State Circuits, vol. 35, no. 3, pp. 326-336, March 2000.
[8] B. H. Calhoun, Y. Cao, X. Li, K. Mai, L. T. Pileggi, R. A. Rutenbar, K. L. Shepard, "Digital Circuit Design Challenges and Opportunities in the Era of Nanoscale CMOS," in Proc. IEEE, vol. 96, no. 2, pp. 343-365, Feb. 2008.
[9] H. T. Ng, R. F. Rad, M. J. E. Lee, W. J. Dally, T. Greer, J. Poulton, J. H. Edmondson, R. Rathi, R. Senthinathan, "A Second-Order Semidigital Clock Recovery Circuit Based on Injection Locking," in IEEE J. Solid-State Circuits, vol. 38, no. 12, pp. 2101-2110, Dec. 2003.
[10] S. Nakai, S. Fukuyama, N. Misawa, M. Miyajima, T. Sugii, K. Watanabe, "A 65 nm CMOS Technology Featuring Hybrid-ULK/Copper/Interconnects," in Electrochemical Society Proceedings, vol. 4, pp. 67, 2004.