An 8.2 Gb/s-to-10.3 Gb/s Full-Rate Linear Referenceless CDR Without Frequency Detector in 0.18 μm CMOS

IEEE JOURNAL OF SOLID-STATE CIRCUITS

Sui Huang, Member, IEEE, Jun Cao, Senior Member, IEEE, and Michael M. Green, Senior Member, IEEE

Abstract: An 8.2 Gb/s-to-10.3 Gb/s full-rate referenceless CDR in 0.18 μm CMOS is presented. By realizing an asymmetric phase detector transfer curve, the linear CDR's single-sided capture range increases, which allows the Hogge phase detector itself to function as a frequency detector, thus eliminating the need for the reference clock and the separate frequency detector in conventional dual-loop CDRs. Robust frequency and phase acquisition is demonstrated. Furthermore, a new phase adjustment mode is added to further improve the jitter tolerance performance. The measurement results show that with a 10.3 Gb/s 2-1 PRBS input, the random jitter at the output data is 0.336 ps, and the out-of-band jitter tolerance is 0.34 UIpp.

Index Terms: Analog, clock-and-data recovery (CDR), CMOS, jitter tolerance, linear phase detector, receiver, referenceless, wideband data communication.

I. INTRODUCTION

The limited frequency capture range of a clock and data recovery (CDR) loop causes a locking issue due to the process and temperature variations in voltage-controlled oscillator (VCO) circuits and the wide locking range required by links conforming to multiple communication protocols. Conventionally, an additional frequency-locked loop (FLL) and a reference clock are introduced to aid the frequency acquisition; once the FLL is locked, the CDR circuit takes over to complete the final phase locking, as shown in Fig. 1 [1], [2]. However, in many applications, such as repeaters and active cables, it is costly and complicated to implement an accurate and adjustable reference clock. Moreover, the CDR operation can be affected by coupling from the external reference if the board is compactly designed.
Fig. 1. Conventional dual-loop CDR architecture with external reference signal.

Manuscript received September 15, 2014; revised December 11, 2014, March 09, 2015, April 21, 2015; accepted April 21, 2015. This paper was approved by Associate Editor J. Kenney. S. Huang and M. M. Green are with the University of California, Irvine, CA 92697 USA. J. Cao is with Broadcom Corporation, Irvine, CA 92617 USA. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JSSC.2015.2427332

Thus, the referenceless CDR architecture is popular in industry because of its simplicity and flexibility for different applications. Many referenceless CDRs have been reported in recent years [3]–[15]. In [3]–[5], referenceless digital CDRs were realized with a wide frequency acquisition range, but the threshold selection of their frequency detectors (FDs) relies on the data pattern and density, which leads to a tradeoff between the acquisition time and the robustness of FD operation. State-machine-based FDs can also greatly increase the capture range, but the maximum data rate of this type of FD is limited by the delay and setup time of digital components [6], [7]. Some unique frequency detection methods take advantage of 8B/10B-encoded data and find their application in certain specialized communication channels [8], [9]. A sophisticated architecture was proposed in [10] to enhance the capture range of a CDR using a delay-locked loop (DLL) for frequency acquisition. However, the architecture relied on the topology of the ring VCO, which constrains its applications for high-data-rate cases. Most of these techniques are based on bang-bang phase detectors (PDs), which have worse jitter performance in the steady state than linear CDRs do. In general, a bang-bang PD is highly preferable to a linear PD in referenceless CDRs.
In particular, a CDR with a linear PD will have a much lower frequency acquisition range than one with a bang-bang PD. Since a frequency acquisition loop based on an FD can only coarsely track the input bit rate, frequency detection techniques lend themselves better to bang-bang PDs than to linear PDs. For this work, a linear PD has been chosen in order to obtain better jitter performance while making use of a technique described in the next section to enhance the frequency acquisition range.

Rotational FDs (RFDs) are the most popular FD choice for referenceless CDRs, since they are flexible for either type of PD [11]–[15]. In [12], a modified RFD without quadrature clock signals was proposed to simplify the frequency detector circuit. However, it suffers from a limited capture range, and after locking the FD could be turned on by the input jitter. In [13], the capture range of the FD was greatly enhanced, and, in [14], the dead-zone problem was alleviated by further modifying the RFD circuit. However, to guarantee that the frequency residual after frequency acquisition was within the CDR capture range, which is only a few megahertz for linear CDRs, the minimum frequency range of one band was 1.37 and 1.48 MHz in these two works, respectively. Thus, they would face a tradeoff between the CDR capture range and the number of VCO bands. All of the limitations described above can be overcome by innovative circuit design techniques and additional circuit blocks, as was done in [15], at the expense of more loading on the high-speed clock and/or data path.

Fig. 2. Referenceless CDR architecture with a frequency acquisition loop and a phase tracking loop.

The above discussion shows that the separate FD required in the referenceless CDR, as shown in Fig. 2, introduces many new problems, including higher circuit complexity, the special requirement for clock edges at 1/4 UI spacing, and more chip area. To overcome these issues, a full-rate linear referenceless CDR without an FD, first presented in [16], is described in this paper. Its operation is instead based on the theory that, if a nonzero strobe point (SP) is deliberately introduced into the PD characteristic, the pull-in range will be enhanced as long as the initial frequency offset has the appropriate polarity [17]. Based on this concept, a frequency acquisition mode (FAM) is implemented to correctly set the polarity of the SP of the PD transfer curve, according to the initial VCO frequency, so that the linear PD itself can have a very large frequency capture range. Once the frequency has been acquired, a phase adjustment mode (PAM) then becomes active, which guarantees a smooth and robust transition between frequency acquisition and phase locking, thus further improving the jitter tolerance performance.

This paper is organized as follows. Section II explains the impact of the SP of a linear PD on the capture range of a CDR. In Section III, the proposed referenceless CDR architecture is described in detail. The measurement results and a performance comparison are presented in Section IV. Finally, conclusions are given in Section V.

0018-9200 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

II. EFFECT OF STROBE POINT IN LINEAR CDRS

To understand the proposed architecture, the effect of the strobe point of a linear PD on the capture range of a CDR is first considered. As shown in Fig. 3, the Hogge PD consists of two DFFs (DFF1 and DFF2), two XORs (XOR1 and XOR2), and a buffer BUF1, which is used to compensate the clock-to-q delay of DFF1 [18]. Thus the ideal transfer curve (corresponding to the output of the transconductance block in Fig. 3) would go through the origin, as shown in Fig. 4(a), where the horizontal axis is the delay between the input data and the clock. However, because the clock-to-q delay of a DFF varies with PVT, and also with the data pattern if the bandwidth of the DFF is limited, the delay differences between BUF1 and DFF1 and between DFF1 and DFF2 are usually not zero. Thus, the strobe point is generally not zero, as shown in Fig. 4(b).

Fig. 3. Conventional Hogge PD.

Considering these delay mismatches, the average output current of the linear PD is given by (1) [18]. It is expressed in terms of the gain of the linear PD (in units of amperes per second), the delay between the input data and the clock signal, and the delays of DFF1, BUF1, and DFF2. Setting (1) to zero gives the strobe point, (2). This expression indicates that the value of the SP is equal to the sum of the delay differences between BUF1 and DFF1 and between DFF2 and DFF1.
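The verbal statement of the strobe point as a sum of two delay differences can be illustrated with a minimal sketch. The function name, the example delays, and in particular the sign convention are assumptions made here for illustration: the signs are chosen so that a slow DFF1 produces a negative strobe point, which is the behavior this section attributes to a large DFF1 clock-to-q delay.

```python
def strobe_point_ps(t_dff1_ps, t_buf1_ps, t_dff2_ps):
    """Strobe point as the sum of two delay differences, in picoseconds.

    Assumed sign convention (not taken from the source equations):
    (BUF1 - DFF1) + (DFF2 - DFF1), so a slow DFF1 gives a negative value.
    """
    return (t_buf1_ps - t_dff1_ps) + (t_dff2_ps - t_dff1_ps)


# Hypothetical delays: perfectly matched cells versus a DFF1 that is 10 ps slow.
matched = strobe_point_ps(30, 30, 30)
slow_dff1 = strobe_point_ps(40, 30, 30)
print(matched, slow_dff1)
```

Under this convention, any DFF1 delay error is counted twice, which is one way to see why the Hogge PD is so sensitive to the clock-to-q delay of its first flip-flop.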

Fig. 4. PD transfer curves when the strobe point is (a) zero and (b) nonzero.

The above analysis shows that the transfer characteristic of a linear CDR is very sensitive to the delay mismatch; we now examine the impact of this mismatch on the capture range of the CDR loop. In Fig. 5(a), the solid curve is the actual average output current of an ideal PD (i.e., with zero strobe point) when the VCO frequency is larger than the input bit rate, as a function of time, in the pull-in process during one cycle slip. As was done in [19], to simplify the analysis, the nonlinear pull-in process can be approximated by a piecewise-linear model, as shown in the dashed curve of Fig. 5(a). During the first interval of the cycle slip the loop filter is charged by the PD; during the second interval the loop filter is discharged by the PD; the end of the second interval is the end of the cycle slip. The durations of these two intervals define the charge time and the discharge time. In this case, they can be expressed as (3) and (4) [20], in terms of the period of the input data, the difference between the VCO frequency and the input bit rate, the impedance of the loop filter at high frequencies, and the gain of the VCO in units of hertz per volt.

Fig. 5. Piecewise-linear approximation of the output current transition of a PD in the pull-in process when the strobe point is (a) zero and (b) nonzero.

If we now assume a nonzero strobe point and apply the same approximation method, as shown in Fig. 5(b), the charge time and the discharge time become (5) and (6). Therefore, combining (5) and (6), the average output current of the linear PD in one cycle slip is given by (7). Notice that (7) is valid regardless of the sign of the frequency difference.

Based on (7), the initial difference between the VCO frequency and the input bit rate determines the effect of the SP. If the VCO frequency is initially lower than the input bit rate, the loop filter should be charged by the transconductance circuit (i.e., the average output current should be positive) in order to increase the control voltage of the VCO, so that the CDR can lock. However, if the strobe point is positive, this only happens when the frequency difference satisfies (8). According to (8), the largest frequency difference between the VCO frequency and the input bit rate for which the loop will eventually lock is limited by the strobe point. In particular, the capture range is reduced when the SP increases. On the other hand, if the VCO frequency is initially higher than the bit rate and the strobe point is positive, the average output current is always negative, and thus the capture range in this case (referred to here as the pulling-down capture range) is enhanced by the positive SP. Similarly, if the VCO frequency is initially lower than the bit rate and the strobe point is negative, the average output current is always positive, and thus the capture range on this side (referred to here as the pulling-up capture range) is enhanced by the negative SP. Since a strobe point can only increase the capture range on one side, we call this phenomenon single-sided capture range enhancement.

The timing diagram shown in Fig. 6, corresponding to a VCO frequency lower than the input data rate, illustrates how the negative strobe point affects the pull-in process. If the linear PD has zero strobe point, then the average value of UP − DN is nearly zero. In contrast, if DFF1 in Fig. 3 has a large clock-to-q delay (giving a negative strobe point), the charge time lasts longer than the discharge time in each cycle slip. Therefore, the loop has more capability to charge than to discharge the loop filter, so that the pulling-up capture range is increased by this negative SP. A behavioral simulation, based on the loop parameters (a PD gain in A/s and a VCO gain of 1 GHz/V) and the loop filter shown in Fig. 3 (1 kΩ, 1 nF, 100 pF), was run with the strobe point set to −15 ps. The results for two VCOs with different initial frequencies of 9 and 11 GHz are shown in Fig. 7(a) and (b), respectively, and verify the analysis above.
For both simulations, 2-1 PRBS data at 10 Gb/s is applied to the CDR input, but the CDR loop only locks in the first case. Likewise, if the strobe point were set positive, then the loop would lock if the initial VCO frequency were higher than the data bit rate.

Fig. 6. Comparison of the UP/DN signals in the pull-in process when the strobe point is zero and negative, for a VCO frequency lower than the input data rate.

Fig. 7. Behavioral simulation results for the pull-in process with a −15 ps strobe point in the linear PD when the initial frequency offset is (a) −1 GHz and (b) +1 GHz.

III. ARCHITECTURAL OVERVIEW AND CDR BUILDING BLOCKS

In Section II, we examined the effect of the linear PD strobe point on the capture range of a CDR loop and explained how a strobe point can enhance the single-sided capture range. From this observation, the linear PD itself can serve as a frequency detector if the polarity of the strobe point is selected correctly, consistent with the difference between the initial VCO frequency and the input data bit rate. A referenceless CDR architecture based on this idea is described as follows.

A. Architectural Overview

The CDR architecture based on this concept is shown in Fig. 8(a). Other than those in the digital control circuit (DCC), all signals are implemented differentially. The resolution of the VCO capacitor bank is only 5 bits, since the single-sided capture range is sufficiently wide and the requirement for the tuning range in each band does not need to be very stringent. The upper and lower threshold voltages are externally set to 1 and 0.6 V, respectively, to correspond to the maximum and minimum VCO control voltages that set the frequency limits of each band. A counter circuit is implemented to sweep the capacitor bank.

Fig. 8. (a) Referenceless CDR architecture block diagram. (b) Linearized block diagram.

If the loop loses lock and the VCO control voltage is no longer between the two thresholds, the CDR enters the frequency acquisition mode (FAM). In this mode, the SP of the PD is controlled by a voltage that is generated by the DCC. After setting the polarity of the SP appropriately (either +15 ps or −15 ps) and incrementing the counter each time the control voltage leaves the range between the two thresholds, the DCC eventually finds the correct band, and the loop becomes nearly locked at the input bit rate. The applied strobe point of ±15 ps was chosen large enough to overcome any variations in the inherent strobe point of the phase detector, but not so large that eventual locking would be affected. The control voltage then becomes almost constant, and the phase adjustment mode (PAM) takes over to reduce the SP and improve the jitter tolerance performance. A decision circuit is also implemented to retime the data with the recovered clock. In this architecture, the linear PD implements the frequency acquisition, so that a dedicated FD is not needed and chip area can be reduced compared to conventional referenceless CDRs.

B. Frequency Acquisition Mode and Phase Adjustment Mode

One of the most important features of this architecture is the use of a linear PD with a nonzero strobe point as an FD to achieve a large capture range. A simple frequency acquisition algorithm, implemented as shown in Fig. 9, finds the correct band and brings the CDR close to lock in this band. The details are explained as follows.
Suppose that initially the strobe point is set negative, which will cause the VCO control voltage to increase, and the VCO is operating in a frequency band that is higher than the bit rate. Once the control voltage goes above the upper threshold, the corresponding detection signal goes high. This increments the counter that controls the VCO capacitor bank, shifting the VCO to a lower frequency band, while at the same time setting the RS flip-flop, which closes one group of switches in Fig. 9. Since the associated capacitor has been pre-charged during the previous cycle, this sets the strobe point control voltage to its higher preset value (0.9 V), which sets the PD strobe point to approximately +15 ps, and the voltage on the other capacitor is pre-charged to the lower preset value (0.7 V), which will be used for frequency acquisition in the next band. Because the average current that charges the loop filter during one cycle slip is always negative, as discussed in the previous section, the control voltage starts to decrease. If the VCO has been set to the correct band, the CDR will lock, since the VCO frequency is swept from a frequency higher than the input bit rate before locking, and the positive SP guarantees that the capture range in this case is larger than the frequency range of this band. Otherwise, the control voltage will continue decreasing until it goes below the lower threshold, at which time the counter is incremented again, changing the VCO to a new, lower frequency band. At this instant, the RS flip-flop is reset, closing the other group of switches and thus setting the strobe point control voltage to its lower preset value, which sets the SP to approximately −15 ps to pull up the VCO frequency. This process continues until the appropriate band has been reached, and the frequency settles to the correct value with the control voltage close to its final value. Notice that, since the 5-bit counter wraps around to 00000 after counting up one bit from 11111, the correct band will always be found, regardless of whether the initial VCO frequency is higher or lower than the input bit rate.

Fig. 9. Circuit implementation of the FAM and the PAM in the DCC.

Simulation results in Fig. 10 show the frequency acquisition process searching through three bands. In this simulation, the input bit rate is 10 Gb/s and the VCO frequency is initially set to 10.10 GHz. Initially, the VCO stays in the 10.10–10.18 GHz band, and the strobe point control voltage is set to its lower preset value, which corresponds to a negative SP. As discussed above, the VCO control voltage is increased until it goes above the upper threshold. At that instant, the polarity of the SP is switched and the VCO changes to the next lower band (10.04–10.12 GHz). Since this is still not the correct band, the control voltage is decreased until it goes below the lower threshold. The polarity of the SP is switched again, and the counter increments one bit to bring the VCO frequency to the correct band (9.98–10.06 GHz). Soon afterward, the CDR loop locks at 10 GHz.
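The band search described above can be sketched as a small behavioral model. This is an illustration under stated assumptions, not the authors' circuit: the band plan (80 MHz wide bands stepping down by 60 MHz from 10.18 GHz) is extrapolated from the three bands quoted for Fig. 10, the names `band_limits_mhz` and `acquire` are hypothetical, and the loop is assumed to lock whenever the bit rate lies inside the band being swept.

```python
N_BANDS = 32  # 5-bit counter, so the band index wraps from 31 back to 0


def band_limits_mhz(band, f_top=10180, step=60, width=80):
    """Hypothetical band plan in MHz: band 0 spans 10100-10180, each next band is 60 lower."""
    f_max = f_top - step * band
    return f_max - width, f_max


def acquire(f_bit_mhz, start_band=0, sp_sign=-1):
    """Step through bands until one contains the bit rate.

    sp_sign is -1 for a negative strobe point (control voltage ramps up)
    and +1 for a positive one (control voltage ramps down). Each time the
    sweep leaves the band without locking, the counter increments and the
    strobe point polarity toggles.
    """
    band, trace = start_band, []
    for _ in range(N_BANDS):
        trace.append((band, sp_sign))
        f_min, f_max = band_limits_mhz(band)
        if f_min <= f_bit_mhz <= f_max:
            return band, trace       # loop locks in this band
        band = (band + 1) % N_BANDS  # counter increment with wrap-around
        sp_sign = -sp_sign           # RS flip-flop toggles the SP polarity
    return None, trace               # bit rate outside every band


locked_band, history = acquire(10000)
print(locked_band, history)
```

With a 10 Gb/s input and a start in the highest band, the model visits three bands with alternating strobe point polarity, matching the sequence described for Fig. 10. Starting from band 30 instead reaches the same band after the counter wraps, which is the property the text relies on to find the correct band from either side.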
Although the loop is actually locked after the FAM is completed, there remains an offset of ±15 ps between the middle of the data and the clock rising edge, depending on the polarity of the last chosen SP for the frequency acquisition, as illustrated in Fig. 11. This offset would severely degrade the jitter tolerance performance, as analyzed in [17]. Therefore, a negative feedback mechanism is designed in the PAM to adjust the strobe point. The SP is detected by the strobe point detector (SPD), whose output is converted to current by the transconductance circuit, as shown in Fig. 8(a). If the CDR is locked in the band when the positive SP is chosen, one pair of switches is closed, and the SPD output current is conducted into one of the two feedback capacitors in Fig. 9. Otherwise, the other pair of switches is closed, and the current is conducted into the other capacitor. The voltage on the selected capacitor, which feeds back to the linear PD, adjusts itself to set the PD strobe point very close to zero.

Fig. 10. (a) Behavioral simulation result of the frequency acquisition process searching through three bands. (b) Corresponding digital signals during this frequency acquisition process.

Fig. 11. Phase relationship between the data and clock signal corresponding to the strobe point after locking.

Notice that in this architecture the FAM and the PAM are both active at the same time. However, the current in the transconductance circuit for the frequency acquisition is much larger than the SPD output current (by a factor of 10 in this design), as shown in Fig. 8(a). Therefore, while the loop is not locked, the FAM dominates, and the strobe point control voltage is initially set to one of its two preset values. In this case the SPD current has little effect during the frequency acquisition process. Once the CDR is nearly locked and the VCO control voltage remains nearly constant, the strobe point control voltage begins to be gradually varied by the SPD current until the average output current of the SPD is almost zero. In this way, a dedicated lock detector is also not needed, and the transition between the frequency acquisition and the phase adjustment is robust and smooth. A similar technique, which was used in [21], also makes use of a weaker secondary loop that has negligible effect on the dominating loop while it is active.

C. Phase Detector and Strobe Point Detector

The above description of the FAM and PAM shows that two blocks need to be carefully designed: the phase detector, which requires an adjustable strobe point set by a control voltage, and the strobe point detector, which itself requires a very small strobe point compared with that of the PD. Here, the two blocks are discussed in detail, and a combined scheme is presented at the end.

In Section II, (2) shows that the delay mismatch among the components in the conventional Hogge PD can lead to the presence of a strobe point. Though these mismatches are undesirable for the CDR after locking, they provide an effective way to generate the SP: a delay mismatch is deliberately introduced into the PD for the frequency acquisition [17]. Accordingly, as shown in Fig. 12(a), two variable delay cells BUF2 and BUF3, whose delays are controlled by a differential voltage, are inserted in the paths feeding XOR1. Assuming the delays of BUF1, DFF1, and DFF2 are all perfectly matched, the value of the strobe point can now be expressed as (9), in terms of the delays of BUF2 and BUF3.

Fig. 12. (a) Linear PD with adjustable strobe point. (b) Schematic of a variable delay cell.

The schematics of BUF2 and BUF3 are shown in Fig. 12(b). They consist of a pair of identical current mode logic (CML) buffers with PMOS loads. The gate control voltages should be low enough that these two PMOS transistors are always in the triode region.
The delay of each of these CML buffers can be expressed as (10) [22], [23], in terms of the total capacitance at the output of the delay cell, the hole mobility, the gate oxide capacitance per unit area, the channel width, the effective channel length, the gate voltage of the PMOS transistor, the power supply voltage, and the threshold voltage of the PMOS transistors. Combining (9) and (10), we have (11). Thus, it can be seen that the value of the SP increases with increasing control voltage. As shown in Fig. 13, the post-layout simulation results indicate that the delay of each buffer varies from 17.3 ps to 33.5 ps when the control voltage is changed from 700 mV to 900 mV, which provides enough range to allow the SP to be set to ±15 ps as described in the previous section.

A strobe point detector (SPD) is used to detect a phase difference between the data and clock signal after the FAM is completed. The requirement of the SPD circuit is that its own SP should be much smaller than that of the linear PD. It is known that bang-bang phase detectors are in general insensitive to delay mismatches and can be used as a strobe point detector to calibrate the SP in a linear PD [24]. In the proposed architecture, a simple binary PD is implemented as the SPD shown in Fig. 14. To illustrate the operation of the Fig. 14 SPD, a timing diagram is shown in Fig. 15. In Fig. 15(a), the clock signal leads the input data. One internal signal, which is triggered by the falling edge of RCK, has the same pattern as the preceding signal in the chain but is delayed by a half period. An additional half-period delay, realized by BUF4, gives a third signal. Since the delay mismatch is not critical in the SPD circuit, BUF4 is realized as an analog delay cell, rather than a flip-flop, in order to avoid additional loading on the high-speed clock. As a result, UP simply becomes the XOR of the previous data with the current data. In contrast, in Fig. 15(b), the clock signal lags the input data.
In this case one of the compared signals is triggered a half period earlier, so that the two signals being compared are exactly the same. Thus UP is always low in this case.

Fig. 13. Post-layout simulation results of the output of a delay cell as the control voltage varies from 700 to 900 mV.

Fig. 14. Strobe point detector block diagram.

Fig. 15. Timing diagram of the SPD circuit when (a) RCK leads the data and (b) RCK lags the data.

Table I compares the simulated values of the SP over different process corners and temperatures between the linear PD and the proposed SPD circuit. This shows that, even with compensation, a linear PD exhibits a significant variation in its strobe point over temperature and process variations, due to the clock-to-q delay of DFF1. It is clear that the SP of the SPD is much closer to zero than that of the linear PD in every case listed in Table I.

TABLE I. STROBE POINT COMPARISON

One of the benefits of the proposed SPD circuit is that it can share some components with the Hogge PD to further reduce the power consumption. Fig. 16 shows the block diagram of the combined PD/SPD circuit. The linear PD with an adjustable SP comprises DFF1, DFF2, BUF1, BUF2, BUF3, XOR1, and XOR2; the SPD circuit comprises DFF1, DFF2, DFF3, BUF4, XOR2, and XOR3. Two voltage differences are taken as the outputs of the linear PD and the SPD circuits, respectively. The DN signal is used to cancel the data pattern dependence in the SPD circuit, as is the case for a linear PD, and DFF1, DFF2, and XOR2 are shared by both circuits.

Fig. 16. Linear PD combined with a strobe point detector.

D. Steady-State Loop Dynamics

As shown in Fig. 8(a), the circuit contains two loops: the first corresponding to the conventional CDR operation and the second corresponding to the PAM. Both are active simultaneously once the CDR loop has locked and the PAM has converged to the optimum strobe point. To more easily analyze the dynamics of the circuit, a linearized block diagram is shown in Fig. 8(b). The average value of the current that is conducted into the loop filter can be expressed as (12). In (12), the strobe point defined in (9) can be written as the product of the strobe point control voltage shown in Fig. 8(a) and the ratio of the change in strobe point to the change in that voltage, which is approximately 81 ps/V based on the Fig. 13 simulations (16.2 ps of delay change over 200 mV). Based on the action of the strobe point detector and the integrating capacitor, we can write (13), which relates the current conducted at the output of the strobe point detector to the gain of the strobe point detector, in units of A/s. Combining (12) and (13), we have (14). Applying the current in (14) to the loop filter and VCO gives the open-loop gain for the overall circuit, (15).

The first bracketed term in (15) gives the loop gain of a conventional CDR, with its zero and pole set by the loop filter, where the pole depends on the equivalent series combination of the two loop-filter capacitors. The second bracketed term shows that the PAM loop contributes an additional pole at the origin and an additional left half-plane zero. As long as this zero lies well below the loop-filter zero, the PAM will have little effect on the loop dynamics. For a feedback capacitor of 510 pF, a delay sensitivity of 81 ps/V, and an SPD gain of 0.3 A s, this places the PAM zero below 10 kHz, which is well below the value of the zero from the loop filter.

E. Relationship Between Frequency Acquisition Time and Locked Behavior

As discussed earlier, instead of containing a separate frequency control loop, this circuit automatically searches through each band until it reaches the one that allows it to lock to the input data. As illustrated in Fig. 10, the control voltage and VCO frequency change continuously within each band, based on the average current given in (7) being conducted into the loop-filter capacitor. The unity-gain frequency of the loop characteristic, assuming that it is well within the one-pole roll-off region between the zero and the pole, is given by (16). Substituting (16) into (7), we have (17). The condition for the second term on the right-hand side of (17) to be dominant is (18). The parameters chosen for this circuit satisfy this condition prior to arriving at the correct frequency band, and thus the acquisition time is given by (19), which is proportional to the number of bands traversed prior to locking. Based on this expression, Table II shows how the acquisition time and the loop parameters are affected by the various circuit parameters, including the open-loop zero and pole. The inverse relationships listed between the acquisition time and circuit parameters are based on the assumption that the overall frequency range of the VCO is kept constant. Note that one of the listed parameters affects the acquisition time but does not affect the loop parameters.

TABLE II. EFFECT OF CIRCUIT PARAMETERS ON ACQUISITION TIME AND LOOP BEHAVIOR

IV. EXPERIMENTAL RESULTS

The chip has been fabricated using the Jazz Semiconductor SBC18 0.18 μm BiCMOS technology, in which only CMOS transistors were used. The chip area is 1.4 mm × 1.4 mm, and its active area is 0.9 mm × 0.6 mm. The chip micrograph is shown in Fig. 17(a). It consists of the VCO and its drivers, the PD and SPD circuit, the digital control circuit, and the output buffer. The loop filter (a kilohm-range resistor with 220 pF and 10 pF capacitors) and the strobe point feedback capacitors (510 pF) are off-chip. The test board is shown in Fig. 17(b). A differential high-speed data signal is applied to the chip inputs, and the differential data output from the retimer can be monitored externally. The supply voltage is 1.8 V, and the total power dissipation is 174 mW, not including the driver for the output data.

Fig. 17. (a) Die photograph of the proposed referenceless CDR architecture. (b) Photograph of the testing board.

The measured waveforms of the VCO control voltage and the strobe point control voltage, illustrating the FAM and PAM processes, are shown in Fig. 18. At the beginning the CDR is locked at 9.7 Gb/s, which corresponds to band 6 (where band 1 is the highest and band 32 is the lowest frequency band of the VCO). The input bit rate is then switched to 8.4 Gb/s, and the loop starts to lose lock. The FAM is subsequently activated and finds the correct band (band 24), after which the PAM takes over until the system is phase locked. The total acquisition time is 116 μs.

Fig. 18. Measured frequency acquisition and phase adjustment processes (horizontal scale: 20 μs/div, vertical scale: 200 mV/div).

The measurement results show that the capture range of the circuit is from 8.2 to 10.3 Gb/s with 1010 and 2-1 PRBS patterns. The retimed output data eye diagrams corresponding to 8.2 and 10.3 Gb/s for the two patterns are shown in Fig. 19(a)–(d), respectively. From Fig. 19(b) and (d), the random jitter, when the input data bit rate is 8.2 Gb/s and 10.3 Gb/s, is measured to be 0.386 and 0.336 ps, respectively, and the pattern-dependent deterministic jitter is measured to be 9.3 and 7.7 ps peak-to-peak, respectively.

Fig. 19. Eye diagrams of the retimed data at 8.2 Gb/s when the data pattern is (a) 1010 and (b) 2-1 PRBS. Eye diagrams of the retimed data at 10.3 Gb/s when the data pattern is (c) 1010 and (d) 2-1 PRBS.

Fig. 20 shows the jitter tolerance performance, along with the OC-192 mask, with and without the PAM at 10 Gb/s. To disable the PAM, the differential-mode component of the strobe point control voltage was set to 0.

Fig. 20. Measured jitter tolerance with and without the PAM when the input bit rate is 10 Gb/s.
Measured at the BER threshold of 10, the out-of-band jitter tolerance is 0.34 UI with the SPD activated, improved from 0.15 UI with the SPD deactivated. The measurement results show that, with the strobe point adjustment, the proposed architecture has a better jitter tolerance than simply using the initial strobe point of the linear PD. They also confirm that the SPD circuit described here has a smaller strobe point than that of a conventional linear PD. A comparison with other similar reported works is shown in Table III.

The open-loop characteristic has its zero and pole at 360 kHz and 8.3 MHz, respectively, resulting in a unity-gain frequency of 6.4 MHz. Because of the proximity of the pole and the unity-gain frequency, the measured closed-loop jitter transfer exhibits peaking of 1.85 dB, as shown in Fig. 21. (It can also be seen from Fig. 21 that the presence of the PAM has little effect on the jitter transfer, while it does improve the jitter tolerance as shown in Fig. 20.) The peaking could be reduced by decreasing the value of the smaller loop-filter capacitor, thereby increasing the pole frequency, but at the expense of increased random jitter due to larger ripple on the control voltage. Modifying internal design parameters, such as the loop gains or the number of bands, would allow other possibilities for optimization between the acquisition time and steady-state loop parameters, as described in Table II.
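The quoted zero and pole can be cross-checked against the off-chip loop filter values with a short calculation. This sketch assumes the common topology of a resistor in series with the larger capacitor, shunted by the smaller one, which is consistent with the statement in Section III-D that the pole depends on the series combination of the two capacitors. The resistor value is not legible in the text, so 2 kΩ is an assumption chosen because it reproduces the quoted zero; the function name is hypothetical.

```python
import math


def loop_filter_corners(r_ohm, c1_farad, c2_farad):
    """Zero and pole (in Hz) of a series R-C1 branch shunted by C2."""
    c_series = c1_farad * c2_farad / (c1_farad + c2_farad)
    f_zero = 1.0 / (2.0 * math.pi * r_ohm * c1_farad)
    f_pole = 1.0 / (2.0 * math.pi * r_ohm * c_series)
    return f_zero, f_pole


# 220 pF and 10 pF are from the text; 2 kOhm is an assumed resistor value.
f_zero, f_pole = loop_filter_corners(2e3, 220e-12, 10e-12)
print(f"zero = {f_zero / 1e3:.0f} kHz, pole = {f_pole / 1e6:.2f} MHz")
```

Under these assumptions both corner frequencies land within about 1% of the 360 kHz and 8.3 MHz figures, and the formula for the pole shows why shrinking the 10 pF capacitor raises the pole frequency and eases the peaking.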

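The jitter and power figures reported in the measurement results are given in absolute units (ps, mW). A small helper can put them on a per-bit footing for comparison across data rates. The function names below are hypothetical, and the conversions are plain unit arithmetic, not figures reported by the authors.

```python
def unit_interval_ps(bit_rate_gbps):
    """Length of one unit interval (bit period) in picoseconds."""
    return 1e3 / bit_rate_gbps


def ps_to_ui(jitter_ps, bit_rate_gbps):
    """Express a jitter value given in picoseconds as a fraction of one UI."""
    return jitter_ps / unit_interval_ps(bit_rate_gbps)


def energy_per_bit_pj(power_mw, bit_rate_gbps):
    """Energy per bit in pJ; mW divided by Gb/s is numerically pJ/bit."""
    return power_mw / bit_rate_gbps


if __name__ == "__main__":
    # (bit rate in Gb/s, random jitter in ps, deterministic jitter in ps), from the text
    for rate, rj_ps, dj_ps in [(8.2, 0.386, 9.3), (10.3, 0.336, 7.7)]:
        print(f"{rate} Gb/s: UI = {unit_interval_ps(rate):.2f} ps, "
              f"RJ = {ps_to_ui(rj_ps, rate) * 1e3:.2f} mUI, "
              f"DJ = {ps_to_ui(dj_ps, rate):.3f} UI")
    # 174 mW total dissipation, evaluated at the top of the capture range
    print(f"energy per bit at 10.3 Gb/s: {energy_per_bit_pj(174, 10.3):.1f} pJ/bit")
```

Expressed this way, the deterministic jitter at both ends of the capture range is a little under 0.08 UI, well below the 0.34 UI out-of-band jitter tolerance measured with the strobe point adjustment active.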
TABLE III. CDR PERFORMANCE SUMMARY

Fig. 21. Measured jitter transfer with and without the PAM when the input bit rate is 10 Gb/s.

V. CONCLUSION

In this paper, an 8.2 Gb/s-to-10.3 Gb/s full-rate linear referenceless CDR without a frequency detector has been presented. Based on the analysis of single-sided capture range enhancement, the phase detector can also function as a frequency detector by deliberately realizing a strobe point in the linear PD. Compared with recently reported referenceless CDR architectures, this work provides a robust technique to overcome the limited capture range of the linear CDR while maintaining good jitter performance. The jitter tolerance is also successfully improved using the phase adjustment mode after the frequency acquisition process is completed.

ACKNOWLEDGMENT

The authors would like to thank Broadcom Corporation for measurement facilities and the TowerJazz Shuttle Program for providing chip fabrication.

REFERENCES

[1] J. Cao, M. Green, A. Momtaz, K. Vakilian, D. Chung, K.-C. Jen, M. Caresosa, X. Wang, W.-G. Tan, Y. Cai, I. Fujimori, and A. Hairapetian, "OC-192 transmitter and receiver in standard 0.18 μm CMOS," IEEE J. Solid-State Circuits, vol. 37, no. 12, pp. 1768–1780, Dec. 2002.
[2] A. Momtaz, J. Cao, M. Caresosa, A. Hairapetian, D. Chung, K. Vakilian, M. Green, B. Tan, K.-C. Jen, I. Fujimori, G. Gutierrez, and Y. Cai, "Fully-integrated SONET OC48 transceiver in standard CMOS," IEEE J. Solid-State Circuits, vol. 36, no. 12, pp. 1964–1973, Dec. 2001.
[3] R. Inti, W. Yin, A. Elshazly, N. Sasidhar, and P. K. Hanumolu, "A 0.5-to-2.5 Gb/s reference-less half-rate digital CDR with unlimited frequency acquisition range and improved input duty-cycle error tolerance," IEEE J. Solid-State Circuits, vol. 46, no. 12, pp. 3150–3162, Dec. 2011.
[4] G. Shu, S. Saxena, W.-S. Choi, M. Talegaonkar, R. Inti, A. Elshazly, B. Young, and P. K. Hanumolu, "A reference-less clock and data recovery circuit using phase-rotating phase-locked loop," IEEE J. Solid-State Circuits, vol. 49, no. 4, pp. 1036–1047, Apr. 2014.
[5] G. Shu, W.-S. Choi, S. Saxena, T. Anand, A. Elshazly, and P. K. Hanumolu, "A 4-to-10.5 Gb/s 2.2 mW/Gb/s continuous-rate digital CDR with automatic frequency acquisition in 65 nm CMOS," in IEEE ISSCC Dig. Tech. Papers, Feb. 2014, pp. 150–151.

[6] R.-J. Yang, S.-P. Chen, and S.-I. Liu, "A 3.125 Gb/s clock and data recovery circuit for the 10 GBase-LX4 Ethernet," IEEE J. Solid-State Circuits, vol. 39, no. 8, pp. 1356–1360, Aug. 2004.
[7] R.-J. Yang, K.-H. Chao, and S.-I. Liu, "A 200-Mbps–2-Gbps continuous-rate clock-and-data-recovery circuit," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 53, no. 4, pp. 842–847, Apr. 2006.
[8] R.-J. Yang, K.-H. Chao, S.-C. Hwu, C.-K. Lian, and S.-I. Liu, "A 155.52 Mbps–3.125 Gbps continuous-rate clock and data recovery circuit," IEEE J. Solid-State Circuits, vol. 41, no. 6, pp. 1380–1390, Jun. 2006.
[9] J. Song, I. Jung, M. Song, Y.-H. Kwak, S. Hwang, and C. Kim, "A 1.62 Gb/s–2.7 Gb/s referenceless transceiver for DisplayPort v1.1a with weighted phase and frequency detection," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 60, no. 2, pp. 268–278, Feb. 2013.
[10] S.-K. Lee, Y.-S. Kim, H. Ha, Y. Seo, H.-J. Park, and J.-Y. Sim, "A 650 Mb/s-to-8 Gb/s referenceless CDR circuit with automatic acquisition of data rate," in IEEE ISSCC Dig. Tech. Papers, Feb. 2009, pp. 184–185.
[11] A. Pottbacker, U. Langmann, and H. Schreiber, "A Si bipolar phase and frequency detector IC for clock extraction up to 8 Gb/s," IEEE J. Solid-State Circuits, vol. 27, no. 12, pp. 1747–1751, Dec. 1992.
[12] J. Lee and K. C. Wu, "A 20 Gb/s full-rate linear clock and data recovery circuit with automatic frequency acquisition," IEEE J. Solid-State Circuits, vol. 44, pp. 3590–3602, Dec. 2009.
[13] S. B. Anand and B. Razavi, "A 2.75 Gb/s CMOS clock recovery circuit with broad capture range," in ISSCC Dig. Tech. Papers, Feb. 2001, pp. 214–215.
[14] N. Kocaman, S. Fallahi, M. Kargar, M. Khanpour, A. Nazemi, U. Singh, and A. Momtaz, "An 8.5–11.5-Gbps SONET transceiver with referenceless frequency acquisition," IEEE J. Solid-State Circuits, vol. 48, no. 8, pp. 1875–1884, Aug. 2013.
[15] D. Dalton, K. Chai, E. Evans, M. Ferriss, D. Hitchcox, P. Murray, S. Selvanayagam, P. Shepherd, and L. DeVito, "A 12.5-Mb/s to 2.7-Gb/s continuous-rate CDR with automatic frequency acquisition and data-rate readback," IEEE J. Solid-State Circuits, vol. 40, no. 12, pp. 2713–2725, Dec. 2005.
[16] S. Huang, J. Cao, and M. M. Green, "An 8.2-to-10.3 Gb/s full-rate linear reference-less CDR without frequency detector in 0.18 μm CMOS," in IEEE ISSCC Dig. Tech. Papers, Feb. 2014, pp. 152–153.
[17] J. Cao, S. Huang, and M. M. Green, "Non-idealities in linear CDR phase detectors," Int. J. Circuit Theory Applic., vol. 41, pp. 331–346, Apr. 2013.
[18] C. R. Hogge, "A self-correcting clock recovery circuit," J. Lightwave Technol., vol. LT-3, no. 12, pp. 1312–1314, Dec. 1985.
[19] R. E. Best, Phase-Locked Loops: Design, Simulation, and Applications. New York, NY, USA: McGraw-Hill, 1996.
[20] T. Yoshimura, S. Iwade, H. Makin, and Y. Matsuda, "Analysis of pull-in range limit by charge pump mismatch in a linear phase-locked loop," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 60, no. 4, pp. 896–907, Apr. 2013.
[21] A. Rezayee and K. Martin, "A 9–16 Gb/s clock and data recovery circuit with three-state phase detector and dual-path loop architecture," in Proc. Eur. Solid-State Circuits Conf., Sep. 2003, pp. 683–686.
[22] B. Razavi, Design of Integrated Circuits for Optical Communications. New York, NY, USA: McGraw-Hill, 2003.
[23] U. Seckin and C. K. Yang, "A comprehensive delay model for CMOS CML circuits," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 9, pp. 2608–2618, Sep. 2008.
[24] D. Rennie and M. Sachdev, "A 5-Gb/s CDR circuit with automatically calibrated linear phase detector," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 4, pp. 796–803, Apr. 2008.

Sui Huang (M'13) received the B.S. degree from Shanghai Jiao Tong University, Shanghai, China, in 2006, the M.S. degree from Waseda University, Tokyo, Japan, in 2008, and the Ph.D. degree from the University of California, Irvine, CA, USA, in 2013, all in electrical engineering.
During the summer of 2011, he was an Intern with Mindspeed Technologies Inc., Newport Beach, CA, USA, working on high-speed SerDes transceivers. From August 2013 to July 2014, he was with Terasignal Inc., Newport Beach, where he focused on high-speed wireline transceivers for optical, copper, and backplane applications. He is currently with Nurotron Biotechnology Inc., Irvine, CA, USA, developing ultra-low-power analog/mixed-signal integrated circuits. Dr. Huang was the recipient of the Mindspeed Fellowship in 2011 and 2012 while with the University of California, Irvine.

Jun Cao (S'96–M'99–SM'14) received the B.S. degree in physics from Peking University, Beijing, China, in 1994, the M.S.E.E. degree from the University of Michigan, Ann Arbor, MI, USA, in 1996, and the Ph.D. degree in electrical engineering from the University of California, Irvine, CA, USA, in 2003. In 1999, he joined NewPort Communications, where he was one of the leading designers for the world's first commercial 10G CMOS transceiver. In 2000, NewPort Communications was acquired by Broadcom Corporation, Irvine, CA, USA, and he has been with the Analog/Mixed-Signal Group since, where he is currently a Distinguished Engineer and a Director of design engineering, working on circuits for 100GE and 400GE applications. He has published more than 30 journal/conference papers on the topics of high-speed transceivers, PLLs, ADCs, and DACs for wireline communications. He also has 60 issued or pending U.S. patents. Dr. Cao is a member of the technical program committee of CICC.

Michael M. Green (SM'15) received the B.S. degree in electrical engineering from the University of California, Berkeley, CA, USA, in 1984, and the M.S. and Ph.D. degrees from the University of California, Los Angeles, CA, USA, in 1988 and 1991, respectively, in electrical engineering. He has been with the Department of Electrical Engineering and Computer Science, University of California, Irvine, CA, USA, since 1997, where he is currently a Professor. From 1999 to 2001, he was an IC Designer with the Optical Transport Group, Broadcom Corporation (formerly Newport Communications), Irvine. He has published over 100 papers in technical journals and conferences and holds six patents. His current research interests include the design of analog and mixed-signal integrated circuits for use in high-speed broadband communication networks and nonlinear circuit theory. Dr. Green was the recipient of the 1994 Guillemin-Cauer Award of the IEEE Circuits and Systems Society, the 1994 W. R. G. Baker Award of the IEEE, a 1994 National Young Investigator Award from the National Science Foundation, and the Award for New Technical Concepts in Electrical Engineering from IEEE Region 1. He has served as an associate editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, and the IEEE TRANSACTIONS ON EDUCATION.
