A Blind Baud-Rate CDR and Zero-Forcing Adaptive DFE for an ADC-Based Receiver. Clifford Ting


A Blind Baud-Rate CDR and Zero-Forcing Adaptive DFE for an ADC-Based Receiver

by Clifford Ting

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto

Copyright © 2013 by Clifford Ting

Abstract

A Blind Baud-Rate CDR and Zero-Forcing Adaptive DFE for an ADC-Based Receiver
Clifford Ting
Master of Applied Science, 2013
Graduate Department of Electrical and Computer Engineering
University of Toronto

This thesis describes two design ideas in the area of ADC-based receivers. The first contribution of this thesis is a 10Gb/s blind baud-rate CDR. Blind baud-rate operation is made possible by a 2UI integrate-and-dump filter, which creates intentional ISI in adjacent bit periods. The blind samples are interpolated to recover center-of-the-eye samples for a speculative Mueller-Muller PD and a 2-tap DFE operation. The 65nm CMOS test chip has a measured high-frequency jitter tolerance of 0.19 UIpp at ±300ppm of frequency offset.

The second contribution of this thesis is a digital zero-forcing adaptive DFE. The DFE coefficients are calculated by correlating data samples with the recovered bits. Simulations show that the adaptive taps converge to the ISI values on the pulse response of the data signal. The CDR and adaptive 2-tap DFE have a high-frequency jitter tolerance of 0.28 UIpp when simulated at 10Gb/s with an 8" FR4 channel.

Acknowledgements

The work described in this thesis would not have been possible without the help and support of many people. I would like to thank my supervisor, Professor Ali Sheikholeslami, for his support and guidance during my M.A.Sc. studies and for being a great teacher. His optimism encouraged me to continue measuring and eventually publish the blind baud-rate test chip, even though it did not work initially. I would also like to thank the thesis committee members, Professor Tony Chan Carusone, Professor Antonio Liscidini, and Professor Andreas Moshovos, for reviewing the thesis and for their valuable feedback.

My gratitude goes to Fujitsu Laboratories Ltd. for sending me to their office in Kawasaki, Japan to tape out the test chip. I am grateful to everyone who made my visit an enjoyable one; in particular, thank you to Masaya Kibune, Hirotaka Tamura, Takuji Yamamoto, Kouichi Kanda, Takayuki Hamada, Junji Ogawa, Hirotaka Yamazaki, Yasumoto Tomita, and Iwao Sugiyama. I would like to thank Tamura-san for sharing his ideas with me and for always encouraging research discussions. A special thank you goes to Kibune-san, who stayed late to keep me company during the tapeout, made sure I had everything I needed during my stay in Japan, and took time on weekends to give me a tour of Kyoto and Tokyo, even though he was very busy with his own work.

Thank you to all the graduate students in BA5000 and BA5158 for making my time in graduate school a wonderful experience. In particular, the contributions in this thesis could not have been made without Josh Liang's and Sadegh Jalali's help during design and measurement of the test chips. Also, I would like to thank the previous and current students of Professor Sheikholeslami's research group for their friendship and valuable discussions: Shayan Shahramian, Safeen Huda, Behrooz Abiri, Sadegh Jalali, Ravi Shivnaraine, Aynaz Vatankhahghadim, Josh Liang, Neno Kovacevic, and Farhad Ramezankhani.

Most of all, I would like to thank my parents for their unconditional love and support. Thank you for always being there for me.

Contents

1 Introduction
    Motivation
    Thesis Objectives
    Thesis Outline

2 Background
    Channel effects
    Receiver
    Equalization
        Linear Equalization
        Decision-Feedback Equalization (DFE)
    Equalizer Adaptation
        Zero-Forcing (ZF) Method
        Minimum Mean Square Error (MMSE) Method
        Maximum Eye Opening Method
    Clock and Data Recovery (CDR)
        Phase-Tracking CDR with Clock Feedback
        Blind Feed-forward CDR
        Blind CDR with Feedback
    Summary

3 A Blind Baud-Rate CDR
    Blind 1x Data Recovery Concepts
    Proposed 1x Blind Receiver Architecture
    Receiver Implementation
    Integrate-and-Dump Filter
    Clock Generator
    Data Interpolator
    Mueller-Muller Phase Detector
    Decision-Feedback Equalizer
    Loop Filter
    Simulation and Measurement Results
    Summary

4 A Zero-Forcing Adaptive DFE for an ADC-Based CDR
    Proposed DFE Adaptation
    Proposed Blind ADC-Based Receiver Architecture
    Proposed Digital CDR with Adaptive 2-tap DFE
    Data Interpolator
    Low-Pass Filter for DFE Adaptation
    Simulation Results
    Conclusion

5 Conclusion
    Thesis Contributions
    Future Work
        Implementation of a Fully Feed-Forward Blind Baud-Rate CDR
        Evaluation of Phase-Dependent DFE for Data Interpolators
        Adaptive Optimization of Offset Coefficient in MMPD
        Calibration of I&D and ADC Front End

References

List of Tables

4.1 Comparison of Adapted Coefficients ($c_1$ and $c_2$) vs. Pulse Response ($h_1$ and $h_2$)

List of Figures

2.1 The basic components of a communication system
2.2 An example of a channel frequency response and the effect on an isolated data pulse
2.3 Intersymbol interference (ISI) when transmitting a 1111 sequence
2.4 Comparison of (a) binary and (b) ADC-based receivers
2.5 (a) Linear and (b) non-linear receiver equalizers
2.6 Frequency response of combined channel and linear equalizer
2.7 Source-degenerated continuous time linear equalizer
2.8 A 3-tap DFE example
2.9 A speculative 1-tap DFE
2.10 An example of channel+FFE pulse response ($h(t)$) and Nyquist response ($g(t)$). ISI is the difference between the two responses ($r = g - h$)
2.11 An example of a receiver with a channel (with 2 pre-cursor and 4 post-cursor taps of ISI) and a 2-tap FFE
2.12 A partial model of a discrete-time receiver with channel and FFE
2.13 A geometric representation of optimal zero-forcing FFE coefficients
2.14 A model of a discrete-time receiver, including a ZF adaptation loop
2.15 An example of minimizing average error by using steepest-descent algorithm
2.16 A model of a discrete-time receiver with a DFE and LMS adaptation loop
2.17 A system that adapts equalizer taps based on eye opening
2.18 A recovered clock sampling equalized data
2.19 CDR classification
2.20 System diagram of phase-tracking CDR with clock in feedback loop
2.21 Example of a jitter tolerance chart
2.22 (a) PD inputs and output and (b) linear model
2.23 Alexander PD implementation
2.24 Alexander PD examples with early and late $CK_{RX}$
2.25 Transfer function of Alexander PD with no jitter on data or $CK_{RX}$
2.26 Hogge PD implementation
2.27 Hogge PD output with (a) early, (b) on-time, and (c) late $CK_{RX}$
2.28 Transfer function of Hogge PD
2.29 Example of (a) pulse response and (b) MM function [21]
2.30 System diagram of a 8x oversampled blind feed-forward (burst-mode) CDR [22, 27]
2.31 The edge detection and data selection process from Figure
2.32 A blind 2x ADC-based CDR [32]
2.33 A blind 1.45x ADC-based CDR [33]
2.34 System diagram of blind CDR with feedback [10]
2.35 Analog data interpolator (DI) estimates center and edge samples from blind samples [10]
3.1 Worst-case for 2x, 1.45x and 1x sampling on open eye diagram
3.2 Comparison of theoretical worst-case jitter tolerance given the pulse responses of an ideal channel, 1UI I&D, and 2UI I&D. Blind baud-rate samples can shift across a 1UI range due to frequency offset
3.3 System block diagram of interleaved analog front end (1UI I&D and ADC) and digital CDR
3.4 Comparison of (a) fully analog 2UI I&D and (b) analog and digital 2UI I&D
3.5 Handling (a) negative frequency offset: data (TX) is slower than blind receiver clock ($CK_{RX}$) (b) positive frequency offset: data (TX) is faster than blind receiver clock ($CK_{RX}$)
3.6 Implementation of integrate-and-dump (I&D) circuit [28]
3.7 I&D operating phases synchronized with clock pulses
3.8 Implementation of clock pulse generator with adjustable delay for deskew
3.9 (a) Effect of clock phase skew on the I&D integration period (b) Equal I&D integration periods after correcting clock skew
3.10 Adjustable clock delay block
3.11 Piecewise linear interpolation of desired sample from blind samples
3.12 (a) Pulse response of an ideal channel followed by 2UI I&D (b) Proposed MM function
3.13 Design and implementation of the speculative Mueller-Muller phase detector (MMPD)
3.14 (a) A speculative 2-tap DFE and (b) the first stage of the parallel speculative DFE that recovers 8 bits per cycle
3.15 The second stage of parallel speculative DFE that recovers 16 bits per cycle
3.16 Loop filter with configurable proportional and integral gains
3.17 Simulated loop filter convergence with 1000ppm of frequency offset for PRBS-7. Signals correspond to nodes on the block diagram of Fig
3.18 Frequency response of channel models in simulation
3.19 Simulated eye diagrams using Channel A + 2UI I&D
3.20 Simulated eye diagrams using Channel B
3.21 Simulated jitter tolerance results at 10Gb/s with a BER of
3.22 Chip photo
3.23 Measurement setup
3.24 Average ADC output given DC input (a) before and (b) after skew correction
3.25 Measured channel frequency response
3.26 Measured eye diagrams (a) after the channel and (b) after the ADC
3.27 Simulated and measured jitter tolerance results with 10Gb/s PRBS-7 input data and BER of $10^{-6}$ and $10^{-12}$, respectively
4.1 ISI can be calculated by correlating sampled data ($A_k$, $A_{k-1}$, etc.) with recovered bits ($x_k$, $x_{k-1}$, etc.)
4.2 Zero-forcing controller for n-tap DFE adaptation
4.3 System diagram of proposed receiver with 3-bit ADC-based CDR and adaptive DFE
4.4 Data interpolator calculates sample at desired location from closest blind samples. (a) Negative or (b) positive frequency offsets result in occasional skipped or extra interpolated samples
4.5 Proposed digital CDR with adaptive DFE
4.6 Piecewise linear interpolation of desired sample from 2x blind samples
4.7 Frequency responses of 1x and 2x data interpolators. Both interpolators operate on a 10Gbps data signal with a Nyquist frequency of 5GHz
4.8 Low-pass filter for DFE coefficients
4.9 Hysteresis block implemented in low-pass filter
4.10 Frequency responses of channel models used in simulation
4.11 Combined channel and interpolator pulse responses showing ISI tap values ($h_{-1}$, $h_0$, $h_1$, $h_2$, $h_3$) when CDR has locked
4.12 Simulated DFE adaptation with Channel C at 10Gbps. DFE converges to same steady-state values when given different initial coefficients (i.e. 0 and 30)
4.13 Simulated DFE adaptation with Channel D at 10Gbps. DFE converges to same steady-state values when given different initial coefficients (i.e. 0 and 30)
4.14 Simplified diagram of CDR model used for eye diagram simulations
4.15 Simulated eye diagrams with 5Gbps data and Channel C. Eye diagrams correspond to signals in Figure
4.16 Simulated eye diagrams with 10Gbps data and Channel C. Eye diagrams correspond to signals in Figure
4.17 Simulated eye diagrams with 10Gbps data and Channel D. Eye diagrams correspond to signals in Figure
4.18 Simulated jitter tolerance of proposed receiver

List of Acronyms

ADC: Analog-to-Digital Converter
BER: Bit-Error Rate
CDR: Clock and Data Recovery
CTLE: Continuous-Time Linear Equalizer
DFE: Decision Feedback Equalizer
DJ: Deterministic Jitter
CP: Charge Pump
DI: Data Interpolator
FFE: Feed-Forward Equalizer
FIR: Finite Impulse Response
FR4: A type of glass-reinforced epoxy laminate printed circuit board
Gb/s: Gigabits per second
Gbps: Gigabits per second
ISI: Intersymbol Interference
LMS: Least Mean Square
MMPD: Mueller-Muller Phase Detector
MMSE: Minimum Mean Square Error
NRZ: Non-Return-to-Zero
PCB: Printed Circuit Board
PD: Phase Detector
PI: Phase Interpolator
PRBS: Pseudo-Random Binary Sequence
PVT: Process, Voltage and Temperature
RJ: Random Jitter
UI: Unit Interval
USB: Universal Serial Bus
VCO: Voltage-Controlled Oscillator
ZF: Zero-Forcing

1 Introduction

The rapid improvements in processor speeds and other digital computation have enabled the development of applications such as the Internet and video conferencing. These applications have, in turn, caused a growing demand for high-speed data communication. One particular trend is the centralization of processing and storage resources in cloud computing centers [4,9]. While cloud computing reduces the complexity and power consumption of client devices (e.g. laptops, tablets, and cell phones), it comes at the cost of requiring high-bandwidth communication between the client device and the computing center [4].

This thesis focuses on the part of the communication system that transfers digital data from one chip to another via wireline channels. In recent years, the wireline channels used in chip-to-chip communications have not improved at the same rate that silicon technologies have advanced. In addition, the desire to minimize production costs has limited the number of I/Os and channels available to each chip. Hence, we are forced to send ever-increasing amounts of data per channel in the presence of channel imperfections. Accordingly, new circuit innovations are required in order to achieve higher data rates.

1.1 Motivation

The main channel imperfection that limits data rates is the channel's bandwidth. As data rates increase, the data signal experiences frequency-dependent attenuation due to the dielectric and conductive losses in the channel [12, 15]. Analog circuits can be used to equalize the signal and recover the digital data [13, 20, 30]. Compared to their digital counterparts, the analog circuits consume less power, but are more vulnerable to process, voltage, and temperature (PVT) variations. As the technology scales to smaller device sizes and lower voltages, analog circuits benefit less from the technology advances and, in fact, are at a disadvantage because they require voltage headroom. Digital circuits, on the other hand, port easily and perform better with each successive technology.

The role of a clock-and-data recovery (CDR) block is to recover the transmitted data on the receiver side by sampling the data signal with either a phase-tracking clock [3,7,11,37] or a blind clock [1,32,36]. Sampling with a blind clock removes the feedback loop between analog and digital circuits and results in faster development because the analog and digital blocks can be designed independently.

Digital equalizers and CDRs require a high-speed analog-to-digital converter (ADC). However, the ADC consumes significant power. One of the goals of this thesis is to increase the data rate without increasing ADC power. We can accomplish this by reducing the ADC's sampling rate while maintaining the bit rate. The first part of this thesis proposes a CDR that can recover data from blind, baud-rate samples. This is in contrast with previous work, where the sampling rate was 2x [32,36] or 1.45x [33] the baud rate.

The second part of this thesis proposes an adaptive DFE for the blind, baud-rate CDR. The proposal simplifies the DFE controller compared to a previous LMS adaptive DFE for blind CDRs [1].

1.2 Thesis Objectives

This thesis presents the design and implementation of a blind baud-rate CDR for an ADC-based receiver, and the architecture of a zero-forcing adaptive DFE for a blind CDR. The main objectives of the thesis are as follows:

• Provide a background on different types of adaptive equalizers and clock-and-data recovery systems,
• Investigate and propose a CDR to recover data from blind, baud-rate ADC samples,
• Present the implementation, simulation, and measurement results to show the proposed CDR's functionality,
• Investigate and propose an adaptive DFE for a blind CDR,
• Present the implementation and simulation results to show the DFE controller functionality; measurement is left as future work.

1.3 Thesis Outline

The remaining chapters of this thesis are organized as follows:

• Chapter 2 provides a background on different types of adaptive equalizers and clock-and-data recovery systems,
• Chapter 3 describes the concept of blind, baud-rate data recovery, proposes a CDR architecture, and presents simulation and measurement results,
• Chapter 4 proposes a novel DFE controller for the CDR developed in Chapter 3, and presents simulation results,
• Chapter 5 concludes the thesis and provides the future directions for this work.

2 Background

The purpose of this chapter is to present the basic concepts needed to understand the contributions of this thesis and to review existing architectures of some of the blocks used in high-speed systems for communicating digital data.

The communication system shown in Figure 2.1 consists of three main components: a transmitter, a channel, and a receiver. The channel is the physical medium (e.g. wireline, wireless, or optical) that connects the transmitter to the receiver. In this thesis, we will focus on systems designed for electrical wireline channels. Two examples of wireline channels are the traces on an FR4 PCB and the copper wire in Ethernet and USB cables.

Figure 2.1: The basic components of a communication system (blocks: Digital Data, Transmitter (TX), Channel, Receiver (RX), Recovered Data)

The transmitter converts the digital data into a signal that can be transmitted across the channel (e.g. electrical pulses with NRZ coding). The receiver samples the signal at the other end and recovers the digital data. Bit errors occur when the recovered data does not match the original data at the transmitter. The goal of the high-speed communication system is to minimize the bit error rate (BER). For wireline applications, the target BER is usually below

At the time of this writing, the term "high-speed" refers to the sending and receiving of data on the order of gigabits per second (Gbps). Wireline channels have not improved much as data rates have increased. Hence, the transmitter and receiver must compensate for the non-idealities of the channel (e.g. bandwidth limitations) in order to reduce the BER below the target rate while minimizing their power consumption.

In this chapter, we will focus on the blocks and different architectures of the receiver and omit the details of the transmitter because this thesis contributes in the area of receiver architecture. The chapter is organized as follows. Section 2.1 describes channel non-idealities and their effect on the transmitted data signals. Sections 2.3 and 2.4 describe how equalizers compensate for channel bandwidth limitations and how to adapt an equalizer to match a particular channel. Section 2.5 discusses different types of clock-and-data recovery blocks.

2.1 Channel effects

The top of Figure 2.2 shows an example of a frequency response of the channel from the transmitter to the receiver (also known as the $S_{21}$ parameter). The frequency, $f_b$, is the baud rate of the data (e.g. if the data rate is 10Gbps with NRZ coding, then $f_b$ would be 10GHz). We are mostly interested in the channel frequency response up to $f_b/2$ since the data pattern with the highest transition density (alternating 1s and 0s) has a frequency of $f_b/2$. In an FR4 channel, the skin effect of the copper trace and the dielectric loss from the surrounding PCB substrate cause the channel response to be attenuated at high frequencies. As we increase the data rate, the channel will further attenuate the signal at $f_b/2$.

The bottom of Figure 2.2 shows the transmitted and received data pulses. Assuming NRZ coding, the transmitted pulse has a duration of $T_b = 1/f_b$. The figure shows a digital 1 being sent; if a digital 0 were being sent, then the pulse amplitude would be negative. The transmitted pulse is a Nyquist pulse; if we sample the pulse at the baud rate, $f_b$, then we would have only one non-zero sample at $h_0$.
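The claim that the densest data pattern has a fundamental frequency of $f_b/2$ can be checked numerically. The following is a minimal sketch, not taken from the thesis: the helper names (`nrz_waveform`, `dominant_frequency`), the oversampling factor, and the pattern length are illustrative assumptions. It builds an alternating NRZ waveform and locates the largest bin of a direct DFT.

```python
import cmath

def nrz_waveform(bits, samples_per_bit):
    """Map bits to +1/-1 NRZ levels and hold each level for one bit period."""
    return [(1.0 if b else -1.0) for b in bits for _ in range(samples_per_bit)]

def dominant_frequency(waveform, sample_rate):
    """Return the frequency (Hz) of the largest DFT bin above DC, up to Nyquist."""
    n = len(waveform)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * m / n)
                  for m, x in enumerate(waveform))
        if abs(acc) > best_mag + 1e-9:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n

f_b = 10e9                      # baud rate for 10Gbps NRZ data (from the text)
samples_per_bit = 8             # assumed oversampling, only for this illustration
bits = [1, 0] * 16              # alternating pattern: highest transition density
wave = nrz_waveform(bits, samples_per_bit)
f_peak = dominant_frequency(wave, f_b * samples_per_bit)
print(f_peak / f_b)             # ratio of the dominant frequency to the baud rate
```

The alternating pattern repeats every two bit periods, so its fundamental sits at half the baud rate; less dense patterns concentrate their energy at lower frequencies.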
However, the channel's frequency-dependent attenuation spreads the pulse energy into adjacent $T_b$ bins ($h_{-1}$, $h_1$, $h_2$, etc.). If we transmit a sequence of bits, as depicted in Figure 2.3, then the received signal would become a superposition of pulses.

Figure 2.2: An example of a channel frequency response and the effect on an isolated data pulse (top: channel frequency response vs. frequency, with $f_b$ marked; bottom: transmitted pulse of width $T_b = 1/f_b$ and received pulse response with samples $h_{-1}$, $h_0$, $h_1$, $h_2$, delayed by $T_{delay} + \Delta t$)

Figure 2.3: Intersymbol interference (ISI) when transmitting a 1111 sequence (components: $B_k h_0$, $B_{k-1} h_1$, $B_{k-2} h_2$, $B_{k+1} h_{-1}$)

Let $x_k$ represent a sample in the received signal. Our goal is to recover the transmitted bit, $B_k$, from the sample, $x_k$. However, due to the spreading of the pulse energy, $x_k$ also includes components from previous bits ($B_{k-1}$, $B_{k-2}$) and future bits ($B_{k+1}$). This interference is known as intersymbol interference (ISI). The example in Figure 2.3 includes 3 ISI components in addition to $B_k h_0$:

$$x_k = B_{k+1} h_{-1} + B_k h_0 + B_{k-1} h_1 + B_{k-2} h_2 \tag{2.1}$$

If the ISI components are left uncompensated, they can corrupt the recovery of $B_k$ and cause bit errors. In general, the sample, $x_k$, can be expressed as the following:

$$x_k = \underbrace{B_k h_0}_{\text{Main cursor}} + \underbrace{\sum_{i<0} B_{k-i} h_i}_{\text{Pre-cursor ISI}} + \underbrace{\sum_{i>0} B_{k-i} h_i}_{\text{Post-cursor ISI}} \tag{2.2}$$

The ISI caused by future and previous bits is known, respectively, as pre-cursor and post-cursor ISI. In order to successfully recover the transmitted bit, $B_k$, the main cursor must be the dominant cursor. If the main cursor is dominant, then the data eye diagram will be open.

$$|h_0| > \sum_{i \neq 0} |h_i| \tag{2.3}$$

In addition to ISI, the channel also introduces a propagation delay, $T_{delay}$, shown in Figure 2.2, between the transmitted and received pulses. $T_{delay}$ is a constant delay when transmitting a given pulse over a given channel. However, since we usually design the transmitter and receiver to work with a range of channels, $T_{delay}$ is not known at the time of design.

In practice, the time when the signal arrives at the receiver usually deviates from $T_{delay}$. This timing deviation is defined as jitter (shown as $\Delta t$ in Figure 2.2) and can be modeled as a random process. In general, jitter can be split into two components [18]: deterministic jitter (DJ) and random jitter (RJ). DJ is bounded, while RJ is unbounded and has a Gaussian distribution. Channel imperfections (e.g. bandwidth limitations, reflections, crosstalk, and electromagnetic interference) can cause part of the DJ. RJ is mostly caused by noise from the circuits in the transmitter and receiver (e.g. thermal, shot, and flicker noise) [18].

In order to compensate for the channel's non-idealities, a typical receiver contains two main blocks: an equalizer and a clock-and-data recovery (CDR) block. Equalizers are commonly used to reduce pre-cursor and post-cursor ISI in order to fulfill Equation 2.3. The CDR block recovers the data below the target BER by compensating for the propagation delay and jitter.

2.2 Receiver

Figure 2.4a shows a conventional binary receiver with a phase-tracking clock in a feedback loop. The analog equalizer reduces ISI in order to open the eye. The eye opening allows the comparator to sample the received data where the error probability is at its minimum and regenerate the data bit. The comparator is followed by a CDR that detects the phase of the equalized data signal and aligns the rising edge of the clock ($CK_{REC}$) with the center of the data eye. This kind of receiver is binary because the sampling comparator only captures the sign of the incoming data signal. Since equalization requires both sign and magnitude, all of the equalization must be done before the comparator in the analog domain.

Figure 2.4: Comparison of (a) binary and (b) ADC-based receivers ((a): Analog Equalizer, comparator, CDR producing Data and $CK_{REC}$; (b): Analog Equalizer, ADC, Digital Equalizer, CDR producing Data and $CK_{REC}$)
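The ISI model of Section 2.1 (Equations 2.2 and 2.3) can be sketched in a few lines of Python. This is an illustrative sketch only: the tap values, the bit pattern, and the function names `received_sample` and `eye_is_open` are hypothetical and do not come from the thesis.

```python
def received_sample(bits, k, taps):
    """Equation 2.2: x_k = sum over i of B_{k-i} * h_i, with taps given as {i: h_i}."""
    total = 0.0
    for i, h_i in taps.items():
        idx = k - i
        if 0 <= idx < len(bits):      # bits outside the sequence contribute nothing
            total += bits[idx] * h_i
    return total

def eye_is_open(taps):
    """Equation 2.3: the main cursor must exceed the summed magnitudes of all ISI taps."""
    isi = sum(abs(h_i) for i, h_i in taps.items() if i != 0)
    return abs(taps[0]) > isi

# Hypothetical pulse response: one pre-cursor tap and two post-cursor taps.
taps = {-1: 0.10, 0: 0.60, 1: 0.25, 2: 0.10}
bits = [+1, +1, +1, +1, -1, +1, -1, -1]   # NRZ symbols, starting with a 1111 run

x_1111 = received_sample(bits, 2, taps)   # all four contributing bits are +1
print(x_1111, eye_is_open(taps))
```

With these assumed taps the main cursor (0.60) exceeds the total ISI (0.45), so the sign of every interior sample matches the transmitted bit. If the ISI taps sum to more than the main cursor, some bit patterns flip the sign of $x_k$ and cause bit errors.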

Alternatively, we can transform the binary receiver into an ADC-based receiver by replacing the comparator with an ADC, as illustrated in Figure 2.4b. By incorporating an ADC in the receiver, we capture both the sign and the magnitude of the signal after the analog equalizer. Now that we have obtained magnitude information, it becomes possible to perform additional equalization in the digital domain after the ADC. The digital equalizer and CDR architectures (discussed in Sections 2.3 and 2.5) can be implemented entirely with HDL code.

The main disadvantages of ADC-based receivers are the high power consumption and large area of the ADC. However, there are several benefits of a digital equalizer and CDR implementation:

• Digital blocks are immune to PVT variations (assuming that timing constraints are met across all corners).
• HDL code is easily ported across technology nodes using automatic synthesis, place, and route software, whereas analog blocks must be manually designed for each process technology.
• The digital blocks scale with more advanced technology nodes (which often benefit digital blocks more than analog ones).
• The digital equalizer can be easily combined with a digital controller. Furthermore, the adaptive controller may benefit by gaining access to both sign and magnitude information. In analog equalizers, the adaptation controller often only has access to sign information.

2.3 Equalization

Equalization can be implemented at the transmitter (pre-equalizer), receiver (post-equalizer), or both. It is easier to perform post-equalization for two reasons. First, the equalizer may change the signal swing, which, if implemented in the transmitter, alters the amplitude of the output signal. In many cases, the transmitter is designed for a set of communication standards that impose constraints on the signal swing in the channel. The standards do not impose constraints on the signals internal to the receiver. Second, adaptive equalization is more easily implemented in the receiver than in the transmitter. The adaptive controller requires information about the channel response, which can be estimated at the receiver. If the adaptive controller were implemented at the transmitter, it would require feedback to be sent back from the receiver through an auxiliary channel. For these reasons, some systems include a small amount of constant equalization at the transmitter and adaptively equalize most of the frequency-dependent attenuation at the receiver [36]. In this thesis, we will focus on post-equalizers.

There are two broad categories of equalizers: linear and non-linear. A receiver may include one type or both types of equalizers. We will discuss them in more detail in Sections 2.3.1 and 2.3.2.

Figure 2.5: (a) Linear and (b) non-linear receiver equalizers (signals: Data Signal, Linear EQ, Partially Equalized Signal ($x_K$), $w_K$, $y_K$, $C(z)$, Recovered Data ($A_K$), Non-Linear Equalizer)

2.3.1 Linear Equalization

The main purpose of equalization is to improve BER by decreasing the ISI caused by the high-frequency attenuation of the channel. If we can cascade the channel and equalizer such that the overall response is flat up to $f_b/2$, then most of the ISI will be eliminated. A linear equalizer achieves the final flat response by either emphasizing (i.e. boosting) the high-frequency content or by de-emphasizing (i.e. attenuating) the low-frequency content in the data signal. Figure 2.6 shows the latter.
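The idea of cascading the channel and equalizer to flatten the response up to $f_b/2$ can be illustrated with a deliberately simplified model. The sketch below assumes a single-pole low-pass channel and an equalizer with one zero placed on the channel pole plus one higher-frequency pole; these transfer functions, the pole frequencies, and the helper names are assumptions for illustration and are not the channel or equalizer used in the thesis.

```python
import math

def single_pole(f, f_pole):
    """Assumed channel model: first-order low-pass response at frequency f."""
    return 1.0 / complex(1.0, f / f_pole)

def zero_pole_equalizer(f, f_zero, f_pole):
    """Assumed linear equalizer: one zero (boost) followed by one pole."""
    return complex(1.0, f / f_zero) / complex(1.0, f / f_pole)

def gain_db(h):
    """Magnitude of a complex response in dB."""
    return 20.0 * math.log10(abs(h))

f_b = 10e9                 # baud rate for 10Gbps NRZ data
f_nyq = f_b / 2            # highest frequency of interest, f_b/2
f_channel_pole = 1.5e9     # hypothetical channel bandwidth
f_eq_pole = 8e9            # hypothetical equalizer pole

channel = single_pole(f_nyq, f_channel_pole)
combined = channel * zero_pole_equalizer(f_nyq, f_channel_pole, f_eq_pole)

print("channel loss at f_b/2 (dB): ", round(gain_db(channel), 2))
print("combined loss at f_b/2 (dB):", round(gain_db(combined), 2))
```

Because the equalizer zero cancels the channel pole in this toy model, the cascade behaves like a channel whose bandwidth has been extended to the equalizer pole. A real FR4 channel has a more gradual roll-off, so the cancellation is only approximate.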

Figure 2.6: Frequency response of combined channel and linear equalizer (panels: Channel Response, Linear EQ Response, Channel + Linear EQ Response, each plotted up to $f_b/2$)

A linear equalizer can be implemented with a continuous-time or discrete-time architecture. Figure 2.7 depicts an example of a commonly-used continuous-time linear equalizer (CTLE). Usually, $R_S$ or $C_S$ is made programmable so that the zero at $f_Z$ can be adjusted to match the channel response. CTLEs can only be implemented with analog circuits.

Figure 2.7: Source-degenerated continuous time linear equalizer (annotations recoverable from the figure: gain level $g_m R_L$; $f_Z \approx \frac{1}{2\pi R_S C_S}$; $f_{P1} \approx \frac{1}{2\pi R_L C_L}$; $f_{P2} \approx \frac{1 + (g_m R_S)/2}{2\pi R_S C_S}$)

A discrete-time linear equalizer (also known as a feed-forward equalizer or FFE) can be implemented either in the analog domain or in the digital domain (if the receiver is ADC-based). It usually includes an infinite impulse response (IIR) or finite impulse response (FIR) filter that boosts the high-frequency content of the data signal.

The main disadvantage of a linear equalizer is that it boosts not only the high-frequency content of the signal, but also high-frequency noise. Sources of noise include thermal noise from the transmitter and receiver, crosstalk, and ADC quantization noise (in the case of a digital FFE). The latter can add a significant amount of noise to the signal; hence, we can either increase the ADC resolution to reduce the quantization noise or reduce the FFE gain (and rely on the decision-feedback equalizer described in the next section).

2.3.2 Decision-Feedback Equalization (DFE)

A decision-feedback equalizer (DFE) is a non-linear equalizer that removes post-cursor ISI from the channel using the recovered data and a filter (shown as $C(z)$ in Figure 2.5) whose pulse response matches the post-cursor ISI of the partially equalized signal. The DFE response is given by:

$$w_k = \sum_{i>0} A_{k-i} c_i \tag{2.4}$$

We assume a very low BER such that we can approximate the original data, $B_k$, with the recovered data, $A_k$. The optimal equalization occurs when the DFE coefficients match the post-cursor ISI taps of the partially equalized signal (i.e. $c_i = h_i$ for $i > 0$):

$$y_k = x_k - w_k = \Big(B_k h_0 + \sum_{i<0} B_{k-i} h_i + \sum_{i>0} B_{k-i} h_i\Big) - \sum_{i>0} A_{k-i} c_i$$
$$y_k = B_k h_0 + \sum_{i<0} B_{k-i} h_i \tag{2.5}$$

One disadvantage is that the DFE cannot remove pre-cursor ISI, since the DFE feedback path would need future recovered data to estimate the pre-cursor ISI. However, linear equalizers can reduce pre-cursor ISI and are often used in conjunction with a DFE. Another disadvantage of the DFE is error propagation. When an incorrect decision is made, the wrong data is fed back through $C(z)$ and may cause incorrect decisions on future data.
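Equations 2.4 and 2.5 can be exercised with a short behavioral sketch. The following Python model is only an illustration under stated assumptions: the channel has post-cursor ISI only, the tap values and bit pattern are hypothetical, and the names `channel_output` and `dfe_equalize` are not from the thesis.

```python
def channel_output(bits, taps):
    """x_k = sum_i B_{k-i} * h_i for post-cursor-only taps [h_0, h_1, ...]."""
    samples = []
    for k in range(len(bits)):
        x_k = sum(h_i * bits[k - i] for i, h_i in enumerate(taps) if k - i >= 0)
        samples.append(x_k)
    return samples

def dfe_equalize(samples, coeffs):
    """Equations 2.4/2.5: y_k = x_k - sum_{i>0} A_{k-i} c_i, then slice y_k to get A_k."""
    decisions, equalized = [], []
    for x_k in samples:
        # w_k uses only past decisions; decisions[-i] is A_{k-i}
        w_k = sum(c_i * decisions[-i]
                  for i, c_i in enumerate(coeffs, start=1) if i <= len(decisions))
        y_k = x_k - w_k
        equalized.append(y_k)
        decisions.append(1 if y_k >= 0 else -1)
    return equalized, decisions

# Hypothetical pulse response with a closed eye: h_0 < h_1 + h_2.
h = [0.50, 0.35, 0.25]
bits = [1, 1, -1, 1, -1, -1, 1, 1, -1, -1]

x = channel_output(bits, h)
_, raw_decisions = dfe_equalize(x, [])        # no feedback taps: plain slicer
y, dfe_decisions = dfe_equalize(x, h[1:])     # c_i = h_i for i > 0

print("slicer only :", raw_decisions)
print("2-tap DFE   :", dfe_decisions)
```

With $c_i = h_i$ the equalized samples reduce to $B_k h_0$, so the decisions match the transmitted bits even though a plain slicer makes errors on this closed eye. The sketch does not model error propagation beyond what the feedback loop naturally produces, and it ignores pre-cursor ISI.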

28 Chapter 2. Background 13 The DFE s main advantage is the absence of high-frequency noise amplification. Unlike a linear equalizer that amplifies both signal and noise, the DFE slicer regenerates the digital data without noise and the noise-less signal is fed back for equalization. Recovered Data Partially Equalized Signal Critical Timing Path A K-1 A K-2 A K-3 D Q D Q D Q c 1 c 2 c 2 DFE FIR = C(z) = c 1 z -1 + c 2 z -2 + c 3 z -3 Figure 2.8: A 3-tap DFE example The filter, C(z), can be implemented as either an FIR or IIR filter. Figure 2.8 illustrates a DFE example with a 3-tap FIR filter. The coefficients c 1, c 2, and c 3 should be adjustable in order to accommodate different channels. Section 2.4 describes some adaptive controllers that can set appropriate DFE coefficients for a given channel. The feedback loop indicated in Figure 2.8 poses a challenge in meeting timing constraints during design of the DFE. If the DFE is implemented in the analog domain, the high capacitance at the adder node slows down the propagation of the feedback signal. The problem occurs if A k 1 cannot be recovered and sent back to the adder in time for the next bit. A digital DFE in an ADC-based receiver would face similar issues digital adders are slower than analog ones. One solution to the timing problem is to employ speculation on the recovered bit. In this thesis, we assume NRZ coding where B k is either +1 or -1. Figure 2.9 shows an example of a 1-tap speculative DFE. It subtracts both c 1 and c 1 from the received signal and later selects the correct result using a mux. The speculation removes the gain and the adder from the feedback loop; only the mux and register remain in the critical path. The cost of speculation is the area and power

29 Chapter 2. Background 14 consumed by the extra adder. In particular, speculative DFEs do not scale easily because the number of adders increases exponentially as we increase the number of DFE taps. Partially Equalized Signal +c 1 -c 1 Critical path is faster D Q Figure 2.9: A speculative 1-tap DFE A K Equalizer Adaptation When designing a receiver, we usually intend the receiver to work with a range of channels. In addition, the ISI in the received signal may vary with process and temperature. Hence, we would not know the exact amount of equalization required at the time of design. As described in Sections and 2.3.2, both linear and decision-feedback equalizers usually include configurable coefficients which can be adjusted by a controller to obtain an appropriate amount of equalization during receiver operation. This section describes three different adaptation methods: zero-forcing (ZF), minimum mean square error (MMSE), and maximum eye-opening Zero-Forcing (ZF) Method A zero-forcing (ZF) equalizer attempts to force all ISI components to zero. If the equalizer does not have enough taps (i.e. degrees of freedom), to force all ISI to zero, then the optimal tap values should minimize the mean-squared sum of ISI components. This section presents a ZF controller for a linear equalizer; we will discuss and compare ZF and LMS controllers for DFEs at the end of Section The ZF analysis and examples in this section are taken from [5] and [13] and reproduced here for convenience. First, we will find the optimal equalizer coefficients in terms

of zero-forcing criteria. Second, we will describe a feedback loop that converges to the optimal coefficients.

Figure 2.10 shows an example of a combined channel and equalizer pulse response, $h(t)$, and the desired Nyquist response, $g(t)$. The sampled versions of the responses can be represented with vectors, $h$ and $g$, respectively, and we assume that $p_1$ and $p_2$ are constants such that pre-cursor and post-cursor taps outside the range of $k - p_1$ to $k + p_2$ are zero. In Figure 2.10's example pulse response, $p_1$ and $p_2$ are 2 and 4, respectively. We define the ISI vector, $r$, to be the difference between the desired and actual responses, $g$ and $h$, respectively. In this and the next section, note that the transmitted and recovered data are represented with vectors, $b_k$ and $a_k$, instead of $B_k$ and $A_k$ to distinguish the data vectors from other matrix quantities.

$$r_k = g_k - h_k = g(kT_b - T_{delay}) - h(kT_b)$$

$$g = [\, g_{k-p_1} \;\cdots\; g_{k-1} \;\; g_k \;\; g_{k+1} \;\cdots\; g_{k+p_2} \,]^T$$

$$h = [\, h_{k-p_1} \;\cdots\; h_{k-1} \;\; h_k \;\; h_{k+1} \;\cdots\; h_{k+p_2} \,]^T$$

$$r = [\, r_{k-p_1} \;\cdots\; r_{k-1} \;\; r_k \;\; r_{k+1} \;\cdots\; r_{k+p_2} \,]^T$$

Figure 2.10: An example of channel+FFE pulse response ($h(t)$) and Nyquist response ($g(t)$). ISI is the difference between the two responses ($r = g - h$).

$$r = g - h \tag{2.6}$$
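As a small numerical illustration of Equation 2.6, the sketch below forms the ISI vector from a sampled pulse response and a Nyquist target. The tap values are hypothetical (chosen only to give 2 pre-cursor and 4 post-cursor positions), and the helper names `isi_vector` and `isi_energy` are ours, not part of the referenced analysis.

```python
def isi_vector(h, g):
    """Return r = g - h for equal-length sampled responses (Equation 2.6)."""
    if len(h) != len(g):
        raise ValueError("h and g must cover the same tap range")
    return [g_n - h_n for g_n, h_n in zip(g, h)]


def isi_energy(r):
    """Sum of the squared ISI components of r."""
    return sum(r_n * r_n for r_n in r)


# Hypothetical sampled responses with p1 = 2 pre-cursor and p2 = 4 post-cursor taps.
h = [0.0, 0.1, 1.0, 0.4, 0.2, 0.1, 0.0]   # channel + equalizer pulse response
g = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]   # Nyquist target: main cursor only
r = isi_vector(h, g)
```

With these values, every non-zero entry of `r` is residual ISI that the equalizer would still have to remove, and `isi_energy(r)` is the quantity a ZF equalizer tries to drive towards zero.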

The goal of a ZF equalizer is to minimize the energy of ISI, $\|r\|^2$.

$$\|r\|^2 = \|g - h\|^2 = (g - h)^T (g - h) \tag{2.7}$$

Figure 2.11 provides a simple example of a system with a channel, FFE, and receiver. In this case, the FFE is a 2-tap finite impulse response (FIR) filter. In Figure 2.11, $f(t)$ represents the channel pulse response. However, if the system includes both a CTLE and FFE, then $f(t)$ would be the convolution of the channel and CTLE responses. The vector, $c$, represents the FFE coefficients. $b_k$, $y_k$, and $a_k$ represent the source data, equalized signal, and recovered data, respectively. Figure 2.12 models the system with matrix and vector quantities.

$$F = \begin{bmatrix} f(-2) & f(-1) & \cdots & f(4) & 0 \\ 0 & f(-2) & \cdots & f(3) & f(4) \end{bmatrix}$$

$$b_k = [\, b_{k+2} \;\; b_{k+1} \;\cdots\; b_{k-4} \;\; b_{k-5} \,]^T, \qquad x_k = [\, x_1(kT) \;\; x_2(kT) \,]^T, \qquad c = [\, c_1 \;\; c_2 \,]^T$$

Figure 2.11: An example of a receiver with a channel (with 2 pre-cursor and 4 post-cursor taps of ISI) and a 2-tap FFE

Figure 2.12: A partial model of a discrete-time receiver with channel and FFE, where $x_k = F b_k$ and $y_k = x_k^T c = b_k^T F^T c$

Given Figure 2.12, we see that the pulse response is $h = F^T c$. Therefore, we substitute

$h$ into Equation 2.7:

$$\|r\|^2 = (g - F^T c)^T (g - F^T c) = g^T g - 2 g^T F^T c + c^T F F^T c \tag{2.8}$$

In order to find the optimal $c$ that minimizes ISI (i.e. $c_{OPT}$), we take the derivative of Equation 2.8 and set it equal to zero:

$$\frac{\partial}{\partial c}\left(\|r\|^2\right) = 2 c^T F F^T - 2 g^T F^T = 0 \tag{2.9}$$

$$c_{OPT} = (F F^T)^{-1} F g \tag{2.10}$$

As an example, let us assume that the FFE has two taps (i.e. $c$ is a 2x1 vector). As illustrated in Figure 2.13, $h = F^T c$ can be represented as a 2D plane. If $g$ lies on the plane, then there exist values for the two taps that can compensate the ISI completely. However, if $g$ is not on the plane, then we can find $c = c_{OPT}$ such that the length of $r$ is minimum. This occurs when $r$ is orthogonal to the plane spanned by $h$.

Figure 2.13: A geometric representation of optimal zero-forcing FFE coefficients, showing $g$, the plane $h = F^T c$, the point $h_{OPT} = F^T c_{OPT}$, and the residual $r$

Figure 2.14 shows the model from Figure 2.12 with a ZF feedback loop. The vector $n_k$ represents white noise generated by the receiver's circuits. The error $e_k$ is the difference between the received sample, $y_k$, and the desired signal, which we generate using the desired pulse response, $g$, and recovered data, $a_k$. We define $v_k = n_k^T c$ to be noise shaped by the FFE coefficients. We also assume a low BER such that $a_k \approx b_k$. In order to show that the feedback loop converges correctly, we find the error, $e_k$, in terms of $b_k$, $r$, and $v_k$.

$$\begin{aligned} e_k &= a_k^T g - y_k \\ &= b_k^T g - (F b_k + n_k)^T c \\ &= b_k^T (g - h) - n_k^T c \\ &= b_k^T r - v_k \end{aligned} \tag{2.11}$$

Figure 2.14: A model of a discrete-time receiver, including a ZF adaptation loop

The ZF adaptation correlates the error, $e_k$, with the recovered bits, $a_k$. Equation 2.12 takes the average of the correlation term to find the ISI vector, $r$.

$$\begin{aligned} E[a_k e_k] &= E[a_k (b_k^T r - v_k)] \\ &= E[b_k (b_k^T r - v_k)] \\ &= E[b_k b_k^T]\, r - E[b_k v_k] \\ &= r \end{aligned} \tag{2.12}$$

The integrator in the feedback loop forces the average of the weighted quantity $M a_k e_k$

to zero. The matrix, $M$, is a parameter that sets the gain of the feedback loop and maps ISI taps to the FFE coefficients. To find $M$, we assume that $b_k$ is a sequence of independent bits such that $E[b_k b_k^T] = I$ where $I$ is the identity matrix. We also assume that the data is uncorrelated with noise (i.e. $E[b_k v_k] = 0$).

$$M E[a_k e_k] = M r = 0 \tag{2.13}$$

$$M (g - F^T c) = 0 \tag{2.14}$$

$$c = (M F^T)^{-1} M g \tag{2.15}$$

By comparing Equations 2.10 and 2.15, we see that the feedback converges to the optimal tap values if $M = uF$ (where $u$ is a scalar that determines loop gain) or, more generally, $M = UF$ (where $U$ is a matrix). This result implies that an optimal $M$ should be selected based on channel and equalizer responses. It appears that, by using ZF adaptation, we have changed the problem of choosing $c$ into one of choosing $M$. However, it turns out that $M$ is a less sensitive parameter compared to $c$. In practice, $M$ is chosen based on the worst-case channel that the system is designed for; in other cases, the adaptation loop will not converge optimally, but will be close enough [5].

Minimum Mean Square Error (MMSE) Method

The minimum mean square error method seeks to minimize the average power of the error, $E[e_k^2]$, between the received signal, $x_k$, and the desired signal. The error may include the effects of both ISI and random noise. This is in contrast with the zero-forcing method, where the adaptation algorithm minimizes $\|r\|^2$, which only includes ISI. An implementation of an MMSE controller for a DFE is described in [1].

Figure 2.15 illustrates how we can find the MMSE using the steepest descent algorithm. We assume that $E[e_k^2]$ is well-behaved with respect to the equalizer tap values, $c_k = [\, c_{1k} \;\; c_{2k} \;\cdots\; c_{ik} \;\cdots\; c_{Nk} \,]$, and that following the gradient at all $c_k$ will lead to the minimum $E[e_k^2]$. For each $c_{ik}$, we start with an initial value and increment or decrement it in the direction of decreasing average error power.

Figure 2.15: An example of minimizing average error by using the steepest-descent algorithm. Each $c_i$ is incremented or decremented in the direction of decreasing $E[e^2]$ until the minimum is reached.

$$c_{i(k+1)} = c_{ik} - u \frac{\partial E[e_k^2]}{\partial c_{ik}} \tag{2.16}$$

In a receiver system, it is usually not practical to measure $E[e_k^2]$; therefore, we approximate Equation 2.16 by replacing the expected value with the instantaneous value. When this approximation is made, the steepest descent algorithm is known as the least mean square (LMS) algorithm.

$$c_{i(k+1)} = c_{ik} - u \frac{\partial (e_k^2)}{\partial c_{ik}} = c_{ik} - 2 u e_k \frac{\partial e_k}{\partial c_{ik}} \tag{2.17}$$

Equation 2.17 can be applied to any equalizer. Figure 2.16 shows an LMS feedback loop implemented for a DFE. In order to apply the steepest-descent algorithm, it is necessary to relate the error, $e_k$, to the DFE coefficients, $c_k$, in Equation 2.18.
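To make the update in Equation 2.17 concrete, the following sketch adapts a single coefficient with the instantaneous-error rule. It is only a toy example under our own assumptions: the error is defined as $e = c\,x - 0.35\,x$ for an alternating ±1 input, so that $\partial e / \partial c = x$, and the names `lms_step` and `fit_gain` are hypothetical.

```python
def lms_step(c, e, de_dc, u):
    """One LMS update following Equation 2.17: c - 2*u*e*(de/dc)."""
    return c - 2.0 * u * e * de_dc


def fit_gain(target_gain=0.35, u=0.05, n_steps=200):
    """Adapt one coefficient c so that c*x tracks target_gain*x (toy setup)."""
    c = 0.0
    for k in range(n_steps):
        x = 1.0 if k % 2 == 0 else -1.0      # alternating +/-1 test input
        e = c * x - target_gain * x          # instantaneous error (actual - desired)
        c = lms_step(c, e, de_dc=x, u=u)     # de/dc = x for this error definition
    return c
```

In this setup each step shrinks the distance to the target by a factor of $1 - 2u$, which shows how the step size $u$ trades convergence speed against sensitivity to noise in a real loop.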

Figure 2.16: A model of a discrete-time receiver with a DFE and LMS adaptation loop

$$e_k = y_k - \sum_{j=1}^{M} g_{jk}\, a_{k-j} = \left( x_k - \sum_{i=1}^{N} c_{ik}\, a_{k-i} \right) - \sum_{j=1}^{M} g_{jk}\, a_{k-j} \tag{2.18}$$

From Equation 2.18, we can find the derivative of $e_k$ with respect to $c_{ik}$:

$$\frac{\partial e_k}{\partial c_{ik}} = -a_{k-i} \tag{2.19}$$

We can substitute Equation 2.19 into Equation 2.17:

$$c_{i(k+1)} = c_{ik} + 2 u e_k a_{k-i} \tag{2.20}$$

We can implement Equation 2.20 as the controller in Figure 2.16. It is possible to further simplify the controller by replacing $e_k$ or $a_{k-i}$ or both with only their signs (i.e. $\mathrm{sgn}(e_k)$ and $\mathrm{sgn}(a_{k-i})$). These simplified LMS controllers are respectively known as sign-error, sign-data, or sign-sign.

It is also interesting to compare the ZF and LMS controllers in Figures 2.14 and 2.16.
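The update in Equation 2.20 can be sketched in a few lines for a 2-tap DFE. This is a simplified behavioural model under stated assumptions, not the controller of Figure 2.16: the channel has only a main cursor and two post-cursors (hypothetical values), there is no noise, the desired signal is just the main-cursor level times the decision, and the function name `adapt_dfe_lms` is ours.

```python
import random


def adapt_dfe_lms(pulse=(1.0, 0.3, 0.1), n_bits=4000, u=0.01, seed=1):
    """Adapt DFE taps with c_i <- c_i + 2*u*e_k*a_{k-i} (Equation 2.20)."""
    rng = random.Random(seed)
    h0, post = pulse[0], pulse[1:]
    taps = [0.0] * len(post)
    tx_hist = [0] * len(post)    # b_{k-1}, b_{k-2}, ... (transmitted bits)
    rx_hist = [0] * len(post)    # a_{k-1}, a_{k-2}, ... (recovered bits)
    for _ in range(n_bits):
        b = rng.choice((-1, 1))
        x = h0 * b + sum(h * p for h, p in zip(post, tx_hist))   # DFE input
        y = x - sum(c * a for c, a in zip(taps, rx_hist))        # DFE output
        a = 1 if y >= 0 else -1                                  # slicer decision
        e = y - h0 * a               # error against the assumed desired level
        taps = [c + 2.0 * u * e * a_i for c, a_i in zip(taps, rx_hist)]
        tx_hist = [b] + tx_hist[:-1]
        rx_hist = [a] + rx_hist[:-1]
    return taps
```

With the default values the taps settle at the post-cursor ISI values (0.3 and 0.1), which is the behaviour one expects from either a ZF or an LMS loop when the decisions are correct and noise is absent.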

If we replace the FFE in Figure 2.14 with a DFE and substitute $M = 2uI$ (where $I$ is the identity matrix), then the ZF controller becomes identical to the LMS controller for a DFE. This is expected because a DFE does not amplify noise. Therefore, minimizing ISI ($\|r\|^2$) and signal error ($E[e_k^2]$) at the DFE output should lead to the same solution [12].

Maximum Eye Opening Method

The maximum eye-opening method is another commonly used algorithm [8, 16, 31] for adjusting equalizer taps. Figure 2.17 shows a system that uses an eye monitor to measure eye height or width and feeds the information back to the equalizer through a controller. It should be noted that optimizing an equalizer to maximize eye height may not lead to an optimal eye width and vice versa. The eye monitors described in [31] and [16] measure eye height by comparing the outputs of a main sampler and an auxiliary sampler with a shifted threshold. If the outputs are the same, then the threshold of the auxiliary sampler is within the eye. Thus, the eye monitor estimates eye height by increasing the threshold of the auxiliary sampler until the outputs differ.

Figure 2.17: A system that adapts equalizer taps based on eye opening

The adaptive controllers in [31] and [16] iterate across all possible combinations of equalizer tap values. For each combination, they determine the eye opening by plotting a histogram and, at the end, choose the tap settings that produce the maximum eye opening. Compared to the ZF and LMS equalizers, this adaptation method is slower and

cannot run continuously during data recovery because it has to try all of the equalizer settings. However, the method is more flexible since it can be applied to a variety of equalizer structures and does not depend on having correctly recovered data.

Clock and Data Recovery (CDR)

In many wireline communication systems, the clock signal is not transmitted with the data signal in order to reduce the number of wires and, therefore, the cost of the channel. In addition, the receiver usually has a plesiochronous clock source (i.e. similar in frequency, but phase and frequency are not matched) with respect to the transmitter data. Hence, the clock and data recovery (CDR) block's job is to extract the transmitted clock and binary data from the data signal in the presence of jitter and frequency offset.

One type of CDR generates a phase-tracking clock whose falling and rising edges align, respectively, with the zero-crossings and centers of the data signal (shown in Figure 2.18). Then, the CDR samples the data signal with the clock's rising edge and outputs the recovered data and phase-tracking clock to downstream digital blocks.

Figure 2.18: A recovered clock, $CK_{RX}$, sampling equalized data

Another type of CDR blindly samples the data signal with the plesiochronous clock and post-processes the samples to extract the data bits and phase information. As depicted in Figure 2.19, we can classify CDRs into two broad categories where one operates with a phase-tracking clock and the other with a blind clock. We can further classify CDRs as having a feedback or feed-forward architecture. In Sections to 2.5.3, we will discuss three types of CDRs; burst-mode CDRs [2] are omitted because they are less relevant to the proposed CDR. Chapter 3 proposes an ADC-based implementation of a

blind-sampling CDR with feedback.

Figure 2.19: CDR classification. Phase-tracking clock: feedback (conventional) or feed-forward (burst-mode). Blind clock: feedback (data interpolator) or feed-forward (oversampling).

Phase-Tracking CDR with Clock Feedback

Figure 2.20 shows a conventional phase-tracking CDR with clock feedback. The phase detector (PD) compares the equalized data signal to the recovered clock, $CK_{RX}$, to estimate the phase difference between them. The PD output is an error signal that is ideally proportional to the phase difference. The charge pump (CP) is a transconductor that converts the error signal to a current. The loop filter is a proportional-integral controller where the resistor, $R_1$, produces a voltage proportional to the current and the capacitor, $C_1$, integrates the current. The second capacitor, $C_2$, is used to smooth the pulses of current from the CP, and its value is much smaller than that of $C_1$. The voltage from the loop filter adjusts the frequency (and, indirectly, the phase) of the voltage-controlled oscillator (VCO) that generates $CK_{RX}$. When operating in steady-state conditions, the feedback loop forces the phase of $CK_{RX}$ to match that of the incoming data signal.

Figure 2.20: System diagram of phase-tracking CDR with clock in feedback loop (PD: phase detector, CP: charge pump, LF: loop filter, VCO: voltage-controlled oscillator)

Although Figure 2.20 shows a CDR with a VCO block, it is also possible to generate $CK_{RX}$ with a phase interpolator (PI). While a VCO's frequency is proportional to its input voltage, a PI's phase is directly proportional to its input signal. Hence, a PI-based CDR usually has an extra integrator in the loop filter to replace the integrator from the VCO. PI-based CDRs can be used in multi-transceiver systems to reduce the number of VCOs (e.g. to avoid coupling between VCOs). On the other hand, PIs are challenging to implement in terms of linearity (i.e. phase output not exactly proportional to the input signal) and noise (i.e. the PI has a lower output amplitude compared to VCO output) [10].

We can characterize a CDR's performance by measuring its jitter tolerance, jitter transfer, and jitter generation. Jitter tolerance measures the maximum amount of sinusoidal jitter between the data signal and $CK_{RX}$ from which the CDR can successfully recover data given a required BER. Jitter transfer is the amount of jitter the CDR transfers from the data signal to $CK_{RX}$. Jitter generation is the amount of jitter in $CK_{RX}$ caused by the CDR's internal blocks (e.g. VCO).

The most important measurement is jitter tolerance because it directly relates input jitter to BER. A simplified example is shown in Figure 2.21. The jitter tolerance curve is separated into two parts by the CDR's bandwidth. When the frequency of the input jitter is low, the CDR can shift $CK_{RX}$ to track the center of the data eye even if it deviates from the ideal location by more than 0.5UI. However, when the jitter frequency is higher than the CDR bandwidth, the feedback cannot track the data eye. At most, the data eye can move by ±0.5UI (i.e. 1UI$_{PP}$) before a bit error occurs. In practice, the high-frequency jitter tolerance is usually lower than 1UI$_{PP}$ because the CDR has to recover data in the presence of other components of jitter besides sinusoidal jitter (e.g. data-dependent jitter, random jitter, etc.).

Figure 2.21: Example of a jitter tolerance chart (jitter tolerance in UI$_{PP}$ versus jitter frequency in Hz, with the CDR bandwidth and the 1UI$_{PP}$ level marked)

The PD is an important component of the CDR because it provides the error signal used to guide the feedback loop (shown in Figure 2.22). In the following sections, we will discuss three types of PDs: Alexander, Hogge, and Mueller-Muller.

Figure 2.22: (a) PD inputs and output and (b) linear model, in which the phase error $\Phi_{ERR}$ between $\Phi_{IN}$ and $\Phi_{CK}$ is scaled by the gain $K_{PD}$ to give $PD_{OUT}$

Alexander (Bang-Bang) Phase Detector

As depicted in Figures 2.23 and 2.24, the Alexander PD, also known as a bang-bang PD, samples both the edges and centers of the data signal. When a transition occurs, the PD compares the edge sample to the adjacent center samples to determine if the clock is early or late with respect to the data signal. In order to capture both center and edge samples, the Alexander PD must oversample at 2x the baud rate. Alexander PDs are widely used because they are easily implemented with digital logic, but, as shown in

Figure 2.25, they are highly non-linear when jitter is absent from clock and data. When jitter exists, the PD can be linearized [17], but its gain is jitter-dependent. This is also undesirable since we usually cannot predict the jitter in advance.

Figure 2.23: Alexander PD implementation. The PD logic derives the Early and Late outputs from the samples {D1, E, D2}.

Figure 2.24: Alexander PD examples with (a) early and (b) late $CK_{RX}$

Figure 2.25: Transfer function of Alexander PD with no jitter on data or $CK_{RX}$ ($PD_{OUT} = \mathrm{Avg}(\mathrm{Late} - \mathrm{Early})$ versus $\Phi_{ERR} = \Phi_{IN} - \Phi_{CK}$)
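The early/late decision described above can be written as a small truth-table function. This is a sketch of the commonly used bang-bang rule, assuming that D1, E, and D2 are taken in that time order (center, edge, center); the function name and the +1/-1 output encoding for late/early are our own conventions.

```python
def alexander_pd(d1, e, d2):
    """Return +1 (late), -1 (early) or 0 (no transition) from samples {D1, E, D2}.

    d1 and d2 are consecutive center samples and e is the edge sample taken
    between them; all three are binary values (0 or 1).
    """
    if d1 == d2:
        return 0            # no data transition, so no phase information
    if e == d1:
        return -1           # edge sample still holds the old bit: clock is early
    return +1               # edge sample already holds the new bit: clock is late
```

Averaging the returned values over many transitions gives a quantity of the form Avg(Late - Early), and because each decision carries only a sign, the characteristic is a hard step when there is no jitter.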

Hogge Phase Detector

The Hogge PD is depicted in Figure 2.26. In contrast to the Alexander PD, its output is linear and its gain is independent of jitter. As shown in Figure 2.27, the signal, B, is a pulse with a constant width of 0.5UI. The other signal, A, measures the time from the data transition to the rising edge of $CK_{RX}$. When the rising edge samples the center of the data eye (Figure 2.27b), the data transition occurs 0.5UI from the rising edge, the pulses on A and B are equal, and the average PD output is zero. Otherwise, $PD_{OUT}$ is positive or negative when $CK_{RX}$ is late or early, respectively.

Figure 2.26: Hogge PD implementation

Figure 2.27: Hogge PD output with (a) early, (b) on-time, and (c) late $CK_{RX}$. The average of $PD_{OUT}$ is negative, zero, and positive, respectively.

Figure 2.28 shows the transfer function of an ideal Hogge PD with no offset. However, the Hogge PD is more difficult to implement accurately compared to the Alexander PD. In particular, the delay of BUF1 should match the clock-to-q delay of FF1. A delay mismatch adds a phase offset to the A signal and, in turn, causes PD offset [6].

Figure 2.28: Transfer function of Hogge PD ($PD_{OUT} = \mathrm{Avg}(\mathrm{Late} - \mathrm{Early})$ versus $\Phi_{ERR} = \Phi_{IN} - \Phi_{CK}$)

Mueller-Muller Phase Detector

One way to reduce power consumption is to reduce the sampling rate. Both the Alexander and Hogge PDs require a 2x oversampling rate. In contrast, Mueller-Muller PDs (MMPDs) allow the CDR to operate at baud rate (1x) sampling [14, 21, 26]; the PD calculates phase error from center samples only. The center samples contain mostly amplitude information about the data signal and the edge samples, which the MMPD ignores, contain mostly phase information. However, if the pulse response of the data signal has ISI, then the MMPD can infer the phase information from the center samples and the slope of the pulse response. Therefore, an MMPD requires ISI in order to function; it will fail if given a data signal with a Nyquist pulse response (which has infinite slope on its edges).

Each MMPD is defined by an MM function, $F$, which should be chosen based on the pulse response of the channel. The MM function is also the transfer characteristic of the MMPD. When placed in a CDR feedback loop, the feedback forces the MM function to zero.

Figure 2.29 shows an example that Mueller and Muller presented in their 1976 paper [21]. The MM function demonstrated in [21] was $F = h_{-1} - h_{1}$ (i.e. the difference between the pre-cursor, $h_{-1}$, and post-cursor, $h_{+1}$). Given the example pulse response shape, when the samples $h_{-1}$ and $h_{1}$ shift to the left, $h_{1}$ becomes greater than $h_{-1}$ and $F$ is negative. Conversely, if the samples shift to the right, $F$ becomes positive. When the CDR locks, the feedback forces $F$ to zero and $h_{-1}$ and $h_{1}$ are equal such that the main cursor, $h_0$, is near the optimal sampling position close to the peak of the pulse response.

Figure 2.29: Example of (a) pulse response, with samples $h_{-1}$, $h_0$, and $h_1$, and (b) MM function, $F = h_{-1} - h_{1}$, versus sampling phase (UI) [21]

Mueller and Muller also showed that we can estimate the points on the pulse response (e.g. $h_{-1}$, $h_0$, $h_1$, etc.) by correlating baud-rate samples of the data signal with the recovered data. The results are listed in Equations 2.21 to 2.24. The derivation is omitted because the analysis is very similar to the Equation . We note that Equations 2.21 to 2.24 assume random, independent data with zero DC bias ($E[A_k] = 0$); therefore, the MMPD requires these conditions on the input signal in order to function correctly.

$$E[x_k A_{k-1}] = h_1 \tag{2.21}$$

$$E[x_k A_k] = E[x_{k-1} A_{k-1}] = h_0 \tag{2.22}$$

$$E[x_{k-1} A_k] = h_{-1} \tag{2.23}$$
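The correlations in Equations 2.21 to 2.23 can be checked numerically with a short simulation. The sketch below assumes a hypothetical three-point pulse response (one pre-cursor, the main cursor, and one post-cursor), noise-free samples, and recovered bits that equal the transmitted bits; `estimate_cursors` is a name we introduce for illustration.

```python
import random


def estimate_cursors(h_pre=0.2, h_main=1.0, h_post=0.4, n=20000, seed=7):
    """Estimate (h_-1, h_0, h_1) by correlating samples x_k with bits A_k."""
    rng = random.Random(seed)
    bits = [rng.choice((-1, 1)) for _ in range(n)]
    # x_k = h_-1 * A_{k+1} + h_0 * A_k + h_1 * A_{k-1}
    x = [0.0] * n
    for k in range(1, n - 1):
        x[k] = h_pre * bits[k + 1] + h_main * bits[k] + h_post * bits[k - 1]
    ks = range(2, n - 1)
    m = len(ks)
    est_post = sum(x[k] * bits[k - 1] for k in ks) / m   # Equation 2.21
    est_main = sum(x[k] * bits[k] for k in ks) / m       # Equation 2.22
    est_pre = sum(x[k - 1] * bits[k] for k in ks) / m    # Equation 2.23
    return est_pre, est_main, est_post
```

The estimates only approach the true cursor values as the averaging length grows, which is why the loop filter that follows an MMPD plays the role of the expectation operator.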

$$h_{-1} - h_{1} = E[x_{k-1} A_k - x_{k+1} A_k] \tag{2.24}$$

According to Equation 2.24, we can implement the MMPD described in Figure 2.29 using the expression $x_{k-1} A_k - x_{k+1} A_k$. The loop filter that follows the MMPD estimates the expected value by averaging the MMPD output. From Figure 2.29, we can also observe a disadvantage of the MMPD: its transfer function is dependent on the shape of the channel pulse response. A sharp pulse response will lead to a high PD gain, whereas a spread-out pulse response (resulting from increased ISI) will reduce the PD gain.

Blind Feed-forward CDR

An example of a blind feed-forward CDR is described in [22, 27], as shown in Figure 2.30. The proposed design samples a 10.3Gbps data signal at 82.5GS/s (8x oversampling). The edge detector locates the rising and falling data transitions by comparing adjacent samples. As depicted in Figure 2.31, the data selector chooses the sample farthest away from the edge (i.e. closest to the center of the UI).

Figure 2.30: System diagram of an 8x oversampled blind feed-forward (burst-mode) CDR [22, 27]

An advantage of the feed-forward architecture is that the CDR blocks can be implemented and simulated independently. In fact, the data selection logic in Figure 2.30 was implemented on a separate FPGA while the analog front end blocks were implemented on a test chip. However, the 8x oversampling ratio required a large number of samplers and a complicated clock distribution network, which resulted in the test chip's high power consumption of 5.8W. The oversampling ratio also limits the data rate. The analog front end's power consumption and increasing data rates motivate us to reduce the oversampling ratio.

Figure 2.31: The edge detection and data selection process from Figure 2.30

Figure 2.32: A blind 2x ADC-based CDR [32]. The PD interpolates linearly between the 2x blind samples $a$ and $b$, which are 0.5UI apart, to find the zero-crossing: $\phi_X \approx 0.5 \cdot \dfrac{a}{a - b}$.

Figure 2.32 shows an ADC-based implementation of a 2x blind feed-forward CDR [32], [36]. A 5Gb/s input is sampled by a 5-bit ADC and is passed to a feed-forward equalizer (FFE) in the digital CDR. After the FFE, the blind samples are processed by the phase detector (PD). If two adjacent blind samples are opposite in sign, a zero-crossing is detected which

corresponds to the edge sample in a phase-tracking system. This zero-crossing, denoted by the variable $\phi_X$, is approximated by the linear interpolation shown in Figure 2.32. The instantaneous value of $\phi_X$ is low-pass filtered into $\phi_{AVG}$ by the digital filter. The data decision block adds 0.5UI to $\phi_{AVG}$ to find the center of the eye and compares it to $\phi_X$ to recover the data. This system uses 2x sampling where the blind samples are 0.5UI apart. However, if the oversampling ratio can be decreased, then the data rate can be increased without increasing the frequency of the blind clock.

Figure 2.33: A blind 1.45x ADC-based CDR [33], using fractional sampling with 16 samples per 11UI

A subsequent work [33], illustrated in Figure 2.33, reduces the oversampling ratio to 1.45x; the receiver takes 16 samples for every 11UI to achieve 6.875Gb/s. Its architecture is similar to the one presented in [36], but now the samples are farther apart than 0.5UI and the linear interpolation used in the PD to estimate zero-crossings is less accurate. To solve this problem, the PD filters out some of the less accurate results based on sample amplitude. With this architecture, 1.45x seems to provide a good compromise where the oversampling ratio can be reduced without much loss in jitter tolerance. In order to eliminate oversampling altogether, Chapter 3 proposes a different CDR architecture.
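The zero-crossing estimate used by these blind PDs amounts to a one-line linear interpolation. The sketch below follows the expression given with Figure 2.32; the function name, the `None` return value for sample pairs without a sign change, and the `spacing_ui` parameter (0.5 for 2x sampling, 11/16 for the 16-samples-per-11UI case) are our own assumptions.

```python
def zero_crossing_phase(a, b, spacing_ui=0.5):
    """Estimate the zero-crossing position after sample a, in UI.

    a and b are adjacent blind samples spaced spacing_ui apart. Returns None
    when the two samples do not have opposite signs (no crossing detected).
    """
    if a * b >= 0:
        return None
    return spacing_ui * a / (a - b)
```

For example, `zero_crossing_phase(3, -1)` places the crossing three quarters of the way from `a` to `b`. Wider sample spacing stretches the same straight-line assumption over a longer stretch of the waveform, which is consistent with the reduced accuracy noted for the 1.45x design.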

Blind CDR with Feedback

Due to the linearity and noise drawbacks of PI-based CDRs, [10] proposed a 2x oversampling, 32Gbps design based on a data interpolator (DI) instead of a PI. The DI samples the data signal blindly and generates the center and edge samples by interpolating between the blind samples as shown in Figure 2.35. The DI is implemented in the analog domain by storing the samples on capacitor arrays and interpolating through charge sharing.

Figure 2.34: System diagram of blind CDR with feedback [10]

Figure 2.35: Analog data interpolator (DI) estimates center and edge samples from blind samples [10]

A disadvantage of a DI-based CDR is that the DI introduces interpolation error when estimating the desired samples. In particular, the analog interpolator is a first-order interpolator (see Figure 2.35). A digital DI can reduce the error by using a more sophisticated interpolation algorithm. Chapter 3 proposes an ADC-based implementation of a blind CDR with a digital DI.
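A first-order interpolator and its error can be illustrated with a few lines of code. This is a generic linear-interpolation sketch, not the charge-sharing circuit of [10]; the function names are ours and the curved test segment, $\sin(\pi t / 2)$ sampled at $t = 0$ and $t = 1$, is an arbitrary assumption chosen only to make the error visible.

```python
import math


def linear_interpolate(samples, phase):
    """First-order interpolation between adjacent blind samples.

    phase is the position of the desired sample after each blind sample,
    expressed as a fraction of the sample spacing (0 <= phase < 1).
    """
    if not 0.0 <= phase < 1.0:
        raise ValueError("phase must be in [0, 1)")
    return [(1.0 - phase) * s0 + phase * s1
            for s0, s1 in zip(samples, samples[1:])]


def interpolation_error(phase):
    """True value minus linear estimate on a hypothetical curved segment."""
    blind = [math.sin(0.0), math.sin(math.pi / 2)]    # samples at t = 0 and t = 1
    estimate = linear_interpolate(blind, phase)[0]
    return math.sin(math.pi * phase / 2) - estimate
```

On this segment the straight-line estimate is exact at the blind samples and worst near the midpoint, which is the kind of error a higher-order digital interpolator could reduce.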

Summary

This chapter discussed fundamental concepts about channels and receivers and reviewed some previous work on adaptive equalizers and CDR blocks. This thesis builds upon the background in this chapter by exploring a blind baud-rate CDR architecture in Chapter 3 and a zero-forcing adaptive DFE in Chapter 4.

3 A Blind Baud-Rate CDR

This chapter proposes a CDR that can recover data from blind baud-rate samples. Section 3.1 discusses some concepts and challenges arising from blind baud-rate data recovery. Sections 3.2 and 3.3 present the receiver, CDR, and each of their components. Section 3.4 shows the simulated and measured results.

Blind 1x Data Recovery Concepts

The PDs in the 2x [32, 36] and 1.45x [33] blind CDRs (Figures 2.32 and 2.33, respectively) interpolate between the blind samples in order to detect the phase of the zero crossings; they require a finite slope in order to calculate phase. The interpolation cannot accurately estimate phase when given a low-loss channel because the data transitions become too abrupt. Unlike phase-tracking CDRs, blind ADC-based CDRs perform poorly with low-loss channels. Since a blind ADC-based CDR should work with a range of channels, we focus most of the analysis on low-loss channels. Section 3.4 shows how the proposed CDR can be modified for a high-loss channel.

Figure 3.1 compares eye diagrams with different sampling rates given a low-loss channel. The worst-case sampling position occurs when adjacent samples are equally far from the center of the eye. For 2x blind sampling, the worst case is where adjacent samples are both 0.25UI from the edge, which leads to a high-frequency jitter tolerance of 0.5UI$_{PP}$. When the oversampling ratio is decreased to 1.45x, jitter tolerance decreases to 0.31UI$_{PP}$.
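The two figures quoted above are consistent with a simple geometric reading of the worst case, which the sketch below encodes. The closed form is our own inference rather than a formula stated in the text: with samples spaced 1/OSR UI apart and placed symmetrically about the eye center, each sample sits 0.5 - 1/(2·OSR) UI from the nearest edge, so the peak-to-peak margin is 1 - 1/OSR UI on an ideal open eye.

```python
def worst_case_hf_jitter_tolerance(oversampling_ratio):
    """Worst-case high-frequency jitter tolerance in UI peak-to-peak.

    Assumes an ideal open eye and adjacent blind samples placed symmetrically
    about the eye center (an inferred model, not a measured result).
    """
    if oversampling_ratio < 1.0:
        raise ValueError("oversampling ratio must be at least 1")
    return 1.0 - 1.0 / oversampling_ratio
```

Evaluating the function at 2 and 1.45 gives the 0.5 and 0.31 values above, and at an oversampling ratio of 1 the margin vanishes.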

At 1x, the samples may occur on the edges. If jitter shifts samples away from each other, then the CDR will not capture the bit at all, which results in zero jitter tolerance. The following paragraph uses the channel's pulse response to elaborate on this issue and to arrive at the proposed solution.

Figure 3.1: Worst case for 2x, 1.45x and 1x sampling on an open eye diagram. The high-frequency jitter tolerance (HF JT) is 0.5UI$_{PP}$, 0.31UI$_{PP}$, and 0UI$_{PP}$, respectively.

Figure 3.2 shows the pulse response of an ideal channel. The best sampling position occurs when the main cursor is at the center of the ideal pulse response. In a clocked phase-tracking system, the sampling would remain at this position. However, with 1x blind sampling, any frequency offset between the data and receiver clock will cause the sampling phase to shift continuously across a 1UI window. When the sampling occurs near the UI boundary, any high-frequency jitter may shift the sampling outside the 1UI phase range, resulting in the loss of data bits (i.e. zero jitter tolerance).

In order to increase the jitter tolerance at baud-rate sampling, the pulse response is extended beyond 1UI by introducing a controlled amount of ISI in the data using a rectangular filter, which is implemented via an integrate-and-dump (I&D) circuit [28] in the receiver front end. A rectangular filter is suitable in this case since its response has a finite length of ISI and requires fewer equalization taps compared to the exponentially-decaying response of an RC filter. A 1UI rectangular filter, convolved with the ideal channel, spreads the pulse response to 2UI. If we have a perfect decision feedback equalizer (DFE) to cancel all post-cursor ISI, then the eye would be open for a range of 1.5UI (this would have been 2UI if we could cancel pre-cursor ISI). If the blind samples shift beyond the 1UI window, there is still a remaining jitter margin of 0.5UI$_{PP}$. A 2UI rectangular filter increases this margin to 1UI$_{PP}$ and results in a symmetric eye opening with respect to the blind sampling window. For these reasons, a 2UI I&D circuit was chosen for the proposed design.

Figure 3.2: Comparison of theoretical worst-case jitter tolerance given the pulse responses of an ideal channel (0UI$_{PP}$ at the boundary, no margin), 1UI I&D (0.5UI$_{PP}$ at the boundary), and 2UI I&D (1UI$_{PP}$ at the boundary). The vertical eye opening with an ideal DFE is $h_0 - h_{-1}$, where $h_{-1}$ is the pre-cursor and $h_0$ is the main cursor. Blind baud-rate samples can shift across a 1UI range due to frequency offset.
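The margins quoted for the three cases can be reproduced with an idealized model of the pulse response. The sketch below assumes a perfectly rectangular 1UI bit pulse, an ideal I&D window, and an ideal DFE that removes all post-cursor ISI, so that the eye is counted as open whenever the main cursor exceeds the pre-cursor ($h_0 > h_{-1}$); the function names and the phase-scan resolution are our own choices.

```python
def pulse(t, integ_ui):
    """Ideal 1UI bit pulse after an I&D window of integ_ui UI (0 means no I&D)."""
    if integ_ui == 0:
        return 1.0 if 0.0 <= t < 1.0 else 0.0
    # Overlap of the bit interval [0, 1] with the window [t - integ_ui, t].
    return max(0.0, min(1.0, t) - max(0.0, t - integ_ui))


def jitter_margin_ui_pp(integ_ui, steps_per_ui=1000):
    """Phase range with h0 > h-1 (in UI) minus the 1UI blind sampling range."""
    open_steps = 0
    for i in range((integ_ui + 2) * steps_per_ui):
        t0 = i / steps_per_ui                  # candidate main-cursor phase
        h0 = pulse(t0, integ_ui)               # main cursor
        h_pre = pulse(t0 - 1.0, integ_ui)      # pre-cursor from the next bit
        if h0 > h_pre:
            open_steps += 1
    return open_steps / steps_per_ui - 1.0
```

Under these assumptions the scan gives margins close to 0, 0.5, and 1 UI for no I&D, a 1UI I&D, and a 2UI I&D, respectively, up to the scan resolution.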

Proposed 1x Blind Receiver Architecture

Figure 3.3 shows the system diagram of the receiver including an analog front end and digital CDR. The analog front end consists of four interleaved I&D and ADC blocks, each operating at 2.5GS/s. Figure 3.4 shows two possible implementations of a 2UI I&D. The first implementation, illustrated in Figure 3.4a, is a fully analog 2UI I&D. We have chosen the second implementation (Figure 3.4b), where the 2UI I&D consists of two components: one piece is analog and the other digital. The I&D circuit integrates 1UI samples and the ADC converts the samples into 5-bit digital values. An adder in the digital CDR combines adjacent 5-bit 1UI I&D samples to synthesize 6-bit 2UI I&D samples. Since the ADC resolution is limited to 5 bits, if we were to obtain 2UI I&D samples directly in the analog domain and feed them to the ADC, we would have lost the additional 1 bit of resolution.

Simulations showed that the system needed an ADC with a minimum ENOB of 4 bits; this work uses a previously designed 5-bit ADC with a known ENOB of 4.2 bits [32]. The proposed design does not include ADC calibration; the addition of digital calibration for gain, offset, and timing mismatches [19, 25, 35] would further improve the receiver performance.

The samples in the digital CDR are processed by the data interpolator, which estimates the samples at the center of the eye using the recovered phase, $\phi_{AVG}$. The digital data interpolator allows the use of a more sophisticated interpolation algorithm compared to an analog interpolator. A Mueller-Muller PD and loop filter form a feedback loop with the data interpolator. Loop latency is critical in this design since the digital CDR operates on a 625MHz divided clock; each cycle in the loop adds significant delay. The proposed implementation has a loop latency of 7 cycles. A 2-tap DFE recovers the binary data, $A_k$, from the interpolated samples, $x_k$.
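The digital half of the 2UI I&D described above reduces to a signed conversion followed by an add of adjacent samples. The sketch below is a behavioural illustration only: it assumes the 5-bit ADC codes are offset binary (0 to 31, with 16 representing zero), which is our assumption about the code format, and the function names are hypothetical.

```python
def to_signed(code, bits=5):
    """Convert an offset-binary ADC code (0 .. 2**bits - 1) to a signed integer."""
    if not 0 <= code < (1 << bits):
        raise ValueError("ADC code out of range")
    return code - (1 << (bits - 1))


def synthesize_2ui(codes_1ui, bits=5):
    """Add adjacent 1UI I&D samples to form 2UI I&D samples."""
    signed = [to_signed(code, bits) for code in codes_1ui]
    return [cur + prev for prev, cur in zip(signed, signed[1:])]
```

Each signed 5-bit sample lies between -16 and 15, so every sum lies between -32 and 30 and fits in a 6-bit signed word, in line with the extra bit of resolution mentioned above.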
The data interpolator compensates for frequency offset. As shown in Figure 3.5a, we define negative frequency offset to mean that the transmitter clock is slower than the blind receiver clock. When this occurs, an interpolated sample is skipped each time the phase completes a 1UI rotation. Similarly, Figure 3.5b shows a positive frequency offset, where the transmitter clock is faster than the receiver clock. A positive frequency offset would result in cases where no blind sample exists between two desired samples; the interpolator resolves these cases by interpolating twice between the closest two blind samples when the decreasing φ_AVG rolls over from 0UI to 1UI. The range of frequency offset supported by the loop filter is low enough that we can assume the extra interpolated sample is very close to the blind sample at 1UI. Hence, the implemented interpolator directly uses the blind sample as the extra interpolated sample.

The data path in the digital CDR is sized for 17 parallel samples. Most of the time, only 16 paths are active. If there is frequency offset and φ_AVG rolls over, then the number of active paths is temporarily reduced to 15 or increased to 17 for one cycle.

[Figure 3.3: System block diagram of the interleaved analog front end (1UI I&D and ADC) and digital CDR. x_k: interpolated samples; A_k: resolved bits.]

[Figure 3.4: Comparison of (a) fully analog 2UI I&D and (b) analog and digital 2UI I&D, in which a digital adder produces the 2UI I&D samples.]

[Figure 3.5: Handling (a) negative frequency offset: data (TX) is slower than the blind receiver clock (CK_RX); the phase rolls over from 1UI to 0UI and an interpolation is skipped. (b) Positive frequency offset: data (TX) is faster than the blind receiver clock (CK_RX); the phase rolls over from 0UI to 1UI and the interpolation is done twice.]
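The skip and extra-interpolation bookkeeping can be sketched as follows. This is a simplified model rather than the implemented logic: it assumes φ_AVG is a 5-bit code that may change between consecutive samples, that a wrap shows up as a jump of more than half the phase range, and that all names are hypothetical.

```python
# Illustrative sketch (not the thesis RTL): active parallel paths per CDR cycle.
# Assumptions: phi_avg is a 5-bit code (0..31 spans 1UI), 16 blind samples arrive
# per 625MHz cycle, and a wrap is detected as a jump of more than half the range.

PHASE_STEPS = 32
HALF_RANGE = PHASE_STEPS // 2
NOMINAL_PATHS = 16

def active_paths(prev_phase, phases_this_cycle):
    """Return how many interpolated samples are produced in one cycle."""
    paths = NOMINAL_PATHS
    last = prev_phase
    for phase in phases_this_cycle:
        step = phase - last
        if step < -HALF_RANGE:
            paths -= 1    # phase wrapped from 1UI to 0UI: skip one interpolation
        elif step > HALF_RANGE:
            paths += 1    # phase wrapped from 0UI to 1UI: interpolate twice
        last = phase
    return paths

def uis_per_rollover(offset_ppm):
    """Number of UIs for the phase to drift by 1UI at a given frequency offset."""
    return 1.0 / (abs(offset_ppm) * 1e-6)

print(active_paths(30, [31, 0]))       # one 1UI-to-0UI wrap in this cycle
print(active_paths(1, [0, 31]))        # one 0UI-to-1UI wrap in this cycle
print(round(uis_per_rollover(300)))    # UIs between rollovers at 300ppm
```

At a few hundred ppm the rollovers are thousands of UIs apart, so a cycle with 15 or 17 active paths is rare compared with the nominal 16.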

3.3. Receiver Implementation

Integrate-and-Dump Filter

The output from the channel drives the input of the I&D filter. The I&D circuit in Figure 3.6 introduces controlled ISI into the ADC input and also operates as a frequency-scalable anti-aliasing filter [28]. The circuit consists of a single source-degenerated transconductance stage that converts the input voltage to current and integrates the signal on the input capacitance of the four interleaved ADCs, labelled C_L in Figure 3.6. Each interleaved I&D block operates in 3 phases: integrate, hold (during which the ADC samples the value), and reset. The clock pulses (SC0, SC1, SC2, and SC3) reset the outputs (V0, V1, V2, and V3) and redirect the current to each of the interleaved ADCs. Each clock pulse is 1UI wide.

[Figure 3.6: Implementation of the integrate-and-dump (I&D) circuit [28].]

[Figure 3.7: I&D operating phases, (1) integrate, (2) hold, and (3) reset, synchronized with the clock pulses SC0 to SC3. Each pulse is 1UI wide and repeats every 4UI.]

Clock Generator

Figure 3.8 shows the clock generator, which drives the ADC and I&D. A CML toggle flip-flop divides a 5GHz input clock into 4 phases, each at 2.5GHz. The outputs are then converted into single-ended CMOS signals and buffered. The clock pulse generator [28] uses logic gates to generate 1UI-wide pulses from the 4 clock phases.

[Figure 3.8: Implementation of the clock pulse generator with adjustable delay for deskew. Signal chain: 5GHz CK_RX, CML toggle flip-flop (divide by 2), CML-to-CMOS converters with adjustable delay for deskew, CMOS duty-cycle correction, and clock pulse generator producing SC0 to SC3.]

Figure 3.9a shows an example of the clock pulses when skew exists between the 4 phases. First, we note that any skew could change the integration periods when the pulses control the I&D operation, which would cause gain mismatch between the 4 interleaved I&D blocks. Second, when high-speed signals are sampled, the clock skew would appear effectively as high-frequency periodic or duty cycle dependent (DCD) jitter. Both the gain mismatch and the high-frequency jitter will degrade the receiver's jitter tolerance. This sensitivity to clock skew is a disadvantage of using the I&D block. As shown in Figure 3.9b, the clock skew can be compensated by adjusting the clock phase through deskew circuits. In this design, the skews are manually adjusted by observing the ADC outputs (e.g. Figure 3.24). Figure 3.10 shows the deskew circuitry, implemented in each of the CML-to-CMOS converters as a 4-bit phase interpolator. The differential clock signal connects to the In+ and In- inputs and a 20ps delayed clock connects to In_del+ and In_del-. Combining them achieves ±10ps of deskew range on each of the 4 clock phases driving the I&D.

[Figure 3.9: (a) Effect of clock phase skew on the I&D integration period. (b) Equal I&D integration periods after correcting clock skew by adjusting the clock delays.]
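A rough feel for the numbers can be had from a small sketch. It rests on two assumptions that the text does not state: that the I&D gain is proportional to the integration time, and that the 4-bit phase interpolator blends the direct and 20ps-delayed clocks linearly with the control code. All names are hypothetical.

```python
# Illustrative sketch (not from the thesis): clock skew, gain error, and deskew codes.
# Assumptions: the I&D gain is proportional to its integration time, and the 4-bit
# phase interpolator blends the direct and 20ps-delayed clocks linearly with code.

UI_PS = 100.0            # 1UI at 10Gb/s
DELAY_SPAN_PS = 20.0     # delay between the two interpolator inputs
CODES = 16               # 4-bit control word

def gain_error(skew_ps):
    """Fractional gain error when the integration window changes by skew_ps."""
    return skew_ps / UI_PS

def deskew_delay_ps(code):
    """Delay relative to mid-range for a control code 0..15 (linear model)."""
    if not 0 <= code < CODES:
        raise ValueError("code must be in 0..15")
    return DELAY_SPAN_PS * code / (CODES - 1) - DELAY_SPAN_PS / 2

def best_code(skew_ps):
    """Code whose delay best cancels a measured skew (hypothetical helper)."""
    return min(range(CODES), key=lambda c: abs(deskew_delay_ps(c) + skew_ps))

print(gain_error(5.0))                           # 5ps of skew on a 100ps window
print(deskew_delay_ps(0), deskew_delay_ps(15))   # ends of the deskew range
print(best_code(3.0))                            # code that best cancels 3ps
```

Under the linear model each code step is a little over 1ps, so the residual skew after manual adjustment would be below 1% of the 100ps integration window.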

[Figure 3.10: Adjustable clock delay block.]

Data Interpolator

[Figure 3.11: Piecewise linear interpolation of the desired sample from blind samples a, b, c, and d. The estimate is the sum of a linear interpolation and a slope-based correction:

\[ \text{desired sample} \approx b\,(1-\phi_{AVG}) + c\,\phi_{AVG} + 0.5\,\big((b-a)+(c-d)\big)\,Y(\phi_{AVG}) \]

\[ Y(\phi_{AVG}) = \begin{cases} 0.5\,\phi_{AVG}, & 0 \le \phi_{AVG} < 0.5\,\text{UI} \\ 0.5\,(1-\phi_{AVG}), & 0.5 \le \phi_{AVG} \le 1\,\text{UI} \end{cases} \]
]

Given the ADC's blind samples and the CDR's recovered phase, φ_AVG, the data interpolator estimates the value of the data at the centre of the eye (i.e. the desired sample). Figure 3.11 shows 4 consecutive blind samples, a, b, c and d, that are separated

by 1UI. The desired sample is φ_AVG away from sample b. For simplicity, the expression in Figure 3.11 assumes that φ_AVG is a floating-point value between 0 and 1UI. In the implementation, φ_AVG is represented by a 5-bit value. The desired sample is estimated first by linearly interpolating between samples b and c. This estimate has a large error because samples b and c are separated by 1UI. To improve accuracy, extrapolation is performed using the slopes ((b - a)/1UI) and ((c - d)/1UI). The piecewise linear shape in Figure 3.11 is scaled by the average of the two slopes and superimposed on the linear interpolation. Hence, the accuracy of the estimate is improved by using four instead of two blind samples.

Mueller-Muller Phase Detector

[Figure 3.12: (a) Pulse response of an ideal channel followed by 2UI I&D. (b) Proposed MM function, F = h_0 - h_1, versus sampling phase (UI).]

In the proposed design, the 2UI I&D provides a wider pulse response such that the conventional MM function in Figure 2.29 would not provide the optimal sampling phase. If the receiver includes a DFE to cancel post-cursor ISI, the maximum vertical eye opening occurs when the main cursor, h_0, is at time T in Figure 3.12, because h_0 is then the maximum value of the pulse response and h_{-1} is zero. Setting the pre-cursor tap to zero allows us to fully benefit from the DFE and eliminates the need for an FFE. This sampling position occurs when the post-cursor ISI, h_1, is equal to the main cursor, h_0. To identify this desired

phase location, we choose the MM function to be F = h_0 - h_1 [14] and force it to zero through the feedback loop. Since the actual sampling phase is blind, the desired phase is forced on the interpolating phase, φ_AVG.

[Figure 3.13: Design and implementation of the speculative Mueller-Muller phase detector (MMPD). The figure lists the Mueller-Muller function:

\[ h_0 = h(t) = E[x_k A_k] = E[x_{k-1} A_{k-1}] \]
\[ h_{-1} = h(-T+t) = E[x_{k-1} A_k] \]
\[ h_1 = h(T+t) = E[x_k A_{k-1}] \]
\[ h_2 = h(2T+t) = E[x_k A_{k-2}] \]
\[ F = h_0 - h_1 = E[x_{k-1} A_{k-1} - x_k A_{k-1}] = E[(x_{k-1} - x_k)\,A_{k-1}] \]

and the phase detector output, MMPD_out = (x_{k-1} - x_k) A_{k-1}. The addition and sign operation are done speculatively while the DFE resolves A_{k-1}.]

Chapter 2 showed that the pulse response can be estimated using the samples, x_k, and the recovered data, A_k [21]. From Equations 2.22 and 2.21, h_0 and h_1 can be estimated by the expected values E[x_k A_k] and E[x_k A_{k-1}], respectively. We substitute the expected values into the MM function to transform the MM function into the MMPD. The loop filter in the next block performs the expected value operation by averaging the MMPD output. Note that the expressions for the pulse response are not unique. For example, according to Equation 2.22, h_0 is also equal to E[x_{k-1} A_{k-1}]. In the implementation illustrated in Figure 3.13, we can therefore choose h_0 = E[x_{k-1} A_{k-1}] so that A_{k-1} can be factored out of the expressions for h_0 and h_1. The DFE has some latency before it recovers A_{k-1};

factoring out A_{k-1} allows the subtraction to be performed before A_{k-1} becomes available. Since A_{k-1} takes on only two values, +1 and -1, it only affects the sign of the MMPD output. In the PD implementation, subtraction is performed first and speculation is used for the sign of A_{k-1}. The DFE's recovered data and the PD output are ready at the same time, thereby reducing latency in the CDR feedback loop and improving loop stability.

Decision-Feedback Equalizer

[Figure 3.14: (a) A speculative 2-tap DFE and (b) the first stage of the parallel speculative DFE that recovers 8 bits per cycle.]

The DFE compensates for post-cursor ISI from the channel and the I&D filter. As can be seen from the pulse response in Figure 3.12, recovering data from an ideal channel and 2UI I&D filter would require one DFE tap to equalize the post-cursor h_1, while a more attenuative channel may require more taps. Three pipeline stages, operating at 625MHz, resolve 16 bits in parallel (actually 15 to 17 bits, to handle cases of frequency offset as discussed in Section 3.2). DFE adaptation was not included in this design. To recover 16 bits per clock cycle, 16 parallel DFE sum blocks are required. Speculation is used extensively to reduce latency in the CDR feedback loop. In each DFE summation block shown in Figure 3.14a, the 2 DFE taps, C_1 and C_2, are manually set

and speculation is performed by subtracting the 4 possible levels from the interpolated sample, x_k. When the previous two bits, A_{k-1} and A_{k-2}, have been recovered, the mux selects the correct A_k. This speculation removes the adder from the critical path. The muxes remain on the critical path, however: in order to resolve all 16 bits, data must propagate through 16 muxes, but at 625MHz the data can only propagate through 8 muxes per cycle. Figure 3.14b shows 8 DFE summation blocks that resolve 8 bits in one clock cycle. For this reason, another stage of speculation was created. The next stage speculates on the A_{k-1} and A_{k-2} inputs to the DFE Sum x8 blocks. As shown in Figure 3.15, A_{k-1} and A_{k-2} drive the first 4 parallel DFE Sum x8 blocks in a speculative structure, which resolves bits A_k to A_{k+7}. The last two bits of this first stage, A_{k+6} and A_{k+7}, then drive a second set of 4 DFE Sum x8 blocks, which resolves bits A_{k+8} to A_{k+15}. In the end, the complete DFE has a latency of 3 cycles.

[Figure 3.15: The second stage of the parallel speculative DFE that recovers 16 bits per cycle.]
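The speculation in a single DFE summation block can be sketched in a few lines. This is a behavioural model only, not the parallel 16-bit RTL: it assumes +1/-1 bits, fixed taps, and feedback bits that start at +1, and the function names are hypothetical. It compares a direct DFE with one that slices against all 4 candidate levels first and then selects.

```python
# Illustrative sketch (not the thesis RTL): a speculative 2-tap DFE.
# Assumptions: bits are +1/-1, the taps c1 and c2 are fixed, and both feedback
# bits start at +1. Names are hypothetical.
from itertools import product

def dfe_direct(samples, c1, c2):
    """Reference DFE: subtract the feedback, then slice."""
    a1 = a2 = 1
    bits = []
    for x in samples:
        bit = 1 if x - c1 * a1 - c2 * a2 >= 0 else -1
        bits.append(bit)
        a1, a2 = bit, a1
    return bits

def dfe_speculative(samples, c1, c2):
    """Slice against all 4 candidate levels first, then select with the known bits."""
    # Stage 1: candidate decisions for every (A_{k-1}, A_{k-2}) pair; no feedback needed.
    candidates = [
        {(p1, p2): (1 if x - c1 * p1 - c2 * p2 >= 0 else -1)
         for p1, p2 in product((1, -1), repeat=2)}
        for x in samples
    ]
    # Stage 2: the mux chain; only a table lookup remains in the feedback path.
    a1 = a2 = 1
    bits = []
    for table in candidates:
        bit = table[(a1, a2)]
        bits.append(bit)
        a1, a2 = bit, a1
    return bits

samples = [0.9, -0.2, 0.4, -1.1, 0.1, 0.7, -0.6, 0.3]
print(dfe_direct(samples, 0.5, 0.2) == dfe_speculative(samples, 0.5, 0.2))
```

The two functions make identical decisions by construction; the point of the second form is that stage 1 has no dependence on earlier bits, so in hardware it can be computed ahead of the mux chain.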

[Figure 3.16: Loop filter with configurable proportional and integral gains. Labels: K_SUM = 16; K_P = {0.25, 0.5, 0.75, 1}; K_I = {0, 0.25, 0.5, 0.75, 1}; K_CYC = 1/2048.]

Loop Filter

The loop filter is a conventional proportional-integral controller, as shown in Figure 3.16. The parallel PD outputs are summed together and the result is scaled by configurable proportional and integral gains. The saturating counter is sized to handle up to ±1900ppm of frequency offset. At the output, the 5-bit phase counter produces the recovered CDR phase as discrete φ_AVG values ranging from 0 to 31, which are fed back to the data interpolator block, closing the CDR feedback loop.

Simulation and Measurement Results

This section shows, through simulation, that the feedback loop converges correctly and how the system can be modified for a more attenuative channel, and presents simulated jitter tolerance results. Next, the measured eye diagrams and measured jitter tolerance of the proposed CDR are presented. Figure 3.17 illustrates the loop dynamics by showing the transient signals in the loop filter. When the system in Figure 3.3 starts up, it appears that the MMPD relies on correctly recovered data to estimate phase and, at the same time, the DFE requires a correct phase to recover the data. To verify that the feedback loop does not enter into a deadlock, we have applied an input with 1000ppm of frequency offset so as to start the loop with both phase and data errors. The proportional gain and saturating counter

outputs are, respectively, the outputs of the proportional and integral paths in the loop filter. The cycle slipping causes the saturating counter to temporarily decrease, but the saturating counter settles to a value corresponding to 1000ppm within 4µs. The up/down signal increments or decrements φ_AVG. In steady state, φ_AVG increases from 0 to 31 and wraps around in order to track the frequency offset. After 3µs, φ_AVG is close enough to the center of the eye to recover the data correctly (i.e. no more bit errors).

[Figure 3.17: Simulated loop filter convergence with 1000ppm of frequency offset for PRBS-7. Traces: proportional gain output, saturating counter output, up/down signal, phase output (φ_AVG), and error count versus time (µs). Signals correspond to nodes on the block diagram of Fig. 3.16.]

Figure 3.17 illustrates the transient signals in the loop filter (Figure 3.16). The simulation demonstrates the digital CDR locking to the received signal from Channel A + 2UI I&D with 1000ppm of frequency offset. There is cycle slipping; however, the


proportional and integral paths settle to their steady-state values in approximately 4µs. Similarly, the bit errors stop occurring after 3µs.

[Figure 3.18: Frequency response of channel models in simulation (Channel A, Channel B, and Channel A + 2UI I&D).]

[Figure 3.19: Simulated eye diagrams using Channel A + 2UI I&D. Signal chain: PRBS-7 generator at 10GHz (TX RJ = 0.17 UIpp), Channel A, 1UI I&D clocked by CK_RX (RX RJ = 0.23 UIpp), ADC, 1 + z^-1, data interpolator, and 2-tap DFE, with the MM PD and loop filter feeding φ_AVG back to the interpolator.]

As discussed in Section 3.1, the receiver relies on ISI to spread the pulse response beyond 1UI. We demonstrate through simulation that the 1x blind CDR can work in 2 cases. In cases where the channel attenuation is low (i.e. there is not enough ISI produced by the channel), the system relies on the 2UI I&D to produce the ISI. This


situation is demonstrated in Figure 3.18, which shows the combined frequency response of a low-attenuation Channel A followed by its associated 2UI I&D filter. In contrast, where the channel is attenuative by itself (i.e. there is enough ISI produced by the channel), the 2UI I&D is no longer needed to produce extra ISI. This situation is demonstrated by Channel B in Figure 3.18. Simulations show that the 1x blind CDR works in both of these cases. If the CDR is to be used in applications with a wide variety of channels, then, ideally, the front-end filter should be adaptive such that it increases the amount of post-cursor ISI when the channel has less high-frequency loss. However, an adaptive filter is beyond the scope of this work. The test chip, which is described later, demonstrates only the first case (i.e. low-attenuation channel with 2UI I&D).

[Figure 3.20: Simulated eye diagrams using Channel B. Signal chain: PRBS-7 generator at 10GHz (TX RJ = 0.17 UIpp), Channel B, 5-bit ADC clocked by CK_RX (RX RJ = 0.23 UIpp), data interpolator, and 20-tap DFE, with the MM PD and loop filter feeding φ_AVG back to the interpolator.]

Figures 3.19 and 3.20 show the eye diagrams from simulations done in Simulink using event-driven models [34]. The data source runs at 10Gb/s and has 0.17 UIpp of random jitter. Similarly, the blind receiver clock is simulated with 0.23 UIpp of random jitter. The two leftmost eye diagrams in Figure 3.19 show the data eye after Channel A and the I&D. The 5-bit ADC quantizes the samples into discrete values from 0 to 31. The eyes are still open because the analog 1UI I&D does not add much attenuation. The 1 + z^-1 filter adds further ISI and closes the eye. In order to obtain the eye diagrams in the digital

CDR, we break the feedback loop and set φ_AVG to 0.5UI. This forces the desired sample halfway between the blind samples, where the data interpolator produces the worst-case interpolation error. The open eye after the DFE adder shows that the data can be successfully recovered. Figure 3.20 demonstrates that the system can recover the data with Channel B without the I&D filter; however, it requires a 20-tap DFE. This large number of taps is necessary for Channel B because it introduces a long tail of ISI. This is not the case for Channel A with the 2UI I&D because it produces far less ISI.

[Figure 3.21: Simulated jitter tolerance results at 10Gb/s with a BER of 10^-6. Curves: Channel A + 2UI I&D and Channel B (no I&D). Axes: jitter tolerance (UIpp) versus jitter frequency.]

Figure 3.21 compares the simulated jitter tolerance for each of the two channels. The simulation assumes a bit error rate (BER) of 10^-6. The high-frequency jitter tolerance of the system in Figure 3.20 (Channel B) is slightly below that of the system in Figure 3.19 (Channel A + 2UI I&D). We also note that the former has a lower CDR bandwidth compared to the latter, which is caused by a lower PD gain. Compared to Channel A,

Channel B further spreads out the pulse response, which reduces the PD gain (i.e. the slope of the MM function).

[Figure 3.22: Chip photo. Block areas: I&D 85x145 µm²; 4:16 demux 60x490 µm²; 5-bit ADC 400x490 µm²; digital CDR 420x645 µm²; clock generator 150x260 µm².]

Process             65nm CMOS
Data rate           10Gb/s
Supply              1.2V
ADC + demux power   109mW
CDR power           112mW
Clock gen. power    83mW
I&D power           1.7mW
Total power         306mW

The proposed receiver was implemented in Fujitsu's 65nm CMOS process. Figure 3.22 is a photo of the test chip. The I&D, clock generator, and ADC are custom-designed analog blocks. The digital CDR was designed using Verilog RTL and implemented with standard cell gates. Figure 3.23 shows a simplified diagram of the measurement setup. The data source is a PRBS-7 generator. A logic analyzer captures and stores digital waveforms from the test chip (i.e. design-under-test or DUT). For jitter tolerance measurements, sinusoidal jitter was applied to the transmitter clock. Figure 3.24 shows the average ADC output when the I&D is given a DC input. On one test chip, we observed that one of the interleaved front end blocks had a lower gain compared to the other blocks as we varied the DC input. As discussed in Section 3.3, the gain error is mostly caused by systematic clock skew. If left uncompensated, the

skew will reduce the CDR's jitter tolerance. Hence, the delays in the clock generator were manually adjusted. Figure 3.24b shows that the gain at the output of ADC 3 matches more closely with the gain of the other interleaved blocks after skew correction.

[Figure 3.23: Measurement setup. A PRBS-7 generator, clocked by a 10GHz clock with sinusoidal jitter, drives the test channel and the DUT (I&D, ADC, CDR, PRBS-7 comparator), which runs from a 5GHz CK_RX; a logic analyzer captures the DUT outputs.]

[Figure 3.24: Average ADC output given DC input (a) before and (b) after skew correction. Curves: ADC 0 to ADC 3. Axes: average ADC output code versus DC input voltage (mVpp differential).]

The measurements were performed with a 48" SMA cable as the channel; its frequency response is plotted in Figure 3.25. Figure 3.26a shows the data eye at the output


[Figure 3.25: Measured channel frequency response.]

[Figure 3.26: Measured eye diagrams (a) after the channel (916mV PP, 93.1ps) and (b) after the ADC (channel + 1UI I&D + ADC; 1UI I&D digital output versus sampling phase in UI).]

of the channel. Figure 3.26b shows the eye diagrams taken from the outputs of the interleaved ADCs. The eye has been partially attenuated by the analog 1UI I&D. There is some mismatch between the 4 interleaved analog front ends, but the digital CDR is able to tolerate this, as demonstrated in the jitter tolerance measurement.

[Figure 3.27: Simulated and measured jitter tolerance results with 10Gb/s PRBS-7 input data and BER of 10^-6 and 10^-12, respectively. Curves: simulation (BER = 1e-6); measured at -300ppm (TX slower than RX), 0ppm, 300ppm, and 1000ppm (TX faster than RX); XLAUI mask. Axes: jitter tolerance (UIpp) versus jitter frequency (Hz).]

The jitter tolerance was measured after skew correction and with a maximum BER of 10^-12 at 10Gb/s. In Figure 3.27, we show the results given -300, 0, 300, and 1000 ppm of frequency offset. A negative frequency offset means that the transmitter is slower than the blind receiver clock (i.e. above baud-rate sampling). A positive frequency offset means that the transmitter is faster than the blind receiver clock; this case is worse for jitter tolerance since we are actually sampling slightly below baud rate. During measurement, we were able to push the frequency offset to 1000ppm with a slight degradation in jitter tolerance. In addition, the CDR model was simulated with the channel frequency response (as

in Figure 3.25) and 300ppm of frequency offset. Due to simulation time constraints, the simulation assumes a maximum BER of 10^-6. For this reason, the simulated jitter tolerance is higher compared to the measured results. The jitter tolerance mask for XL-Attachment-Unit-Interface (XLAUI) is also shown in Figure 3.27. Although the proposed design did not specifically target Ethernet applications, the mask is provided as a reference.

Summary

This chapter presents a 1x blind ADC-based CDR. The proposed architecture recovers data by extending the channel pulse response so that the pulse amplitude is greater than zero, no matter where the blind samples occur within a 1UI window. The receiver adds controlled ISI to the pulse response through the use of an I&D block in the receiver front end. The baud-rate design allows the CDR to operate at 10Gb/s given a 10GS/s sampling rate. The proposed design was fabricated in a 65nm CMOS process. The test chip successfully recovers 10Gb/s data with BER below 10^-12. Jitter tolerance measurements show that the CDR implementation can recover data with below-baud-rate sampling: the CDR operates with ±300ppm of frequency offset and a high-frequency jitter tolerance of 0.19 UIpp.

4 A Zero-Forcing Adaptive DFE for an ADC-Based CDR

This chapter proposes a novel zero-forcing adaptive controller for a DFE in a digital ADC-based CDR. Section 4.1 provides the concepts of the proposed adaptive controller. Sections 4.2 and 4.3 describe the architecture and implementation details of the receiver, respectively. Section 4.4 presents simulation results from Simulink models. At the time of writing this thesis, the Simulink models and Verilog implementation have been completed. However, the measurement results are left as future work.

Proposed DFE Adaptation

Earlier sections showed how samples on a pulse response can be calculated by correlating samples of random data with recovered bits. The example pulse response from Figure 2.29 is reproduced in Figure 4.1 for convenience. The MMPD described in Chapter 3 uses this information to estimate phase error by subtracting two pulse response samples (h_0 - h_1). The MMPD output is processed by a loop filter and fed back to the data interpolator to form the phase-tracking loop. This chapter shows that it is possible to use a similar feedback loop to adapt the DFE coefficients. Figure 4.2 illustrates a controller that adapts n DFE coefficients. The data sample, x_k, is correlated with the recovered bits, A_{k-1} to A_{k-n}, to estimate pulse response samples. The low-pass filters provide average values of the pulse samples, which are used as the DFE coefficients, c_1 to c_n. The n-tap DFE subtracts post-cursor ISI from the current sample,

x_k, and the decision block slices the DFE output to recover the binary data, A_k. The bandwidth of the LPF is the main design parameter. It should be low enough to filter out transient noise from the correlation terms and, at the same time, high enough to allow the LPF to settle to the steady-state values in reasonable time during receiver start-up.

[Figure 4.1: ISI can be calculated by correlating sampled data (x_k, x_{k-1}, etc.) with recovered bits (A_k, A_{k-1}, etc.):

\[ h_{-1} = E[x_{k-1} A_k], \quad h_0 = E[x_k A_k], \quad h_1 = E[x_k A_{k-1}], \quad h_2 = E[x_k A_{k-2}] \]
]

[Figure 4.2: Zero-forcing controller for n-tap DFE adaptation. A shift register holds A_{k-1} to A_{k-n}; each product x_k A_{k-i} is low-pass filtered to give the coefficient c_i of the n-tap DFE.]

This zero-forcing adaptive DFE architecture has two main advantages: scalability and ease of design. The blocks in Figure 4.2 are easily scaled when n is increased. The controller is also simpler than the ZF implementations in [30] and [13], since it does not generate an error signal by subtracting the signals before and after the decision block. Unlike the LMS adaptation in [1], the proposed architecture does not require a reference (i.e. desired) signal and the feedback loop does not require a configurable gain parameter.
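The adaptation loop of Figure 4.2 can be sketched behaviourally as follows. This is not the thesis Simulink model: it assumes a made-up pulse response, ideal timing, noiseless +1/-1 data, and a first-order IIR low-pass filter whose bandwidth parameter `mu` is hypothetical.

```python
# Illustrative sketch (not the thesis model): zero-forcing DFE tap adaptation.
# Assumptions: a made-up pulse response h = [h0, h1, h2], ideal timing, +1/-1 data,
# and a first-order IIR low-pass filter with hypothetical bandwidth parameter mu.
import random

def adapt_dfe(h=(1.0, 0.5, 0.2), n_bits=40000, mu=0.0005, seed=1):
    rng = random.Random(seed)
    n_taps = len(h) - 1
    taps = [0.0] * n_taps        # c_1 .. c_n, starting unequalized
    tx_hist = [1] * len(h)       # transmitted bits a_k, a_{k-1}, ...
    rx_hist = [1] * n_taps       # recovered bits A_{k-1}, A_{k-2}, ...
    for _ in range(n_bits):
        tx_hist = [rng.choice((-1, 1))] + tx_hist[:-1]
        # Received sample: main cursor plus post-cursor ISI.
        x = sum(hi * a for hi, a in zip(h, tx_hist))
        # n-tap DFE followed by the decision block.
        decision = 1 if x - sum(c * a for c, a in zip(taps, rx_hist)) >= 0 else -1
        # Correlate x_k with A_{k-1}..A_{k-n}; the LPF output is the tap value.
        taps = [c + mu * (x * a - c) for c, a in zip(taps, rx_hist)]
        rx_hist = [decision] + rx_hist[:-1]
    return taps

c1, c2 = adapt_dfe()
print(f"c1 = {c1:.2f}, c2 = {c2:.2f}")
```

With random data the averaged correlations settle near the post-cursor values h_1 and h_2 of the assumed pulse response, and a smaller `mu` trades slower start-up for less ripple on the taps, which is the bandwidth trade-off described above.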

Proposed Blind ADC-Based Receiver Architecture

Figure 4.3 shows the system diagram of the proposed blind receiver. The main components are a 20GS/s, 3-bit ADC and a digital CDR with adaptive DFE. The ADC oversamples the 10Gb/s data signal by 2x. Compared to the 1x receiver from Chapter 3, the oversampling reduces the anti-aliasing requirement on the analog front end, increases the accuracy of the data interpolator in the digital CDR, and removes the need to extend the pulse response through additional ISI. Hence, the oversampling allows us to remove the 2UI I&D block from the receiver. The removal of the I&D block simplifies the clock distribution, reduces the power consumed by the clock divider and pulse generator, and removes the gain errors resulting from skew between interleaved clocks.

[Figure 4.3: System diagram of the proposed receiver with 3-bit ADC-based CDR and adaptive DFE. Signal chain: 10Gb/s data, channel, 20GS/s 3-bit ADC clocked by the blind CK_RX, and baud-rate CDR with adaptive DFE (digital blocks) producing the recovered data.]

Although the front end sampling rate is doubled, the overall ADC area and power consumption are reduced by decreasing the number of bits from 5 to 3. If we assume a simple flash ADC architecture, a 5-bit ADC sampling at baud rate would require 31 comparisons per UI. In contrast, a 3-bit ADC sampling at 2x would only need 14 comparisons per UI. The architecture of the baud-rate digital CDR, however, is mostly the same as the one proposed in Chapter 3 (Figure 3.3). Hence, this chapter focuses only on DFE adaptation and a few CDR blocks that were modified. The following paragraph explains how the 2x ADC is interfaced with the 1x CDR. The data interpolator at the input of the CDR creates baud-rate samples from the

2x samples. As shown in Figure 4.4, each pair of blind samples (0.5UI apart) is used to calculate a desired sample in between them. The φ_AVG quantity tracks the center of the data eye relative to the blind samples; edge samples are not computed. When compared to a 2x digital CDR (e.g. [24]), the baud-rate architecture reduces CDR power consumption because no multipliers and adders are used to interpolate and equalize edge samples.

[Figure 4.4: Data interpolator calculates the sample at the desired location from the closest 2x blind samples, for 0UI ≤ φ_AVG < 0.5UI and 0.5UI ≤ φ_AVG < 1.0UI. (a) Negative or (b) positive frequency offsets result in occasional skipped or extra interpolated samples.]

A negative or positive frequency offset will result in the data interpolator skipping an interpolation or inserting an extra interpolation, in a similar way to the one described in Section 3.2.

Proposed Digital CDR with Adaptive 2-tap DFE

Figure 4.5 shows the digital CDR and adaptive DFE. The 3-bit ADC data is demuxed to 32 parallel samples at 625MHz. The CDR converts the samples to signed integers before the input to the data interpolator. The phase tracking loop is the same as the one described in Chapter 3, with two main differences: a different MMPD and a configurable

phase offset coefficient, P.

[Figure 4.5: Proposed digital CDR with adaptive DFE. The 32x3b ADC samples are converted to signed integers and interpolated to x_k. The MM PD computes (x_{k-2} - x_k) A_{k-1} and provides the correlation terms x_k A_{k-1}, x_k A_{k-2}, and x_{k-2} A_{k-m-2}, which are low-pass filtered into the 8-bit values c_1, c_2, and c_m. The phase offset P is added before the digital loop filter, which produces the 5-bit φ_AVG. Gains: K_ADC = 8, K_INT = 16, K_DIV = 0.25, K_SUM = 16.]

Figure 4.5 also identifies the gains of the ADC, data interpolator, divider, and sum blocks as K_ADC, K_INT, K_DIV, and K_SUM. The ADC has a gain of 8 because it has a resolution of 3 bits. The sum block adds together 16 parallel MMPD outputs and, therefore, has a gain of 16. The interpolator gain is discussed later in this chapter. Accordingly, the MM function is:

\[ F = h_{-1} - h_1 + \frac{P}{K_{DIV}\,K_{SUM}\,K_{ADC}\,K_{INT}} \tag{4.1} \]

When the CDR has locked to its steady state, we have the relation:

\[ h_1 = h_{-1} + \frac{P}{K_{DIV}\,K_{SUM}\,K_{ADC}\,K_{INT}} \tag{4.2} \]

The phase offset coefficient effectively shifts the CDR's locking phase slightly to the left (assuming a positive coefficient P), which, in turn, reduces the pre-cursor ISI, and

increases the post-cursor ISI. This takes advantage of the DFE's ability to cancel the latter, but not the former. In this work, P is manually set through test registers; in future work, it may be possible to automatically optimize P for maximum eye opening. The output of the phase coefficient adder is processed by the loop filter (see Section 3.3.6) and fed back to the data interpolator. The interpolator implementation is discussed in more detail later in this chapter.

From Figure 4.5, the MMPD block also provides three correlation terms. The first two are used to estimate the first and second DFE taps (c_1 and c_2). They are low-pass filtered and the 8-bit coefficients are fed back to the DFE. The third correlation term provides c_m as an ISI monitor for off-chip measurement and optimization. The integer m can be configured from -2 to 13 in order to observe 16 ISI taps.

The MMPD-based architecture in Figure 4.5 provides an advantage by decoupling the phase-tracking and DFE adaptive feedback loops. In an Alexander-based or Hogge-based phase-tracking CDR, the PD detects the data edges after decision feedback equalization [23]. Hence, the DFE affects the CDR's output phase. At the same time, the output phase affects the DFE coefficients. To prevent this interaction from causing instability, the DFE adaptive loop is usually implemented with a much lower bandwidth than the phase-tracking loop. However, the low DFE loop bandwidth increases the CDR's start-up time. The MM-based architecture removes the interaction because the MMPD locks to the unequalized eye; the DFE does not affect the phase-tracking loop.
Hence, the bandwidth of the DFE loop in an MMPD-based architecture can be increased compared to the DFE loop bandwidth in an Alexander-based or Hogge-based architecture.

Data Interpolator

The data interpolator architecture in Figure 4.6 has been modified for 2x blind samples; otherwise it is similar to the one presented in Chapter 3 (Figure 3.11). Note that the worst case for interpolating between 2x blind samples occurs when φ_AVG is 0.25UI (i.e.

the desired sample is halfway between the 2x blind samples). In contrast, the worst case for interpolating between 1x blind samples occurs when φ_AVG is 0.5UI (i.e. the desired sample is halfway between the 1x blind samples).

Figure 4.6: Piecewise linear interpolation of desired sample from 2x blind samples. (The figure shows four blind samples a, b, c, and d spaced 0.5UI apart, with the desired sample between b and c, and the expressions b(1 - 2Φ_AVG) + c·2Φ_AVG and 0.5((b - a) + (c - d))·Y(Φ_AVG), where Φ_AVG = mod(φ_AVG, 0.5UI), Y(Φ_AVG) = Φ_AVG when 0 ≤ Φ_AVG < 0.25UI, and Y(Φ_AVG) = 0.5(1 - 2Φ_AVG) when 0.25UI ≤ Φ_AVG ≤ 0.5UI.)

In the Verilog implementation, φ_AVG is represented by a 5-bit number. The most significant bit of φ_AVG selects the pair of blind samples adjacent to the desired sample (i.e. b and c). As shown in Figure 4.4, one pair is selected when 0UI ≤ φ_AVG < 0.5UI and the other when 0.5UI ≤ φ_AVG < 1.0UI. The remaining 4 bits are substituted as φ_AVG in the interpolation expression in Figure 4.6. For clarity, Figure 4.6 shows φ_AVG in terms of UI, but φ_AVG is actually implemented as an integer between 0 and 15. Therefore, the implemented interpolator has a gain of 16 (i.e. K_INT = 16).

One disadvantage of the proposed data interpolators (in this section and Section 3.3.3) is that they have a phase-dependent frequency response, as shown in Figure 4.7. The frequency response of an ideal data interpolator has a flat magnitude; the interpolator should only shift the phase of the data signal. The proposed 2x interpolator has a flat magnitude only when φ_AVG is 0UI; in fact, its frequency response has a null at 10GHz when φ_AVG is 0.25UI. In the time domain, the interpolator changes the pulse response shape when φ_AVG ≠ 0UI. To compensate for this, the DFE should use phase-dependent coefficients [1, 24]. The DFE architecture described in [1] and [24] stored 8 coefficients

for a 1-tap DFE. The disadvantage is the complexity and area required for storing and adapting multiple coefficients for each DFE tap. However, this work neglects the pulse-shaping behaviour of the data interpolator because the magnitudes of the 2x interpolator's frequency responses are approximately flat up to the Nyquist frequency of 5GHz. Hence, only one coefficient is implemented per tap. As we will see in Section 4.4, the DFE adaptation converges to a coefficient that is approximately the average tap value over all φ_AVG.

Figure 4.7: Frequency responses of 1x and 2x data interpolators. Both interpolators operate on a 10Gbps data signal with a Nyquist frequency of 5GHz. (Vertical axis: interpolator frequency response in dB; horizontal axis: frequency from 1GHz to 10GHz; curves are shown for the 1x and 2x interpolators at different values of Φ_AVG.)

In Figure 4.7, we also observe a further advantage of 2x vs. 1x blind sampling. The frequency response of the interpolator operating on 1x samples has a null at the Nyquist frequency when φ_AVG = 0.5UI. The system in Chapter 3 worked because the 2UI I&D already has a null at 5GHz (see Figures 3.18 and 3.25), and, thus, the I&D mostly masked the phase-dependent response of the interpolator. The CDR would fail if the 2UI I&D were removed because the 1x interpolator would change the pulse response significantly. In that case, it would be necessary to implement phase-dependent DFE

coefficients. Therefore, the decision to use 2x oversampling has allowed us to save power by removing the I&D and by using a simple DFE architecture.

Low-Pass Filter for DFE Adaptation

The low-pass filter (LPF) illustrated in Figure 4.8 is used to approximate the expected value of the correlation terms from the MMPD. The LPF consists of a single integrator in an internal feedback loop. A summer adds together a bus of 4 correlation terms at the LPF input. If faster DFE convergence were needed, up to 16 correlation terms could be summed, since the CDR processes 16 samples in parallel per cycle. However, a larger adder would consume more power.

Figure 4.8: Low-pass filter for DFE coefficients. (Block diagram: a summer for the four 8-bit correlation terms x_k A_{k-1} or x_k A_{k-2}, a configurable counter of 13, 14, or 15 bits with an overflow detector, a hysteresis block to reduce output toggling, an integrating counter producing the 8-bit c_1 or c_2, and a feedback gain of 4.)

Figure 4.9: Hysteresis block implemented in low-pass filter. (The 2-bit up/down signal takes the values {-1, 0, 1}.)

The configurable counter and overflow detector act as an adjustable divider that produces an up/down signal having one of three values: 1, -1, or 0 (i.e. up, down, or no change). The hysteresis block reduces toggling at the LPF output. As shown in Figure 4.9, the register in the hysteresis block filters out the no change signals and

stores only an up or down signal. If the hysteresis block receives a signal that is opposite to the stored value, then the mux at the output of the hysteresis block forces the signal to no change. The filtered up/down signal at the output of the hysteresis block in Figure 4.8 is integrated by a counter at the output of the LPF. The gain in the feedback divides the output by 4; this is needed since the summer adds together 4 terms at the LPF input.

Simulation Results

This section presents the frequency and pulse responses of the channel models, DFE adaptation curves, and simulated eye diagrams and jitter tolerance. Figure 4.10 shows the frequency responses of the channel models used in simulation. Channels C and D represent 1.5-inch and 8-inch traces on an FR4 board, respectively. The CDR and DFE are demonstrated for three cases: Channel C at 5Gbps, Channel C at 10Gbps, and Channel D at 10Gbps. The attenuations at the Nyquist frequency are, respectively, 5dB, 10dB, and 13dB.

Figure 4.10: Frequency responses of channel models used in simulation. (Vertical axis: channel frequency response in dB; horizontal axis: frequency from 100MHz to 10GHz; curves for Channel C and Channel D.)
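The hysteresis rule described for Figure 4.9 can be sketched in a few lines of Python. This is a behavioural illustration only, not the Verilog implementation: the function name is hypothetical, and the sketch assumes that the register takes every non-zero input, including one whose output is suppressed.

```python
def hysteresis(signals):
    """Filter a stream of up/down/no-change signals (+1, -1, 0)."""
    stored = 0              # last non-zero direction seen (0 = nothing stored yet)
    out = []
    for s in signals:
        if s == 0:
            out.append(0)   # "no change" passes through and is never stored
            continue
        # A signal opposite to the stored direction is forced to "no change".
        out.append(0 if stored == -s else s)
        stored = s          # assumption: the register always takes the new direction
    return out


print(hysteresis([1, 1, -1, -1, 0, 1, 1]))
```

Under that assumption, an isolated reversal is suppressed and the output changes direction only after two consecutive signals in the new direction, which is one way to obtain the reduced toggling described above.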


Figure 4.11 depicts the pulse responses of the channel models cascaded with the data interpolator. The pulse responses are shown for two values of φ_AVG: 0UI and 0.25UI. The pulse responses are normalized so that the amplitude of the eye diagram is 1. In simulation, the offset coefficient, P, is chosen to be 77 because it shifts h_0 near the peaks of the pulse responses; hence the CDR locks at a position described by Equation 4.3. Figure 4.11 shows the pulse response samples at the CDR lock position.

$h_1 = h_{-1} + \frac{P}{K_{DIV} K_{SUM} K_{AVG} K_{INT}}$ (4.3)

Figure 4.11: Combined channel and interpolator pulse responses showing ISI tap values (h_{-1}, h_0, h_1, h_2, h_3) when CDR has locked. (Panels: Channel C at 5Gbps, Channel C at 10Gbps, and Channel D at 10Gbps, each for Φ_AVG = 0.0UI and Φ_AVG = 0.25UI; horizontal axis: time in ns.)
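As a rough check on the size of this offset, the gains quoted for Figure 4.5 can be substituted into the offset term of Equation 4.3. The calculation assumes that K_AVG in the equation denotes the ADC gain (K_ADC = 8), which the text does not state explicitly:

```latex
\frac{P}{K_{DIV}\,K_{SUM}\,K_{ADC}\,K_{INT}}
  = \frac{77}{0.25 \times 16 \times 8 \times 16}
  = \frac{77}{512}
  \approx 0.15
```

Under that assumption, the CDR would lock where h_1 exceeds h_{-1} by roughly 0.15, provided the pulse response is expressed in the same units as the ADC input range.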

Figures 4.12 and 4.13 show the transient output of the adaptation controller for Channels C and D, respectively, at 10Gbps. Each figure demonstrates that c_1 and c_2 converge during CDR start-up even when initialized to different values (e.g. 0 or 30). The coefficients settle in approximately 13µs.

Figure 4.12: Simulated DFE adaptation with Channel C at 10Gbps. DFE converges to same steady-state values when given different initial coefficients (i.e. 0 and 30). (Panels: adapted c_1, labelled 22, and adapted c_2, labelled 8.)

Figure 4.13: Simulated DFE adaptation with Channel D at 10Gbps. DFE converges to same steady-state values when given different initial coefficients (i.e. 0 and 30). (Panels: adapted c_1, labelled 24, and adapted c_2, labelled 10.)
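The independence from the initial coefficient can be illustrated with a small Python model. It is a sketch under stated assumptions, not the counter-based filter of Figure 4.8: it swaps in a simple leaky integrator driven by the product x_k A_{k-1}, and the pulse response values, the step size `mu`, and the function name are all hypothetical. Only the gains K_ADC = 8 and K_INT = 16 are taken from the text.

```python
import random

K_ADC, K_INT = 8, 16          # gains quoted in the text for Figure 4.5


def adapt_c1(c_init, pulse, mu=1 / 8192, n_bits=100_000, seed=7):
    """Track c_1 with a first-order loop driven by the product x_k * A_{k-1}.

    pulse maps tap index j to h_j (hypothetical values), so the interpolator
    output is modelled as x_k = K_ADC * K_INT * sum_j h_j * A_{k-j}.
    """
    rng = random.Random(seed)
    bits = [rng.choice((-1, 1)) for _ in range(n_bits)]
    gain = K_ADC * K_INT
    lo, hi = min(pulse), max(pulse)
    c = float(c_init)
    # Keep every index k - j inside the bit sequence.
    for k in range(max(hi, 1), n_bits + min(lo, 0)):
        x_k = gain * sum(h * bits[k - j] for j, h in pulse.items())
        c += mu * (x_k * bits[k - 1] - c)   # leaky integration of the correlation
    return c


pulse = {-1: 0.10, 0: 1.00, 1: 0.25, 2: 0.10}   # illustrative taps only
c_from_0 = adapt_c1(0, pulse)
c_from_30 = adapt_c1(30, pulse)
print(c_from_0, c_from_30, c_from_0 / (K_ADC * K_INT))
```

Because the data bits are uncorrelated, the average of x_k A_{k-1} isolates the h_1 term, so both runs settle near K_ADC K_INT h_1 regardless of the starting value. Dividing by K_ADC K_INT gives a number that can be compared with the pulse response sample, in the same spirit as the normalized rows of Table 4.1.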

Table 4.1 compares the adapted values, c_1 and c_2, to the pulse response samples, h_1 and h_2.

Table 4.1: Comparison of Adapted Coefficients (c_1 and c_2) vs. Pulse Response (h_1 and h_2)
Columns: Channel C, 5Gbps | Channel C, 10Gbps | Channel D, 10Gbps
Rows: c_1 | c_1 / (K_ADC K_INT) | h_1, Φ_AVG = 0UI | h_1, Φ_AVG = 0.25UI | c_2 | c_2 / (K_ADC K_INT) | h_2, Φ_AVG = 0UI | h_2, Φ_AVG = 0.25UI

Figure 4.14 depicts a CDR model used to simulate the eye diagrams in Figure 4.15 and the figures that follow it. The c_1 and c_2 coefficients are set to the values in Table 4.1 and φ_AVG is forced to either 0UI (no interpolation) or 0.25UI (worst-case interpolation).

Figure 4.14: Simplified diagram of CDR model used for eye diagram simulations. (Blocks: data signal with TX RJ = 0.17UI_PP, 3-bit ADC clocked by CK_RX with RX RJ = 0.23UI_PP, data interpolator producing x_k, 2-tap DFE with coefficients c_1 and c_2, decision block producing A_k, MMPD, summer, and digital loop filter; the phase offset is 77. Observed signals: ADC output, interpolator output, and DFE output.)

Figure 4.18 shows the simulated jitter tolerance of the receiver with a PRBS-31 data source at the target bit error rate (BER). The ADC is modeled as an ideal 3-bit ADC. The data source and blind receiver clocks are simulated with 0.17UI_PP and 0.23UI_PP of random jitter, respectively.


Figure 4.15: Simulated eye diagrams with 5Gbps data and Channel C. Eye diagrams correspond to signals in Figure 4.14. (Panels: data signal; ADC output; interpolator output and DFE output at Φ_AVG = 0UI and at Φ_AVG = 0.25UI; horizontal axis: phase in UI.)


Figure 4.16: Simulated eye diagrams with 10Gbps data and Channel C. Eye diagrams correspond to signals in Figure 4.14. (Panels: data signal; ADC output; interpolator output and DFE output at Φ_AVG = 0UI and at Φ_AVG = 0.25UI; horizontal axis: phase in UI.)
