A 5Gb/s Speculative DFE for 2x Blind ADC-based Receivers in 65-nm CMOS. Siamak Sarvari


A 5Gb/s Speculative DFE for 2x Blind ADC-based Receivers in 65-nm CMOS

by Siamak Sarvari

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto

© Copyright by Siamak Sarvari 2010

A 5Gb/s Speculative DFE for 2x Blind ADC-based Receivers in 65-nm CMOS
Siamak Sarvari
Master of Applied Science, 2010
Graduate Department of Electrical and Computer Engineering
University of Toronto

Abstract

This thesis proposes a decision-feedback equalizer (DFE) scheme for blind ADC-based receivers to overcome the challenges introduced by blind sampling. It presents the design, simulation, and implementation of a 5Gb/s speculative DFE for a 2x blind ADC-based receiver. The complete receiver, including the ADC, the DFE, and a 2x blind clock and data recovery (CDR) circuit, is implemented in Fujitsu's 65-nm CMOS process. Measurements of the fabricated test-chip confirm 5Gb/s data recovery with bit error rate (BER) less than in the presence of a test channel introducing 13.3dB of attenuation at the Nyquist frequency of 2.5GHz. The receiver tolerates 0.24UI_PP of high-frequency sinusoidal jitter (SJ) in this case. Without the DFE, the BER exceeds $10^{-8}$ even when no SJ is applied.

Acknowledgments

Primarily, I would like to thank my supervisor, Prof. Ali Sheikholeslami, for his invaluable guidance and support. I thank Prof. Tony Chan Carusone, Prof. Roman Genov, and Prof. T.J. Lim for serving on my thesis examination committee. Their useful comments and suggestions have enriched this thesis.

There are many students who deserve thanks for their help and friendship. Special thanks go to Tina Tahmoureszadeh for her priceless assistance with the RTL description of this work before the tapeout. Thank you to Oleksiy Tyshchenko for the helpful discussions and constant feedback. Thanks go to Shayan Shahramian and Behrooz Abiri for their help with the measurements of the test-chip. My thanks go to the other members of Ali's research group as well for their constructive comments. I also need to thank Mike Bichan for lending us the test channels.

I would like to thank those at Fujitsu with whom I have worked over the course of this degree for their support and technical advice. In particular, I would not have been able to get to this point without the effective assistance of Hirotaka Tamura and Masaya Kibune.

Finally, I thank my friends for their tolerance and my parents, my brother, and my sister for their support, love, and understanding over these years.

Contents

List of Figures
List of Tables

1 Introduction
    Motivation
    Thesis Objectives
    Thesis Outline

2 Background
    Effect of Channel Impairments
    Equalization
    Pre-equalizers
    Post-equalizers
    A. Linear Equalizers
    B. Decision-Feedback Equalizer (DFE)
    Clock and Data Recovery
    ADC-based Receivers
    Blind versus Phase-tracking CDR
    Equalization in Blind versus Phase-tracking Receivers

3 Proposed Decision-Feedback Equalizer
    Receiver Overview
    DFE in Blind versus Phase-Tracking Receivers
    Proposed DFE Scheme
    Look-ahead DFE Configuration
    Complete Receiver Architecture
    Behavioural Simulations
    Transceiver Model and Channels
    DFE Performance
    A. Equalized Eye Diagrams
    B. Receiver Bit Error Rate
    C. Bit Error Rate Sensitivity
    Receiver Performance
    Verilog Implementation
    Summary

4 Experimental Results
    Receiver Layout and Equipment Setup
    Channel Measurements
    Measured S-parameters and Step Responses
    Measured Eye Diagrams and Jitter Characteristics
    ADC Performance
    DFE Performance
    Receiver Bit Error Rate
    Bit Error Rate Sensitivity
    Complete Receiver Performance
    Simulated versus Measured Results
    Summary

5 Conclusions and Future Directions
    Thesis Contributions
    Future Directions

A Alternative Architectures
    A.1 Phase-Independent Data-Decision Scheme
    A.2 DFE without Phase Equalization

References

List of Figures

Block diagram of a generic high-speed transceiver
An IEEE802.3ap standard backplane channel
Effect of channel on a single pulse
An example of pre-equalization
Equalization in the receiver
Linear post-equalization
A simple DFE
The prevention of noise enhancement in a DFE
DFE coefficients obtained from the channel pulse response
The critical path in DFE
Relaxation of the timing constraint in a look-ahead DFE architecture
Timing relationship between the recovered clock and data
A generic phase-tracking CDR
The input-output characteristics of different phase detectors
Operation of a phase-tracking CDR with the Alexander phase detector
A generic blind-oversampling CDR
An example of blind-oversampling CDR operation
Comparison of binary and ADC-based receivers
A generic phase-tracking ADC-based CDR
Blind ADC-based CDR architectures
A 2-tap FFE in ADC-based receivers
DFE coefficients in a blind versus phase-tracking receiver
Simplified block diagram of the receiver (without a DFE)
The sampling clock and DFE coefficient in a blind receiver versus a phase-tracking receiver
Removing ISI using DFE coefficients determined by sampling intervals
Full-rate implementation of the proposed DFE
The critical path in the DFE feedback loop
Implementation of the full-rate speculative DFE
Implementation of the demuxed-by-8 DFE/CDR in the receiver
Using floating observation windows to avoid non-causality issues for the DFE-based receiver
Block diagram of the Simulink transceiver model
Simulated eye diagrams before and after DFE: PRBS & 100ppm Δf TX RX
Simulated eye diagrams before and after DFE: PRBS & 100ppm Δf TX RX
Simulated sensitivity of BER to the scaling of DFE coefficients
Simulated receiver jitter tolerance
Simplified block diagram of the digital logic
Simplified block diagram of the test setup
Micrograph of the test-chip
Measurement setup
Measured S21 of the test channels
Data path with Channel
Measured eye diagrams and jitter characteristics with Channel
Data path with Channel
Measured eye diagrams and jitter characteristics with Channel
Data path with Channel
Measured eye diagrams and jitter characteristics with Channel
Testing ADC functionality
Eye diagrams measured at the probe-card input and ADC output
Measured sensitivity of BER to the scaling of DFE coefficients
Measured receiver jitter tolerance
Simulated versus measured BER sensitivity
Simulated versus measured receiver jitter tolerance
A.1 The new data-decision scheme needs Φ_PICK only
A.2 Simplified block diagram of DFE/CDR with the new data-decision scheme
A.3 An example of incorrect decisions when high-frequency jitter is larger than 0.5UI_PP
A.4 Simplified block diagram of DFE/CDR with D-DFE architecture

List of Tables

Summarized characteristics of the test channels
Quantized DFE coefficients used in simulations
Simulated BER: $10^7$ bits & 1UI_PP SJ at 100kHz
Summarized characteristics of the test channels
Pin description of the test-chip
Summary of eye opening and jitter values for the test channels
Quantized DFE coefficients used for measurements
BER Measurements with 0.9UI_PP SJ at 100kHz

List of Acronyms

AFE     analog front-end
APR     average phase recovery block
ADC     analog-to-digital converter
BER     bit error rate
CDR     clock and data recovery
CID     consecutive identical digits
CMOS    complementary metal-oxide-semiconductor
DCS     DFE coefficient selector
DD      data decision block
DDJ     data-dependent jitter
DDS     data decision scheme
DeMUX   demultiplexer
DFE     decision-feedback equalizer
DSP     digital-signal processor
ENOB    effective number of bits
FFE     feed-forward equalizer
FIFO    first in, first out
FIR     finite impulse response
FLL     Fujitsu Laboratories Limited
Gb/s    Gigabits per second
GBd     Giga Baud
I/O     input/output
IC      integrated circuit
ISI     inter-symbol interference
LPF     low-pass filter
LSB     least significant bit
LUT     look-up table
MSB     most significant bit
MUX     multiplexer
NRZ     non-return-to-zero
PCB     printed circuit board
PD      phase detector
PHY     Physical Layer
PI      phase interpolator
PLL     phase-locked loop
PRBS    pseudo-random binary sequence
PVT     process, voltage, and temperature
RTL     register transfer level
RX      receiver
SJ      sinusoidal jitter
SNR     signal-to-noise ratio
SSC     spread spectrum clocking
TJ      total jitter
TX      transmitter
UI      unit interval
VCO     voltage-controlled oscillator
VNA     vector network analyzer

1 Introduction

With advances in silicon technologies and the rapid rise of processor speeds and digital computing capabilities, demands for high-bandwidth transmission of data over backplane channels continue to increase. The IEEE802.3ap standard (Ethernet over electrical backplanes) has already reached signaling speeds of and Giga Baud (GBd) (per lane) for Physical Layer (PHY) families of 10GBASE-KX4 and 10GBASE-KR. The improvement of transmission channels, however, has been more modest than the increase in data rates. Moreover, lower-cost channel materials, which attenuate the data more strongly, have continued to be used for economic viability.

The limited bandwidth of these channels is mainly the result of the skin effect and dielectric losses above 1GHz; other channel effects include reflections caused by impedance discontinuities and cross-talk with adjacent channels. In multi-Gb/s input/output (I/O) links, such non-idealities broaden the transmitted pulses considerably, to more than one unit interval (UI) by the time they arrive at the receiver. Each transmitted symbol may thus interfere with the preceding and/or succeeding symbols, causing inter-symbol interference (ISI) and hence degrading the transmission bit error rate (BER).

1.1 Motivation

A commonly used technique to remove (or mitigate) the ISI is known as equalization; an equalizer compensates for the frequency-dependent attenuation of the transmitted signal caused by the channel. Various equalization methods have been developed in both analog and digital domains [1, 2, 3, 4, 5, 6, 7, 8, 9]. An ADC-based receiver incorporates an analog-to-digital converter (ADC) at the front-end to enable intensive equalization of the data in the digital domain [10, 11, 12, 13, 14, 15, 16]. Digital equalization is highly programmable and less sensitive to process, voltage, and temperature (PVT) variations than analog circuitry, and its power and area scale with process [15].

In addition to the above advantages, ADC-based receivers facilitate an all-digital receiver implementation, which is desirable as it can significantly reduce the development period of the transceiver. However, achieving this end requires a fully digital clock and data recovery (CDR) circuit. The CDR architectures employed in [10, 11] attain such implementations by removing the voltage-controlled oscillator (VCO) (i.e., an analog circuit) from the clock recovery loop and instead sampling the incoming data using a blind clock (a clock which is not frequency- or phase-locked to the embedded clock); the CDR in this case relies on digital logic to detect the phase and recover the data.

The blind receivers presented in [10, 11] incorporate a feed-forward equalizer (FFE) to compensate for the ISI. However, an FFE, as a linear equalizer, suffers from noise enhancement; that is, it amplifies the noise introduced by the channel when boosting the high-frequency components of the received signal. Noise enhancement can be avoided by using the decision-feedback equalizer (DFE), which is a commonly used non-linear equalizer. A combination of FFE and DFE is usually used to achieve high performance and reliability, especially in high-loss environments [3, 4, 15, 17, 18, 19, 20].

In phase-tracking receivers (in which the sampling clock is aligned with the embedded clock), a DFE relies on the clock recovered by the CDR for its operation. In designing a DFE for blind receivers, therefore, the main issue is that there is no recovered clock; the sampling clock in such receivers is completely blind, and thus the data samples no longer correspond to the centre of the eye or to any specific data phase. This thesis proposes a DFE scheme to overcome this challenge.

1.2 Thesis Objectives

This thesis presents the design and implementation of a DFE for a 2x blind ADC-based receiver. To the best of the author's knowledge, this is the first reported DFE design for blind receivers. The specific objectives of the thesis are the following:

- Investigating the feasibility of a DFE for blind receivers and proposing a scheme to overcome the challenges introduced by blind sampling.
- Designing and implementing the proposed DFE in a 2x blind ADC-based receiver which employs the CDR architecture presented in [10].
- Demonstrating the effectiveness of the DFE through behavioural simulations and measurements of the fabricated test-chip (i.e., showing considerable improvement in performance when the DFE is enabled).

1.3 Thesis Outline

This thesis is organized as follows. Chapter 2 provides the necessary background for the discussions of the thesis; in particular, clock and data recovery, the decision-feedback equalizer, and ADC-based receivers are discussed. Chapter 3 presents the details of the proposed DFE as well as the results of its behavioural simulations. Chapter 4 describes the measurements of the test-chip (fabricated to validate the proposed equalizer) and presents the measured results. Chapter 5 concludes the thesis and suggests future directions for research.

2 Background

This chapter presents the fundamental problems in high-speed communications along with the commonly used techniques and architectures designed to overcome them. The material provides the essential background for the discussions in the following chapters. We specifically describe decision-feedback equalizers (DFE), clock and data recovery (CDR) circuits, and ADC-based receivers, as they form the context for the contributions of this thesis.

In general, high-speed communications or high-speed signaling refers to Gigabit-per-second communication of data between a transmitter and a receiver through a communication channel. The channel can take many forms, such as optical fibers or Ethernet cables in network interfaces, USB cables in peripheral interfaces, or a printed circuit board (PCB) trace in high-speed chip-to-chip signaling. Regardless of their actual form, all of these practical channels exhibit certain non-ideal characteristics such as limited bandwidth and finite signal propagation speed, hence the term non-ideal channels. These characteristics cause the transmitted signal to deviate from its initial form and usually result in data detection errors at the receiver, quantified by the bit error rate (BER).

Figure 2.1: Block diagram of a generic high-speed transceiver.

Fig. 2.1 depicts the block diagram of a generic high-speed transceiver, where the transmitter is integrated on one chip (integrated circuit (IC)) along with the source core logic, and the receiver is integrated on another chip along with the consumer core logic. The job of the high-speed transceiver is to move the data through a non-ideal channel and deliver it to the consumer circuit at multi-Gb/s (i.e., multi-gigabits per second) with a BER lower than a specified value (typically in the range of to ). To achieve this target BER, the high-speed transceiver has to compensate for the impairments of the non-ideal channel. This usually necessitates complex circuits at the transmitter and/or receiver which occupy a large portion of the transceiver chip area; compact solutions with low power consumption are always desirable.

The remainder of this chapter presents the important problems caused by the channel impairments and the corresponding design solutions adopted to overcome them. Section 2.1 describes how the impairments of the channel necessitate certain design techniques known as equalization in the transceiver and clock and data recovery at the receiver. Section 2.2 presents different types of equalization and elaborates on a popular type called the decision-feedback equalizer. Section 2.3 describes conventional types of clock and data recovery circuits. Section 2.4 discusses a special type of receiver known as the ADC-based receiver; these receivers allow for more extensive equalization of the data and are becoming more popular as channels attenuate the data more severely at today's rapidly growing communication rates.

2.1 Effect of Channel Impairments

The channel in backplane signaling usually comprises multiple elements, such as cables, connectors, and traces on a PCB. These elements are typically characterized by the channel's scattering parameters (S-parameters) as a function of frequency; in particular, $S_{21}$ of a 2-port network is defined as the forward voltage gain when the output port is terminated in a matched load. Fig. 2.2(a) depicts the cross-sectional structure of a high-performance PCB strip-line constructed and presented as part of the IEEE802.3ap standard backplane channels [21]. The combination of each trace, the surrounding insulators, and the ground planes forms the communication channel. The total length of the trace over the motherboard and two daughterboards is 1m. The measured $S_{21}$ of the channel is shown in Fig. 2.2(b). At high frequencies (typically 1GHz and above), the skin effect in the copper conductor and dielectric losses in the substrate cause this channel to increasingly attenuate the propagating signal [22]. In the time domain, the result of high-frequency attenuation is the temporal spreading of a symbol transmitted during one symbol-period into adjacent symbol-periods [23].

Figure 2.2: An IEEE802.3ap standard backplane channel. (a) Cross-section of the PCB strip-line. (b) $S_{21}$ of the channel ($\log|S_{21}|$ in dB versus frequency in Hz).

The above phenomenon is known as inter-symbol interference (ISI) and can be further described with the following example. Imagine a transmitter transmitting binary symbols, 0 and 1, at a rate of $f_b$. The symbol-period, also known as the unit interval (UI), is therefore $T_b = 1/f_b$. Now imagine a single pulse sent by this transmitter through a channel similar to the one described above. Fig. 2.3 depicts this pulse at the input of the channel and the corresponding response at the output. As can be seen, the received signal, which we refer to as the pulse response in this case, does not reach the final value of 1 in the first UI and has non-zero values in the following UIs. A sequence of symbols, in general, can be regarded as the superposition of individual symbols shifted in time; therefore, such spreading of symbols causes the value of the symbol transmitted in one UI to affect the values of other symbols in the neighbouring UIs, hence the term ISI.

Figure 2.3: Effect of channel on a single pulse (transmitted pulse and received pulse).

As the transmission rate increases, the symbols spread more in time and become harder to identify at the output of the channel, resulting in a higher BER. The technique commonly used to overcome this problem is equalization, i.e., compensating for the frequency-dependent attenuation of the transmitted signal caused by the channel. The desired combination of the channel and the equalizer results in an overall flat frequency response, eliminating ISI. Section 2.2 describes equalizers in detail.

In addition to high-frequency attenuation that results in ISI, another impairment of channels is finite signal propagation velocity. For the channel described in Fig. 2.2, the propagation speed is approximately 16cm/ns. Therefore, it takes about 31 symbol-periods for the signal to travel across the 1m trace. Uncertainties in the signal propagation velocity along the channel result in timing uncertainty in the receiver, i.e., the received data stream is asynchronous. In addition, the received data is noisy and also suffers from non-idealities such as jitter (i.e., deviation of zero-crossings from their ideal position in time [24]), and thus has to be cleaned up by a retimer flip-flop at the receiver. To sample the data optimally, i.e., at the midpoint of each bit, and to process it subsequently, timing information must be extracted from the received signal, e.g., in the form of a clock. The task of extracting such information and generating the clock is called clock recovery. The overall operation of recovering the clock and cleaning up the data is called clock and data recovery (CDR). We will describe conventional CDR architectures in Section 2.3.
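The timing figures quoted in Section 2.1 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 5Gb/s data rate is an assumption (it is the rate targeted by this thesis, and it is consistent with the quoted value of about 31 symbol-periods for a 1m trace at 16cm/ns).

```python
def unit_interval_ns(bit_rate_gbps):
    """Unit interval T_b = 1 / f_b, in nanoseconds, for f_b given in Gb/s."""
    return 1.0 / bit_rate_gbps


def flight_time_in_ui(length_cm, velocity_cm_per_ns, bit_rate_gbps):
    """Propagation delay of a trace, expressed in symbol-periods (UI)."""
    delay_ns = length_cm / velocity_cm_per_ns  # time of flight along the trace
    return delay_ns / unit_interval_ns(bit_rate_gbps)


if __name__ == "__main__":
    # Assumed operating point: 1 m trace, 16 cm/ns velocity, 5 Gb/s data rate.
    print(unit_interval_ns(5.0))
    print(flight_time_in_ui(100.0, 16.0, 5.0))
```

Under these assumptions the unit interval is 0.2ns and the 6.25ns time of flight corresponds to roughly 31 UI, which is why several tens of symbols are in flight on the trace at any moment.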

18 8 2 Background 2.2 Equalization As described in Section 2.1, ISI results from linear amplitude and phase distortion in the channel that broadens the pulses and causes them to interfere with one another [23]. To eliminate the ISI we have to equalize the channel, meaning roughly that we filter to compensate for the distortion caused by the channel. Equalizers can be implemented in the transmitter [6, 9, 25, 26, 27], receiver [2, 3, 8, 11, 17, 18, 28], or both [4, 15, 19, 20]. The rest of this section separately describes transmitter and receiver equalizers Pre-equalizers Equalizers implemented in the transmitter are called pre-equalizers [4, 6, 19, 20, 25]. Pre-equalization compensates for the channel distortion by boosting the highfrequency content of the data before it is transmitted through the channel. In the time-domain, this translates to transitions of the transmitted signal being emphasized, hence the term pre-emphasis for this type of equalization. Fig. 2.4 illustrates preequalization through an example. After passing through the channel, the combination of pre-equalization and high-frequency channel attenuation cancel out, resulting in an ISI-free signal at the receiver input Transmitted Signal Received Signal Without Equalization With Pre-equalization Figure 2.4: An example of pre-equalization. An important drawback to pre-equalization is that it degrades the signal-to-noise ratio (SNR) at the receiver which can result in data errors [23]. As shown in Fig. 2.4, for the same signal swing at the transmitter output, pre-qualization has to be performed by attenuating the low-frequency content of the signal, i.e., de-emphasizing

19 2.2 Equalization 9 the signal when there is no transition. The high-frequency energy is also attenuated in the channel; therefore, the overall siganl energy is decreased. As the amount of noise introduced to the signal in the channel is unchanged, this results in a decreased SNR at the receiver Post-equalizers Non-linear Equalizer Received Data Linear Equalizer Recovered Bits FIR Figure 2.5: Equalization in the receiver. Equalization performed in the receiver is called post-equalization. The goal of postequalizers, like pre-equalizers, is to compensate for the channel distortion which results in ISI. Post-equalizers are typically divided into two main groups, linear and non-linear (or decision-feedback ); a receiver may incorporate either of these equalizers or both of them (as depicted in Fig. 2.5) to mitigate the ISI. A. Linear Equalizers As depicted in Fig. 2.6, linear equalization boosts the high-frequency content of the received data to counter the high-frequency attenuation of the channel. Therefore, the cascade of the channel and equalizer will ideally have a flat frequency response over the bandwidth of the signal. Linear equalizers are usually implemented as IIR boosting filters [2, 4, 8] or finite impulse response (FIR) filters [1, 28, 29] in analog. If an analogto-digital converter (ADC) is present at the front-end of the receiver, realization in the form of a digital feed-forward equalizer (FFE) is preferable since it facilitates an all-digital adaptation scheme [10, 11]. (Adaptive equalization addresses the problem of estimating the channel isolated pulse response and automatically adjusting an equalizer to equalize this channel [23].) The main drawback to linear post-equalization is that it amplifies or enhances any noise introduced by the channel, called noise enhancement. A linear equalizer boosts

20 10 2 Background Channel Linear Equalizer Equalized Channel 0dB 0dB 0dB f b 2 f b 2 f b 2 Figure 2.6: Linear post-equalization. the high-frequency energy of the signal but also amplifies the high-frequency noise energy. While the high-frequency signal energy is usually only a modest fraction of the total signal energy, the amplified high-frequency channel noise can increase the total noise energy by as much as an order of magnitude or more, resulting in decreased SNR [30]. Therefore, when channel attenuation is high, linear equalizers are usually followed by a non-linear equalizer (discussed shortly) and provide only a few db of boost. (Note that, although pre-equalizers do not amplify the noise, they suffer from noise enhancement as well; in this case, the reduction in SNR is the result of decreased transmitted signal power.) The problem of noise enhancement can be avoided by using non-linear equalization techniques such as decision-feedback equalization. We describe the details of decisionfeedback equalizers next. B. Decision-Feedback Equalizer (DFE) Decision-Feedback Equalizer Unequalized Signal x k y k â k Data Decision w k H(z) Figure 2.7: A simple DFE. A decision-feedback equalizer (DFE) is a non-linear equalizer that aims to remove the ISI from the current symbol by subtracting a replica of the ISI generated using previous data decisions and an estimate of the channel pulse response [3, 4, 5, 7, 12,

21 2.2 Equalization 11 15, 17, 18, 19]. Fig. 2.7 shows the block diagram of a simple DFE. x k is the value of the received signal at the sampling instant k, w k is the correction term (i.e., the predicted ISI), y k is the equalized signal at the slicer input, and â k is the receiver s decision about x k. In the absence of noise, we can write x k = a k + isi k (2.1) where a k and isi k respectively represent the transmitted symbol and total ISI at instant k. (Note that for simplicity and without loss of generality, we assume that transmitted pulse is not attenuated in the channel (i.e., only broadened), hence the received symbol, a k, is the same as the transmitted symbol, a k.) The total ISI is the sum of ISI caused by future symbols, called pre-cursor ISI, and ISI caused by past symbols, called post-cursor ISI; let us assume for now that pre-cursor ISI is negligible (or is removed by another equalizer), thus isi k mainly consists of post-cursor ISI (the former will be discussed shortly after this). The feedback filter, H (z), is a discrete-time FIR filter of length N (hence an N-tap DFE), H (z) =h 1 z 1 + h 2 z h N z N. Therefore, w k = N h i a k i (2.2) i=1 For this to be the ISI replica, i.e., to have w k = isi k, the past N decisions must be correct, i.e., â k i = a k i,i =1..N, and filter coefficients h m,m =1..N must be all non-zero samples of the channel pulse response; the latter will be described shortly. If these conditions are met, y k is ISI-free, i.e., y k = a k,thusâ k = a k. The main advantage of the DFE is that it does not cause noise enhancement. Fig. 2.8 shows how noise enhancement is avoided. In the absence of noise and assuming perfect filter coefficients and correct past decisions (or a very low BER), w k = isi k, hence the input to the slicer is a k, and the slicer has no effect (since â k = a k ). Therefore, the equalizers shown in Fig. 2.8(a) and Fig. 
2.8(b) are equivalent when n k =0 (n k denotes the total noise on the received signal at instant k). In the presence of noise, however, the input to the slicer in Fig. 2.8(a) is a k + n k, while the the input to the slicer in Fig. 2.8(b) is a k + v k,wherev k is an amplified version of n k.inadfe, the slicer removes the noise that would otherwise recirculate to the input through

22 12 2 Background the feedback, i.e., a DFE boosts the high-frequency content of the signal without boosting high-frequency noise; moving the slicer out of the loop, however, as in the linear equalizer of Fig. 2.8(b), results in the amplification of noise in the loop causing noise enhancement [23]. Decision-Feedback Equalizer Linear Equalizer a k +isi k +n k a k +n k â k a k +isi k +n k a k +v k â k isi k H(z) H(z) (a) Slicer prevents noise enhancement. (b) Noise feedback in the absence of the slicer. Figure 2.8: The prevention of noise enhancement in a DFE. Another advantage of DFE is the simplicity of its implementation. Compared to linear discrete-time filters that require analog delay elements, the feedback in DFE consists of digital samples which can be delayed using flip-flops. A disadvantage of DFE, however, is that the decision-based feedback can result in error propagation. An error at the output of the slicer can distort future symbols through the feedback, potentially resulting in further errors. Therefore, it is common for DFE errors to occure in bursts [23]. Fig. 2.9 describes through an example how filter coefficients, also called DFE coefficients, must be chosen to ensure zero ISI at the sampling instant. Assuming a binary non-return-to-zero (NRZ) data with bits 1 and 0 ; the corresponding transmitted symbols are positive and negative pulses, a k {1, 1}. Fig. 2.9(a) shows the transmission of three consecutive 1 bits and the corresponding received pulses. The received signal is aligned with the transmitted data for clarity. As will be explained in Section 2.3, a phase-tracking CDR in the receiver ensures that the received data is sampled at the midpoint between data transitions, i.e., the centre of the UI also called the centre of the eye. 
Assuming noiseless communication, in the presence of ISI, the sample taken at the centre of the k th UI is x k =1+s k 1 + s k 2 ; s k 1 and s k 2 are the contributions from the past two bits, b k 1 and b k 2, that constitute the ISI term isi k.notethats k 1 and s k 2 are in fact the amplitude of the pulse response respectively 1 UI and 2 UIs away from the optimal sampling point. Therefore, the DFE will eliminate ISI from the current symbol if its coefficients (h 1 and h 2 )are

23 2.2 Equalization 13 UI k-2 UI k-1 UI k 1 s k-1 s k-2 b k-2 b k-1 b k Receiver Clock Sampling Edges (a) ISI in transmission of 111. Current UI 1 Optimal Sampling Point h 1 h 2 1UI 2UI (b) h 1 & h 2 must be pulse response samples. Figure 2.9: DFE coefficients obtained from the channel pulse response. samples of the pulse response, as illustrated in Fig. 2.9(b). In general, assuming an ISI length of N (i.e., the pulse response affects up to N following UIs) and assuming transmission of both 1 and 0 bits (i.e., 1 and -1 values for a k ), we can write x k = a k + isi k (2.3) where N isi k = a k 1.s k 1 + a k 2.s k a k N.s k N = a k i.s k i (2.4) and s k i,i =1..N are the samples of the pulse response i UI to the right of the optimal sampling point. Having an N-tap DFE, i=1 w k = N â k i.h i (2.5) i=1 and choosing h i = s k i,i=1..n results in complete cancellation of ISI when BER is low (i.e., when â k i = a k i ). In the process described above, the DFE removes the ISI caused by N past bits

(i.e., post-cursor ISI) from the current symbol. However, ISI can also be caused by future symbols when the pulse response has a non-zero value 1 UI (or farther) to the left of the optimal sampling point; this is called pre-cursor ISI, as described earlier. A disadvantage of the DFE is its inability to compensate for pre-cursor ISI: the DFE operates on past decisions, whereas eliminating pre-cursor ISI requires knowledge of future decisions.

The primary challenge in DFE design is the critical path formed by the feedback loop, which limits the symbol rate. The delay of this critical path, shown in Fig. 2.10, mainly consists of the settling time of the adder output, $y_k$ (due to the large capacitance of the summing node), and the clock-to-output delay of the slicer (implemented as a latch) [30]. To allow for correct DFE operation, this delay must be less than one symbol period; more stringent limits must be met to ensure correct phase recovery in receivers that employ oversampling clock and data recovery methods (described in Section 2.3). Satisfying such limits becomes increasingly difficult as data rates increase into the multi-Gb/s range and often necessitates special circuit techniques.

Figure 2.10: The critical path in DFE (1-tap DFE with input $x_k$, adder output $y_k$, previous decision $\hat{a}_{k-1}$, correction term $w_k$, and coefficient $h_1$).

One of the commonly used techniques to alleviate the above timing constraint is the look-ahead architecture originally proposed in [31]. Fig. 2.11 shows the block diagram of a 1-tap look-ahead DFE. Assuming binary NRZ data and symbols $a_k \in \{+1, -1\}$, there are only two possible values for the correction term, $w_k$, based on the sign of the previous symbol, i.e., $w_k = \hat{a}_{k-1} h_1 = \pm h_1$. The parallel paths of the look-ahead DFE correspond to these possible values of $\hat{a}_{k-1}$ (or equivalently bit values of 1 and 0); in each path, the DFE makes a tentative decision for the current symbol before the previous symbol is actually recovered. The correct decision is then selected in a 2:1 selector on the next clock edge. Speculating about the sign of the previous symbol allows us to remove the summing node from the feedback loop and relax the timing constraint, hence the equivalent name speculative DFE (the architecture is

also known as loop-unrolling DFE). Eliminating the large summing node, especially in multiple-tap designs, facilitates higher operating speeds [32]. This architecture, however, increases the hardware requirements and may complicate the clock recovery [33], resulting in an overall increase in the area and power consumption [34].

Figure 2.11: Relaxation of the timing constraint in a look-ahead DFE architecture. The 1-tap look-ahead DFE forms $y_k^{(+1)} = x_k - h_1$ (with $w_k^{(+1)} = +h_1$) and $y_k^{(-1)} = x_k + h_1$ (with $w_k^{(-1)} = -h_1$), slices them into the tentative decisions $\hat{a}_k^{(+1)}$ and $\hat{a}_k^{(-1)}$, and selects $\hat{a}_k$ using $\hat{a}_{k-1}$.
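The equivalence between the direct and the look-ahead 1-tap DFE can be illustrated with a short behavioural sketch. The code below is not taken from the thesis; the function names, the initial decision `a_init`, and the test channel with a single post-cursor sample `s1` are assumptions made only for illustration.

```python
def direct_dfe(x, h1, a_init=1):
    """1-tap DFE with the summing node inside the feedback loop."""
    prev, out = a_init, []
    for xk in x:
        yk = xk - prev * h1            # subtract the ISI replica w_k = a_hat[k-1] * h1
        prev = 1 if yk >= 0 else -1    # slicer decision
        out.append(prev)
    return out


def lookahead_dfe(x, h1, a_init=1):
    """Speculative 1-tap DFE: slice both candidates, then select with a 2:1 MUX."""
    prev, out = a_init, []
    for xk in x:
        d_plus = 1 if xk - h1 >= 0 else -1    # assumes the previous symbol was +1
        d_minus = 1 if xk + h1 >= 0 else -1   # assumes the previous symbol was -1
        prev = d_plus if prev == 1 else d_minus
        out.append(prev)
    return out


# Hypothetical data: symbols sent over a channel with one post-cursor sample s1.
symbols = [1, 1, -1, 1, -1, -1, 1, 1, 1, -1]
s1 = 0.6
x = [a + s1 * (symbols[k - 1] if k else 1) for k, a in enumerate(symbols)]
```

With $h_1$ set equal to `s1`, both functions return the transmitted symbols. In the look-ahead version only the selection depends on the previous decision, which mirrors the removal of the summing node from the feedback loop.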

2.3 Clock and Data Recovery

A clock and data recovery (CDR) circuit uses the transitions between data bits to align the phase of the recovered clock with the data (or more accurately, with the phase of the clock embedded in the data), as shown in Fig. 2.12; the recovered clock is then used to sample the data around the centre of the eye (i.e., around the maximum opening) and recover the data.

Figure 2.12: Timing relationship between the recovered clock and data.

Figure 2.13: A generic phase-tracking CDR.

Fig. 2.13 depicts a traditional approach to CDR design, called phase-tracking CDR [35, 36, 37, 38, 39]. In this approach, a feedback loop forces the phase of a local oscillator to track the phase of the data derived from data transitions. Therefore, the clock (or phase) recovery loop in a phase-tracking CDR is identical to a phase-locked loop (PLL), except that the reference of the phase detector is the received data rather than a reference clock. The phase detector compares the received data and the recovered clock and produces a phase error for every data transition, ideally proportional to the phase difference between the two. This phase error, $\Phi_{ERR}$, is low-pass filtered and used to adjust the phase of the recovered clock. The clock is typically derived from a voltage-controlled oscillator (VCO) but can also be derived from a phase interpolator (PI) [40, 41]; the phase of a plesiochronous reference clock

is shifted in the latter approach to track the phase of the received data. As shown in Fig. 2.13, the recovered clock samples the received signal to recover the data; in locked conditions, the samples are taken around the centre of the eye, where the eye opening is maximum (it is thus the optimal point to slice noisy data).

An important component of a CDR circuit is the phase detector (PD), whose input-output characteristic can be linear or non-linear. In a linear CDR, the output of the PD is a linear function of the phase difference at its input, as shown in Fig. 2.14(a). The Hogge PD [42] is a popular example of a linear PD, which extracts phase information by comparing the received signal with its samples taken around the centre of the eye. Another commonly used type of PD is the Alexander PD [43], with the non-linear input-output characteristic depicted in Fig. 2.14(b). Since the PD output takes only one of two possible values (determined solely by the sign of the input phase difference), this PD is also referred to as a bang-bang phase detector. An Alexander PD is a 2× sampling (or 2× oversampling) phase detector because it samples the received data twice per UI.

Figure 2.14: The input-output characteristics ($\Phi_{ERR}$ versus $\Phi_{IN}$) of different phase detectors. (a) A linear PD. (b) Alexander PD.

Fig. 2.15(a) shows the simplified block diagram of a phase-tracking CDR using an Alexander phase detector. The front-end flip-flops sample the received signal twice per UI, once around the eye centre and once close to the UI boundary (i.e., an edge sample). The signs of three consecutive samples (two centre samples and one edge sample) are investigated in the phase logic, as illustrated in Fig. 2.15(b), to determine whether the recovered clock is early or late with respect to the data; the phase logic generates early or late signals accordingly.
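The early/late decision described above can be sketched as a small truth-table function. This is a minimal illustration of the commonly used Alexander logic, assuming the three samples are ordered as (centre, edge, centre) and represented by their signs as 0 or 1; the function name and return values are hypothetical.

```python
def alexander_pd(centre_prev, edge, centre_next):
    """Return 'early', 'late', or None from the signs (0/1) of three samples."""
    if centre_prev == centre_next:
        return None       # no data transition, hence no phase information
    if edge == centre_prev:
        return "early"    # edge sample was taken before the transition
    return "late"         # edge sample was taken after the transition
```

Under this ordering, the sequences 110 and 001 map to an early clock and the sequences 100 and 011 map to a late clock; any other sequence produces no output.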
These signals are then used to control a charge pump producing a current that is low-pass filtered to generate the VCO control voltage. The VCO output is used to clock the PD, sampling the received data on both edges of the clock; the eye centre is sampled on the rising edge while the UI

boundary is sampled on the falling edge of the recovered clock.

Figure 2.15: Operation of a phase-tracking CDR with the Alexander phase detector. (a) CDR block diagram. (b) Phase detector logic (early/late output versus the sampled three-bit sequence).

An alternative to phase-tracking CDR is called blind oversampling CDR (or simply blind CDR) [44, 45, 46, 47, 48]. Fig. 2.16 shows the block diagram of a generic blind-oversampling CDR. In this approach, instead of aligning the local clock with the data, the incoming signal is oversampled blindly, i.e., using a clock which is not frequency- or phase-locked to the embedded clock; the CDR then employs digital logic to detect data transitions and extract the phase. The phase information (typically an average data transition phase) is used to select one sample in each UI as the decision sample (i.e., the recovered bit). In the figure, $m$ denotes the oversampling ratio (OSR), which is the number of samples taken per UI. Fig. 2.17 illustrates the operation of a blind-oversampling CDR through an example in which the OSR is 3 (i.e., a 3× oversampling CDR). In this case, the phase detector indicates that the data transitions occur, on average, between $S_3$ and $S_1$; therefore,

the CDR selects $S_2$ as the decision sample since it is farthest away from the average data transition phase (or average UI boundary), and hence most reliable.

Figure 2.16: A generic blind-oversampling CDR.

Figure 2.17: An example of blind-oversampling CDR operation.

A major advantage of the blind-oversampling CDR is that the implementation of the design is entirely digital, which simplifies the design procedure and makes it easily scalable with the process. In addition, this architecture responds faster to changes in the phase of the embedded clock, compared to the phase-tracking approach, in which the tracking speed is limited by stability concerns in the feedback path [30]. A disadvantage of the blind oversampling approach, however, is that taking multiple samples per UI increases the hardware requirements of the receiver and complicates the design of the clock source and clock distribution network [30].

The most important parameter in the design of a blind-oversampling CDR is the OSR, as it directly affects the high-frequency jitter tolerance and hardware requirements. Higher jitter tolerance is achieved by increasing the OSR; however, a higher OSR increases power and area [45].

The CDR circuits described in this section work with binary samplers (flip-flops) at the front-end. In a binary CDR, data samples are represented with only 1 bit (i.e., the sign). Some receivers, however, incorporate an ADC at the front-end instead of a flip-flop and thus dedicate more bits (e.g., 5 bits) to each sample, enabling digital

equalization of the data. The CDR in such receivers may also use these extra bits to enhance its performance. ADC-based receivers are discussed in the following section.

2.4 ADC-based Receivers

An ADC-based receiver incorporates an ADC at the front-end to allow for a more complicated equalization of data in the digital domain. Such receivers are feasible solutions for 10Gb/s wired transceivers, and their popularity increases as data rates rise to 20Gb/s and above [49]. Fig. 2.18 compares a generic ADC-based receiver to a generic binary receiver in which the received signal is equalized in the analog front-end (AFE); as shown in Fig. 2.18(a), the equalized signal is then sampled by flip-flops and represented by only one bit (i.e., the sign) when used for clock and data recovery. In an ADC-based receiver, however, the front-end ADC samples the received (or partially equalized) signal with more than one bit (e.g., 5 bits) and thus enables further equalization of the digital samples prior to clock and data recovery (Fig. 2.18(b)). (Note that the CDR may also employ different architectures to benefit from the additional bits.)

Figure 2.18: Comparison of binary and ADC-based receivers. (a) A generic binary receiver. (b) A generic ADC-based receiver.

Another advantage of having digital equalizers is that they enable an all-digital adaptation loop, in which no analog control signals are required. Also, the presence of both unequalized and equalized samples in the digital domain can potentially simplify the adaptation engine [11].

A disadvantage of ADC-based receivers is the extra power and area associated with the ADC, which usually results in an overall larger silicon area and higher power consumption compared to their binary counterparts. Although the die area and power of the ADC-based front-end, when combined with the scaling of the digital signal processor (DSP) engine, approach those of a conventional binary front-end, the latter seems to be the more economical solution for data rates up to 10Gb/s with NRZ coding [49].

Both previously discussed CDR architectures, phase-tracking and blind, can be realized with an ADC-based front-end. When an all-digital implementation is desired for the receiver, the latter is the only possibility as it eliminates the analog feedback in the clock recovery loop. However, blind sampling can complicate the design of digital equalizers, especially the DFE. In the remainder of this section, we briefly discuss clock and data recovery and equalization in phase-tracking and blind ADC-based receivers; in particular, we will describe the challenges in the design of decision-feedback equalizers for blind receivers.

2.4.1 Blind versus Phase-tracking CDR

Many recently reported ADC-based receivers align the sampling clock with the data in a phase-tracking architecture, generically depicted in Fig. 2.19 [13, 14, 15, 16]. In this case, when the receiver is Baud-rate sampling (i.e., the ADC samples the incoming signal once per UI), the Mueller-Müller scheme [50] is typically used for timing recovery since the phase-tracking samples are taken around the eye centre [13, 14, 15]. A bang-bang phase detector can be employed in the CDR when the receiver is 2× sampling [16]. Once timing information is extracted, all of these CDRs accordingly adjust the sampling clock phase in a VCO or PI to track the phase of the data.
Figure 2.19: A generic phase-tracking ADC-based CDR.

Figure 2.20: Blind ADC-based CDR architectures. (a) Interpolating feedback CDR. (b) Feed-forward CDR.

To achieve an all-digital CDR implementation, the VCO and PI, which are analog circuits, must be removed from the clock recovery loop. As shown in Fig. 2.20(a), the receivers in [51, 52] blindly sample the received signal; the CDR then interpolates between the blind samples to obtain a new set of samples used to recover phase and data. The main disadvantage here is that the interpolator is relatively complex and contributes to the loop latency [10]. Fig. 2.20(b) shows the simplified block diagram of the receiver implemented in [10]; the proposed CDR in this case estimates the phase directly from the samples blindly taken from the received signal, hence eliminating the need for an interpolator.

Having an ADC-based front-end facilitates digital equalization and allows area and power scalability. A blind ADC-based receiver is favourable to enable all-digital implementation of the receiver. However, in some cases blind sampling may complicate the design of equalizers. Equalization in ADC-based receivers is described next.

2.4.2 Equalization in Blind versus Phase-tracking Receivers

A commonly used equalizer in ADC-based receivers is the feed-forward equalizer (FFE) [10, 11, 13, 14, 15]. An advantage of digital equalization is that it is straightforward to include FFE as a delay-and-add function without any noise-sensitive analog

delay elements [15], as shown in Fig. 2.21; this makes FFE a popular equalizer for ADC-based receivers.

Figure 2.21: A 2-tap FFE in ADC-based receivers.

For a blind receiver, an FFE can be obtained in the same way as for a phase-tracking receiver [10, 11], because FFE operation is independent of the sampling clock phase. In the 5Gb/s receiver of [11], an adaptive digital FFE, which is a half-UI-spaced 2-tap FIR filter, compensates for a channel loss of up to 15dB at the Nyquist frequency of 2.5GHz. However, as described in Section 2.2.2, an FFE as a linear equalizer suffers from noise enhancement, and a decision-feedback equalizer (DFE) can be employed to avoid this issue. Usually, a combination of FFE and DFE offers the best performance when compensating for large channel losses [13, 15]. In the phase-tracking ADC-based receiver of [15], for instance, a 2-tap programmable FFE is used along with a 5-tap adaptive DFE; the transceiver (including a 4-tap FIR pre-equalizer) compensates for 24dB of attenuation at 3.75GHz (while operating at 7.5Gb/s). Another phase-tracking example is the receiver in [13], which operates at 10Gb/s; the DSP-based equalizer in this case consists of an adaptive FFE followed by an adaptive DFE and compensates for 26dB of attenuation at 5GHz (with the help of the pre-equalizer and AFE).

In phase-tracking receivers with ADC-based front-ends, illustrated in the above examples, the design of a DFE is relatively straightforward; an $N$-tap DFE removes the contributions of $N$ previous bits from the sample taken from the centre of the current UI. As described in Section 2.2.2, these contributions are equal to the magnitude of the channel pulse response 1, 2, .., $N$ UIs away from the optimal sampling point. Therefore, as long as the samples of the received signal are taken around the centre of the eye, choosing the $N$ DFE coefficients according to Fig. 2.22(a) (shown for $N = 2$) guarantees elimination of the post-cursor ISI. Note that, since phase-tracking samples always fall at the same known location with respect to UI boundaries (i.e., the eye

centre), these coefficients are constant during the operation of the receiver (assuming no change in channel characteristics).

Figure 2.22: DFE coefficients in a blind versus phase-tracking receiver. (a) In a phase-tracking receiver. (b) In a blind receiver.

As described above, DFE, unlike FFE, relies on the phase of the recovered sampling clock for its operation. In a blind receiver, however, there is no recovered clock; the sampling clock is completely blind, thus the samples of the received signal do not correspond to the centre of the eye or any specific data phase, as illustrated in Fig. 2.22(b). Moreover, the location of the samples with respect to the UI boundaries changes during the operation of the receiver due to any frequency offset, which is inevitable for such a receiver. These issues make the design of a DFE for blind receivers non-trivial.

In the following chapter we propose a scheme to overcome the above problems. The complete design of a DFE for a 2× blind ADC-based receiver is described, and the performance is verified through behavioural simulations. The measurement results of the fabricated test-chip follow in Chapter 4.
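The dependence of the required DFE coefficient on the sampling phase can be made concrete with a toy channel. The sketch below assumes a single-pole channel with time constant `tau` (in UI) driven by a 1-UI-wide unit pulse; this channel model and all names are illustrative assumptions and do not represent the test channels used in this work.

```python
import math


def pulse_response(t, tau=0.5):
    """Response of a hypothetical single-pole channel to a 1-UI unit pulse (t in UI)."""
    if t < 0:
        return 0.0
    if t <= 1:
        return 1.0 - math.exp(-t / tau)                       # rising during the pulse
    return (1.0 - math.exp(-1.0 / tau)) * math.exp(-(t - 1.0) / tau)  # decaying tail


def post_cursor_isi(phase, tau=0.5):
    """ISI from the previous bit on a sample taken `phase` UI into the current UI."""
    return pulse_response(1.0 + phase, tau)


# Post-cursor ISI seen at eight sampling phases spread across one UI.
isi_by_phase = [post_cursor_isi(k / 8) for k in range(8)]
```

For this assumed channel the ISI decays monotonically across the UI, so a single constant coefficient can only be correct for one sampling phase.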

3 Proposed Decision-Feedback Equalizer

The demand for higher data throughputs in backplane applications is growing faster than the improvements of high-loss backplane channels. This has increased the popularity of ADC-based receivers (described in Section 2.4) as they allow for more extensive digital equalization, as well as power and area scalability. Moreover, if all the equalization is performed in the digital domain and the analog circuitry of the CDR is removed, an all-digital receiver implementation can be realized that considerably reduces the development period of high-speed transceivers.

As described in Section 2.4.1, all-digital CDR architectures have already been proposed that operate with blind sampling clocks and thus remove the analog circuit (VCO or PI) traditionally required in the clock recovery loop of a phase-tracking CDR [10, 11, 51, 52]. These receivers blindly sample the received signal and use digital (numerical) processing to detect the phase and recover the data.

The blind receiver in [11] proposes a fully digital adaptive FFE and incorporates a blind CDR architecture to obtain the desired all-digital implementation. Since FFE operation is insensitive to the phase of the sampling clock, the design of an FFE for a blind receiver is similar to that for a phase-tracking receiver and hence straightforward; an FFE, however, suffers from noise enhancement as a linear equalizer. As described in Section 2.2.2, a popular alternative is the decision-feedback equalizer (DFE) (a nonlinear equalizer), which does not enhance noise. However, its operation, unlike FFE, relies on the phase of the receiver sampling clock. As will be discussed in Section 3.2, this characteristic makes the design of a DFE for blind receivers non-trivial. No DFE has been shown thus far for such receivers, to the best of the author's knowledge. This chapter presents the design and simulation of a look-ahead (speculative) DFE for a 2× blind ADC-based receiver.
The remainder of the chapter is organized as follows. Section 3.1 presents an overview of the blind receiver for which the DFE is designed. Section 3.2 discusses the challenges of DFE design for blind receivers compared to DFE design for phase-tracking receivers. Section 3.3 proposes a scheme to overcome these challenges and presents the design of a DFE for a 2× blind ADC-based receiver. The implementation of the DFE in a look-ahead structure and the complete demuxed-by-8 architecture of the receiver are respectively discussed in Sections 3.4 and 3.5. Section 3.6 presents the behavioural simulations of the receiver. Section 3.7 describes the implementation of the receiver in Verilog, and Section 3.8 summarizes this chapter.

3.1 Receiver Overview

This section presents an overview of the receiver for which the DFE is designed to provide the context necessary for the discussions of this chapter. Fig. 3.1 depicts a simplified full-rate block diagram of our receiver before the DFE is added (the actual demuxed-by-8 architecture is discussed in Section 3.5). The front-end 5-bit ADC takes two samples per UI, hence a 2× sampling receiver. For clock and data recovery, the receiver employs the architecture proposed in [10]; this 2× blind CDR estimates the phase directly from ADC samples in a feed-forward architecture and uses this phase information to recover the data. Below we briefly describe the main building blocks of this feed-forward CDR (see [10] for more details).

Figure 3.1: Simplified block diagram of the receiver (without a DFE).

The CDR consists of three main building blocks: the phase detector (PD), data decision block (DD), and average phase recovery block (APR). The PD uses three consecutive ADC samples (which form a 1-UI observation window as a result of 2× sampling) to linearly estimate the location of the zero-crossing (if a data transition exists); this zero-crossing phase is called the instantaneous phase, $\Phi_X$.
The APR recovers the average zero-crossing phase, $\Phi_{AVG}$, from $\Phi_X$ values: for every UI, it first subtracts the current value of $\Phi_{AVG}$ from $\Phi_X$ with a modulo-1 operation to generate the phase error, $\Phi_{ERR}$; similar to a conventional binary CDR, it then low-pass filters the $\Phi_{ERR}$ values to recover $\Phi_{AVG}$. The average eye-centre phase (also called

the picking phase, $\Phi_{PICK}$) is generated by modulo-1 addition of 0.5UI and $\Phi_{AVG}$. The DD compares $\Phi_{PICK}$ and $\Phi_X$ to pick one of the three consecutive samples that is, loosely speaking, closest to $\Phi_{PICK}$ and farthest from $\Phi_X$; the sliced binary value of this digital sample is the recovered bit for the 1-UI window specified by the samples.

To accommodate 5Gb/s operation, the CDR in [10] is implemented in a demuxed-by-16 structure; we employ a demuxed-by-8 structure in our receiver to reduce the hardware requirements, similar to the receiver in [11] (this receiver incorporates a blind feed-forward CDR architecture which is similar to the one used in [10]). In this demuxed-by-8 configuration, the 16 samples of an 8-UI frame are processed simultaneously in one cycle of a 1/8-rate (i.e., 625MHz) clock. The CDR basically consists of 8 parallel paths identical to the one described above, except that there is only one low-pass filter averaging over all 8 phase error values. In the end, normally 8 bits are recovered from the 16 samples of each frame.

Due to the plesiochronous nature of a blind-sampling transceiver, however, the 16 samples of an 8-UI frame may sometimes correspond to 9 or 7 transmitted bits; that is, if the embedded clock is slightly faster/slower than the receiver sampling clock, the CDR must output 9/7 bits (where needed) to compensate for this frequency difference. A part of the CDR called the cycle-slip monitor checks for this phenomenon by monitoring $\Phi_{PICK}$; whenever $\Phi_{PICK}$ crosses the UI boundary (i.e., a cycle-slip occurs), this block signals the CDR to output 9 or 7 bits depending on the direction of the boundary crossing [10, 11]. This variable-length output data is written into a FIFO placed in the Physical Layer (PHY) logic to generate fixed-length 16-bit outputs (312.5-MHz data); the PHY performs flow control to prevent a FIFO overflow [11].
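The phase arithmetic of the APR can be sketched at full rate as follows. This is only an illustration: the wrapping of the phase error into $[-0.5, 0.5)$ UI, the first-order filter, and the `gain` value are assumptions, and the actual filter of [10] may differ.

```python
def wrap(phase):
    """Wrap a phase difference (in UI) into the range [-0.5, 0.5)."""
    return (phase + 0.5) % 1.0 - 0.5


def apr_update(phi_avg, phi_x, gain=1 / 64):
    """One APR step: modulo-1 phase error followed by a first-order low-pass update."""
    phi_err = wrap(phi_x - phi_avg)          # assumed form of the modulo-1 subtraction
    return (phi_avg + gain * phi_err) % 1.0  # keep the average phase within one UI


def picking_phase(phi_avg):
    """Average eye-centre phase: modulo-1 addition of 0.5 UI to phi_avg."""
    return (phi_avg + 0.5) % 1.0
```

Feeding a constant $\Phi_X$ into `apr_update` repeatedly makes the returned average converge to that value, including across the UI boundary, where the wrapped error takes the shorter path around the modulo-1 circle.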

3.2 DFE in Blind versus Phase-Tracking Receivers

Fig. 3.2 compares a generic 1-tap DFE in a phase-tracking receiver with the one in a blind receiver. In a phase-tracking receiver, the ADC samples the data at the centre of the eye using the clock recovered by the CDR, i.e., the sampling phase is known and constant in the absence of jitter. Therefore, the magnitude of the contribution of each bit to the next UI's sample is one constant value (assuming no change in channel characteristics). As described in Section 2.2.2 and shown in Fig. 3.2(a), this contribution is equal to the magnitude of the pulse response one UI after the optimal sampling point (which corresponds to the centre of the eye). A constant DFE coefficient equal to this value will thus eliminate the post-cursor ISI.

Figure 3.2: The sampling clock and DFE coefficient in a blind receiver versus a phase-tracking receiver. (a) A phase-tracking receiver block diagram and its associated waveforms. (b) A blind receiver block diagram and its associated waveforms.

In a blind receiver, however, there is no recovered clock; the sampling clock is completely blind. Therefore, the samples of the incoming signal no longer correspond to the centre of the eye, and the DFE coefficient of Fig. 3.2(a) cannot be used in this case. Moreover, as depicted in Fig. 3.2(b), any frequency offset (or low-frequency jitter) in a blind receiver will cause the phase of the sampling clock to constantly

sweep the entire UI; this makes the task of the DFE even harder as it must be able to modify its coefficient during the operation of the receiver.

3.3 Proposed DFE Scheme

Fig. 3.3(a) shows the actual pulse response of a generic high-loss channel corresponding to $b_{n-1}$; the tail of this signal stretches over the next UI ($UI_n$) due to the limited bandwidth of the channel. The desired pulse response is derived such that the sum of two consecutive pulses results in an approximately constant value. The difference between the actual and desired pulse responses represents the ISI caused by the channel. This ISI, however, is not constant and varies depending on the location of the sampling time within one UI. To provide a DFE coefficient for each sampling time, we propose dividing the nominal UI into 8 equal intervals, $I_{[0:7]}$, and assigning one coefficient to each interval, as illustrated in Fig. 3.3(b). These coefficients, $\alpha_{[0:7]}$, represent the interference from $b_{n-1}$ in $I_{[0:7]}$. During the operation of the receiver, the DFE uses the average transition phase, $\Phi_{AVG}$, recovered by the CDR to select two of the 8 coefficients, corresponding to the two sampling times. $\Phi_{AVG1}$, the modulo-1 version of $\Phi_{AVG}$, indicates the distance between the second sample of the current UI, $S_2$, and the next nominal UI boundary. As an example, Fig. 3.3(b) illustrates the case where $S_2$ falls in $I_2$; therefore, in this case the selected coefficients for $S_1$ and $S_2$, $c_1$ and $c_2$, are respectively equal to $\alpha_6$ and $\alpha_2$.

Figure 3.3: Removing ISI using DFE coefficients determined by sampling intervals. (a) ISI from $b_{n-1}$ on $b_n$. (b) DFE coefficient for each interval.

Fig. 3.4(a) depicts the full-rate implementation of the proposed DFE in a simplified block diagram.
Two ADC samples of the current UI, $S_{[1:2]}$, along with one delayed sample of the previous UI, $S_0$, form the 3 consecutive samples, $S_{[0:2]}$, needed by the CDR to recover $b_n$. The DFE coefficient selector (DCS) uses the current value of

$\Phi_{AVG}$ to select $c_{[1:2]}$ out of $\alpha_{[0:7]}$. These coefficients are multiplied by the sign of $b_{n-1}$ to generate the correction terms (ISI replica), $\delta_{[1:2]}$, for $S_{[1:2]}$. ISI must also be removed from $S_0$; the correction term for this sample, $\delta_0$, is obtained by multiplying $c_2$ by the sign of $b_{n-2}$. Note that, as shown in Fig. 3.4(b), the effect of $b_{n-2}$ on $S_0$ is the same as the effect of $b_{n-1}$ on $S_2$ (ignoring the change of $\Phi_{AVG}$ in 1UI), hence $c_2$ can be used to generate $\delta_0$. Finally, $\delta_{[0:2]}$ are subtracted from $S_{[0:2]}$ to remove the post-cursor ISI, and the resulting ISI-free samples are used by the CDR to recover $b_n$ and update $\Phi_{AVG}$. Fig. 3.4(c) depicts how the DCS selects $c_{[1:2]}$ out of $\alpha_{[0:7]}$ based on $\Phi_{AVG}$.

Figure 3.4: Full-rate implementation of the proposed DFE. (a) Simplified receiver block diagram. (b) ISI on $S_{[0:2]}$: $b_{n-1}$ affects $S_{[1:2]}$ with $c_{[1:2]}$, and $b_{n-2}$ affects $S_0$ with $c_2$. (c) DFE coefficient selector (DCS): as $\Phi_{AVG1}$ sweeps the eight intervals, $c_1$ takes the values $\alpha_4, \alpha_5, \alpha_6, \alpha_7, \alpha_0, \alpha_1, \alpha_2, \alpha_3$ and $c_2$ takes the values $\alpha_0, \alpha_1, \ldots, \alpha_7$.

One of the challenges of this work is to implement the proposed DFE in a look-ahead architecture operating at 1/8 the data rate to accommodate 5Gb/s operation. Details of the implementation are presented in the following two sections: Section 3.4 first describes the implementation of the DFE in a look-ahead configuration and Section 3.5 then presents the final demuxed-by-8 architecture of the entire digital block.
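The coefficient selection and ISI removal of this section can be summarized in a short behavioural sketch. The mapping from $\Phi_{AVG1}$ to the interval index (a simple floor of $8\Phi_{AVG1}$) is an assumption made for illustration, as are the function names; bits are represented as 0 or 1 and their signs as $-1$ or $+1$.

```python
def select_coefficients(phi_avg, alpha):
    """DFE coefficient selector: pick (c1, c2) out of the 8 interval coefficients."""
    k = int((phi_avg % 1.0) * 8) % 8       # interval holding S2 (assumed mapping)
    return alpha[(k + 4) % 8], alpha[k]    # S1 lies half a UI, i.e. 4 intervals, away


def remove_isi(s, c1, c2, b_prev1, b_prev2):
    """Subtract the ISI replicas from s = (S0, S1, S2).

    b_prev1 and b_prev2 are the recovered bits b[n-1] and b[n-2] (0 or 1).
    """
    sign1 = 1 if b_prev1 else -1
    sign2 = 1 if b_prev2 else -1
    delta = (sign2 * c2, sign1 * c1, sign1 * c2)   # (delta0, delta1, delta2)
    return tuple(si - di for si, di in zip(s, delta))
```

For example, with a phase of 0.3UI the sketch selects $\alpha_6$ and $\alpha_2$, matching the case in which $S_2$ falls in $I_2$.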

3.4 Look-ahead DFE Configuration

In this section, we describe the implementation of the proposed DFE in a look-ahead configuration. For simplicity, we use a full-rate architecture for the discussions of this section; once this architecture becomes clear, the transition to the actual implementation in the following section (which employs a demuxed-by-8 architecture) will be straightforward.

Fig. 3.5 shows the feedback loop of the DFE including the feed-forward CDR. The critical path goes through the blocks marked with a triangle in the figure. Note that we can ignore the change of $\Phi_{AVG}$ in 1UI, knowing that the time constant of the CDR phase recovery loop (inside the APR) is much larger than 1UI (or even 10UI), and thus exclude the APR from the critical path. For proper operation, the delay of this critical path must be smaller than 1UI. However, the delay of the PD/DD (cascade of the phase detector and data decision block) alone is much larger than 1UI, hence employing a look-ahead configuration is necessary for correct operation.

Figure 3.5: The critical path in the DFE feedback loop.

When $S_{[0:2]}$, the three samples of a 1-UI observation window, are being processed, every combination of $b_{n-2}$ and $b_{n-1}$ values can potentially result in a different value for the equalized samples and thus a different value for each of the CDR outputs, specifically $\Phi_X$ (PD output) and $b_n$ (DD output). Similar to the approach explained in Section 2.2.2, to make a speculative DFE we generate all of these possible outputs in advance. After the actual values of $b_{n-2}$ and $b_{n-1}$ are recovered, we pick the correct PD and DD outputs using a multiplexer (MUX). This technique removes the PD/DD from the

DFE loop and reduces the critical path to a MUX and a flip-flop.

Figure 3.6: Implementation of the full-rate speculative DFE. (a) Block diagram of the speculative DFE. (b) The ISI subtractor. (c) The PD/DD unit generating $(\Phi_X, b_n)$.

Fig. 3.6(a) shows the block diagram of the full-rate speculative DFE. One delayed sample from the previous UI ($UI_{n-1}$) and two samples of the current UI ($UI_n$) form $S_{[0:2]}$ at the input. The DFE coefficients for these samples are $c_{[1:2]}$, selected by the DCS. The correction terms for $S_{[1:2]}$ can be either positive or negative, corresponding to $b_{n-1}$ being 1 or 0 respectively; as shown in Fig. 3.6(b), the ISI Subtractor generates both of these cases, $+c_{[1:2]}$ and $-c_{[1:2]}$, and subtracts them from $S_{[1:2]}$ to respectively produce $S^1_{[1:2]}$ and $S^0_{[1:2]}$. In another Subtractor unit, it similarly produces $S^1_0$ and $S^0_0$ (corresponding to $b_{n-2}$ being 1 or 0). There are four possible combinations of $b_{n-2}$ and $b_{n-1}$ values, $(b_{n-2} b_{n-1}) \in$ {(00),

43 3.5 Complete Receiver Architecture 33 (01), (10), (11)}, and hence four potentially different sets of PD/DD outputs. The PD/DD array consists of four parallel PD/DD units; the input to each PD/DD unit corresponds to one combination of b n 2 and b n 1 and produces one instantaneous phase and one decision bit, (Φ X,b n ), at the output. For instance, the input to the third PD/DD unit is (S 1 0, S0 [1:2] ), which corresponds to (b n 2b n 1 ) = (10) and produces (Φ X,b n ) 10 at the output. Therefore, the PD/DD array generates all four possible sets of PD and DD outputs; once b n 1 is actually recovered, a 4:1 MUX selects the corresponding (Φ X,b n )usingb n 2 and b n 1. b n is the recovered bit for the current UI, and Φ X isfedtotheaprtoupdatethevalueofφ AV G for the next UI. 3.5 Complete Receiver Architecture This section presents the actual demuxed-by-8 architecture of the combined DFE and CDR (referred to as DFE/CDR) in the complete receiver. Fig. 3.7(a) shows the block diagram of the receiver. The received signal is sampled by four timeinterleaved 2.5GS/s 5-bit ADCs to generate 10GS/s corresponding to 2 samples per UI (Fig. 3.7(b)); a 4:16 demultiplexer (DeMUX) feeds 16 samples of an 8-UI frame, S [1:16], to the digital DFE/CDR operating at 1 the data rate, i.e., 625MHz. Inside the 8 digital DFE/CDR, these samples along with one delayed sample from the previous 8-UI frame form S [0:16]. The ISI Subtractor adds/subtracts c [1:2] to/from S [0:16] to produce the raw speculative data, S 1 [0:16] and S 0 [0:16]; notethatsinceφ AV G is constant throughout the entire 8-UI frame, c 1 and c 2 can be used respectively for all odd- and even-numbered ADC samples. Eight parallel PD/DD arrays (from Fig. 3.6(a)) process the information of the 8 UIs in the frame; each PD/DD array receives the ISI Suntractor outputs related to a 1-UI observation window (Fig. 
3.7(c)) as the input and generates the 4 possible values of (Φ X,b n ) at its output (here subscript n in b n denotes the n th 8-UI frame). As described earlier, a 4:1 MUX is required for each array to select one of these possible values as the final output using the two previous recovered bits; hence, eight consecutive 4:1 MUXes are needed for the 8-UI frame. As shown in Fig. 3.7(d), once the last two bits from the previous frame (b n 1,7 and b n 1,8 ) are recovered, the decisions propagate through all multiplexers: b n 1,7 and b n 1,8 select Φ X1 and b n,1, b n 1,8 and b n,1 select Φ X2 and b n,2,etc. b n,[1:8] are the recovered bits for the current frame; Φ X[1:8] are fed to the APR to update Φ AV G for the next 8-UI frame.
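The ripple selection through the eight 4:1 MUXes can be illustrated with a short behavioural sketch. This is not the RTL of the receiver; the function name `select_frame`, the tuple layout of a candidate, and the candidate ordering (index $2 b_{k-2} + b_{k-1}$, matching the order (00), (01), (10), (11) of the PD/DD array) are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

# One speculative PD/DD output: (instantaneous phase, decision bit).
Candidate = Tuple[int, int]


def select_frame(
    candidates: Sequence[Sequence[Candidate]],
    prev_bits: Tuple[int, int],
) -> Tuple[List[int], List[int]]:
    """Ripple through the per-UI 4:1 MUXes of one frame.

    candidates[k][i] is the PD/DD output for UI k computed under the
    assumption that the two previous bits are (i >> 1, i & 1), i.e. the
    candidate order is (00), (01), (10), (11).
    prev_bits holds the last two recovered bits of the previous frame.
    """
    older, newer = prev_bits
    phases: List[int] = []
    bits: List[int] = []
    for per_ui in candidates:
        # 4:1 MUX: the two most recent decisions pick one candidate.
        phase, bit = per_ui[2 * older + newer]
        phases.append(phase)
        bits.append(bit)
        # The new decision becomes the most recent bit for the next MUX.
        older, newer = newer, bit
    return phases, bits
```

In this sketch the only sequential dependency inside a frame is the selection index, which mirrors the point made above: all PD/DD work is done speculatively in parallel, and only the MUX chain waits for earlier decisions.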

Figure 3.7: Implementation of the demuxed-by-8 DFE/CDR in the receiver. (a) Block diagram of the digital DFE/CDR in the receiver. (b) Block diagram of ADC and DeMUX. (c) One observation window (3 samples) for each bit. (d) Selecting correct phases and data using 8 MUXes.

The block diagram of Fig. 3.7(a) completely depicts the normal operation of the receiver, in which 8 bits are recovered from 16 data samples. In the presence of any frequency offset (or, similarly, low-frequency jitter), however, the receiver occasionally has to output 9 or 7 bits to compensate for the difference in clock frequencies. As described in Section 3.1, the cycle-slip monitor, as part of the CDR, determines when a change in the number of output bits is required; the CDR then adds/removes one bit to/from the output accordingly [11, 10] (in the case of 9 output bits, the extra bit is the sample delayed from the previous frame).

Another point that should be discussed here is the relative position of the 16 ADC samples with respect to the UI boundaries. Thus far, we have only described the case in which $S_{[1:2]}$ (and similarly $S_{[3:4]}$, $S_{[5:6]}$, etc.) fall in the same UI of the received data, as illustrated in Fig. 3.7(c). However, since the receiver is blind, these samples may fall in two adjacent UIs, as depicted in Fig. 3.8(a). In this case, each 1-UI observation window (e.g., $S_{[0:2]}$) includes two samples from the current UI and one sample from the future UI (as the UI with two samples is always the one whose information is recovered). This makes the extraction of $\Phi_X$ in the PD, and consequently the detection of data in the DD, non-causal. Removing ISI in a DFE, however, is a causal operation that requires the previous recovered bit and cannot be realized with a non-causal DD, as explained below.

Consider the detection of $b_{n,2}$ (Fig. 3.8(a)) from the corresponding observation window including $S_{[2:4]}$. As described in Section 3.1, the data-decision scheme employed in the DD needs $\Phi_{X2}$ to recover $b_{n,2}$, and $\Phi_{X2}$ itself depends on $S_{[2:4]}$ [10]. Thus, to make a DFE-based decision for $b_{n,2}$ we require the ISI-free values of $S_{[2:4]}$ (especially $S_4$); however, the correction term for $S_4$ itself depends on $b_{n,2}$.
Therefore, a DFE cannot be realized when the CDR uses a future sample to recover the current bit. (Note that this issue does not exist in [10] since the receiver does not include a DFE.)

To resolve this non-causality issue, we use floating observation windows in the implementation of the DFE/CDR. The average eye-centre phase recovered by the CDR, $\Phi_{PICK}$, is used to determine the relative position of $S_{[1:16]}$ with respect to the UI boundaries ($\Phi_{AVG}$ can equivalently be used to do this). As shown in Fig. 3.8(b), when $S_1$ and $S_2$ fall in adjacent UIs, the block preceding the ISI Subtractor (Fig. 3.7(a)) shifts the observation windows to the left by 1 sample to avoid the potential non-causality problem. Some small modifications have to be made to the PD and APR to ensure that such floating observation windows do not cause errors in the recovery of $\Phi_{AVG}$.

Figure 3.8: Using floating observation windows to avoid non-causality issues for the DFE-based receiver. (a) Observation windows in [10] when $S_1$ and $S_2$ fall in adjacent UIs. (b) Observation windows shifted to the left by one sample.

The last point that should be emphasized, before moving on to the behavioural simulations of the receiver, is that the DFE/CDR architecture presented in this chapter is not the only possible architecture. In fact, two alternative architectures were developed at the same time as the presented architecture in an attempt to reduce the hardware requirements of the receiver; however, both sacrifice part of the receiver performance to achieve this end. The details of these two architectures are provided in Appendix A for completeness. For this implementation, we chose the architecture described in this chapter because of its more accurate operation.
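The floating observation windows described earlier in this section amount to choosing between two sample groupings. The sketch below is illustrative only: the function name `observation_windows` and the flag `shift_left` are hypothetical, and the decision of when to shift (derived from $\Phi_{PICK}$ in the receiver) is taken as an input rather than modelled.

```python
from typing import List, Sequence, Tuple


def observation_windows(
    samples: Sequence[int], shift_left: bool
) -> List[Tuple[int, ...]]:
    """Group one frame into eight 3-sample observation windows.

    samples must hold 18 values in the order S_-1, S_0, S_1, ..., S_16.
    With shift_left False the windows are S[0:2], S[2:4], ..., S[14:16];
    with shift_left True every window moves one sample to the left,
    giving S[-1:1], S[1:3], ..., S[13:15].
    """
    if len(samples) != 18:
        raise ValueError("expected 18 samples: S_-1 through S_16")
    # List position of the first sample of the first window.
    start = 0 if shift_left else 1
    return [tuple(samples[start + 2 * k: start + 2 * k + 3]) for k in range(8)]
```

Adjacent windows share one sample in both groupings, so the shift changes only which samples are paired with which bit, not the amount of data processed per frame.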

3.6 Behavioural Simulations

This section describes the behavioural simulations of the receiver. We performed four different types of simulations to verify the performance of the DFE and the complete receiver. The section is organized as follows. Section 3.6.1 describes the transceiver model and channels used in the simulations. Section 3.6.2 presents the DFE simulations. The jitter tolerance simulations of the receiver are presented in Section 3.6.3.

3.6.1 Transceiver Model and Channels

Fig. 3.9 depicts the block diagram of the entire transceiver model. For the behavioural simulations of the proposed DFE we used an event-driven [30] model in MATLAB's Simulink simulation tool [53]. The data rate is 5 Gb/s, and the transmitter is capable of transmitting $2^7-1$ ($1+X^6+X^7$) and $2^{31}-1$ ($1+X^{28}+X^{31}$) pseudo-random binary sequence (PRBS) data. The channel is modeled in the time domain using the step response; the total ISI length including precursors and postcursors is 90 UIs. The digital DFE/CDR processes 16 samples (belonging to an 8-UI frame) at a time. It recovers 7, 8, or 9 bits for each frame and feeds them to the PRBS comparator at the back-end of the receiver. The comparator verifies the recovered bits and counts the number of errors after an initial interval during which the CDR acquires lock. Depending on the type of simulation, in each run we transmit $10^6$ or $10^7$ bits, in addition to the initial bits, and count the number of errors. Jitter is added to the transmitter clock where needed, and the total frequency offset between the transmitter and receiver, $\Delta f_{TX-RX}$, is divided equally between the transmitter and receiver clocks.

Figure 3.9: Block diagram of the Simulink transceiver model.

We used three test channels for the simulations (and measurements) of the proposed DFE.
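As a reference for the PRBS data mentioned above, the following is a minimal linear-feedback shift register sketch. It is not the generator used in the Simulink model; the function name `prbs_bits`, the seed, and the bit-ordering convention are assumptions. The tap pairs (7, 6) and (31, 28) correspond to the polynomials $1+X^6+X^7$ and $1+X^{28}+X^{31}$ quoted in the text.

```python
from typing import List, Sequence


def prbs_bits(taps: Sequence[int], nbits: int, seed: int = 1) -> List[int]:
    """Generate nbits of a PRBS from a Fibonacci LFSR.

    taps lists the delays that are XORed to form each new bit, e.g.
    (7, 6) for 2^7-1 or (31, 28) for 2^31-1. The seed must be non-zero,
    otherwise the register stays at zero forever.
    """
    order = max(taps)
    mask = (1 << order) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("seed must be non-zero within the register width")
    out: List[int] = []
    for _ in range(nbits):
        # Feedback bit: XOR of the register stages named by the taps.
        feedback = 0
        for t in taps:
            feedback ^= (state >> (t - 1)) & 1
        # Output the oldest stage, then shift the feedback bit in.
        out.append((state >> (order - 1)) & 1)
        state = ((state << 1) | feedback) & mask
    return out
```

For the (7, 6) taps the sequence repeats every 127 bits and contains 64 ones per period, as expected for a maximal-length sequence of order 7.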
Channel 1 consists of only a pair of 40" SMA cables and has 0.7-dB loss at the Nyquist frequency of 2.5 GHz. Channel 2 and Channel 3, however, consist of FR4 traces on a backplane and two daughtercards (with two connectors) and 3 pairs of SMA cables with a total length of more than 80". The FR4 trace on each daughtercard is 5" long, and the trace on the backplane is 16" for Channel 2 and 24" for Channel 3; the attenuations at 2.5 GHz are 12.1 dB and 13.3 dB, respectively. In simulations, we modeled these channels with their step responses, which are generated from the measured s-parameters. Table 3.1 summarizes the important channel characteristics; further details about the measurements of the test channels are given in Section 4.2.

Table 3.1: Summarized characteristics of the test channels.

Name        Description                        Loss at 2.5 GHz (dB)
Channel 1   40" SMA cables                     0.7
Channel 2   26" FR4 trace & 80" SMA cables     12.1
Channel 3   34" FR4 trace & 80" SMA cables     13.3

3.6.2 DFE Performance

We performed three different types of simulations on the receiver to examine the performance of the DFE. In the first type, eye diagrams of the unequalized data (before the DFE) and equalized data (after the DFE) were compared. In the second type, the simulated BER of the receiver was used to evaluate the performance of the receiver with and without the DFE. In the third type, we examined the sensitivity of the receiver BER to the DFE coefficients.

Before presenting the simulation results, we describe the coefficients used for the DFE look-up table (LUT). As explained in Section 3.3, the DFE employs a LUT of 8 coefficients, $\alpha_{[0:7]}$, during its operation; these coefficients are calculated from the step response of the channel using the proposed scheme. The LUT is implemented with 5-bit resolution and a least significant bit (LSB) of 15.6 mV. Table 3.2 shows the decimal value (between 0 and 31) of the 8 quantized DFE coefficients ($\alpha_{q,[0:7]}$) used in the simulations.

Table 3.2: Quantized DFE coefficients used in simulations.

Channel   $\alpha_{q,7}$   $\alpha_{q,6}$   $\alpha_{q,5}$   $\alpha_{q,4}$   $\alpha_{q,3}$   $\alpha_{q,2}$   $\alpha_{q,1}$   $\alpha_{q,0}$

A. Equalized Eye Diagrams

Eye diagrams of the equalized data are commonly used as a performance measure of equalizers. To generate the eye diagrams, a frequency offset is applied between the transmitter and receiver clocks to ensure that the ADC scans the entire UI, i.e., samples the data at all points of the UI. Next, the receiver is simulated and a large number of ADC samples are saved before and after the DFE. Finally, the samples are rearranged according to the frequency offset to form the eye diagrams.

Fig. 3.10 depicts the simulated eye diagrams of the received data before and after the DFE in the presence of each of the three test channels. In these simulations, the total frequency offset is 100 ppm (0.5 MHz), and the transmitted data is a $2^7-1$ PRBS; ADC samples are used to generate the eye diagrams. With Channel 1, the eye of the incoming data is completely open, and the DFE does not change the vertical or horizontal eye opening. With Channel 2, however, the maximum vertical eye opening of the unequalized data is 80 mV; the DFE increases this opening to 230 mV. The eye of the unequalized data is completely closed in the presence of Channel 3; the DFE opens the eye by 200 mV.

Fig. 3.11 depicts the results of the above simulation repeated with a $2^{31}-1$ PRBS (first 31 bits: ). This sequence causes a larger data-dependent jitter (DDJ) in the incoming data due to its larger maximum length of consecutive identical digits (CID) and its greater variety of adjacent patterns [54]. As a result, the eye of the unequalized data is closed with both Channel 2 and Channel 3; the DFE opens the eye by 200 mV and 130 mV, respectively.
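The rearrangement of blind samples into an eye diagram can be sketched as a modulo operation on the sample times. This is one plausible reading of "rearranged according to the frequency offset", not the thesis script: the function name `fold_sample_times`, the sign convention of the offset, and the 2-UI plotting span are assumptions.

```python
from typing import List


def fold_sample_times(
    num_samples: int,
    offset_ppm: float,
    samples_per_ui: int = 2,
    span_ui: float = 2.0,
) -> List[float]:
    """Return the horizontal eye-diagram position (in UI) of each sample.

    With a relative frequency offset between the clocks, consecutive
    samples are spaced (1 + offset) / samples_per_ui UI apart on the
    data's time axis. Folding that time modulo the plotting span places
    each sample at its position inside the eye.
    """
    step_ui = (1.0 + offset_ppm * 1e-6) / samples_per_ui
    return [(k * step_ui) % span_ui for k in range(num_samples)]
```

With zero offset the samples land on only four positions of a 2-UI eye; with a 100 ppm offset the sampling phase drifts by 1 UI every 20,000 samples at 2 samples per UI, so a long enough record fills the whole eye.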

Figure 3.10: Simulated eye diagrams before and after DFE: $2^7-1$ PRBS & 100 ppm $\Delta f_{TX-RX}$. Panels: eye diagram before DFE and after DFE for Channel 1, Channel 2 (80 mV before, 230 mV after), and Channel 3 (200 mV after). Axes: sample amplitude (V) versus sample time (UI).

Figure 3.11: Simulated eye diagrams before and after DFE: $2^{31}-1$ PRBS & 100 ppm $\Delta f_{TX-RX}$. Panels: eye diagram before DFE and after DFE for Channel 1, Channel 2 (200 mV after), and Channel 3 (130 mV after). Axes: sample amplitude (V) versus sample time (UI).

B. Receiver Bit Error Rate

The fact that an equalizer increases the eye opening of the data does not necessarily demonstrate improved performance for the receiver. The final performance measure for a receiver, indicating the accuracy of a data transmission, is the receiver BER. Therefore, we would like to verify with simulations that the BER of the receiver is lower than a specified target. This target BER is $10^{-7}$ in our simulations; verifying lower BER values requires unreasonably long simulation times. To measure the BER, we simulate the transceiver for $10^7$ UIs, count the number of errors, and calculate the BER by dividing the number of errors by the total number of transmitted bits, i.e., $10^7$. If there are no errors during the transmission, the BER is assumed to be $10^{-7}$ or better.

In BER simulations, the obtained BER results may vary depending on the phase of the sampling clock, i.e., depending on where within the UI the samples are taken. In a practical blind receiver, there is no control over the relative phase of the data and the receiver clock; thus, to prove capable of error-free operation, the receiver must achieve the target BER for any sampling clock phase. To verify this, one can apply a small frequency offset between the transmitter and receiver clocks in BER simulations to ensure that the sampling clock phase sweeps the entire width of the UI during the operation of the receiver. An alternative approach is to apply low-frequency sinusoidal jitter (SJ) to the transmitter clock in BER simulations; a jitter amplitude of 1 UI$_{PP}$ (or larger) ensures that all values of the sampling clock phase are covered. In our BER simulations, we employed the latter, since in this method the phase of the sampling clock travels in both directions along the width of the UI.

BER simulations were performed at the rate of 5 Gb/s for both $2^7-1$ and $2^{31}-1$ PRBS data. The amplitude of the added jitter is 1 UI$_{PP}$ and its frequency is 100 kHz. The simulation results are as follows. With $2^7-1$ PRBS data and the DFE disabled, the receiver achieves a BER of $10^{-7}$ (or better) in the presence of Channel 1 or Channel 2. With Channel 3, however, the BER increases above $10^{-2}$; enabling the DFE in this case lowers the BER below $10^{-7}$. Using $2^{31}-1$ PRBS data and with the DFE disabled, only Channel 1 results in a BER of $10^{-7}$ (or better) for the receiver. With Channel 2 and Channel 3, the BER increases above $10^{-3}$ and $10^{-2}$, respectively. Enabling the DFE, however, restores the BER to $10^{-7}$ (or better) in both cases. Table 3.3 summarizes the results of the BER simulations of the receiver for all three test channels with and without the DFE.
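The BER bookkeeping described above, including the "or better" convention for error-free runs, can be written as a small helper. The function name `estimate_ber` and the returned flag are hypothetical; only the arithmetic follows the text.

```python
from typing import Tuple


def estimate_ber(num_errors: int, num_bits: int) -> Tuple[float, bool]:
    """Return (ber, is_upper_bound) for one simulation run.

    When no errors are observed, the run can only show that the BER is
    at or below 1 / num_bits, so that value is returned with the flag
    set to True. Otherwise the measured ratio is returned.
    """
    if num_bits <= 0:
        raise ValueError("num_bits must be positive")
    if num_errors < 0 or num_errors > num_bits:
        raise ValueError("num_errors must lie between 0 and num_bits")
    if num_errors == 0:
        return 1.0 / num_bits, True
    return num_errors / num_bits, False
```

For a $10^7$-bit run this gives a bound of $10^{-7}$ when no errors are counted, which is why lower BER values cannot be verified without lengthening the simulation.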

Table 3.3: Simulated BER: $10^7$ bits & 1 UI$_{PP}$ SJ at 100 kHz.

           $2^7-1$ PRBS                     $2^{31}-1$ PRBS
Channel    DFE Disabled    DFE Enabled      DFE Disabled    DFE Enabled
1          < $10^{-7}$     < $10^{-7}$      < $10^{-7}$     < $10^{-7}$
2          < $10^{-7}$     < $10^{-7}$      > $10^{-3}$     < $10^{-7}$
3          > $10^{-2}$     < $10^{-7}$      > $10^{-2}$     < $10^{-7}$

For completeness, we also performed the BER simulations with a frequency offset applied between the transmitter and receiver clocks (without any SJ) to discover how much frequency offset the receiver can tolerate. The results showed that the receiver is capable of recovering $2^7-1$ and $2^{31}-1$ PRBS data with a BER of $10^{-7}$ (or better) in the presence of up to 2000 ppm (10 MHz) of frequency offset with any of the three test channels.

C. Bit Error Rate Sensitivity

To determine the sensitivity of the receiver BER to the coefficients used for the DFE LUT, we scaled these coefficients by a factor of $K$ and repeated our BER simulations. We changed $K$ from 0 to 2, and in each case we multiplied the DFE coefficients by $K$ before quantizing them to 5 bits. Fig. 3.12 depicts the results of the BER simulations for Channel 2 and Channel 3. Note that $K = 0$ and $K = 1$ correspond to the receiver without the DFE and the receiver with the original DFE coefficients, respectively. The markers with white fills indicate the range of $K$ for which the simulated BER was lower than $10^{-7}$, i.e., no error was found in $10^7$ recovered bits. With the $2^7-1$ PRBS, this happens for a wide range of $K$ values. With the $2^{31}-1$ PRBS, however, this range is limited to 50% and 10%; the simulated BER is better than $10^{-7}$ for $0.8 \le K \le 1.3$ in the presence of Channel 2 and for $1.0 \le K \le 1.1$ in the presence of Channel 3.

Figure 3.12: Simulated sensitivity of BER to the scaling of DFE coefficients. (a) BER sensitivity with Channel 2. (b) BER sensitivity with Channel 3. Axes: $\log_{10}$(BER) versus $K$; filled markers show BER equal to the displayed value, white-filled markers show BER below the displayed value.

3.6.3 Receiver Performance

Jitter tolerance is a key performance metric for a receiver. It shows how much input jitter, at a specific jitter frequency, the receiver can tolerate without increasing the BER beyond a target value. Finding the jitter tolerance of the receiver for a specific jitter frequency involves a number of BER simulations with different jitter amplitudes. To start, the receiver is simulated for an initial jitter amplitude, i.e., an initial guess of the jitter tolerance. If the simulated BER is lower than the target value, the jitter amplitude is increased for the next simulation; otherwise, the amplitude is reduced. After a few iterations, the simulations stop when the difference between the amplitude at which the BER is lower than the target ($A_{pass}$) and the one at which the BER is higher than the target ($A_{fail}$) is smaller than a specified value; $A_{pass}$ is reported as the jitter tolerance of the receiver for that specific jitter frequency. The same procedure is repeated for the rest of the desired jitter frequency points.

In our jitter tolerance simulations, the target BER is $10^{-6}$. For each specific jitter amplitude, we simulate the receiver for $10^6$ UIs; if the number of errors in the recovered bits is zero, we consider the receiver capable of tolerating that jitter amplitude. However, if an error occurs during the operation of the receiver, the simulation aborts, and the jitter amplitude is reduced for the next step. A script iterates through the list of jitter frequencies and runs multiple simulations at each frequency until the difference between $A_{pass}$ and $A_{fail}$ is less than 5%. We applied a 100 ppm frequency offset between the transmitter and receiver clocks to ensure that the obtained jitter tolerance, especially at high frequency, is independent of the receiver sampling clock phase.

Fig. 3.13(a) depicts the simulated jitter tolerance of the receiver in the presence of each of the three test channels with and without the DFE when a $2^7-1$ PRBS is used. With Channel 1, the receiver can tolerate up to 0.54 UI$_{PP}$ of SJ at high frequency when the DFE is disabled. With Channel 2, the high-frequency jitter tolerance falls to less than 0.06 UI$_{PP}$, and with Channel 3, the receiver fails to achieve a BER of $10^{-6}$ even when no SJ is applied, i.e., the jitter tolerance is zero (or undefined) and hence not shown in the figure.
When the DFE is enabled, however, the receiver achieves high-frequency jitter tolerances of 0.32 UI$_{PP}$ and 0.28 UI$_{PP}$ in the presence of Channel 2 and Channel 3, respectively. The simulated jitter tolerance of the receiver for $2^{31}-1$ PRBS data is depicted in Fig. 3.13(b). For Channel 2 and Channel 3, the jitter tolerance is zero when the DFE is disabled. Enabling the DFE improves the receiver performance considerably; the high-frequency jitter tolerance is more than 0.17 UI$_{PP}$ for Channel 2 and more than 0.13 UI$_{PP}$ for Channel 3.
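The amplitude search used for each jitter frequency can be sketched as a bracketing search. This is not the thesis script: the function name `jitter_tolerance`, the doubling step before a failing amplitude is found, the interpretation of "less than 5%" as relative to $A_{fail}$, and the callback `passes` (standing in for one error-free $10^6$-UI simulation) are all assumptions.

```python
from typing import Callable, Optional


def jitter_tolerance(
    passes: Callable[[float], bool],
    initial_amplitude: float,
    rel_tol: float = 0.05,
    max_iter: int = 60,
) -> float:
    """Search for the largest passing jitter amplitude (A_pass).

    passes(a) should return True when a run with jitter amplitude a is
    error-free. The search assumes that passing is monotonic, i.e. the
    receiver passes below some threshold amplitude and fails above it.
    Returns 0.0 if no tested amplitude passes.
    """
    a_pass = 0.0
    a_fail: Optional[float] = None
    amplitude = initial_amplitude
    for _ in range(max_iter):
        if passes(amplitude):
            a_pass = amplitude
            # Grow until a failing amplitude brackets the tolerance.
            amplitude = amplitude * 2.0 if a_fail is None else (a_pass + a_fail) / 2.0
        else:
            a_fail = amplitude
            amplitude = (a_pass + a_fail) / 2.0
        if a_fail is not None and a_pass > 0.0 and (a_fail - a_pass) <= rel_tol * a_fail:
            break
    return a_pass
```

A script in the spirit of the text would call this once per jitter frequency and plot the returned $A_{pass}$ values against frequency.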

Figure 3.13: Simulated receiver jitter tolerance. (a) With $2^7-1$ PRBS & 100 ppm $\Delta f_{TX-RX}$ (curves: Channel 1 with DFE disabled, Channel 2 with DFE disabled, Channel 2 with DFE enabled, Channel 3 with DFE enabled). (b) With $2^{31}-1$ PRBS & 100 ppm $\Delta f_{TX-RX}$ (curves: Channel 1 with DFE disabled, Channel 2 with DFE enabled, Channel 3 with DFE enabled). Target BER = $10^{-6}$. Axes: jitter tolerance (UI$_{PP}$) versus frequency (Hz).

3.7 Verilog Implementation

This section briefly describes the implementation of the digital block in Verilog and the procedure followed to verify the register transfer level (RTL) description. Fig. 3.14 shows the simplified block diagram of the entire digital logic implemented in Verilog. The CDRT module is an exact one-to-one RTL description of the digital DFE/CDR block in the Simulink model. The TSTREG module is the input test register used to write data into the programmable registers of the receiver, e.g., the DFE coefficients in the DFE LUT, $\alpha_{[0:7]}$. The DATVER module is a PRBS comparator that receives the 7, 8, or 9 bits recovered by the CDRT and counts the number of errors (it includes a FIFO to sort the data).

The rest of the test structures consists of a parallel monitor and a serial monitor. The parallel monitor, the CDRMON module, provides observability of various signals in the digital logic (through 8 output pins on the test-chip), such as the DFE/CDR input (i.e., the demuxed ADC samples), the recovered average phase ($\Phi_{AVG}$), the output of the DATVER error counter, etc. The serial monitor, the SERMON module, also enables observation of many internal signals for debugging purposes; in this case, the signals are shifted out serially. An important job of this monitor is to provide a snapshot of the important CDRT signals in real operation; this information is very useful for verifying the operation of the CDRT internal modules.

Figure 3.14: Simplified block diagram of the digital logic.

To verify the RTL description of the DFE/CDR, we cross-checked the CDRT module with the digital DFE/CDR model in Simulink.
All signals in the Simulink model are represented with discrete values using a limited number of bits, identical to those later used in the Verilog implementation (note that all the simulation results presented in Section 3.6 were obtained using the final cross-checked model). First, the model was simulated for $10^6$ UIs, and the demuxed ADC samples (i.e., the inputs to the DFE/CDR) as well as all the internal DFE/CDR signals and recovered bits (i.e., the outputs) were saved. The stored ADC samples were then applied as test input vectors to the CDRT module in a Verilog test-bench; the resulting internal signals and recovered bits were compared with the corresponding Simulink outputs to ensure that the RTL description of the digital DFE/CDR exactly matches the Simulink model. The test structures and the top-level Verilog model, however, were only tested in Verilog test-benches due to the lack of matching Simulink models and a shortage of time.

The complete receiver, including the ADC, DeMUX, and synthesized DFE/CDR, was fabricated in Fujitsu's 65-nm CMOS technology; Chapter 4 describes the measurements performed on the test-chip.

3.8 Summary

This chapter described the proposed DFE scheme and presented the design of a speculative DFE for a 2× blind ADC-based receiver. It showed how the DFE is combined with the CDR to form the complete digital back-end. The look-ahead demuxed-by-8 structure of the digital DFE/CDR was then described, and the complete receiver architecture was presented. The behavioural simulations of the receiver in Simulink were described. Three test channels were used to verify the performance of the DFE. In the presence of the channel with the highest attenuation (i.e., 13.3 dB) and using $2^7-1$ PRBS data, the BER of the receiver without the DFE is worse than $10^{-7}$; when the DFE is enabled, however, the receiver achieves a BER of $10^{-7}$ (or better) and tolerates 0.28 UI$_{PP}$ of high-frequency sinusoidal jitter. The chapter then described the Verilog implementation of the digital logic.
It also discussed the effect of phase detection and data detection schemes on the architecture of the receiver as well as alternative architectures which can be used to reduce the hardware requirements. (See Appendix A for the details of these architectures.)

4 Experimental Results

This chapter presents the experimental results of the receiver fabricated in Fujitsu's 65-nm CMOS process. Fig. 4.1 depicts a simplified block diagram of the test setup. The receiver was tested at the targeted data rate of 5 Gb/s using three test channels. Table 4.1 presents a summary of the channels' characteristics. The maximum channel loss at the Nyquist frequency of 2.5 GHz is 13.3 dB.

Figure 4.1: Simplified block diagram of the test setup.

Table 4.1: Summarized characteristics of the test channels.

Name        Description                        Loss at 2.5 GHz (dB)
Channel 1   40" SMA cables                     0.7
Channel 2   26" FR4 trace & 80" SMA cables     12.1
Channel 3   34" FR4 trace & 80" SMA cables     13.3

Section 4.1 presents the receiver layout and the measurement setup. Section 4.2 describes the measurements of the test channels. Sections 4.3, 4.4, and 4.5 present the measurement results of the ADC, the proposed DFE, and the complete receiver, respectively. Section 4.6 compares the simulated BER and jitter tolerance to the measured values. Section 4.7 summarizes this chapter.

4.1 Receiver Layout and Equipment Setup

Fig. 4.2 shows a micrograph of the fabricated test-chip along with a table describing the different parts of the receiver. The AFE, including cells A to D, has a customized layout, whereas the digital block, which consists of cells E and F, is synthesized. The digital DFE/CDR and the entire receiver (excluding the test structures) occupy mm$^2$ and mm$^2$, respectively.

All measurements of the receiver were performed with on-die probing using a probe-card. Table 4.2 describes the pins of the test-chip. In particular, the differential receiver clock and the differential 5 Gb/s data are applied to RXCLKP/RXCLKN and RXINP/RXINN, respectively. CLK/16 is a 312.5 MHz clock needed for the read operation of the FIFO inside the PRBS comparator.

A list of the measurement equipment is provided below. Fig. 4.3 depicts the details of the measurement setup.

Centellax OTB3P1A 10-Gb/s PRBS Generator
Tyco Z-PACK MAX Customer Kit (backplane channel)
Signal Gen. 1: Agilent E8257D PSG Analog Signal Generator (250 kHz to 67 GHz)
Signal Gen. 2: HP 83620B Synthesized Sweeper (10 MHz to 20 GHz)
Signal Gen. 3: Rohde & Schwarz SMT 03 Signal Generator (5 kHz to 3.0 GHz)
Sony/Tektronix DG2020A Data Generator
Agilent Infiniium DCA-J 86100C Digital Communications Analyzer
Tektronix TLA 714 Logic Analyzer
HP 8565E Spectrum Analyzer (30 Hz to 50 GHz)
Agilent E3646A/E3631A Triple/Dual Output Power Supplies (×4)
Meca 665-dB-1 3 dB & 6 dB SMA Attenuator (×2)
Picosecond 5828A Ultra-Broadband Amplifier (10 dB Gain & 14 GHz BW) (×3)
Mini-Circuits ZX86-12G-S+ Bias-Tee (×3)
Narda Hybrid (2 to 18 GHz) (×2)

Figure 4.2: Micrograph of the test-chip. Process: 65-nm CMOS. Data rate: 5 Gb/s. Supply: 1.2 V.

Cell   Block                Implementation                        Area
A      BGR & Bias Gen.      Analog front-end (customized layout)  170 x 140 μm$^2$
B      Input Buffers        Analog front-end (customized layout)  50 x 60 μm$^2$
C      4x 2.5GS/s ADCs      Analog front-end (customized layout)  400 x 490 μm$^2$
D      4:16 DeMUX           Analog front-end (customized layout)  60 x 490 μm$^2$
E      Test Structures      Digital block (synthesized)
F      Digital DFE/CDR      Digital block (synthesized)

Table 4.2: Pin description of the test-chip.

Pin                 Description
RXCLKP, RXCLKN      Receiver clock differential input
VCMC                Receiver clock common-mode level
CLK/8EN             Enable signal for demuxed-by-8 clock
CLK/16              PRBS comparator clock input
CLK/16EN            Enable signal for CLK/16
RXINP, RXINN        Data differential input
SEROUT              Serial Monitor output
SERLOAD             Serial Monitor load/shift signal
SERCLK              Serial Monitor clock for shifting data out
Dout[0:7]           Parallel Monitor output
CLKOUT              Parallel Monitor synchronous clock
DataEN, AdrsEN      Test register data/address enable
Din[0:5]            Test register input
VDN, VSN            ADC supply
VDD3, VSS3          DeMUX and digital block supply
VDU, VSU            Frequency divider supply
VDDO, VSSO          I/O supply
ICPD                Test-chip power-down signal
ICRSTX              Test-chip reset signal

Figure 4.3: Measurement setup. (Blocks shown: Signal Gen. 1 to 3, PRBS generator, 180° hybrids, bias-Ts, amplifiers, attenuators, backplane, probe card, DUT, DC power supplies, logic analyzer, and data generator. Signal labels include the 5GHz single-ended and differential clocks, the 5Gb/s PRBS data, a 625MHz single-ended clock, 0.6V DC bias levels, and 100kHz to 8MHz sinusoidal jitter.)

To use the full 1Vpp,diff input range of the ADCs at the front-end of the receiver, the output of the Centellax PRBS generator has to be boosted by 4dB in the presence of Channel 1 and by 7dB in the presence of Channel 2 and Channel 3. As illustrated in Fig. 4.3, a combination of amplifiers and attenuators provides the required boost.

4.2 Channel Measurements

This section describes the measurements performed on the three test channels. As listed in Table 4.1, Channel 1 consists of only a pair of 40" SMA cables, whereas Channel 2 and Channel 3 consist of FR4 traces (in a Tyco Customer Kit backplane channel) and 3 pairs of SMA cables with a total length of more than 80". The Channel 2 trace consists of two 5" traces on two daughtercards and a 16" trace on the backplane, while Channel 3 uses a longer trace. The next subsection presents the measured s-parameters and describes the extraction of the step responses of the channels. The subsection after it presents the measured eye diagrams and jitter characteristics of the transmitted data at the input and output of the channels.

Measured S-parameters and Step Responses

The s-parameters of the three test channels were measured using a vector network analyzer (VNA). Spectre was used to generate the step responses of the channels from the measured s-parameters; the measured rise and fall times of the PRBS data at the input of the channels (presented in the following section) were also included in the Spectre simulations to obtain more realistic results. The step responses were exported to Simulink for receiver simulations and calculation of the DFE coefficients. The measured S21 of the channels is presented in Fig. 4.4. The attenuations at the Nyquist frequency of 2.5GHz are 0.7dB, 12.1dB, and 13.3dB for Channel 1, Channel 2, and Channel 3, respectively.

Figure 4.4: Measured S21 of the test channels. (Three panels, one per channel, plot 20Log(S21) in dB versus frequency in Hz; the Channel 1 and Channel 2 panels are annotated -0.7dB and -12.1dB.)
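To put these attenuation figures in linear terms, the short sketch below converts the quoted attenuation at 2.5GHz from decibels to a voltage ratio using ratio = 10^(-dB/20). The function and dictionary names are illustrative and do not come from the thesis.

```python
# Illustrative helper (not from the thesis): convert an attenuation in dB
# to the fraction of signal amplitude that survives the channel.

def attenuation_db_to_ratio(att_db: float) -> float:
    """Return the linear voltage ratio for an attenuation given in dB."""
    return 10.0 ** (-att_db / 20.0)

# Attenuation at the 2.5GHz Nyquist frequency, as quoted in the text.
NYQUIST_ATTENUATION_DB = {"Channel 1": 0.7, "Channel 2": 12.1, "Channel 3": 13.3}

ratios = {
    name: attenuation_db_to_ratio(att_db)
    for name, att_db in NYQUIST_ATTENUATION_DB.items()
}

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:.3f} of the input amplitude remains at 2.5GHz")
```

Under this conversion, Channel 3 passes roughly 22% of the signal amplitude at the Nyquist frequency, which is consistent with the nearly closed eye reported for that channel.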

Measured Eye Diagrams and Jitter Characteristics

We placed each of the test channels in the setup and measured the eye diagrams and jitter characteristics of the transmitted data at the input and output of the channel (note that the output of the channel is the probe-card input). All measurements were performed using PRBS data at the targeted data rate of 5Gb/s. The detailed measurement results are presented in the next few pages separately for each channel; here, we present an overview of all the results in a table. As can be seen in Table 4.3, Channel 1 has little effect on the data, so there is no need for equalization; this can also be observed in the simulation results of Section 3.6. However, Channel 2 and Channel 3 considerably decrease the vertical and horizontal eye opening of the data; at the output of Channel 3, the vertical eye opening is less than 100mVpp,diff and the total jitter (TJ) is 0.77UI.

Table 4.3: Summary of eye opening and jitter values for the test channels.

Channel   Input/Output Vertical Eye Opening (mV)   Input/Output Horizontal Eye Opening (UI)   Input/Output TJ (UI)
1         858/776                                  [missing]                                  0.087/0.091
2         [missing]/240                            [missing]/0.5                              0.080/0.604
3         [missing]/<100                           [missing]/<0.3                             0.080/0.772

Fig. 4.5 shows the data path in the presence of Channel 1. The output of the PRBS generator is boosted by 4dB, passed through the channel, and fed to the input of the probe-card. Fig. 4.6 presents the measured eye diagrams and jitter characteristics of the data.

Figure 4.5: Data path with Channel 1. (PRBS generator (5GHz), 4dB boost, 40" SMA cables, DUT; scope at the channel input and output.)

Figure 4.6: Measured eye diagrams and jitter characteristics with Channel 1. (a) Measured eye at Channel 1 input (annotated 1060mVpp,diff). (b) Measured eye at Channel 1 output (annotated 776mVpp,diff and 1006mVpp,diff). (c) Measured jitter at Channel 1 input: TJ = 17.4ps = 0.087UI. (d) Measured jitter at Channel 1 output: TJ = 18.1ps = 0.091UI.

Fig. 4.7 shows the data path with Channel 2. The output of the PRBS generator is boosted by 7dB to use the entire input range of the ADC, passed through the channel, and fed to the input of the probe-card. Fig. 4.8 depicts the measured eye diagrams and jitter characteristics of the data at the input and output of the channel. (Note that the data at the channel input is first attenuated by 6dB and then connected to the scope; thus, each vertical division in Fig. 4.8(a) represents 200mV.)

Figure 4.7: Data path with Channel 2. (PRBS generator (5GHz), 7dB boost, SMA cables, 5" and 16" traces, SMA cables, DUT; scope at the channel input through a -6dB attenuator and at the channel output.)

Figure 4.8: Measured eye diagrams and jitter characteristics with Channel 2. (a) Measured eye at Channel 2 input (annotated 1513mVpp,diff). (b) Measured eye at Channel 2 output (annotated 240mVpp,diff and 100ps, or 0.5UI). (c) Measured jitter at Channel 2 input: TJ = 16.0ps = 0.080UI. (d) Measured jitter at Channel 2 output: TJ = 120.8ps = 0.604UI.
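The 200mV-per-division figure follows from the 6dB attenuator roughly halving the voltage seen by the scope. The sketch below shows the arithmetic; the 100mV/div scope setting is an assumption inferred from the stated result, not a value given in the text, and the names are illustrative.

```python
# Hypothetical scope setting; the text only states the resulting 200mV/div.
SCOPE_MV_PER_DIV = 100.0   # assumed on-screen vertical scale
ATTENUATOR_DB = 6.0        # attenuator placed before the scope (from the text)

def db_to_voltage_ratio(gain_db: float) -> float:
    """Convert a gain in dB to a linear voltage ratio."""
    return 10.0 ** (gain_db / 20.0)

# Undo the attenuation to refer the scope scale back to the channel input.
actual_mv_per_div = SCOPE_MV_PER_DIV * db_to_voltage_ratio(ATTENUATOR_DB)

if __name__ == "__main__":
    print(f"{actual_mv_per_div:.1f} mV per division at the channel input")
```

A 6dB step corresponds to a voltage factor of about 1.995, so the result is within 1mV of the 200mV quoted above.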

Fig. 4.9 shows the data path in the presence of Channel 3. The output of the PRBS generator is boosted by 7dB, passed through the channel, and fed to the input of the probe-card. The measured eye diagrams and jitter characteristics of the data at the input and output of the channel are presented in Fig. 4.10. (Note that, as in Fig. 4.8(a), a 6dB attenuator is used for the signal shown in Fig. 4.10(a).)

Figure 4.9: Data path with Channel 3. (PRBS generator (5GHz), 7dB boost, SMA cables, 5" and 24" traces, SMA cables, DUT; scope at the channel input through a -6dB attenuator and at the channel output.)

Figure 4.10: Measured eye diagrams and jitter characteristics with Channel 3. (a) Measured eye at Channel 3 input (annotated 1513mVpp,diff). (b) Measured eye at Channel 3 output (annotated <100mVpp,diff and <60ps, or 0.3UI). (c) Measured jitter at Channel 3 input: TJ = 16.0ps = 0.080UI. (d) Measured jitter at Channel 3 output: TJ = 154.3ps = 0.772UI.
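The jitter figures for the three channels are quoted both in picoseconds and in unit intervals (UI). As a quick check, the sketch below converts between the two at the 5Gb/s data rate, where one UI is 200ps. The names are illustrative; the picosecond values are the TJ annotations from Figs. 4.6, 4.8, and 4.10.

```python
DATA_RATE_BPS = 5e9                   # targeted data rate (from the text)
UI_PS = 1e12 / DATA_RATE_BPS          # one unit interval in picoseconds

def ps_to_ui(t_ps: float) -> float:
    """Express a time interval in picoseconds as a fraction of one UI."""
    return t_ps / UI_PS

# Total jitter annotations (input, output) in picoseconds for each channel.
MEASURED_TJ_PS = {
    "Channel 1": (17.4, 18.1),
    "Channel 2": (16.0, 120.8),
    "Channel 3": (16.0, 154.3),
}

tj_ui = {
    name: (ps_to_ui(tj_in), ps_to_ui(tj_out))
    for name, (tj_in, tj_out) in MEASURED_TJ_PS.items()
}

if __name__ == "__main__":
    for name, (tj_in, tj_out) in tj_ui.items():
        print(f"{name}: TJ in = {tj_in:.4f} UI, TJ out = {tj_out:.4f} UI")
```

The converted values agree with the UI figures quoted in the captions to within rounding.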

4.3 ADC Performance

To verify the functionality of the ADC, we generated the eye diagram of the data from the measured ADC samples at the output of the DeMUX (the ADC eye diagram) and compared it with the data eye diagram measured at the probe-card input. Fig. 4.11 illustrates the setup. With the PRBS data transmitted at 5Gb/s, we set the frequency of the receiver sampling clock to 5.001GHz and saved the demuxed ADC samples at one DeMUX output using a logic analyzer. The measured samples were then exported to Matlab and rearranged according to the 1MHz frequency offset. Fig. 4.12 depicts the resulting eye diagrams and the eye diagrams measured at the probe-card input for the three test channels, verifying ADC operation. Note that the eye of the data received by the digital block is open by only 3 LSBs (or 94mV) in the presence of Channel 2 and is completely closed in the presence of Channel 3.

Figure 4.11: Testing ADC functionality. (Blocks shown: PRBS generator with 5.000GHz TX clock, 4dB/7dB boost, test channel, scope, DUT containing the ADC, digital DFE/CDR, and PRBS comparator with 5.001GHz RX clock, logic analyzer, and Matlab.)

The above test verifies the functionality of only one of the four interleaved ADCs, since the data samples collected from one DeMUX output are all generated by the same ADC. Therefore, we repeated the test for three other DeMUX outputs belonging to the other ADCs; the obtained ADC eye diagrams were all similar to those shown in Fig. 4.12.
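The rearrangement of the logged samples can be sketched as follows. This is a minimal illustration and not the Matlab script used in the thesis. It assumes that the receiver takes two samples per RX clock cycle (2x blind sampling), that one of 16 evenly spaced DeMUX outputs is logged, and that both clocks are ideal; all names are hypothetical.

```python
F_DATA_HZ = 5.000e9         # transmitted data rate (from the text)
F_RX_CLK_HZ = 5.001e9       # receiver sampling clock (from the text)
SAMPLES_PER_RX_CYCLE = 2    # assumption: 2x blind sampling
DEMUX_OUTPUTS = 16          # assumption: one of 16 DeMUX outputs is logged

# Sample rate of the single logged DeMUX output.
F_LOGGED_HZ = F_RX_CLK_HZ * SAMPLES_PER_RX_CYCLE / DEMUX_OUTPUTS

# Data UIs that elapse between two consecutive logged samples.
UI_PER_LOGGED_SAMPLE = F_DATA_HZ / F_LOGGED_HZ

def sample_phase_ui(n: int, eye_span_ui: float = 2.0) -> float:
    """Phase of the n-th logged sample, folded into [0, eye_span_ui) UI."""
    return (n * UI_PER_LOGGED_SAMPLE) % eye_span_ui

def fold_into_eye(samples, eye_span_ui: float = 2.0):
    """Pair each ADC code with its folded phase and sort by phase."""
    points = [
        (sample_phase_ui(n, eye_span_ui), code)
        for n, code in enumerate(samples)
    ]
    return sorted(points)

if __name__ == "__main__":
    # Toy input: three consecutive ADC codes from one DeMUX output.
    for phase, code in fold_into_eye([5, 7, 9]):
        print(f"phase = {phase:.4f} UI, code = {code}")
```

Under these assumptions, each logged sample lands about 0.0016 UI earlier in the eye than the previous one, so the frequency offset slowly sweeps the sampling instant across the data eye and plotting code against folded phase reproduces an eye diagram.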


Figure 4.12: Eye diagrams measured at the probe-card input and ADC output. (a) Channel 1. (b) Channel 2. (c) Channel 3. (Each panel shows the probe-card input eye diagram and the ADC output eye diagram; axes: sample time in UI and 5-bit ADC output.)
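The 3-LSB (94mV) eye opening quoted for Channel 2 in Section 4.3 can be checked with a short calculation. The sketch below assumes an ideal 5-bit ADC whose 32 uniform levels span the 1Vpp,diff input range mentioned in Section 4.1; the names are illustrative.

```python
ADC_BITS = 5               # resolution, per the "5-bit ADC Output" axis label
FULL_SCALE_MV = 1000.0     # 1Vpp,diff ADC input range (from Section 4.1)

def lsb_mv(bits: int, full_scale_mv: float) -> float:
    """Size of one LSB for an ideal uniform quantizer."""
    return full_scale_mv / (2 ** bits)

lsb = lsb_mv(ADC_BITS, FULL_SCALE_MV)
eye_opening_mv = 3 * lsb   # Channel 2 eye opening of 3 LSBs (from the text)

if __name__ == "__main__":
    print(f"1 LSB = {lsb} mV, 3 LSBs = {eye_opening_mv} mV")
```

With these assumptions one LSB is 31.25mV, so three LSBs come to 93.75mV, which rounds to the quoted 94mV.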
