
High Speed I/O 2-PAM Receiver Design
EE215E Project: Signaling and Synchronization
Submitted by: Amrutha Iyer, Kalpana Manickavasagam, Pritika Dandriyal, Joseph P Mathew

Problem Statement

Design a high-speed adaptive 2-PAM receiver with the goal of maximizing data rate at the lowest possible power. The design should maintain a BER of 1e-12 over two FR4 channels with losses of approximately 26 dB and 36 dB at 5 GHz, respectively. The design should also maintain a speed greater than 75% of nominal across PVT.

Problem Analysis

The receiver has the top-level topology shown in fig 0.

Fig 0 - Top level diagram

The design of the receiver is closely tied to the nature of the channel it has to work with. Simulation of the FR4 channel in the time and frequency domains gave the following results.

fig 1 - Test setup for channel

fig 2 - Step response of channel
fig 3 - Impulse response of channel
fig 4 - Frequency response of channel for scale = 1

fig 5 - Frequency response of channel for scale = 0.85

The time-domain response shows that the channel has a delay of roughly 5.5 ns, which depends on the channel. Channel 2 (scale = 0.85) has almost 10 dB of extra loss at 5 GHz. Both channels clearly show an initial roll-off of 20 dB/decade and then an abrupt drop in the vicinity of 5 GHz, which suggests canceling the first pole with a zero so as to widen the bandwidth. A modest target data rate of 10 Gbps was derived from the containment bandwidth of the second, lossier channel.

Since both the IIR zero and the feed-forward equalizer tend to amplify noise, a minimal number of feed-forward taps was chosen, subject to not complicating the decision feedback equalizer (DFE) design or requiring a large number of DFE taps. As a first step, the zero of the preamp was placed at the MMSE point by iterating its position until the residual ISI was minimized. The response of the preamp with the zero is modeled as a two-pole, single-zero transfer function, together with its impulse-invariant discrete-time model. The zero location, specified as T1, is the adaptive element and was used to run MMSE. T2 and T3 were chosen as the maximum values that do not degrade ISI performance significantly. T2 and T3 were observed to be T1/4 and T1/6, respectively.
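The preamp model described above can be sketched numerically. The snippet below is a minimal illustration, assuming the transfer function has the form H(s) = A0 (1 + s T1) / ((1 + s T2)(1 + s T3)); that exact form, the function name, and the choice of A0 = 0.7, a zero at 1 GHz, and a 1 ps sampling period are assumptions taken from values quoted elsewhere in this report, not a reproduction of the original equations. It expands H(s) in partial fractions and samples the impulse response (impulse invariance).

```python
import math

def preamp_impulse_response(a0, t1, t2, t3, ts, n):
    """Impulse-invariant samples ts*h(k*ts) of the assumed model
    H(s) = a0*(1 + s*t1) / ((1 + s*t2)*(1 + s*t3)), with t2 != t3."""
    # Partial fractions: H(s) = k2/(1 + s*t2) + k3/(1 + s*t3)
    k2 = a0 * (t2 - t1) / (t2 - t3)
    k3 = a0 * (t3 - t1) / (t3 - t2)
    samples = []
    for k in range(n):
        t = k * ts
        # Each first-order term k/(1 + s*T) has impulse response (k/T)*exp(-t/T)
        h_t = (k2 / t2) * math.exp(-t / t2) + (k3 / t3) * math.exp(-t / t3)
        samples.append(ts * h_t)
    return samples

# Assumed values: zero at 1 GHz, T2 = T1/4, T3 = T1/6, 1 ps sampling period
t1 = 1.0 / (2.0 * math.pi * 1e9)
h = preamp_impulse_response(0.7, t1, t1 / 4.0, t1 / 6.0, 1e-12, 2000)
dc_gain_estimate = sum(h)  # close to a0; the discrete sum is only approximate
```

The sum of the samples approximates the DC gain, which gives a quick sanity check on the partial-fraction coefficients before the model is placed in the channel chain.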

One key observation at this point was that, due to the nature of the channel, the impulse response contains a few precursors: strong components ahead of the strongest impulse component, which cause ISI from 'future' bits. These cannot be cancelled by the DFE, and they tend to grow in magnitude and number if the zero is placed aggressively, which the MMSE solution does not account for. So the zero should be placed just over the pole; otherwise the peaking in the frequency response adds more precursors that are difficult to deal with.

The impulse response of the system after the preamp then had a strong precursor element and a strong post-cursor element. Since the precursor cannot be removed by the DFE, it has to be minimized by the feed-forward equalizer (FFE). Also, since the comparator was observed to have a technology-limited delay of roughly 100 ps, a DFE tap for the first post-cursor would need loop unrolling, which would double the power because everything from the point where the DFE is added has to be duplicated. So the first post-cursor also has to be removed by the FFE. The FFE taps are obtained by solving a matrix equation in which the slicer input is modeled in discrete sampled form with sampling period Tb. The DFE taps were chosen as the remaining impulse elements that contribute significantly to ISI, and only 3 DFE taps were found to be sufficient for a good enough eye opening.

Adaptation is key to maintaining the data rate across varying channels and across PVT. Block sign-sign LMS is used to achieve MMSE adaptation without considering noise, i.e. for maximal eye opening. It was chosen for its ease of implementation, since it requires only a few extra threshold detectors to measure the eye opening, and because it converges fast. This worked well for both the FFE and DFE taps, but for the IIR zero adaptation a bottom window and a top window were needed to prevent it from attempting to flatten the channel by peaking. Another approach would have been to add a weighted penalty to the overshoot compared to the undershoot.

The FFE was implemented in the sampled domain, as the RC delay line based method was not giving good results. A 4X demuxed approach using 4 FFEs and 4 comparators running on 4 phases of a 2.5 GHz clock was needed to achieve the required throughput, leading to the detailed block diagram in fig 6.

The target BER of 1e-12 corresponds to Q(eye height / rms noise) = 1e-12, which gives eye height / rms noise = 7. The noise contributors are the preamp, the FFE amplifiers, the comparator, and the sampling switches. These can be estimated and placed in the model. By computing BER versus input swing, we can estimate the BER curve and extrapolate to the input swing of 0.4 V.
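The relation between the BER target and the eye-height-to-noise ratio quoted above can be checked with the Gaussian Q-function. This is a small sketch that follows the report's own convention, BER = Q(eye height / rms noise); the function names and the 70 mV example value are illustrative assumptions.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def max_rms_noise(eye_height_v, ratio=7.0):
    """Largest rms noise (V) that keeps eye_height / noise at the given ratio."""
    return eye_height_v / ratio

ber_at_ratio_7 = q_function(7.0)    # on the order of 1e-12
noise_budget = max_rms_noise(0.070) # noise allowed for a hypothetical 70 mV eye
```

With a ratio of 7 the Q-function evaluates to roughly 1e-12, which is why 7 is used as the design target for eye height over rms noise.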

fig 6 - Final architecture

Circuits and Simulations

A. Preamp and Degenerated Zero

A zero-forcing linear equalizer is implemented as a preamp with frequency-dependent source degeneration. The source-degenerated preamplifier adds a zero, which flattens the channel to some extent. However, it increases the noise linearly with the extra bandwidth gained. The zero location is adapted by varying the degeneration (source) resistance as directed by LMS. The zero was placed according to the calculated T1. T2 and T3 are placed at T1/4 and T1/6, respectively, so that the noise bandwidth is limited and the performance is limited by the channel and not by the preamp. The total current consumed in this block is 6 mA, and the noise contribution is less than 240 uV RMS, output referred, in the nominal corner. The DC gain is 0.7. Simulation was done using the load capacitance estimate from the FFE.

fig 7 - Preamp schematic
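How the degeneration resistance moves the zero can be illustrated with the usual first-order approximation for an RC-degenerated stage, f_z = 1 / (2 pi R C). This is a hedged sketch: the approximation, the function names, and the capacitance and resistance values are assumptions for illustration and are not taken from the schematic.

```python
import math

def zero_frequency_hz(r_deg_ohm, c_deg_farad):
    """First-order estimate of the degeneration zero: 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_deg_ohm * c_deg_farad)

def resistance_for_zero(f_zero_hz, c_deg_farad):
    """Degeneration resistance that places the zero at f_zero_hz."""
    return 1.0 / (2.0 * math.pi * f_zero_hz * c_deg_farad)

C_DEG = 150e-15  # hypothetical degeneration capacitance (F)

# Resistance needed for a 1 GHz zero, the location quoted in this report
r_1ghz = resistance_for_zero(1e9, C_DEG)

# Sweeping hypothetical resistance settings shows the zero moving down as R grows
sweep = {r: zero_frequency_hz(r, C_DEG) for r in (600.0, 800.0, 1000.0, 1200.0)}
```

Under this approximation the adaptation loop only needs a monotonic control: a larger resistance lowers the zero frequency, and a smaller one raises it.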

fig 8 - Preamp frequency response

The DC gain of the circuit is 1.5 with an output common mode of 900 mV. The zero was placed at 1 GHz, which was the optimal zero location obtained from the MATLAB code. The simulations were done for a load capacitance of 200 fF. However, the effective loading is actually lower, so the 20 dB rise will be better than the one quoted.

B. Feed Forward Equalizer

A 2-tap feed-forward equalizer was implemented. For a data rate of 10 Gbps, 4 clock phases are required per arm, with the edges spaced 100 ps apart. The circuit consists of the track-and-hold setup, the feed-forward equalizers, the DFE taps, and a comparator. Although 4 such blocks are needed to work in a time-interleaved manner, the power consumption is low due to the low bandwidth requirement.

A PMOS sampling switch is used since the output common-mode voltage of the preamplifier is 900 mV. During the negative phases of clocks 1/2/3 the input is tracked, and it is held during the positive phases. Three differential amplifiers with tail currents in the approximate ratio a1 : a2 : a3 are used to realize the feed-forward coefficients. The currents are on the order of 100 uA, so the input transistors are small, giving a parasitic capacitance of only 3 fF. Negative coefficients are realized by interchanging the output connections at the summing node. The comparator can then sample during the positive phase of clock 4 and make a decision.
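The effect of a negative feed-forward coefficient on a precursor can be seen in a small sampled-domain sketch. The pulse response values below are hypothetical, and the single pre-tap is chosen by simple zero forcing rather than by the matrix solution used in the project; the function name is also illustrative.

```python
def convolve(a, b):
    """Full discrete convolution of two short sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

# Hypothetical baud-spaced pulse response: precursor, cursor, two post-cursors
pulse = [0.2, 1.0, 0.5, 0.2]

# Pre-tap chosen so the first precursor is nulled (negative coefficient,
# which in the circuit corresponds to swapped outputs at the summing node)
pre_tap = -pulse[0] / pulse[1]
taps = [pre_tap, 1.0]

equalized = convolve(pulse, taps)
# equalized[1] is the first precursor, equalized[2] the new cursor
```

In this example the first precursor is cancelled at the cost of a slightly smaller cursor and a small second precursor, while the post-cursors remain for the later stages to handle.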

fig 9 - Track and hold
fig 10 - FFE amplifier
fig 11 - Frequency response of FFE amplifier

The circuit was tested for equalization by applying a long string of zeros followed by a one and then another string of zeros. This is where the effect of adjacent bits is at its maximum. Fig 12a and 12b show the acquired waveforms. The following nodes are plotted to emphasize the equalization action.

Vin       : the input to the channel
Vchannel  : the output from the channel
Vinsample : the output of the source-degenerated preamplifier
Vclk1     : the 1st sampling clock, for the 1st sample and hold
Vnodelay  : the output of the 1st sample and hold
Vclk2     : the 2nd sampling clock, for the 2nd sample and hold
Delay1    : the output of the 2nd sample and hold
Vclk3     : the 3rd sampling clock, for the 3rd sample and hold
Vnodelay  : the output of the 3rd sample and hold
Vclk4     : the 4th sampling clock; the comparator can sample in its positive phase
Vout      : the final output from the feed-forward equalizer

Note the following in the two graphs below.
a. The channel output rises only to -81.68 mV.
b. The output from the preamplifier rises to -56.83 mV. This shows that the zero equalizes the channel to some extent.
c. The data is sampled in the positive phase of clock 4, and the FFE output is 7.355 mV, showing the extent to which the feed-forward equalizer, together with the source-degenerated preamplifier, equalizes the channel and opens the eye.

fig 12a - Waveforms for FFE - 1

fig 12b - Waveforms for FFE - 2

C. Decision Feed Back Equalizer

Decision feedback equalization is done by summing currents, proportional to the previous bits that cause ISI, onto the output node of the FFE amplifier. This adds parasitic capacitance to the output of the feed-forward equalizer. The simulations were run with a worst-case load of 30 fF at the output node to account for the capacitance of the 3-tap decision feedback equalizer and the comparator. This ensures the cancellation of the h[2], h[4] and h[6] ISI components of the channel response.

D. Comparator

A StrongARM-based implementation (fig 13) was chosen for the slicer comparator. The aim is to maximize the speed so that the DFE loop can operate reliably. The Impulse Sensitivity Function (ISF) was computed and is plotted in fig 14. Since the ISF acts as a low-pass filter, it also causes some ISI, so an attempt was made to minimize its width. The current design achieves Tw = 13 ps.

Memory in the comparator due to improper reset is equivalent to ISI from one past bit and therefore has to be minimized. Since the node sits around mid-supply, it is reset to the supply. However, this slightly increases the time to latch from the clock edge, due to the extra time needed to bring the latch to common mode. This can be improved by pulling that node down faster by increasing the tail current source, which has two positive effects:
a) Increased gain, as seen in the area under the ISF: the input pair is in saturation for more time with more current.
b) Better input-referred noise of the comparator.

fig 13 - Comparator schematic

The final transient performance is plotted in fig 15. The input-referred noise should be similar since, at the trip point of the comparator, the gain of the stage is roughly Gm1/Gm2. The input-referred noise corresponding to the peak of the ISF is around 1 mV RMS, and the worst case is 2 mV. This is the noise of interest, as it corresponds to the ideal sampling point.

fig 14 - Comparator ISF in nominal corner

fig 15 - Comparator in slow-slow corner

E. Adaptation

fig 16 - Block diagram for adaptation

The top-level block diagram for adapting the filter coefficients is shown in fig 16. The zero location is adapted by changing the value of the source resistance of the preamplifier. The filter coefficients are adapted by controlling the tail current sources. According to the MATLAB model, the coefficients change by only 10% when adapting from one channel to the other. The circuit works well for this range of values of gm and R.

D. Model and LMS

The system was modeled as the chain shown in fig 1. The channel was modeled as a linear system using the impulse invariance method with a sampling period of 1 ps. The preamp was modeled by its two-pole, single-zero impulse response. Provisions for adding noise and offset are in the model as well.

Sign-sign LMS is a convenient way to adapt the coefficients of both the DFE and the FFE. The error estimate can be based on BER or on eye opening. Since BER-based adaptation takes a long time to converge once the eye is open, a good scheme would be to use the window-based method to open the eye quickly and then use the BER-based method to fine-tune the coefficients. However, as of now only the window-based method is implemented.

The code is partly in MATLAB and partly in C called from MATLAB. The latter was needed because the iterative feedback loops of the adaptive DFE would require an enormous amount of time if run in MATLAB. To put this in scale: with MATLAB alone we could only run a 40,000-bit simulation in 10 minutes, barely enough to adapt one coefficient, while the new scheme runs the nearly 500,000 bits needed to adapt all coefficients in under 2 minutes. This will also enable more accurate BER prediction once implemented.

Eye diagrams for channel scale 1, computed using the model for 10 Gbps operation, are shown in fig 17a and 17b, and the eye diagram for channel scale 0.85 is shown in fig 18. For the second experiment, with channel 2, the initial coefficient estimates were those of channel 1, and the adaptation algorithm calibrated the coefficients to open the eye. Adaptation of a few coefficients is shown in fig 19 and fig 20.

fig 17a - Eye diagram for channel response with equalization
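The sign-sign LMS update described in this section can be sketched for a 3-tap DFE as follows. This is a simplified per-bit version under stated assumptions, not the block, window-based implementation used in the project: the post-cursor values, step size, bit count, and function names are hypothetical, the channel has a unit cursor and no noise, and the error is measured against ideal eye levels of +1 and -1.

```python
import random

def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)

def adapt_dfe(post_cursors, mu=0.001, n_bits=20000, seed=0):
    """Adapt DFE taps with sign-sign LMS on random +/-1 data."""
    rng = random.Random(seed)
    n_taps = len(post_cursors)
    taps = [0.0] * n_taps
    sent = [1] * n_taps     # most recent transmitted bits, newest first
    decided = [1] * n_taps  # most recent decisions, newest first
    for _ in range(n_bits):
        bit = rng.choice((-1, 1))
        # Channel: unit cursor plus post-cursor ISI from earlier bits
        received = bit + sum(h * d for h, d in zip(post_cursors, sent))
        # DFE: subtract tap-weighted past decisions before slicing
        slicer_in = received - sum(w * d for w, d in zip(taps, decided))
        decision = 1 if slicer_in >= 0 else -1
        error = slicer_in - decision
        # Sign-sign update: only the signs of the error and the data are used
        for k in range(n_taps):
            taps[k] += mu * sign(error) * decided[k]
        sent = [bit] + sent[:-1]
        decided = [decision] + decided[:-1]
    return taps

channel_post_cursors = [0.3, 0.15, 0.05]  # hypothetical values
adapted_taps = adapt_dfe(channel_post_cursors)
```

Because only signs are used, each tap moves by a fixed step per bit and ends up dithering within a few step sizes of the corresponding post-cursor, which is the trade-off between convergence speed and residual error that the step size controls.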

fig 17b - Eye opening for channel with scale = 1
fig 18 - Eye opening for channel scale = 0.85

fig 19 - Adaptation of coefficient T1 with channel change
fig 20 - Adaptation of coefficient a1 with channel change

E. Circuit Performance Simulation across Corners at R = 0.75 Rmax

fig 21 - The input pulse
fig 22 - Vout in the nominal corner
fig 23 - Vout for the slow-slow corner

An isolated zero amidst a series of ones is used to test the equalization action. It can be clearly seen that the eye is open, showing that the circuit can operate in the slow corner.

F. Noise Summary

Source                               Output referred Noise Power (W)
Preamp                               5.78E-008
FFE Amplifier                        1.96E-006
KT/C - 6 sampling caps               5.00E-006
Comparator (peak of PSS/PNOISE)      4.00E-006

G. Gain Summary

Component                            Gain
Preamplifier                         0.7
FFE Amplifier                        1.4

H. Final Performance

Power                                8.7 mW
Speed                                10 Gbps
Figure of Merit                      0.87 mW/Gb/s
Vdd                                  1.2 V
Technology                           90 nm
Worst case eye opening, scale 0.85   70 mV
Worst case eye opening, scale 1.0    128 mV
Target BER                           1.00E-012

The power estimates for the individual components are enumerated below:

Preamplifier                         7.2 mW
Feed Forward                         1.2 mW
DFE                                  0.3 mW

I. Conclusion

A high-speed receiver targeting FR4 channels was designed. A peak data rate of 10 Gbps was achieved at a power of 8.7 mW, which corresponds to a figure of merit of 0.87 mW/Gb/s. The most challenging parts of the design were the linear feed-forward equalizer and the implementation of the LMS-based adaptation routine. The pending activities include estimating the actual BER performance across input swings to ensure that we are indeed achieving a BER of 1e-12. We would also like to improve the model of the FFE by including the ISF of the track-and-hold PMOS switch.
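As a cross-check of the power and figure-of-merit numbers in the final performance summary, the arithmetic can be written out as below. All values are taken from this report; the variable names are illustrative.

```python
# Per-block power estimates from the report (mW)
block_power_mw = {"preamplifier": 7.2, "feed_forward": 1.2, "dfe": 0.3}

DATA_RATE_GBPS = 10.0
VDD_V = 1.2
PREAMP_CURRENT_MA = 6.0

total_power_mw = sum(block_power_mw.values())           # about 8.7 mW
figure_of_merit = total_power_mw / DATA_RATE_GBPS       # mW per Gb/s
preamp_power_from_current = PREAMP_CURRENT_MA * VDD_V   # mA * V = mW
```

The block powers add up to the quoted total, the preamp power agrees with its 6 mA current at a 1.2 V supply, and dividing the total by the data rate reproduces the quoted figure of merit.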