A CMOS Multi-Gb/s 4-PAM Serial Link Transceiver*
March 11, 1999
Ramin Farjad-Rad
Center for Integrated Systems, Stanford University, Stanford, CA 94305
*Funding from LSI Logic, SUN Microsystems, and Powell foundation
Goals
[Figure: link block diagram with Tx, copper cable with two R_TERM terminations, and Rx with timing recovery]
- Networking high-speed (5-10Gbps) systems for ranges up to 10 meters at lower cost and complexity.
  - Parallel buses are costly for long distances.
  - Optical fibers are not beneficial for such small ranges.
  - Serial links on copper cables are an attractive solution for this kind of application.
- Push the bandwidth limitations of CMOS serial links.
  - CMOS technology is getting cheaper, faster, and more available.
  - Integrate more digital functions on-chip.
Outline
- Challenges
- System Architecture
- Circuit Implementation
- Test Results
- Conclusion
Challenges: Interconnection Bandwidth
- Frequency-dependent attenuation in electrical links is due to skin-effect resistance and dielectric loss.
- The -3dB BW of 10-meter PE-142 coax is ~1.0GHz.
[Figure: cable frequency response, amplitude (V) vs. frequency (Hz) from 10^8 to 10^10 Hz]
Challenges: Interconnection Bandwidth
- Frequency-dependent attenuation causes ISI.
- Only channel eigen-waveforms result in no ISI.
  - Generation and detection of true eigen-waveforms is not feasible due to circuit limitations at high frequencies.
  - Trapezoidal pulses are used as basis waveforms instead.
- A higher symbol rate results in more ISI.
[Figure: transmitted ideal pulse and cable pulse response with ISI tails, amplitude (V) vs. time (ns)]
Challenges: Data Generation/Detection
- It is hard to operate CMOS circuits directly at multi-GHz speeds.
  - Better to reduce the on-chip frequency.
- Recovering embedded timing information from the serial data stream:
  - On-chip oscillators have large frequency variations, and phase detectors have a small frequency capture range.
- Data detection at high speeds:
  - Input voltage offset, cross-talk, and signal reflection (reflection ISI) limit the minimum detectable signal.
Reduction of On-Chip Frequency
- Multiplexing (N:1) and demultiplexing (1:N) the high-speed data at the transmission line* reduces the on-chip frequency by a factor of N.
- For example, N = 5 at 5Gsym/s gives $f_{ck} = \frac{5\,\mathrm{G}}{5} = 1\,\mathrm{GHz}$.
- The maximum switching speed of the process is the limit; CMOS provides high-speed transistor switches.
[Figure: transmitter drivers clocked by ckt0-ckt4 (each at f_ck) form the multiplexer onto the transmission line, which carries high-speed data at N x f_ck; receiver detectors clocked by ckr0-ckr4 (each at f_ck) form the demultiplexer]
*C-K. Yang, R. Farjad, M. Horowitz, VLSI Symp. 97
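The N:1 / 1:N interleaving on this slide can be sketched in a few lines of Python. This is only an illustrative model under simple assumptions, not the circuit: the function names are hypothetical, and symbols are plain list items rather than driver pulses timed by clock phases.

```python
def on_chip_clock_hz(symbol_rate_hz, n_phases):
    """Clock frequency seen by each driver/sampler with N-way multiplexing."""
    return symbol_rate_hz / n_phases


def demultiplex(serial_symbols, n_phases):
    """1:N demultiplex: phase k takes every N-th symbol, starting at index k."""
    return [serial_symbols[k::n_phases] for k in range(n_phases)]


def multiplex(parallel_streams):
    """N:1 multiplex: interleave equal-length low-rate streams into one stream."""
    serial = []
    for group in zip(*parallel_streams):
        serial.extend(group)
    return serial


if __name__ == "__main__":
    print(on_chip_clock_hz(5e9, 5))        # clock per phase for 5 Gsym/s, N = 5
    lanes = demultiplex(list(range(10)), 5)
    print(lanes)                           # five low-rate streams
    print(multiplex(lanes))                # original serial order restored
```

Each of the five lanes only has to handle one symbol per clock period, which is the point of the slide: the line runs at N x f_ck while every on-chip block runs at f_ck.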
Proposed Modulation
- 4-PAM is used for data communication in the serial link.
  - The symbol rate is reduced to half that of binary transmission.
  - A lower symbol rate reduces ISI and the on-chip clock frequency.
- Higher-level PAM was not used because of limited transmitter swing, minimum detectable signal, and reflection ISI.
- 4Sym-->5Sym conversion guarantees clock recovery.
- Gray code mapping of levels reduces BER by 25% vs. linear mapping.
[Figure: level-to-bit mappings, top to bottom. Linear: 11, 10, 01, 00 (an error between the two middle levels is a 2-bit error). Gray: 10, 11, 01, 00 (only a 1-bit error between any adjacent levels)]
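The 25% figure can be checked with a small count. The sketch below assumes that symbol errors only happen between adjacent levels and that the three level boundaries are equally likely to be crossed; those assumptions, and all the names, are ours and not stated on the slide.

```python
import math

# Bit labels from the lowest to the highest level.
LINEAR = ["00", "01", "10", "11"]
GRAY = ["00", "01", "11", "10"]


def symbol_rate(bit_rate, levels):
    """Symbol rate for an M-level PAM signal carrying log2(M) bits per symbol."""
    return bit_rate / math.log2(levels)


def bit_errors(a, b):
    """Number of differing bits between two equal-length bit labels."""
    return sum(x != y for x, y in zip(a, b))


def adjacent_level_bit_errors(mapping):
    """Total bit errors summed over the three adjacent-level boundaries."""
    return sum(bit_errors(mapping[i], mapping[i + 1])
               for i in range(len(mapping) - 1))


def gray_ber_reduction():
    """Relative reduction in bit errors when Gray replaces linear mapping."""
    linear = adjacent_level_bit_errors(LINEAR)
    gray = adjacent_level_bit_errors(GRAY)
    return (linear - gray) / linear


if __name__ == "__main__":
    print(symbol_rate(10e9, 4))                  # 4-PAM halves the symbol rate
    print(adjacent_level_bit_errors(LINEAR),     # linear: 1 + 2 + 1 bit errors
          adjacent_level_bit_errors(GRAY))       # Gray:   1 + 1 + 1 bit errors
    print(gray_ber_reduction())
```

Under these assumptions the linear mapping costs 4 bit errors over the three boundaries and Gray costs 3, which is the 25% reduction quoted above.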
Proposed Architecture to Combat ISI
- To cancel the long tail of the pulse response, a symbol-spaced 2-tap FIR pre-emphasis filter is implemented at the transmitter: $V_o(n) = V_i(n) - a\,V_i(n-1) - b\,V_i(n-2)$
- To sharpen the signal transition edges, a half-symbol-spaced 1-tap high-pass equalizer is implemented at the receiver: $V_{eq}(n) = V_i(n) - \alpha\,V_i\left(n - \tfrac{1}{2}\right)$
[Figure: pre-emphasized (shaped) transmitted wave with taps a and b marked, received signal, equalized signal, and pulse response with no filtering; amplitude (V) vs. time (ns)]
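A minimal sketch of the transmitter pre-emphasis equation, applied to a list of symbol amplitudes. The function name and the tap values in the example are illustrative assumptions; the slides do not give the actual values of a and b, and samples before the first symbol are taken as zero.

```python
def pre_emphasize(symbols, a, b):
    """Apply Vo(n) = Vi(n) - a*Vi(n-1) - b*Vi(n-2) to a symbol sequence.

    Symbols before the start of the sequence are assumed to be zero.
    """
    out = []
    for n, v in enumerate(symbols):
        prev1 = symbols[n - 1] if n >= 1 else 0.0
        prev2 = symbols[n - 2] if n >= 2 else 0.0
        out.append(v - a * prev1 - b * prev2)
    return out


if __name__ == "__main__":
    # A single isolated symbol: the two taps add negative pulses one and two
    # symbol times later, which oppose the positive tail left by the cable.
    print(pre_emphasize([1.0, 0.0, 0.0, 0.0], a=0.25, b=0.125))
```

For an isolated symbol the output is the main pulse followed by two smaller pulses of opposite sign, matching the "+ - -" structure of the main and tap drivers.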
Receiver Equalizer
- The half-symbol-spaced filter boosts frequency components up to $f = \frac{1}{T_s}$.
  - R = 5Gsym/s, T_s = 200ps, f_max = 5GHz
- Sharpening the transitions increases the eye opening width, so the receiver is less sensitive to sampling phase errors.
[Figure: equalizer frequency response from -f_max to f_max; eye with slow transitions vs. eye with sharp transitions]
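As a hedged check on the $1/T_s$ figure, assume the equalizer behaves as an ideal two-tap filter with a delay of $T_s/2$ and a tap weight $0 < \alpha < 1$. This is an idealization of the analog circuit, whose real response will differ. Its frequency response is then:

```latex
\begin{align}
H(f) &= 1 - \alpha\, e^{-j 2\pi f \frac{T_s}{2}} = 1 - \alpha\, e^{-j\pi f T_s} \\
|H(f)|^2 &= 1 + \alpha^2 - 2\alpha \cos(\pi f T_s) \\
|H(0)| &= 1 - \alpha, \qquad \left|H\!\left(\tfrac{1}{T_s}\right)\right| = 1 + \alpha
\end{align}
```

Under this model the gain rises monotonically from $1-\alpha$ at DC to its peak of $1+\alpha$ at $f = 1/T_s$, which for $T_s = 200$ ps is 5 GHz, consistent with the f_max quoted on the slide.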
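Top Level Architecture
[Figure: block diagram. Transmit side: 10-bit input, 5:1 multiplexer/serializer, 2-b DAC & filter, analog TX PLL, serial 4-PAM output. Receive side: sampler & equalizer, 2-b ADC bank, 1:5 demultiplexer, 10-bit output, analog RX VCO controlled by V_ctl from the phase/frequency detector and filter]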
5:1 Multiplexing Transmitter
- Each symbol is generated by the rising and falling edges of two phases of the clock.
[Figure: ring oscillator producing ck0-ck4; five drivers generating Sym0-Sym4 into the differential output (Out, Outb) terminated with R_TERM; timing diagram showing D[0:1] valid for Sym0 and a symbol of width T_s defined by CLK 0 and CLK 1]
2-bit Output Driver
[Figure: 2-bit DAC module built from a x2 leg and a x1 leg driven by data bits D1 and D0; schematic of one differential drive leg with data inputs, Clk0 and Clk1 gating, a tail current source, and differential output Dout]
4-PAM Preshaping
- Each driver generates a filtered symbol independent of the other drivers.
- Simple architecture to implement the filter.
[Figure: 5:1 multiplexer feeding the multiplexer drivers and 2-tap filters. Each driver has a main 2-b DAC (+) plus delayed 2-b DACs for taps T1 and T2 (-), clocked by adjacent clock phases (clk0-clk2 for one driver, clk1-clk3 for the next, and so on to the 3 other drivers); tap weights a and b come from external sources]
Symbol Generation
[Figure: tap-driver timing. Three streams are summed at the output:
  Main stream: Sym4 Sym0 Sym1 Sym2 Sym3
  Tap1 stream: S3,T1 S4,T1 S0,T1 S1,T1 S2,T1
  Tap2 stream: S2,T2 S3,T2 S4,T2 S0,T2 S1,T2
The main pulse of width T_s is defined by CLK 1 and CLK 2. D[0:1] is valid for Sym0; delayed versions of D[0:1] drive the taps (T_s delayed for T1, 2T_s delayed for T2)]
Symbol-Width Problem
- Variations in the PMOS-to-NMOS strength ratio result in duty-cycle error in the clocks (unbalanced fall and rise times).
- The effective width of the final output symbol decreases.
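Pulse-Width Control Loop
[Figure: pulse-width control loop schematic with driver1, dummy drivers producing Va and Vb, a differential amplifier producing Ctl, clock buffers (Ck Buf) for Ck1,2 adjusted by Ctl, and distribution to the 4 other drivers; waveforms of Va and Vb vs. time for wide pulses and for narrow pulses]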
Receiver Timing Recovery
[Figure: receiver PLL with Din, pre-amplifier, receiver, data sample clock (samp_ck), phase detector, filter, ctrl, clock generator/VCO, and Dout; timing margin marked]
- Oversampling phase detection: many input samplers, phase quantization error, complex logic.
- Tracking phase detection:
  - Conventional bang-bang control: low loop bandwidth and capture range.
  - Proportional control: desirable.
Timing Recovery: Front-end
[Figure: simplified receiver front-end (x2 oversampling). The differential 4-PAM input is sampled by multi-phase clocks Ck0-Ck3 to give S_o0-S_o3; linear amplifiers and 2-bit ADCs produce data d0,1 and d2,3 and edge samples S_e1 and S_e3]
Proposed Proportional Phase Detection
- The edge sample is proportional to the phase error: $S_e = k \cdot \varphi$.
- When the clock lags: on a 1-->0 transition $S_e < 0$, so $-(S_e) > 0$; on a 0-->1 transition $S_e > 0$, so $+(S_e) > 0$. In both cases the result is to speed up the clock.
- Advantages:
  - Larger PLL bandwidth and stability compared to bang-bang PLLs.
  - Zero systematic phase offset (same detection mechanism for edges and data).
  - Zero ripple on the control voltage of the PLL (unlike bang-bang).
- Disadvantage: voltage offsets of the edge samplers translate into phase error.
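The polarity rule above can be sketched for the binary case. This is a simplified model with hypothetical names, assuming ideal samplers with no offset: the phase detector output is the edge sample itself on a rising transition, its negative on a falling transition, and zero when there is no transition.

```python
def phase_detector_output(prev_bit, next_bit, edge_sample):
    """Signed phase-error estimate from one edge sample.

    A positive result means the clock lags and should be sped up.
    """
    if prev_bit == next_bit:
        return 0.0                 # no transition, no timing information
    if next_bit > prev_bit:
        return edge_sample         # 0 --> 1: use +S_e
    return -edge_sample            # 1 --> 0: use -S_e


if __name__ == "__main__":
    # A lagging clock samples late: past the crossing on both edge directions.
    print(phase_detector_output(0, 1, 0.1))    # rising edge, S_e > 0
    print(phase_detector_output(1, 0, -0.1))   # falling edge, S_e < 0
    print(phase_detector_output(1, 1, 0.8))    # no transition
```

Because the output scales with the edge sample rather than only its sign, the correction is proportional to the phase error, which is what separates this scheme from bang-bang control.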
Three 4-PAM Transitions
[Figure: differential 4-PAM input with sampling edges (in lock) and transitions labeled type1, type2, and type3]
- Type1: right crossing. Type2: misplaced crossing. Type3: no crossing.
- Only type1 transitions are used for clock recovery.
- Type2 and type3 transitions are ignored by a decision logic.
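One way to picture the decision logic is the sketch below. It rests on assumptions not stated on the slide: the differential levels are normalized to -3, -1, +1, +3, transitions are symmetric, and a "right crossing" therefore means a transition between a level and its negative, which crosses zero at the midpoint. The function name is hypothetical.

```python
LEVELS = (-3, -1, 1, 3)  # assumed normalized differential 4-PAM levels


def transition_type(prev_level, next_level):
    """Classify a transition as type 1, 2, or 3; return 0 if the level repeats."""
    if prev_level == next_level:
        return 0
    if prev_level == -next_level:
        return 1    # symmetric about zero: zero crossing at mid-transition
    if prev_level * next_level < 0:
        return 2    # crosses zero, but away from the midpoint
    return 3        # both levels have the same sign: no zero crossing


if __name__ == "__main__":
    counts = {0: 0, 1: 0, 2: 0, 3: 0}
    for a in LEVELS:
        for b in LEVELS:
            counts[transition_type(a, b)] += 1
    print(counts)   # how many of the 16 symbol pairs fall in each class
```

Under these assumptions only 4 of the 16 possible symbol pairs carry usable timing information, which gives one reason a guaranteed-transition code such as the 4Sym-->5Sym conversion is helpful.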
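Data Phase Detector
- Edge sample values, S_e, of type1 transitions are summed with the correct polarity at the phase detector output (V_P).
[Figure: data phase detector. Edge samples S_e1, S_e3, ..., S_e9 from the input samplers are each gated by decision logic using the neighboring data (d0,1 and d2,3; d2,3 and d4,5; ...; d8,9 and d0,1) and summed into V_P, followed by the charge pump, loop filter, and analog RX VCO]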
Frequency Acquisition
- A frequency acquisition aid solves the small capture range problem of the data-recovery phase detector.
- The frequency acquisition circuit sets the proper oscillation frequency before phase locking starts.
- Frequency monitor output: $V_Q = 0$ when $f_{data} - f_{ck} > \Delta f$; $V_Q = 1$ when $f_{data} - f_{ck} < \Delta f \ll f_{capture}$.
[Figure: loop block diagram with data phase detector (V_P), LPF1, VCO (f_CK), frequency monitor (V_Q), frequency detector with CK_ref, and LPF2]
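Frequency Monitor
[Figure: frequency monitor schematic. V_P, beating at f_data - f_ck, drives an edge detector and one-shot with R and C, producing V_O; a D flip-flop with D = 1 and a reset input produces V_Q]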
1:5 Demultiplexing Samplers and Equalizers
- 5 samplers for the symbol centers and 5 samplers for the transitions (x2 oversampling).
- Each equalizer uses the present sample and the sample half a symbol earlier (half-symbol spaced).
- 1-tap equalizer: $S_{o1} = S_1 - \alpha S_0$
[Figure: the differential 4-PAM input is sampled by multi-phase clocks Ck0-Ck3 into S_0-S_3; each sample is scaled by the tap weight and subtracted from the next sample; the equalized outputs S_o0-S_o3 feed linear amplifiers and 2-bit ADCs]
Half-symbol-Spaced 1-tap Equalizer
- Analog 1-tap equalizer: $S_{O1} = S_1 - \alpha S_0$, realized with currents as $I_{O1} = I_1 - \alpha I_0$.
- The equalization function should be performed very fast.
- Subtraction is done by summing currents that are proportional to the sampled values, with opposite polarity.
- Differential pairs should have a large linear range for proper operation of the analog equalization.
[Figure: equalizer schematic with differential sampled inputs S_p0/S_n0 and S_p1/S_n1, tap weight (α) setting the current of the S_0 pair, and differential output S_Op1/S_On1]
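The equalizer equation can be illustrated on a x2-oversampled sample stream, where consecutive samples are half a symbol apart. This is a numerical sketch only, with a hypothetical function name and an illustrative tap weight; the chip performs the subtraction with analog currents, and the sample before the first one is taken as zero.

```python
def equalize_half_symbol(samples, alpha):
    """Apply S_o[n] = S[n] - alpha * S[n-1] to half-symbol-spaced samples.

    The sample preceding the first one is assumed to be zero.
    """
    out = []
    for n, s in enumerate(samples):
        previous = samples[n - 1] if n >= 1 else 0.0
        out.append(s - alpha * previous)
    return out


if __name__ == "__main__":
    # A slow rising edge that settles at 1.0, sampled every half symbol.
    slow_edge = [0.0, 0.5, 1.0, 1.0]
    print(equalize_half_symbol(slow_edge, alpha=0.5))
```

In this example the sample right after the transition is boosted relative to the settled value, which is the high-pass, edge-sharpening behavior described on the slide.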
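Input Preamplifier
- A short-channel MOS has a linear I_D-V_GS characteristic in the saturation region: $I_D = k \cdot (V_{GS} - V_t)$.
- V_src should be set such that the V_o-V_i characteristic is linear for all values of V_i.
[Figure: preamplifier schematic with inputs Vip/Vin, outputs Vop/Von, and source node Vsrc; plots of I_D vs. V_GS and of V_o vs. (V_i - V_src) with the linear regions marked]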
Differential 4-PAM Level Detection
- Flash detection: three comparators detect the 4 levels.
- Differential signaling: only one reference voltage is required.
[Figure: three comparators with outputs a2, a1, a0 comparing the differential input (Vip, Vin) against thresholds +Vref, 0, and -Vref, which separate the data levels]
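A behavioral sketch of the flash detection follows. The thresholds (+Vref, 0, -Vref) are from the slide; the function names, the ideal comparators, and the particular thermometer-to-Gray decode (MSB = a1, LSB = a0 XOR a2, giving 00, 01, 11, 10 from the lowest to the highest level) are assumptions chosen to match the Gray mapping shown earlier.

```python
def flash_detect(v_diff, v_ref):
    """Three ideal comparators at -v_ref, 0, +v_ref; returns (a2, a1, a0)."""
    a0 = int(v_diff > -v_ref)
    a1 = int(v_diff > 0.0)
    a2 = int(v_diff > v_ref)
    return (a2, a1, a0)


def gray_decode(thermometer):
    """Convert the thermometer code (a2, a1, a0) to Gray-coded bits (MSB, LSB)."""
    a2, a1, a0 = thermometer
    return (a1, a0 ^ a2)


if __name__ == "__main__":
    v_ref = 1.0
    # Four equally spaced levels centered between the thresholds.
    for level in (-1.5, -0.5, 0.5, 1.5):
        print(level, gray_decode(flash_detect(level, v_ref)))
```

Because the signal is differential, the -Vref threshold is just the +Vref comparison with the inputs swapped and the 0 threshold needs no reference at all, which is why a single reference voltage suffices.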
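Input 2-bit ADC
[Figure: 2-bit ADC. The preamplifier & equalizer feeds three comparators referenced to Vref, with outputs a2, a1, a0; each comparator is a regenerative amp followed by an SR-latch; a Gray decoder (decode logic) produces A0 and A1. Comparator schematic with inputs Vip/Vin, Vref, Vsrc, clocks clk0/clk1, and outputs V_p/V_n]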
Modeling The Cable
- SPICE models for skin effect are not ideal; a better model is needed:
  - Directly measure the cable impulse response (time domain).
  - Convolve it with the transmitted symbols.
- Flow: oscilloscope TDR (t_r ~ 40ps) => non-ideal pulse response; DSP => ideal impulse response; convolution => real cable symbol response.
- Tools: SPICE (Xmitter), Matlab (cable), Matlab (equalizer).
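The convolution step can be sketched as follows. The slides describe doing this in Matlab with a measured impulse response; the Python function below and the short impulse response in the example are stand-ins for illustration only and are not measured data.

```python
def convolve(transmitted, impulse_response):
    """Discrete linear convolution of a transmitted waveform with a channel
    impulse response, both sampled on the same time grid."""
    out = [0.0] * (len(transmitted) + len(impulse_response) - 1)
    for i, x in enumerate(transmitted):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


if __name__ == "__main__":
    # Made-up impulse response with a decaying tail (not measured data).
    h = [0.5, 0.25]
    tx = [1.0, 0.0, -1.0]
    print(convolve(tx, h))   # received samples, including the ISI tail
```

In the flow on the slide, `transmitted` would come from the SPICE simulation of the transmitter and `impulse_response` from the de-embedded TDR measurement.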
Simulated Eye Diagrams
(a) Eye diagram after cable (without pre-emphasis)
(b) Eye diagram after cable (with pre-emphasis)
(c) Eye diagram after cable (with pre-emphasis/equalization)
[Figure: three eye diagrams, amplitude (V) vs. time (ns) from 0 to 0.4 ns, with eye width τ and eye height v marked]
0.35-µm Transmitter Die Photo
[Figure: die photo with labeled blocks: 4-PAM FIR Xmitter, Analog TX PLL, Ck Buf, Resync., PRBS Enc., SRAM]
- Total die area: 2mm x 1.5mm
- 4-PAM FIR transmitter: 0.8mm x 0.3mm
Measured Eye Diagrams
(a) 10Gb/s eye diagram at source (no pre-emphasis)
(b) 10Gb/s eye diagram after the cable (with pre-emphasis)
(c) 8Gb/s eye diagram after the cable (with pre-emphasis)
0.35-µm Transmitter Performance
Transmitter Data Rate                  Eye Height   Eye Width
10Gb/s, 10 meter, w/ pre-emphasis      200mV        90ps - 70ps
10Gb/s, 10 meter, no pre-emphasis      0            0
8Gb/s, 10 meter, w/ pre-emphasis       350mV        110ps - 90ps
8Gb/s, 10 meter, no pre-emphasis       < 60mV       < 50ps
Transmitter output jitter: peak-to-peak 32ps, RMS 8ps
Power @ 10Gb/s (5Gsym/s):
  Analog           0.7 W
  Output driver    0.5 W
  Sync/logic       0.3 W
  Total            1.5 W
0.3-µm Full Transceiver Die Photo
[Figure: die photo with labeled blocks: 4-PAM FIR Xmitter, analog bypass cap, TX PLL, RX PLL, samplers & phase detector, Resync, PRBS dec., Resync, PRBS Enc., SRAM, digital bypass cap]
- Total die area: 2mm x 2mm
- Total receiver data recovery section: 0.85mm x 0.43mm
0.3-µm Transceiver Performance
Maximum link speed                    8Gbps @ 3V
Transmitter output jitter @ 8Gbps     11ps (peak-peak), 2ps (rms)
Receiver PLL jitter @ 8Gbps           28ps (peak-peak), 4ps (rms)
Receiver PLL dynamics                 BW > 30MHz, phase margin > 48°
Receiver PLL capture range            ~ 20MHz
Min. input swing to capture lock      ±400mV (diff.)
Min. input swing to maintain lock     ±300mV (diff.)
Power (8Gb/s, 3V)                     Analog: 750mW, 4-PAM driver: 220mW, Other: 130mW, Total: 1.1W
BER Measurements
[Figure: test setup with on-chip PRBS generator, encoder, transmitter driver, and clock generator (Ck1, Ck2) on the transmit side; on-chip sampler, receiver, frame detect (frm det), and PRBS decode on the receive side; frequency counter]
- At 8Gbps: BER ~ 10^-7, window = 50ps (almost no improvement with input equalizer)
- At 6Gbps: BER ~ 10^-14, window = 150ps (30ps improvement with input equalizer)
- Possible factors for the higher BER at high speeds:
  - Line reflection: bad on-chip terminations, package/bondwire
  - EMI from neighboring high-speed bondwires
Contributions
A solution for multi-Gb/s transmission over bandwidth-limited cables in standard CMOS technology:
- Transmitter:
  - A high-speed 4-PAM DAC design that reduces the symbol rate to half (vs. 2-PAM).
  - An FIR preshaping filter that operates at multi-Gb/s rates with very low complexity.
  - A control circuit to optimize the width of the transmitted symbols.
- Receiver:
  - An analog FIR equalizer effective up to multi-GHz ranges in CMOS technology.
  - A new proportional data-recovery phase detector for detecting 4-PAM serial data.
  - A new frequency-acquisition technique for data-recovery PLLs.
- A 4-PAM transceiver capable of data transmission up to 8 Gbps over a 10-m copper cable with BW ~ 1GHz in 0.3-µm CMOS technology.
Acknowledgments
Future Work
- Use higher-level N-PAM modulation.
  - Challenge: very fast ADCs and DACs with higher resolution.
  - Explore general methods of N-PAM data recovery.
- Advanced communication methods for narrow-band channels:
  - Channel eigenfunction as transmission symbol.
  - Maximum likelihood (ML) detection.
  - Multi-carrier techniques (e.g., DMT in ADSL).
- Use coding methods to reduce BER.
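Frequency Monitor
[Figure: frequency monitor schematic with edge detector, one-shot with R and C producing V_O, and a D flip-flop (D = 1, reset) producing V_Q from V_P beating at f_data - f_ck; added detail of the hysteresis edge detector (inputs in+ and in-, device sizes W, 2W, W), the ECL-to-CMOS level converter, and the delay element]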
Analog Supply Drop
- The on-chip VCO speed was limited to 800MHz (8Gbps) due to analog supply drop.
- The analog supply traces have ~1.8Ω of resistance in series.
  - Only one pin for Vdda or Gnda.
- 250mA analog supply current at 8Gbps => ~0.45V drop on the analog supply.
[Figure: supply routing of Vdda, Vdd, Gnd, and Gnda to the analog block]