High-Speed Links
Vladimir Stojanovic (with slides from M. Horowitz, J. Zerbe, K. Yang and W. Ellersick)
EE371 Lecture 16

Agenda: High-Speed Links
High-speed links: what and where?
Signaling faster: evolution
» Circuits
» Channel
System-level design
» Channel designer's view
» IC designer's view
Demo
What Makes a Link?
Signaling: sending and receiving the information
[Figure: Tx → channel (PCB, coax, fiber) → Rx]
Clocking: determining which bit is which
[Figure: bit stream 1 0 0 1 0 1 0 with a t_bit/2 timing offset marked]

Spanning a Broad Space
From an inverter to a DSL modem
Metrics:
» Speed
» Latency
» Electrical environment
» Power & area
» Volume
Increasing Chip I/O Bandwidth
Computers:
» Main memory: SDRAM100 (100 Mbps), RDRAM (0.8-1.1 Gbps)
» Peripherals: PCI (66 Mbps), InfiniBand (2.5 Gbps)
Networks:
» Physical front end
   LAN: Fast Ethernet (100 Mbps), Gigabit Ethernet (1 Gbps)
   WAN: OC-12 (622 Mbps), OC-192 (~10 Gbps)
» Switch fabric: 625 Mbps, 2.5 Gbps

Inside the Router
[Figure: line cards (8 to 16 per system) and switch cards (2 to 4 per system) joined by a passive backplane. A line card carries optics, SerDes, MAC, network processor (NPU), traffic manager/fabric interface and memories; the switch cards carry the crossbar. Links shown: OC-192 laser-driver link, 4 × 3.125 Gb/s XAUI chip-to-chip serial links, 3.125-12.5 Gb/s backplane serial links.]
Regardless of where the links are, there is a constant desire to signal faster and with less power.
Serial Link Signaling Over Backplanes: Past
Channel was not an issue up to 2-3 Gb/s
[Figure: linecard SerDes → backplane → linecard SerDes; 2 Gb/s view of the channel (response over 0-1 GHz), with the signal at Tx and the signal at Rx]
Designs were limited by transmitter & receiver speed
» Clever circuit design; no communications/signal-integrity background needed

Signaling
[Figure: low-impedance and high-impedance drivers; single-ended signaling (data compared against a shared reference, V_S/2) vs. differential signaling (d and its complement)]
Transmitter Design
[Figure: transmitter chain: data generation, encoder, sync, mux, pre-driver, driver (Tx), 50 Ω line]
Critical components: sync, mux, Tx
Design issues:
» Slew-rate control vs. ISI, jitter
» Output current and impedance control
» Clock and driver power dissipation

Output Drivers
On-chip clock speed limited to 6-8 FO4
Need to send more bits per clock → multiplex data
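As a rough sketch of the arithmetic behind "multiplex data": if the on-chip clock period is stuck at some number of FO4 and the target bit time is shorter, an N:1 mux has to emit N bits per clock. The function names are hypothetical, and the 8-FO4 clock / 2-FO4 bit values are assumptions chosen only to illustrate the calculation.

```python
import math


def mux_ratio(clock_period_fo4: float, bit_time_fo4: float) -> int:
    """Bits the transmitter must send per clock cycle (N of an N:1 mux), rounded up."""
    return math.ceil(clock_period_fo4 / bit_time_fo4)


def serialize(words: list[list[int]], n: int) -> list[int]:
    """Model of an N:1 mux: each clock cycle delivers one n-bit parallel word,
    and its bits leave the chip one per bit slot, index 0 first (an assumption)."""
    stream: list[int] = []
    for word in words:
        if len(word) != n:
            raise ValueError("each parallel word must carry exactly n bits")
        stream.extend(word)
    return stream


n = mux_ratio(clock_period_fo4=8, bit_time_fo4=2)   # 4 bits per clock -> 4:1 mux
print(n)
print(serialize([[1, 0, 0, 1], [0, 1, 0, 1]], n))
```

With a 2:1 ratio this reduces to the DDR case, one bit per clock edge.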
Simple Transmitter
[Figure: Data_E and Data_O multiplexed onto the output]
DDR: send a bit per clock edge
Critical issues:
» 50% duty cycle
» Tbit > 4 FO4
[Figure: output pulse-width closure (0-30%) vs. bit time (1-5, normalized to FO4)]

Fastest Transmitter
» Off-chip time constant is smaller than on-chip: generate a current pulse at the output
» Limited only by the output capacitance
[Figure: 8 parallel driver segments (× 8), each gated by clock phases ck0-ck3 and data D0-D2, driving out/out_b into R_TERM; % eye closure (0-30%) vs. bit width (0.5-1.0 FO4)]
» Limiting time constant: 25 Ω × Cpad
» Cpad = 8·Cdriver + Cesd
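The 25 Ω in the limiting time constant is the on-chip 50 Ω termination in parallel with the 50 Ω line (itself terminated at the far end). A minimal sketch of the resulting time constant and single-pole bandwidth follows; the capacitance values are hypothetical, not taken from the design on the slide.

```python
import math


def pad_capacitance(c_driver: float, c_esd: float, n_segments: int = 8) -> float:
    """Cpad = n_segments * Cdriver + Cesd (8 driver segments on the slide)."""
    return n_segments * c_driver + c_esd


def output_time_constant(c_pad: float, r_term: float = 50.0, z_line: float = 50.0) -> float:
    """The driver sees the on-chip termination in parallel with the line: 50 || 50 = 25 ohm."""
    r_eff = r_term * z_line / (r_term + z_line)
    return r_eff * c_pad


# Hypothetical values: 0.1 pF per driver segment, 0.5 pF of ESD capacitance.
c_pad = pad_capacitance(c_driver=0.1e-12, c_esd=0.5e-12)   # 1.3 pF
tau = output_time_constant(c_pad)                           # 25 ohm * 1.3 pF = 32.5 ps
bandwidth = 1 / (2 * math.pi * tau)                         # single-pole -3 dB bandwidth
print(f"Cpad = {c_pad * 1e12:.2f} pF, tau = {tau * 1e12:.1f} ps, f3dB = {bandwidth / 1e9:.2f} GHz")
```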
Simple Receiver
[Figure: input (in) and reference (ref) into a preconditioning stage (A), then a clocked latch; a D/A supplies offset correction]
Regenerative latch has the highest gain-bandwidth product of all amplifiers (gain is exponential with time, so you just need to wait long enough)
Preconditioning stage: filter/integrate, rectify common mode (CM)
Latch makes the decision (4 FO4)
DAC can be used to compensate offsets

Fastest Receiver
[Figure: ring oscillator generates clk0-clk3; input din is sampled by parallel receivers clocked by ck0-ck4, producing D0 ... D7, which go to amplifiers]
Use multiple input receivers
» Simplest: 2; more complex: 4-8
» Decouples Tbit from latch resolution time
» Leverage high-input-impedance amplifiers
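Because an ideal regenerative latch's output grows roughly as v(t) = v_in·e^(t/τ), the time to resolve an input is τ·ln(v_out/v_in): only logarithmic in the input, which is why "just wait long enough" works, and also why very small inputs (near metastability) take longer. A sketch, assuming an idealized single-time-constant latch; τ and the voltages are hypothetical.

```python
import math


def regeneration_time(v_in: float, v_out: float, tau: float) -> float:
    """Time for an ideal regenerative latch, v(t) = v_in * exp(t / tau),
    to amplify an initial imbalance v_in up to v_out."""
    if v_in <= 0 or v_out <= v_in:
        raise ValueError("need 0 < v_in < v_out")
    return tau * math.log(v_out / v_in)


# Time in units of tau: each 10x smaller input costs another ln(10), about 2.3 tau.
for v_in in (0.1, 0.01, 0.001):
    print(f"v_in = {v_in * 1e3:6.1f} mV -> {regeneration_time(v_in, 1.0, tau=1.0):.2f} tau")
```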
Serial Link Signaling Over Backplanes
[Figure: linecard SerDes → backplane → linecard SerDes; 10 Gb/s view of the channel (response over 0-5 GHz on a log scale, 1.0 down to 0.01), with the signal at Tx and the signal at Rx]
Now that we've made the fastest Tx & Rx, look what happens to the eye
Need to look more closely at the channel, since that seems to be the problem

The Backplane Environment
[Figure: on-chip parasitics (termination resistance and device loading capacitance), package, package via, line-card trace, line-card via, backplane connector, backplane via, backplane trace [Kollipara, DesignCon03]]
The problem is that there are many sources of impedance (Z) discontinuity, and thus many possible sources of signal degradation
Interference:
» Intersymbol (dispersion, reflections)
» Co-channel (crosstalk)
Interference
[Figure: attenuation (0 to -60 dB) vs. frequency (0-10 GHz) for THROUGH, FEXT and NEXT; pulse response with Tsymbol = 160 ps over 0-3 ns]
Inter-symbol interference
» Dispersion (skin effect, dielectric loss): short latency
» Reflections (impedance mismatches: connectors, via stubs, device parasitics, package): long latency
Co-channel interference (far-end & near-end crosstalk)

Dispersion: Material Loss
[Figure: attenuation (linear, 0-1) vs. frequency (1 MHz-10 GHz) for an 8-mil-wide, 1 m long, 50 Ω FR4 stripline: total loss, conductor loss, dielectric loss]
PCB loss: skin & dielectric loss
» Skin loss: $L_{skin} \propto \sqrt{f}$
» Dielectric loss: $L_{diel} \propto f$, a bigger issue at higher f
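A two-term loss model makes the crossover explicit: skin loss grows as √f and dielectric loss as f, so dielectric loss takes over above f_c = (k_skin/k_diel)². The coefficients below are hypothetical values picked so the crossover lands near 1 GHz; they are not fitted to the plotted FR4 data.

```python
import math


def channel_loss_db(f_hz: float, k_skin: float, k_diel: float) -> tuple[float, float, float]:
    """Return (skin, dielectric, total) loss in dB at frequency f_hz.
    Skin loss ~ k_skin * sqrt(f); dielectric loss ~ k_diel * f."""
    skin = k_skin * math.sqrt(f_hz)
    diel = k_diel * f_hz
    return skin, diel, skin + diel


def crossover_hz(k_skin: float, k_diel: float) -> float:
    """Frequency at which the two terms are equal: k_skin*sqrt(f) = k_diel*f."""
    return (k_skin / k_diel) ** 2


K_SKIN = 1.0e-4   # dB / sqrt(Hz), hypothetical
K_DIEL = 3.0e-9   # dB / Hz, hypothetical

print(f"crossover ~ {crossover_hz(K_SKIN, K_DIEL) / 1e9:.2f} GHz")
for f in (1e8, 1e9, 5e9):
    skin, diel, total = channel_loss_db(f, K_SKIN, K_DIEL)
    print(f"{f / 1e9:5.1f} GHz: skin {skin:5.2f} dB, dielectric {diel:5.2f} dB, total {total:5.2f} dB")
```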
Reflections: Z-Discontinuities
[Figure: a wave on a Z1 line arriving at a junction with a Z2 line]
Reflected: $\Gamma = \frac{Z_2 - Z_1}{Z_1 + Z_2}$, transmitted: $\tau = \frac{2 Z_2}{Z_1 + Z_2}$
Energy flow into junction = transmitted + reflected energy
Sources of reflections: Z-discontinuities
» PCB Z mismatch
» Connector Z mismatch
» Via (through) Z mismatch
» Device parasitics: effective Z mismatch
[Figure: via, connector and via along the daughtercard (DC) and backplane (BP)]

Reflections From Via Stubs
[Figure: attenuation (0 to -60 dB) vs. frequency (0-10 GHz) for 9" and 26" FR4, with and without via stub]
Additional sources of reflections: stubs
» Vias, particularly on thick backplanes
» Package plating stubs
Top-layer signaling results in a large via stub
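A short sketch of the junction equations: the reflected and transmitted voltage ratios, plus a check that reflected power and transmitted power (which scales as V²/Z, hence the Z1/Z2 factor) add up to the incident power. The 50 Ω / 65 Ω pair is a hypothetical connector mismatch.

```python
def junction_coefficients(z1: float, z2: float) -> tuple[float, float]:
    """Voltage reflection and transmission coefficients for a wave on a Z1 line
    arriving at a Z2 line: gamma = (Z2 - Z1)/(Z1 + Z2), tau = 2*Z2/(Z1 + Z2)."""
    gamma = (z2 - z1) / (z1 + z2)
    tau = 2 * z2 / (z1 + z2)
    return gamma, tau


def power_split(z1: float, z2: float) -> tuple[float, float]:
    """Fractions of incident power reflected and transmitted (power ~ V^2 / Z)."""
    gamma, tau = junction_coefficients(z1, z2)
    reflected = gamma ** 2
    transmitted = tau ** 2 * z1 / z2
    return reflected, transmitted


gamma, tau = junction_coefficients(50.0, 65.0)     # hypothetical 65-ohm connector on a 50-ohm line
reflected, transmitted = power_split(50.0, 65.0)
print(f"gamma = {gamma:.3f}, tau = {tau:.3f}")
print(f"reflected {reflected:.1%} + transmitted {transmitted:.1%} = {reflected + transmitted:.1%}")
```

Note that τ = 1 + Γ, so a step up in impedance produces a transmitted voltage larger than the incident one even though less power gets through.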
Reflections and Crosstalk
[Figure: desired signal, reflections, far-end crosstalk (FEXT) and near-end crosstalk (NEXT) [Sercu, DesignCon03]]

Crosstalk
Many sources:
» On-chip
» Package
» PCB traces
» Inside connector
Differential signaling can help
» Minimize crosstalk generation & make its effects common-mode
Both NEXT & FEXT matter
» NEXT is very destructive if RX and TX pairs are adjacent: the full-swing TX signal couples into the attenuated RX signal, so the effect on SNR is multiplied by the signal loss
» Simple solution: group RX pairs and TX pairs separately in the connector
» NEXT is typically 3-6%, FEXT typically 1-3%
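The "multiplied by signal loss" point is easiest to see in dB: NEXT is a fixed fraction of the full local TX swing, while the wanted RX signal has already suffered the channel attenuation, so every dB of channel loss is a dB off the signal-to-crosstalk ratio. A sketch, assuming 4% NEXT coupling (inside the 3-6% range quoted) and ignoring FEXT and noise.

```python
import math


def next_sir_db(channel_loss_db: float, next_fraction: float) -> float:
    """Signal-to-NEXT ratio at the receiver, TX swing normalized to 1.
    The wanted signal is attenuated by the channel; NEXT couples from a full-swing local TX."""
    rx_signal = 10 ** (-channel_loss_db / 20)
    return 20 * math.log10(rx_signal / next_fraction)


for loss in (0, 10, 20):
    print(f"channel loss {loss:2d} dB -> signal/NEXT = {next_sir_db(loss, 0.04):5.1f} dB")
```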
A Complex System
[Figure: channel response for PCB only; PCB + connectors; PCB, connectors, via stubs & devices]

Signaling Faster: System-Level Improvements
Channel designer's view (passive techniques)
» Try to make Z-discontinuities go away
» Reduce crosstalk (EM isolation)
IC designer's view (active techniques)
» Design circuits that compensate for or eliminate interference
Equalization for Loss: Goal Is to Flatten the Response
[Figure: channel response × equalizer response = flat response]
Channel is band-limited
Equalization: boost high frequencies relative to lower frequencies

Receive Linear Equalizer
Amplifies high frequencies attenuated by the channel
[Figure: pre-decision digital or analog FIR filter, a delay line (D) with weights W_1 ... W_L; response H(s) vs. frequency]
Issues:
» Amplifies noise
» Precision
» Tuning delays (if analog)
» Setting coefficients: adaptive algorithms such as LMS
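A minimal data-aided LMS sketch in plain Python: a hypothetical 3-tap channel [1.0, 0.5, 0.2] adds post-cursor ISI, and a 5-tap FIR adapts its weights toward the channel inverse using a known training sequence. The channel, step size mu and tap count are assumptions; the sketch has no noise, so it does not show the noise amplification listed above.

```python
import random


def convolve(x: list[float], h: list[float]) -> list[float]:
    """Causal discrete-time channel: y[n] = sum_k h[k] * x[n-k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) for n in range(len(x))]


def lms_equalizer(rx: list[float], desired: list[float], n_taps: int, mu: float):
    """Adapt FIR weights with LMS: w <- w + mu * e * x, where e = desired - w.x.
    Uses a known training sequence as 'desired' (data-aided adaptation)."""
    w = [0.0] * n_taps
    w[0] = 1.0                                        # start as a pass-through filter
    sq_errors = []
    for n in range(n_taps - 1, len(rx)):
        x = [rx[n - k] for k in range(n_taps)]        # x[0] is the newest sample
        y = sum(wk * xk for wk, xk in zip(w, x))
        e = desired[n] - y
        w = [wk + mu * e * xk for wk, xk in zip(w, x)]
        sq_errors.append(e * e)
    return w, sq_errors


rng = random.Random(1)
symbols = [rng.choice((-1.0, 1.0)) for _ in range(5000)]
h = [1.0, 0.5, 0.2]                                   # hypothetical pulse response, post-cursor ISI only
rx = convolve(symbols, h)
w, sq_errors = lms_equalizer(rx, symbols, n_taps=5, mu=0.01)

print("taps:", [round(wk, 3) for wk in w])
print("MSE first 50:", sum(sq_errors[:50]) / 50, " MSE last 500:", sum(sq_errors[-500:]) / 500)
```

The mean-squared error falls from roughly the unequalized ISI power toward the small residual left by truncating the inverse to 5 taps.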
Transmit Linear Equalizer
Attenuates low frequencies
» Need to be careful about output amplitude: limited output power
» If you could make bigger swings, you would
» The EQ really attenuates low frequencies to match high frequencies
Also an FIR filter, implemented as a D/A converter
» Can get better precision than at the RX
Issues:
» How to set the EQ weights?
» Doesn't help the loss at high f
[Figure: H(s) vs. frequency]

Transmit Linear Equalizer: Single-Bit Operation
[Figure: end-of-line voltage (-0.3 to 0.7) vs. time (0-1.2 ns) for an unequalized pulse and an equalization pulse]
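A sketch of a 2-tap transmit FIR under a peak-swing limit, with hypothetical tap values. Normalizing the taps so their magnitudes sum to 1 is one common way to model the output-amplitude constraint (an assumption here, not taken from the slide); it shows that the equalizer flattens the response by lowering the low-frequency level rather than raising the high-frequency one.

```python
def tx_fir(symbols: list[float], taps: list[float]) -> list[float]:
    """Transmit FIR (pre-emphasis): y[n] = sum_k c[k] * s[n-k].
    Taps are scaled so sum(|c|) = 1, i.e. the output never exceeds the
    unequalized peak swing: low frequencies are attenuated rather than
    high frequencies boosted."""
    norm = sum(abs(t) for t in taps)
    c = [t / norm for t in taps]
    return [
        sum(c[k] * symbols[n - k] for k in range(len(c)) if n - k >= 0)
        for n in range(len(symbols))
    ]


taps = [1.0, -0.25]                                      # hypothetical main + 1 post-cursor tap
print([round(v, 3) for v in tx_fir([1, 1, 1, 1, 1], taps)])    # long run (low frequency): settles at 0.6
print([round(v, 3) for v in tx_fir([1, -1, 1, -1, 1], taps)])  # alternating (Nyquist): full swing of 1.0
print([round(v, 3) for v in tx_fir([0, 1, 0, 0, 0], taps)])    # single bit: 0.8 then -0.2
```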
Example: 5 Gb/s over 26" FR4
[Figure: eye with no equalization vs. eye with Tx linear equalizer]

Decision-Feedback Equalization
Don't invert the channel; just remove the ISI
» ISI is known because the earlier symbols have already been received
» Doesn't amplify noise
» Has an error-propagation problem: a wrong decision subtracts the wrong ISI
   Less of an issue in links, where random noise is small
[Figure: feed-forward EQ (FIR filter) → summer → decision (slicer), with a feedback EQ (FIR filter) from the slicer output back to the summer]
Requires a feed-forward equalizer for precursor ISI
» Reshapes the pulse to eliminate the precursor
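A sketch of a DFE on a hypothetical pulse with no precursor (as if a feed-forward equalizer had already removed it). The post-cursors sum to more than the main cursor, so a plain slicer makes errors, while subtracting ISI predicted from past decisions recovers every symbol. The feedback taps are set equal to the known post-cursors rather than adapted, which is an assumption of the sketch.

```python
import random


def convolve(x: list[float], h: list[float]) -> list[float]:
    """Causal channel: y[n] = sum_k h[k] * x[n-k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) for n in range(len(x))]


def slicer(v: float) -> int:
    return 1 if v >= 0 else -1


def dfe(rx: list[float], fb_taps: list[float]) -> list[int]:
    """Decision-feedback equalizer: subtract the post-cursor ISI predicted from
    past decisions, then slice. fb_taps[k] multiplies the decision k+1 UI ago."""
    decisions: list[int] = []
    for n, r in enumerate(rx):
        isi = sum(b * decisions[n - 1 - k] for k, b in enumerate(fb_taps) if n - 1 - k >= 0)
        decisions.append(slicer(r - isi))
    return decisions


rng = random.Random(7)
symbols = [rng.choice((-1, 1)) for _ in range(1000)]
h = [1.0, 0.6, 0.5]            # hypothetical pulse: main cursor + 2 post-cursors, no precursor
rx = convolve(symbols, h)      # post-cursor sum 1.1 > main cursor: the eye is closed

raw_errors = sum(slicer(r) != s for r, s in zip(rx, symbols))
dfe_errors = sum(d != s for d, s in zip(dfe(rx, h[1:]), symbols))
print("slicer-only errors:", raw_errors, " DFE errors:", dfe_errors)
```

Because the feedback uses decisions, one wrong decision would subtract the wrong ISI from the next two symbols; with no noise in this sketch that never happens.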
Transmit and Receive Equalization
[Figure: TX data and RX data paths with tap-select (TAP SEL) logic]
Transmit and receive equalizers are combined to make a range-restricted DFE
» Tx equalizer functions as the feed-forward filter
» Rx equalizer is restricted by the performance of the feedback loop

Tx & Rx Equalization Ranges
RX equalizer: taps 5-17 after the main cursor; pick any 5 taps
TX driver/equalizer: 5 taps, 1 (pre) + 1 (main) + 3 (post)
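The slide says the Rx equalizer can place 5 taps anywhere from 5 to 17 UI after the main cursor, while the Tx FIR covers the pre-cursor and the first 3 post-cursors. How the taps are chosen is not specified; the sketch below assumes a simple policy of cancelling the 5 largest residual ISI values, and the ISI profile is hypothetical.

```python
def select_rx_taps(postcursors: list[float], first: int = 5, last: int = 17, n_taps: int = 5) -> list[int]:
    """Choose which post-cursor positions the Rx DFE should cancel.
    postcursors[k] is the residual ISI k UI after the main cursor (index 0 unused).
    Hypothetical policy: keep the n_taps largest-magnitude positions in [first, last]."""
    window = range(first, min(last, len(postcursors) - 1) + 1)
    ranked = sorted(window, key=lambda k: abs(postcursors[k]), reverse=True)
    return sorted(ranked[:n_taps])


# Hypothetical residual ISI: a reflection "bump" around 6-8 UI and another around 12-16 UI.
isi = [0.0] * 20
isi[6], isi[7], isi[8] = 0.02, 0.05, -0.04
isi[12], isi[13], isi[15], isi[16] = 0.03, 0.01, -0.025, 0.005
print(select_rx_taps(isi))
```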
Minimizing Reflections: The Vias
Minimizing via stubs:
» Thinner PCBs are better, but sometimes impossible
» Counter-boring
» Blind vias
» SMT technology
» All are costly: 1.1x-2x
[Figure: counter-bored via and blind via]

Vias: Effect of Counter-Boring
[Figure: layer 3 without and with counter-boring]
Counter-boring the top layer takes it from highest loss to lowest loss & reduces the resonance
Minimizing Reflections: Termination Design
On-chip termination
» Bondwire & pad capacitance become part of the channel instead of a stub (which rings)

Minimizing Reflections: FET Terminations
[Figure: I-V characteristic of a two-element resistor [Dally]]
Alternate Approaches: Multi-Level Signaling
Binary (NRZ) is 2-PAM
» 2-PAM uses 2 levels to send one bit per symbol: bit rate = 2 × Nyquist frequency
» 4-PAM uses 4 levels to send 2 bits per symbol: bit rate = 4 × Nyquist frequency
   Each level has a 2-bit value: 00, 01, 11, 10
Note: both can be either single-ended or differential

When Does 4-PAM Make Sense?
[Figure: |H(f)| (0 to -60 dB) vs. Nyquist frequency (0-4 GHz)]
First order: the slope of S21
» 3 eyes vs. 1 eye costs about 10 dB (each 4-PAM eye is 1/3 the height)
» If the loss grows by more than 10 dB/octave, 4-PAM should be considered, since it halves the Nyquist frequency for the same bit rate
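A sketch of Gray-coded 4-PAM mapping and the first-order 2-PAM vs. 4-PAM comparison. The assignment of 00 to the lowest level is an assumption; the comparison uses 20·log10(3) as the eye-height penalty and the loss difference between the two Nyquist frequencies (one octave apart at the same bit rate). The loss numbers in the example are hypothetical.

```python
import math

# Gray-coded 4-PAM: adjacent levels differ in exactly one bit.
# Level order 00, 01, 11, 10 (lowest to highest) is assumed here.
GRAY_4PAM = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}

# Three eyes share the swing of one 2-PAM eye, so each is 1/3 as tall.
EYE_PENALTY_DB = 20 * math.log10(3)     # ~9.5 dB, the "3 eyes : 1 eye = 10 dB" rule


def pam4_encode(bits: list[int]) -> list[int]:
    """Map pairs of bits to 4-PAM levels (2 bits per symbol)."""
    if len(bits) % 2:
        raise ValueError("need an even number of bits")
    return [GRAY_4PAM[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def prefer_pam4(loss_2pam_nyquist_db: float, loss_4pam_nyquist_db: float) -> bool:
    """At the same bit rate, 4-PAM's Nyquist frequency is half of 2-PAM's.
    4-PAM wins if the extra loss 2-PAM sees one octave higher exceeds the eye penalty."""
    return loss_2pam_nyquist_db - loss_4pam_nyquist_db > EYE_PENALTY_DB


print(pam4_encode([0, 0, 0, 1, 1, 1, 1, 0]))              # [-3, -1, 1, 3]
print(f"eye penalty = {EYE_PENALTY_DB:.2f} dB")
print(prefer_pam4(30.0, 18.0), prefer_pam4(20.0, 14.0))   # 12 dB/octave -> True, 6 dB/octave -> False
```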
Alternate Approaches: Simultaneous Bidirectional
[Figure: driver (drv) and receiver (rcvr) at each end of the line; the receiver compares Vline against Vref, selected between shared VrefH and VrefL according to its own transmit signal]
Two signals at half speed
» Makes sense if the bandwidth need is equal in both directions
Issues:
» Getting ideal timing between TX & RX is tough
Fixed VrefL = Vdd - 1.5·Vswing

Characterization System
Multiple:
» Connectors
» Backplane materials
» Trace lengths
» Layers/via lengths
» Via technology
These slides:
» 20" trace length
» FR4, not counter-bored
» Nelco 6000, 2-step counter-bored
» Top & bottom layers
Will show the Rambus 10 Gb/s backplane SerDes demo on Friday
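A behavioral sketch of why a simultaneous-bidirectional receiver switches between VrefH and VrefL. It assumes open-drain drivers with the line terminated to Vdd, where each end sending a 1 pulls the line down by Vswing; under that assumption the thresholds sit halfway between the possible line levels, VrefH = Vdd - 0.5·Vswing and VrefL = Vdd - 1.5·Vswing. The supply and swing values are hypothetical.

```python
def line_voltage(vdd: float, vswing: float, near_bit: int, far_bit: int) -> float:
    """Assumed open-drain signaling: each end sending a 1 pulls the line down by vswing."""
    return vdd - vswing * (near_bit + far_bit)


def sbd_receive(vline: float, near_bit: int, vdd: float, vswing: float) -> int:
    """Recover the far-end bit by comparing against a reference chosen by
    the bit this end is transmitting (VrefL if sending 1, VrefH if sending 0)."""
    vref_h = vdd - 0.5 * vswing
    vref_l = vdd - 1.5 * vswing
    vref = vref_l if near_bit else vref_h
    return 1 if vline < vref else 0


VDD, VSWING = 1.8, 0.2   # hypothetical supply and per-driver swing, in volts
for near in (0, 1):
    for far in (0, 1):
        v = line_voltage(VDD, VSWING, near, far)
        print(f"near={near} far={far} Vline={v:.2f} V -> received {sbd_receive(v, near, VDD, VSWING)}")
```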
An Attempt to Shift the Problem to the DSP Side
8-way interleaved DAC (8-bit) and ADC (4-bit) at 8 GSa/s
» A lot of power (not even including the DSP section)
» DACs and ADCs are complex: a lot of parasitic filtering, which adds to the channel degradation
Still, people are moving in that direction: check out K. Poulton's 20 GSa/s 8-bit ADC paper at ISSCC03

Time-Interleaved DACs
[Figure: eight current DACs, each with data (data0-data7) and start/end clocks (clkstart0/clkend0 ... clkstart7/clkend7), summed onto a 50 Ω output]
DACs enabled by the overlap of two 1 GHz clocks
» Need precise clocks: 3% pp phase noise => 24% pp of a symbol
» Fast clocks (period of 8 gate delays) limit interleaving
» Capacitance of all 8 DACs loads the output
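The 3% to 24% scaling follows from the interleaving factor: an error that is a fraction of the 1 GHz clock period is 8 times larger when expressed as a fraction of the 125 ps symbol at 8 GSa/s.

```latex
\begin{aligned}
T_{clk} &= \frac{1}{1\,\text{GHz}} = 1000\,\text{ps}, \qquad
T_{sym} = \frac{T_{clk}}{8} = 125\,\text{ps}\\
\Delta t_{pp} &= 0.03\,T_{clk} = 30\,\text{ps} = \frac{30}{125}\,T_{sym} = 0.24\,T_{sym}
\end{aligned}
```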
DAC Output Circuitry
RC_out = 25 Ω × 4.3 pF → 1.5 GHz bandwidth
[Figure: per-DAC output stage: low-fanout pre-driver gated by data, clkstart and clkend; 5-bit binary-weighted output cells (1, 2, 4, 8, 16) plus 3-bit thermometer-coded cells (7 outputs of size 32); one symbol time; the regulated pre-driver supply V_ddreg controls the output current]

Transceiver Inductors and Clocking
[Figure: transmit memory → DACs and ADCs → Rx memory, each clocked through phase adjusters driven by a 4-stage PLL; timing recovery]
Phase adjusters correct LC delay and static errors
» Adjuster: clock mux + 1/16th-symbol clock interpolator
» 8 ADC phase adjusters + 1 for timing recovery
» 16 DAC phase adjusters (2 clocks for each DAC)
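A sketch of the segmented 8-bit code split suggested by the output-circuitry labels: the 5 LSBs drive binary-weighted cells (1 to 16 units) and the 3 MSBs drive 7 thermometer-coded cells of 32 units each, so the total unit current always equals the input code. The bit assignment is an assumption read off the figure labels. The second function checks the quoted bandwidth: 25 Ω × 4.3 pF is about 108 ps, a single-pole bandwidth of about 1.5 GHz.

```python
import math


def segmented_dac(code: int) -> tuple[list[int], list[int], int]:
    """Split an 8-bit code into 7 thermometer cells (32 units each) for the 3 MSBs
    and 5 binary-weighted cells (1, 2, 4, 8, 16 units) for the 5 LSBs."""
    if not 0 <= code < 256:
        raise ValueError("code must be 8-bit")
    msb, lsb = code >> 5, code & 0x1F
    thermo = [1 if i < msb else 0 for i in range(7)]     # first `msb` cells switched on
    binary = [(lsb >> b) & 1 for b in range(5)]          # binary[b] enables the 2**b-unit cell
    units = 32 * sum(thermo) + sum(bit << b for b, bit in enumerate(binary))
    return thermo, binary, units


def rc_bandwidth_hz(r_ohm: float, c_farad: float) -> float:
    """Single-pole -3 dB bandwidth 1 / (2*pi*R*C)."""
    return 1.0 / (2 * math.pi * r_ohm * c_farad)


print(segmented_dac(0b101_10011))                  # code 179
print(f"{rc_bandwidth_hz(25, 4.3e-12) / 1e9:.2f} GHz")
```

Thermometer coding the MSBs keeps the large cells monotonic (a higher code only ever turns more cells on), at the cost of 7 cells instead of 3.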
Next Challenges
Improving the power-supply rejection (PSR) of all circuits in the path
Integration of many links
» Low power, area, portable solutions
Control of complex architectures
» Deal with loss, reflections and crosstalk
Offset and mismatch
» Both voltage and time
Lots of opportunity for design!

Measured S21s: FR4, No Counter-Bore
[Figure: S21 (0-100%) vs. Nyquist frequency (0-5 GHz) for FR420top, FR420bot, FR410top, FR410bot]
26" FR4 Bot, 3.125 Gb/s, 2-PAM w/ EQ

26" FR4 Bot, 6.4 Gb/s, 2-PAM w/ EQ
26" FR4 Top, 6.4 Gb/s, 2-PAM w/ EQ

26" FR4 Top, 6.4 Gb/s, 4-PAM w/ EQ
Measured S21s: N6k, Counter-Bored
[Figure: S21 (0-100%) vs. Nyquist frequency (0-5 GHz) for NCB20top, NCB20bot, NCB10top, NCB10bot]

26" N6k-cb Top, 6.4 Gb/s, 2-PAM
10G Eyes & System Margin Shmoos
3"/20"/3" = 26" trace + 2 connectors
Tested to BER < 10^-15

Link Performance vs. Time [Walker 02]
Link Efficiency: Gb/W, Gb/mm² (ISSCC 1992-2001) [Walker 02]