Comparison of Bandwidth Limits for On-Card Electrical and Optical Interconnects for 100 Gb/s and Beyond
Petar Pepeljugoski
IBM T.J. Watson Research Center
Collaborators and Acknowledgements
Fuad Doany, Clint Schow, Dan Kuchta, Laurent Schares, Jeff Kash, Marc Taubenblatt, Mark Ritter, Chris Baks, Rich John, Lei Shan, Young Kwark, Dong Kam, John Bulzacchelli, Xiaoxiong Gu, Troy Beukema, Christian Schuster, Renato Rimolo-Donadio, Boping Wu and others
Sponsors of specific research programs: IBM's Terabus and TELL programs are partially supported by DARPA
Outline
- Bandwidth and performance needs
- Electrical interconnect description
  - Hardware results
  - Comparison of simulations to measurements
- Optical interconnect
  - Hardware results
  - Comparison of simulations to measurements
- Escape bandwidth and comparison metrics
  - Electrical escape bandwidth
  - Optical escape bandwidth
  - Comparison of electrical and optical interconnects
- Conclusions
Bandwidth & Performance Needs Increasing Steadily
[Figure: performance (log scale: Tera-, Peta-, Exa-) vs. time (linear); 08/11/2007, http://www.top500.org]
- Cluster/Parallel: 90% CAGR, continuing
- Box: 70-80% CAGR, continuing
- Uniprocessor: 50% CAGR, slowing
- Transistors & Pkg: 15%-20% CAGR, slowing
- CPU trend: ~50-60% (2x/18 mo.)
- Parallel system trend (~90%) = CPU trend + increased parallelism
- IBM Cell processor: 9 processors, ~200 GFLOPs; on- and off-chip BW ~100 GB/s (0.5 B/FLOP)
- Moore's Law (at the system performance level) no longer comes just from improvements at the chip level
- Parallel system performance increasingly comes from high-level interconnection of increasingly parallel chips & boxes
- BW requirements must scale with system performance, ~1 B/FLOP (memory & network)
- Requires exponential increases in communication bandwidth at all levels of the system: inter-rack, backplane, card, chip
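The growth figures on this slide can be cross-checked with a short calculation. The following is a minimal sketch, not part of the original deck; the function names are hypothetical, and it only converts the quoted doubling time (2x every 18 months) into an annual growth rate and recomputes the bytes-per-FLOP ratio from the Cell processor numbers.

```python
# Sketch: relate "2x every 18 months" to a compound annual growth rate (CAGR)
# and recompute the bytes-per-FLOP ratio quoted for the IBM Cell processor.
# Function names are illustrative assumptions, not from the original slides.

def cagr_from_doubling(months_to_double: float) -> float:
    """Annual growth rate implied by doubling every `months_to_double` months."""
    return 2.0 ** (12.0 / months_to_double) - 1.0


def bytes_per_flop(bandwidth_gbytes_per_s: float, gflops: float) -> float:
    """Ratio of bandwidth (GB/s) to compute throughput (GFLOP/s)."""
    return bandwidth_gbytes_per_s / gflops


cpu_cagr = cagr_from_doubling(18.0)        # falls in the quoted ~50-60% range
cell_ratio = bytes_per_flop(100.0, 200.0)  # slide quotes 0.5 B/FLOP

print(f"CPU trend: {cpu_cagr:.1%} per year")
print(f"Cell processor: {cell_ratio} B/FLOP")
```

Doubling every 18 months corresponds to roughly 59% per year, which is consistent with the ~50-60% CPU trend quoted above.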
On-Card Bandwidth Needs Increasing Steadily
- BW for on-card links is increasingly difficult to achieve and is a bottleneck for future systems
- Develop a set of communication metrics that will help system designers quantify trade-offs between electrical and optical technologies
[Figure: memory bus BW (GB/s, 0 to 250) vs. year (2008 to 2012) for a card with four processors (P) and memory; annotation: 12 to 16 Gb/s/ch needed]
Electrical Interconnect
- Experimental setup and variants
- Data collection and link characterization
- Correlation of experimental and simulation data
- Prediction of link limits
Schematics of Experimental Setup
[Figure: IC 1 on Module 1 and IC 2 on Module 2, each on a daughtercard with its own clock (CLK 1, CLK 2) and power supply (Power Supply 1, Power Supply 2), joined by high-speed links (15 to 60 cm) on an FR4 motherboard; WEST and EAST sides, each with a PC connection (LPT1, LPT2)]
Aggregate bandwidth 342 Gb/s: 16 electrical channels (each way), up to 11 Gb/s
Testbed Hardware
- Motherboard: low-cost PCB that supplies power, control and clocks
- Multichannel power supply, power supply interface, PC interface, reference clocks
- High-speed nets with different lengths and via topologies
- HMZD connectors
- Common pad design accommodates LGA, BGA or test socket
- Daughtercard in various PCB technologies; small area enables many variants
- 11 Gb/s chip hardware for link BER: 16 full-duplex channels, pattern generators, error detectors, eye measurement
- Leverages existing hardware
Link Variants Reflect Industry-Wide Interest
- Data rate: 7 to 11 Gb/s
- Material: Megtron 6 (low loss), Nelco 4000-13 (mid loss)
- Channel characteristics
  - Length: PCB differential traces of 15, 30, 45 and 60 cm
  - Routing: different layers, via stub
  - Signal-to-power via ratio (module, PCB)
- Channel conditions
  - Crosstalk: up to 32 links
  - Multiple data patterns can be selected
- Equalization complexity
  - FFE: 3-tap, Tx side
  - DFE: 5-tap, Rx side
- SW and HW variants: total of 4608 combinations
- Too many variables for manual data collection and analysis!
Data Collection
- Automated link parameter measurements without user intervention
- Automated data analysis and plotting, needed because of the large volume of data
- Automation is required by the large number of link variants (modules, trace lengths, board technologies)
- The complete suite of measurements takes 28 hours to finish
- Link quality indicators are saved from all channels
- An adaptive algorithm finds the best equalization tap weights for each link
Active Link Characterization
Completed the entire loop of link performance measurements for different link lengths, equalization settings, signal-to-power ratios and board materials.
[Figure: normalized Amin (%) for Megtron6 and Nelco, 2:1, with crosstalk, at 1E-3 BER; equalization settings: no EQ, FFE only, DFE only, FFE+DFE; lengths: 30 cm and 60 cm; eye-diagram inset labels: V, Ap, Amin, EyeH, t]
Model-to-Hardware Correlation
Good correlation observed for different channels and equalization.
[Figure: normalized Amin (%) for Megtron6, 2:1, at 1E-3 BER; series: active measurement, HSSCDR (passive measurement), HSSCDR (model); channels A0, A1, A2, A3, B0, B2, B3; equalization settings: no EQ, FFE only, DFE only, FFE+DFE; lengths: 45 cm and 60 cm]
We can extrapolate our simulations to explore higher data rates or longer distances.
Maximum Achievable Link Distance (@ 11 Gb/s)
- 45 cm is okay even without equalization @ 1E-15 BER
- Can go beyond 1 meter with both FFE and DFE
- Enables various what-if analyses on different distances and equalization types
[Figure: normalized Amin (%) for Megtron6, 2:1, with crosstalk, at 1E-15 BER; series: w/o EQ, FFE only, DFE only, FFE+DFE; lengths: 45, 60, 90 and 120 cm]
Electrical Data Rate Limits, Megtron-6 Board
- The improved channel of Megtron-6, coupled with IC technology and architectural improvements, could lead to higher data rates
- Ideal case: NRZ; TX and RX do only shaping to avoid waveform ringing; no IC parasitics; TELL module
[Figure: max data rate (Gb/s, 0 to 40) vs. distance (cm, 20 to 120) at 1e-12 BER; legend: FFE + DFE, TELL Hardware, No IC Parasitics]
Optical Interconnects
- Experimental setup and variants
- Experimental results
- Link model assumptions
- Predictions of optical link limits
Terabus: A Full Technology Set for Waveguide-Based Links on a Card
[Figure labels: Optomodule (SLC, transceiver IC, OE, lens array), Optocard]
Objectives
- Demonstrate high-speed board-level optical links through integrated waveguides
- Highly integrated packaging approach to yield dense Optomodules that look like surface-mount electrical chip carriers
Component and Package Development
- Transceiver chip: low-power CMOS driver and receiver designs
- Optical components: 2D arrays of 985 nm VCSELs and pin photodiodes, with integrated backside lenses
- Organic carrier: multi-level high-speed wiring for transceiver data and power
- Packaging and assembly: solder hierarchy, optical system design, mechanical tolerances, thermal analysis
- Optocards: dense array of low-loss optical waveguides and turning mirrors
Heading towards an optically-enabled multi-chip module
[Figure labels: other chips, SLC, Opto chip]
Optical MCM Provides Much Higher Bandwidth off the Module
[Figure: electrical vs. optical packaging, top and bottom views. Electrical: CPU on a package with BGA at 800 μm pitch on an FR4 base circuit board, with the area req'd for equivalent bandwidth electrically marked. Optical: CPU on an organic package with CMOS TRX and SLC, OE and lens array at 200 μm pitch, cutout with OEs-on-IC, on a base circuit board with optical waveguides & turning mirrors]
Optomodule: Fiber-Coupled Transmitter at 10-20 Gb/s
[Figure: transmitter test setup with panels at 10, 15 and 20 Gb/s. Labels: pattern generator (PRBS 2^7-1, 500 mVpp single-ended), transceiver TX with LDD, 50-μm MMF fiber probe, high-speed photodiode, oscilloscope (20 GHz)]
- 7-μm diameter VCSELs
- Core supplies: VDD = 1.8 V (preamplifier), VDD = 2.7 V (output stage)
- Power dissipation: 73 mW/channel
Power efficiency by data rate:
- 10 Gb/s: 7.3 mW/Gb/s
- 15 Gb/s: 4.9 mW/Gb/s
- 20 Gb/s: 3.7 mW/Gb/s
Optomodule: Fiber-Coupled Receiver (Full Link), 10-15 Gb/s
[Figure: full-link test setup with panels at 10, 12.5 and 15 Gb/s. Labels: pattern generator (PRBS 2^7-1, 500 mVpp single-ended), transceiver TX with LDD, 50-μm MMF fiber probe, transceiver RX with TIA/LA, error detector, oscilloscope (20 GHz)]
- Designed for 8b/10b coding
- Core supplies: VPD = 2.5 V (photodiode), VDD = 1.8 V (TIA, LA, buffer)
- >270 mVpp differential outputs
- Power dissipation: 62 mW/channel
Power efficiency by data rate:
- 10 Gb/s: 6.2 mW/Gb/s
- 12.5 Gb/s: 5 mW/Gb/s
- 15 Gb/s: 4.1 mW/Gb/s
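The power-efficiency figures on the transmitter and receiver slides follow from dividing the fixed per-channel power by the data rate. A minimal sketch of that arithmetic is shown below; the function and variable names are hypothetical, and only the 73 mW and 62 mW per-channel values and the data rates come from the slides. Note that 1 mW/Gb/s is numerically the same as 1 pJ/bit.

```python
# Sketch: power efficiency (mW per Gb/s, numerically equal to pJ/bit) for a
# channel whose power dissipation does not change with data rate.
# Names are illustrative assumptions; the input numbers are from the slides.

TX_POWER_MW = 73.0  # transmitter power dissipation per channel
RX_POWER_MW = 62.0  # receiver power dissipation per channel


def mw_per_gbps(power_mw: float, rate_gbps: float) -> float:
    """Power per unit data rate for one channel."""
    return power_mw / rate_gbps


tx_efficiency = {rate: mw_per_gbps(TX_POWER_MW, rate) for rate in (10.0, 15.0, 20.0)}
rx_efficiency = {rate: mw_per_gbps(RX_POWER_MW, rate) for rate in (10.0, 12.5, 15.0)}

for rate, value in tx_efficiency.items():
    print(f"TX at {rate} Gb/s: {value:.2f} mW/Gb/s")
for rate, value in rx_efficiency.items():
    print(f"RX at {rate} Gb/s: {value:.2f} mW/Gb/s")
```

Because the per-channel power is held constant in this sketch, running the same channel faster directly lowers the energy per bit, which is the trend the slide tables show.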
Optical Link Escape Bandwidth Assumptions
- 10 Gb/s optics: Trise, Tfall = 49 ps; TX OMA = 2 dBm; RX BW = 7.5 GHz; RX OMA sensitivity = -11 dBm
- 20 Gb/s optics: Trise, Tfall = 27 ps; TX OMA = 2 dBm; RX BW = 14 GHz; RX OMA sensitivity = -11 dBm
- Losses: waveguide 0.05 dB/cm; mirror and coupling 3.75 dB
- 1 dB of power penalty is equivalent to 20 cm of achievable distance
- 26 mm electrical trace (Dx = 13 mm, Dy = 13 mm)
- Worst case: Dx = 25 - (3+9) = 13 mm, Dy = 25 - (3+9) = 13 mm
- Module 50 x 50 mm, chip 20 x 20 mm
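The loss figures above imply a simple optical power budget. The sketch below is a loss-only illustration, not the link model used in the presentation: it ignores ISI and jitter, and `penalty_db` is a hypothetical lumped parameter standing in for any such penalties. It does show why 1 dB of penalty trades against 20 cm of waveguide at 0.05 dB/cm.

```python
# Sketch: loss-only optical power budget from the slide's assumptions.
# This ignores ISI and jitter; `penalty_db` is a hypothetical lumped term.

TX_OMA_DBM = 2.0           # transmitter OMA
RX_SENS_DBM = -11.0        # receiver OMA sensitivity
FIXED_LOSS_DB = 3.75       # mirror and coupling loss
WG_LOSS_DB_PER_CM = 0.05   # waveguide loss


def margin_db(distance_cm: float, penalty_db: float = 0.0) -> float:
    """Remaining power margin after fixed loss, waveguide loss and penalties."""
    budget_db = TX_OMA_DBM - RX_SENS_DBM
    return budget_db - FIXED_LOSS_DB - WG_LOSS_DB_PER_CM * distance_cm - penalty_db


def max_distance_cm(penalty_db: float = 0.0) -> float:
    """Distance at which the loss-only margin reaches zero."""
    budget_db = TX_OMA_DBM - RX_SENS_DBM
    return (budget_db - FIXED_LOSS_DB - penalty_db) / WG_LOSS_DB_PER_CM


reach_no_penalty = max_distance_cm(0.0)
reach_1db_penalty = max_distance_cm(1.0)
print(f"Loss-only reach, no penalty:   {reach_no_penalty:.0f} cm")
print(f"Loss-only reach, 1 dB penalty: {reach_1db_penalty:.0f} cm")
print(f"Distance cost of 1 dB:         {reach_no_penalty - reach_1db_penalty:.0f} cm")
```

The 20 cm difference between the two cases is simply 1 dB divided by 0.05 dB/cm; the absolute reach values are upper bounds under the loss-only assumption.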
Optical Interconnect Max Data Rate / Distance
Max achievable data rate (or distance) is limited:
- For EOE with 10 Gb/s optics: by the 10 Gb/s optics, due to high ISI in the optical link
- For EOE with 20 Gb/s optics: by the electrical trace, due to high DJ in the electrical links
[Figure: maximum data rate (Gb/s, 0 to 100) vs. distance (cm, 0 to 160); series: EOE with 10G Terabus optics, EOE with 20G Terabus optics, 20G Terabus optics only, ideal (channel limit only); annotation: waveguide BW*L > 40 GHz]
Physical Limits to Electrical Escape Bandwidth
- With a typical 1 mm LGA via/antipad, standard (full) vias and conductor widths, only one differential pair per channel per wiring level can escape around the perimeter of the module
- LGA is the BW bottleneck
Bandwidth of elements, 50 mm module (Tb/s):
- C4: 90
- Mod tl: 29-114 (8-4-8)
- LGA pins: 12.6
- Card tl: 70
[Figure: cross-section and top views of chip, module, LGA connector and PCB, with wiring under the module; PCB layer stack: P1 S2 P3 S4 P5 S6 P7 P8 P9 P10 S11 P12 S13 P14 P15 S16 P17 S18 P19 S20 P21 P22 S23 P24 S25 P26 S27 P28]
Maximum Electrical Escape BW for Standard PCBs
Pin budget for a 50 x 50 mm LGA module:
- 2,500 LGA contacts in total
- 400 Vdd and ground pins for a ~200 W processor @ 1 V (200 Vdd, 200 Gnd)
- 200 low-speed and test signals
- 1,900 pins allocated to high-speed signals
Escape BW given 1,900 pins allocated to high-speed signals:
- Signal-to-reference ratio: 2:1
- Number of differential pairs: 633
- BW @ 10 Gb/s: 6.3 Tb/s
- BW @ 20 Gb/s: 12.6 Tb/s
The number of differential pairs we can escape is limited by the LGA. We are not limited by PCB wiring density, but by LGA field escape.
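The pin budget above can be reproduced step by step. The following is a minimal sketch with hypothetical function and parameter names; it assumes that a 2:1 signal-to-reference ratio means two of every three high-speed pins carry signal, and that each differential pair uses two signal pins.

```python
# Sketch: electrical escape bandwidth from the LGA pin budget on the slide.
# Assumptions (not stated explicitly in the source): a 2:1 signal-to-reference
# ratio leaves 2/3 of the high-speed pins for signals, and each differential
# pair consumes two signal pins. All names are illustrative.

def escape_bandwidth(total_pins: int, power_pins: int, low_speed_pins: int,
                     signal_to_ref: tuple, rate_gbps: float):
    """Return (differential pairs, escape bandwidth in Tb/s)."""
    high_speed_pins = total_pins - power_pins - low_speed_pins
    sig, ref = signal_to_ref
    signal_pins = high_speed_pins * sig // (sig + ref)  # integer signal pins
    diff_pairs = signal_pins // 2
    return diff_pairs, diff_pairs * rate_gbps / 1000.0


pairs_10g, bw_10g = escape_bandwidth(2500, 400, 200, (2, 1), 10.0)
pairs_20g, bw_20g = escape_bandwidth(2500, 400, 200, (2, 1), 20.0)

print(f"Differential pairs: {pairs_10g}")
print(f"Escape BW @ 10 Gb/s: {bw_10g:.2f} Tb/s")
print(f"Escape BW @ 20 Gb/s: {bw_20g:.2f} Tb/s")
```

Under these assumptions the sketch yields 633 pairs, and about 6.33 Tb/s and 12.66 Tb/s, which the slide quotes as 6.3 and 12.6 Tb/s.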
Optical Escape Bandwidth: OE Elements on Module
- 3.7 mW per Gb/s/port
- Max escape bandwidth is limited by the number of OE modules and the number of waveguide layers
- For 2 waveguide layers, the bandwidth is 76.8 Tb/s
- Bandwidth of 1 m waveguide is in excess of 40 GHz
[Figure: 50 mm organic module with CPU, CMOS TRX and SLC, organic package, OE and lens array, on a base circuit board with optical waveguides & turning mirrors]
[Figure: normalized amplitude (a.u.) vs. time (ps, 40 to 240) for the input and output pulses; λ = 850 nm, L = 2.55 m]
Module Escape Bandwidth Summary
Escape bandwidth (Tb/s) by data rate:
- 10 Gb/s: electrical (1 mm LGA pitch) 6.3; optical, one WG layer 23; optical, two WG layers 46
- 20 Gb/s: electrical (1 mm LGA pitch) 12.6; optical, one WG layer 38.4; optical, two WG layers 76.8
Bandwidth of elements (Tb/s; figure not drawn to scale):
- Electrical: C4 90-160; Mod tl 29-114 (8-4-8); LGA pins 12.6; Card tl 70
- Optical: C4 90-160; Mod tl 29-114 (8-4-8); optical modules 76.8; optical WG 192
Limits:
- Electrical interconnect: limited by the LGA; requires a large number of wiring layers
- Optical interconnect: limited by the number of modules and waveguide layers; adding a second layer reduces the number of transceivers in the second rank
Electrical and Optical Metrics, 10 Gb/s Signaling
- Area on package [mm^2/port]: optical 0.576; electrical 3.24
- BW escape from 50 mm x 50 mm module: optical 38.4 Tb/s; electrical 6.3 Tb/s
- BW perimeter escape density (D) @ 10 Gb/s [Gb/s/mm]: optical 192; electrical 25
- Media distance*bandwidth/channel [GHz m] (single wavelength, no WDM): optical > 45; electrical ~ 12
- Active channel Gb/s*distance/channel [Gb/s m] (limited by OE and I/O, no WDM): optical > 15; electrical ~ 14
- Power (80 cm link) (P) [mW/Gb/s/port]: optical 11.2; electrical 12.5
- Technology comparison metric (D*BW/P) [Gb/s/mm perimeter * Gb/s m / mW/port]: optical 38.4; electrical 2.8
Electrical and Optical Metrics, 20 Gb/s Signaling
- Area on package [mm^2/port]: optical 0.576; electrical 3.24
- BW escape from 50 mm x 50 mm module: optical 76.8 Tb/s; electrical 12 Tb/s
- BW perimeter escape density (D) @ 20 Gb/s [Gb/s/mm]: optical 384; electrical 60
- Media distance*bandwidth/channel [GHz m] (single wavelength, no WDM): optical > 45; electrical ~ 12
- Active channel Gb/s*distance/channel [Gb/s m] (limited by OE and I/O, no WDM): optical > 26; electrical ~ 16.5
- Power (80 cm link) (P) [mW/Gb/s/port]: optical < 17.5; electrical 30.0
- Technology comparison metric (D*BW/P) [Gb/s/mm perimeter * Gb/s m / mW/port]: optical > 40; electrical 1.65
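Two of the metrics above can be recomputed from the other entries. The sketch below is an interpretation rather than the authors' stated procedure: it assumes the perimeter density is the escape bandwidth divided by the 4 x 50 mm module perimeter, and that P enters the comparison metric as per-port power in mW (the mW/Gb/s figure multiplied by the data rate), as the "mW/port" in the metric's units suggests. Under those assumptions it reproduces the 20 Gb/s density values and the electrical metric values at both data rates; the optical metric entries are given as bounds and are not evaluated here.

```python
# Sketch: recompute perimeter escape density and the D*BW/P comparison metric.
# Assumptions (interpretation, not confirmed by the source):
#   - perimeter density = escape bandwidth / (4 * module side)
#   - P in the metric is per-port power in mW = (mW per Gb/s) * data rate
# All names are illustrative.

MODULE_SIDE_MM = 50.0


def perimeter_density(escape_tbps: float, side_mm: float = MODULE_SIDE_MM) -> float:
    """Escape bandwidth per mm of module perimeter, in Gb/s/mm."""
    return escape_tbps * 1000.0 / (4.0 * side_mm)


def comparison_metric(density_gbps_per_mm: float, reach_gbps_m: float,
                      power_mw_per_gbps: float, rate_gbps: float) -> float:
    """D * BW / P with P expressed as per-port power in mW."""
    port_power_mw = power_mw_per_gbps * rate_gbps
    return density_gbps_per_mm * reach_gbps_m / port_power_mw


# 20 Gb/s slide: density from escape bandwidth
density_optical_20g = perimeter_density(76.8)     # slide lists 384
density_electrical_20g = perimeter_density(12.0)  # slide lists 60

# Electrical metric at both data rates, using the slide's D, reach and power
metric_electrical_10g = comparison_metric(25.0, 14.0, 12.5, 10.0)  # slide lists 2.8
metric_electrical_20g = comparison_metric(60.0, 16.5, 30.0, 20.0)  # slide lists 1.65

print(density_optical_20g, density_electrical_20g)
print(metric_electrical_10g, metric_electrical_20g)
```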
Conclusions
- Measurement automation allowed optimization and exploration of a massive matrix of link variants
- Good model-to-hardware correlation observed
- We extrapolated the simulations to determine electrical link limits: 25 Gb/s is possible with good channels and improved ICs
- Electrical and optical interconnect limits:
  - Electrical: limited by the LGA to 12.6 Tb/s for a 50 mm module, 8 SIG/GND layers
  - Optical: limited by the number of OE modules and waveguide layers to 76.8 Tb/s, 2 waveguide layers
- Optical waveguides give >30x better BW per signal layer vs. electrical PCBs
- WDM would give an even greater advantage for optical interconnects