Silicon photonics and memories

Silicon photonics and memories
Vladimir Stojanović
Integrated Systems Group, RLE/MTL, MIT

Acknowledgments
- Krste Asanović, Christopher Batten, Ajay Joshi
- Scott Beamer, Chen Sun, Yon-Jin Kwon, Imran Shamim
- Rajeev Ram, Milos Popovic, Franz Kaertner, Judy Hoyt, Henry Smith, Erich Ippen
- Hanqin Li, Charles Holzwarth
- Jason Orcutt, Anatoly Khilo, Jie Sun, Cheryl Sorace, Eugen Zgraggen
- Michael Georgas, Jonathan Leu, Ben Moss
- Dr. Jag Shah, DARPA MTO
- Texas Instruments
- Intel Corporation

Processors scaling to manycore systems
64-tile system (64-256 cores)
- 4-way SIMD FMACs @ 2.5-5 GHz
- 5-10 TFlops on one chip
- Need 5-10 TB/s of off-chip I/O
- Even larger bisection bandwidth
[Figure labels: 2 cm x 2 cm die; Intel 48 core; Xeon]
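The 5-10 TFlops and 5-10 TB/s figures on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes 256 cores (the upper end of the stated range), 2 flops per FMAC (one multiply plus one add), and the 1 Byte/Flop ratio annotated on the following slide; the function name is illustrative only.

```python
def peak_tflops(cores, simd_lanes, clock_ghz, flops_per_fmac=2):
    """Peak throughput in TFlop/s.

    flops_per_fmac=2 is an assumption: a fused multiply-add counts as two flops.
    """
    return cores * simd_lanes * flops_per_fmac * clock_ghz / 1000.0


BYTES_PER_FLOP = 1.0  # assumption: the "1 Byte/Flop" annotation on the next slide

for clock_ghz in (2.5, 5.0):
    tflops = peak_tflops(cores=256, simd_lanes=4, clock_ghz=clock_ghz)
    tb_per_s = tflops * BYTES_PER_FLOP
    print(f"{clock_ghz} GHz: {tflops:.2f} TFlop/s, {tb_per_s:.2f} TB/s off-chip")
```

With these assumptions the two clock rates give roughly 5 and 10 TFlop/s, which matches the range quoted on the slide.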

Bandwidth, pin count and power scaling
- Need 16k signal pins in 2017 for HPC
[Figure: package pin count and signal pins versus core count (2, 4 cores to 256 cores); annotations: 8 Flops/core @ 5 GHz, 1 Byte/Flop, 2 TFlop/s]

Electrical Baseline in 2016
Node board
- 10 TFlop/s, 512 GB, 80 Tb/s mem BW
CPU
- Power: 1 kW -> 100 W
- Energy-efficiency: 100 pJ/Flop -> 10 pJ/Flop
- 64 x 8 x 32 = 16k high-speed signal pins
Memory power: 1 kW
- 200 W cross-chip
- 400 W I/O
- 400 W activate
Memory
- 512 x 1 GB chips
- 8 chips per DIMM
- 1 DIMM per memory channel
- Need at least 16 banks/chip to sustain BW
- 64 memory channels (controllers)
- 1.28 Tb/s per controller
- 160 Gb/s per chip (16 x 10 Gb/s) @ 5 pJ/b
[Figure: node board with processor + router (P, R) tiles, memory controllers and DIMMs; request and response paths; labels: 1 kW compute, 1024]
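The memory-side numbers on this slide are mutually consistent, which a short calculation makes visible. The sketch below uses only values stated on the slide; the variable names are illustrative, and the small gap between the computed totals and the rounded slide figures (80 Tb/s, 400 W) is expected.

```python
# Values taken from the slide
CHANNELS = 64            # memory channels (controllers)
CHIPS_PER_DIMM = 8       # one DIMM per channel
LANES_PER_CHIP = 16      # 16 x 10 Gb/s per chip
LANE_GBPS = 10.0
IO_ENERGY_PJ_PER_BIT = 5.0
NODE_TFLOPS = 10.0

chip_gbps = LANES_PER_CHIP * LANE_GBPS                   # per-chip bandwidth
controller_tbps = CHIPS_PER_DIMM * chip_gbps / 1000.0    # per-controller bandwidth
total_tbps = CHANNELS * controller_tbps                  # node memory bandwidth
total_chips = CHANNELS * CHIPS_PER_DIMM                  # 1 GB each -> capacity in GB

# Power [W] = rate [b/s] * energy [J/b]
io_power_w = total_tbps * 1e12 * IO_ENERGY_PJ_PER_BIT * 1e-12


def compute_power_w(tflops, pj_per_flop):
    """CPU power in watts for a given throughput and energy per flop."""
    return tflops * 1e12 * pj_per_flop * 1e-12


print(f"{chip_gbps:.0f} Gb/s per chip, {controller_tbps:.2f} Tb/s per controller")
print(f"{total_tbps:.2f} Tb/s total, {total_chips} chips, {io_power_w:.1f} W of I/O power")
print(f"CPU: {compute_power_w(NODE_TFLOPS, 100):.0f} W -> {compute_power_w(NODE_TFLOPS, 10):.0f} W")
```

The I/O power follows directly from bandwidth times energy per bit, which is why lowering pJ/b is the main lever on the 400 W I/O budget.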

Monolithic CMOS-Photonics in Computer Systems
- Si-photonics in advanced bulk CMOS, thin BOX SOI and DRAM process
- NO costly process changes
- Bandwidth density: need dense WDM
- Energy-efficiency: need monolithic integration
[Figure labels: Supercomputers; Embedded apps]

CMOS photonics density and energy advantage
Metric                                            | Energy (pJ/b) | Bandwidth density (Gb/s/μm)
Global on-chip photonic link                      | 0.1-0.25      | 160-320
Global on-chip optimally repeated electrical link | 1             | 5
Off-chip photonic link (100 μm coupler pitch)     | 0.1-0.25      | 6-13
Off-chip electrical SERDES (100 μm pitch)         | 5             | 0.1
Assuming 128 10 Gb/s wavelengths on each waveguide
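The bandwidth-density column follows from the stated assumption of 128 wavelengths at 10 Gb/s per waveguide. The sketch below checks the off-chip entry against the 100 μm coupler pitch and back-calculates the on-chip waveguide pitch implied by 160-320 Gb/s/μm; that implied pitch is an inference from the table, not a number given on the slide.

```python
def bandwidth_density(wavelengths, gbps_per_wavelength, pitch_um):
    """Bandwidth density in Gb/s per micrometre of chip edge or cross-section."""
    return wavelengths * gbps_per_wavelength / pitch_um


WAVELENGTHS = 128
GBPS_PER_WAVELENGTH = 10.0
waveguide_gbps = WAVELENGTHS * GBPS_PER_WAVELENGTH  # 1280 Gb/s per waveguide

# Off-chip: one waveguide per 100 um coupler pitch (from the table)
off_chip = bandwidth_density(WAVELENGTHS, GBPS_PER_WAVELENGTH, pitch_um=100.0)

# On-chip: pitch implied by the table's 160-320 Gb/s/um (inferred, not stated)
implied_pitch_um = [waveguide_gbps / density for density in (320.0, 160.0)]

print(f"off-chip density: {off_chip:.1f} Gb/s/um")
print(f"implied on-chip waveguide pitch: {implied_pitch_um[0]:.0f}-{implied_pitch_um[1]:.0f} um")
```

The off-chip result lands at the upper end of the 6-13 Gb/s/μm range in the table.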

But, need to keep links fully utilized
- Fixed and static energy increase at low link utilization!
[Figure: plot with axis "Energy [fJ/b]"]
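The effect described here can be sketched with a simple model: fixed power (for example laser and thermal tuning) is paid whether or not bits flow, so its per-bit cost scales inversely with utilization. The model and all numeric values below are illustrative assumptions, not data from the slide.

```python
def energy_per_bit_fj(dynamic_fj_per_bit, fixed_power_uw, link_gbps, utilization):
    """Effective energy per delivered bit, in fJ/b.

    Fixed power in uW divided by delivered rate in Gb/s is numerically fJ/b
    (1e-6 W / 1e9 b/s = 1e-15 J/b).
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return dynamic_fj_per_bit + fixed_power_uw / (link_gbps * utilization)


# Hypothetical link: 50 fJ/b dynamic, 500 uW fixed, 10 Gb/s line rate
for utilization in (1.0, 0.5, 0.1):
    e = energy_per_bit_fj(50.0, 500.0, 10.0, utilization)
    print(f"utilization {utilization:4.0%}: {e:6.1f} fJ/b")
```

In this toy model a link at 10% utilization costs several times more per bit than a fully used one, which is the motivation for aggregating traffic onto fewer, busier links.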

Core-to-Memory network: Electrical baseline
- Both cross-chip and I/O costly
[Figure: mesh network. Legend: C = Core, DM = DRAM Module; mesh router; router and access point]

Aggregation with Optical LMGS* network
* Local Meshes to Global Switches
- Shorten cross-chip electrical
- Photonics for both part of the cross-chip path and off-chip
[Figure legend: Ci = Core in Group i, DM = DRAM Module, S = Crossbar switch]

Photonic LMGS: Physical Mapping
- Network layout optimization significantly affects the component requirements
- 64-tile system w/ 16 groups, 16 Modules, 320 Gbps bi-di tile-module BW
[Joshi et al PICA 2009]

Photonic LMGS - U-shape
- 64-tile system w/ 16 groups, 16 Modules, 320 Gbps bi-di tile-module BW

Photonic LMGS - U-shape
- 64 tiles
- 64 waveguides (for tile throughput = 128 b/cyc)
- 256 modulators per group
- 256 ring filters per group
- Total rings > 16K
- 0.32 W (thermal tuning)
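A rough per-ring thermal tuning budget can be derived from these totals. The sketch below assumes 16 groups (from the earlier LMGS slides) and treats "> 16K" as exactly 16,384 rings, so the per-ring figure is an upper-bound estimate rather than a number from the slides. The itemized modulators and filters account for only part of the total; the remaining rings are not broken down on the slide.

```python
GROUPS = 16                     # from the "16 groups" configuration on earlier slides
MODULATORS_PER_GROUP = 256
FILTERS_PER_GROUP = 256
TOTAL_RINGS_ASSUMED = 16 * 1024  # assumption: "> 16K" taken as exactly 16,384
THERMAL_TUNING_W = 0.32

# Rings itemized on the slide
itemized_rings = GROUPS * (MODULATORS_PER_GROUP + FILTERS_PER_GROUP)

# Average tuning power per ring, in microwatts
per_ring_uw = THERMAL_TUNING_W / TOTAL_RINGS_ASSUMED * 1e6

print(f"itemized rings: {itemized_rings}")
print(f"average tuning budget: {per_ring_uw:.1f} uW per ring (upper bound)")
```

Under these assumptions the average budget is on the order of 20 μW per ring.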

Photonic device requirements in LMGS - U-shape
- Waveguide loss and through loss limits for 2 W optical laser power
[Figure: panels "Optical Laser Power" and "Die Area Overhead"; axes: through loss (dB/ring), waveguide loss (dB/cm)]

Photonic LMGS ring matrix vs u-shape
                                  | LMGS ring matrix          | LMGS u-shape
Power for thermal tuning circuits | 0.64 W                    | 0.32 W
Optical laser power               | 2 W                       | 2 W
Waveguide loss                    | < 0.2 dB/cm               | < 1.5 dB/cm
Through loss                      | < 0.002 dB/ring           | < 0.02 dB/ring
Source                            | [Batten et al Micro 2009] | [Joshi et al PICA 2009]
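The device limits in this comparison come from an optical loss budget: every dB of path loss multiplies the laser power needed to reach the receiver. The sketch below is a minimal model of that budget. The path length, number of rings passed, and receiver sensitivity are hypothetical values chosen only for illustration; the per-cm and per-ring losses are the u-shape limits from the slide.

```python
def path_loss_db(waveguide_cm, waveguide_db_per_cm, rings_passed, through_db_per_ring,
                 other_db=0.0):
    """Total optical path loss in dB (losses in dB add along the path)."""
    return waveguide_cm * waveguide_db_per_cm + rings_passed * through_db_per_ring + other_db


def laser_power_w(receiver_sensitivity_w, channels, loss_db):
    """Laser power needed so each channel still meets the receiver sensitivity."""
    return receiver_sensitivity_w * channels * 10 ** (loss_db / 10.0)


# Hypothetical path: 4 cm of waveguide, 256 off-resonance rings passed
loss = path_loss_db(waveguide_cm=4.0, waveguide_db_per_cm=1.5,
                    rings_passed=256, through_db_per_ring=0.02)
power = laser_power_w(receiver_sensitivity_w=10e-6, channels=1, loss_db=loss)

print(f"path loss: {loss:.2f} dB")
print(f"laser power per channel: {power * 1e6:.1f} uW")
```

Because laser power grows exponentially with loss in dB, a layout that passes fewer rings or uses shorter waveguides can tolerate much lossier devices for the same 2 W budget.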

Power-bandwidth tradeoff
[Figure: three panels: Electrical with grouping; Electrical with grouping and over-provisioning; Optical with grouping and over-provisioning. Legend: 1 group, OPF = 1; 4 group, OPF = 1; 16 group, OPF = 1; 1 group, OPF = 4; 4 group, OPF = 2; 16 group, OPF = 1. Annotations: 2-3x better; 8-10x better]

System Organization
- Defragmentation [Beamer et al ICS 2009]
- Example: 256 core node with 64 core dies

System Organization
- Die view: 64 core die supporting 256 core node

Electrical is also Limited
- Pin-bandwidth on the compute chip
- I/O energy to move between chips
- Cross-chip energy within chip
- Activation energy within chip

Solution: Silicon Photonics [Beamer et al ISCA 2010]
- Great bandwidth density
- Great off-chip energy efficiency
- Costs little additional energy to use on-chip after off-chip
- Enables page size reduction

Current Structure

Photonics to the Chip
[Figure panels: Electrical Baseline (E1); Photonics Off-Chip w/ Electrical On-Chip (P1)]

Photonics Into the Chip
[Figure panels: 2 Data Access Points per Column (P2); 8 Data Access Points per Column (P8)]

Reducing Activate Energy
- Want to activate fewer bits while achieving the same access width
- Increase number of I/Os per array core, which decreases page size
- Compensate the area hit by smaller photonic off-chip I/O
[Figure panels: Initial Design; Double the I/Os (and bandwidth)]
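The relationship between I/Os per array core and page size can be sketched as follows. For a fixed access width, more I/Os per array core means fewer array cores must be activated, and each activated array core opens one row. The access width and row size below are hypothetical values for illustration; only the 4 and 32 I/O configurations come from the slides.

```python
def page_bits(access_width_bits, ios_per_array_core, row_bits_per_array_core):
    """Bits activated per access: one row in every array core needed for the access."""
    if access_width_bits % ios_per_array_core != 0:
        raise ValueError("access width must be a multiple of I/Os per array core")
    array_cores_activated = access_width_bits // ios_per_array_core
    return array_cores_activated * row_bits_per_array_core


ACCESS_WIDTH_BITS = 512        # hypothetical access width
ROW_BITS_PER_ARRAY_CORE = 512  # hypothetical row length of one array core

for ios in (4, 32):
    bits = page_bits(ACCESS_WIDTH_BITS, ios, ROW_BITS_PER_ARRAY_CORE)
    print(f"{ios:2d} I/Os per array core: {bits} bits activated per access")
```

In this model, going from 4 to 32 I/Os per array core cuts the activated bits by 8x at the same access width, at the cost of extra I/O wiring area in each array core.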

Methodology
- Photonic Model: aggressive and conservative projections
- DRAM Model: heavily modified CACTI-D
- Custom C++ architectural simulator running random traffic to animate models
- Setup is configurable; in this presentation: 1 chip to obtain 1 GB capacity with >500 Gbps of bandwidth provided by 64 banks

Energy for On/Off-Chip Floorplan

Reducing Row Size
[Figure panels: 4 I/Os per Array Core; 32 I/Os per Array Core]

Latency Not a Big Win
- Latency marginally better
- Most of latency is within array core
- Since array core mostly unchanged, latency only slightly improved by reduced serialization latency

Area Neutral
[Figure panels: 4 I/Os per Array Core; 32 I/Os per Array Core]

Scaling Capacity
- Motivation: allow the system to increase capacity without increasing bandwidth
- Shared Photonic Bus (Vantrease et al., ISCA 2008)
- Disadvantage: high path loss (grows exponentially) due to couplers and waveguide

Split Photonic Bus
- Advantage: much lower path loss
- Disadvantage: all paths lit

Guided Photonic Bus
- Advantage: only 1 low loss path lit
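The trade-off between the shared, split, and guided buses can be illustrated with a toy laser-power model. It assumes that on a shared bus the light traverses every chip in series, that a split bus lights one path per chip, and that a guided bus lights a single path plus a small steering overhead. The per-chip loss and steering overhead are hypothetical values, and the model is a simplification rather than the authors' evaluation method.

```python
def shared_bus_factor(n_chips, per_chip_db):
    """Light passes through every chip in series: loss in dB grows linearly,
    so required laser power grows exponentially with chip count."""
    return 10 ** (n_chips * per_chip_db / 10.0)


def split_bus_factor(n_chips, per_chip_db):
    """Each chip gets its own path with one chip's loss, but all paths are lit."""
    return n_chips * 10 ** (per_chip_db / 10.0)


def guided_bus_factor(per_chip_db, guide_db):
    """Only the selected path is lit; guide_db is an assumed steering overhead."""
    return 10 ** ((per_chip_db + guide_db) / 10.0)


PER_CHIP_DB = 3.0  # hypothetical loss through one chip's couplers and waveguide
GUIDE_DB = 1.0     # hypothetical loss of the steering element

for n in (2, 4, 8):
    print(f"{n} chips: shared x{shared_bus_factor(n, PER_CHIP_DB):7.1f}, "
          f"split x{split_bus_factor(n, PER_CHIP_DB):5.1f}, "
          f"guided x{guided_bus_factor(PER_CHIP_DB, GUIDE_DB):4.1f}")
```

The factors are laser power relative to a lossless single path. In this model the guided bus stays flat as chips are added, the split bus grows linearly, and the shared bus grows exponentially.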

Scaling Results
Aggressive Photonic Device Specs

With Photonics...
- 10x memory bandwidth for same power
- Higher memory capacity without sacrificing bandwidth
- Area neutral
- Easily adapted to other storage technologies

Conclusion
- Computer interconnects are very complex microcommunication systems
- Cross-layer design approach is needed to solve the on-chip and off-chip interconnect problem
- Most important metrics
  - Bandwidth-density (Gb/s/μm)
  - Energy-efficiency (mW/Gb/s)
- Monolithic CMOS-photonics can improve the throughput by 10-20x
- But, need to be careful
  - Optimize network design (electrical switching, optical transport)
  - Use aggregation to increase link utilizations
  - Optimize physical mapping (layout) for low optical insertion loss

Backup Slides

Photonic Technology
- Monolithically integrated silicon photonics being researched by MIT Center for Integrated Photonic Systems (CIPS)
- Orcutt et al., CLEO 2008
- Holzwarth et al., CLEO 2008

Photonic Link
- Each wavelength can transmit at 10 Gbps
- Dense Wave Division Multiplexing (DWDM): 64 wavelengths per direction in same media
Rough Comparison              | Electrical | Photonic
Off-Chip I/O Energy (pJ/bit)  | 5          | 0.150
Off-Chip BW Density (Tbps/mm^2) | 1.5      | 50.000
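The link numbers on this slide combine into a per-direction bandwidth and two improvement ratios. The sketch below uses only the values given on the slide; the variable names are illustrative.

```python
WAVELENGTHS_PER_DIRECTION = 64
GBPS_PER_WAVELENGTH = 10.0

# (electrical, photonic) pairs from the rough comparison
IO_ENERGY_PJ_PER_BIT = (5.0, 0.150)
BW_DENSITY_TBPS_PER_MM2 = (1.5, 50.0)

link_gbps = WAVELENGTHS_PER_DIRECTION * GBPS_PER_WAVELENGTH
energy_ratio = IO_ENERGY_PJ_PER_BIT[0] / IO_ENERGY_PJ_PER_BIT[1]
density_ratio = BW_DENSITY_TBPS_PER_MM2[1] / BW_DENSITY_TBPS_PER_MM2[0]

print(f"{link_gbps:.0f} Gbps per direction on one medium")
print(f"energy per bit: {energy_ratio:.1f}x lower with photonics")
print(f"bandwidth density: {density_ratio:.1f}x higher with photonics")
```

Both ratios come out at roughly 33x, so the photonic link improves energy and density by a similar factor in this comparison.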

Resonant Rings
[Figure panels: light not resonant; resonant light; resonant light w/ drop path]
Figures inspired by [Vantrease, ISCA 08]

Ring Modulators
- Modulator uses charge injection to change resonant wavelength
- When resonant light passes it mostly gets trapped in ring
[Figure: resonant racetrack modulator, shown with modulator off and with modulator on]

Photonic Components

Why 5 pJ/b for Electrical?
- Prior work has claimed lower than our forecasted 5 pJ/b for off-chip electrical I/O
  - 2.24 pJ/b @ 6.25 Gbps (Palmer et al., ISSCC 2007)
  - 1.4 pJ/b @ 10 Gbps (O'Mahony et al., ISSCC 2010)
- Some important differences to consider:
  - We assume 20 Gbps per pin; otherwise will definitely be pin limited
  - At higher data rates it is hard to be as energy efficient: 8-13 pJ/b @ 16 Gbps (Lee et al., JSSC 2009)
  - DRAM process has slower transistors, leading to less energy efficient drivers
  - Background energy averaged in (clocking, fixed energy, not 100% utilization)

Control Distribution
- Control distributed from the center of the chip
- H-tree spreads out to banks
- Can power gate control lines to inactive banks
[Figure panels: Electrical Baseline & Control H-Tree (with Access Point); Photonic Floorplan showing Control]

Full Energy
[Figure panels: Conservative; Aggressive. Configurations: 64 Wavelengths, 4 I/Os; 64 Wavelengths, 32 I/Os; 8 Wavelengths, 32 I/Os]

Utilization
[Figure panels: Conservative; Aggressive. Configurations: 64 Wavelengths, 4 I/Os; 64 Wavelengths, 32 I/Os; 8 Wavelengths, 32 I/Os]

Full Area
[Figure configurations: 64 Wavelengths, 4 I/Os; 64 Wavelengths, 32 I/Os; 8 Wavelengths, 32 I/Os]

Full Scaling
[Figure panels: Aggressive; Conservative]