6th International Conference of Soft Computing and Pattern Recognition, August 11-14, 2014, Tunis, Tunisia. On-Chip Optical Interconnects: Prospects and Challenges. Abderazek Ben Abdallah, The University of Aizu, School of Computer Science and Engineering, Division of Computer Engineering, Adaptive Systems Laboratory, Aizu-Wakamatsu, Japan. E-mail:
Agenda: Motivation; Optical Interconnect Prospects; PHENIC Si-Photonics Network-on-Chip; Technology Challenges; Conclusion
High-performance computing today: Tianhe-2, #1 in Nov. 2013 [Photo: the switch backplane]. Features: 16,000 nodes, each with 2 Intel Xeon Ivy Bridge CPUs and 3 Xeon Phi coprocessors, for a combined total of 3,120,000 computing cores; 33.9 Pflops (about 3.4% of the 2020 exascale target); 17.8 MW (89% of the 20 MW power limit).
A closer look at a computing system [Figure: CPU and GPU connected off-chip through PCI Express (48/MB/s), with an on-chip bottleneck; energy labels 2 nJ/Inst and 200 pJ/Inst]. Can exascale be reached within 20 MW? That requires 20 pJ per instruction! This target is far out of reach with today's machines.
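The 20 pJ figure follows from dividing the power budget by the operation rate. The sketch below reproduces that arithmetic and applies the same formula to the Tianhe-2 numbers quoted on the previous slide. It assumes one instruction corresponds to one floating-point operation and gives a whole-system average, so it is not directly comparable to the per-device labels in the figure; the function name is hypothetical.

```python
def energy_per_op_pj(power_watts: float, ops_per_second: float) -> float:
    """Average energy available per operation, in picojoules.

    Assumption: one instruction is counted as one floating-point operation.
    """
    return power_watts / ops_per_second * 1e12


# Exascale target: 1e18 operations per second within a 20 MW budget.
exascale_budget_pj = energy_per_op_pj(20e6, 1e18)

# Tianhe-2 (Nov. 2013): 33.9 Pflops at 17.8 MW, system-level average.
tianhe2_pj = energy_per_op_pj(17.8e6, 33.9e15)

print(f"Exascale budget: {exascale_budget_pj:.1f} pJ per operation")
print(f"Tianhe-2 average: {tianhe2_pj:.0f} pJ per operation")
print(f"Required improvement: {tianhe2_pj / exascale_budget_pj:.0f}x")
```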
Communications cost [Figure: energy cost of data movement relative to the cost of a flop for current and 2018 systems (Shalf et al., VECPAR 2010)]. Challenges: preparing the operands costs more than computing on them! There is no Moore's law for communications.
Gate vs. interconnect delays [Figure. Source: IDEAL Research]
The idea: replace wires with waveguides and electrons with photons! (Photo: Spectrum 2005, Paniccia)
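Milestones [Figure: interconnect distance scale from 1 mm to 1 Mm (1000 km), covering on-chip, chip-to-chip, board-to-board, rack-to-rack, LAN, and long-haul links. Optical wires/waveguides serve the short end of the scale and optical cables/fibers the long end; the Si Photonics target area is marked on the scale.]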
A typical architecture today [Figure: multicore processor (CMP) connected to DDR3 DRAM over a copper link, 8.5 GB/s at 30 mW/Gb/s]. Big cores for single-thread performance; small cores for multithread performance (accelerating multi- and many-core). The copper link consumes a large amount of power, so an alternative approach is needed.
Photonics in computing system today [Figure: multicore processor (CMP) and DRAM, each with a receiver/transmitter, connected by an optical link; transmission over fiber]. Uses monolithic integration, which reduces energy consumption. Utilizes the standard bulk CMOS flow. Cladding is used to maintain total internal reflection, which reduces data loss.
Photonics in computing system today [Figure: multicore processor (CMP) and DRAM, each with a receiver/transmitter, connected by a WDM/DWDM fiber link carrying wavelengths λ1, λ2, λ3, ..., λn; >1 TB/s at <1 mW/Gb/s per channel]. Supports WDM, which improves bandwidth density. DWDM can transport tens to hundreds of wavelengths per fiber. Work on an integrated Tb/s optical link on a single chip is ongoing.
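To put the two link figures side by side, the sketch below multiplies link bandwidth by energy cost per Gb/s. It uses the copper numbers from the earlier architecture slide (8.5 GB/s, 30 mW/Gb/s) and treats the optical figure of 1 mW/Gb/s as an upper bound. Reading "GBpS" as gigabytes per second is an assumption, and the function name is hypothetical.

```python
def link_power_mw(bandwidth_gbps: float, cost_mw_per_gbps: float) -> float:
    """Link power in milliwatts for a given bandwidth (Gb/s) and cost (mW per Gb/s)."""
    return bandwidth_gbps * cost_mw_per_gbps


# Copper DDR3 link: 8.5 GB/s is 68 Gb/s (assumes bytes, 8 bits each).
copper_bandwidth_gbps = 8.5 * 8
copper_mw = link_power_mw(copper_bandwidth_gbps, 30.0)

# Optical link at the same bandwidth, using 1 mW/Gb/s as an upper bound.
optical_upper_bound_mw = link_power_mw(copper_bandwidth_gbps, 1.0)

print(f"Copper link:  {copper_mw:.0f} mW")
print(f"Optical link: under {optical_upper_bound_mw:.0f} mW at the same bandwidth")
```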
(Si) Photonics benefits over electronics: low operating costs; possibility to integrate more optical functionalities in a single component; low manufacturing cost; low power consumption; high integration; high reliability; low heating of components; higher density of interconnects.
Doubling the data rate every 2 years [Figure: data rate (Gb/s) over time]
Intel 50 Gb/s WDM link (A. Alduino et al., IPR 2010): 12.5 Gb/s x 4 channels = 50 Gb/s (Intel Lab.). Source: Semiconductor Today, Compounds & Advanced Silicon, Vol. 5, Issue 6, July/August 2010.
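The aggregate rate of a WDM link is the per-wavelength rate times the number of wavelengths. The short sketch below checks the 4 x 12.5 Gb/s figure, applies the same rule to the 32 x 40 Gb/s nanowire result quoted later in the deck, and estimates how many 12.5 Gb/s channels a 1 Tb/s link would need. The function names are hypothetical, and the estimate ignores coding and protocol overhead.

```python
import math


def wdm_aggregate_gbps(channels: int, rate_gbps: float) -> float:
    """Aggregate WDM link rate: number of wavelengths times per-wavelength rate."""
    return channels * rate_gbps


def channels_needed(target_gbps: float, rate_gbps: float) -> int:
    """Smallest number of wavelengths that reaches the target aggregate rate."""
    return math.ceil(target_gbps / rate_gbps)


intel_link_gbps = wdm_aggregate_gbps(4, 12.5)    # Intel 50 Gb/s link
nanowire_gbps = wdm_aggregate_gbps(32, 40.0)     # 5 cm SOI nanowire result
channels_for_1tbps = channels_needed(1000.0, 12.5)

print(intel_link_gbps, nanowire_gbps, channels_for_1tbps)
```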
Si-Photonics in computing system today: Si-Photonics interposer. Optical I/Os for chip-to-chip and chip-to-board links (IBM, Intel, Fujitsu). E-O-E transceivers for the opto-silicon interposer.
Channel technology. Silicon waveguide: used on-chip; moderate loss, crossover issues. Free space: uses air; a set of micro-mirrors and micro-lenses guides the light around; on-chip use. Hollow metal waveguide: used for slightly longer distances, at the board level; low loss, ease of fabrication. Fiber optic cable: off-chip interconnect.
Si-Photonics building blocks [Figure: laser source (input), modulator, resonator, and photodetectors along a waveguide; labels N+, P+, Vm]. Main components. Laser source: injects the required laser light into the waveguide. Modulators: modulate the laser light into 0 and 1 states. Photodetectors: detect the laser light and convert it to an electrical signal. Turn resonators: control the routing direction of the laser light.
Si-Photonics building blocks: laser sources. Raman silicon laser: based on stimulated Raman scattering (SRS); a reverse-biased p-i-n diode eliminates the free-carrier absorption (FCA) induced by two-photon absorption (TPA). On-chip: Vertical Cavity Surface Emitting Laser (VCSEL). One of the largest-volume (and hence cheapest) lasers currently in use. Is often integrated on-chip. Enables direct modulation (the laser is turned ON/OFF directly in accordance with the data being transmitted). Not fully CMOS compatible. Does not support DWDM.
Si-Photonics building blocks: Si wire/waveguide [Figure: 5 cm SOI nanowire carrying 1.28 Tb/s (32 λ x 40 Gb/s), IBM/Columbia]. Platforms: germanium on SOI; silicon on insulator (to 3.6 μm); silicon on sapphire (to 5.6 μm); silicon on nitride (to 6.7 μm). Silicon is transparent above 1100 nm. Nearly all optical data links operate in the near-infrared wavelength range between 800 nm and 1600 nm. We operate at 1310 nm (industry standard). SOI wafers cost about 10 times as much as conventional wafers.
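Transmission over Si Wire/Waveguide. Snell's law of refraction:
$$\frac{\sin\theta_1}{\sin\theta_2} = \frac{n_2}{n_1} = \frac{v_1}{v_2}$$
[Figure: incident, reflected, and refracted rays at the interface between media of refractive index n_1 and n_2, with incidence angle θ_1 and refraction angle θ_2, drawn for two cases of n_2 relative to n_1]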
Total internal reflection in Si Wire/Waveguide [Figure: incident, reflected, and refracted rays at an interface with n_2 < n_1]. Let $\theta_2 = \pi/2$. Then
$$\sin\theta_c = \frac{n_2}{n_1}, \qquad \theta_c = \sin^{-1}\!\left(\frac{n_2}{n_1}\right)$$
For $\theta_1 > \theta_c$, the light ray is completely reflected: total internal reflection.
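As a worked instance of the critical-angle formula, the sketch below evaluates θ_c for a silicon core with silica cladding. The index values (3.5 for Si and 1.45 for SiO2 near 1310 nm) are typical textbook figures assumed here for illustration; they do not come from the slides, and the function names are hypothetical.

```python
import math


def critical_angle_deg(n_core: float, n_cladding: float) -> float:
    """Critical angle in degrees: theta_c = asin(n_cladding / n_core)."""
    if n_cladding >= n_core:
        raise ValueError("total internal reflection requires n_cladding < n_core")
    return math.degrees(math.asin(n_cladding / n_core))


def is_totally_reflected(theta_deg: float, n_core: float, n_cladding: float) -> bool:
    """True when the incidence angle exceeds the critical angle."""
    return theta_deg > critical_angle_deg(n_core, n_cladding)


# Assumed indices near 1310 nm (illustrative values, not from the slides).
N_SILICON = 3.5
N_SILICA = 1.45

theta_c = critical_angle_deg(N_SILICON, N_SILICA)
print(f"Critical angle: {theta_c:.1f} degrees")
print(is_totally_reflected(45.0, N_SILICON, N_SILICA))
```

The large index contrast gives a small critical angle, which is why light stays confined over a wide range of incidence angles.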
Total internal reflection in Si Wire/Waveguide [Figure: a core of index n_core surrounded by cladding of index n_cladding, with n_cladding < n_core; incident, reflected, and refracted rays at the core/cladding interface. Image from Wikipedia.] Total internal reflection keeps all optical energy within the core, even if the fiber bends.
Si-Photonics building blocks: modulator [Figure: Mach-Zehnder Interferometer (MZI). Source: Intel Lab.]. Enables high-speed conversion from electrical (E) to optical (O) signals. Encodes data on a single wavelength channel that is combined with other signals through WDM. Microrings (MRs) are used for modulation due to their high modulation speed (10 to 20 Gb/s), low power (47 fJ/bit), and small footprint (on the order of µm²).
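The energy-per-bit figure translates into modulator power once a bit rate is fixed. A minimal sketch, assuming the 47 fJ/bit value holds across the quoted 10 to 20 Gb/s range (the slide does not state this) and using a hypothetical function name:

```python
def modulator_power_mw(energy_fj_per_bit: float, rate_gbps: float) -> float:
    """Modulator power in milliwatts: energy per bit times bit rate."""
    joules_per_bit = energy_fj_per_bit * 1e-15
    bits_per_second = rate_gbps * 1e9
    return joules_per_bit * bits_per_second * 1e3


power_at_10g = modulator_power_mw(47.0, 10.0)
power_at_20g = modulator_power_mw(47.0, 20.0)

print(f"{power_at_10g:.2f} mW at 10 Gb/s, {power_at_20g:.2f} mW at 20 Gb/s")
```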
Si-Photonics building blocks: photodetectors. The same microring used for modulation can serve as a wavelength-selective filter (photodetector) that extracts light from the waveguide, provided the microring is doped with a photo-detecting material such as CMOS-compatible germanium. The resonant light is absorbed by the germanium and converted into an electrical signal.
What is needed for on-chip Si-Photonics interconnects? There is still a problem of scaling!
Processor scaling to many-core: processors are trending toward many-core architectures with a growing number of cores, which require an increasingly efficient and low-power communications infrastructure to achieve the desired level of bandwidth and connectivity. Si-photonic NoCs provide an effective solution to the power and bandwidth limitations of the existing E-NoCs used within CMPs.
Bandwidth, pin count and power scaling [Figure: assumes 1 Byte/Flop and 8 Flops/core @ 5 GHz]
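The figure's assumptions imply a per-core bandwidth demand that grows linearly with core count. The sketch below reads "8 Flops/core @ 5GHz" as 8 floating-point operations per cycle per core, which is an interpretation rather than something the slide states. The 64-core count is a hypothetical example.

```python
def core_bandwidth_gbytes(bytes_per_flop: float, flops_per_cycle: float,
                          clock_ghz: float) -> float:
    """Memory bandwidth one core needs, in GB/s, to sustain its peak flop rate."""
    return bytes_per_flop * flops_per_cycle * clock_ghz


def chip_bandwidth_tbytes(cores: int, per_core_gbytes: float) -> float:
    """Total bandwidth demand in TB/s for a chip with the given number of cores."""
    return cores * per_core_gbytes / 1000.0


per_core = core_bandwidth_gbytes(1.0, 8.0, 5.0)   # 1 Byte/Flop, 8 flops/cycle, 5 GHz
chip_total = chip_bandwidth_tbytes(64, per_core)  # hypothetical 64-core chip

print(f"{per_core:.0f} GB/s per core, {chip_total:.2f} TB/s for 64 cores")
```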
What is needed for on-chip Si-Photonics interconnects?
Critical specs: size; bandwidth; power consumption; switching speed; insertion loss; differential loss; crosstalk.
Si Photonics on-chip communication [Figure: clusters of cores (C) with shared caches (Shared $), connected through a switch (X) driven by a switch controller]
Merit #1: High bandwidth. Can scale easily via WDM/DWDM (electronics scale only via bus width).
Merit #2: Low power consumption.
Merit #3: High switching speed. The goal is not to communicate as fast as possible, but as fast as needed, depending on the application (normal or burst traffic). For reference, the speed of light in vacuum is 299,792 km/s.
Landscape of SiP on-chip networks (PNoC) [Figure: example topologies (mesh, crossbar, Clos) from [Shacham 07], [Petracca 08], [Joshi 09a], [Pan 09]; see also [1-21]]
The basic PNoC building block: 2x2 switch [Figure: ports in1, in2, out1, out2, shown in the BAR state and in the CROSS state]. BAR state: data passes straight through. CROSS state: data passes to the opposite port. Typical wavelength range: 1260 to 1360 nm or 1510 to 1610 nm (mechanical switch). Problems: lack of bit-level processing in the optical domain; lack of efficient buffering in the optical domain.
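The BAR and CROSS behavior can be written down as a two-state port mapping. This is a minimal behavioral sketch with hypothetical names. It models only which output a signal leaves on, not the optical physics or the switching control.

```python
from enum import Enum


class SwitchState(Enum):
    BAR = "bar"      # in1 -> out1, in2 -> out2
    CROSS = "cross"  # in1 -> out2, in2 -> out1


def output_port(state: SwitchState, in_port: int) -> int:
    """Return the output port (1 or 2) reached from the given input port (1 or 2)."""
    if in_port not in (1, 2):
        raise ValueError("a 2x2 switch has input ports 1 and 2 only")
    if state is SwitchState.BAR:
        return in_port
    return 3 - in_port  # CROSS: swap 1 and 2


for state in SwitchState:
    for port in (1, 2):
        print(f"{state.name}: in{port} -> out{output_port(state, port)}")
```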
The basic PNoC building block: simply cascading 2x2 switches is not efficient and increases losses.
PHENIC: Hybrid Si-Photonic NoC [Figure: stacked layers connected by vias, via size < ~2 μm]. Benefits: higher integration; shorter interconnect (important for short message mode); heterogeneous integration; reliability. Modes: short message mode and large/burst mode.
Routing in Hybrid Si-Photonic NoC [Figure: network with source S and destination D]
Routing in Hybrid Si-Photonic NoC [Figure: path from source S to destination D]. 1. Reserve the path. 2. ACK. 3. Transmit data on the photonic layer. 4. Release (tear-down).
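The four steps amount to circuit switching: the path is set up first, the data then crosses the photonic layer without buffering, and the path is torn down afterwards. The sketch below is an illustrative model only, not the PHENIC implementation. The class, the link-reservation table, and the node names are all hypothetical.

```python
from typing import Optional


class PhotonicLayer:
    """Toy model of path reservation on a photonic layer (hypothetical design)."""

    def __init__(self) -> None:
        self._reserved = {}  # maps (from_node, to_node) link -> owning message id

    def reserve(self, owner: str, path: list) -> bool:
        """Steps 1 and 2: reserve every link on the path; True stands for the ACK."""
        links = list(zip(path, path[1:]))
        if any(link in self._reserved for link in links):
            return False  # some link is busy, so no ACK is returned
        for link in links:
            self._reserved[link] = owner
        return True

    def release(self, owner: str) -> None:
        """Step 4: tear down all links held by this message."""
        self._reserved = {l: o for l, o in self._reserved.items() if o != owner}


def send(layer: PhotonicLayer, owner: str, path: list, payload: bytes) -> Optional[bytes]:
    """Run reserve, ACK, transmit, release; return the delivered payload or None."""
    if not layer.reserve(owner, path):
        return None
    try:
        delivered = payload  # step 3: data crosses the reserved path unbuffered
    finally:
        layer.release(owner)
    return delivered


layer = PhotonicLayer()
print(send(layer, "msg-0", ["S", "R1", "R2", "D"], b"burst data"))
```

Reserving the whole path up front reflects the constraint stated elsewhere in the deck that optical data cannot be buffered along the way.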
Electrical router and control [Figure: OASIS-RV1 chip layout (45 nm CMOS process, 222.387 µW, 557 pins)]. Major tasks: photonic route computation (path setting); route computation for short messages on the electronic layer (network); other control tasks for the photonic switch on the photonic layer (network).
Photonic wavelength switch. Major tasks: photonic data transmission. Optical data cannot be stored (no optical buffers!). No computation is performed.
Bandwidth, power and latency
Agenda: Motivation; Optical Interconnect Prospects; Case Study: PHENIC Si-Photonics Network-on-Chip; Technology Challenges; Conclusion
Electronics integration. Intel 4004 (1971): 2300 transistors, thousands of multiplications per second, 10 μm PMOS. Intel Core i7 (2011): a billion transistors, billions of multiplications per second, 32 nm CMOS.
Photonics integration [Figures: 1st semiconductor laser (~1962); CMOS sensor array; Luxtera's photograph of a CMOS 4x10 Gb/s WDM die (2007); Intel's 50 Gb/s (4x12.5 Gb/s) transceiver (2012)]. Challenges: single-channel transmitter; wafer-scale fabrication is difficult; Si does not support some functions; improvement of cost, space, power, and reliability is needed.
E-O-E Transceivers (Tx/Rx): multi-chip option (multilayer). Challenges: single photonics platform (wafer-scale fabrication); efficient E/O and O/E conversion; CMOS-driven components.
E-O-E Transceivers (Tx/Rx): Si-Photonics option [Figure: lasers, modulators, MUX, PLC, detectors, optical I/Os]. Features: small photonics component footprint; CMOS-compatible fabrication processes; 3D connectivity to CMOS wafers for improved O-E performance.
Compact on-chip optical wires/waveguides. Requirements: performance -> loss ~1 dB/cm; high density -> bending radius ~1 μm. Challenges: meet the low-loss target despite Si sidewall imperfections; realize efficient I/O (fiber) coupling despite the large mode mismatch.
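The loss requirement becomes concrete when converted from decibels to the fraction of optical power that survives a given waveguide length. A minimal sketch, assuming propagation loss is the only loss mechanism (bends, crossings, and coupling are ignored). The 2 cm length is a hypothetical example, not a figure from the slides.

```python
def propagation_loss_db(loss_db_per_cm: float, length_cm: float) -> float:
    """Total propagation loss in dB for a waveguide of the given length."""
    return loss_db_per_cm * length_cm


def remaining_power_fraction(loss_db: float) -> float:
    """Fraction of optical power left after a loss in dB: 10^(-dB/10)."""
    return 10.0 ** (-loss_db / 10.0)


# Target loss from the requirement above: about 1 dB/cm.
loss_2cm_db = propagation_loss_db(1.0, 2.0)  # hypothetical 2 cm on-chip route
fraction_2cm = remaining_power_fraction(loss_2cm_db)

print(f"{loss_2cm_db:.1f} dB over 2 cm leaves {fraction_2cm:.0%} of the power")
```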
Reliability Challenges & Vision [Figure: layered view of reliability techniques]. Macro solutions (architecture techniques): redundant active/passive components (cores, routers, etc.). Micro solutions (circuit techniques): PBC, ECC, comp. param. reconfiguration. Process techniques: cell creation, state-of-the-art processes. Moore's law: the bit count increases exponentially, 2x every 2 years. Transient, intermittent, and permanent errors/faults are reliability challenges.
Concluding remarks. Computer system interconnects are very complex micro-communication components. Most important metrics: bandwidth density and energy efficiency. A Si-Photonics design approach can improve system throughput by 15-20x. Many issues should be carefully handled: optimize network design (electrical switching, optical transport); optimize physical mapping (layout) for low optical insertion loss.
References
1. Achraf Ben Ahmed, A. Ben Abdallah, PHENIC: Towards Photonic 3D-Network-on-Chip Architecture for High-throughput Many-core Systems-on-Chip, IEEE Proceedings of the 14th International Conference on Sciences and Techniques of Automatic Control and Computer Engineering (STA'2013), Dec. 2013.
2. A. Ben Abdallah, PHENIC: Silicon Photonic 3D-Network-on-Chip Architecture for High-performance Heterogeneous Many-core System-on-Chip, Technical Report, Ref. PTR0901A0715-2013, September 1, 2013.
3. OASIS 3D-Router Hardware Physical Design, Technical Report, Adaptive Systems Laboratory, Division of Computer Engineering, University of Aizu, July 8, 2014.
4. Akram Ben Ahmed, A. Ben Abdallah, Graceful Deadlock-Free Fault-Tolerant Routing Algorithm for 3D Network-on-Chip Architectures, Journal of Parallel and Distributed Computing, 2014.
5. Akram Ben Ahmed, Achraf Ben Ahmed, A. Ben Abdallah, Deadlock-Recovery Support for Fault-tolerant Routing Algorithms in 3D-NoC Architectures, IEEE Proceedings of the 7th International Symposium on Embedded Multicore/Many-core SoCs (MCSoC-13), pp., 2013.
6. Akram Ben Ahmed, A. Ben Abdallah, Architecture and Design of High-throughput, Low-latency and Fault Tolerant Routing Algorithm for 3D-Network-on-Chip, The Journal of Supercomputing, December 2013, Volume 66, Issue 3, pp. 1507-1532.
7. Akram Ben Ahmed, T. Ouchi, S. Miura, A. Ben Abdallah, Run-Time Monitoring Mechanism for Efficient Design of Application-specific NoC Architectures in Multi/Manycore Era, IEEE Proc. of the 6th International Workshop on Engineering Parallel and Multicore Systems (epamus2013), July 2013.
8. Akram Ben Ahmed, T. Ouchi, S. Miura, A. Ben Abdallah, Run-Time Monitoring Mechanism for Efficient Design of Application-specific NoC Architectures in Multi/Manycore Era, Proc. IEEE 6th International Workshop on Engineering Parallel and Multicore Systems (epamus2013), July 2013.
9. Akram Ben Ahmed, A. Ben Abdallah, Low-overhead Routing Algorithm for 3D Network-on-Chip, IEEE Proc. of the Third International Conference on Networking and Computing (ICNC'12), pp. 23-32, 2012.
10. Akram Ben Ahmed, A. Ben Abdallah, LA-XYZ: Low Latency, High Throughput Look-Ahead Routing Algorithm for 3D Network-on-Chip (3D-NoC) Architecture, IEEE Proceedings of the 6th International Symposium on Embedded Multicore SoCs (MCSoC-12), pp. 167-174, 2012.
11. Akram Ben Ahmed, A. Ben Abdallah, ONoC-SPL Customized Network-on-Chip (NoC) Architecture and Prototyping for Data-intensive Computation Applications, IEEE Proceedings of the 4th International Conference on Awareness Science and Technology, pp. 257-262, 2012.
12. Kenichi Mori, A. Ben Abdallah, OASIS Network-on-Chip Prototyping on FPGA, Master's Thesis, The University of Aizu, Feb. 2012.
13. Ben Ahmed Akram, A. Ben Abdallah, On the Design of a 3D Network-on-Chip for Many-core SoC, Master's Thesis, The University of Aizu, Feb. 2012.
14. Shohei Miura, A. Ben Abdallah, Design of Parametrizable Network-on-Chip, Master's Thesis, The University of Aizu, Feb. 2012.
15. Ryuya Okada, A. Ben Abdallah, Architecture and Design of Core Network Interface for Distributed Routing in OASIS NoC, Graduation Thesis, The University of Aizu, Feb. 2012.
16. A. Ben Ahmed, A. Ben Abdallah, K. Kuroda, Architecture and Design of Efficient 3D Network-on-Chip (3D NoC) for Custom Multicore SoC, IEEE Proc. of the 5th International Conference on Broadband, Wireless Computing, Communication and Applications (BWCCA-2010), pp. 67-73, Nov. 2010. (Best paper award)
17. Kenichi Mori, A. Ben Abdallah, OASIS Network-on-Chip Prototyping on FPGA, Master's Thesis, Graduate School of Computer Science and Engineering, The University of Aizu, Feb. 2012.
18. K. Mori, A. Esch, A. Ben Abdallah, K. Kuroda, Advanced Design Issues for OASIS Network-on-Chip Architecture, IEEE Proc. of the 5th International Conference on Broadband, Wireless Computing, Communication and Applications (BWCCA-2010), pp. 74-79, Nov. 2010.
19. T. Uesaka, OASIS NoC Topology Optimization with Short-Path Link, Technical Report, Systems Architecture Group, March 2011.
20. K. Mori, A. Ben Abdallah, OASIS NoC Architecture Design in Verilog HDL, Technical Report, TR-062010-OASIS, Adaptive Systems Laboratory, The University of Aizu, June 2010.
21. Shohei Miura, Abderazek Ben Abdallah, Kenichi Kuroda, PNoC: Design and Preliminary Evaluation of a Parameterizable NoC for MCSoC Generation and Design Space Exploration, The 19th Intelligent System Symposium (FAN 2009), pp. 314-317, Sep. 2009.
22. Kenichi Mori, Abderazek Ben Abdallah, Kenichi Kuroda, Design and Evaluation of a Complexity Effective Network-on-Chip Architecture on FPGA, The 19th Intelligent System Symposium (FAN 2009), pp. 318-321, Sep. 2009.
23. A. Ben Abdallah, T. Yoshinaga and M. Sowa, Mathematical Model for Multiobjective Synthesis of NoC Architectures, IEEE Proc. of the 36th International Conference on Parallel Processing, Sept. 4-8, 2007.
References (book): Multicore Systems-on-Chip: Practical Hardware/Software Design Issues, hardcover, August 6, 2010.
University of Aizu
Thank you.