PROBE: Prediction-based Optical Bandwidth Scaling for Energy-efficient NoCs

Li Zhou and Avinash Kodi
Technologies for Emerging Computer Architecture Laboratory (TEAL), School of Electrical Engineering and Computer Science, Ohio University, Athens, OH, USA
7th International Symposium on Networks-on-Chip (NOCS), April 21-24, 2013
Contact website: http://oucsace.cs.ohiou.edu/~avinashk/

Multicores & Networks-on-Chip
- TILE-Gx72 [1], 80-core Intel TeraFlops [2], 2880-core KEPLER (Nvidia) [3]
- With an increasing number of cores, a communication-centric design paradigm (Networks-on-Chip) is becoming important.
- Energy for communication is increasing.
- Delivered throughput is decreasing.
[1] http://www.tilera.com/products/processors/tile-gx_family
[2] http://www.intel.com/pressroom/kits/teraflops/
[3] http://www.nvidia.com/object/nvidia-kepler.html
NOCS-13 TEAL

Energy Discrepancy & Throughput
- The energy discrepancy between computation and global communication grows with technology scaling, so global communication energy must be reduced.
[Figure: relative compute energy vs. interconnect energy across technology nodes (45, 32, 22, 14, 10, 7 nm). Source: Shekhar Borkar, Intel]
- Throughput is reduced by aggressive voltage and clock scaling.
[Figure: on-die energy split between interconnect and compute; tile power of the Intel Tera-Flops chip (65 nm) [1]]
- Scalable bandwidth must be provided without sacrificing performance.
- Potential solutions: nanophotonics, wireless, 3D stacking.
[1] Y. Hoskote, "A 5-GHz Mesh Interconnect for a Teraflops Processor," IEEE Computer Society, 2007, pp. 51-61.

Why Photonics?
Photonics provides:
- Low energy (7.9 fJ/bit)
- Small footprint (~2.5 μm)
- High bandwidth (~40 Gbps)
- Low latency (10.45 ps/mm)
- CMOS compatibility
[1] L. Xu, W. Zhang, Q. Li, J. Chan, H. L. R. Lira, M. Lipson, K. Bergman, "40-Gb/s DPSK Data Transmission Through a Silicon Microring Switch," IEEE Photonics Technology Letters 24.
[2] S. Manipatruni, K. Preston, L. Chen, and M. Lipson, "Ultra-low voltage, ultra-small mode volume silicon microring modulator," Opt. Express 18, 18235-18242 (2010).

Nanophotonic Link
[Figure: an off-chip laser feeds a four-wavelength (λ1 to λ4) link from Core A to Core B; the transmitter side (Tx) uses micro-ring resonators with a driver and buffer chain from the electronics, and the receiver side (Rx) uses micro-ring resonators with a photodetector, TIA and limiting amplifier]
- Laser power: compensates for the various light losses along the optical path.
- Trimming power: microring resonators are sensitive to temperature variations and require additional trimming power to maintain their resonant wavelength.
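To make the laser-power point concrete, here is a minimal link power-budget sketch. Because losses in dB add along the path, the laser must emit at least the photodetector sensitivity plus the total loss. Every loss value and the sensitivity below are hypothetical placeholders chosen for illustration, not numbers from PROBE, and trimming power is not modeled.

```python
# Hypothetical loss budget for one wavelength on one link (dB); placeholders, not PROBE's numbers.
LOSSES_DB = {
    "coupler": 1.0,
    "waveguide": 2.0,
    "splitter": 0.5,
    "modulator insertion": 1.0,
    "ring through loss": 0.6,
    "filter drop": 1.5,
}
PD_SENSITIVITY_DBM = -20.0  # hypothetical photodetector sensitivity


def required_laser_power_dbm(sensitivity_dbm, losses_db):
    """Losses in dB add along the path, so the laser must emit sensitivity + total loss."""
    return sensitivity_dbm + sum(losses_db.values())


def dbm_to_mw(p_dbm):
    """Convert optical power from dBm to mW."""
    return 10 ** (p_dbm / 10)


p_dbm = required_laser_power_dbm(PD_SENSITIVITY_DBM, LOSSES_DB)
print(f"total loss {sum(LOSSES_DB.values()):.1f} dB -> laser {p_dbm:.1f} dBm ({dbm_to_mw(p_dbm):.3f} mW)")
```

Every extra dB of loss on the path raises the required laser output by the same dB, which is why the laser is a dominant static cost.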

Power Breakdown: The Static Power Challenge
[Figure: power breakdown (laser, trimming power, others: routing, O/E and E/O conversion) for Radix-32 SWMR, Corona and FlexiShare; laser and trimming power together account for more than 60% of the total power budget]
- The off-chip laser source and the trimming power of the on-chip microring resonators represent the majority of network power.

PROBE: Targeting the Static Power (Preview)
Key goal
- Save significant static optical power while meeting performance constraints.
Hardware mechanisms
- Tunable splitters -> adaptive channels
- Binary-tree-based waveguide
- Global and local bandwidth controllers
Approach
- Traffic load prediction
- Dynamic bandwidth scaling in the background
- Three pre-defined bandwidth modes
Main results
- Static power savings of more than 60%, with % penalty on throughput and 20% on execution time.

Outline
- Introduction & Motivation
- PROBE Architecture & Implementation
- Traffic Prediction
- Dynamic Bandwidth Scaling
- Performance Analysis
- Conclusions & Future Work

Architecture (1/2)
[Figure: 16-tile layout (tiles 0 to 15) with routers R0 to R15 and lasers L0 to L15; legend: R: router, L: laser, plus a symbol for the voltage regulator]

Splitter
- Key component: essential for signal distribution in optical networks; splits a signal from a single waveguide into a large number of waveguides.
- Passive splitter [1]: fixed power ratio; power inefficient.
- Tunable splitter [2]: tunable power ratio and more flexibility; tuning range 0~99%; tuning speed 6 ns; power loss 0.2~0.8 dB; CMOS compatible (0.9 V, 5~40 μm).
[Figure: passive splitter distributing one input to Dest. 1 to Dest. 4 [1]]
[1] M. Olivero and M. Svalgaard, "UV-written Integrated Optical 1xN Splitter," Optics Express, Vol. 14, Issue 1, pp. 162-170 (2006).
[2] R. Thapliya, T. Kikuchi, and S. Nakamura, "Tunable Power Splitter Based on an Electro-optic Multimode Interference Device," Journal of Applied Optics, vol. 46, no. 9, 2007.

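Channel Design - Prototype [1]
[Figure: an optical signal passes three splitters in series ($\alpha_1 = 1/4$, $\alpha_2 = 1/3$, $\alpha_3 = 1/2$) feeding Branches 1 to 4; the power remaining after splitters 1 and 2 is $(1-\alpha_1)(1-e_1)$ and $(1-\alpha_1)(1-\alpha_2)(1-e_1)(1-e_2)$]
α: power ratio; e: the access optical power loss; β: power portion in that branch.
$$
\begin{aligned}
\beta_1 &= \alpha_1 (1-e_1) \\
\beta_2 &= \alpha_2 (1-\alpha_1)(1-e_1)(1-e_2) \\
\beta_3 &= \alpha_3 (1-\alpha_1)(1-\alpha_2)(1-e_1)(1-e_2)(1-e_3) \\
\beta_4 &= (1-\alpha_1)(1-\alpha_2)(1-\alpha_3)(1-e_1)(1-e_2)(1-e_3)
\end{aligned}
$$
with $e_1 = e_2 = e_3$.
[1] B. Z. Fu, Y. H. Han, H. W. Li, and X. W. Li, "Accelerating Lightpath Setup via Broadcasting in Binary-Tree Waveguide in Optical NoCs," in Proceedings of the Conference on Design, Automation and Test in Europe (DATE), pp. 933-936, 2010.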
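The branch-power equations can be checked numerically. The sketch below assumes the same access loss e at every splitter, as on the slide, walks the splitter chain, and returns β1 to β4; `branch_fractions` is a hypothetical helper name.

```python
from fractions import Fraction


def branch_fractions(alphas, e):
    """Fraction of the input optical power delivered to each branch of a splitter chain.

    alphas[i] is the power ratio tapped at splitter i; e is the access loss at every
    splitter (e1 = e2 = e3 = e). The last branch gets what is left after the final splitter.
    """
    betas = []
    remaining = 1  # power still travelling down the waveguide
    for a in alphas:
        remaining *= (1 - e)           # access loss at this splitter
        betas.append(a * remaining)    # portion tapped into this branch
        remaining *= (1 - a)           # portion passed on to the next splitter
    betas.append(remaining)            # final branch
    return betas


print(branch_fractions([Fraction(1, 4), Fraction(1, 3), Fraction(1, 2)], 0))
print(branch_fractions([0.25, 1 / 3, 0.5], 0.1))
```

With α = (1/4, 1/3, 1/2) and e = 0 every branch receives exactly 1/4 of the input; with e > 0, branches further down the chain lose more to the accumulated (1 - e) factors.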

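Channel Design - Four Power States (1/2)
[Figure: Pstate 1 ($\alpha_1 = 1/4$, $\alpha_2 = 1/3$, $\alpha_3 = 1/2$): Branch 4 receives $\beta_4 = (1/4)(1-e)^3$, BW = 1.28 Tb/s. Pstate 2 ($\alpha_1 = 1/3$, $\alpha_2 = 1/2$): Branch 4 receives $\beta_4 = 0$, BW = 960 Gb/s]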

Channel Design - Four Power States (2/2)
Pstate | BW (Tb/s) | α1  | α2  | α3  | Power loss (dB)
1      | 1.28      | 1/4 | 1/3 | 1/2 | 0.49
2      | 0.96      | 1/3 | 1/2 | 1   | 0.39
3      | 0.64      | 1/2 | 1   | NA  | 0.30
4      | 0.32      | 1   | NA  | NA  | 0.2
[Figure: Pstates 1 to 4 with 4, 3, 2 and 1 active branches; Branch 4 receives $\beta_4 = (1/4)(1-e)^3$ in Pstate 1 and 0 in the other states; BW = 1.28 Tb/s, 960 Gb/s, 640 Gb/s and 320 Gb/s]
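The α columns in the table follow a simple rule: to share the light equally among the first k branches (ignoring access loss), splitter i (counted from 1) taps 1/(k - i + 1) of what reaches it, and splitters beyond the last active branch are unused. This rule is a reading of the table rather than something the slides state; the sketch below reproduces each α column and the 320 Gb/s-per-branch bandwidth steps. `splitter_ratios` and `pstate_config` are hypothetical names.

```python
from fractions import Fraction

BRANCH_BW_GBPS = 320  # per-branch bandwidth implied by the table (1.28 Tb/s / 4)
NUM_SPLITTERS = 3


def splitter_ratios(active_branches):
    """Ratios that give each of the first k branches an equal share (access loss ignored).

    Unused splitters are returned as None ("NA" in the table).
    """
    ratios = []
    for i in range(NUM_SPLITTERS):  # i is 0-based here
        if i < active_branches:
            ratios.append(Fraction(1, active_branches - i))
        else:
            ratios.append(None)
    return ratios


def pstate_config(pstate):
    """Pstate 1 -> 4 active branches, ..., Pstate 4 -> 1 active branch."""
    k = 5 - pstate
    return k, splitter_ratios(k), k * BRANCH_BW_GBPS


for p in range(1, 5):
    k, ratios, bw = pstate_config(p)
    shown = ["NA" if r is None else str(r) for r in ratios]
    print(f"Pstate {p}: {k} branches, alphas={shown}, BW={bw} Gb/s")
```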

Waveguide Design
Three-level binary-tree-based waveguide
[Figure: Laser 0 feeds a binary tree of tunable splitters with ratios α(1,1), α(2,1), α(2,2), ...; level 1 splits by direction (X and Y), level 2 into channels, and level 3 into branches; in the X direction Channels 1, 2 and 3 go to R1, R2 and R3, and in the Y direction channels go to R4, R8 and R12]

Traffic Prediction (1/2)
- Traffic indicators: link and buffer utilization [1]
- First predictor: for low traffic variation
- Second predictor: for high traffic variation; based on prior work inspired by history-based branch predictors and by the observation that real traffic behaves repetitively [2]
[1] X. Chen, L-S. Peh, G-Y. Wei, Y-K. Huang, and P. Prucnal, "Exploring the Design Space of Power-Aware Opto-electronic Network Systems," International Symposium on High-Performance Computer Architecture (HPCA), pp. 120-131, 2005.
[2] Y. S-C. Huang, K. C-K. Chou, C-T. King, "Application-Driven End-to-End Traffic Predictions for Low Power NoC Design," IEEE Transactions on Very Large Scale Integration Systems, pp. 1-10, 2012.

Traffic Prediction (2/2)
Second predictor: history based
- HTPT (history traffic pattern table): one link-level entry per channel, e.g., Channel 0 = (i, x, 0), Channel 1 = (i, x, 1), Channel 2 = (i, x, 2), Channel 3 = (i, y, 0).
- H5~H1: history traffic pattern; H0: current link utilization. Link utilization is encoded in five levels: 1 (0.0~0.2), 2 (0.2~0.4), 3 (0.4~0.6), 4 (0.6~0.8), 5 (0.8~1.0).
- PT (prediction table): Tag, Index, P (predicted traffic load), LRU.
[Figure: example HTPT and PT contents]
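The second predictor can be sketched as a two-level, history-based table in the spirit of a branch predictor: each channel keeps its last six quantized utilization levels (H5 to H0), that pattern looks up a prediction table, and the table remembers which level followed the pattern last time, with LRU replacement. The table size, the exact-match lookup (instead of a tag/index split), and the fallback to the current level on a miss are assumptions for illustration, not PROBE's exact hardware.

```python
from collections import OrderedDict, deque

HISTORY_LEN = 6   # H5..H0 on the slide (H0 = current link utilization)
PT_ENTRIES = 64   # hypothetical prediction-table capacity


def quantize(util):
    """Map link utilization in [0, 1] to levels 1..5 (0.0~0.2 -> 1, ..., 0.8~1.0 -> 5)."""
    return min(int(util * 5), 4) + 1


class HistoryPredictor:
    def __init__(self):
        self.htpt = {}           # channel -> recent levels, oldest first (H5..H0)
        self.pt = OrderedDict()  # history pattern -> level that followed it; LRU order

    def observe(self, channel, util):
        """Record the link utilization measured for `channel` at the end of a window."""
        level = quantize(util)
        hist = self.htpt.setdefault(channel, deque(maxlen=HISTORY_LEN))
        if len(hist) == HISTORY_LEN:
            pattern = tuple(hist)
            self.pt[pattern] = level         # learn: this pattern was followed by `level`
            self.pt.move_to_end(pattern)
            if len(self.pt) > PT_ENTRIES:
                self.pt.popitem(last=False)  # evict the least recently used pattern
        hist.append(level)

    def predict(self, channel):
        """Predicted level for the next window; falls back to the current level (H0)."""
        hist = self.htpt.get(channel)
        if not hist:
            return None
        pattern = tuple(hist)
        if pattern in self.pt:
            self.pt.move_to_end(pattern)     # a hit refreshes the LRU position
            return self.pt[pattern]
        return hist[-1]
```

Feeding a repeating utilization sequence such as 0.1, 0.5, 0.9 lets the table learn the cycle, after which `predict` returns the next level in the sequence rather than the current one.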

Dynamic Bandwidth Scaling (1/3)
Flow: Prediction (Rw; Lu, Bu; Predict)
- Rw: reconfiguration window, set to 1000 cycles in the simulation.
- Link utilization (Lu) and buffer utilization (Bu) are gathered at each output port.
- The resource utilization is predicted from the traffic fluctuation.
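A minimal sketch of the per-output-port counters over one reconfiguration window, assuming link utilization is flits sent divided by the flits the link could carry at its current bandwidth, and buffer utilization is average occupancy divided by buffer capacity. `OutputPortStats`, `capacity_flits_per_cycle` and `buffer_slots` are hypothetical names; only the 1000-cycle window comes from the slides.

```python
from dataclasses import dataclass

RW_CYCLES = 1000  # reconfiguration window used in the simulation


@dataclass
class OutputPortStats:
    """Counters accumulated at one output port during a reconfiguration window."""
    buffer_slots: int                      # total buffer capacity in flits (hypothetical)
    capacity_flits_per_cycle: float = 1.0  # flits the link can carry per cycle at current bandwidth
    flits_sent: int = 0
    occupied_slot_cycles: int = 0
    cycles: int = 0

    def tick(self, flits_sent_this_cycle: int, occupied_slots: int) -> None:
        self.cycles += 1
        self.flits_sent += flits_sent_this_cycle
        self.occupied_slot_cycles += occupied_slots

    def window_done(self) -> bool:
        return self.cycles >= RW_CYCLES

    def utilization(self) -> tuple:
        """Return (Lu, Bu) for the finished window and reset the counters."""
        lu = self.flits_sent / (self.cycles * self.capacity_flits_per_cycle)
        bu = self.occupied_slot_cycles / (self.cycles * self.buffer_slots)
        self.flits_sent = self.occupied_slot_cycles = self.cycles = 0
        return lu, bu
```

Normalizing Lu by the current capacity matters here: after the bandwidth is scaled down, the same traffic shows up as a higher utilization, which is what the decision step compares against the mode bounds.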

Dynamic Bandwidth Scaling (2/3)
Flow: Prediction (Rw; Lu, Bu; Predict) -> Decision (Bw; three modes)
- Compare the predicted link utilization with the pre-defined bounds: Performance mode (0.2 ~ 0.4), Balanced mode (0.4 ~ 0.6), Power-aware mode (0.6 ~ 0.8).
- Increase the bandwidth if utilization is above the upper bound; decrease it if it is below the lower bound.
- Check the buffer utilization.
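The decision step can be sketched as a small state update over the four Pstates from the channel-design table (Pstate 1 = 1.28 Tb/s down to Pstate 4 = 320 Gb/s). The mode bounds come from the slide; moving one Pstate per window and the `BUFFER_GUARD` threshold that vetoes a bandwidth cut while buffers are busy are hypothetical choices, since the slides only say to check the buffer utilization.

```python
# Utilization bounds per mode, from the slide.
MODES = {
    "performance": (0.2, 0.4),
    "balanced": (0.4, 0.6),
    "power-aware": (0.6, 0.8),
}
MIN_PSTATE, MAX_PSTATE = 1, 4  # Pstate 1 = 1.28 Tb/s (highest BW), Pstate 4 = 320 Gb/s (lowest)
BUFFER_GUARD = 0.5             # hypothetical: do not cut bandwidth while buffers are this full


def next_pstate(pstate, predicted_lu, predicted_bu, mode):
    """Return the Pstate for the next reconfiguration window (one step at a time)."""
    low, high = MODES[mode]
    if predicted_lu > high and pstate > MIN_PSTATE:
        return pstate - 1  # above the upper bound: move to a higher-bandwidth Pstate
    if predicted_lu < low and pstate < MAX_PSTATE and predicted_bu < BUFFER_GUARD:
        return pstate + 1  # below the lower bound and buffers not congested: save power
    return pstate
```

For example, in Balanced mode a predicted Lu of 0.3 at Pstate 1 with lightly used buffers steps down to Pstate 2, while the same Lu with nearly full buffers keeps Pstate 1.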

Dynamic Bandwidth Scaling (3/3)
Flow: Prediction (Rw; Lu, Bu; Predict) -> Decision (Bw; three modes) -> Tuning (lasers, microrings)
- Calculate the splitter power ratios and the required laser power.
- Tune the lasers, the splitters, and the on-chip microrings.
- Delay is critical: off-chip communication and tuning.
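One way to see why moving to a lower-bandwidth Pstate saves laser power: if each active branch must still receive the same optical power, the laser output scales with the number of active branches times the linear equivalent of the splitter loss. The sketch applies that assumption to the Pstate table; it ignores tuning delay, trimming power and all other losses, so it is an illustration rather than PROBE's power model.

```python
# Pstate -> (active branches, power loss in dB), from the Pstate table.
PSTATES = {1: (4, 0.49), 2: (3, 0.39), 3: (2, 0.30), 4: (1, 0.2)}


def relative_laser_power(pstate, reference=1):
    """Laser power for `pstate` relative to `reference`.

    Assumes (hypothetically) that every active branch needs the same received power
    and that the table's power loss applies to the whole channel.
    """
    def absolute(p):
        branches, loss_db = PSTATES[p]
        return branches * 10 ** (loss_db / 10)  # dB loss -> linear power penalty
    return absolute(pstate) / absolute(reference)


for p in sorted(PSTATES):
    print(f"Pstate {p}: {relative_laser_power(p):.2f} x Pstate-1 laser power")
```

Under these assumptions Pstate 4 needs roughly a quarter of the Pstate 1 laser power.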

Methodology
- 64-core system: 5 GHz processor; 64KB private L1 and 4MB-per-tile shared L2 caches; 4 GB DRAM with 60-cycle access latency; 6 on-chip DRAM controllers
- Detailed Networks-on-Chip model: cycle-accurate simulator based on BookSim; virtual-channel flow control (2 VCs, 6-flit buffer depth); 256-bit channel width
- Performance analysis: latency, throughput, execution time, optical power
- Benchmarks: SPLASH-2, PARSEC and SPEC CPU 2006 traces; synthetic traffic patterns

Load / Latency Curve
[Figure: latency (cycles) vs. injection rate (flits/node/cycle) for Uniform and Bit Complement traffic, Without PROBE vs. PROBE in Power-aware, Balanced and Performance modes; annotated fluctuation region and throughput penalties of % (Uniform) and 4.8% (Bit Complement)]
- Power-aware mode fluctuation: link utilizations go back and forth over the boundary.
- Throughput: at most % penalty compared to the baseline.
- Balanced mode and Performance mode approach the baseline, with different closing points.

Latency vs. Optical Power (1/2)
[Figure: latency (cycles) and normalized optical power consumption vs. injection rate (flits/node/cycle) for Uniform traffic, Without PROBE vs. PROBE in Power-aware, Balanced and Performance modes]
- Critical points (injection rate): 0.05 (all three modes), 0.23 (Performance mode), 0.45 (Balanced mode)
- Optical power saving: 25% at the cost of % throughput loss (Power-aware mode); ~75% at low network load (all three modes)

Latency vs. Optical Power (2/2)
[Figure: latency (cycles) and normalized optical power consumption vs. injection rate (flits/node/cycle) for Bit Complement and Transpose traffic, Without PROBE vs. PROBE in Power-aware, Balanced and Performance modes; annotated throughput losses of 4.8% (Bit Complement) and 4.7% (Transpose), and optical power savings of 50% and 75% (Bit Complement) and 57% and 75% (Transpose)]

Real Traffic Traces: Execution Time
[Figure: normalized execution time, Without PROBE vs. PROBE in Performance, Balanced and Power-aware modes]
- Performance mode: close to the baseline
- Balanced mode: % penalty on average
- Power-aware mode: 25% penalty on average

Real Traffic Traces: Optical Power
[Figure: normalized optical power consumption, Without PROBE vs. PROBE in Performance, Balanced and Power-aware modes]
- Performance mode: 59% more optical power saving
- Balanced mode: 70% optical power saving on average
- Power-aware mode: 72% optical power saving on average

Conclusions
- Photonic interconnect design is boosted by the evolution of optical devices.
- PROBE is an energy-efficient solution that reduces the high static power consumption of photonic networks.
- PROBE further improves on-chip resource utilization.

Questions? Thank you!