Transmission-Line-Based, Shared-Media On-Chip Interconnects for Multi-Core Processors

Design for MOSIS Educational Program (Research)
Transmission-Line-Based, Shared-Media On-Chip Interconnects for Multi-Core Processors
Prepared by: Professor Hui Wu, Jianyun Hu, Berkehan Ciftcioglu, Jie Xu, Shang Wang
Institution: Department of Electrical and Computer Engineering, University of Rochester
Date of Submission: April 11, 2011

Project description

As the number of cores increases, the interconnection between the cores in a multi-core processor becomes increasingly critical to its performance and energy efficiency. Packet-switched interconnect, which has been proposed to replace conventional buses, offers many advantages such as bandwidth scalability and modularity. However, it requires routers that consist of complex circuits, occupy significant chip area, and consume significant power [1]. In addition, repeated packet relaying adds latency to communication, which may degrade performance significantly [2]. In this project, we propose to explore an alternative solution for on-chip interconnect in future multi-core processors, namely transmission-line-based shared-media interconnects [3], which can potentially provide both large bandwidth (tens of Tbps) and extremely low latency.

Fig. 1. A multi-core processor with transmission-line-based shared-medium on-chip interconnect. (Labels in the figure: A, Core Logic, Transmission lines, B.)

A transmission line allows near-speed-of-light signal propagation, which translates into extremely low latency [4], and large bandwidth, potentially providing sufficient throughput that packet switching can be avoided [3]. We propose to use transmission lines as a shared communication medium, which is similar to a bus in terms of networking but significantly different in terms of circuit implementation, signaling, and performance. As shown in Fig. 1, the global interconnect based on this technology connects all cores in a processor and consists entirely of a multi-drop transmission line system. Each core has multiple transceivers that transmit and receive multi-bit data through the shared transmission lines. It is worth noting that the shared-medium approach enables both point-to-point communication and broadcasting. Because of the simple bus topology, packet switching is avoided, leading to good energy efficiency and low latency.
Throughput comparable to that of a packet-switching network can be achieved by operating the interconnect at a higher data rate and by utilizing the low-latency interconnect more efficiently at the architectural level [3]. In Fig. 1, for example, 16 transmission lines are used in the global interconnect. Each transmission line operates at a data rate of 62.5 Gbps, and the whole interconnect system can achieve a total throughput of 1 Tbps. The high data rate can be achieved by using a fast communication clock frequency (40 GHz) and M-ary coding (M = 4), thanks to the good signal integrity of transmission lines. The transmission lines can achieve a low latency of 6 ps/mm, which leads to a maximum latency of less than 600 ps for communication between the nodes at the two ends of the transmission lines (Node A and Node B in Fig. 1) in a 2.3-cm by 2.3-cm chip. That translates into 3 computing cycles for a 5-GHz processor, half the time required in a packet-switching mesh network.

To demonstrate this new interconnect, we propose to design and fabricate a test chip through the MOSIS Educational Program (Research) using IBM 130 nm SiGe BiCMOS (8HP). The high-performance transistors and metals in this process would be sufficient for the high-performance transmission lines and high-speed transceiver circuits, without committing to a cutting-edge CMOS technology. The test chip will mainly consist of a prototype processor with four nodes connected by a transmission-line-based global interconnect. As shown in Fig. 1, each core has multi-bit high-speed transceivers, which include multiple sets of transmitters and receivers. Implemented with high-speed circuit techniques, each transceiver delivers multi-bit data over shared transmission lines.

What differentiates this interconnect from conventional buses is the design of the transmission line system. Properly designed on-chip transmission lines allow low-loss, low-latency propagation of signals, exhibit small dispersion (i.e., large bandwidth), and generate less crosstalk. Note that this is different from transmission line design in microwave circuits because a large number of transmission lines run in parallel within a limited chip area.
In other words, bandwidth density is the key performance target, and crosstalk likely poses the greatest design challenge. Based on our prior research on wideband on-chip transmission lines [5], we will investigate several new transmission line structures for interconnect applications, such as multilayer coplanar waveguides and strips, and optimize the design through both electromagnetic (EM) simulations and system analysis to satisfy all the requirements.

Fig. 2. Schematic of the electronics in a core dedicated to the proposed transmission-line-based interconnects. (Block labels in the figure: SER, Driver, Core Logic, DES, Latched Sampler, Amp, CLK, ILO Clock Multiplier.)

Equally important is the design of the high-speed transmitter and receiver electronics. As shown in Fig. 2, the transmitter consists of a serializer (SER) to convert the low-speed parallel data into high-speed serial data, and a wideband driver to drive the transmission line. The receiver includes a wideband amplifier to amplify the received data, a latched sampler to sample the data and generate the large-swing output, and a deserializer (DES) to convert the received high-speed serial data into low-speed parallel data. We will take advantage of our prior research in ultra-wideband impulse radio circuits in the transceiver design [6]. A high-frequency clock multiplier based on an injection-locked oscillator (ILO) with phase tuning capability [7] will generate the required high-frequency communication clock and the optimum phase for the latched sampler. A dedicated clock distribution transmission line provides the reference clock to all the transceivers [8]. In addition, several stand-alone test circuits will be included in the test chip, such as the ILO-based clock multiplier, the transmitter driver, the receiver amplifier, and the latched sampler.

References

[1] S. Mukherjee et al., "The Alpha 21364 Network Architecture," IEEE Micro, 22(1):26-35, Jan./Feb. 2002.
[2] N. Muralimanohar et al., "Interconnect Design Considerations for Large NUCA Caches," in Proceedings of the International Symposium on Computer Architecture (ISCA), pages 369-380, Jun. 2007.
[3] A. Carpenter, J. Hu, J. Xu, M. Huang, and H. Wu, "A Case for Globally Shared-Medium On-Chip Interconnect," to appear in Proceedings of the International Symposium on Computer Architecture (ISCA), June 2011.
[4] R.T. Chang et al., "Near Speed-of-Light Signaling Over On-Chip Electrical Interconnects," IEEE Journal of Solid-State Circuits, 38(5):834-838, May 2003.
[5] Y. Zhu, S. Wang, and H. Wu, "Multilayer Coplanar Waveguide Transmission Lines Compatible with Standard Digital Silicon Technologies," Int'l Microwave Symposium (IMS), TH2G-05, June 2007.
[6] Y. Zhu, J. Zuegel, J. Marciante, and H. Wu, "Distributed Waveform Generator: A New Circuit Technique for Ultra-Wideband Pulse Generation, Shaping and Modulation," IEEE Journal of Solid-State Circuits, 44(3):808-823, March 2009.
[7] L. Zhang, D. Karasiewicz, B. Ciftcioglu, and H. Wu, "A 1.6-to-3.2/4.8 GHz Dual-Modulus Injection-Locked Frequency Multiplier in 0.18 µm Digital CMOS," 2008 IEEE Radio-Frequency Integrated Circuits (RFIC) Symposium, pp. 427-430, June 2008.
[8] L. Zhang, A. Carpenter, B. Ciftcioglu, A. Garg, M. Huang, and H. Wu, "Injection-Locked Clocking: A Low-Power Clock Distribution Scheme for High-Performance Microprocessors," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 16(9):1251-1256, Sep. 2008.
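
The headline figures in the project description (16 lines at 62.5 Gbps each, 4-ary coding, 6 ps/mm, a 600 ps latency bound, and a 5-GHz core clock) can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function and constant names are hypothetical, it assumes each M-ary symbol carries log2(M) bits, and it does not model how the 40 GHz communication clock maps to the symbol rate, since the proposal does not state that.

```python
import math

# Figures quoted in the project description.
NUM_LINES = 16             # transmission lines in the global interconnect
RATE_PER_LINE_GBPS = 62.5  # data rate of each line
M_ARY_LEVELS = 4           # M-ary coding with M = 4
DELAY_PS_PER_MM = 6.0      # propagation delay of the line
MAX_LATENCY_PS = 600.0     # quoted upper bound on end-to-end latency
CORE_CLOCK_GHZ = 5.0       # processor clock used for the cycle count


def aggregate_throughput_gbps(num_lines, rate_per_line_gbps):
    """Total throughput when all lines carry data in parallel."""
    return num_lines * rate_per_line_gbps


def symbol_rate_gbaud(rate_gbps, levels):
    """Symbol rate for a given bit rate.

    Assumption (not stated in the proposal): each symbol carries
    log2(levels) bits.
    """
    return rate_gbps / math.log2(levels)


def path_length_mm(latency_ps, delay_ps_per_mm):
    """Line length that corresponds to a given latency."""
    return latency_ps / delay_ps_per_mm


def latency_in_core_cycles(latency_ps, core_clock_ghz):
    """Latency expressed in core clock cycles (1 GHz <-> 1000 ps period)."""
    return latency_ps * core_clock_ghz / 1000.0


if __name__ == "__main__":
    print("aggregate throughput (Gbps):",
          aggregate_throughput_gbps(NUM_LINES, RATE_PER_LINE_GBPS))
    print("symbol rate per line (Gbaud):",
          symbol_rate_gbaud(RATE_PER_LINE_GBPS, M_ARY_LEVELS))
    print("line length at the latency bound (mm):",
          path_length_mm(MAX_LATENCY_PS, DELAY_PS_PER_MM))
    print("latency bound in core cycles:",
          latency_in_core_cycles(MAX_LATENCY_PS, CORE_CLOCK_GHZ))
```

With the quoted figures this gives 1000 Gbps (1 Tbps) in aggregate, 31.25 Gbaud per line under the stated bits-per-symbol assumption, and 3 core cycles at the 600 ps bound. The 100 mm result is only the longest line consistent with that bound at 6 ps/mm; the proposal does not give the routed line length.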

Fabrication process: IBM 130 nm SiGe BiCMOS (8HP).

Packaging requirements: None. We will wire-bond the chip to a custom substrate for testing purposes.

Estimated project size (length and width): Length: 4 mm; Width: 4 mm

Simulation plans: The simulation will be performed in several domains using several industry-standard EDA tools. The transmission line will be simulated and optimized in Sonnet, a full-wave planar electromagnetic simulator. The goal is transmission lines with lower attenuation, larger bandwidth, and less crosstalk in a smaller area. The transmitter and receiver will be simulated in Advanced Design System (ADS) and Cadence Spectre in both the time domain and the frequency domain. Each individual building block will be simulated first. The design goals are high speed, low noise, and low power consumption. The ILO will be optimized for better jitter performance, larger locking range, and larger phase tuning range. Then the whole on-chip interconnect system will be simulated in ADS and Spectre. After a single transmission-line design is done, the case with multiple transmission lines will be simulated, focusing on the crosstalk performance.

Test and characterization plans: The prototype will be characterized in both the time domain and the frequency domain. Before the whole-system characterization, each building block will be characterized using the stand-alone test circuits. The transceiver circuits' gain, bandwidth, and noise figure will be measured, and the transmission line's loss, dispersion, and impedance will be characterized. We will also characterize the working range, locking range, phase noise, and phase tuning capability of the ILO-based clock multiplier. For the whole-system characterization, the reference clock will be provided to the prototype by a signal generator and fed to all the nodes through the transmission-line-based clock distribution network. An on-chip high-speed pseudorandom binary sequence (PRBS) generator in the transmitter will generate the test data pattern, and the output of the latched sampler in the receiver will be observed on an oscilloscope. Initially, only one data link will be activated. We will measure the highest achievable data rate, power consumption, eye diagram, latency, and bit error rate in each test. Then all data links will be used to transmit data, and we will characterize the crosstalk performance of the system.
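
The PRBS-based bit-error-rate measurement in the test plan can be sketched in software as follows. This is a minimal illustration, not the on-chip design: the proposal does not specify the PRBS order, so a 7-bit linear-feedback shift register (PRBS7, period 127) is assumed here, and the names prbs7, bit_error_rate, and flip_bits are hypothetical helpers introduced only for this example.

```python
def prbs7(num_bits, seed=0x7F):
    """Generate num_bits of a PRBS7 pattern from a 7-bit Fibonacci LFSR.

    Assumption: PRBS7 with feedback from stages 7 and 6; the proposal
    does not state which PRBS order the on-chip generator uses.
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, or the LFSR stays at 0")
    bits = []
    for _ in range(num_bits):
        # XOR of the two oldest bits in the register (stages 7 and 6).
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        bits.append(new_bit)
        # Shift left by one and insert the feedback bit.
        state = ((state << 1) | new_bit) & 0x7F
    return bits


def bit_error_rate(sent, received):
    """Fraction of positions where the received bit differs from the sent bit."""
    if len(sent) != len(received) or not sent:
        raise ValueError("sequences must be non-empty and of equal length")
    errors = sum(1 for s, r in zip(sent, received) if s != r)
    return errors / len(sent)


def flip_bits(bits, positions):
    """Return a copy of bits with the given positions inverted (error injection)."""
    corrupted = list(bits)
    for p in positions:
        corrupted[p] ^= 1
    return corrupted


if __name__ == "__main__":
    pattern = prbs7(254)  # two full periods of the 127-bit pattern
    received = flip_bits(pattern, [0, 10, 200])  # emulate three bit errors
    print("bit error rate:", bit_error_rate(pattern, received))
```

In a bench measurement the received bits would come from the captured latched-sampler output rather than from flip_bits, and the reference pattern would first have to be aligned to the received stream; that alignment step is omitted here.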