Photo-Electronic Crossbar Switching Network for Multiprocessor Systems

Atsushi Iwata,1 Takeshi Doi,1 Makoto Nagata,1 Shin Yokoyama2 and Masataka Hirose1,2
1 Department of Physical Electronics Engineering
2 Research Center for Nano-Devices and Systems
Hiroshima University, Higashi-Hiroshima, 739 JAPAN

INTRODUCTION

Rapid progress in VLSI technologies has made it possible to implement a very large logic system with around 10^7 transistors on a single chip. For example, a multiprocessor system for multimedia applications has been integrated on a single CMOS chip [1]. In that system, the data channels, which are indispensable for communication between processors and shared memories, were realized by a crossbar switch as shown in Fig. 1. A crossbar switch can exchange multiple data streams simultaneously through arbitrary paths. An electronic crossbar switch operating at over 500 Mb/s, however, consumes extraordinarily high power and a large chip area, because it requires transmission lines and drivers. On-chip interconnects thus become a severe limit on the operation speed and the scale of integration. To overcome this limitation, optical interconnections have been proposed for on-chip wiring as well as for printed circuit boards and modules [2, 3].

Considering the features of photonics and electronics, we arrived at the following basic design concept: photonics should be applied to data communications, and other processing such as arithmetic, storage and control should be implemented by electronics [3]. In this paper, we propose a hybrid implementation of a crossbar switch that consists of optical channels and electronic switches, and demonstrate its potential using multi-chip module (MCM) technologies. This MCM implementation is a milestone toward our final target, an ultra-scale opto-electronic IC (U-OEIC) that integrates optical interconnections on a Si ULSI chip.

Fig. 1 Multiprocessor System using Crossbar Switch (Proc. #0 to #3 connected to RAM #0 to #3 through the crossbar switch)
ARCHITECTURE AND DESIGN FOR CROSSBAR SWITCH

We consider two architectures for the photo-electronic hybrid implementation of crossbar switches. One is transmit side switching (TSS, Fig. 2(a)), and the other is receive side switching (RSS, Fig. 2(b)). In TSS, among the laser diodes (LD's) connected to one transmit port, one or several LD's are selected by the destination address. The active LD's transmit optical data through waveguides that are connected to the photo detectors (PD's) of the specified receiving ports. In RSS, optical data are distributed to all connected PD's through a branched waveguide and are selectively received at the PD's that are enabled by the destination address. TSS requires n^2 LD's and n PD's, and RSS requires n^2 PD's and n LD's, where n is the number of ports. At the present state of the art, the fabrication cost of an LD may be higher than that of a PD, but progress in GaAs-on-Si technologies will resolve the cost problem.

The transmission efficiency of optical energy is the most important factor for the bit rate and power dissipation. The efficiencies of the waveguides and optical components were simulated by electromagnetic field analysis based on the finite-difference time-domain (FD-TD) method [4]. A corner bend and a branch using double reflection mirrors (DRM) were proposed, and their high efficiency was demonstrated using the Si waveguide technique [6, 7]. The DRM technique is applicable to many types of optical components. Two types of waveguides, including corners, branches and combiners, which correspond to the TSS and RSS crossbar switches, were designed and analyzed. The simulated loss of the branched waveguide using DRM is 0.22 dB as shown in Fig. 3, which also shows examples of density plots of the electric field distributions. In the right-angled cross, the leakage power to the crossing waveguide is more than 30 dB below the transmitted power. These analysis results agree with the measured results to within about 20%.
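As a quick numerical check of the figures quoted above, the following Python sketch counts the LD's and PD's that each scheme needs and converts the quoted decibel values into power fractions. The function names are illustrative and do not come from the paper; only the n^2 and n device counts and the 0.22 dB and 30 dB values are taken from the text.

```python
def device_counts(n_ports: int, scheme: str) -> tuple[int, int]:
    """Return (number of LD's, number of PD's) for an n-port crossbar."""
    if scheme == "TSS":
        # Transmit side switching: one LD per (source, destination) pair.
        return n_ports ** 2, n_ports
    if scheme == "RSS":
        # Receive side switching: one PD per (source, destination) pair.
        return n_ports, n_ports ** 2
    raise ValueError(f"unknown scheme: {scheme!r}")


def db_to_fraction(loss_db: float) -> float:
    """Convert a loss in dB into the fraction of power that remains."""
    return 10.0 ** (-loss_db / 10.0)


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, "ports:", "TSS", device_counts(n, "TSS"),
              "RSS", device_counts(n, "RSS"))
    print(f"0.22 dB branch loss keeps {db_to_fraction(0.22):.3f} of the power")
    print(f"30 dB isolation leaks {db_to_fraction(30.0):.4f} of the power")
```

A 0.22 dB loss corresponds to roughly 95% transmission, and 30 dB of isolation corresponds to a leakage of 0.1% of the power, which is why the crossings contribute little to the overall budget.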
The total optical energy transmission efficiencies (receiving power / transmitting power) of TSS and RSS are estimated as shown in Fig. 4. In the figure, the horizontal axis is the number of ports, where the ports are placed at a 1-cm spacing. The loss of the straight waveguide is assumed to be 0.3-0.6 dB/cm. Grating couplers or mirrors are required to couple the waveguides to the LD's or PD's. The coupling loss of these couplers is expected to be as low as about 4 dB. TSS has a high efficiency of 10% in an 8-port switch; in this case, the coupler losses dominate. On the other hand, the efficiency of RSS is as low as 1% in an 8-port switch, because the optical power is divided among the branches and power dispersion occurs at each branch. Therefore we adopt the TSS scheme for the crossbar.

Fig. 2 Photo-Electronic Crossbar Switch Architectures: (a) Transmit Side Switching Scheme, (b) Receive Side Switching Scheme (labels: DeMux, Driver, Wire, Waveguide, PD, Amplifier, Mux; TX and RX Ports 0 to 3; Address & Data, Data)
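The efficiency estimate above can be approximated with a simple loss budget. The sketch below is not the authors' calculation; the paper does not state its model. It assumes one 4 dB coupler at each end of the channel, a worst-case path length equal to the number of ports times the 1-cm spacing, an ideal 1-to-n power split for RSS, and no excess loss at bends or branches. All function and parameter names are hypothetical.

```python
def tss_efficiency(n_ports: int,
                   waveguide_db_per_cm: float = 0.3,
                   coupler_db: float = 4.0,
                   n_couplers: int = 2,
                   port_spacing_cm: float = 1.0) -> float:
    """Received / transmitted power for one TSS channel (assumed model)."""
    # Assumption: the longest path spans all ports at the given spacing.
    path_cm = n_ports * port_spacing_cm
    loss_db = n_couplers * coupler_db + waveguide_db_per_cm * path_cm
    return 10.0 ** (-loss_db / 10.0)


def rss_efficiency(n_ports: int, **kwargs) -> float:
    """Received / transmitted power per PD for RSS (assumed model)."""
    # Assumption: the transmitted power is split equally among n branches,
    # on top of the same coupler and propagation losses as TSS.
    return tss_efficiency(n_ports, **kwargs) / n_ports


if __name__ == "__main__":
    for n in (8, 16, 32):
        for wg in (0.3, 0.6):
            tss = tss_efficiency(n, waveguide_db_per_cm=wg)
            rss = rss_efficiency(n, waveguide_db_per_cm=wg)
            print(f"n={n:2d}, {wg} dB/cm: TSS={tss:.2e}, RSS={rss:.2e}")
```

Under these assumptions an 8-port switch at 0.3 dB/cm gives about 9% for TSS and about 1% for RSS, the same order as the values quoted in the text. The 8 dB of coupler loss outweighs the 2.4 dB of propagation loss, consistent with the remark that coupler losses dominate.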
Fig. 3 Waveguide Layout Design and Field Distributions Simulated by FD-TD Analysis (panels: Transmit Side Switching, Receive Side Switching; components: Corner Bend, Branch, Right-angled Cross; annotated values: 0.05 dB, 0.22 dB, 0.36 dB, < 0.1 dB, > 30 dB)

Fig. 4 Transmission Efficiency of Photo-Electronic Crossbar Switches (vertical axis: optical power efficiency, 10^0 to 10^-5; horizontal axis: number of ports, 8 to 40; curves for the transmit side and receive side switching schemes at waveguide losses of 0.3 dB/cm and 0.6 dB/cm; grating coupler loss = 4 dB)

DEVICES AND CIRCUITS

LIGHT SOURCE: Vertical cavity surface emitting lasers (VCSEL's) have been vigorously developed to reduce their size and threshold current. An 8 x 8 VCSEL array with a 1-mW optical output and a 3-GHz cutoff frequency was developed for optical interconnections [7].

PHOTO DETECTOR: A PN junction photo detector fabricated by a standard CMOS process is applicable to short-wavelength (< 1 µm) optical interconnections. As the design rule of CMOS devices shrinks, however, the O-E conversion efficiency degrades because the junction depth also decreases. To reduce stray capacitance, the PD size must be reduced, but there is a lower limit on the size, set by the need for a sufficient alignment margin between the PD's and the waveguides. Therefore, instead of a PN PD, a PIN photo detector is effective for attaining both high efficiency and high speed. We have studied a technology for integrating PIN PD's with CMOS devices.

WAVEGUIDE: Many kinds of waveguide structures and fabrication technologies have been developed. For the final target of U-OEIC, we have studied an optical waveguide implemented on a Si wafer with LSI fabrication technologies. The Si waveguide is constructed with a Si3N4 core, SiO2 cladding and an aluminum cover layer [5].
AMPLIFIER: Wide-band, low-power amplifiers are required for the receiver. The bandwidth of the receiver is limited by the time constant determined by the stray capacitance of the PD and the amplifier input, and by the resistance used for I-V conversion. To reduce the stray capacitance, the PD must be integrated on the amplifier chip. We have designed a CMOS amplifier using 0.25-µm CMOS devices. It consists of a cascade of three inverter stages and a bias circuit as shown in Fig. 5(a). The responses of the output voltage and supply current simulated by SPICE are shown in Fig. 5(b). These show that the simple amplifier can detect a 1-Gb/s optical signal that provides a photocurrent of 50 µA. The delay time is 0.3 ns and the power dissipation is as low as 700 µW at a supply voltage of 2 V.

DRIVER: Driver array chips that can be assembled in the neighborhood of the VCSEL array are necessary. A power-efficient constant-current driver with an automatic power control circuit will be developed using deep sub-µm CMOS devices.

These device technologies will make it possible to integrate 1-Gb/s optical interconnections on a Si chip.

EXPERIMENTAL RESULTS

To demonstrate the potential of the proposed crossbar switch, transmission characteristics were measured using existing devices: a VCSEL chip, a CMOS amplifier chip with a PIN PD, and an optical fiber. Figure 6 shows waveforms of the modulated optical power measured at the end of the fiber and the output voltages of the amplifier. The bit rate of the experimental setup is limited by the 100-MHz bandwidth of the CMOS amplifier. An output voltage deviation that depends on the transmitted data pattern is observed. It appears to come from instability of the optical power and of the amplifier bias point. Nevertheless, the results show that transmission of 100-Mb/s NRZ data is attained at an optical input power of 60 µW.
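The AMPLIFIER paragraph above attributes the receiver bandwidth limit to the time constant formed by the stray capacitance and the I-V conversion resistance. A first-order sketch of that trade-off follows, using the standard single-pole relation f = 1/(2*pi*R*C). The resistance and capacitance values are hypothetical, chosen only for illustration; the paper gives neither. Only the 50 µA photocurrent is taken from the text.

```python
import math


def rc_bandwidth_hz(resistance_ohm: float, capacitance_f: float) -> float:
    """-3 dB bandwidth of a single-pole RC front end."""
    return 1.0 / (2.0 * math.pi * resistance_ohm * capacitance_f)


def iv_signal_v(photocurrent_a: float, resistance_ohm: float) -> float:
    """Voltage developed across the I-V conversion resistance."""
    return photocurrent_a * resistance_ohm


if __name__ == "__main__":
    photocurrent_a = 50e-6        # 50 uA, from the text
    resistance_ohm = 1e3          # hypothetical I-V resistance
    for cap_f in (100e-15, 200e-15, 500e-15):   # hypothetical stray capacitances
        bw = rc_bandwidth_hz(resistance_ohm, cap_f)
        print(f"C = {cap_f * 1e15:.0f} fF -> bandwidth = {bw / 1e9:.2f} GHz")
    print(f"signal = {iv_signal_v(photocurrent_a, resistance_ohm) * 1e3:.0f} mV")
```

In this simple model a larger resistance gives a larger signal voltage but a proportionally lower bandwidth, so reducing the stray capacitance (for example by integrating the PD on the amplifier chip, as the text proposes) is the way to gain bandwidth without giving up signal.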
Fig. 5 CMOS Receiver Amplifier: (a) Circuit Schematic of Amplifier (input, 1st, 2nd and 3rd stages, bias circuit, output); (b) Input and Output Voltages and Supply Currents (voltage axis 0 to 2.0 V, supply current axis 0 to 400 µA, time axis 0 to 4 ns)

Fig. 6 Measured Responses of VCSEL Outputs and Amplifier Outputs (bit rate = 100 Mb/s; driver voltage 3 V to 4 V; optical power 0 to 80 µW; amplifier output 1.35 V to 1.55 V; time scales 2 ns/div and 5 ns/div)
EXPECTED PERFORMANCES

BIT RATE

The maximum bit rate that maintains a sufficiently low error rate is calculated as a function of the SNR (receiving optical power / total noise power). Figure 7 shows the relation between the bit rate and the receiving optical power. The total noise is a statistical mean of the thermal noise of the resistor, the shot noise of the PD, and the noise of the MOS amplifier. In the current range of 1-100 µA, the resistor thermal noise dominates. From the figure, we can expect a 1-Gb/s bit rate to be achieved at a receiving optical power of 50 µW. Assuming that the total efficiency of the optical channels is larger than 5%, an optical power of 1 mW is sufficient for 1 Gb/s. Another limit on the bit rate, which comes from the CMOS amplifier bandwidth, is estimated to be 3 Gb/s by circuit simulation.

Fig. 7 Bit Rate vs. Receiving Power (vertical axis: bit rate, 10^7 to 10^10 bit/s; horizontal axis: receiving optical power, 10^-5 to 10^-4 W; curves for PD efficiency = 1 and 0.5 at ER = 10^-15 and 10^-20; CMOS amplifier limit indicated)

POWER DISSIPATION

Because the threshold current of existing LD's is as high as 1 mA, their power efficiency is as low as about 10%. In the future, micro VCSEL's with a low threshold current of several µA and low-loss waveguide couplers will be developed. As for the amplifiers, 1-Gb/s operation will be available with a power dissipation of 700 µW, as shown in Fig. 5. Therefore, optical channels with a bit rate of 1 Gb/s will be realized with less than 2 mW of power dissipation.

Let us compare this power dissipation with that of an electronic crossbar. To realize a 1-Gb/s bit rate, high-speed interface circuits with 50-Ω line driving and a low voltage swing are required. Since existing single-ended high-speed interface circuits such as GTL or CTL cannot reach this bit rate, differential CML circuits must be used. These require a pair of transmission lines and dissipate more than 20 mW.
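The arithmetic behind these estimates can be laid out in a few lines of Python. The sketch below uses only numbers quoted in the text (50 µW received power, 5% channel efficiency, 1 Gb/s, the 2 mW upper bound for the optical channel and the 20 mW lower bound for the CML interface). The function and constant names are illustrative.

```python
def required_transmit_power_w(received_w: float, efficiency: float) -> float:
    """Optical power the source must emit to deliver received_w at the PD."""
    return received_w / efficiency


def energy_per_bit_j(power_w: float, bit_rate_bps: float) -> float:
    """Average energy spent per transmitted bit."""
    return power_w / bit_rate_bps


RX_POWER_W = 50e-6          # receiving optical power for 1 Gb/s (Fig. 7)
CHANNEL_EFFICIENCY = 0.05   # assumed lower bound on total channel efficiency
BIT_RATE_BPS = 1e9
OPTICAL_CHANNEL_W = 2e-3    # upper bound quoted for the optical channel
CML_INTERFACE_W = 20e-3     # lower bound quoted for the differential CML circuit

if __name__ == "__main__":
    tx_w = required_transmit_power_w(RX_POWER_W, CHANNEL_EFFICIENCY)
    print(f"required optical output: {tx_w * 1e3:.2f} mW")
    optical_pj = energy_per_bit_j(OPTICAL_CHANNEL_W, BIT_RATE_BPS) * 1e12
    cml_pj = energy_per_bit_j(CML_INTERFACE_W, BIT_RATE_BPS) * 1e12
    print(f"optical channel: < {optical_pj:.1f} pJ/bit")
    print(f"CML interface:   > {cml_pj:.1f} pJ/bit")
    print(f"power ratio:     > {CML_INTERFACE_W / OPTICAL_CHANNEL_W:.0f}x")
```

Because one figure is an upper bound and the other a lower bound, the factor of ten between them is a conservative estimate of the advantage claimed for the optical channel.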
Thus the estimated bit rate and power dissipation of the proposed photo-electronic crossbar are considerably improved compared with those of the electronic one.

MCM STRUCTURE

The MCM is composed of a main substrate made of metal or ceramic and an optical plate made of Si or quartz, as shown in Fig. 8. Processor chips, memory chips, driver chips, VCSEL chips and PD amplifier chips are assembled on the main substrate using existing technologies. The waveguides and grating couplers are fabricated on the optical plate. How to align and assemble the two substrates is an important issue. The alignment margin can be relaxed by optimizing the size and layout of the LD's, PD's and couplers. Since a margin of around 100 µm is expected to be obtained, neither a special alignment technique nor expensive equipment is required. The optical plate can be mechanically assembled on the main substrate after alignment.
CONCLUSION

A photo-electronic hybrid crossbar switching network for data communications between processors and shared memories is proposed. Switching is performed by electronics and data transfer is performed by photonics. The transmit side switching scheme is adopted for its high optical transmission efficiency. The experimental and estimated results show that the photo-electronic crossbar can operate at a bit rate of 1 Gb/s with a total power dissipation of less than 2 mW.

Acknowledgement

The authors would like to thank Dr. I. Hayashi and the members of the U-OEIC Research Group.

REFERENCES

1. K. Guttag, R. J. Gove, and J. R. Van Aken, A Single-Chip Multiprocessor for Multimedia: The MVP, IEEE Computer Graphics & App., 12:53 (1992).
2. J. W. Goodman, F. I. Leonberger, S. Y. Kung and R. A. Athale, Optical Interconnection for VLSI System, Proc. IEEE, 72:850 (1984).
3. A. Iwata and I. Hayashi, Optical Interconnections as a New LSI Technology, IEICE Trans. Electron., E76-C:90 (1993).
4. T. Shibata and H. Kimura, Computer-Aided Engineering for Microwave and Millimeter-Wave Circuits using FD-TD Technique of Field Simulations, Int. J. Microwave and MM-wave Comp.-Aided Eng., 3:238 (1993).
5. T. Namba, A. Uehara, T. Doi, T. Nagata, Y. Kuoda, S. Miyazaki, K. Shibahara, S. Yokoyama, A. Iwata and M. Hirose, High-Efficiency Micromirrors and Branched Optical Waveguides on Si Chips, Jpn. J. Appl. Phys., 35, I, 2B:1405 (1996).
6. T. Doi, T. Namba, A. Uehara, M. Nagata, S. Miyazaki, K. Shibahara, S. Yokoyama, A. Iwata, T. Ae and M. Hirose, Optically Interconnected Kohonen Net for Pattern Recognition, Jpn. J. Appl. Phys., 35, I, 2B:1405 (1996).
7. H. Kosaka et al., LEOS '94, 1:259 (1994).

Fig. 8 Structure of MCM (labels: Driver, PD & Amp, Processor Chips, Memory Chips, Waveguides, Optical Plate, Main Substrate)