SINGLE FLUX QUANTUM ONE-DECIMAL-DIGIT RNS ADDER

Applied Superconductivity Vol. 6, Nos 10–12, pp. 609–614, 1998
© 1999 Published by Elsevier Science Ltd. All rights reserved
Printed in Great Britain
PII: S0964-1807(99)00018-6
0964-1807/99 $ - see front matter

NADA VUKOVIC and MARC J. FELDMAN
University of Rochester, Rochester, NY 14627, USA

Abstract. Residue number system (RNS) arithmetic has a promising role for fault-tolerant, high-throughput superconducting single flux quantum (SFQ) circuits for digital signal processing (DSP) applications. We have designed one of the basic computational blocks used in DSP circuits, a one-decimal-digit RNS adder. A new design for its main component, the single-modulus adder, has been developed. It combines simple and robust RSFQ elementary cells, both combinational and sequential. The central units are a circular shift register, a code converter, and the clock control circuitry. Our mod5 adder employs 195 Josephson junctions, consumes 50 μW of power, and occupies an area of less than 2 mm². Chips were fabricated at HYPRES, Inc. using 1 kA/cm² low-Tc niobium technology. The mod5 adder was successfully tested at low speed, and gave experimental bias margins of ±26%.

INTRODUCTION

The use of the residue number system (RNS) offers the possibility of high-speed processing because of the separability of operation on each of the residue digits [1]. In addition, RNS arithmetic is intrinsically fault tolerant [2]. Common signal processing tasks such as digital filtering, correlation, interpolation, prediction, and spectral analysis characteristically require large numbers of additions, subtractions, multiplications, negations, etc., operations that are very suitable for RNS arithmetic [3].
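The separability of residue digits can be illustrated with a short sketch. It assumes the moduli pair (5, 4), which this paper adopts for a one-decimal-digit adder; the names `to_rns`, `rns_add`, and `from_rns` are hypothetical helpers introduced only for illustration, and the brute-force search in `from_rns` merely stands in for a proper Chinese remainder theorem reconstruction.

```python
from itertools import product

# Assumed moduli: the mod5 / mod4 pair used for one decimal digit.
MODULI = (5, 4)


def to_rns(value, moduli=MODULI):
    """Return the residues of value with respect to each modulus."""
    return tuple(value % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    """Add two RNS numbers digit by digit; no carry crosses between digits."""
    return tuple((ra + rb) % m for ra, rb, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Recover the integer by exhaustive search over the dynamic range."""
    dynamic_range = 1
    for m in moduli:
        dynamic_range *= m
    for candidate in range(dynamic_range):
        if to_rns(candidate, moduli) == tuple(residues):
            return candidate
    raise ValueError("residues do not correspond to a value in range")


def check_all_digit_sums():
    """Verify that every sum of two decimal digits (0..18) is recovered."""
    for x, y in product(range(10), repeat=2):
        if from_rns(rns_add(to_rns(x), to_rns(y))) != x + y:
            return False
    return True
```

Because 5 × 4 = 20 exceeds the largest possible sum of two decimal digits (18), `check_all_digit_sums()` succeeds: each residue digit is added independently and the pair of results still identifies the sum uniquely.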
RNS design techniques have been most successful to date for finite impulse response (FIR) filters, which can take full advantage of fast RNS operations while avoiding the problems associated with scaling [4].

There is continually increasing interest in the realization of ultra-high-speed, very low power superconducting LSI using single flux quantum (SFQ) logic. The most prevalent is rapid single flux quantum (RSFQ) logic [5]. This and most other superconducting logic schemes implement a binary approach to perform basic arithmetic operations, thus inheriting some of the weaknesses of semiconductor binary logic, such as the carry propagation problem in addition and multiplication, which may ultimately limit the performance of the system.

In this paper we present results on SFQ circuits which perform RNS arithmetic. Both RNS arithmetic and superconducting digital logic have been recognized as especially well suited for high-performance digital signal processing (DSP) circuits, where the computation is dominated by a repetitive sequence of multiply and add operations with infrequent calls to memory, and high speed is the primary criterion.

The first RNS implementation in superconducting electronics based on processing SFQ pulses was proposed in [6]. The basic architecture of an SFQ one-decimal-digit adder is presented in Fig. 1. Inputs X and Y are two one-decimal-digit integers, i.e. X ∈ [0, 9] and Y ∈ [0, 9]. The output, which is the sum of X and Y, can be any integer between 0 and 18. Two single-modulus adders, mod5 and mod4, are sufficient to cover this dynamic range. The cyclic shift register (SR) is the primary circuit element used to code RNS numbers and perform arithmetic.

The particular design for the single-modulus adder detailed in [6] lacks a robust RSFQ circuit implementation, for two reasons. The first problem is that counterflow and concurrent clocking are mixed together, using the same clock line.
In counterflow clocking the data flows faster than the clock, and in concurrent clocking the opposite is true: the clock propagates faster than the data [7]. This results in narrower margins to process variations in fabrication, because the only adjustable parameter, the bias current, has a tendency to speed up one part of the SR and slow down the other, or vice versa. Second, the choice of a nondestructive readout (NDRO) cell and DMUX in the feedback part of the SR results in an overall lower maximum operating frequency as well as layout constraints. The NDRO cell has never been established as one of the more robust RSFQ cells.

This paper presents a new design for the single-modulus adder used in the one-decimal-digit adder. It combines simple and robust RSFQ elementary cells, both combinational and sequential. The design and successful functionality test of the mod5 adder are discussed.

Fig. 1. Block diagram of an SFQ one-decimal-digit RNS adder.

DESIGN

Figure 2 shows a block diagram of our mod5 adder. It performs mod5 addition for two residues, |x|_5 and |y|_5. The circuit consists of a five-stage shift register (SR5), a feedback path with an AND gate and confluence buffer (CB), a code converter (CC) composed of destructive readout (DRO) cells, splitters (S) and CBs, and an output AND gate. The overall clocking scheme is counterflow, i.e. clock and data flow in opposite directions. The numbers are coded in "1-out-of-n" code, where n is a given modulus; in this case n is 5. Because of the chosen coding scheme, the design of the SR with the feedback loop is better suited to satisfy the timing requirements that exist in any synchronous clocking scheme when a stream of successive ones is applied [8].

Fig. 2. Block diagram of an SFQ mod5 adder. Notation: SR: shift register; DRO: destructive readout cell; CB: confluence buffer; S: splitter; OR: OR gate; AND: AND gate; JTL2: two-junction Josephson transmission line.
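The "1-out-of-n" code and its relation to a cyclic shift can be sketched as follows. This is a minimal illustration, not a description of the circuit itself: the helper names are hypothetical, and the bit ordering (the single 1 sits at the position equal to the residue, counted from the left) is taken from the test examples later in the paper, where 4 is written 00001 and 2 is written 00100.

```python
def encode_one_hot(residue, n=5):
    """Return the 1-out-of-n word for a residue, e.g. 4 -> [0, 0, 0, 0, 1]."""
    if not 0 <= residue < n:
        raise ValueError("residue must lie in [0, n)")
    return [1 if i == residue else 0 for i in range(n)]


def decode_one_hot(bits):
    """Return the residue encoded by a 1-out-of-n word."""
    if sum(bits) != 1:
        raise ValueError("word must contain exactly one 1")
    return bits.index(1)


def rotate(bits, steps):
    """Cyclically move every bit 'steps' positions toward higher indices."""
    n = len(bits)
    steps %= n
    if steps == 0:
        return list(bits)
    return bits[-steps:] + bits[:-steps]


def one_hot_add(x, y, n=5):
    """Add two residues mod n by rotating the word for x by y positions."""
    return decode_one_hot(rotate(encode_one_hot(x, n), y))
```

In this representation a modular addition is nothing more than a rotation: `one_hot_add(4, 2)` rotates 00001 by two positions to 01000, which decodes to 1, in agreement with (4 + 2) mod 5.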

The code converter converts the "1-out-of-n" code into "number-of-pulses" code. Two signals, clk5 and clk5', are 180° out of phase and represent a sequence of five ones (SFQ pulses) and five zeros. These signals could easily be generated using a simple ten-stage circular SR as part of the control circuitry for the entire mod5 portion of an LSI RNS circuit. Input |y|_5, which is delayed by four clock cycles, is applied to the serial input of the code converter. The output of the code converter is applied to the clock input of the SR. The second number, |x|_5, is applied to the input of the SR. In the first five clock cycles the number |x|_5 is loaded into the SR. In the second five clock cycles it is advanced by |y|_5 stages around the SR, so the number in the SR now represents |x|_5 + |y|_5. This sum is then read out through the second AND gate at the same time as the subsequent |x|_5 number is loaded. Note that each AND gate is used to perform the function of a switch in this design. These switches replace the DMUX and NDRO which were used in [6].

A schematic of the SR cell and its optimized parameters are shown in Fig. 3. The same cell has been implemented in a 4-bit data acquisition shift register and tested with experimental bias margins of ±40% [9]. The mod5 adder and its subcells were fully optimized at 10 GHz using MALT [10] and JSPICE [11]. Table 1 shows the resulting parameter margins from optimization and bias current margins from experiment.

Fig. 3. Circuit diagram and parameter values of the single stage of SR.

Table 1. Margins of mod5 adder and its subcells (%)

Cell name     GL† (%)   GIcb† (%)   Simulated bias (%)   Experimental bias (%)
AND           40        55          −55, +48             ±40
OR            42        59          −58, +63             ±38
Code Conv     41        56          −34, +32             ±29
5-stage SR    35        40          −38, +42             ±35
mod5 adder    33        39          −30, +35             ±26

† Percentages are lower bounds on the margins corresponding to the "axis lengths" figure of merit returned by MALT. GL and GIcb denote the global inductance and the global critical current with the bias current adjusted proportionally, respectively.
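The load-then-advance sequence described above can be mimicked with a purely behavioral model. This is a sketch under stated assumptions, not a gate-level or timing-accurate description: the class and function names are hypothetical, the serial load order (last bit of the word first) is chosen only so that the loaded register matches the 1-out-of-n word, and the AND-gate switches are modeled simply as "feedback ignored while loading, feedback used while advancing".

```python
def one_hot(residue, n=5):
    """1-out-of-n word with the single 1 at index 'residue'."""
    return [1 if i == residue else 0 for i in range(n)]


def code_converter_pulses(y_word):
    """Model of the code converter: number of pulses equals the residue."""
    return y_word.index(1)


class CircularShiftRegister:
    """Behavioral n-stage shift register with an optional feedback loop."""

    def __init__(self, n=5):
        self.stages = [0] * n

    def clock(self, serial_in=None):
        # One clock pulse moves every bit one stage toward higher indices.
        # With no external bit, the last stage is fed back to the first.
        incoming = self.stages[-1] if serial_in is None else serial_in
        self.stages = [incoming] + self.stages[:-1]


def mod5_add(x, y, n=5):
    """Return ((x + y) mod n, total clock pulses applied to the register)."""
    sr = CircularShiftRegister(n)
    pulses = 0
    # Loading phase: n clock pulses shift the word for x into the register.
    for bit in reversed(one_hot(x, n)):
        sr.clock(serial_in=bit)
        pulses += 1
    # Advance phase: the code converter supplies y extra clock pulses.
    for _ in range(code_converter_pulses(one_hot(y, n))):
        sr.clock()
        pulses += 1
    return sr.stages.index(1), pulses
```

For example, `mod5_add(4, 2)` returns `(1, 7)`: five pulses load |x|_5 = 4 and two further pulses advance it to 1. In this model the register always receives 5 + |y|_5 clock pulses per addition.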

LAYOUT AND TESTING

The mod5 adder was laid out using the Cadence Design Framework II graphical environment [12] calibrated for the HYPRES, Inc. standard Nb process [13]. A micrograph of the circuit is shown in Fig. 5. It was fabricated at HYPRES, Inc. with a target junction critical current density of 1 kA/cm². The mod5 adder employs 195 Josephson junctions, consumes 50 μW of power, and occupies an area of less than 2 mm².

Low-speed testing was performed using our automated thirty-nine-channel data acquisition setup, controlled by a PC running LabVIEW. Figure 4 shows the low-speed test results on the mod5 adder. All critical combinations of the inputs were successfully tested. Figure 4 shows only three combinations: |x|_5 = 4 (00001), and |y|_5 = 2 (00100), 3 (00010) and 4 (00001). The resulting sums are |x|_5 + |y|_5 = 1 (01000), 2 (00100) and 3 (00010), respectively. The upper traces represent external inputs and clock signals, which are coded using the return-to-zero (RZ) convention. Each edge triggers an SFQ pulse from a DC/SFQ converter. Outputs are captured by an SFQ/DC converter, where each transition corresponds to one SFQ pulse. The code converter output (Code Conv_Out) shows the correct sequence of pulses when data |x|_5 is applied, i.e. seven (5 + 2), eight (5 + 3) and nine (5 + 4) transitions. The experimental bias margins of the circuit and its subcells are shown in Table 1.

Fig. 4. The experimental results of the mod5 adder.

CONCLUSIONS

A new single-modulus adder, the mod5 adder, which is the main component of a one-decimal-digit RNS adder, was designed, laid out and evaluated at low speed. The mod5 adder consists of simple, elementary RSFQ cells and has very good parameter margins. The circuit presents the first successful implementation of residue number system arithmetic in single flux quantum superconducting technology.

Fig. 5. Micrograph of the mod5 adder.

Acknowledgements. The authors would like to thank Qing Ke for bringing the RNS concept to their attention. This work was supported in part by the University Research Initiative at the University of Rochester, sponsored by the Army Research Office under Grant No. DAAL03-92-G-0112.

REFERENCES

1. N. S. Szabó and R. I. Tanaka, Residue Arithmetic and Its Applications in Computer Technology. McGraw-Hill, New York (1967).
2. P. E. Beckmann and B. R. Musicus, IEEE Trans. Signal Proc. 41, 2300 (1993).
3. M. A. Soderstrand, W. K. Jenkins, G. A. Jullien and F. J. Taylor (eds.), Residue Number System Arithmetic: Modern Applications in Digital Signal Processing. IEEE Press, New York (1986).
4. M. A. Soderstrand and R. A. Escott, IEEE Trans. Circuits Syst. CAS-33, 5 (1986).
5. K. K. Likharev and V. K. Semenov, IEEE Trans. Appl. Superconduct. 1, 3 (1991).
6. Q. Ke and M. J. Feldman, IEEE Trans. Appl. Superconduct. 5, 2988 (1995).
7. K. Gaj, E. G. Friedman and M. J. Feldman, IEEE Trans. Appl. Superconduct. 5, 3320 (1995).
8. C. A. Mancini, N. Vukovic, A. M. Herr, K. Gaj, M. F. Bocko and M. J. Feldman, IEEE Trans. Appl. Superconduct. 7, 2832 (1997).
9. Q. P. Herr, K. Gaj, A. M. Herr, N. Vukovic, C. A. Mancini, M. F. Bocko and M. J. Feldman, IEEE Trans. Appl. Superconduct. 7, 2975 (1997).
10. Q. P. Herr and M. J. Feldman, IEEE Trans. Appl. Superconduct. 5, 3327 (1995).
11. S. R. Whiteley, IEEE Trans. Magn. 27, 2902 (1991).
12. Cadence Corporation, Cadence Openbook, San Jose, CA (1993).
13. HYPRES niobium process flow and design rules are available from HYPRES, Inc., 175 Clearbrook Road, Elmsford, NY 10523, http://www.hypres.