FPGA IMPLEMENTATION OF 32-BIT WAVE-PIPELINED SPARSE-TREE ADDER

Kasharaboina Thrisandhya *1, Latha Sahukar *2
1 Post graduate (M.Tech) in ATRI, JNTUH University, Telangana, India.
2 Associate Professor in ATRI, JNTUH University, Telangana, India.

ABSTRACT

This paper presents the architecture, design, and testing of a 32-bit asynchronous wave-pipelined sparse-tree adder implemented in superconductor rapid single-flux-quantum (RSFQ) logic. Compared with the Kogge-Stone adder, our parallel-prefix sparse-tree adder has better energy efficiency and significantly lower complexity, with almost no reduction in operation frequency. The 32-bit adder core has 9941 Josephson junctions and occupies an area of 8.5 mm². It is designed for a target operation frequency of 30 GHz, with an expected latency of 352 ps at a bias voltage of 2.5 mV. The adder chip was fabricated and tested successfully at low frequency for all test patterns, with measured bias margins of +9.8%/-10.7%.

Index terms: Adders, digital arithmetic, superconducting integrated circuits, superconducting logic circuits, sparse tree.

I. INTRODUCTION

The adder is a ubiquitous digital circuit, found in almost any application. It is the fundamental building block of Arithmetic Logic Units (ALUs) in general-purpose and special-purpose digital signal microprocessors. Currently, in the CMOS domain, the design space of adder structures has been nearly exhausted, with only minimal improvements shown over previous designs. In contrast, emerging digital circuit technologies such as superconducting Rapid Single Flux Quantum (RSFQ) logic open a way for researchers to explore new design methodologies for extremely fast, energy-efficient adders.

In RSFQ logic, most adder designs demonstrated to date are bit-serial or digit-serial architectures, which operate on a single bit or a small group of bits sequentially at a very high processing rate [1], [2]. Such designs allow for simple clocking and compact structures.
However, the latency of serial adders scales as O(n), where n is the number of bits per operand, which leads to long latencies for 32-/64-bit operations in general-purpose processors. In the past, parallel architectures in RSFQ have been limited to small data widths or relatively long-latency ripple-carry adders [3]. One study evaluated 32-/64-bit parallel Kogge-Stone RSFQ adders using co-flow clocking [4]. In an effort to realize scalable, high-performance, fully parallel designs, a new technique of asynchronous hybrid wave-pipelining for RSFQ circuits has been developed at Stony Brook University (SBU) [5], [6]. Later, as a result of the collaboration between the SBU and HYPRES designers, an 8-bit wave-pipelined ALU was successfully designed and fabricated, and it demonstrated correct operation at the rate of 20 GHz [7], [8].

In this paper, we present the design of the first 32-bit asynchronous parallel adder implemented in RSFQ logic. It builds upon the proven hybrid wave-pipelining techniques to provide 32-bit wide processing and synchronization. It incorporates an energy-efficient, low-complexity sparse-tree structure with a very high processing rate. The work is based on a design study for a scalable 32-bit wave-pipelined sparse-tree adder conducted at SBU.

II. 32-BIT SPARSE TREE RSFQ ADDER

A. Sparse-tree RSFQ Adder

High-performance parallel adders typically use prefix trees, which generate carries in $\log_2(n)$ time, where n is the number of bits of the data path. The Kogge-Stone adder (KSA) is considered to be the fastest among parallel-prefix adders. Further enhancements to the KSA prefix structure, such as the sparse-tree configuration, have been proposed and used in high-performance Intel processors. In our 32-bit RSFQ adder design, we chose the sparse-tree structure to reduce the number of wiring junctions needed for its implementation without any significant effect on its processing rate. As a side effect, this also leads to a more energy-efficient design by reducing the total bias current and power consumption.

Volume: 05 Issue: 30 l Sep -2014 www.ijeec.com Page 451

The adder consists of three stages: Initialization, Prefix-Tree, and Summation.

The Initialization stage receives two 32-bit data operands A and B and creates the bitwise Generate (G) and Propagate (P) signals, which are merged in a logarithmic manner in the Prefix-Tree stage. The Initialization stage consists of GPR_INIT logic blocks, one for each bit. Each GPR_INIT block creates the bitwise prefix functions $G_i = A_i \cdot B_i$ and $P_i = A_i \oplus B_i$, where i is the bit index ranging from 31 down to 0 in the 32-bit adder. These functions are easily realized with clocked AND and XOR gates in a co-flow clocking arrangement. The clock is the Rdy signal provided to all bits. Additionally, it is necessary to create the trailing reset signal R, which is used to reset the asynchronous elements in the Prefix-Tree. Signal R is a copy of the Rdy signal for each bit, with wj-based delay lines to ensure that data signals are processed before the reset follows in the asynchronous Prefix-Tree.

The Prefix-Tree stage consists of Carry-Merge (CM) blocks that merge the prefix signals and provide a group carry to each 4-bit summation block. In contrast, the Kogge-Stone prefix tree provides a carry to every individual bit of the adder. DFF (D flip-flop) buffers appropriately delay the prefix and bitwise P signals until they are ready to be merged or processed at the Summation stage, respectively. The first three levels of the Prefix-Tree also perform the ripple-carry addition within each 4-bit group before data arrive at the Summation stage. Merging of the prefix signals is described in [10].
It is implemented with CFFs (resettable Muller C-flip-flop gates based on the Muller C-element) and confluence buffers used as asynchronous OR gates, without any danger of violating the time separation requirement of their input pulses.

B. Parallel prefix adders

Parallel prefix adders are flexible and are used to speed up binary addition. They are derived from the carry-lookahead (CLA) structure and use a tree structure to increase the speed of the arithmetic operation. Parallel prefix adders are among the fastest adders and are used in high-performance arithmetic circuits in industry. The construction of a parallel prefix adder [10] involves three stages.

Pre-processing stage: In this stage we compute the generate and propagate signals for each pair of input bits of A and B. These signals are given by equations (1) and (2):

$P_i = A_i \oplus B_i$ ... (1)
$G_i = A_i \cdot B_i$ ... (2)

Carry generation network: In this stage we compute the carries corresponding to each bit. These operations are executed in parallel [9], with the carry computation segmented into smaller pieces. The stage uses group carry propagate and generate as intermediate signals, given by equations (3) and (4):

$CP_{i:j} = P_{i:k+1} \cdot P_{k:j}$ ... (3)
$CG_{i:j} = G_{i:k+1} + (P_{i:k+1} \cdot G_{k:j})$ ... (4)

Post-processing: This is the final step, which computes the sum bits. It is common to all adders; the carries and sum bits are computed by equations (5) and (6), where $C_{-1} = C_{in}$:

$C_i = CG_{i:0} + (CP_{i:0} \cdot C_{in})$ ... (5)
$S_i = P_i \oplus C_{i-1}$ ... (6)
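
The three stages of Section II-B can be sketched as a small software model. The Python below is only an illustrative behavioral sketch, not the Verilog used in this work: the function name `prefix_add` and its parameters are hypothetical, and the carry network is assumed to be a Kogge-Stone-style arrangement in which the merge span doubles at every level.

```python
def prefix_add(a, b, width=32, cin=0):
    """Behavioral model of a parallel prefix adder (hypothetical helper).

    Returns (sum modulo 2**width, carry_out).
    """
    # Pre-processing: bitwise propagate and generate, equations (1) and (2).
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]

    # Carry generation: merge group signals with equations (3) and (4).
    # The span doubles each level, so after the last level cp[i] and cg[i]
    # describe the whole group of bits i down to 0.
    cp, cg = p[:], g[:]
    span = 1
    while span < width:
        new_cp, new_cg = cp[:], cg[:]
        for i in range(span, width):
            new_cg[i] = cg[i] | (cp[i] & cg[i - span])
            new_cp[i] = cp[i] & cp[i - span]
        cp, cg = new_cp, new_cg
        span *= 2

    # Post-processing: carry out of every bit, then the sum bits.
    carry = [cg[i] | (cp[i] & cin) for i in range(width)]
    total = 0
    for i in range(width):
        carry_in = cin if i == 0 else carry[i - 1]
        total |= (p[i] ^ carry_in) << i
    return total, carry[width - 1]
```

For example, `prefix_add(0xFFFFFFFF, 1)` returns `(0, 1)`: every sum bit is cleared and the carry leaves the most significant bit. The model has one merge level per doubling of the span, which is five levels for 32 bits.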

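The sparse-tree organization of Section II-A, in which the prefix tree supplies one carry per 4-bit summation block instead of one carry per bit, can be modeled in the same way. The sketch below is a simplified behavioral model under stated assumptions: the names `sparse_tree_add` and `GROUP` are hypothetical, the ripple inside each group is done in the final loop rather than in the first three prefix levels, and none of the RSFQ timing elements (Rdy clocking, reset signal R, DFF buffers) are modeled.

```python
GROUP = 4  # group size taken from the 4-bit summation blocks in the text


def sparse_tree_add(a, b, width=32, cin=0):
    """Behavioral model of a sparse-tree adder (hypothetical helper).

    Returns (sum modulo 2**width, carry_out).
    """
    if width % GROUP:
        raise ValueError("width must be a multiple of the group size")

    # Initialization: bitwise propagate and generate for every bit.
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]

    # Collapse each 4-bit group into one (generate, propagate) pair.
    n_groups = width // GROUP
    gg, gp = [], []
    for k in range(n_groups):
        grp_g, grp_p = 0, 1
        for i in range(k * GROUP, (k + 1) * GROUP):
            grp_g = g[i] | (p[i] & grp_g)
            grp_p &= p[i]
        gg.append(grp_g)
        gp.append(grp_p)

    # Prefix tree over the groups only, so far fewer merge nodes are
    # needed than in a tree that produces a carry for every bit.
    span = 1
    while span < n_groups:
        new_gg, new_gp = gg[:], gp[:]
        for k in range(span, n_groups):
            new_gg[k] = gg[k] | (gp[k] & gg[k - span])
            new_gp[k] = gp[k] & gp[k - span]
        gg, gp = new_gg, new_gp
        span *= 2

    # One carry per group: the carry into group k leaves group k - 1.
    carry_out_of = [gg[k] | (gp[k] & cin) for k in range(n_groups)]
    group_cin = [cin] + carry_out_of[:-1]

    # Summation: ripple the group carry through the bits of each group.
    total = 0
    for k in range(n_groups):
        c = group_cin[k]
        for i in range(k * GROUP, (k + 1) * GROUP):
            total |= (p[i] ^ c) << i
            c = g[i] | (p[i] & c)
    return total, carry_out_of[-1]
```

With 32 bits and 4-bit groups, the group-level tree in this model merges only 8 nodes per level, which illustrates why the sparse tree needs fewer merge blocks and less wiring than a per-bit prefix tree.
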
Figure-1: Structural diagram of the 32-bit sparse-tree adder

C. Carry Look Ahead Adder

To reduce the computation time, engineers devised faster ways to add two binary numbers by using carry-lookahead adders. They work by creating two signals (P and G) for each bit position, based on whether a carry is propagated through from a less significant bit position (at least one input is a '1'), a carry is generated in that bit position (both inputs are '1'), or a carry is killed in that bit position (both inputs are '0'). In most cases, P is simply the sum output of a half-adder and G is the carry output of the same adder. After P and G are generated, the carries for every bit position are created. Some advanced carry-lookahead architectures are the Manchester carry chain, the Brent-Kung adder, and the Kogge-Stone adder.

Some other multi-bit adder architectures break the adder into blocks. It is possible to vary the length of these blocks based on the propagation delay of the circuits to optimize computation time. These block-based adders include the carry-bypass adder, which determines P and G values for each block rather than each bit, and the carry-select adder, which pre-generates sum and carry values for either possible carry input to the block.

A carry-lookahead adder (CLA) is a type of adder used in digital logic. It improves speed by reducing the amount of time required to determine carry bits. It can be contrasted with the simpler, but usually slower, ripple-carry adder, in which the carry bit is calculated alongside the sum bit and each bit must wait until the previous carry has been calculated before it can begin calculating its own result and carry bits. The carry-lookahead adder calculates one or more carry bits before the sum, which reduces the wait time to calculate the result of the higher-order bits. The Kogge-Stone adder and the Brent-Kung adder are examples of this type of adder.

III. SIMULATION RESULTS

The adders were designed in Verilog using Xilinx ISE Navigator, and all simulations were performed using the ModelSim 6.5e simulator. The performance of the proposed adders is analyzed and compared. For the proposed architecture, implementation code was developed for the modified 32-bit sparse-tree RSFQ adder and the carry-lookahead adder, and the corresponding delay and area values were observed. Table 1 shows the comparison of the adders. The simulated outputs of the proposed 32-bit adders are shown in Figure-2.

Figure-2: Simulation waveform for Sparse Tree Adder

Figure-3: RTL diagram

IV. CONCLUSION

We have designed, fabricated, and tested the first 32-bit wave-pipelined sparse-tree RSFQ adder chip, with a core complexity of 9941 JJs and a target operation rate of 30 GHz. We have successfully demonstrated the correct operation of the chip at low frequency, passing all carefully chosen test vectors with a measured bias margin of +9.8%/-10.7%. Another adder chip, consisting of 12785 junctions with additional on-chip circuits for 30 GHz testing, was also fabricated, but its testing showed the need for another fabrication run.

REFERENCES

[1] H. Park, Y. Yamanashi, N. Yoshikawa, M. Tanaka, and A. Fujimaki, "Design of fast digit-serial adders using SFQ logic circuits," IEICE Electronics Express, vol. 6, no. 19, pp. 1408-1413, 2009.
[2] S. V. Polonsky, V. K. Semenov, P. I. Bunyk, A. F. Kirichenko, A. Y. Kidiyarov-Shevchenko, O. A. Mukhanov, P. N. Shevchenko, D. F. Schneider, D. Y. Zinoviev, and K. K. Likharev, "New RSFQ circuits Josephson junction digital devices," IEEE Trans. Appl. Supercond., vol. 3, no. 1, pp. 2566-2577, Mar. 1993.
[3] J. Y. Kim, S. Kim, and J. Kang, "Construction of an RSFQ 4-bit ALU with half adder cells," IEEE Trans. Appl. Supercond., vol. 15, no. 2, pp. 308-311, Jun. 2005.
[4] P. Bunyk and P. Litskevitch, "Case study in RSFQ design: Fast pipelined parallel adder," IEEE Trans. Appl. Supercond., vol. 9, no. 2, pp. 3714-3720, Jun. 1999.
[5] M. Dorojevets, C. Ayala, and A.
Kasperek, "Development and evaluation of design techniques for high-performance wave-pipelined wide datapath RSFQ processors," in Proc. 12th Int. Supercond. Electron. Conf., Fukuoka, Japan, 2009, SP-P46.
[6] M. Dorojevets, C. L. Ayala, and A. K. Kasperek, "Data-flow microarchitecture for wide datapath RSFQ processors: Design study," IEEE Trans. Appl. Supercond., vol. 21, no. 3, pp. 787-791, Jun. 2011.
[4] H. Aboushady, Y. Dumonteix, M. M. Louerat, and H. Mehrez, "Efficient polyphase decomposition of comb decimation filters in Sigma-Delta analog-to-digital converters," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 48, no. 10, pp. 898-903, 2001.
[5] Gordana Jovanovic Dolecek and Sanjit K. Mitra, "A New Two-Stage CIC-Based Decimation Filter," Proceedings of the 5th International Symposium on Image and Signal Processing and Analysis, pp. 218-223, 2007.
[6] Gordana Jovanovic-Dolecek and Sanjit K. Mitra, "On Design of CIC Decimation Filter with Improved Response," IEEE 3rd International Symposium on Communications, Control and Signal Processing, pp. 1072-1076, 2008.
[7] T. Filippov, M. Dorojevets, A. Sahu, A. Kirichenko, C. Ayala, and O. Mukhanov, "8-bit asynchronous wave-pipelined RSFQ arithmetic-logic unit," IEEE Trans. Appl. Supercond., vol. 21, no. 3, pp. 847-851, Jun. 2011.
[8] T. V. Filippov, A. Sahu, A. F. Kirichenko, I. V. Vernik, M. Dorojevets, C. L. Ayala, and O. A. Mukhanov, "20 GHz operation of an asynchronous wave-pipelined RSFQ arithmetic-logic unit," Phys. Proc., vol. 36, pp. 59-65, 2012.
[9] P. M. Kogge and H. S. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations," IEEE Trans. Comput., vol. C-22, no. 8, pp. 786-793, Aug. 1973.
[10] D. Harris, "A taxonomy of parallel prefix networks," in Signals, Systems and Computers, 2003. Conference Record of Thirty Seventh Asilomar Conference on, vol. 2, Nov. 2003, pp. 2217.

Kasharaboina Thrisandhya received the B.Tech degree in ECE from Vanjari Seethaiah Memorial Engineering College in 2012 and is pursuing the M.Tech (2012-2014) in VLSI at Aurora's Technological and Research Institute (affiliated to JNTUH), Hyderabad. Her area of interest is VLSI Design.

Latha Sahukar is presently working as an Associate Professor at ATRI, Hyderabad. Her areas of interest are VLSI Design and Communication Systems.