Design and Comparative Analysis of Conventional Adders and Parallel Prefix Adders K. Madhavi 1, Kuppam N Chandrasekar 2


1 M.Tech Scholar, GVIC, Madhanapally, A.P., India
2 Assistant Professor, Dept. of ECE, GVIC, Madhanapally, A.P., India

Abstract: The binary adder is the critical element in most digital circuit designs, including digital signal processors (DSPs) and microprocessor datapath units. In VLSI implementations, parallel-prefix adders are known to have the best performance. This paper investigates three types of parallel-prefix adders (PPAs): the Kogge-Stone Adder (KSA), the Spanning Tree Adder (STA) and the Sparse Kogge-Stone Adder (SKA). The Ripple Carry Adder (RCA), Carry Look-Ahead Adder (CLA), Carry Select Adder (CSLA) and Carry Skip Adder (CSA) are also investigated. The adders are implemented in Verilog Hardware Description Language (HDL); synthesis is carried out using the Xilinx Integrated Software Environment (ISE) Design Suite and simulation using ModelSim.

Keywords: ripple carry adder; carry select adder; carry skip adder; parallel-prefix adders; black cell; gray cell; Xilinx; ModelSim.

I. INTRODUCTION

Binary adders are among the most essential logic elements in a digital system. Besides Arithmetic Logic Units (ALUs), binary adders are also used in other units such as multipliers, dividers and memory addressing. Binary addition is therefore so essential that any improvement in it can boost the performance of a computing system as a whole. Parallel-prefix adders offer better performance than conventional adders; their delays are discussed in [1]. In this paper, the PPAs mentioned above, together with the RCA, CSLA and CSA, are implemented and characterized on a Xilinx Spartan-3E FPGA. Finally, the delay, power and area of the designed adders are presented and compared.

II. DRAWBACKS OF CONVENTIONAL ADDERS

The major problem in binary addition is the carry chain. As the width of the input operands increases, the length of the carry chain increases.
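To make the carry-chain argument concrete, the following is a minimal Python sketch (a behavioral model, not the authors' Verilog design) of an n-bit ripple-carry adder evaluated one full-adder stage at a time. The function name `ripple_carry_add` and the `longest_chain` measure, defined here as the largest number of consecutive stages that a single carry passes through, are illustrative assumptions.

```python
def ripple_carry_add(a, b, n=8, cin=0):
    """Model an n-bit ripple-carry adder one full-adder stage at a time, LSB first.

    Returns (sum, carry_out, longest_chain), where longest_chain is the largest
    number of consecutive stages that a single carry passes through.
    """
    carry, total, run, longest = cin, 0, 0, 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, p = ai & bi, ai ^ bi
        total |= (p ^ carry) << i        # sum bit of full adder i
        if g:                            # stage i generates a new carry
            run = 1
        elif p and carry:                # the incoming carry ripples through stage i
            run += 1
        else:                            # carry killed, or no carry present
            run = 0
        longest = max(longest, run)
        carry = g | (p & carry)          # carry into stage i + 1
    return total, carry, longest


# Worst case for 8 bits: a carry generated at the LSB ripples all the way to the MSB.
print(ripple_carry_add(0b11111111, 0b00000001))  # prints (0, 1, 8)
```

Because each stage must wait for the carry of the previous one, the worst-case delay of this structure grows linearly with n, which is the behavior the following discussion sets out to avoid.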
Figure 1 demonstrates an example of an 8-bit binary add operation and how the carry chain is affected. The example shows that the worst case occurs when the carry travels the longest possible path, from the least significant bit (LSB) to the most significant bit (MSB). The performance of carry-propagate adders can be improved by accelerating the carry chain, but the chain cannot be eliminated. Consequently, digital designers often resort to building faster adders when optimizing computer architecture, because adders tend to set the critical path for most computations.

Fig.1: Binary Adder Example.

To reduce the delay of the RCA, i.e., to compute the carries in advance, the carry look-ahead adder (CLA) is used. It works with two signals, called propagate and generate, which are given by

$$P_i = A_i \oplus B_i \tag{1}$$
$$G_i = A_i B_i \tag{2}$$

and the carries then follow as

$$C_1 = G_0 + P_0 C_0 \tag{3}$$
$$C_2 = G_1 + P_1 G_0 + P_1 P_0 C_0 \tag{4}$$
$$C_3 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0 \tag{5}$$
$$C_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0 \tag{6}$$

Equations (3) to (6) show that the complexity grows with the bit width: each further carry needs one more product term, and its widest term needs a gate with one more input. Designing a wide CLA therefore becomes complex. In practice, logic gates have bounded rather than unbounded fan-in, so the carries of a wide CLA cannot all be computed in a single level of logic. To compute the carries in advance while limiting both delay and complexity, the parallel-prefix approach is used.

III. DIFFERENCE BETWEEN PARALLEL-PREFIX ADDERS AND OTHERS

Parallel-prefix adders, also known as tree adders, pre-compute the propagate and generate signals. These signals are combined in various ways using the fundamental carry operator (fco):

$$(g_L, p_L) \circ (g_R, p_R) = (g_L + p_L\, g_R,\; p_L\, p_R) \tag{7}$$

Because the fco is associative, these operators can be combined in different ways to form various structures. For example, the four-bit carry-lookahead generator is given by

$$c_4 = (g_4, p_4) \circ \big[(g_3, p_3) \circ [(g_2, p_2) \circ (g_1, p_1)]\big] \tag{8}$$

A simple rearrangement of the order of operations allows parallel operation, resulting in a more efficient tree structure for this four-bit example:

$$c_4 = [(g_4, p_4) \circ (g_3, p_3)] \circ [(g_2, p_2) \circ (g_1, p_1)] \tag{9}$$

Comparing Equations (8) and (9), the carry-lookahead form takes 3 steps to generate the carry, whereas the 4-bit PPA form takes only 2 steps.

ISSN: 2231-5381 http://www.ijettjournal.org Page 435

IV. PARALLEL-PREFIX ADDER STRUCTURE

To resolve the delay of carry-lookahead adders, multilevel-lookahead adders, i.e., parallel-prefix adders, can be employed. The idea is to compute small groups of intermediate prefixes and then larger group prefixes, until all the carry bits are computed. These adders have a tree structure within a carry-computing stage, similar to the carry-propagate adder. In addition, they have two other stages, called the pre-computation and post-computation stages.

In the pre-computation stage, each bit computes its generate/propagate signals and a temporary sum. In the prefix stage, the group generate/propagate signals are computed to form the carry chain and provide the carry-in for the adder below:

$$G_{i:k} = G_{i:j} + P_{i:j} \cdot G_{j-1:k}$$
$$P_{i:k} = P_{i:j} \cdot P_{j-1:k}$$

In the post-computation stage, the sum and carry-out are finally produced. The carry-out can be omitted if only a sum needs to be produced:

$$s_i = t_i \oplus G_{i-1:-1}$$
$$c_{out} = g_{n-1} + p_{n-1} \cdot G_{n-2:-1}$$

where $G_{i-1:-1} = c_i$, with the assumption $g_{-1} = c_{in}$. The general diagram of parallel-prefix structures is shown in Figure 2, where an 8-bit case is illustrated. All parallel-prefix structures can be implemented with the equations above; however, the group generate/propagate equation can be interpreted in various ways, which leads to different types of parallel-prefix trees.
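The following Python sketch checks the lookahead equations (1) to (6) and the two fco groupings (8) and (9) against ordinary integer addition for every pair of 4-bit operands. It assumes bit 0 is the LSB, so the 1-based pairs (g1, p1) to (g4, p4) of Equations (8) and (9) are list indices 0 to 3, and it takes C0 = 0 when comparing with (8) and (9), which have no carry-in term. The function names are illustrative.

```python
from itertools import product


def fco(left, right):
    """Fundamental carry operator, Eq. (7): (gL, pL) o (gR, pR) = (gL + pL*gR, pL*pR)."""
    gl, pl = left
    gr, pr = right
    return gl | (pl & gr), pl & pr


def cla_carries(a, b, c0=0):
    """Carries C1..C4 of a 4-bit CLA, written out as in Eqs. (3)-(6)."""
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    P = [x ^ y for x, y in zip(A, B)]  # Eq. (1)
    G = [x & y for x, y in zip(A, B)]  # Eq. (2)
    c1 = G[0] | (P[0] & c0)
    c2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    c3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    c4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & c0))
    return [c1, c2, c3, c4]


def c4_serial(gp):
    """Eq. (8): three fco steps in series. gp[k] holds (g_{k+1}, p_{k+1})."""
    return fco(gp[3], fco(gp[2], fco(gp[1], gp[0])))[0]


def c4_tree(gp):
    """Eq. (9): two independent fco steps, then one more (depth 2)."""
    return fco(fco(gp[3], gp[2]), fco(gp[1], gp[0]))[0]


# Exhaustive check over all 4-bit operand pairs.
for a, b in product(range(16), repeat=2):
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(4)]
    expected_c4 = (a + b) >> 4
    assert c4_serial(gp) == c4_tree(gp) == cla_carries(a, b)[3] == expected_c4
```

Both groupings give the same carry because the fco is associative; the tree form simply lets the two inner operations run at the same time.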
Several design factors can impact the performance of prefix structures: radix/valency, logic levels, fan-out and wire tracks.

Fig.2: 8-bit Parallel-Prefix Structure with carry save notation.

BUILDING PREFIX STRUCTURES

Parallel-prefix structures are common in high-performance adders because their delay is logarithmic in the adder width. Such structures can usually be divided into three stages: pre-computation, prefix tree and post-computation. In the prefix tree, group generate/propagate signals are the only signals used. The group generate/propagate equations are based on the single-bit generate/propagate signals, which are computed in the pre-computation stage:

$$g_i = a_i \cdot b_i, \qquad p_i = a_i \oplus b_i, \qquad 0 \le i < n$$

with $g_{-1} = c_{in}$ and $p_{-1} = 0$. Sometimes, $p_i$ can be computed with OR logic instead of an XOR gate; OR logic is mandatory when Ling's scheme is applied. Here, XOR logic is used so that $p_i$ also serves as the temporary sum $t_i$, saving a gate.

In the prefix tree, group generate/propagate signals are computed at each bit:

$$G_{i:k} = G_{i:j} + P_{i:j} \cdot G_{j-1:k}$$
$$P_{i:k} = P_{i:j} \cdot P_{j-1:k}$$

More compactly, these equations can be expressed using the operator $\circ$ introduced by Brent and Kung [3], whose function is exactly that of a black cell:

$$(G_{i:k}, P_{i:k}) = (G_{i:j}, P_{i:j}) \circ (G_{j-1:k}, P_{j-1:k})$$

or, equivalently,

$$(G_{i:k}, P_{i:k}) = (g_i, p_i) \circ (g_{i-1}, p_{i-1}) \circ \cdots \circ (g_k, p_k)$$

so that $P_{i:k} = p_i \cdot p_{i-1} \cdots p_k$.

Fig.3: Cell Definitions.

The $\circ$ operation provides the rules for building prefix structures. In the post-computation stage, the sum and carry-out are the final outputs:

$$s_i = p_i \oplus G_{i-1:-1}, \qquad c_{out} = G_{n-1:-1}$$

where position $-1$ is the carry input. The generate/propagate signals can be grouped in different fashions to obtain the same correct carries, and each grouping gives a different prefix architecture. Figure 3 shows the definitions of the cells used in prefix structures, namely the black cell and the gray cell. Black/gray cells implement the two group equations above, which are used heavily in the following discussion of prefix trees.

PREFIX TREE FAMILY

Parallel-prefix trees have various architectures, which can be distinguished by four major factors: 1) radix/valency, 2) logic levels, 3) fan-out and 4) wire tracks. In the following discussion of prefix trees, the radix is assumed to be 2 (i.e., every logic gate has two inputs). The more aggressive prefix schemes have $\lceil \log_2 n \rceil$ logic levels, where n is the width of the inputs. However, these schemes require higher fan-out, many wire tracks or dense logic gates, which compromises performance (e.g., speed or power). Other schemes relieve fan-out and wire tracks at the cost of more logic levels. When the radix is fixed, the design trade-off is made among logic levels, fan-out and wire tracks.

A key advantage of the tree-structured adder is that the critical path due to the carry delay is on the order of $\log_2 N$ for an N-bit-wide adder. The arrangement of the prefix network gives rise to various families of adders. This study focuses on the Kogge-Stone adder, known for having minimal logic depth and fan-out (see Figure 4). Here, BC designates the black cell, which generates the ordered pair in Equation (7); the gray cell (GC) generates only the left (generate) signal.
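The following Python sketch models a radix-2 Kogge-Stone adder with the three stages described above: pre-computation of g_i and p_i, a prefix tree of black and gray cells whose span doubles at every level, and post-computation s_i = p_i XOR c_i. It is a behavioral model, not the authors' Verilog. Folding the carry-in into bit 0 (equivalent to applying one cell to the position -1 pair (c_in, 0)) and returning level and cell counts are illustrative choices of this sketch.

```python
def black_cell(left, right):
    """(G, P) o (G', P') = (G + P*G', P*P'), the operator of Eq. (7)."""
    gl, pl = left
    gr, pr = right
    return gl | (pl & gr), pl & pr


def gray_cell(left, right):
    """Generate-only half of a black cell, used once a group reaches position -1."""
    gl, pl = left
    gr, _ = right
    return gl | (pl & gr)


def kogge_stone_add(a, b, n=16, cin=0):
    """Return (sum, cout, levels, black_cells, gray_cells) for an n-bit Kogge-Stone adder."""
    # Pre-computation: single-bit generate and propagate (XOR, so p_i doubles as t_i).
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    # Fold position -1 (g_-1 = cin, p_-1 = 0) into bit 0, so G[0] = G_{0:-1}.
    G, P = g[:], p[:]
    G[0], P[0] = black_cell((g[0], p[0]), (cin, 0))

    # Prefix tree: at the level with span `dist`, node i combines with node i - dist.
    levels = black = gray = 0
    dist = 1
    while dist < n:
        new_G, new_P = G[:], P[:]
        for i in range(dist, n):
            if i < 2 * dist:
                # The result reaches position -1 (a final carry): only G is needed.
                new_G[i] = gray_cell((G[i], P[i]), (G[i - dist], P[i - dist]))
                gray += 1
            else:
                new_G[i], new_P[i] = black_cell((G[i], P[i]), (G[i - dist], P[i - dist]))
                black += 1
        G, P = new_G, new_P
        dist *= 2
        levels += 1

    # Post-computation: c_i = G_{i-1:-1} and s_i = p_i XOR c_i.
    carries = [cin] + G
    total = 0
    for i in range(n):
        total |= (p[i] ^ carries[i]) << i
    return total, carries[n], levels, black, gray


print(kogge_stone_add(0xFFFF, 0x0001))  # prints (0, 1, 4, 34, 15)
```

For n = 16 the sketch uses log2 16 = 4 levels and 49 cells (34 black, 15 gray). Every node is updated at every level where it still has a partner, which is the source of the regular but wiring-heavy structure discussed next.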
The interconnect area is known to be high, but for an FPGA, which has a large routing overhead to begin with, this is not as important as in a VLSI implementation. The regularity of the Kogge-Stone prefix network provides built-in redundancy, which has implications for fault-tolerant designs.

Fig.4: 16-bit Kogge-Stone Adder.

The sparse Kogge-Stone adder, shown in Figure 5, is also studied. This hybrid design completes the summation process with a 4-bit RCA, allowing the prefix network to be simplified.

Fig.5: 16-bit Sparse Kogge-Stone Adder.

Another carry tree, known as the spanning tree carry-lookahead (CLA) adder, is also examined [7]. Like the sparse Kogge-Stone adder, this design terminates with a 4-bit RCA. As the FPGA uses a fast carry chain for the RCA, it is interesting to compare the performance of this adder with the sparse Kogge-Stone and regular Kogge-Stone adders. Also of interest for the spanning-tree CLA are its testability features [6].
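A behavioral Python sketch of the sparse Kogge-Stone idea follows, assuming a sparsity of 4: the prefix network produces only the carries that enter each 4-bit block (c4, c8, c12 and the carry-out), and each 4-bit block then ripples its own sum bits. For readability the block-level group signals are formed serially here; the sketch reproduces the function of the hybrid, not the exact cell layout of Figure 5, and the `block` parameter and function names are illustrative assumptions. The spanning-tree adder [7] uses a different carry tree, which this sketch does not attempt to model.

```python
def fco(left, right):
    """Fundamental carry operator of Eq. (7)."""
    gl, pl = left
    gr, pr = right
    return gl | (pl & gr), pl & pr


def sparse_ks_add(a, b, n=16, block=4, cin=0):
    """Return (sum, cout) for a sparse Kogge-Stone model with `block`-bit ripple blocks.

    Assumes n is a multiple of block. Only the carries entering each block come
    from the prefix network; the bits inside a block ripple.
    """
    nblocks = n // block
    # Pre-computation: per-bit (g, p).
    gp = [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]

    # Group (G, P) of each block, combined from its LSB upward.
    blk = []
    for k in range(nblocks):
        acc = gp[k * block]
        for i in range(k * block + 1, (k + 1) * block):
            acc = fco(gp[i], acc)
        blk.append(acc)
    blk[0] = fco(blk[0], (cin, 0))  # fold in position -1: g_-1 = cin, p_-1 = 0

    # Sparse prefix tree: Kogge-Stone over the block-level pairs only.
    dist = 1
    while dist < nblocks:
        blk = [blk[k] if k < dist else fco(blk[k], blk[k - dist]) for k in range(nblocks)]
        dist *= 2
    block_cin = [cin] + [G for G, _ in blk]  # carry entering each block

    # Post-computation: each block is a small ripple-carry adder.
    total = 0
    for k in range(nblocks):
        c = block_cin[k]
        for i in range(k * block, (k + 1) * block):
            g, p = gp[i]
            total |= (p ^ c) << i
            c = g | (p & c)
    return total, block_cin[nblocks]


print(sparse_ks_add(0xFFFF, 0x0001))  # prints (0, 1)
```

With four blocks the prefix tree needs only two levels instead of four, which is the simplification the hybrid trades against the short ripple inside each block.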

Fig.6: 16-bit Spanning Tree Adder.

These adders are implemented in Verilog HDL in the Xilinx ISE Design Suite, verified on a Xilinx Spartan-3E FPGA using the ChipScope analyzer, and simulated using ModelSim. The corresponding simulation results are shown in the following figures.

Fig.7: Ripple-Carry Adder
Fig.8: Carry-Select Adder
Fig.9: Carry-Skip Adder
Fig.10: Kogge-Stone Adder
Fig.11: Sparse Kogge-Stone Adder
Fig.12: Spanning Tree Adder

V. DISCUSSION OF RESULTS

The delays of the designs, taken from the Xilinx ISE synthesis reports, are shown in Figure 13.

Fig.13: Simulation results for the designs

The area of the designs, measured in terms of look-up tables (LUTs) and input/output blocks (IOBs) used on the Xilinx Spartan-3E FPGA, is plotted in Figure 14.

Fig.14: LUTs and IOBs observed in Xilinx ISE

In Table I, the second column gives the names of the designed 16-bit adders. The third column gives the delay observed in the Xilinx ISE tool. The fourth and fifth columns give the area in terms of the LUTs and IOBs observed in the Xilinx ISE tool. The sixth column gives the power in milliwatts (mW) observed in the XPower Analyzer of the Xilinx tool; all the adders have nearly the same power.

TABLE I. Comparison of delay, LUTs, IOBs and power for the 16-bit adders

Sl. No.  Adder name (16-bit)   Delay (ns)   LUTs (out of 9312)   IOBs (out of 232)   Power (mW)
1        Ripple carry          24.686       32                   50                  80.98
2        Carry select          17.501       41                   50                  80.98
3        Carry skip            24.841       40                   50                  80.98
4        Kogge-Stone           14.041       94                   50                  80.98
5        Sparse Kogge-Stone    17.527       57                   65                  80.98
6        Spanning tree         21.72        39                   65                  80.98

Of all the adders, the RCA takes the least area, using the fewest LUTs. Of the three PPAs, the KSA has the least delay but uses the most LUTs, while the STA uses the fewest LUTs. Of all the adders listed, the CSA (carry skip adder) has the largest delay. The power of all the adders is approximately 80.98 mW.

VI. CONCLUSION

From the analysis of delay, area and power, the delay of the KSA (14.041 ns) is 56.87% of the delay of the RCA (24.686 ns); that is, the KSA reduces the delay by about 43% compared with the RCA. The implementations developed in this paper help to improve the design of parallel-prefix adders and their associated computing architectures. This has the potential to impact many application-specific and general-purpose computer architectures, and consequently the design of many computing systems, as well as many areas of engineering and science.
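The percentages and rankings quoted in the discussion and conclusion above can be re-derived from Table I with a few lines of Python. The dictionary below only transcribes Table I (delay in ns, LUTs, IOBs, power in mW); nothing is measured here, and the variable names are illustrative.

```python
# Table I, transcribed: name -> (delay_ns, luts, iobs, power_mw)
TABLE_I = {
    "Ripple carry": (24.686, 32, 50, 80.98),
    "Carry select": (17.501, 41, 50, 80.98),
    "Carry skip": (24.841, 40, 50, 80.98),
    "Kogge-Stone": (14.041, 94, 50, 80.98),
    "Sparse Kogge-Stone": (17.527, 57, 65, 80.98),
    "Spanning tree": (21.72, 39, 65, 80.98),
}
PPAS = ("Kogge-Stone", "Sparse Kogge-Stone", "Spanning tree")


def delay_ratio(fast, slow):
    """Delay of adder `fast` as a fraction of the delay of adder `slow`."""
    return TABLE_I[fast][0] / TABLE_I[slow][0]


ratio = delay_ratio("Kogge-Stone", "Ripple carry")
print(f"KSA delay = {ratio:.2%} of RCA delay; reduction = {1 - ratio:.2%}")

# Rankings used in the discussion of Table I.
fastest = min(TABLE_I, key=lambda name: TABLE_I[name][0])
slowest = max(TABLE_I, key=lambda name: TABLE_I[name][0])
fewest_luts = min(TABLE_I, key=lambda name: TABLE_I[name][1])
most_luts = max(TABLE_I, key=lambda name: TABLE_I[name][1])
fewest_luts_ppa = min(PPAS, key=lambda name: TABLE_I[name][1])
```

The ratio evaluates to about 56.88% (the 56.87% in the conclusion is the same value truncated), so the delay reduction of the KSA relative to the RCA is about 43.12%.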
REFERENCES

[1] D. H. K. Hoe, C. Martinez and S. J. Vundavalli, "Design and Characterization of Parallel Prefix Adders using FPGAs," in Proc. 2011 IEEE 43rd Southeastern Symposium on System Theory, pp. 168-172, 2011.
[2] N. H. E. Weste and D. Harris, CMOS VLSI Design, 4th ed., Pearson Addison-Wesley, 2011.
[3] R. P. Brent and H. T. Kung, "A Regular Layout for Parallel Adders," IEEE Trans. Comput., vol. C-31, pp. 260-264, 1982.
[4] D. Harris, "A Taxonomy of Parallel Prefix Networks," in Proc. 37th Asilomar Conf. Signals, Systems and Computers, pp. 2213-2217, 2003.
[5] N. H. E. Weste and D. Harris, CMOS VLSI Design, 4th ed., Pearson Addison-Wesley, 2011.
[6] D. Gizopoulos, M. Psarakis, A. Paschalis and Y. Zorian, "Easily Testable Cellular Carry Lookahead Adders," Journal of Electronic Testing: Theory and Applications, vol. 19, pp. 285-298, 2003.
[7] T. Lynch and E. E. Swartzlander, "A Spanning Tree Carry Lookahead Adder," IEEE Trans. Comput., vol. 41, no. 8, pp. 931-939, Aug. 1992.
[8] K. Vitoroulis and A. J. Al-Khalili, "Performance of Parallel Prefix Adders Implemented with FPGA Technology," IEEE Northeast Workshop on Circuits and Systems, pp. 498-501, Aug. 2007.
[9] S. Xing and W. W. H. Yu, "FPGA Adders: Performance Evaluation and Optimal Design," IEEE Design & Test of Computers, vol. 15, no. 1, pp. 24-29, Jan. 1998.
[10] A. Beaumont-Smith and C.-C. Lim, "Parallel Prefix Adder Design," in Proc. 15th IEEE Symposium on Computer Arithmetic, pp. 218-225, 2001.
[11] K. Vitoroulis and A. J. Al-Khalili, "Performance of Parallel Prefix Adders Implemented with FPGA Technology," IEEE Northeast Workshop on Circuits and Systems, pp. 498-501, Aug. 2007.
[12] A. Shrivastava and C. Sahu, "Performance Analysis of Parallel Prefix Adder Based on FPGA," IJETT, vol. 21, no. 6, March 2015.