ISSN Vol.07,Issue.08, July-2015, Pages:


ISSN 2348-2370 Vol.07, Issue.08, July-2015, Pages: 1397-1402
www.ijatir.org

Implementation of 64-Bit Modified Wallace MAC Based On Multi-Operand Adders

MIDDE SHEKAR 1, M. SWETHA 2
1 PG Scholar, Siddartha Institute of Technology and Sciences, Hyderabad, TS, India.
2 Assistant Professor, Siddartha Institute of Technology and Sciences, Hyderabad, TS, India.

Abstract: Although redundant addition is widely used to design parallel multi-operand adders for ASIC implementations, the use of redundant adders on Field Programmable Gate Arrays (FPGAs) has generally been avoided. The main reasons are the efficient implementation of carry propagate adders (CPAs) on these devices (due to their specialized carry-chain resources) and the area overhead of redundant adders when they are implemented on FPGAs. This project presents different approaches to the efficient implementation of generic carry-save compressor trees. In computing, and especially in digital signal processing, the multiply-accumulate operation is a common step that computes the product of two numbers and adds that product to an accumulator. The hardware unit that performs the operation is known as a multiplier-accumulator (MAC, or MAC unit); the operation itself is also often called a MAC or a MAC operation. After speed, power dissipation is one of the most important design objectives in integrated circuits. The main building block of digital signal processing (DSP) circuits is the Multiplier-Accumulator (MAC) unit, so a high-speed, low-power MAC unit is desirable in any DSP processor: speed and throughput are always primary concerns of DSP systems. A MAC unit consists of a multiplier, an adder, and an accumulator. In this work, the MAC operation is performed in two parts: a Partial Product Generation (PPG) circuit and a Multi-Operand Addition (MOA) circuit.
The proposed scheme uses a modified Wallace tree multiplier, which reduces hardware complexity. Because the proposed system requires fewer resources, its power consumption is also reduced. In this project, a new MAC is designed based on a modified Wallace tree multiplier together with a multi-operand adder.

Keywords: MAC, Modified Wallace Tree Multiplier, CPAs, Xilinx ISE, Verilog.

I. INTRODUCTION
Redundant adders have generally been avoided on FPGAs. The main reasons are the efficient implementation of carry propagate adders (CPAs) on these devices (due to their specialized carry-chain resources) and the area overhead of redundant adders when they are implemented on FPGAs. Field Programmable Gate Arrays (FPGAs) are semiconductor devices built around a matrix of configurable logic blocks (CLBs) connected via programmable interconnects. FPGAs can be reprogrammed after manufacturing to meet the desired application or functionality requirements. This feature distinguishes FPGAs from Application Specific Integrated Circuits (ASICs), which are custom manufactured for specific design tasks. The FPGA configuration is generally specified using a hardware description language (HDL), as is done for an ASIC. This paper presents different approaches to the efficient implementation of generic carry-save compressor trees on FPGAs. These designs have a fast critical path that is independent of bit width, with practically no area overhead compared with CPA trees. Along with the classic carry-save compressor tree, we present a novel linear array structure that makes efficient use of the fast carry-chain resources. This approach is described in parameterizable HDL code based on CPAs, which makes it compatible with any FPGA family or vendor. A detailed study is provided for a wide range of bit widths and large numbers of operands. Compared with binary and ternary CPA trees, the approach achieves speed-ups for 16-bit width.

II. COMPRESSORS
A multiplier is one of the key hardware blocks in most digital and high-performance systems, such as FIR filters, digital signal processors, and microprocessors. With advances in technology, many researchers have tried to design multipliers that offer high speed, low power consumption, small area, or a combination of these, making them suitable for various high-speed, low-power, and compact VLSI implementations. However, area and speed are two conflicting constraints, so improving speed generally results in a larger area. The most efficient multiplier structure depends on the throughput requirement of the application, and the first step of the design process is the selection of the optimum circuit structure. The combined requirements of low power, low transistor count, and minimum delay make the 5:2 and 4:2 compressors the appropriate choice. In these compressors, the outputs generated at each stage are used efficiently by replacing the XOR blocks with multiplexer blocks. The select bits of the multiplexers are available well ahead of the inputs, which minimizes the critical path delay. The various adder structures in the conventional architecture are replaced by compressors: a single 4:2 compressor replaces two full adders. Two cascaded full adders introduce a delay of four XOR-gate delays, whereas a 4:2 compressor reduces the latency to three, as shown in Fig.1.

Fig.1. 4:2 Compressor.

Copyright @ 2015 IJATIR. All rights reserved.

III. WALLACE TREE MULTIPLIER
A Wallace tree multiplier is an efficient hardware implementation of a digital circuit that multiplies two integers, devised by the Australian computer scientist Chris Wallace. The Wallace tree reduces the number of partial products and uses a carry select adder for the final addition of the partial products. The Wallace tree has three steps:
1. Multiply (AND) each bit of the multiplier by each bit of the multiplicand. Depending on the position of the multiplier bits, the generated partial products have different weights.
2. Reduce the number of partial products to two by using layers of full and half adders.
3. Add the two resulting rows (sum and carry) with a conventional adder.
In the second step, layers are added as long as there are three or more bits with the same weight. Any three bits with the same weight are input to a full adder, which produces an output bit of the same weight (the sum) and an output bit of higher weight (the carry). If two bits of the same weight are left, they are input to a half adder. If just one bit is left, it is connected to the next layer.

The advantage of the Wallace tree is that there are only $O(\log n)$ reduction layers and each layer has $O(1)$ propagation delay. Since generating the partial products takes $O(1)$ time and the final addition takes $O(\log n)$ time, the whole multiplication takes only $O(\log n)$ time, not much slower than an addition (although it is much more expensive in gate count). Adding the partial products with regular adders would require $O(\log^2 n)$ time.

IV. MODIFIED WALLACE TREE MULTIPLIER
A modified Wallace multiplier is an efficient hardware implementation of a digital circuit that multiplies two integers. Conventional Wallace multipliers use many full adders and half adders in their reduction phase. Half adders do not reduce the number of partial product bits, so minimizing the number of half adders used in the reduction reduces the complexity. Hence, the Wallace reduction is modified in a way that keeps the delay the same as that of the conventional Wallace reduction. The modified reduction method greatly reduces the number of half adders at the cost of a very slight increase in the number of full adders.

The reduced complexity Wallace multiplier reduction consists of three stages. In the first stage, the N x N product matrix is formed and, before it is passed on to the second phase, rearranged into the shape of an inverted pyramid. During the second phase, the rearranged product matrix is grouped into non-overlapping groups of three, as shown in Fig.2: each group of three bits is given to a full adder, while single bits and groups of two bits are passed on to the next stage. The number of rows in each stage of the reduction phase is calculated by the formula

$$r_{j+1} = 2\left\lfloor \frac{r_j}{3} \right\rfloor + (r_j \bmod 3),$$

where $r_j$ is the number of rows at stage $j$ and $r_0 = N$.

Fig.2. 8-bit x 8-bit Wallace tree multiplier.

In Fig.2, blue circles represent full adders and red circles represent half adders.

A half adder is used only when the number of rows calculated from the above equation for a stage of the second phase does not match the number of rows actually formed in that stage. The output of the second stage has a height of two bits and is passed on to the third stage. During the third stage, the output of the second stage is given to a carry propagate adder to generate the final output. A 64-bit modified Wallace multiplier is constructed in this way, and its second phase has 10 stages in total. The number of rows in each of the 10 stages was calculated from the equation, and the use of half adders was restricted to the 10th stage only.
The total number of half adders used in the second phase is 8, and the total number of full adders used in the second phase is slightly higher than in the conventional Wallace multiplier, as shown in Fig.3. Since the 64-bit modified Wallace multiplier is difficult to draw, a typical 10-bit by 10-bit reduction is shown in Fig.4 for illustration. The modified Wallace tree shows better performance when a carry save adder is used in the final stage instead of a ripple carry adder. The carry save adder is considered the critical part of the multiplier because it is responsible for the largest amount of computation.
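The stage count quoted for the 64-bit design can be checked with a short script. The sketch below assumes the standard Wallace row-count rule, $r_{j+1} = 2\lfloor r_j/3 \rfloor + (r_j \bmod 3)$: every complete group of three rows becomes two rows and leftover rows pass through. The function names `wallace_stage_rows` and `wallace_stages` are hypothetical and are not part of the authors' Verilog design.

```python
def wallace_stage_rows(n_rows: int) -> list[int]:
    """Return the row count before and after each Wallace reduction stage.

    Each complete group of three rows is compressed by full adders into two
    rows (sum and carry); the r mod 3 leftover rows pass straight through:
        r_next = 2 * (r // 3) + r % 3
    Reduction stops once only two rows remain for the final adder.
    """
    if n_rows < 1:
        raise ValueError("n_rows must be at least 1")
    rows = [n_rows]
    while rows[-1] > 2:
        r = rows[-1]
        rows.append(2 * (r // 3) + r % 3)
    return rows


def wallace_stages(n_bits: int) -> int:
    """Number of reduction stages for an n_bits x n_bits multiplier,
    which starts with n_bits rows of partial products."""
    return len(wallace_stage_rows(n_bits)) - 1


if __name__ == "__main__":
    for n in (8, 10, 64):
        print(f"{n}-bit: rows {wallace_stage_rows(n)}, stages {wallace_stages(n)}")
```

Under this rule, 64 rows shrink as 64, 43, 29, 20, 14, 10, 7, 5, 4, 3, 2, which is 10 stages and agrees with the count given for the 64-bit multiplier; an 8 x 8 multiplier needs 4 stages.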

Fig.3. Modified Wallace Reduction Process.
Fig.4. Block diagram of modified Wallace tree multiplier.

V. REGULAR CS COMPRESSOR TREE DESIGN
The classic design of a multi-operand carry-save (CS) compressor tree attempts to reduce the number of levels in its structure, as shown in Figs.5 to 10. The 3:2 counters and the 4:2 compressors are the most widely known building blocks for implementing it.

Fig.5. n-bit width CS 9:2 compressor tree based on a linear array.
Fig.6. Time model of the proposed CS 9:2 compressor tree.
Fig.7. Critical path of the proposed 9:2 compressor tree for linear array behavior.
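Section V names 3:2 counters and 4:2 compressors as the usual building blocks of a carry-save compressor tree, and Fig.5 refers to a linear-array alternative. The Python sketch below models both structures at word level, assuming unsigned integer operands; it is a behavioral illustration only, not the authors' FPGA mapping, and the helper names (`csa`, `compress_4_2`, `reduce_tree`, `reduce_linear`) are hypothetical. The word-level 4:2 compressor is modeled as two chained 3:2 counters, which hides the bit-level cin/cout wiring between neighboring slices.

```python
def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """Word-level 3:2 counter (carry-save adder): a + b + c == s + cy."""
    s = a ^ b ^ c                               # per-bit sum
    cy = ((a & b) | (a & c) | (b & c)) << 1     # per-bit majority, moved one weight up
    return s, cy


def compress_4_2(a: int, b: int, c: int, d: int) -> tuple[int, int]:
    """4:2 compressor built from two chained 3:2 counters: a + b + c + d == s + cy."""
    s1, c1 = csa(a, b, c)
    return csa(s1, c1, d)


def reduce_tree(operands: list[int]) -> tuple[int, int, int]:
    """Classic CS tree: each level compresses every group of three operands in
    parallel. Returns (sum_vector, carry_vector, levels)."""
    ops = list(operands) + [0] * max(0, 2 - len(operands))
    levels = 0
    while len(ops) > 2:
        full = len(ops) - len(ops) % 3
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(csa(ops[i], ops[i + 1], ops[i + 2]))
        nxt.extend(ops[full:])                  # leftover operands pass through
        ops = nxt
        levels += 1
    return ops[0], ops[1], levels


def reduce_linear(operands: list[int]) -> tuple[int, int, int]:
    """Linear array: a chain of 3:2 counters, each folding one more operand
    into the running (sum, carry) pair. Returns (sum_vector, carry_vector, stages)."""
    ops = list(operands) + [0] * max(0, 2 - len(operands))
    s, cy = ops[0], ops[1]
    for x in ops[2:]:
        s, cy = csa(s, cy, x)
    return s, cy, len(ops) - 2


if __name__ == "__main__":
    nine = [3, 14, 15, 92, 65, 35, 89, 79, 32]  # a 9:2 example
    s, cy, levels = reduce_tree(nine)
    print("tree:", s + cy, "levels:", levels)    # s + cy plays the role of the final CPA
    s, cy, stages = reduce_linear(nine)
    print("linear:", s + cy, "stages:", stages)
```

For nine operands both variants use seven 3:2 counters; the tree arranges them in 4 levels, while the linear array chains all 7 in series. The argument that the linear array still has a fast critical path rests on mapping it onto FPGA carry chains, which this word-level model does not capture.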

Fig.8. 5:3 compressor.
Fig.9. 15:4 compressor.
Fig.10. Transformation of n-bit width 9:2 linear array compressor tree.

VI. ARCHITECTURE OF MAC UNIT
The Multiplier-Accumulator (MAC) operation is an important operation in many DSP and video processing applications. On FPGAs, multi-input addition has traditionally been implemented using trees of carry-propagate adders, because the traditional look-up table (LUT) structure of FPGAs is not amenable to compressor trees, which are used to implement multi-input addition and parallel multiplication in ASIC technology. In prior work, we developed a greedy heuristic method to map compressor trees onto the general logic of an FPGA. The MAC unit is an essential component in many digital signal processing (DSP) applications involving multiplications and/or accumulations, and it is used in high-performance DSP systems. DSP applications include filtering, convolution, and inner products. Many DSP methods use transforms such as the discrete cosine transform (DCT) or the discrete wavelet transform (DWT). Because these are accomplished by repeated multiplication and addition, the speed of the multiplication and addition arithmetic determines the execution speed and performance of the entire calculation.

Fig.11. Architecture of 64 Bit MAC.

Multiply-and-accumulate operations are typical of digital filters, so the MAC unit enables high-speed filtering and other processing typical of DSP applications. Because the MAC unit operates independently of the CPU, it can process data separately and thereby reduce the CPU load.
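As a concrete instance of the filtering workload described above, a direct-form FIR filter computes each output as a sum of coefficient-sample products, one MAC per tap. The minimal Python sketch below assumes integer coefficients and samples and a zero initial filter state; `fir_mac` is a hypothetical name used only for illustration.

```python
def fir_mac(coeffs: list[int], samples: list[int]) -> list[int]:
    """Direct-form FIR filter y[n] = sum_k h[k] * x[n - k], evaluated with one
    multiply-accumulate (MAC) per tap. Samples before x[0] are taken as zero."""
    outputs = []
    for n in range(len(samples)):
        acc = 0                                  # accumulator cleared for each output
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc = acc + h * samples[n - k]   # one MAC operation
        outputs.append(acc)
    return outputs


if __name__ == "__main__":
    # Impulse input: the output reproduces the coefficients (impulse response).
    print(fir_mac([1, 2, 3], [1, 0, 0, 0]))
```

A filter with T taps performs up to T multiply-accumulate operations per output sample, which is why MAC speed largely sets filter throughput.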
Applications such as DSP-based optical communication systems require extremely fast processing of huge amounts of digital data. The Fast Fourier Transform (FFT) also requires addition and multiplication. A 64-bit MAC can handle wider operands; its architecture is shown in Fig.11. A MAC unit consists of a multiplier and an accumulator that holds the sum of the previous products. The MAC inputs are read from memory and given to the multiplier. A multiplier is one of the key hardware blocks in most digital and high-performance systems, such as FIR filters, microprocessors, and digital signal processors. A system's performance is generally determined by the performance of its multiplier, because the multiplier is usually the slowest element in the system and also occupies the most area.
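The abstract splits the MAC into a Partial Product Generation (PPG) circuit and a Multi-Operand Addition (MOA) circuit. One common way to realize that split, sketched below in Python, is to feed the previous accumulator value into the carry-save reduction as one more operand alongside the partial products, so that a single final carry-propagate addition yields acc + a x b. This is a behavioral sketch under stated assumptions (unsigned operands, plain AND-based partial products rather than Booth encoding, and a hypothetical accumulator width `acc_bits` that wraps around); the names `partial_products`, `csa`, and `mac_step` are hypothetical, and this is not the authors' Verilog implementation.

```python
def partial_products(a: int, b: int, n_bits: int) -> list[int]:
    """PPG for unsigned n_bits x n_bits multiplication:
    row i is (a AND b_i) shifted left by i positions."""
    mask = (1 << n_bits) - 1
    a, b = a & mask, b & mask
    return [(a << i) if (b >> i) & 1 else 0 for i in range(n_bits)]


def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """Word-level 3:2 counter: x + y + z == s + c."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def mac_step(acc: int, a: int, b: int, n_bits: int = 64, acc_bits: int = 136) -> int:
    """One MAC: returns (acc + a*b) mod 2**acc_bits.

    MOA: the partial products plus the old accumulator are reduced in
    carry-save form (three operands to two per group, Wallace-style) until two
    vectors remain; one carry-propagate addition then gives the result.
    acc_bits = 136 is an assumed 128-bit product width plus 8 guard bits.
    """
    if n_bits < 1:
        raise ValueError("n_bits must be at least 1")
    ops = partial_products(a, b, n_bits) + [acc]
    while len(ops) > 2:
        full = len(ops) - len(ops) % 3
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(csa(ops[i], ops[i + 1], ops[i + 2]))
        nxt.extend(ops[full:])                   # leftover operands pass through
        ops = nxt
    return (ops[0] + ops[1]) & ((1 << acc_bits) - 1)   # final CPA, then wrap


if __name__ == "__main__":
    acc = 0
    for a, b in [(1, 2), (3, 4), (5, 6)]:        # dot product 1*2 + 3*4 + 5*6
        acc = mac_step(acc, a, b, n_bits=8, acc_bits=24)
    print(acc)
```

Folding the accumulator into the reduction means each MAC operation needs only one carry-propagate adder, instead of one for the product and another for the accumulation.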

VII. SIMULATION RESULTS
The simulation results of this paper are shown in Figs.12 to 17.

A. Schematic Diagrams of Different Compressors
Fig.12. Schematic diagram of 5:3 compressor.
Fig.13. Schematic diagram of 11:2 compressor.
Fig.14. Schematic diagram of 9:2 compressor.
Fig.15. Schematic diagram of 15:4 compressor.
Fig.16. MAC unit designed using the above compressors.
Fig.17. Waveform.

VIII. CONCLUSION
Efficiently implementing a MAC on an FPGA, in terms of area and speed, is made possible by using the specialized carry chains of these devices in a novel way. Similar to what happens in ASIC technology, the proposed CS linear array compressor trees lead to marked improvements in speed compared with CPA approaches, in general with no additional hardware cost. Furthermore, the proposed high-level definition of CSA arrays based on CPAs facilitates ease of use and portability, even to future FPGA architectures, because CPAs will probably remain a key element in the next generations of FPGAs. Compared with a conventional multiplier, the design uses fewer hardware components, which reduces area overhead and cost. In future work, the design can be extended to implement an ALU. The functionality was verified in Xilinx ISE using Verilog HDL.