A Novel High Performance 64-bit MAC Unit with Modified Wallace Tree Multiplier

Proceedings of International Conference on Emerging Trends in Engineering & Technology (ICETET), 29th - 30th September, 2014, Warangal, Telangana, India (SF0EC024), ISSN (online): 2349-0020

A Novel High Performance 64-bit MAC Unit with Modified Wallace Tree Multiplier

¹CH Murthy and ²T. Santhosh Kumar
¹ ² Department of ECE, MLR Institute of Technology, Dundigal, Hyderabad, India
¹sriram.t1984@gmail.com and ²santoshkumar.tula@gmail.com

ABSTRACT

A novel high-performance 64-bit Multiplier-and-Accumulator (MAC) unit is implemented in this paper using a Modified Wallace Tree Multiplier and a Carry Save Adder. The MAC unit performs important arithmetic and other operations in many digital signal processing (DSP) applications. This 64-bit MAC unit is suitable for use in 64-bit DSP processors. The design is coded in Verilog-HDL. The power dissipation of the entire MAC unit is 182.312 mW.

Key words: Modified Wallace Multiplier, Carry Save Adder, Multiplier and Accumulator (MAC), Digital Signal Processor (DSP), Verilog-HDL.

I. INTRODUCTION

The MAC operation is a common step in many Digital Signal Processing (DSP) applications involving multiplications and/or accumulations. Modern computers have a high-performance dedicated MAC unit consisting of a multiplier, an adder and an accumulator register to store the result [1]. The MAC unit is used in particular for high-performance digital signal processing systems, where the DSP applications include digital filtering, convolution and the computation of inner products. The MAC unit is isolated from the CPU so that operations such as multiplications and/or additions are done separately, reducing the CPU load. Fast Fourier Transform (FFT) operations are performed by the MAC unit because the FFT requires addition and/or multiplication operations [2]. A MAC unit consists of a multiplier, an adder and an accumulator containing the sum of the previous successive products.
The MAC unit obtains its inputs from a memory location such as RAM and passes them to the multiplier. The MAC unit is used in DSP applications that use the discrete cosine transform (DCT) or the discrete wavelet transform (DWT). Since multiplication is accomplished by repeated addition, the speed of the multiplication and addition arithmetic determines the execution speed and performance of the entire calculation. The functionality of the MAC unit enables the high-speed filtering and other processing typical of DSP applications. In particular, applications such as DSP-based optical communication systems require extremely fast processing of huge amounts of digital data. The MAC unit designed in this paper consists of a 64-bit Modified Wallace Tree Multiplier, a 128-bit Carry Save Adder (CSA) and a 129-bit accumulator.

This paper is divided into five sections. The first section introduces the MAC unit. The second section describes the operation of the MAC unit in detail. The third section deals with the basic building blocks of the MAC unit. The fourth section gives the results and comparisons of various MAC units. Finally, the fifth section presents the conclusion.

II. MAC UNIT

The Multiplier-Accumulator (MAC) operation is the key operation not only in DSP applications but also in multimedia information processing and various other applications. As mentioned above, the MAC unit consists of a multiplier, an adder and a register/accumulator.
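As a minimal behavioural sketch of the multiplier, adder and accumulator arrangement described above, the following Python model multiplies two 64-bit operands into a 128-bit product and accumulates it in a register masked to 129 bits, the widths quoted in the introduction. It is not the authors' Verilog. The class name `Mac64`, its `step` method and the helper `inner_product` are hypothetical, and Python integers stand in for the hardware datapath.

```python
MASK64 = (1 << 64) - 1    # 64-bit operand width
MASK128 = (1 << 128) - 1  # 128-bit product width
MASK129 = (1 << 129) - 1  # 129-bit accumulator width


class Mac64:
    """Behavioural model of a multiply-accumulate unit (illustrative only)."""

    def __init__(self):
        self.acc = 0  # accumulator register, cleared at reset

    def step(self, a, b):
        """Multiply two unsigned 64-bit operands and add the product to the accumulator."""
        product = ((a & MASK64) * (b & MASK64)) & MASK128
        # Bits above bit 128 are dropped, as they would be in a fixed-width register.
        self.acc = (self.acc + product) & MASK129
        return self.acc


def inner_product(xs, ys):
    """Accumulate the pairwise products of two sequences with one MAC unit."""
    mac = Mac64()
    for x, y in zip(xs, ys):
        mac.step(x, y)
    return mac.acc
```

For example, `inner_product([1, 2, 3], [4, 5, 6])` returns 32. In this model the sum of two maximal 128-bit products still fits in 129 bits, while a third such product wraps around.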

In this paper, a 64-bit Modified Wallace multiplier is used. The MAC unit takes its inputs from a memory location such as RAM and passes them to the multiplier block, which is very useful in a 64-bit digital signal processor. The inputs fed from the memory location are 64 bits wide. When the inputs are applied, the multiplier computes their product, so its output is 128 bits wide. The multiplier output is given as the input to the carry save adder (CSA), which performs the addition. The function of the MAC unit is given by the following equation:

$$Y = \sum_{i=0}^{63} A_i \times B_i \qquad (1)$$

where $A_i$ and $B_i$ are the two 64-bit input operands, $Y$ is the output of the MAC unit and the index $i$ runs from 0 to 63. This equation represents the summation of the partial products.

The CSA produces a 129-bit output, since one extra bit is needed for the carry (128 bits + 1 bit). The output of the CSA is then given to the accumulator register. The accumulator is designed as a Parallel In Parallel Out (PIPO) register [3], because the CSA produces its output in parallel form and the number of bits is large. In a PIPO register the input bits are loaded in parallel and the output is taken in parallel. The output of the accumulator register is either taken out or fed back as one of the inputs to the CSA. Fig. 1 shows the basic architecture of the MAC unit.

A Wallace tree is an efficient hardware implementation of a digital circuit that multiplies two integers. The Wallace tree has three steps. First, multiply (that is, AND) each bit of one argument by each bit of the other. Second, reduce the number of partial products to two using layers of full and half adders. Finally, group the wires into two numbers and add them with a conventional adder. The benefit of the Wallace tree is that it has only a few layers, which reduces the propagation delay. However, as the number of bits to multiply increases, the number of adders (full adders and half adders) increases, thereby increasing the propagation delay [4].
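The three steps can be sketched at the bit level in Python. This is an illustrative model of a conventional Wallace reduction, with rows grouped in threes. It is not the authors' Verilog and not the modified scheme the paper goes on to propose. The function names and the use of `None` to mark an empty bit position are assumptions of this sketch.

```python
def full_adder(a, b, c):
    """Return (sum, carry) of three input bits."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def half_adder(a, b):
    """Return (sum, carry) of two input bits."""
    return a ^ b, a & b


def wallace_multiply(x, y, n):
    """Multiply two unsigned n-bit integers with a row-grouped Wallace reduction."""
    width = 2 * n
    # Step 1: AND each bit of x with each bit of y; row i is shifted left by i.
    rows = []
    for i in range(n):
        row = [None] * width  # None marks a position that holds no bit
        for j in range(n):
            row[i + j] = ((x >> j) & 1) & ((y >> i) & 1)
        rows.append(row)

    # Step 2: compress each group of three rows into a sum row and a carry row.
    while len(rows) > 2:
        reduced = []
        grouped = len(rows) - len(rows) % 3
        for g in range(0, grouped, 3):
            sum_row = [None] * width
            carry_row = [None] * width
            for w in range(width):
                bits = [r[w] for r in rows[g:g + 3] if r[w] is not None]
                if len(bits) == 3:
                    s, c = full_adder(*bits)
                elif len(bits) == 2:
                    s, c = half_adder(*bits)
                elif len(bits) == 1:
                    s, c = bits[0], None  # a lone bit passes through unchanged
                else:
                    continue
                sum_row[w] = s
                # A carry out of the top column is always 0 because x*y < 2**width.
                if c is not None and w + 1 < width:
                    carry_row[w + 1] = c
            reduced += [sum_row, carry_row]
        reduced += rows[grouped:]  # leftover rows go to the next layer untouched
        rows = reduced

    # Step 3: add the final two rows with a conventional adder.
    def to_int(row):
        return sum(bit << w for w, bit in enumerate(row) if bit)

    return sum(to_int(row) for row in rows)
```

For example, `wallace_multiply(13, 11, 4)` returns 143. Python's built-in addition stands in for the final conventional adder, so the sketch models only the structure of the reduction, not its delay.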
In this paper, a Modified Wallace Tree Multiplier is proposed in which the number of half adders is reduced. Circuit complexity and power dissipation therefore decrease, at a reduced propagation delay. The Modified Wallace Tree Multiplier has three stages. First, the N x N product matrix is formed and rearranged into the shape of an inverted pyramid. Second, the rearranged product matrix is divided into non-overlapping groups of three rows. Within each group, columns holding a single bit or two bits are passed to the next stage, and columns holding three bits are given to a full adder circuit [5]. The number of rows $R$ in each stage of the reduction is calculated as

$$R_{i+1} = 2\left\lfloor \frac{R_i}{3} \right\rfloor + R_i \bmod 3 \qquad (2)$$

If $R_i \bmod 3 = 0$, then

$$R_{i+1} = 2\left\lfloor \frac{R_i}{3} \right\rfloor \qquad (3)$$

A half adder is used only if the number of rows calculated from the above equations for a stage of the second phase does not match the number of rows actually formed in that stage [6].

Figure 1: Block Diagram of MAC Unit.

III. MODIFIED WALLACE MULTIPLIER AND CARRY SAVE ADDER

The final output of the second stage has a height of two bits and is passed on to the third stage. In the third stage, the output of the second stage is given to the carry propagation adder to generate the final output.
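The row-count recurrence in equations (2) and (3) is easy to evaluate numerically. The short Python sketch below lists the number of rows after each reduction stage until only two rows remain, with the division by three taken as integer (floor) division. The function name `reduction_heights` is hypothetical.

```python
def reduction_heights(rows):
    """Return the row counts from the initial product matrix down to two rows."""
    heights = [rows]
    while heights[-1] > 2:
        r = heights[-1]
        # R_{i+1} = 2 * floor(R_i / 3) + (R_i mod 3); the mod term is 0 when 3 divides R_i.
        heights.append(2 * (r // 3) + r % 3)
    return heights
```

For a 64-row product matrix, `reduction_heights(64)` returns `[64, 43, 29, 20, 14, 10, 7, 5, 4, 3, 2]`, that is, ten reduction stages, which agrees with the stage count the paper states for the 64-bit multiplier.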

Figure 2: A 10-bit Modified Wallace Tree Multiplier.

Figure 3: A Typical 8-bit Carry Save Adder.

The 64-bit Modified Wallace Tree Multiplier is constructed in this way, and the total number of stages in the second phase is ten. The number of rows in each of the ten stages was calculated from the equation, and the use of half adders was restricted to the tenth stage. The total number of half adders used in the second phase is eight, and the total number of full adders used during the second phase is slightly higher than in the conventional Wallace Tree Multiplier [7]. Since the 64-bit Modified Wallace Tree Multiplier is very difficult to represent, a typical 10-bit by 10-bit reduction example is shown in Figure 2. The Modified Wallace Tree shows better performance when a CSA is used in the final stage instead of a Ripple Carry Adder (RCA) [8].

The Carry Save Adder (CSA) is a type of digital adder used to compute the sum of three or more binary numbers. The CSA gives less propagation delay, and the glitching problem of the RCA is also avoided. Here the sum of two 128-bit binary numbers is computed, so 128 half adders are required at the first stage instead of 128 full adders, since only the bits of two binary numbers are added [9]. If P and Q are two 128-bit numbers, the first stage produces the partial sum $S_i$ and the carry $C_i$, where

$$S_i = P_i \oplus Q_i, \qquad C_i = P_i \cdot Q_i$$

Since the 128-bit CSA is very difficult to represent, a typical 8-bit CSA is shown in Figure 3. A CSA produces all its output values in parallel, so the computation time is reduced compared to the RCA. A Parallel In Parallel Out (PIPO) register is used in the accumulator stage [10].

IV. RESULTS

The design is developed in Verilog-HDL and synthesized using Xilinx ISE 13.2. In previous work, different MAC units were developed using different combinations of multipliers and adders.
The multipliers compared are (i) the Modified Booth Algorithm, (ii) the Dadda Multiplier and (iii) the Wallace Multiplier. The adders used are (i) the Carry Save Adder and (ii) the Carry Select Adder. The area, power dissipation and delay of the different MAC units are compared as a graph in Figure 4 and in tabular form in Table 1. Figures 4 and 5 show the cell area and total power dissipation of the different MAC units. These figures clearly indicate that the MAC units built with the Wallace Tree Multiplier require less area, dissipate less power and have less delay than the MAC units built with either the Modified Booth or the Dadda multiplier. Hence a 64-bit MAC unit is designed using the Modified Wallace Multiplier and the Carry Save Adder. The simulation waveform of the 64-bit MAC unit is shown in Figure 6. The simulation result shows a delay of one clock cycle, because the 64-bit multiplier and the 128-bit adder take some time to compute.

Table 1: Area and Delay Comparison of MAC Units

Figure 4: Graph Comparison of Cell Area in µm²

Figure 5: Comparison of Total Power Dissipation in mW.

Table 2: Power Dissipation of Different MAC Units

Figure 6: Simulation Waveform of 64-bit MAC Unit

V. CONCLUSIONS

A high-performance 64-bit MAC unit is designed and implemented using a Modified Wallace Tree Multiplier and a Carry Save Adder. Compared with the MAC units developed earlier using different combinations of multipliers and adders, the designed Modified Wallace Tree Multiplier offers high performance with less area, less power dissipation and less propagation delay, which further increases the overall speed of the MAC unit. The MAC unit is designed in Verilog-HDL and synthesized using Xilinx ISE 13.2.

REFERENCES

[1] C. S. Wallace, "A Suggestion for a Fast Multiplier," IEEE Transaction Electronic Computers, Vol. 13, Feb 1967.
[2] C. S. Wallace, "A Suggestion for a Fast Multiplier," IEEE Transaction Electronic Computers, Vol. EC-13, Feb 1964.
[3] Young-Ho Seo and Dong-Wook Kim, "New VLSI Architecture of Parallel Multiplier-Accumulator Based on Radix-2 Algorithm," IEEE Transactions on Very Large Scale Integration Systems, Vol. 18, Feb 2010.
[4] Shanthala S, Dr. Cyril Prasanna Raj, Dr. S Y Kulkarni, "Design and Implementation of Pipelined MAC," IEEE Int. Conf. on Emerging Trends in Eng. & Tech., ICETET-09.
[5] Kiat Seng Yeo and Kaushik Roy, "Low Voltage Low Power VLSI Sub Systems."
[6] Dadda, "Some Schemes for Parallel Multipliers," Alta Frequenza, Vol. 34, 1965.
[7] L. Dadda, "On Parallel Digital Multiplier," Alta Frequenza, Vol. 45, pp. 574-580, 1976.
[8] V. G. Oklobdzija, "High-Speed VLSI Arithmetic Units: Adders and Multipliers," A. Chandrakasan, IEEE Press, 2000.
[9] W. J. Townsend, E. E. Swartzlander Jr. and Dr. D. A. Abraham, "A Comparison of Dadda and Wallace Multiplier," 2003.
[10] Fabrizio Lamberti and Nikos Andrikos, "Reducing the time of computation in two's complement multipliers," Vol. 60, Feb 2011.