Design and Characterization of 16 Bit Multiplier Accumulator Based on Radix-2 Modified Booth Algorithm


Vijay Dhar Maurya¹, Imran Ullah Khan²
¹ M.Tech Scholar, ² Associate Professor (J), Department of ECE, Integral University, Lucknow, U.P., India
¹ mauryavijay27@gmail.com, ² iukhan@iul.ac.in

Abstract: This paper presents the implementation of a multiplier-and-accumulator (MAC) for high-speed arithmetic. Performance is improved by combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA). Because the accumulator, which has the largest delay in the MAC, is merged into the CSA, the overall performance is raised. The proposed CSA tree uses the 1's-complement-based radix-2 modified Booth's algorithm (MBA) and a modified array for the sign extension in order to increase the bit density of the operands. The CSA propagates the carries to the least significant bits of the partial products and generates the least significant bits in advance to decrease the number of input bits of the final adder. The proposed MAC also accumulates the intermediate results as sum and carry bits instead of as the output of the final adder, which makes it possible to optimize the pipeline scheme and improve performance. Based on theoretical and experimental estimation, we analyze the amount of hardware resources, the delay, and the pipelining scheme.

Keywords: Booth Multiplier, Carry Save Adder (CSA) Tree, Computer Arithmetic, Digital Signal Processing (DSP), Multiplier-and-Accumulator (MAC).

I. INTRODUCTION

With the recent rapid advances in multimedia and communication systems, real-time signal processing tasks such as audio signal processing, video/image processing, and large-capacity data processing are increasingly in demand. The multiplier and the multiplier-and-accumulator (MAC) [2] are essential elements of digital signal processing operations such as filtering, convolution, and inner products.
Most digital signal processing methods use transforms such as the discrete cosine transform (DCT) [3] or the discrete wavelet transform (DWT) [4]. Because these are basically accomplished by repetitive application of multiplication and addition, the speed of the multiplication and addition arithmetic determines the execution speed and performance of the entire calculation. Because the multiplier has the longest delay among the basic operational blocks in a digital system, the critical path is, in general, determined by the multiplier. For high-speed multiplication, the modified radix-4 Booth's algorithm (MBA) [5] is commonly used. However, this cannot completely solve the problem because of the long critical path for multiplication [6], [7].

In general, a multiplier uses Booth's algorithm [8] and an array of full adders (FAs), or a Wallace tree [9] instead of the array of FAs. That is, the multiplier mainly consists of three parts: a Booth encoder, a tree to compress the partial products (such as a Wallace tree), and a final adder [10], [11]. Because the Wallace tree adds the partial products from the encoder with as much parallelism as possible, its operation time is proportional to $O(\log_2 N)$, where N is the number of inputs. It uses the fact that counting the number of 1's among the inputs reduces the number of outputs to roughly $\log_2 N$. In real implementations, many (3:2) or (7:3) counters are used to reduce the number of outputs in each pipeline step.

The most effective way to increase the speed of a multiplier is to reduce the number of partial products, because multiplication proceeds as a series of additions of the partial products. To reduce the number of calculation steps for the partial products, the MBA has been applied most often, while the Wallace tree has taken the role of speeding up the addition of the partial products. To increase the speed of the MBA, many parallel multiplication architectures have been researched [11], [12].
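The role of the (3:2) counters mentioned above can be illustrated with a small software model. The following is only a sketch under stated assumptions: the function names `counter_3to2`, `reduce_rows`, and `partial_products` are hypothetical, the operands are treated as unsigned, and the grouping of rows is a simple greedy one rather than the exact wiring of any particular Wallace tree.

```python
def counter_3to2(a: int, b: int, c: int) -> tuple[int, int]:
    """Bitwise (3:2) counter: three input words in, a sum word and a carry word out."""
    s = a ^ b ^ c                                # per-bit sum, no carry ripple
    carry = ((a & b) | (a & c) | (b & c)) << 1   # per-bit majority, weight 2
    return s, carry


def reduce_rows(rows: list[int]) -> tuple[int, int, int]:
    """Compress rows with layers of (3:2) counters (greedy grouping, an assumption).

    Returns (sum_word, carry_word, levels) with sum_word + carry_word == sum(rows).
    """
    rows = list(rows) + [0] * max(0, 2 - len(rows))  # pad to at least two rows
    levels = 0
    while len(rows) > 2:
        full = len(rows) // 3 * 3                    # rows that form complete triples
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(counter_3to2(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[full:])                      # leftover rows pass through
        rows = nxt
        levels += 1
    return rows[0], rows[1], levels


def partial_products(x: int, y: int, n: int = 16) -> list[int]:
    """Unsigned shift-and-AND partial products of an n-bit multiplier y."""
    return [(x << i) if (y >> i) & 1 else 0 for i in range(n)]


if __name__ == "__main__":
    s, c, levels = reduce_rows(partial_products(40000, 51234))
    print(s + c == 40000 * 51234, levels)
```

In this model, 16 partial-product rows are compressed to a sum word and a carry word in 6 counter levels, and 8 rows in 4 levels, which is consistent with the roughly logarithmic growth described in the text. Only the last step, adding the two remaining words, needs a carry-propagate adder.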
In this paper, a new architecture for a high-speed MAC is proposed. In this MAC, the computations of multiplication and accumulation are combined, and a hybrid-type CSA structure is proposed to reduce the critical path and improve the output rate. It uses the MBA based on the 1's complement number system. A modified array structure for the sign bits is used to increase the density of the operands. A carry look-ahead adder (CLA) is inserted in the CSA tree to reduce the number of bits in the final adder. In addition, to increase the output rate by optimizing pipeline efficiency, intermediate calculation results are accumulated in the form of sum and carry instead of as final adder outputs.

Fig. 1. Basic Arithmetic Steps of Multiplication and Accumulation
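The idea of accumulating intermediate results as sum and carry words can be sketched in software as follows. This is a behavioral illustration only, not the proposed hardware: the 32-bit accumulator width, the unsigned operands, and the names `MASK`, `csa`, and `mac_carry_save` are assumptions made for the example, and the built-in `*` stands in for the multiplier.

```python
MASK = (1 << 32) - 1  # assumed 32-bit accumulator width


def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """One carry-save step: three words in, sum and carry words out (mod 2**32)."""
    s = (a ^ b ^ c) & MASK
    carry = (((a & b) | (a & c) | (b & c)) << 1) & MASK
    return s, carry


def mac_carry_save(pairs: list[tuple[int, int]]) -> int:
    """Accumulate products while keeping the running total as (sum, carry)."""
    s, c = 0, 0
    for x, y in pairs:
        p = (x * y) & MASK       # stand-in for the multiplier output
        s, c = csa(s, c, p)      # no carry propagation inside the loop
    return (s + c) & MASK        # single carry-propagate addition at the end


if __name__ == "__main__":
    data = [(3, 4), (5, 6), (65535, 65535)]
    print(mac_carry_save(data) == sum(x * y for x, y in data) & MASK)
```

The point of the sketch is that the carry-propagate addition appears once, after the loop, rather than once per accumulated product.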

This paper is organized as follows. Section II reviews the present schemes, including a general MAC. Section III describes Booth's recoding algorithm. Section IV presents the proposed MAC. Section V analyzes the implementation and results and shows the characteristics of the proposed MAC. Finally, Section VI gives the conclusion.

II. PRESENT SCHEMES USED

Several schemes exist in this domain, including:
A. Binary Multiplication
B. Array Multiplier
C. Multiplier and Accumulator Unit

These earlier methods have several disadvantages, such as low performance as the number of bits increases and the possibility of connection mismatches when performing different multiplications and additions with carry.

A. Binary Multiplication:
In the binary number system the digits, called bits, are limited to the set {0, 1}. The result of multiplying any binary number by a single binary bit is either 0 or the original number. This makes forming the intermediate partial products simple and efficient. Summing these partial products is the time-consuming task for binary multipliers. One logical approach is to form the partial products one at a time and sum them as they are generated. This technique is often implemented in software on processors that do not have a hardware multiplier. It works, but it is slow because at least one machine cycle is required to sum each additional partial product. For applications where this approach does not provide enough performance, multipliers can be implemented directly in hardware. The two main categories of binary multiplication are signed and unsigned. Multiplication is a series of bit shifts and bit additions in which the two numbers, the multiplicand and the multiplier, are combined into the result.

Fig. 2. Multiplication Process

Consider the bit representations of the multiplicand $x = x_{n-1} \ldots x_1 x_0$ and the multiplier $y = y_{n-1} \ldots y_1 y_0$. To form the product in unsigned multiplication, up to n shifted copies of the multiplicand are added. The entire process consists of three steps: partial product generation, partial product reduction, and final addition.

B. Array Multiplier:
A 4 x 4 array multiplier and the functions of M0, M1, M2, and M4 (the M's are either half adders or full adders) are shown in Fig. 3. X3X2X1X0 is the 4-bit multiplicand and Y3Y2Y1Y0 is the 4-bit multiplier. The full adder is the important component in each cell. Each cell contains an AND gate, which determines whether a multiplicand bit Xj is added to the incoming partial product bit, based on the value of the multiplier bit Yi. If Yi = 0, PPi is unchanged and passed vertically downward. Otherwise, the row adds the (appropriately shifted) multiplicand to the incoming partial product PPi to generate the outgoing partial product PP(i+1). The worst-case signal propagation delay is the path from the upper right corner of the array to the high-order product bit output at the bottom left corner of the array.

Fig. 3. Array Multiplier

C. Multiplier and Accumulator Unit:
The inputs for the MAC are fetched from memory and fed to the multiplier block of the MAC, which performs the multiplication and passes the result to an adder. The adder accumulates the result and then stores it in a memory location. This entire process is to be achieved in a single clock cycle (Weste & Harris, 3rd Ed). The architecture of the MAC unit designed in this work consists of one 16-bit register, one 16-bit Modified Booth multiplier, and a 32-bit accumulator. To multiply the values of A and B, a Modified Booth multiplier is used instead of a conventional multiplier because it can increase the speed of the MAC unit and reduce multiplication complexity.
The product Ai x Bi is always fed back into the 32-bit accumulator and then added to the next product. This MAC unit is capable of multiplying and adding with the previous product consecutively up to as many as [...] times.
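The behavior described in this section, shift-and-add multiplication feeding a 32-bit accumulator, can be modeled in a few lines. This is a behavioral sketch under stated assumptions, not the authors' HDL: operands are treated as unsigned 16-bit values, the accumulator wraps modulo 2^32, and the names `shift_add_multiply` and `Mac16` are hypothetical.

```python
def shift_add_multiply(x: int, y: int, n: int = 16) -> int:
    """Unsigned multiply: add a shifted copy of x for every 1 bit of y."""
    product = 0
    for i in range(n):
        if (y >> i) & 1:          # multiplier bit selects 0 or the multiplicand
            product += x << i     # shifted partial product
    return product


class Mac16:
    """Behavioral model: 16-bit operands, 32-bit wrapping accumulator (assumed)."""

    def __init__(self) -> None:
        self.acc = 0

    def step(self, a: int, b: int) -> int:
        p = shift_add_multiply(a & 0xFFFF, b & 0xFFFF)
        self.acc = (self.acc + p) & 0xFFFFFFFF   # feed the product back into the accumulator
        return self.acc


if __name__ == "__main__":
    mac = Mac16()
    for a, b in [(3, 4), (5, 6)]:
        print(mac.step(a, b))
```

Each call to `step` corresponds to one multiply-accumulate operation. The loop inside `shift_add_multiply` is the one-partial-product-per-cycle approach of Section II-A, which is what the hardware multiplier is meant to replace.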

Fig. 4. Simple MAC Architecture

III. BOOTH'S RECODING ALGORITHM

Parallel multiplication using the basic Booth's recoding technique is based on the fact that a partial product can be generated for a group of consecutive 0's and 1's. This is called Booth's recoding. Booth's recoding algorithm is used to generate efficient partial products. These partial products always have more bits than the inputs, and their width usually depends on the radix scheme used for recoding. The generated partial products are added by compressors as explained in Section 3.2. This scheme therefore uses fewer partial products, which lowers power and area. There are two types of algorithm, radix-2 and radix-4, for generating efficient partial products for multiplication. We first explain the basic technique of Booth's recoding algorithm and then the Modified Booth's recoding technique for the radix-2 algorithm.

A. Basic Technique of Booth's Recoding Algorithm for Radix-2:
Booth proposed the radix-2 algorithm for high-speed multiplication, which reduces the partial products for multiplication. Booth's algorithm for multiplication is based on the observation above. To compute A*B, where $A = a_n a_{n-1} \ldots a_0$ is the multiplier and $B = b_n b_{n-1} \ldots b_0$ is the multiplicand, we check two consecutive bits of A at a time, starting from the LSB with an implicit 0 appended to its right. Suppose A is the multiplier with value -5 and B is the multiplicand with value +2:

B => 0010 (+2)
A => 1011 (-5)

We first look at the two least significant values and then at each adjacent pair of bits in A. The partial products are:
For 10 we perform -1.B, i.e., the 2's complement of B, 1110.
For 11 we put all 0's, i.e., 0000.
For 01 we perform 1.B, i.e., the value of B, 0010.
For 10 we again perform -1.B, i.e., 1110.

Here, some extra bits, called correction bits, are appended to match the width of the partial products.

B. Basic Technique of Modified Booth's Recoding Algorithm for Radix-2:
The Modified Booth algorithm has been proposed for high-speed multiplication. This type of multiplier operates much faster than an array multiplier for longer operands because its computation time is proportional to the logarithm of the word length of the operands. Booth multiplication is a technique that allows faster multiplication by grouping the multiplier bits. The grouping of multiplier bits and the modified Booth encoding reduce the number of partial products to half. (This three-bit grouping with digits from -2 to +2 is often called radix-4 Booth encoding in the literature.) Instead of shifting and adding for every column of the multiplier term and multiplying by 1 or 0, we take every second column and multiply by ±1, ±2, or 0. The advantage of this method is the halving of the number of partial products. For Booth encoding, the multiplier bits are formed into blocks of three such that each block overlaps the previous block by one bit. Grouping starts from the LSB, and the first block uses only two bits of the multiplier. Fig. 5 shows the grouping of bits from the multiplier term.

Fig. 5. Grouping of Bits from the Multiplier Term

To obtain the correct partial product, each block is decoded. Table 1 shows the encoding of the multiplier Y using the modified Booth algorithm, which generates the five signed digits -2, -1, 0, +1, +2. Each encoded digit in the multiplier performs a certain operation on the multiplicand X.

Table 1. Operations on the Multiplicand

IV. PROPOSED MAC

If an operation to multiply two N-bit numbers and accumulate into a 2N-bit number is considered, the critical path is determined by the 2N-bit accumulation operation. If a pipeline scheme is applied for each step in the standard design of Fig. [...], the delay of the last accumulator must be reduced in order to improve the performance of the MAC.

Fig. 6. Internal Block Diagram of 16*16 Basic Multiplier

A. Booth Encoding:
The first step is radix-2 Booth encoding, in which a row of partial products is generated from the multiplicand (X) and the multiplier (Y). Partial products can be obtained using various techniques, such as the Booth algorithm or the modified Booth algorithm. The multiplication result is added to the preceding multiplication result (Z). The MAC process can be written as

$P = X \times Y + Z$ (2.1)

where the multiplicand X and the multiplier Y have n bits each and the multiplication result P has 2n bits.

Fig. 7. Booth Encoder

B. Partial Product Summation:
The second step is the partial product summation, which adds all the partial products and converts them into the form of sum (S) and carry (C). For serial-parallel multipliers, this is done using a carry save adder and a carry look-ahead adder (CLA). For parallel multipliers, the addition is done using carry-save techniques or summand skip.

C. Final Addition:
The last step is the final addition, in which the multiplication result is produced by adding the sum (S) and the carry (C). A final adder is required to generate the multiplication result. Fig. 5 shows the basic hardware architecture of the MAC. It produces the final result by multiplying the multiplicand (X) and the multiplier (Y).

V. IMPLEMENTATION AND RESULT

The radix-2 modified Booth MAC performs both multiplication and accumulation. The multiplication result is obtained by multiplying the multiplicand and the multiplier, and this result is accumulated with the previous result. The black box view of the radix-2 modified Booth MAC module is shown in Fig. 8.

Fig. 8. Black Box View of Radix-2 Modified Booth MAC

Table 2. Project Summary

Table 3. Device Utilization Summary
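As a software cross-check of the recoding described in Section III and of equation (2.1), the following sketch computes the Booth digits of a two's-complement multiplier and uses them to form the product. It is an illustration under stated assumptions, not the implemented design: the function names are hypothetical, the digit rule for the overlapping three-bit blocks is the standard modified Booth mapping (000 and 111 give 0, 001 and 010 give +1, 011 gives +2, 100 gives -2, 101 and 110 give -1), which the source refers to as Table 1 without reproducing it, and the word length n is assumed to be even for the modified form.

```python
def booth_radix2_digits(a: int, n: int) -> list[int]:
    """Basic Booth digits, LSB first: d_i = a_(i-1) - a_i with a_(-1) = 0."""
    digits, prev = [], 0
    for i in range(n):
        bit = (a >> i) & 1          # two's-complement bit i of the multiplier
        digits.append(prev - bit)   # pair 10 -> -1, 01 -> +1, 00 or 11 -> 0
        prev = bit
    return digits


def modified_booth_digits(a: int, n: int) -> list[int]:
    """Modified Booth digits in {-2..+2}, one per overlapping 3-bit block (n even)."""
    def bit(i: int) -> int:
        return 0 if i < 0 else (a >> i) & 1
    return [bit(2 * j - 1) + bit(2 * j) - 2 * bit(2 * j + 1) for j in range(n // 2)]


def booth_multiply(a: int, b: int, n: int, modified: bool = False) -> int:
    """Sum the weighted partial products d * b selected by the Booth digits of a."""
    if modified:
        return sum(d * b * 4 ** j for j, d in enumerate(modified_booth_digits(a, n)))
    return sum(d * b * 2 ** i for i, d in enumerate(booth_radix2_digits(a, n)))


def mac(x: int, y: int, z: int, n: int = 16) -> int:
    """P = X * Y + Z, with the multiplier Y recoded by modified Booth encoding."""
    return booth_multiply(y, x, n, modified=True) + z


if __name__ == "__main__":
    print(booth_radix2_digits(-5, 4))      # digits for A = 1011
    print(booth_multiply(-5, 2, 4))        # basic Booth product of -5 and +2
    print(modified_booth_digits(-5, 4))    # half as many digits
```

For A = 1011 (-5) the basic recoding gives the digits -1, 0, +1, -1 from the LSB upward, matching the four partial products -B, 0, +B, -B listed in Section III-A, and the weighted sum is -10. The modified recoding of the same multiplier yields only two digits, which is the halving of partial products described in Section III-B.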

Fig. 10. Simulation Result for 16-Bit Multiplier Accumulator Based on Radix-2 Modified Booth Algorithm

VI. CONCLUSION

A 16x16 multiplier-accumulator (MAC) is presented in this work. A radix-2 Modified Booth multiplier circuit is used for the MAC architecture. Compared to other circuits, the Booth multiplier has the highest operational speed and a lower hardware count. The basic building blocks of the MAC unit are identified, and each block is analyzed for its performance. Power and delay are calculated for the blocks. A 1-bit MAC unit is designed with an enable input to reduce the total power consumption, based on the block-enable technique. Using this block, the N-bit MAC unit is constructed and its total power consumption is calculated. Power reduction techniques are adopted in this work. The MAC unit designed in this work can be used in filter realizations for high-speed DSP applications.

VII. REFERENCES

[1] Young-Ho Seo and Dong-Wook Kim, "A New VLSI Architecture of Parallel Multiplier Accumulator Based on Radix-2 Modified Booth Algorithm," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 18, no. 2, Feb. 2010.
[2] J. J. F. Cavanagh, Digital Computer Arithmetic. New York: McGraw-Hill, 1984.
[3] Information Technology-Coding of Moving Picture and Associated Audio, MPEG-2 Draft International Standard, ISO/IEC 13818-1, 2, 3, 1994.
[4] JPEG 2000 Part I Final Draft, ISO/IEC JTC1/SC29 WG1.
[5] O. L. MacSorley, "High speed arithmetic in binary computers," Proc. IRE, vol. 49, pp. 67-91, Jan. 1961.
[6] S. Waser and M. J. Flynn, Introduction to Arithmetic for Digital Systems Designers. New York: Holt, Rinehart and Winston, 1982.
[7] A. R. Omondi, Computer Arithmetic Systems. Englewood Cliffs, NJ: Prentice-Hall, 1994.
[8] A. D. Booth, "A signed binary multiplication technique," Quart. J. Math., vol. IV, pp. 236-240, 1952.
[9] C. S. Wallace, "A suggestion for a fast multiplier," IEEE Trans. Electron. Comput., vol. EC-13, no. 1, pp. 14-17, Feb. 1964.
[10] A. R. Cooper, "Parallel architecture modified Booth multiplier," Proc. Inst. Electr. Eng. G, vol. 135, pp. 125-128, 1988.
[11] N. R. Shanbhag and P. Juneja, "Parallel implementation of a 4 x 4-bit multiplier using modified Booth's algorithm," IEEE J. Solid-State Circuits, vol. 23, no. 4, pp. 1010-1013, Aug. 1988.
[12] G. Goto, T. Sato, M. Nakajima, and T. Sukemura, "A 54 x 54 regular structured tree multiplier," IEEE J. Solid-State Circuits, vol. 27, no. 9, pp. 1229-1236, Sep. 1992.