Reconfigurable High Performance Baugh-Wooley Multiplier for DSP Applications

Joshin Mathews Joseph & V. Sarada
Department of Electronics and Communication Engineering, SRM University, Kattankulathur, Chennai, India
E-mail: joshin166@gmail.com, saradasaran@gmail.com

Abstract

This paper presents a power-efficient reconfigurable Baugh-Wooley multiplier that provides six configuration modes: (1) one n × n fixed-width multiplier, (2) two n/2 × n/2 fixed-width multipliers, (3) one n/2 × n/2 fixed-width multiplier, (4) one n/2 × n/2 full-precision multiplier, (5) two n/4 × n/4 full-precision multipliers, and (6) one n/4 × n/4 full-precision multiplier. A conventional multiplier consumes considerable power in a DSP processor. The proposed multiplier architecture supports sub-word parallelism and additional features that enhance its performance in DSP applications, while taking slightly less area and delay than conventional multipliers for general-purpose processing. To reduce power, the gated-clock technique and the zero-input technique are applied. A fixed-width multiplier is used to implement the six modes. The power of the proposed reconfigurable structure is reduced by 6-7% compared with the existing pipelined reconfigurable Baugh-Wooley multiplier. The architecture supports a higher clock frequency than the 2's complement Baugh-Wooley multiplier, which supports 100 MHz. The area of the proposed architecture is reduced by 20% compared with the existing 2's complement Baugh-Wooley multiplier.

Keywords: Baugh-Wooley algorithm, reconfigurable, clock gating, two's complement multiplication, hardware description language (HDL).

I. INTRODUCTION

Multiplication is a very important operation in DSP applications. Power-efficient multipliers are essential because of the growing demand for computing and communication operations. Most multiplication algorithms are based on the Baugh-Wooley or Booth algorithms [1][2][3].
These algorithms are widely used in digital filters, fast Fourier transforms, discrete cosine transforms, convolution, wavelet transforms and other DSP-related multimedia applications. DSP applications require flexible operation, low power and high performance, so modifications are needed to meet all of these requirements. In most cases, fixed-width multipliers are used, which achieve a large reduction in area and power.

The hardware implementation of a multiplication consists of three stages: generation of the partial products, reduction of the partial products, and the final carry-propagate addition. In a fixed-width multiplier, the least significant bits are truncated and only the higher-order bits of the product are kept [6]. Ignoring the least significant part introduces two main errors in the multiplication process: reduction error and rounding error. A full-width n × n multiplier gives a 2n-bit output as the sum of the partial products. If the final product is truncated to n bits, the least significant columns of the product matrix contribute little to the final result. As more of these columns are eliminated, the area, power consumption and delay of the arithmetic unit are reduced to a large extent.

Different DSP functions require different configuration parameters. Providing a different multiplier structure for each configuration increases the hardware complexity. Reconfiguring an existing structure gives greater flexibility without compromising performance.
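The effect of keeping only the n most significant bits of a 2n-bit product can be illustrated with a short Python sketch. It is a behavioural illustration only: the function name `fixed_width_product` is hypothetical, and plain truncation without any error compensation is an assumption of the sketch, not the scheme used in the paper.

```python
def fixed_width_product(x, y, n):
    """Return (full, truncated) for two n-bit two's complement integers.

    full      : exact signed product, which fits in 2n bits
    truncated : the n most significant bits of that product
    Plain truncation is assumed here; no compensation vector is modelled.
    """
    full = x * y
    truncated = full >> n  # arithmetic shift discards the n low-order bits
    return full, truncated


n = 8
x, y = 93, -57
full, truncated = fixed_width_product(x, y, n)
# Value that was carried by the discarded low-order bits
error = full - (truncated << n)
print(full, truncated, error)
```

With plain truncation the discarded value always lies between 0 and 2^n - 1, which is the error that compensation techniques try to estimate.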
Former reconfigurable structures have four modes for various DSP functions [5]. In this paper, the number of modes is increased to six by reconfiguring the low-power fixed-width multiplier structure with power reduction techniques, and an error compensation technique is included in the design to reduce the error [7]. The six configuration modes are:

1. One n × n fixed-width multiplier
2. Two n/2 × n/2 fixed-width multipliers
3. One n/2 × n/2 fixed-width multiplier
4. One n/2 × n/2 full-precision multiplier
5. Two n/4 × n/4 full-precision multipliers
6. One n/4 × n/4 full-precision multiplier

This work introduces a pipelined, power-efficient reconfigurable Baugh-Wooley multiplier that provides six configuration modes operating on various bit lengths. The paper is organized as follows. Section II describes the two's complement parallel array multiplication algorithm. Section III presents the design of the reconfigurable fixed-width Baugh-Wooley multiplier. Section IV discusses the power reduction techniques used in the proposed architecture, and simulation results are presented in Section V. Finally, brief statements conclude the paper.

II. TWO'S COMPLEMENT PARALLEL ARRAY MULTIPLICATION ALGORITHM

In high-performance circuits, multiplication consumes most of the area in the arithmetic unit. Two's complement is the most popular method of representing signed integers in computer science. It is widely used because the addition and subtraction circuitry does not need to examine the signs of the operands to determine whether to add or subtract. The Baugh-Wooley multiplier is used for both unsigned and signed multiplication. It operates on signed operands in 2's complement representation and ensures that the signs of all partial products are positive. The unsigned multiplication matrix is modified to handle two's complement operands using the technique of Baugh and Wooley [1].

The n-bit inputs of the multiplier are two's complement numbers, written here with integer weights (a fractional representation differs only by a constant scale factor):

$$X = -x_{n-1}2^{n-1} + \sum_{i=0}^{n-2} x_i 2^i \tag{1}$$

$$Y = -y_{n-1}2^{n-1} + \sum_{j=0}^{n-2} y_j 2^j \tag{2}$$

The full-precision product X·Y is given by

$$X \cdot Y = x_{n-1}y_{n-1}2^{2n-2} + \sum_{i=0}^{n-2}\sum_{j=0}^{n-2} x_i y_j 2^{i+j} - 2^{n-1}\sum_{i=0}^{n-2} x_i y_{n-1} 2^i - 2^{n-1}\sum_{j=0}^{n-2} x_{n-1} y_j 2^j \tag{3}$$

The first two terms of equation (3) are positive and the last two terms are negative. To calculate the product, instead of subtracting the last two terms it is possible to add their opposite values [1][4]. Because the representation is 2's complement, the opposite is easily calculated by complementing every bit and adding 1 in the least significant column:

$$X \cdot Y = x_{n-1}y_{n-1}2^{2n-2} + \sum_{i=0}^{n-2}\sum_{j=0}^{n-2} x_i y_j 2^{i+j} + 2^{n-1}\left(-2^{n-1} + 1 + \sum_{i=0}^{n-2} \overline{x_i y_{n-1}}\, 2^i\right) + 2^{n-1}\left(-2^{n-1} + 1 + \sum_{j=0}^{n-2} \overline{x_{n-1} y_j}\, 2^j\right) \tag{4}$$

Fig. 1: Partial product array diagram for an n × n Baugh-Wooley multiplier.

The final equation is

$$X \cdot Y = -2^{2n-1} + 2^{n} + x_{n-1}y_{n-1}2^{2n-2} + \sum_{i=0}^{n-2}\sum_{j=0}^{n-2} x_i y_j 2^{i+j} + 2^{n-1}\sum_{i=0}^{n-2} \overline{x_i y_{n-1}}\, 2^i + 2^{n-1}\sum_{j=0}^{n-2} \overline{x_{n-1} y_j}\, 2^j$$
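To make the algorithm concrete, the following Python sketch sums the Baugh-Wooley partial-product matrix bit by bit and can be checked against ordinary signed multiplication. The function name `baugh_wooley_product` and the integer interpretation of the operands are assumptions of this sketch. The correction constants 2^n and 2^(2n-1) follow the standard form of the algorithm in [1]; they are not taken from the paper's own figures.

```python
def baugh_wooley_product(x, y, n):
    """Multiply two n-bit two's complement integers with the Baugh-Wooley
    partial-product matrix, in which every summed term is non-negative."""
    # Two's complement bits; >> on a negative Python int sign-extends
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> j) & 1 for j in range(n)]

    total = 0
    # Ordinary (positive) partial products x_i * y_j for i, j < n-1
    for i in range(n - 1):
        for j in range(n - 1):
            total += (xb[i] & yb[j]) << (i + j)
    # Product of the two sign bits
    total += (xb[n - 1] & yb[n - 1]) << (2 * n - 2)
    # Complemented partial products that replace the two negative sums
    for k in range(n - 1):
        total += (1 - (xb[k] & yb[n - 1])) << (n - 1 + k)
        total += (1 - (xb[n - 1] & yb[k])) << (n - 1 + k)
    # Correction constants: +2^n, and +2^(2n-1), which equals -2^(2n-1)
    # modulo 2^(2n)
    total += (1 << n) + (1 << (2 * n - 1))

    # Keep 2n bits and reinterpret them as a signed value
    total &= (1 << (2 * n)) - 1
    if total & (1 << (2 * n - 1)):
        total -= 1 << (2 * n)
    return total


print(baugh_wooley_product(-3, 5, 4))  # prints -15
```

Because all terms in the sum are non-negative, the array can be built from plain adders, which is the property the hardware exploits.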

The final form of the product equation represents the Baugh-Wooley algorithm for two's complement multiplication [1].

III. DESIGN OF RECONFIGURABLE FIXED WIDTH BAUGH WOOLEY MULTIPLIER

This section describes the implementation of the six configuration modes under limited hardware resources. Most applications require only a single-precision product, where the double-word-length result is rounded to single precision. It is only necessary to estimate the carries that ripple into the most significant part of the product [8]. The present work reduces the accuracy degradation of fixed-width multipliers by truncating with a rounding technique, which achieves accuracy almost equal to that of rounding with little additional circuit complexity. Three modules, denoted MUL1, MUL2 and MUL3, are used to achieve the six modes of operation. Configuration parameters are set to select among the configuration modes. The detailed structure of MUL1, MUL2 and MUL3 is given in the previous paper [5]. The prototype of the reconfigurable architecture is shown in Fig. 2.

Fig. 2: Proposed pipelined reconfigurable multiplier

3.1 CM1: n × n fixed width multiplier

In CM1, the multiplier receives two n-bit inputs and produces an n-bit product. All three multiplier blocks are used for the calculation. Each partial product is generated independently and summed to obtain the final result. In this mode, a compensation vector is used to add a carry at the final stage. A control unit in multiplier block 1 prevents the compensation vector from being added twice. The partial product array diagram and the configuration parameters are given in Fig. 3.

Fig. 3: (a) Partial products for fixed width multiplication, (b) Partial products for CM1, (c) Configuration parameters

3.2 CM2: n/2 × n/2 fixed width multipliers

The input is given as two n/2-bit numbers and the output is taken as two n/2-bit numbers. The MUL1 and MUL2 blocks are suitable for the two n/2 × n/2 multiplications. In this mode the configuration parameters CP0, CP1 and CP2 are set to 1.

Fig. 4: (a) Partial products for CM2, (b) Input and output relations for CM2

3.3 CM3: one n/2 × n/2 fixed width multiplier

In CM2, two multipliers are used to obtain the result. For applications with smaller bit lengths, two multiplications are not necessary, so in CM3 only one multiplier is required. Power consumption is reduced by using only the multiplier block MUL1.

Fig. 5: (a) Proposed partial product array diagram for CM3, (b) Configuration parameter settings

3.4 CM4: one n/2 × n/2 full precision multiplier

In this mode, multiplier block 3 alone is used. Two n/2-bit numbers are multiplied and an n-bit product is given as the output. The partial product diagram and mode settings are given in Fig. 6.

Fig. 6: (a) Partial products for CM4, (b) Configuration parameters for CM4

3.5 CM5: two n/4 × n/4 full precision multiplier

This configuration mode is widely used in low-resolution operations and performs two n/4 × n/4 full-precision multiplications. With a minimum number of modules and partial-product configuration, MUL3 is used to carry out mode 5. The parameter settings are explained in Fig. 7.

3.6 CM6: n/4 × n/4 full precision multiplier

This mode is an extension of mode 5 that uses fewer resources to perform the multiplication. It is advantageous for low-power applications, since only a small part of the architecture is used. Only the higher-order bits of MUL3 are used for the calculation, and only the higher-order bits of both inputs are involved.

Using the above operating modes and the reconfigurable architecture, a new architecture is proposed to provide this functionality. The figure gives an overview of the architecture. The entire architecture is divided into three stages. Stage 1 decodes the operating condition for the different modes of operation. These bits select which multiplier function is performed at a given time. The mode select bits are determined according to the reconfigurable region or modules designed. The operation code (op) determines the type of multiplication performed: n × n fixed-width, n/2 × n/2 fixed-width, n/2 × n/2 full-precision or n/4 × n/4 full-precision. In the second stage, each MUL module performs an independent multiplication according to the multiplicand inputs and the decoded control signals from stage 1. The product from each MUL is then sent to stage 3 for the final addition. A MUX in the final stage selects the output of the multipliers based on the input control signals.
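The six modes described above can be summarised as a small lookup table. The Python sketch below is only a reading aid: the table layout and the names `MODES` and `describe_mode` are hypothetical, and the entries restate what Sections 3.1 to 3.6 say about operand width, precision and the MUL blocks in use. It does not model the actual control encoding (op, CP0 to CP2), which the text does not specify in full.

```python
# mode: (number of products, operand width divisor, full precision?, MUL blocks used)
MODES = {
    "CM1": (1, 1, False, ("MUL1", "MUL2", "MUL3")),
    "CM2": (2, 2, False, ("MUL1", "MUL2")),
    "CM3": (1, 2, False, ("MUL1",)),
    "CM4": (1, 2, True, ("MUL3",)),
    "CM5": (2, 4, True, ("MUL3",)),
    "CM6": (1, 4, True, ("MUL3",)),  # only the higher-order part of MUL3
}


def describe_mode(mode, n):
    """Return operand and product widths for a given mode and word length n."""
    count, divisor, full_precision, blocks = MODES[mode]
    operand_bits = n // divisor
    # Fixed-width modes keep as many product bits as operand bits;
    # full-precision modes keep twice as many.
    product_bits = 2 * operand_bits if full_precision else operand_bits
    return {
        "products": count,
        "operand_bits": operand_bits,
        "product_bits": product_bits,
        "blocks": blocks,
    }


for name in MODES:
    print(name, describe_mode(name, 8))
```

For n = 8, for example, CM4 multiplies two 4-bit operands into one 8-bit product using MUL3 only, which is why MUL1 and MUL2 are candidates for being switched off in that mode.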

The hardware overhead is the main disadvantage of this scheme. The duplicated registers can increase the area of the multiplier.

Fig. 7: (a) Partial products for CM5, (b) Configuration parameters for CM5

IV. DESIGN OF RECONFIGURABLE POWER EFFICIENT ARCHITECTURE

Power consumption in Baugh-Wooley multipliers is lower than in other conventional multiplier units. Because Baugh-Wooley multiplication handles both signed and unsigned binary multiplication, it is well suited to the reconfigurable multiplier implementation. The reconfigurable structure invokes all the hardware resources for its operation. Introducing clock gating and the zero-input technique into the proposed structure makes it more power efficient. The control signal n is introduced to achieve the CM3 and CM6 modes of operation; it has no significance in the CM1 and CM4 modes. The power-efficient reconfigurable fixed-width multiplier is shown in Fig. 8.

Fig. 8: Proposed power efficient pipelined reconfigurable fixed width multiplier

4.1 Clock gating

Clock gating is applied to the registers in the second and third stages of the multiplier. Its main aim is to avoid unnecessary transitions in the multiplication process. Registers are disabled according to the mode of operation:

1. In mode 1, MUL1, MUL2 and MUL3 are conditionally disabled based on the zero inputs to the multiplier.
2. In mode 2, MUL3 is disabled.
3. In mode 3, MUL2 and MUL3 are disabled.
4. In mode 4, MUL1 and MUL2 are disabled.
5. In mode 5, MUL1 and MUL2 are disabled.
6. In mode 6, MUL1 and MUL2 are disabled, and MUL3 is partially disabled by disabling the gated register.

4.2 Zero input technique

The functional blocks MUL1, MUL2 and MUL3 can be disabled based on the zero inputs they receive. The conditions for zero values are as follows:

1. If x[7:4] is zero, the input registers of MUL1 and MUL3 can be disabled.
2. If x[3:0] is zero, the input register of MUL2 can be disabled.
3. If y[7:4] is zero, the input registers of MUL2 and MUL3 can be disabled.
4. If y[3:0] is zero, the input register of MUL1 can be disabled.

In most cases, even when the input operands are zero, the output of the multiplication process is not zero, because some of the partial products are complemented. The actual outputs of MUL3 and MUL2 should be (11110000)2 and (001111)2 [5]. The output of MUL1 is not the same in all cases; it depends on the partial product vector. In that case the actual product of MUL1 in the disabled condition is {0100, x3y3 & Km2, (x3y3 & Km2) }. The control unit (CU) is used to treat Km2 = 1 when MUL2 is disabled. Latch L is used to keep the present value when MUL1 is disabled. For operations other than the CM1 mode, the input registers of ADD1 can be disabled. Based on the above conditions, the input signal is decoded and g_m1, g_m2 and g_m3 are generated, which control the gated registers of MUL1, MUL2 and MUL3, respectively. The gated register at stage 3 is controlled by t[3], which takes the value 1 only in operation mode CM1.

V. SIMULATION RESULTS

Fig. 9: Simulated power of the reconfigurable 2's complement multiplier

Fig. 10: Simulated power of the reconfigurable 2's complement power efficient multiplier

The simulated results show that the power-efficient reconfigurable multiplier is more efficient than the normal pipelined reconfigurable multiplier. Based on the LUT area used by each structure, the power-efficient reconfigurable 2's complement multiplier consumes less area than the normal pipelined reconfigurable multiplier. Hence the area and power consumption are reduced, and the performance and throughput are increased.

VI. CONCLUSION

A pipelined, reconfigurable, power-efficient two's complement multiplier using the Baugh-Wooley algorithm is implemented. The structure has been modeled in Verilog HDL. Better power efficiency is achieved by clock gating and the zero-input technique. The power-efficient architecture reduces power by 6-7% with respect to the proposed reconfigurable multiplier with six modes. The frequency of operation is doubled compared to other reconfigurable architectures. The same methodology can be used for n = 16, 32 and 64. The average power of the multiplier is reduced with the addition of two more modes.

VII. REFERENCES

[1] C.R. Baugh and B.A. Wooley, "A Two's Complement Parallel Array Multiplication Algorithm," IEEE Trans. Computers, vol. 22, no. 12, pp. 1045-1047, Dec. 1973.
[2] A.D. Booth, "A Signed Binary Multiplication Technique," Quarterly J. Mechanics and Applied Math., vol. 4, pp. 236-240, 1951.
[3] O.L. MacSorley, "High-Speed Arithmetic in Binary Computers," Proc. Conf. Institute of Radio Engineers (IRE 61), vol. 49, pp. 67-91, 1961.
[4] K. Hwang, Computer Arithmetic: Principles, Architecture, and Design. John Wiley, 1979.
[5] J.-H. Tu and L.-D. Van, "Power-Efficient Pipelined Reconfigurable Fixed-Width Baugh-Wooley Multipliers," IEEE Trans. Computers, vol. 58, no. 10, Oct. 2009.
[6] J.M. Jou, S.R. Kuang, and R.D. Chen, "Design of Low-Error Fixed-Width Multiplier for DSP Applications," IEEE Trans. Circuits and Systems, vol. 46, no. 6, pp. 836-842, 1999.
[7] S. Krithivasan and M.J. Schulte, "Multiplier Architectures for Media Processing," Proc. IEEE Asilomar Conf. Signals, Systems, and Computers, vol. 2, pp. 2193-2197, Nov. 2003.
[8] Y.-L. Tsao, W.-H. Chen, M.-H. Tan, M.-C. Lin, and S.-J. Jou, "Low-Power Embedded DSP Core for Communication Systems," EURASIP J. Applied Signal Processing, pp. 1355-1370, Jan. 2003.
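As a supplementary illustration of the zero-input technique of Section 4.2, the four conditions listed there can be written as a small Python function for the 8-bit case. The function name `zero_input_disables` is hypothetical, and the sketch covers only the decision of which input registers may be disabled. It does not model the gating signals g_m1, g_m2 and g_m3, or the constant outputs that a disabled block must supply.

```python
def zero_input_disables(x, y):
    """Return the set of MUL blocks whose input registers may be disabled
    for 8-bit operands x and y, following the four zero-input conditions."""
    x &= 0xFF
    y &= 0xFF
    disabled = set()
    if (x >> 4) == 0:       # x[7:4] is zero
        disabled |= {"MUL1", "MUL3"}
    if (x & 0x0F) == 0:     # x[3:0] is zero
        disabled |= {"MUL2"}
    if (y >> 4) == 0:       # y[7:4] is zero
        disabled |= {"MUL2", "MUL3"}
    if (y & 0x0F) == 0:     # y[3:0] is zero
        disabled |= {"MUL1"}
    return disabled


print(sorted(zero_input_disables(0x0A, 0xB0)))  # prints ['MUL1', 'MUL3']
```

In this example x[7:4] and y[3:0] are zero, so MUL1 and MUL3 can be idled while MUL2 keeps running.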