A Fixed-Width Modified Baugh-Wooley Multiplier Using Verilog

K. Durgarao, B. Suresh, G. Sivakumar, M. Divaya Manasa

Abstract

Digital technology has advanced to the point where there is an increased need for faster and more power-efficient designs. Fixed-width multipliers are used in almost all application fields, such as communication, speech processing, and digital signal processing operations including FFT, DCT, IFFT, and windowing. The Baugh-Wooley multiplier is a preferred choice for realizing the 2's complement multiplication used in these applications. This paper evaluates the performance of a fixed-width modified Baugh-Wooley multiplier on the Kintex-7, low-power Spartan-6, and Zynq-7000 FPGA families, using synthesis results obtained with different optimization goals.

Keywords: fixed width; modified Baugh-Wooley multiplier; Kintex-7; carry select adder

I. INTRODUCTION

The core of a digital system is the arithmetic logic unit, and within it the multiplier plays a vital role in any processor. Multipliers are widely used in almost all application fields, such as communications, speech processing, and digital signal processing operations including wavelet transforms, discrete cosine transforms (DCT), fast Fourier transforms (FFT), and windowing. The common multiplication method is the add-and-shift algorithm. In parallel multipliers, the number of partial products to be added is the main parameter that determines the performance of the multiplier. Several algorithms have been proposed to reduce the number of partial products, and the Baugh-Wooley multiplier is among the most popular of them.

Digital signal processor applications require efficient, low-error fixed-width multipliers, in which the bit width of the product is the same as the bit width of the multiplier and multiplicand inputs. Fixed-width multipliers generate only the most significant product bits. Discarding the remaining bits introduces truncation errors, which can be reduced by using error-compensation bias circuits.

Earlier multipliers took their input values serially and added the partial products one after another to obtain the final product. They all follow the basic shift-and-add method. These multipliers were not fast enough for digital systems. In addition, new algorithms were developed for the multiplication of signed values, and they had to remain compatible with unsigned multiplication. Array (or matrix) multipliers were developed for fast multiplication. In an array multiplier, the data are fed in parallel, all partial products are obtained simultaneously, and they are later added to get the final product value.

The main aim of this work is to synthesize and simulate a fixed-width modified Baugh-Wooley multiplier using Xilinx 7 series field programmable gate arrays (FPGAs), such as Kintex-7 and Zynq-7000, with different optimization goals. The design is evaluated based on the number of FPGA slices and lookup tables (LUTs) utilized, the maximum frequency, and the power consumption.

The remainder of the paper is organized as follows. Section II presents the architectural details of the modified Baugh-Wooley multiplier. The performance evaluation results for the FPGAs are presented in Section III. Finally, Section IV presents the conclusion and future scope.

II. BAUGH-WOOLEY MULTIPLIER

In 1973, Charles Baugh and Bruce Wooley developed an algorithm for multiplying signed values in two's complement form. It is based on parallel array multiplier architectures and, as a result, achieves reduced chip area and delay. Let us consider the n-bit multiplicand and multiplier operands

$A = (a_{n-1}, a_{n-2}, a_{n-3}, \ldots, a_1, a_0)$

$B = (b_{n-1}, b_{n-2}, b_{n-3}, \ldots, b_1, b_0)$

represented in 2's complement format by (1) and (2), respectively.
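$$A = -a_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i \qquad (1)$$

$$B = -b_{n-1} 2^{n-1} + \sum_{j=0}^{n-2} b_j 2^j \qquad (2)$$

The product $P = A \times B$ using the Baugh-Wooley algorithm can be represented by equation (3):

$$P = a_{n-1} b_{n-1} 2^{2n-2} + \sum_{i=0}^{n-2} \sum_{j=0}^{n-2} a_i b_j 2^{i+j} - 2^{n-1} \sum_{j=0}^{n-2} a_{n-1} b_j 2^j - 2^{n-1} \sum_{i=0}^{n-2} b_{n-1} a_i 2^i \qquad (3)$$

The two subtracted terms are added in 2's complement form. Let X denote one subtracted term, with bits $x_{n-2}, \ldots, x_0$ occupying bit positions $2n-3$ down to $n-1$, and Y the other, with bits $y_{n-2}, \ldots, y_0$. Over the bit positions $2n-1, 2n-2, \ldots, 0$, the negated terms are

$$-X = \underbrace{1\;1\;\bar{x}_{n-2}\;\bar{x}_{n-3}\,\cdots\,\bar{x}_1\;\bar{x}_0}_{\text{bits } 2n-1 \ldots n-1}\;\underbrace{0\,\cdots\,0}_{\text{bits } n-2 \ldots 0} \;+\; 2^{n-1}$$

$$-Y = \underbrace{1\;1\;\bar{y}_{n-2}\;\bar{y}_{n-3}\,\cdots\,\bar{y}_1\;\bar{y}_0}_{\text{bits } 2n-1 \ldots n-1}\;\underbrace{0\,\cdots\,0}_{\text{bits } n-2 \ldots 0} \;+\; 2^{n-1}$$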
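As a rough check of equation (3) and of the complement-and-add-one step, the following Python sketch evaluates the product bit by bit and compares it with ordinary integer multiplication. It is a behavioral model only, not the authors' Verilog; the function names (`bits`, `product_eq3`, `negate_field`, `product_positive_only`) are hypothetical and chosen for illustration.

```python
def bits(value, n):
    """Return the n two's complement bits of `value`, least significant first."""
    return [(value >> k) & 1 for k in range(n)]


def product_eq3(a, b, n=8):
    """Evaluate P = A*B term by term, following equation (3)."""
    x, y = bits(a, n), bits(b, n)
    sign = (x[n - 1] & y[n - 1]) << (2 * n - 2)
    core = sum((x[i] & y[j]) << (i + j)
               for i in range(n - 1) for j in range(n - 1))
    sub_a = sum((x[n - 1] & y[j]) << j for j in range(n - 1)) << (n - 1)
    sub_b = sum((y[n - 1] & x[i]) << i for i in range(n - 1)) << (n - 1)
    return sign + core - sub_a - sub_b


def negate_field(field, n):
    """Negate a term whose (n-1)-bit value `field` is weighted by 2^(n-1).

    The bits are complemented, two leading ones are prepended, and 1 is
    added at bit position n-1. The result is congruent to the negated
    term modulo 2^(2n).
    """
    low_mask = (1 << (n - 1)) - 1
    complemented = (0b11 << (n - 1)) | (~field & low_mask)
    return (complemented << (n - 1)) + (1 << (n - 1))


def product_positive_only(a, b, n=8):
    """Same product, but the subtracted terms are added in complemented form."""
    x, y = bits(a, n), bits(b, n)
    sign = (x[n - 1] & y[n - 1]) << (2 * n - 2)
    core = sum((x[i] & y[j]) << (i + j)
               for i in range(n - 1) for j in range(n - 1))
    field_a = sum((x[n - 1] & y[j]) << j for j in range(n - 1))
    field_b = sum((y[n - 1] & x[i]) << i for i in range(n - 1))
    total = sign + core + negate_field(field_a, n) + negate_field(field_b, n)
    total %= 1 << (2 * n)              # keep the 2n-bit result
    if total >= 1 << (2 * n - 1):      # reinterpret as a signed 2n-bit value
        total -= 1 << (2 * n)
    return total


if __name__ == "__main__":
    print(product_eq3(-3, 5), product_positive_only(-3, 5))  # both print -15
```

The second function uses additions only, which is the property that makes the array easy to build from AND/NAND gates and adders.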

$$P = a_{n-1} b_{n-1} 2^{2n-2} + \sum_{i=0}^{n-2} \sum_{j=0}^{n-2} a_i b_j 2^{i+j} + 2^{n-1} \left( -2^{n-1} + \sum_{j=0}^{n-2} \overline{a_{n-1} b_j}\, 2^j + 1 \right) + 2^{n-1} \left( -2^{n-1} + \sum_{i=0}^{n-2} \overline{b_{n-1} a_i}\, 2^i + 1 \right) \qquad (4)$$

As can be seen from the above equation, the multiplication of two 2's complement numbers can be expressed in a form that involves only positive bit products.

The 8-bit array-based modified Baugh-Wooley multiplier has a four-stage pipelined structure, shown in Fig. 1. It consists of a 2-to-4 decoder, two multiplexers (MUX), and three multipliers (MUL1, MUL2, MUL3). The architectures of MUL1, MUL2, and MUL3 are shown in Fig. 2, Fig. 4, and Fig. 6, respectively. The building blocks of MUL1, MUL2, and MUL3 are shown in Fig. 3, Fig. 5, and Fig. 7, respectively. In these architectures, the notations ND, A, HA, and FA represent a NAND gate, an AND gate, a half adder, and a full adder, respectively.

In the block diagram of Fig. 1, all modules are controlled by the 2-to-4 decoder control signals {T[0], T[1], T[2], T[3]} for the next processing stage, as summarized in Table I. Based on the control signals, the three multiplication modules can be manipulated at the second stage. A MUX is used in the second stage to select either the output of MUL3 or the concatenated output of MUL1 and MUL2, as shown in Fig. 1. To minimize the error, two sub calibration circuits, SCC1 and SCC2, are used, as shown in Fig. 3 and Fig. 5, respectively.

TABLE I. DECODER TRUTH TABLE

OP[1:0]   Configuration mode   T[3]   T[2]   T[1]   T[0]
00        M1                   1      0      0      0
01        M2                   0      1      0      0
10        M3                   0      0      1      0
11        M4                   0      0      0      1

The third stage is responsible for accumulating the output values of MUL1, MUL2, and MUL3 and for selecting the final product according to the four configuration modes. As shown in Fig. 1, ADD1 adds the outputs of MUL1 and MUL2. The output of ADD1 includes only the carry-out and ignores the LSBs, owing to the fixed-width output of the multiplier. The output of ADD1 and the output of the MUX from the second stage are added using ADD2. A control signal, T[3], is used to determine the final correct product among the different configuration modes.

Fig. 1. Pipelined modified Baugh-Wooley multiplier.
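The effect of keeping only the upper product bits can be illustrated with a small Python sketch. It is a simplified model under stated assumptions: the operands are treated as unsigned 8-bit values, no compensation circuit such as SCC1 or SCC2 is modelled, and the function names are hypothetical. It compares truncating a full product with a multiplier that never forms the low-order partial products at all.

```python
def ideal_fixed_width(a, b, n=8):
    """Form the full 2n-bit unsigned product, then keep the upper n bits."""
    return (a * b) >> n


def direct_truncated(a, b, n=8):
    """Sum only the partial products a_i*b_j with i + j >= n, keep the upper n bits.

    The partial products in the n least significant columns are never
    generated, which saves hardware but loses the carries they would
    have produced.
    """
    kept = sum((((a >> i) & 1) & ((b >> j) & 1)) << (i + j)
               for i in range(n) for j in range(n) if i + j >= n)
    return kept >> n


if __name__ == "__main__":
    a = b = 255
    print(ideal_fixed_width(a, b))   # 254
    print(direct_truncated(a, b))    # 247
```

In this unsigned model the directly truncated result is never larger than the ideal one, and the gap between the two is the kind of error that a compensation circuit is meant to reduce.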

Fig. 2. Architecture for MUL1

Fig. 3. Building blocks of MUL1

Fig. 4. Architecture for MUL2

Fig. 5. Building blocks of MUL2
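For readers unfamiliar with the cell types named in the text (NAND gate, AND gate, half adder, full adder), a minimal Python sketch of their logic follows. These are the textbook definitions, not the authors' netlist, and the function names are hypothetical.

```python
def and_gate(a, b):
    """AND gate: forms a positive bit product a_i * b_j."""
    return a & b


def nand_gate(a, b):
    """NAND gate: forms a complemented bit product."""
    return 1 - (a & b)


def half_adder(a, b):
    """Return (sum, carry) for two input bits."""
    return a ^ b, a & b


def full_adder(a, b, cin):
    """Return (sum, carry) for three input bits, built from two half adders."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, cin)
    return s2, c1 | c2
```

In a Baugh-Wooley array, AND cells generate the ordinary partial products and NAND cells generate the complemented ones, while the adders accumulate each column.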

Fig. 6. Architecture for MUL3

Fig. 7. Building blocks of MUL3

III. PERFORMANCE EVALUATION FOR DIFFERENT FAMILIES OF FPGAS

The modular 8 × 8 fixed-width Baugh-Wooley multiplier design is coded in Verilog and synthesized using high-end, state-of-the-art Xilinx 7 series FPGAs. Kintex-7 (xc7k70tfbg676) and Zynq-7000 (xc7z010-1clg400) devices are selected as the target architectures for synthesis and performance evaluation of the multiplier. At the logic level, these FPGA families are all based on 6-input LUTs and fabricated in an advanced 28 nm CMOS technology. Each block of the multiplier is verified through simulation using the ISE 14.7 simulator. The RTL schematic of the multiplier is shown in Fig. 8. Tables II, III, IV and V summarize the FPGA implementation results of the multiplier using different families of devices.

TABLE II. FPGA DEVICE UTILIZATION SUMMARY

FPGA resources                Kintex-7    Artix-7
Slice LUTs (used/available)   73/41000    59/63400
Delay (ns)                    3.339       5.839
Max. frequency (MHz)          299.49      171.36
Power consumption (mW)        80          82

TABLE III.

FPGA resources                Spartan-6   Zynq-7000
Slice LUTs (used/available)   59/2400     59/17600
Delay (ns)                    10.777      4.489
Max. frequency (MHz)          92.79       222.76
Power consumption (mW)        14          100

TABLE IV.

FPGA resources                Spartan-3A  Spartan-3E
Slice LUTs (used/available)   118/1408    67/970
Delay (ns)                    15.054      14.685
Max. frequency (MHz)          66.42       68.09
Power consumption (mW)        10          34
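The delay and frequency rows of the tables can be cross-checked with a short Python sketch, assuming that the reported maximum frequency is simply the reciprocal of the reported delay. That relationship is an assumption made here, not something the paper states, and the names `REPORTED`, `fmax_mhz`, and `deviations` are hypothetical.

```python
# (delay in ns, reported maximum frequency in MHz), copied from Tables II to IV
REPORTED = {
    "Kintex-7": (3.339, 299.49),
    "Artix-7": (5.839, 171.36),
    "Spartan-6": (10.777, 92.79),
    "Zynq-7000": (4.489, 222.76),
    "Spartan-3A": (15.054, 66.42),
    "Spartan-3E": (14.685, 68.09),
}


def fmax_mhz(delay_ns):
    """Maximum clock frequency in MHz for a critical-path delay given in ns."""
    return 1000.0 / delay_ns


def deviations():
    """Computed minus reported frequency, in MHz, for each device."""
    return {dev: fmax_mhz(delay) - freq for dev, (delay, freq) in REPORTED.items()}


if __name__ == "__main__":
    for device, diff in deviations().items():
        print(f"{device:11s} {diff:+.3f} MHz")
```

Under this assumption, five of the six devices agree with the tables to within 0.01 MHz, while the Artix-7 entry differs by roughly 0.1 MHz.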

IV. CONCLUSIONS

In this paper, a fixed-width modified Baugh-Wooley multiplier was designed and synthesized on different families of FPGAs: Kintex-7, Zynq-7000, Spartan-6, Spartan-3A, and Spartan-3E. The design was coded in Verilog using the Xilinx ISE 14.7 software platform. From the synthesis results, the power, speed, and area characteristics of the multiplier were evaluated. As a future extension, floating-point arithmetic could be used instead of fixed-point arithmetic, since it gives more accurate results for most advanced digital signal processor (DSP) applications.

About authors:

1) K. Durgarao, M.Tech student, Amrita Sai Institute of Science and Technology.
2) B. Suresh, Assistant Professor in the Dept. of ECE, Amrita Sai Institute of Science and Technology, Paritala.
3) G. Sivakumar, Assistant Professor in the Dept. of ECE, Amrita Sai Institute of Science and Technology, Paritala.
4) M. Divaya Manasa, Assistant Professor in the Dept. of ECE, Amrita Sai Institute of Science and Technology, Paritala.

Fig. 8. RTL schematic of the fixed-width modified Baugh-Wooley multiplier.