Synthesis and Simulation of Floating Point Multipliers
Dr. P. N. Jain 1, Dr. A. J. Patil 2, M. Y. Thakre 3

1 Professor and Academic Dean, Department of E&TC, Shri. Gulabrao Deokar College of Engineering, Jalgaon, India.
2 Principal, Department of E&TC, Shri. Gulabrao Deokar College of Engineering, Jalgaon, India.
3 PG Student, Department of E&TC, Shri. Gulabrao Deokar College of Engineering, Jalgaon, India.
Email: jainpnj@gmail.com, principal@sgdcoejalgaon.org, thakre.mayuri23@gmail.com

ABSTRACT
The performance of floating-point arithmetic units is of prime importance in many areas of computing, signal processing and medical imaging. The binary representation of decimal floating-point numbers permits a well-organized implementation of the radix-independent IEEE standard for floating-point arithmetic. Multiplication is a representative core operation that demands a high-performance, area-efficient implementation. A binary multiplier is an integral part of the arithmetic logic unit (ALU) found in many processors. Integer multiplication can be inefficient and costly, in both time and hardware, depending on how signed numbers are represented. In this project, methods of implementing floating-point multiplication according to the IEEE 754 standard are explored. The prime objective is to develop a multiplier that is compliant with the IEEE floating-point standard. For this purpose two different algorithms were studied, and designs were developed for a serial multiplier and a parallel multiplier. A control circuit that controls the arithmetic network was used to implement each multiplier. The work targets floating-point multiplication on an FPGA platform, described in VHDL, because of the logic resources available and the flexibility of implementation. Xilinx ISE 9.2i was used for this purpose, and the use of VHDL provided a technology-independent hardware design.
The designs were targeted at the Spartan-3E FPGA family, using the XC3S500E-5PQ208 device. A test bench was designed, and the results were verified against hand calculations and published results. The two multipliers were compared on the basis of device utilization, timing summary and throughput. The performance of the present multipliers could be improved further by pipelining, at the cost of additional area and power.

2015, IJAFRC All Rights Reserved www.ijafrc.org

I. FLOATING POINT MULTIPLICATION
Mathematically, the floating-point multiplication of two binary numbers can be written as

$$(F_1 \times 2^{E_1}) \times (F_2 \times 2^{E_2}) = (F_1 \times F_2) \times 2^{E_1 + E_2} = F \times 2^{E}$$

The basic steps involved in FP multiplication are as follows:
1. Add the exponents.
2. Multiply the fractions.
3. If the product is zero, adjust it to a proper zero representation.
4. Normalize the product fraction.
5. Check for exponent overflow or underflow.
6. Round the product fraction.
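The six steps can be traced on a small example format. The Python sketch below is illustrative only and rests on assumptions not stated in the paper: a number is a sign bit plus an n-bit unsigned fraction normalized to [0.5, 1) (binary 0.1xxx), and its exponent must fit in a k-bit two's-complement field. The defaults n = 4 and k = 4 echo the 4-bit fraction and 4-bit exponent unit described in Section 2.1, although that unit may encode signed fractions differently. Rounding here is round-half-up on the discarded bits, and the names `fp_multiply` and `ExponentRangeError` are hypothetical.

```python
class ExponentRangeError(ArithmeticError):
    """Raised when the result exponent does not fit in the k-bit field."""


def fp_multiply(s1, f1, e1, s2, f2, e2, n=4, k=4):
    """Multiply (-1)**s1 * (f1 / 2**n) * 2**e1 by (-1)**s2 * (f2 / 2**n) * 2**e2.

    Fractions are n-bit unsigned integers whose MSB is 1 (value in [0.5, 1)),
    or 0 for a zero operand. Exponents must fit in k-bit two's complement.
    Returns (sign, fraction, exponent) in the same assumed format.
    """
    e_min, e_max = -(1 << (k - 1)), (1 << (k - 1)) - 1

    def check_range(e):
        if not e_min <= e <= e_max:
            raise ExponentRangeError(f"exponent {e} outside [{e_min}, {e_max}]")

    s = s1 ^ s2                      # sign of the product
    e = e1 + e2                      # step 1: add exponents
    p = f1 * f2                      # step 2: multiply fractions (2n-bit product)
    if p == 0:                       # step 3: return a canonical zero
        return 0, 0, 0
    if p < (1 << (2 * n - 1)):       # step 4: normalize; the product lies in
        p <<= 1                      #   [0.25, 1), so one left shift suffices
        e -= 1
    check_range(e)                   # step 5: exponent overflow / underflow
    f = (p + (1 << (n - 1))) >> n    # step 6: round to n bits (half rounds up)
    if f == 1 << n:                  # rounding carried out to 1.0:
        f >>= 1                      #   renormalize and re-check the exponent
        e += 1
        check_range(e)
    return s, f, e
```

For instance, 0.75 x 0.75 is `fp_multiply(0, 0b1100, 0, 0, 0b1100, 0)`, which returns `(0, 0b1001, 0)`, i.e. 9/16 = 0.5625. The range check is repeated after rounding because a rounding carry can increment the exponent.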

Thus, multiplication consists of adding the exponents and multiplying the fractions; the product fraction is then normalized, the exponent is checked for overflow or underflow, and finally the product fraction is rounded. The flow chart for floating-point multiplication is shown in Figure 1 [1].

Figure 1: Flow Chart Diagram for Floating Point Multiplication

II. SYSTEM IMPLEMENTATION
The multipliers were implemented using a control circuit that controls the arithmetic network. This concept is essential for coordinating the behavior of the surrounding subsystems, and the use of a control unit, or system controller, leads to a systematic and structured approach to digital system design [2]. Two different algorithms for high-speed hardware multipliers were studied and chosen for implementation:
1. Serial multiplier
2. Parallel multiplier

2.1 SERIAL MULTIPLIER:
The hardware required to implement the FP multiplier consists of an exponent adder and a fraction multiplier. The project started with the development of a simple multiplier unit and an adder circuit for floating-point numbers represented by a 4-bit fraction and a 4-bit exponent [1]. The basic multiplier unit was then modified to multiply signed binary numbers. The design of the serial adder with accumulator is described in the section below; the binary multiplier and the signed binary multiplier are discussed in [1]. This 4-bit multiplier was synthesized, simulated and tested for various input combinations, and the same system was extended to 32-bit floating-point multiplication.

2.2 Design of an adder for serial multiplier:
A serial adder with accumulator, as in [1], was used for this purpose. The block diagram of a 4-bit serial adder with its control circuit is shown in Figure 2.2.

Figure 2-2: Block diagram for a 4 bit serial adder

Two shift registers hold the 4-bit numbers to be added, X and Y. The box at the left end of each shift register represents its inputs: Sh (shift), SI (serial input), and Clock. When Sh = 1 and the clock is pulsed, SI is entered into the leftmost bit (x3 or y3) as the contents of the register are shifted right one position. The X register serves as the accumulator, and after four shifts the number X is replaced with the sum of X and Y. The addend register is connected as a cyclic shift register, so after four shifts it is back in its original state and the number Y is not lost. The serial adder consists of a full adder and a carry flip-flop, and at each clock one pair of bits is added. When Sh = 1, the falling edge of the clock shifts the sum bit into the accumulator, stores the carry bit in the carry flip-flop, and rotates the addend register right. Additional connections are needed for initially loading the X and Y registers and clearing the carry flip-flop [5].

2.3 Design of a Multiplier unit:
Figure 2.3 shows the hardware required to multiply two 4-bit fractions. Multiplying two 4-bit numbers requires a 4-bit multiplicand register, a 4-bit multiplier register, a 4-bit full adder, and an 8-bit register for the product. The product register serves as an accumulator for the sum of the partial products.
The contents of the product register are shifted right at each step, as shown in the block diagram of Figure 2.4. This type of multiplier is sometimes referred to as a serial-parallel multiplier, since the multiplier bits are processed serially but the addition takes place in parallel. Depending on whether the current multiplier bit is 0 or 1, either a shift or an add-and-shift operation takes place. When the multiplication is complete, the ACC register contains the product.
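The serial adder of Section 2.2 and the shift-and-add multiplier of Section 2.3 can be modeled at the register level. The Python sketch below is a behavioral approximation under stated assumptions, not a transcription of the authors' VHDL: operands are unsigned, registers are plain integers masked to n bits, ACC carries one extra bit for the adder carry-out, each add and each shift is counted as one clock, and one extra clock is counted for loading the operands. The function names are hypothetical.

```python
def serial_add(x, y, n=4):
    """Clock-by-clock model of an n-bit serial adder with accumulator.

    x is the accumulator register and y the addend register. On each clock
    with Sh = 1 both registers shift right; the full-adder sum bit enters
    the leftmost bit of x, and y is rotated so it is unchanged after n clocks.
    Returns (x, y, carry_flip_flop).
    """
    mask = (1 << n) - 1
    x, y = x & mask, y & mask
    carry = 0                                   # carry flip-flop cleared at start
    for _ in range(n):
        a, b = x & 1, y & 1                     # rightmost bits feed the full adder
        s = a ^ b ^ carry                       # full-adder sum
        carry = (a & b) | (carry & (a ^ b))     # full-adder carry-out -> flip-flop
        x = (x >> 1) | (s << (n - 1))           # sum bit is SI for the accumulator
        y = (y >> 1) | ((y & 1) << (n - 1))     # addend register rotates right
    return x, y, carry


def serial_parallel_multiply(mcand, mplier, n=4):
    """Shift-and-add model of an n x n unsigned serial-parallel multiplier.

    ACC is 2n + 1 bits wide: its lower n bits start out holding the
    multiplier, consumed one bit per step from the right, and its upper bits
    receive the multiplicand through an n-bit parallel adder (the extra bit
    holds the adder carry). Returns (product, clock_cycles).
    """
    mask = (1 << n) - 1
    mcand &= mask
    acc = mplier & mask            # load: multiplier into ACC[n-1:0]
    cycles = 1                     # assumed: one clock for the load state
    for _ in range(n):
        if acc & 1:                # multiplier bit M = 1: add multiplicand
            acc += mcand << n      #   into the upper part of ACC
            cycles += 1
        acc >>= 1                  # shift ACC right one position
        cycles += 1
    return acc, cycles
```

For example, `serial_parallel_multiply(13, 11)` returns `(143, 8)`. With every multiplier bit set, this cycle model gives n shifts, n adds and one load, i.e. 2n + 1 clocks, which is consistent with the (2N+1)-cycle estimate and the 9 cycles quoted for the 4-bit unit in Section III; the real controller's state sequence may differ.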

2.4 Design description and testing

Figure 2-3: 4 bit multiplier

The VHDL code for the system uses a behavioral style of description with two processes. The main process generates the control signals for the system, and a second process generates the control signals for the fraction multiplier. Unwanted latches were avoided by initializing the output signals. The 4-bit FP multiplier unit was tested for all the special cases, in combination with positive and negative fractions as well as positive and negative exponents [8].

III. PERFORMANCE ANALYSIS
This section presents the synthesis and simulation results for the floating-point multipliers.

3.1 RESULTS FOR SERIAL FLOATING POINT MULTIPLIER
A 4-bit multiplier was synthesized and tested thoroughly, and the same design was extended to a 32-bit multiplier.

3.1.1 Synthesis Outcomes:
Table 3.1 and Table 3.2 show the device utilization and timing summary for the 4-bit and 32-bit serial floating-point multipliers. As expected, the 32-bit multiplier has much higher device utilization; utilization grows with the bit width of the multiplier. The timing summary shows that the maximum clock frequency is almost the same for both [6]. Note, however, that since this is a serial multiplier, the final product is available only after the complete multiplication sequence: an N-bit multiplier is expected to take about (2N+1) cycles per product, so the throughput of the machine is low [7].

Table 3.1: Device Utilization for 4 bit and 32 bit Serial Multiplier
Logic Utilization | 4 bit multiplier | 32 bit multiplier
Number of Slice Flip Flops | 18 | 61
Number of 4 input LUTs | 61 | 223
Number of occupied Slices | 33 | 113
Number of bonded IOBs | 32 | 223
Total equivalent gate count for design | 628 | 2316

Table 3.2: Timing Summary for 4 bit and 32 bit Serial Multiplier
Timing Parameter | 4 bit multiplier | 32 bit multiplier
Minimum period | 8.004 ns | 8.076 ns
Maximum frequency | 124.938 MHz | 123.823 MHz

3.1.2 Simulation waveforms:
Figure 3.1 shows the results obtained from simulating the 4-bit multiplier. The inputs were tested for various combinations [3][4]. The signals have the interpretation given in Table 3.1. When the start signal goes high, the multiplication begins, and it takes 9 cycles to complete. The done signal is asserted when the multiplication is complete [5].

Figure 3-1: Simulation Results for 4 bit serial Multiplier

IV. PARALLEL MULTIPLIER
The design goal, an IEEE-compliant 32-bit floating-point multiplier core, was met using a simple methodology in which VHDL operators were used directly [11]. The core was required to implement all four rounding modes: round to nearest, round toward +inf, round toward -inf, and round toward zero. All exceptions had to be handled and reported according to the IEEE standard. A generic constant K, which can be set to 32 or 64, allows the core to be extended to the double-precision format [12][13].

4.1 Microarchitecture of Parallel Multiplier
Figure 4.1 shows a simple FP multiplier based on the parallel approach. It consists of a pre-normalize block, a multiplier, a post-normalize and round unit, and an exception unit [14]. The two floating-point numbers, opa and opb, serve as inputs to the floating-point multiplier.
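The four blocks listed above (exception unit, pre-normalize, multiplier, post-normalize and round) can be sketched behaviorally. The Python model below is a minimal sketch, not the authors' VHDL, and makes several simplifying assumptions: inputs are raw bit patterns, only round to nearest (ties to even) is implemented although the core described above supports four modes, subnormal operands and results are flushed to zero, and invalid operations return a canonical quiet NaN instead of holding a done signal low. The names `fmul`, `FORMATS`, `f32_bits` and `bits_f32` are hypothetical; the `k` parameter imitates the generic constant K (32 or 64).

```python
import struct

# Hypothetical stand-in for the generic constant K:
# word length -> (exponent bits, fraction bits) in IEEE 754.
FORMATS = {32: (8, 23), 64: (11, 52)}


def fmul(opa: int, opb: int, k: int = 32) -> int:
    """Multiply two IEEE 754 bit patterns of width k (32 or 64 bits).

    Sketch simplifications: round to nearest, ties to even only; subnormal
    operands and results are flushed to zero; invalid operations return a
    canonical quiet NaN instead of raising a flag.
    """
    ew, fw = FORMATS[k]
    emax = (1 << ew) - 1                 # all-ones exponent encodes Inf/NaN
    bias = emax >> 1                     # 127 for k = 32, 1023 for k = 64
    fmask = (1 << fw) - 1
    inf = emax << fw
    qnan = inf | (1 << (fw - 1))

    # Unpack sign, biased exponent and fraction fields.
    sa, ea, ma = opa >> (k - 1), (opa >> fw) & emax, opa & fmask
    sb, eb, mb = opb >> (k - 1), (opb >> fw) & emax, opb & fmask
    sign = (sa ^ sb) << (k - 1)

    # Exception unit: NaN, infinity and zero operands.
    a_nan = ea == emax and ma != 0
    b_nan = eb == emax and mb != 0
    a_inf = ea == emax and ma == 0
    b_inf = eb == emax and mb == 0
    a_zero, b_zero = ea == 0, eb == 0    # subnormals treated as zero
    if a_nan or b_nan or (a_inf and b_zero) or (b_inf and a_zero):
        return qnan
    if a_inf or b_inf:
        return sign | inf
    if a_zero or b_zero:
        return sign

    # Pre-normalize: restore the hidden 1, add exponents, remove one bias.
    ma |= 1 << fw
    mb |= 1 << fw
    e = ea + eb - bias

    # Multiplier: full-width significand product, p / 2**(2*fw) in [1, 4).
    p = ma * mb
    if p >> (2 * fw + 1):                # product in [2, 4): one extra shift
        shift, e = fw + 1, e + 1
    else:
        shift = fw

    # Post-normalize and round: keep fw + 1 bits, round to nearest even.
    q = p >> shift
    rem = p & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if rem > half or (rem == half and q & 1):
        q += 1
        if q >> (fw + 1):                # rounding carried out to 2.0
            q >>= 1
            e += 1

    if e >= emax:                        # exponent overflow -> infinity
        return sign | inf
    if e <= 0:                           # exponent underflow -> flush to zero
        return sign
    return sign | (e << fw) | (q & fmask)


def f32_bits(x: float) -> int:
    """Bit pattern of x rounded to binary32 (used as a test reference)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    """Inverse of f32_bits for a 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b))[0]
```

For example, `fmul(f32_bits(1.5), f32_bits(2.5))` returns `0x40700000`, the binary32 pattern of 3.75. Comparing against the host floating-point type is a convenient reference for a test bench: the exact product of two 24-bit significands fits in the 53-bit significand of a double, so `f32_bits(x * y)` is a single, correctly rounded result.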

A pre-normalization step computes the exponent sum (less the bias) and checks for exponent overflow, underflow and an INF value on either input.

Figure 4-1: Block Diagram for Floating Point Multiplier

4.2 Design description and testing
This multiplier also uses VHDL for design entry. The code uses a mixed style of description, combining dataflow and behavioral styles, with multiple processes. Processes are used to load the inputs, unpack the floating-point numbers and pre-normalize them. After the initial checks and the multiplication, further processes post-normalize and pack the result; rounding and exception handling are also performed. A simple testing strategy was used in which combinations of inputs were applied, allowing the features of the design to be explored.

V. RESULTS FOR PARALLEL FLOATING POINT MULTIPLIER
The single-precision IEEE-compliant multiplier is referred to as the Parallel Multiplier.

5.1.1 Synthesis Outcomes:
Table 5.1 presents the device utilization summary for the Parallel Multiplier.

Table 5.1: Device Utilization Summary for Parallel Multiplier
Logic Utilization | Parallel Multiplier
Number of Slice Flip Flops | 96
Number of 4 input LUTs | 256
Number of occupied Slices | 148
Number of bonded IOBs | 104
Total equivalent gate count for design | 2985

Table 5.2 presents the timing summary, indicating the highest clock frequency at which the Parallel Multiplier can operate on the XC3S500E.

Table 5.2: Timing Summary for Parallel Multiplier
Timing Parameter | Parallel Multiplier
Minimum period | 17.323 ns
Maximum frequency | 57.725 MHz

5.1.2 Simulation waveforms
Figure 5.1 shows the simulated output of the Parallel Multiplier. The inputs were tested for the combinations listed in Table 4.8. The results in the simulation waveform agree with hand calculations, which were verified [9][10] and are presented here. The signals have the interpretation given in the table. Each time the ce signal is driven high, the operands are loaded into the multiplier on the rising edge of the clock. The computation is performed by the multiplication operator in a single cycle. The done signal is asserted only after a successful, valid multiplication [15][16]; in the case of an exception, such as an input operand being zero or NaN, the done signal remains low.

Figure 5-1: Simulation waveform for the Parallel Multiplier

VI. COMPARISON BETWEEN SERIAL AND PARALLEL MULTIPLIER
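Because the two designs differ in both clock period and cycles per product, a rough estimate of the time per product helps interpret the comparison. Assuming, as a simplification, that the 32-bit serial design needs the (2N+1) cycles estimated in Section 3.1.1 with N = 32, and that the parallel design produces one result per clock as described in Section 5.1.2, the minimum periods from Tables 3.2 and 5.2 give:

```latex
\begin{aligned}
t_{\mathrm{serial}}   &= (2N+1)\,T_{\mathrm{serial}} = 65 \times 8.076\ \mathrm{ns} \approx 524.9\ \mathrm{ns} \\
t_{\mathrm{parallel}} &= 1 \times T_{\mathrm{parallel}} = 17.323\ \mathrm{ns} \\
t_{\mathrm{serial}} / t_{\mathrm{parallel}} &\approx 524.9 / 17.323 \approx 30.3
\end{aligned}
```

On these assumptions the serial multiplier completes roughly 1.9 million products per second against about 57.7 million for the parallel multiplier, even though its clock is more than twice as fast. The serial datapath may iterate over fewer than 32 bits, so the ratio should be read as an order of magnitude only.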

Table 6.1 and Table 6.2 compare the device utilization and timing summary of the serial and parallel multipliers. The serial multiplier is a simple design built from a serial adder and a serial-parallel multiplier. The parallel multiplier is a single-precision IEEE-compliant multiplier that uses the VHDL addition and multiplication operators. Both work with 32 bits and have exception handlers, so the difference in device utilization is modest.

Table 6.1: Logic Utilization for Serial multiplier and Parallel multiplier
Logic Utilization | Serial multiplier | Parallel multiplier
Number of Slice Flip Flops | 61 | 96
Number of 4 input LUTs | 223 | 256
Number of occupied Slices | 113 | 148
Number of bonded IOBs | 223 | 104
Total equivalent gate count for design | 2316 | 2985

Table 6.2: Timing Summary for Serial multiplier and Parallel multiplier
Timing Parameter | Serial multiplier | Parallel multiplier
Minimum period | 8.076 ns | 17.323 ns
Maximum frequency | 123.823 MHz | 57.725 MHz

VII. CONCLUSION
This work presented the design, synthesis and simulation of 32-bit floating-point multipliers. Two different algorithms for high-speed hardware multipliers, serial and parallel, were studied and implemented. The serial multiplier uses an add-and-shift algorithm, while the parallel multiplier uses the VHDL multiplication operator. The parallel multiplier is an IEEE-compliant 32-bit floating-point multiplier that satisfies the rounding and exception-handling requirements. A control circuit was used to control the arithmetic network of each multiplier. The designs were developed in the Xilinx ISE environment with VHDL for design entry, and were targeted at the XC3S500E FPGA. The designs were tested thoroughly, and the calculations were verified against previous results. A comparative analysis was carried out for both multipliers.
The device utilization is almost the same, considering the additional features of the parallel multiplier. The serial multiplier operates at a higher clock frequency than the parallel configuration, but its throughput is lower.

VIII. REFERENCES
[1] Charles H. Roth, Digital System Design Using VHDL. Singapore: Thomson, 2001.
[2] William Fletcher, An Engineering Approach to Digital Design. Prentice Hall, 2005.
[3] P. Addanki and M. Avana Venkat A., "An FPGA based high speed IEEE-754 Double Precision Floating Point Adder/Subtractor and Multiplier Using Verilog," International Journal of Advanced Science and Technology, vol. 52, pp. 61-74, March 2013.
[4] Alvaro Vazquez, "High Performance Decimal Floating Point Units," PhD thesis, University of Santiago, Jan. 2009.

[5] Surapong Pongyupinpanich, "Optimal Design of Fixed-Point and Floating-Point Arithmetic Units for Scientific Applications," PhD thesis, Darmstadt University, Germany, 2012.
[6] Hossam A. H. Fahmy, "A Redundant Digit Floating Point System," PhD thesis, Stanford University, 2003.
[7] Anjana S and Philip Samuel Pradeep C., "Synthesize of High Speed Floating-point Multipliers Based on Vedic Mathematics," in ICICT-2014, 2015, pp. 1294-1302.
[8] Galal S and M. Horowitz, "Energy-Efficient Floating-Point Unit," IEEE Transactions on Computers, 2011, pp. 913-922.
[9] Concordia University, "Floating Point Adders and Multipliers."
[10] Eduardo Sanchez, "Floating-Point Multipliers."
[11] Sukhvir Kaur and Parminder Singh Jassal, "Synthesis of Double Precision Floating Point Multiplier Using VHDL," Journal of Research in Electrical and Electronics Engineering, vol. 2, no. 2, pp. 33-39, March 2014.
[12] P. Krishna Kumari, V. Vamsi Krishna, T. S. Trivedi, P. Gayatri and V. Nancharaiah, "Design of Floating Point Multiplier Using VHDL," International Journal of Engineering Research and Development, vol. 10, no. 3, pp. 73-78, March 2014.
[13] Bernie New, Bob Slous and Tom Kean, "A Fast Constant Coefficient Multiplier for the XC6200," Xilinx Application Note.
[14] Baljinder Kaur and Vipasha Thakur, "Review of Booth Algorithm for Design of Multiplier," International Journal of Emerging Technology and Advanced Engineering, vol. 4, no. 4, pp. 134-137, April 2014.
[15] Steve Wong, J. Martin and Dr. David Parent, "A 6 Bit Multiplier for a DSP SOC."
[16] Prashanth, P. A. Kumar and G. Sreenivasulu, "Design & Implementation of Floating Point ALU on a FPGA Processor," in International Conference on Computing, Electronics and Electrical Technologies (ICCEET), Kumaracoil, 2012, pp. 772-776.