Synthesis and Simulation of Floating Point Multipliers

Dr. P. N. Jain¹, Dr. A. J. Patil², M. Y. Thakre³

¹Professor and Academic Dean, Department of E&TC, Shri. Gulabrao Deokar College of Engineering, Jalgaon, India.
²Principal, Department of E&TC, Shri. Gulabrao Deokar College of Engineering, Jalgaon, India.
³PG Student, Department of E&TC, Shri. Gulabrao Deokar College of Engineering, Jalgaon, India.
Email: jainpnj@gmail.com, principal@sgdcoejalgaon.org, thakre.mayuri23@gmail.com

ABSTRACT

Performance of floating point arithmetic units is of prime importance in several areas of computing, signal processing and medical imaging. The binary representation of decimal floating-point numbers permits an efficient implementation of the radix-independent IEEE standard for floating-point arithmetic. Multiplication is a representative and core operation which demands a high-performance and area-efficient implementation. A binary multiplier is an integral part of the arithmetic logic unit (ALU) found in various processors. Integer multiplication can be inefficient and costly, in time and hardware, depending on the representation of signed numbers. In this project, methods of implementing floating point multiplication under the IEEE 754 standard are explored. The prime objective is to develop a multiplier which is compliant with the IEEE floating-point standard. For this purpose two different algorithms were studied, and designs were developed for a serial multiplier and a parallel multiplier. A control circuit that drives the arithmetic network was used to implement each multiplier. This work is primarily aimed at implementing floating-point multiplication on an FPGA platform using VHDL, owing to the logic resources available and the flexibility of implementation. Xilinx ISE 9.2i was used for this purpose. The use of VHDL provided a technology-independent hardware design.
The design was targeted at the Spartan-3 FPGA, and the device chosen was the XC3S500e-5pq208. A test bench was designed and the results were verified against hand calculations and published results. The two multipliers were compared on the basis of device utilization, timing summary and throughput. The performance of the present multiplier could be improved by pipelining, with a trade-off in area and power.

2015, IJAFRC All Rights Reserved www.ijafrc.org

I. FLOATING POINT MULTIPLICATION

Floating point multiplication of two binary numbers can be expressed mathematically as:

$$(F_1 \times 2^{E_1}) \times (F_2 \times 2^{E_2}) = (F_1 \times F_2) \times 2^{(E_1 + E_2)} = F \times 2^{E}$$

The basic steps involved in FP multiplication are as follows:
1. Add exponents
2. Multiply fractions
3. If product is zero, adjust for proper zero
4. Normalize product fraction
5. Check for exponent overflow or underflow
6. Round product fraction
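The six steps listed above can be modelled in software before any hardware is written. The following is a minimal Python sketch, not the authors' VHDL. It assumes a toy format with a normalized fraction satisfying 1/2 <= |F| < 1, a 4 bit fraction, an exponent range of -8 to 7, zero stored with the most negative exponent, and underflow flushed to zero. The names `fp_multiply`, `FRAC_BITS`, `EXP_MIN` and `EXP_MAX` are hypothetical.

```python
from fractions import Fraction

FRAC_BITS = 4               # hypothetical fraction width used for rounding
EXP_MIN, EXP_MAX = -8, 7    # hypothetical 4-bit two's complement exponent range


def fp_multiply(f1, e1, f2, e2):
    """Return (F, E) with F * 2**E approximating (f1 * 2**e1) * (f2 * 2**e2).

    Inputs are expected to satisfy |f1| < 1 and |f2| < 1.
    """
    e = e1 + e2                          # 1. add exponents
    f = Fraction(f1) * Fraction(f2)      # 2. multiply fractions
    if f == 0:                           # 3. proper zero (assumed: F = 0, E = EXP_MIN)
        return Fraction(0), EXP_MIN
    while abs(f) < Fraction(1, 2):       # 4. normalize: shift left until 1/2 <= |F|
        f *= 2
        e -= 1
    # 6. round to FRAC_BITS fractional bits (round() on a Fraction rounds half to even)
    f = Fraction(round(f * 2**FRAC_BITS), 2**FRAC_BITS)
    if abs(f) >= 1:                      # rounding carried out of the fraction: renormalize
        f /= 2
        e += 1
    # 5. exponent range check, done last so it also covers the rounding adjustment
    if e > EXP_MAX:
        raise OverflowError("exponent overflow")
    if e < EXP_MIN:
        return Fraction(0), EXP_MIN      # assumed policy: flush underflow to zero
    return f, e
```

For example, `fp_multiply(Fraction(3, 4), 2, Fraction(-5, 8), -1)` multiplies 3 by -0.3125 and returns the fraction -15/16 with exponent 0. The exponent check is placed after rounding here only so that one check covers both normalization and rounding; a hardware design may order these steps differently.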

Thus multiplication consists of the addition of exponents and the multiplication of fractions. The product fraction is then normalized and the exponent is checked for overflow or underflow. Finally the product fraction is rounded. The flow chart for floating point multiplication is shown in Figure 1 [1].

Figure 1: Flow Chart Diagram for Floating Point Multiplication

II. SYSTEM IMPLEMENTATION

A control circuit that drives the arithmetic network was used to implement the multiplier. This concept is important for coordinating the behavior of the surrounding subsystems. The use of a control unit, or system controller, leads to systematic and structured approaches to digital system design [2]. Two different algorithms for high speed hardware multipliers were studied and chosen for implementation:
1. Serial multiplier
2. Parallel multiplier

2.1 SERIAL MULTIPLIER:

The hardware required to implement the FP multiplier consists of an exponent adder and a fraction multiplier. The project work started with the development of a simple multiplier unit and an adder circuit for floating point numbers represented by a 4 bit fraction and a 4 bit exponent [1]. The basic multiplier unit was modified to multiply signed binary numbers. The design of the serial adder with accumulator is described in the next section. The binary multiplier and the signed binary multiplier are discussed in [1]. This 4 bit multiplier was synthesized, simulated and tested for various combinations, and the same system was extended to 32 bit floating point multiplication.

2.2 Design of an adder for serial multiplier:

A serial adder with accumulator, as in [1], was used for this purpose. The block diagram for a 4 bit serial adder with control circuit is shown in Figure 2-2.

Figure 2-2: Block diagram for a 4 bit serial adder

Two shift registers are used to hold the 4-bit numbers to be added, X and Y. The box at the left end of each shift register represents the inputs: Sh (shift), SI (serial input), and Clock. When Sh = 1 and the clock is pulsed, SI is entered into the leftmost bit of the X register (or Y register) as the contents of the register are shifted right one position. The X register serves as the accumulator, and after four shifts the number X is replaced with the sum of X and Y. The addend register is connected as a cyclic shift register, so after four shifts it is back to its original state and the number Y is not lost. The serial adder consists of a full adder and a carry flip-flop. At each clock time, one pair of bits is added. When Sh = 1, the falling edge of the clock shifts the sum bit into the accumulator, stores the carry bit in the carry flip-flop, and causes the addend register to rotate right. Additional connections are needed for initially loading the X and Y registers and clearing the carry flip-flop [5].

2.3 Design of a Multiplier unit:

Figure 2.3 shows the hardware required to multiply two 4 bit fractions. Multiplication of two 4-bit numbers needs a 4-bit multiplicand register, a 4-bit multiplier register, a 4-bit full adder, and an 8-bit register for the product. The product register serves as an accumulator that accumulates the sum of the partial products.
The contents of the product register are shifted to the right each time, as shown in the block diagram of Figure 2.4. This type of multiplier is sometimes referred to as a serial-parallel multiplier, since the multiplier bits are processed serially but the addition takes place in parallel. Depending on whether the current multiplier bit is 0 or 1, either a shift or an add followed by a shift takes place. When multiplication is complete, the ACC register contains the product.
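The shift and add behavior of the product register can be illustrated with a short software model. This is a sketch for unsigned operands only, and it is not the authors' VHDL. It assumes the multiplier is initially loaded into the lower half of the accumulator, as is common for this kind of design. The function name `shift_add_multiply` and the parameter `n` are hypothetical.

```python
def shift_add_multiply(multiplicand, multiplier, n=4):
    """Unsigned n-bit by n-bit multiply using a single shifting accumulator."""
    if not (0 <= multiplicand < (1 << n) and 0 <= multiplier < (1 << n)):
        raise ValueError("operands must fit in n bits")
    acc = multiplier                     # lower n bits hold the multiplier, upper bits are 0
    for _ in range(n):                   # one pass per multiplier bit
        if acc & 1:                      # current multiplier bit is 1:
            acc += multiplicand << n     #   add the multiplicand into the upper half
        acc >>= 1                        # shift the whole accumulator right one position
    return acc                           # accumulator now holds the 2n-bit product
```

With 4 bit operands, `shift_add_multiply(13, 11)` returns 143. The Python integer absorbs the carry out of the upper half, which in hardware would need one extra accumulator bit.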

Figure 2-3: 4 bit multiplier

2.4 Design description and testing

The VHDL code for the system uses a behavioral style description with two processes. The main process generates control signals for the system. A second process generates the control signals for the fraction multiplier. Generation of unwanted latches was avoided by initializing the output signals. The 4 bit FP multiplier unit was tested to cover all the special cases in combination with positive and negative fractions, as well as positive and negative exponents [8].

III. PERFORMANCE ANALYSIS

This section presents the synthesis and simulation outcomes of the floating point multipliers.

3.1 RESULTS FOR SERIAL FLOATING POINT MULTIPLIER

A 4 bit multiplier was synthesized and tested thoroughly, and the same design was extended to a 32 bit multiplier.

3.1.1 Synthesis Outcomes:

Table 3.1 and Table 3.2 show the device utilization and timing summary for the 4 bit and 32 bit serial floating point multipliers. As expected, the 32 bit multiplier has much higher device utilization, since device utilization grows with the bit width of the multiplier. The timing summary shows that the maximum clock frequency at which the multiplier will work is almost the same for both widths [6]. Since this is a serial multiplier, the final product is generated only after the complete multiplication. For an N bit multiplier, the final product takes about (2N+1) clock cycles, so the throughput is low [7].

Table 3.1: Device Utilization for 4 bit and 32 bit Serial Multiplier

Logic Utilization                        4 bit multiplier   32 bit multiplier
Number of Slice Flip Flops               18                 61
Number of 4 input LUTs                   61                 223
Number of occupied Slices                33                 113
Number of bonded IOBs                    32                 223
Total equivalent gate count for design   628                2316

Table 3.2: Timing Summary for 4 bit and 32 bit Serial Multiplier

Timing Parameter     4 bit multiplier   32 bit multiplier
Minimum period       8.004 ns           8.076 ns
Maximum Frequency    124.938 MHz        123.823 MHz

3.1.2 Simulation waveforms:

Figure 3-1 shows the results obtained after simulation of the 4 bit multiplier. The inputs are tested for the combinations in [3][4]. The signals have the interpretation as in Table 3.1. The start signal is driven high to start the multiplication, which takes 9 cycles to complete. The done signal is asserted when the multiplication is complete [5].

Figure 3-1: Simulation Results for 4 bit serial Multiplier

IV. PARALLEL MULTIPLIER

The goal of the design, to develop an IEEE compliant 32-bit floating point multiplier core, was met using a simple methodology in which VHDL operators were used directly [11]. The core was required to implement all four rounding modes: round to nearest, round toward +inf, round toward -inf and round to zero. All exceptions had to be handled and reported according to the IEEE standard. A generic constant K, which can be set to 32 or 64, allows the core to be extended to the double precision format [12][13].

4.1 Microarchitecture of Parallel Multiplier

Figure 4.1 shows a simple FP multiplier based on the parallel approach. It consists of a Pre Normalize block, a Multiplier, a Post Normalize - Round unit and an exception unit [14]. The two floating point numbers opa and opb serve as inputs to the floating point multiplier.
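To make the role of the two operands concrete, the sketch below splits an IEEE 754 single precision value into its sign, biased exponent and fraction fields (1, 8 and 23 bits, with an exponent bias of 127), and derives the sign and biased exponent of a product before normalization. It is a Python illustration for normalized operands only and does not reproduce the authors' VHDL blocks. The function names `unpack_single` and `result_sign_and_exponent` are hypothetical, and `opa` and `opb` are used only to mirror the input names in the text.

```python
import struct


def unpack_single(x):
    """Split a value, encoded as IEEE 754 single precision, into its three fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]   # raw 32-bit pattern
    sign = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF         # 23 bits, hidden leading 1 not stored
    return sign, exponent, fraction


def result_sign_and_exponent(opa, opb):
    """Sign and biased exponent of opa * opb before normalization and rounding.

    Valid for normalized, finite, non-zero operands only.
    """
    sign_a, exp_a, _ = unpack_single(opa)
    sign_b, exp_b, _ = unpack_single(opb)
    sign = sign_a ^ sign_b             # product is negative if exactly one input is
    exponent = exp_a + exp_b - 127     # remove one bias so the sum stays biased once
    return sign, exponent
```

For instance, `unpack_single(1.0)` gives `(0, 127, 0)`, and `result_sign_and_exponent(2.0, -4.0)` gives `(1, 130)`, which matches the encoding of -8.0. Zero, subnormal, infinite and NaN inputs need the separate exception handling that the text assigns to the exception unit.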

A pre-normalization process is carried out in which the sum/difference of exponents is computed, and the inputs are checked for exponent overflow, underflow and INF values.

Figure 4-1: Block Diagram for Floating Point Multiplier

4.2 Design description and testing

This multiplier also uses VHDL for design entry. The code is a mixed-style description, combining dataflow and behavioral styles. Multiple processes are used in this design. These processes load the inputs, pre-normalize them and unpack the floating point numbers. After the initial checks and the multiplication, a process post-normalizes and packs the generated output. Rounding and exception handling are also performed. A simple testing strategy is used in which combinations of inputs are applied, which allows the features of the design to be explored.

V. RESULTS FOR PARALLEL FLOATING POINT MULTIPLIER:

The single precision IEEE compliant multiplier is referred to as the Parallel Multiplier.

5.1.1 Synthesis Outcomes:

Table 5.1 presents the device utilization summary for the Parallel Multiplier.

Table 5.1: Device Utilization Summary for Parallel Multiplier

Logic Utilization                        Parallel Multiplier
Number of Slice Flip Flops               96
Number of 4 input LUTs                   256
Number of occupied Slices                148
Number of bonded IOBs                    104
Total equivalent gate count for design   2985

Table 5.2 presents the timing summary, indicating the highest speed at which the Parallel Multiplier can work on the XC3S500e.

Table 5.2: Timing Summary for Parallel Multiplier

Timing Parameter     Parallel Multiplier
Minimum period       17.323 ns
Maximum Frequency    57.725 MHz

5.1.2 Simulation waveforms

Figure 5-1 shows the simulated output for the Parallel Multiplier. The inputs are tested for the combinations as in Table 4.8. The simulated results match the hand calculations, which were verified [9][10]. The signals have the interpretation as in the Table. Every time the ce signal is made high, the operands are loaded into the multiplier on the rising edge of the clock. The computation takes place in a single cycle using the multiplication operator. The done signal is asserted only after a successful, valid multiplication operation [15][16]. In case of an exception, such as an input operand being zero or NaN, the done signal remains low.

Figure 5-1: Simulation waveform for the Parallel Multiplier

VI. COMPARISON BETWEEN SERIAL AND PARALLEL MULTIPLIER

Table 6.1 and Table 6.2 present the comparison of device utilization and timing summary for the serial and the parallel multiplier. The serial multiplier is a simple multiplier built from a serial adder and a serial-parallel multiplier. The parallel multiplier is a single precision IEEE compliant multiplier. It uses the addition and multiplication operators to serve the purpose. Both work with 32 bits and have exception handlers, so little difference is observed in device utilization.

Table 6.1: Logic Utilization for Serial multiplier and Parallel multiplier

Logic Utilization                        Serial multiplier   Parallel multiplier
Number of Slice Flip Flops               61                  96
Number of 4 input LUTs                   223                 256
Number of occupied Slices                113                 148
Number of bonded IOBs                    223                 104
Total equivalent gate count for design   2316                2985

Table 6.2: Timing Summary for Serial multiplier and Parallel multiplier

Timing Parameter     Serial multiplier   Parallel multiplier
Minimum period       8.076 ns            17.323 ns
Maximum Frequency    123.823 MHz         57.725 MHz

VII. CONCLUSION

This work presented the design, synthesis and simulation of 32 bit floating point multipliers. Two different algorithms for high speed hardware multipliers, serial and parallel, were studied and chosen for implementation. The serial multiplier uses an add and shift algorithm for multiplication, while the parallel multiplier uses the multiplication operator. The parallel multiplier is an IEEE compliant 32-bit floating point multiplier satisfying all the requirements of rounding and exception handling. A control circuit that drives the arithmetic network was used to implement the multipliers. The designs were developed in the Xilinx ISE environment, with VHDL used for design entry. The modules were targeted for FPGA implementation and the XC3S500e was chosen for this purpose. The proposed designs were exhaustively tested and the calculations were verified against previous results. A comparative analysis was done for both the multipliers.
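The relationship between clock speed and throughput can be checked with simple arithmetic on the reported figures. The sketch below is illustrative only. It takes the minimum periods from Table 6.2 and the (2N+1) cycle count quoted for the serial multiplier, and it assumes that N = 32 applies to the 32 bit serial design, that the parallel multiplier delivers one product per clock cycle, and that neither design overlaps successive multiplications. The names used are hypothetical.

```python
SERIAL_PERIOD_NS = 8.076     # Table 6.2, serial multiplier minimum period
PARALLEL_PERIOD_NS = 17.323  # Table 6.2, parallel multiplier minimum period


def serial_cycles(n_bits):
    """Cycle count quoted in the text for an N bit serial multiplier."""
    return 2 * n_bits + 1


def products_per_second(cycles_per_product, period_ns):
    """Sustained rate if a new product starts only after the previous one ends."""
    return 1e9 / (cycles_per_product * period_ns)


serial_latency_ns = serial_cycles(32) * SERIAL_PERIOD_NS   # assumes N = 32
parallel_latency_ns = 1 * PARALLEL_PERIOD_NS               # assumes one cycle per product
ratio = serial_latency_ns / parallel_latency_ns

print(f"serial:   {serial_latency_ns:.2f} ns per product")
print(f"parallel: {parallel_latency_ns:.3f} ns per product")
print(f"parallel to serial throughput ratio: {ratio:.1f}")
```

Under these assumptions the serial design needs about 525 ns per product against about 17 ns for the parallel design, roughly a factor of 30, even though the serial clock is about twice as fast.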
The device utilization is almost the same, considering the additional features of the parallel multiplier. The serial multiplier operates at a higher clock frequency than the parallel configuration, but its throughput is lower.

VIII. REFERENCES

[1] Charles H. Roth, Digital System Design Using VHDL. Singapore: Thomson, 2001.
[2] William Fletcher, An Engineering Approach to Digital Design. Prentice Hall, 2005.
[3] P Addanki and M Avana Venkat A., "An FPGA based high speed IEEE-754 Double Precision Floating Point Adder/Substractor and Multiplier Using Verilog," International Journal of Advanced Science and Technology, vol. 52, pp. 61-74, March 2013.
[4] Alvaro Vazquez, "High Performance Decimal Floating point Units," University of Santiago, PhD thesis, Jan 2009.
[5] Surapong Pongyupinpanich, "Optimal Design of Fixed-Point and Floating-Point Arithmetic Units for Scientific Applications," Darmstadt University, Germany, PhD thesis, 2012.
[6] Hossam A. H. Fahmy, "A Redundant Digit Floating Point System," Stanford University, PhD thesis, 2003.
[7] Anjana S and Philip Samuel Pradeep C., "Synthesize of High Speed Floating-point Multipliers Based on Vedic Mathematics," in ICICT-2014, 2015, pp. 1294-1302.
[8] Galal S and M. Horowitz, "Energy-Efficient Floating-Point Unit," in IEEE Transactions on Computers, 2011, pp. 913-922.
[9] Concordia University, "Floating Point Adders and Multipliers".
[10] Eduardo Sanchez, Floating-Point Multipliers.
[11] Sukhvir Kaur and Parminder Singh Jassal, "Synthesis Of Double Precision Floating Point Multiplier Using VHDL," Journal of Research in Electrical and Electronics Engineering, vol. 2, no. 2, pp. 33-39, March 2014.
[12] P. Krishna Kumari, V. Vamsi Krishna, T. S. Trivedi, P. Gayatri and V. Nancharaiah, "Design of Floating Point Multiplier Using Vhdl," International Journal of Engineering Research and Development, vol. 10, no. 3, pp. 73-78, March 2014.
[13] Bernie New and Bob Slous Tom Kean, "A Fast Constant Coefficient Multiplier for the XC6200," Xilinx, Application Note.
[14] Baljinder Kaur and Vipasha Thakur, "Review of Booth Algorithm for Design of Multiplier," International Journal of Emerging Technology and Advanced Engineering, vol. 4, no. 4, pp. 134-137, April 2014.
[15] Steve Wong, J Martin Dr David Parent, "A 6 Bit Multiplier for a DSP SOC".
[16] Prashanth, P.A. Kumar, and G. Sreenivasulu, "Design & implementation of floating point ALU on a FPGA processor," in International Conference on Computing, Electronics and Electrical Technologies (ICCEET), Kumaracoil, 2012, pp. 772-776.