International Journal Of Scientific Research And Education, Volume 3, Issue 6, Pages 3529-3538, June 2015
ISSN (e): 2321-7545
Website: http://ijsae.in

Efficient Architecture for Radix-2 Booth Multiplication Using 4:2 Compressors

Authors
Deepak Patidar 1, J. Chitode 2
1 Dept. of Electronics, BVDUCOEP Pune, India
2 Dept. of Electronics, BVDUCOEP Pune, India
Email: Patidar.deepak411@gmail.com, j.chitode@gmail.com

ABSTRACT
Binary multiplication is an integral part of the multiplication and accumulation (MAC) unit. A MAC has a larger delay than simple ALU operations, so this paper focuses on the design of a multiplication unit using the modified radix-2 Booth multiplication algorithm. The design uses basic half adders, full adders, 4:2 compressors and a carry look-ahead adder; the combination of these adders forms the proposed architecture. To demonstrate the design method, an 8 by 8 modified radix-2 Booth recoded multiplier was implemented on reconfigurable hardware. The architecture is written in VHDL using the Xilinx ISE 14.7 tool and is implemented on a Spartan-6 device.

Keywords: MAC, Radix-2 Booth multiplication algorithm, 4:2 Compressors

1. INTRODUCTION
The speed demanded of processors increases day by day. According to their speed and area, multipliers can be categorised as follows:

Shift and add multipliers: Multiplication is a process of adding and shifting, so these are the simplest type of multiplier. They are also known as serial multipliers because they compute the product bit by bit. An 8-bit multiplication therefore takes a minimum of 8 clock cycles, but these multipliers occupy very little area.

Array multipliers: As the name indicates, these multipliers consist of an array of full adders arranged to compute the product, so array multipliers have a regular structure. Owing to this regular structure of full adders, array multipliers offer high performance, but for the same reason they occupy a very large area.

Tree multipliers: These multipliers are used when operands with a larger number of bits must be multiplied. They use the same structure as the array multiplier, but iteratively. The interconnections of such multipliers become very complicated and the overall structure also becomes very large; their advantage is that they can give high performance.
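As a point of reference for the first category above, the following is a minimal Python sketch of a serial shift and add multiplier. It assumes unsigned operands, and the function name `shift_and_add_multiply` and the `width` parameter are illustrative choices, not taken from the paper. Each loop iteration stands in for one clock cycle.

```python
def shift_and_add_multiply(multiplicand: int, multiplier: int, width: int = 8) -> int:
    """Multiply two unsigned `width`-bit integers, one multiplier bit per cycle."""
    product = 0
    for cycle in range(width):  # one iteration models one clock cycle
        if (multiplier >> cycle) & 1:
            # Multiplier bit is 1: add the multiplicand shifted to this bit position.
            product += multiplicand << cycle
    return product
```

With `width = 8` the loop runs eight times, which matches the minimum of 8 clock cycles quoted for an 8-bit serial multiplication.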

An efficient multiplier should therefore combine the advantages of all three types, so that the architecture is both fast and area efficient. This project presents a new multiplier architecture that is smaller and faster than linear array multipliers and more regular than traditional multiplier trees. 4:2 compressors (adders) are the main component of the architecture; each takes four inputs and gives two outputs, a sum and a carry. Half adders and full adders are also used. A carry look-ahead adder, in contrast, is used for adding 16 bits; it gives a 16-bit result, which is the final result of the multiplier.

Any binary multiplication consists of three main processes: partial product generation, reduction of the partial products, and addition of the remaining partial products. The partial products are generated using the modified radix-2 Booth multiplication algorithm; in terms of the algorithm, this process is known as recoding of the multiplier bits. Reduction of the partial products is done by half adders, full adders and 4:2 compressors, and the delay of the final addition is reduced by the carry look-ahead adder.

2. MODIFIED RADIX-2 BOOTH ALGORITHM
Radix-2 means that the algorithm examines two multiplier bits at a time and performs the appropriate operation. A single 0 bit is appended after the LSB of the multiplier, and pairs of two adjacent bits are then formed starting from the LSB, so that each multiplier bit is paired with its right-hand neighbour.
Depending on the pair, the following operations are done:

Modified Booth Recoding, Radix-2

Where
A = Multiplier
B = Multiplicand
0 = put 0000
+1 = multiplicand bits
-1 = 2's complement of the multiplicand bits

Multiplier bits A (two-bit pair)   Recoded digit   Operation on B
00                                  0               0*B
01                                 +1              +1*B
10                                 -1              -1*B
11                                  0               0*B

If the bit pair is 00 or 11, put 0000 directly on the lower half of the product (initially the product is 0000 0000).
If the bit pair is 01, put the value of the multiplicand on the lower half of the product.
If the bit pair is 10, put the 2's complement of the multiplicand on the lower half of the product.
(After every cycle, shift the product bits to the left by one.)

Let us take an example: A = +4 (0100) is the multiplier, B = +3 (0011) is the multiplicand, and the product is initially p = 0000 0000.
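Before the example is worked through by hand, the recoding table above can be sketched in Python. This is a minimal illustration under stated assumptions: operands are two's-complement integers, and the names `booth_radix2_digits` and `booth_multiply` are hypothetical helpers introduced here, not part of the paper's VHDL design. The identity used is that the recoded digit for the pair (current bit, right-hand neighbour) equals the neighbour minus the current bit.

```python
def booth_radix2_digits(multiplier: int, width: int) -> list[int]:
    """Recode a `width`-bit two's-complement multiplier into digits
    in {-1, 0, +1}, least significant digit first."""
    digits = []
    prev = 0  # the 0 appended after the LSB of the multiplier
    for i in range(width):
        bit = (multiplier >> i) & 1
        # Pair (bit, prev): 00 -> 0, 01 -> +1, 10 -> -1, 11 -> 0
        digits.append(prev - bit)
        prev = bit
    return digits


def booth_multiply(multiplier: int, multiplicand: int, width: int) -> int:
    """Sum the partial products digit * multiplicand, each weighted by 2**i."""
    digits = booth_radix2_digits(multiplier, width)
    return sum(d * (multiplicand << i) for i, d in enumerate(digits))


print(booth_radix2_digits(4, 4))  # [0, 0, -1, 1]
print(booth_multiply(4, 3, 4))    # 12
```

The printed digits correspond to the pairs 00, 00, 10 and 01 read from the LSB of the multiplier 0100 with a 0 appended.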

Now put a 0 after the multiplier bits, giving 01000, and form pairs of 2 bits starting from the LSB.
First pair is 00, so 0*B: put 0000.
Second pair is also 00, so 0*B: put 0000.
Third pair is 10, so -1*B: put 1101 (2's complement of the multiplicand bits).
Fourth pair is 01, so +1*B: put 0011 (the multiplicand bits).
Weighting the third and fourth partial products by 2^2 and 2^3 gives -12 + 24 = 12, which is the expected product 4 x 3.

Bit-pair recoding of the multiplier bits can reduce the number of nonzero partial products (summands), in particular when the multiplier contains long runs of 1s.

3. DIFFERENT ADDERS USED IN PROPOSED ARCHITECTURE
The adders used in this paper for binary multiplication are described below.

Half Adder: It adds two binary bits according to the following equations:
S = A xor B
C = A and B
where A and B are the inputs and S (sum) and C (carry) are the outputs.

Fig. 1 Half Adder

Full Adder: It adds three binary bits according to the following equations:
S = A xor B xor C_in
C = (A and B) or (B and C_in) or (A and C_in)
where A, B and C_in are the inputs and S (sum) and C (carry) are the outputs.
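The half adder and full adder equations above translate directly into bitwise operations. The sketch below is a behavioural Python model; the function names `half_adder` and `full_adder` are illustrative, and each argument is assumed to be a single bit (0 or 1).

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (S, C) with S = A xor B and C = A and B."""
    return a ^ b, a & b


def full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """Return (S, C) with S = A xor B xor C_in and
    C = (A and B) or (B and C_in) or (A and C_in)."""
    s = a ^ b ^ c_in
    c = (a & b) | (b & c_in) | (a & c_in)
    return s, c
```

In both cases the outputs satisfy S + 2*C = sum of the input bits, which is the property the reduction stages rely on.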

Fig. 2 Full Adder

4:2 Compressors: A 4:2 compressor adds four binary bits simultaneously. Its architecture is shown below: it takes four inputs (I1, I2, I3, I4) with one carry in (C_in) and gives two outputs, SUM and CARRY, with one carry out (C_out).

Fig. 3 4:2 compressor

A detailed diagram of the 4:2 compressor, which is composed of two full adders, is shown in figure 4. [4] There are many architectures of 4:2 compressors, but the full-adder-based architecture was found to give the lowest delay among those considered.

Fig. 4 4:2 compressor using full adders

Here SUM and CARRY are used as the result of the 4:2 compressor, and C_out is forwarded to the next 4:2 compressor as its C_in.

Carry Look-ahead Adder (CLA): The difference between the above three adders (half adder, full adder and 4:2 compressor) and the carry look-ahead adder is that the CLA adds two operands of the same width and gives a result of that width; for example, for 16-bit inputs it gives a 16-bit output. In contrast, the half adder, full adder and 4:2 compressor take 2, 3 and 4 input bits respectively and give a two-bit output (sum and carry).
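The 4:2 compressor built from two full adders, as described above, can be modelled in a few lines of Python. The wiring used here is the common one and is an assumption, since the figure itself is not reproduced in the text: the first full adder takes I1, I2 and I3 and produces C_out, and the second takes the intermediate sum, I4 and C_in and produces SUM and CARRY. The function names are illustrative.

```python
def full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """Single-bit full adder returning (sum, carry)."""
    return a ^ b ^ c_in, (a & b) | (b & c_in) | (a & c_in)


def compressor_4_2(i1: int, i2: int, i3: int, i4: int, c_in: int) -> tuple[int, int, int]:
    """4:2 compressor from two cascaded full adders.

    Returns (SUM, CARRY, C_out). Assumed wiring: C_out depends only on
    I1..I3, so it does not wait for C_in from the neighbouring compressor.
    """
    s1, c_out = full_adder(i1, i2, i3)        # first full adder
    total, carry = full_adder(s1, i4, c_in)   # second full adder
    return total, carry, c_out
```

Under this wiring, I1 + I2 + I3 + I4 + C_in = SUM + 2 * (CARRY + C_out) for every input combination, so CARRY and C_out both carry the weight of the next column.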

There are many types of adder for adding 16 bits, such as the ripple carry adder (RCA) and the carry save adder (CSA), but the CLA is the fastest of them. [7] This paper uses a structural 16-bit CLA: a 4-bit CLA is first built as a component, and four such CLAs are then cascaded to make the 16-bit CLA.

Fig. 5 4 bit CLA

Figure 5 shows the 4-bit CLA, in which the sum bits are generated as in a full adder according to the following equation:

$S_i = A_i \oplus B_i \oplus c_i$

In a CLA the carries are generated differently, which is why it is known as a carry look-ahead adder. [5] The equations for the carries are:

$c_1 = G_0 + P_0 c_0$
$c_2 = G_1 + P_1 G_0 + P_1 P_0 c_0$
$c_3 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 c_0$
$c_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 c_0$

where $G_i$ and $P_i$ are the generate and propagate functions, $G_i = A_i B_i$ and $P_i = A_i + B_i$, with $+$ denoting logical OR and juxtaposition denoting logical AND.

The 1-bit full adder cell for the 4-bit CLA is shown in figure 6.

Fig. 6 Bit cell of CLA

4. BLOCK DIAGRAM
There are mainly three steps of multiplication using the modified Booth algorithm:
i) Partial product generation
ii) Reduction of partial products
iii) Final sum of partial products
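The look-ahead carry equations given for the 4-bit CLA in Section 3 can be checked with a short behavioural model. The sketch below is in Python rather than the paper's VHDL, and the names `cla4` and `cla16` are illustrative assumptions. It computes the four carries from the generate and propagate signals and then cascades four 4-bit blocks, passing the carry out of each block to the next, to form the 16-bit adder.

```python
def cla4(a: int, b: int, c0: int) -> tuple[int, int]:
    """4-bit carry look-ahead adder; a and b are in 0..15, c0 is 0 or 1.
    Returns (4-bit sum, carry out c4)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]  # generate  G_i = A_i B_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]  # propagate P_i = A_i + B_i
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = (c0, c1, c2, c3)
    total = 0
    for i in range(4):
        bit = ((a >> i) ^ (b >> i) ^ carries[i]) & 1  # S_i = A_i xor B_i xor c_i
        total |= bit << i
    return total, c4


def cla16(a: int, b: int, c0: int = 0) -> tuple[int, int]:
    """16-bit adder from four cascaded 4-bit CLA blocks.
    Returns (16-bit sum, carry out)."""
    total, carry = 0, c0
    for block in range(4):
        shift = 4 * block
        nibble_sum, carry = cla4((a >> shift) & 0xF, (b >> shift) & 0xF, carry)
        total |= nibble_sum << shift
    return total, carry
```

For example, `cla16(0xFFFF, 0x0001)` returns `(0, 1)`: the 16-bit sum wraps to zero and the seventeenth bit appears as the carry out.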

Thus the process of multiplication is carried out using the above three steps, which can be seen in the following block diagram.

Fig. 7 Multiplication blocks for Modified booth multiplier

5. ARCHITECTURE
All three processes of multiplication (partial product generation, partial product reduction and summation of partial products) are shown in figure 8 and figure 9. Registers of size 8 bits, 16 bits and 17 bits are used to store the results of these three processes.

Partial Product Generation: Figure 9 shows an example of the 8 by 8 bit multiplication process, in which the first part consists of the partial products. The partial products are generated by the modified radix-2 Booth multiplication algorithm explained above. Eight registers are used to store these 16-bit partial products.

Fig. 8 Register view of Architecture

Partial Product Reduction: This process is divided into two stages. In the first stage, the partial product bits are grouped into twos, threes and fours as shown in figure 9. A half adder is used for each group of two, a full adder for each group of three and a 4:2 compressor for each group of four. [7] Each of these adders produces a sum and a carry, which are passed to the second stage, so there are four registers of sums and carries in the second stage. Any bit that is left ungrouped is passed to the second stage unchanged. On entering the second stage, the partial products have thus been reduced from 8 registers to 4 registers of 16 bits, as shown in figure 9.

In the second stage, the same process of grouping into twos, threes and fours and applying half adders, full adders and 4:2 compressors is repeated, and the results (sum and carry) are stored in 2 registers of 16 bits. The partial products are thus reduced from 4 registers to 2 registers of 16 bits.

Summation of Partial Products: In the last stage of multiplication, a 16-bit carry look-ahead adder adds the two remaining partial products. The result of this adder has 17 bits; the carry bit, if any, is discarded as shown in figure 9, and the remaining 16 bits are the result of the whole multiplication process.

Fig. 9 Detailed Architecture

6. RESULT
Figure 10 shows the RTL schematic of the 8 by 8 multiplier, and figure 11 shows all the stages of multiplication: the two's complement stage, partial product generation, partial product reduction (stage 1 and stage 2) and the final summation stage (big adder). The design is synthesized using the Xilinx ISE 14.7 tool.

Fig. 10 RTL view of 8 by 8 Multiplier
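As a functional cross-check of the data flow described in Section 5 (eight Booth partial products, two reduction stages, final 16-bit addition with the carry discarded), the following Python sketch models the multiplier at word level. It is a simplification under stated assumptions, not the authors' implementation: instead of the column-wise grouping into twos, threes and fours shown in figure 9, each stage applies a row of 4:2 compressors to whole 16-bit words, operands are treated as signed 8-bit values, and all function names are hypothetical.

```python
MASK16 = 0xFFFF  # keep 16 bits; anything above is the discarded carry


def booth_partial_products(a: int, b: int, width: int = 8) -> list[int]:
    """Eight 16-bit partial products from radix-2 Booth recoding of multiplier a."""
    pps, prev = [], 0
    for i in range(width):
        bit = (a >> i) & 1
        digit = prev - bit                 # pair 01 -> +1, 10 -> -1, 00/11 -> 0
        pps.append(((digit * b) << i) & MASK16)
        prev = bit
    return pps


def carry_save(x: int, y: int, z: int) -> tuple[int, int]:
    """Row of full adders: three words in, sum word and carry word out."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1  # carries move one column left
    return s & MASK16, c & MASK16


def compress_4_to_2(w0: int, w1: int, w2: int, w3: int) -> tuple[int, int]:
    """Row of 4:2 compressors: two cascaded full-adder rows."""
    s1, c1 = carry_save(w0, w1, w2)
    return carry_save(s1, c1, w3)


def multiply_8x8(a: int, b: int) -> int:
    """16-bit two's-complement product of signed 8-bit a and b."""
    pps = booth_partial_products(a, b)
    # Stage 1: 8 words -> 4 words
    stage1 = [*compress_4_to_2(*pps[:4]), *compress_4_to_2(*pps[4:])]
    # Stage 2: 4 words -> 2 words
    s, c = compress_4_to_2(*stage1)
    # Final addition; the seventeenth bit is dropped by the mask
    return (s + c) & MASK16
```

In this model, `multiply_8x8(0b00010111, 0b00100100)` returns 828, which is 0000001100111100 in binary. The final `+` stands in for the 16-bit carry look-ahead adder.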

Fig. 11 RTL view of all stages of multiplication

Figures 12 and 13 show some results of multiplication:

Fig. 12 Multiplication Example 1
where Multiplicand A = 00010111, Multiplier B = 00100100 and Result = 0000001100111100

Fig. 13 Multiplication Example 2
where Multiplicand A = 00110101, Multiplier B = 00010101 and Result = 0000010001011001

The table below compares the proposed architecture with the shift and add multiplier, the Wallace tree multiplier and the Wallace tree multiplier using 4:2 compressors.

Structure (8 by 8 multiplier)                Logic Delay (ns)   Route Delay (ns)   Total Delay (ns)
Shift and Add Multiplier                     11.884             12.432             24.876
Wallace Tree Multiplier                      14.346              8.141             22.487
Wallace Tree Multiplier using compressors    13.580              8.131             21.711
Proposed Architecture                         5.310              9.967             15.277

7. CONCLUSION
The aim of this paper, to design an 8 by 8 bit multiplier, was successfully achieved using structural design on a Spartan-6 FPGA. The multiplication delay was 15.277 ns. The circuit was broken down into basic blocks, which were then combined to generate the final multiplier schematic design. There are many structures of 4:2 compressors; among them, the best one in terms of speed is used. The proposed design is compared with previously reported designs, and the delay figures in the comparison table show that it is faster than the others. This work draws on software programming skills, arithmetic algorithms, logic styles and delay reduction techniques. The delay reduction techniques have a large effect on the performance of the multiplier, so it is important to understand their advantages and drawbacks at the beginning of the design. The choice of algorithm for implementing a parallel multiplier is also important; the modified radix-2 Booth algorithm was chosen for this multiplier. The resulting 8 by 8 bit multiplier works correctly, and its critical path delay satisfies the speed specification.

REFERENCES
1. Safwat Zaky and Zvonko Vranesic, Computer Organization, 5th Edition, ISBN: 0-07-112218-4, 2002.
2. M. V. Sathish and Sailaja, VLSI Architecture of Parallel-Accumulator based on Radix 2 Modified Booth Algorithm, International Journal of Electrical and Electronics Engineering (IJEEE), Volume 1, Issue 1, 2011.
3. R. Santhosh Kumar and P. Kalpana Reddy, A New VLSI Architecture of Parallel Multiplier-Accumulator Based on Radix-2 Modified Booth Algorithm, ISSN: 2250-1991, Volume 2, Issue 9, Sept 2013.
4. Naveen Kr. Gahaln and Prrbhat Shukla, Implementation of Wallace Tree Multiplier using Compressor, ISSN: 2229-6093, Vol. 3(3), 1194-1199.
5. Rajender Kumar and Sandeep Dahiya, Performance Analysis of Different Bit Carry Look Ahead Adder Using VHDL Environment, International Journal of Engineering Science and Innovative Technology (IJESIT), ISSN: 2319-5967, Volume 2, Issue 4, July 2013.
6. Dakupati Ravi Sankar and Shaik Ashraf Ali, Design of Wallace Tree Multiplier by Sklansky Adder, International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, Vol. 3, Issue 1, January-February 2013.
7. Keshab K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, ISBN: 978-0-471-24186-7, Dec 1998.
8. I. Hussain, R. K. Sah and M. Kumar, Performance Comparison of Wallace Multiplier Architectures, IJIRSET, ISSN: 2319-8753, Vol. 4, Issue 1, Jan 2015.
9. A. Dandapat, S. Goshal, P. Sarkar and D. Mukhopadhayay, A 1.2 ns 16 by 16 Binary Multiplier using High Speed Compressors, International Journal of Electrical and Electronics Engineering, 2010.
10. http://vedic-maths.com/delay-power-performence-comparision-of-multipliers-in-vlsi-circuit-design, www.ieeexplore.org.