Hardware Implementation of 16*16 bit Multiplier and Square using Vedic Mathematics
Abhijeet Kumar, Lecturer, MMEC, Ambala, abhi_459@yahoo.co.in
Dilip Kumar, Design Engineer, CDAC, Mohali, dilipkant@rediffmail.com
Siddhi, Student, PEC Chandigarh, siddhisri_21@yahoo.com
ABSTRACT
Multiplication and squaring are elementary mathematical operations that are extremely important in core computing. Exponentiation, the process of raising a base number to a power, is also an important operation in many numerical computations. To keep pace with technology, high speed applications require faster multiplier and square architectures. This paper reports a faster algorithm for multiplication and squaring based on ancient Indian mathematics, called Vedic Mathematics. An architecture for a 16*16 bit multiplier and square is proposed and described in the VHDL hardware description language. The code is simulated using ModelSim SE 5.7f and synthesized using Xilinx ISE 9.2i for the FPGA device Spartan XC3S500e-fg320, Speed Grade -4. Synthesis showed reduced delay for the multiplier and square. The proposed design is also compared with other existing methods and shows improved efficiency in both speed and area.
I. INTRODUCTION
With the rapid development and wide application of computer technology, high performance applications have become extremely popular in modern computer systems, requiring enhanced computation capabilities at low cost and power consumption. Also, in contemporary signal processing and communication applications, a high throughput rate and numerical accuracy are often demanded. Multiplication is the most critical function carried out by the processor: it requires a larger number of computation steps than other operations and therefore limits the overall performance of the system. An improvement in the multiplier architecture is therefore needed for high speed applications.
There are several algorithms for multiplication, such as Booth, carry-save, array, modified Booth and Wallace tree. A large number of architectures have been developed in accordance with these algorithms, indicating good performance efficiency. In an array multiplier, a combinational circuit is used to multiply two binary numbers. The architecture resembles an array, which is also an efficient layout for combinational logic. All of the product bits are obtained simultaneously, resulting in a faster method. However, it requires a large number of gates and for this reason it is less economical.
Another way to improve multiplier efficiency is the arrangement of adders, that is, the tree method. Two algorithms are based on this method: carry-save array (CSA) and Wallace tree. In the CSA method, bits are added one by one and the carry output is passed to another adder at the next higher bit position, thus forming the shape of a tree. Its limitation is that the execution time depends on the number of bits of the multiplier, so it is not suited to high speed operation. In the Wallace tree method, a three-input Wallace tree circuit supplies its sum output to the next stage full adder of the same bit position and its carry output to the next stage full adder at the next higher bit position. The drawback of this method is that the layout is not easy and the circuit is quite irregular.
A further improvement is to reduce the number of partial products generated. One such multiplier is the Booth recoding multiplier. It examines three bits, two from the present pair and one from the high order bit of the adjacent lower order pair, and converts them by Booth logic into a set of five control signals that control the operations performed by the adder cells in the array. The high performance of the Booth multiplier comes with the drawback of power consumption.
The reason is the large number of adder cells.
This paper presents a novel architecture for a 16*16 bit multiplier and square that attempts to solve the aforesaid problems by adopting the Vedic Mathematics aphorism called Urdhva Tiryakbhyam and the duplex property.
International Conference on Signal, Image and Video Processing (ICSIVP) 2012 309
This paper is organized as follows. Section 2 presents related work on the implementation of Vedic Sutras. Section 3 gives a brief description of Vedic Mathematics and its aphorisms or sutras. Section 4 provides an overview of the multiplication method and the architecture of the proposed multiplier and square.
Section 5 demonstrates the results from the synthesis tool and a comparative study of the proposed architecture and other existing architectures. Section 6 concludes the paper.
II. RELATED WORK
Similar work was presented by Chidgupkar et al. [1] for the multiplication of two decimal numbers using a Vedic Sutra. That design, however, was implemented in assembly language on 8085 and 8086 microprocessors for exploration in Digital Signal Processing. Thapliyal et al. [2] developed a time-area-power efficient multiplier architecture based on Vedic Mathematics. The design was described both at gate level and as high level RTL code in the Verilog hardware description language and tested using the Veriwell simulator. Singh et al. [3] gave an introduction to various Vedic sutras and their specific utility and proposed a general method to perform any multiplication. That work attempts to formulate an interactive general strategy for the design and hardware implementation of an 8*8 bit multiplication method based on the principles of Vedic Mathematics.
III. VEDIC MATHEMATICS
Vedic mathematics is the name given to an ancient system of mathematics or, to be precise, a unique technique of calculation based on simple rules and principles with which mathematical problems can be solved, be they in arithmetic, algebra, geometry or trigonometry. The system is based on 16 Vedic sutras or aphorisms, which are word formulae describing natural ways of solving a whole range of mathematical problems. Vedic mathematics was rediscovered from the ancient Indian scriptures between 1911 and 1918 by Sri Bharati Krishna Tirthaji (1884-1960), a scholar of Sanskrit, mathematics, history and philosophy [5]. He studied these ancient texts for years and, after careful investigation, was able to reconstruct a series of mathematical formulae called sutras.
Vedic mathematics is an easy and natural way to do mathematics. It helps increase speed, accuracy and analytical power, and answers appear in one line. Another advantage of Vedic mathematics is that it offers choices: the same calculation can be done by different methods. In this way, Vedic mathematics helps in the holistic development of the brain, and children become more creative, inventing their own methods and understanding what they are doing. There is also often a choice about whether to calculate from left to right or from right to left. The Vedic mathematics approach is quite different from conventional methods and is considered very close to the way a human mind works. A large amount of work has so far been done in understanding its various methodologies.
The Sutras apply to and cover each and every part of each and every chapter of each and every branch of mathematics (including arithmetic, algebra, geometry, trigonometry, astronomy, calculus etc.). In fact, there is no part of mathematics, pure or applied, which is beyond their jurisdiction. Even if applications of the Sutras were demonstrated in all the main areas of modern mathematics, we would still probably not understand why and how the Sutras form a basis for mathematics in general. The only option is to show that the Sutras themselves have some sort of universality and can thereby form a set of principles that inevitably cover all of mathematics. In fact, we can go further and say that if sixteen Sutras cover all of mathematics, they must express universal principles and must in some way form a complete set.
With so many advantages, Vedic Mathematics provides the possibility of solving the same problem in several alternative ways.
IV. METHODOLOGY
A. Multiplication Method
One of the aphorisms of Vedic Mathematics applied to multiplication is Urdhva Tiryakbhyam (Vertical and Crosswise), which is also the foundation of the proposed design.
It is based on a novel concept in which all partial products are generated concurrently with their addition. The parallelism in the generation of partial products and their summation is obtained by vertical and crosswise multiplication and addition. According to this algorithm, 4*4 bit multiplication can be carried out in the following way. First, the least significant bits are multiplied, which gives the least significant bit of the product (vertical). Then, the LSB of the multiplicand is multiplied with the next higher bit of the multiplier and added to the product of the LSB of the multiplier and the next higher bit of the multiplicand (crosswise). The sum gives the second bit of the product, and the carry is added to the next stage sum, which is obtained by the crosswise and vertical multiplication and addition of the three least significant bits of the two numbers. Next, all four bits are processed with crosswise multiplication and addition to give the sum and carry. The sum is the corresponding bit of the product, and the carry is again added to the next stage, the multiplication and addition of the three bits excluding the LSB. The same operation continues until the two MSBs are multiplied; this last sum, together with its carry, gives the most significant bits of the product. To make the methodology clearer, an alternate illustration is given with the help of line diagrams in Figure 1, where the dots represent bits (0 or 1).
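The vertical and crosswise steps described above can be mirrored in a short software model. The following Python sketch is only a behavioural illustration, not the VHDL of the proposed design; the helper names `to_bits`, `from_bits` and `urdhva_multiply` are hypothetical, and the bit lists are assumed to be stored LSB first.

```python
def to_bits(value, width):
    """Return the `width` low-order bits of `value`, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Rebuild an integer from a list of bits stored LSB first."""
    return sum(bit << i for i, bit in enumerate(bits))


def urdhva_multiply(a_bits, b_bits):
    """Multiply two equal-length bit lists column by column.

    Column k collects every 1-bit product a[i] AND b[k - i]
    (the vertical and crosswise products of that step), adds the
    carry from the previous column, keeps the low bit as product
    bit k and passes the rest on as the next carry.
    """
    n = len(a_bits)
    if len(b_bits) != n:
        raise ValueError("operands must have the same number of bits")
    product = []
    carry = 0
    for k in range(2 * n - 1):
        first = max(0, k - n + 1)   # lowest index of a used in column k
        last = min(k, n - 1)        # highest index of a used in column k
        column = sum(a_bits[i] & b_bits[k - i] for i in range(first, last + 1))
        total = column + carry
        product.append(total & 1)
        carry = total >> 1
    # The carry out of the MSB column is the top bit of the 2n-bit product.
    product.append(carry)
    return product
```

For example, `from_bits(urdhva_multiply(to_bits(13, 4), to_bits(11, 4)))` walks through the seven columns of the 4*4 bit case. In hardware all column sums are formed in parallel; the loop here only models the carry chain between them.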
FIGURE 1. Line diagrams for four bits.
According to this example, the digits on the two ends of each line are multiplied and the result is added to the previous carry. When a step contains more than one line, all the results are added to the previous carry, and the process continues in this way. Initially, the previous carry is zero. The units digit of each sum becomes one digit of the answer, while the remaining digits act as the carry into the next step. If the multiplier and multiplicand do not have the same number of digits, the longer one determines the digit count, and the shorter one is prepended with 0s so that both have the same number of digits [6].
Proposed Architecture
From the previous discussion, it is clear that the basic building blocks of this multiplier are one bit multipliers and adders. One bit multiplication can be performed with a two-input AND gate, and a full adder can be used for addition. The 8x8 bit multiplier can be structured using 4X4 bit blocks as shown in Figure 2. In this figure the 8 bit multiplicand A is decomposed into a pair of 4 bit halves AH-AL. Similarly, the multiplier B is decomposed into BH-BL. Writing each operand as its high half weighted by $2^4$ plus its low half, the 16 bit product is: $P = A \times B = (A_H \cdot 2^4 + A_L) \times (B_H \cdot 2^4 + B_L) = (A_H \times B_H) \cdot 2^8 + (A_H \times B_L + A_L \times B_H) \cdot 2^4 + A_L \times B_L$. The outputs of the 4X4 bit multipliers are added accordingly to obtain the final product. Thus, in the final stage two adders are also required.
FIGURE 2. 8 X 8 bit decomposed Vedic multiplier
Architecture for 16x16 bit multiplier
The 16X16 bit multiplier can be structured using 8X8 bit blocks as shown in Figure 3. In this figure the 16 bit multiplicand A is decomposed into a pair of 8 bit halves AH-AL. Similarly, the multiplier B is decomposed into BH-BL.
FIGURE 3. 16 X 16 bit decomposed Vedic multiplier
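The block decomposition described above (16X16 from 8X8 blocks, 8X8 from 4X4 blocks) can be sketched as a recursive software model. This is a minimal illustration under stated assumptions: the function name `vedic_multiply` is hypothetical, the recursion is carried all the way down to a 1 bit AND gate for brevity (the paper's smallest block is the 4X4 vertical and crosswise multiplier), and the shifts stand in for the wiring that aligns the four partial products before the final adders.

```python
def vedic_multiply(a, b, width):
    """Multiply two unsigned `width`-bit integers by halving the operands.

    Each operand is split into a high and a low half; four half-width
    products are formed and then added with the weights 2**width,
    2**(width // 2) and 1. `width` is assumed to be a power of two.
    """
    if width < 1 or width & (width - 1):
        raise ValueError("width must be a power of two")
    if width == 1:
        return a & b  # one bit multiplication is a two-input AND gate
    half = width // 2
    mask = (1 << half) - 1
    a_high, a_low = (a >> half) & mask, a & mask
    b_high, b_low = (b >> half) & mask, b & mask
    high_high = vedic_multiply(a_high, b_high, half)
    high_low = vedic_multiply(a_high, b_low, half)
    low_high = vedic_multiply(a_low, b_high, half)
    low_low = vedic_multiply(a_low, b_low, half)
    # Final stage: align the partial products and add them.
    return (high_high << width) + ((high_low + low_high) << half) + low_low
```

Calling `vedic_multiply(a, b, 16)` reproduces the 16X16 structure, with each level of recursion corresponding to one level of block decomposition.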
The outputs of the 8X8 bit multipliers are added accordingly to obtain the final 32 bit product. Thus, in the final stage two adders are also required.
B. Square Method
In most computations, the multiplier unit is used to compute the square of an operand. Since squaring is a special case of multiplication, dedicated squaring hardware can significantly improve the computation time. The squaring algorithm makes use of the Duplex or Dwandwa (D) operator. In the Duplex, we take twice the product of the outermost pair, then add twice the product of the next outermost pair, and so on until no pairs are left. When there is an odd number of bits in the original sequence, one bit is left by itself in the middle, and this enters as its square.
For a 1 bit number, D is its square.
For a 2 bit number, D is twice the product of the two bits.
For a 3 bit number, D is twice the product of the outer pair + the square of the middle bit.
For a 4 bit number, D is twice the product of the outer pair + twice the product of the inner pair.
Architecture for 8x8 bit square
The 8X8 bit square can be structured using 4X4 bit blocks as shown in Figure 4. In this figure the 8 bit operand A is decomposed into a pair of 4 bit halves AH-AL. With the high half weighted by $2^4$, the 16 bit product can be written as: $P = A \times A = (A_H \cdot 2^4 + A_L)^2 = (A_H \times A_H) \cdot 2^8 + 2 (A_L \times A_H) \cdot 2^4 + A_L \times A_L$. The outputs of the 4X4 bit multiplier and squares are added accordingly to obtain the final 16 bit product. Thus, in the final stage two carry save adders are also required.
FIGURE 4. 8 X 8 bit decomposed Vedic square
Architecture for 16x16 bit square
The 16X16 bit square can be structured using 8X8 bit blocks as shown in Figure 5. In this figure the 16 bit operand A is decomposed into a pair of 8 bit halves AH-AL.
FIGURE 5. 16 X 16 bit decomposed Vedic square
V. RESULTS AND COMPARISONS
The main objective of any design to be implemented on an FPGA is minimum chip area together with reasonable speed.
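Before turning to the synthesis figures, the Duplex-based squaring of Section IV.B can be checked functionally in software. The sketch below is an assumption-laden behavioural model rather than the synthesized design: the names `duplex` and `vedic_square` are hypothetical, bits are taken LSB first, and each result column is taken to be the Duplex of the window of operand bits that contributes to it.

```python
def duplex(bits):
    """Duplex (D) of a bit sequence.

    Twice the product of each outer pair, working inwards, plus the
    square of the middle bit when the length is odd.
    """
    count = len(bits)
    total = 0
    for i in range(count // 2):
        total += 2 * (bits[i] & bits[count - 1 - i])
    if count % 2 == 1:
        total += bits[count // 2]  # the square of a single bit is the bit itself
    return total


def vedic_square(a, width):
    """Square an unsigned `width`-bit integer column by column.

    Column k of the result is the Duplex of operand bits
    max(0, k - width + 1) .. min(k, width - 1) plus the incoming carry.
    """
    bits = [(a >> i) & 1 for i in range(width)]
    result = 0
    carry = 0
    for k in range(2 * width - 1):
        low = max(0, k - width + 1)
        high = min(k, width - 1)
        total = duplex(bits[low:high + 1]) + carry
        result |= (total & 1) << k
        carry = total >> 1
    # The carry out of the last column is the top bit of the result.
    result |= carry << (2 * width - 1)
    return result
```

Because each Duplex folds symmetric bit products into a single term, a column needs roughly half the AND operations of the corresponding multiplier column, which is consistent with the smaller area reported for the square.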
In this study, the proposed design is described in VHDL and the logic is tested in the ModelSim SE 5.7f simulator. The simulated design is synthesized to gate level and optimized for speed and area using the Xilinx tools for the device XILINX: SPARTAN 3E: XC3S500e fg320; Speed Grade: -4. The proposed architecture shows a faster response than the Booth, array and Wallace tree multipliers implemented on the same device. The results for the 16X16 bit multipliers are shown in Table 1, and those for the square and multipliers in Table 2.
It has been found that for 16X16 bit multiplication, the Vedic multiplier is the fastest. On the Xilinx Spartan 3E family its maximum combinational path delay is 32.850 ns, compared with 37.041 ns for Booth, 43.946 ns for array and 47.046 ns for Wallace tree. As far as chip area is concerned, the proposed Vedic multiplier uses a reasonable share of the FPGA device: the implementation takes a total of 799 logic cells, which indicates an efficient implementation. A comparison histogram of the timing delays of the 16X16 bit multipliers is given in Figure 6. The result shows that the Vedic multiplier dominates over the array, Booth and Wallace tree multipliers.
Table 1. Comparison results of the multiplier
FIGURE 6. Comparison of multipliers with respect to time delay in Spartan 3E FPGA
It has been found that among the 16X16 bit square and multipliers, the Vedic square is the fastest. On the Xilinx Spartan 3E family its maximum combinational path delay is 28.357 ns, compared with 32.850 ns for the Vedic multiplier, 37.041 ns for Booth, 43.946 ns for array and 47.046 ns for Wallace tree. As far as chip area is concerned, the proposed Vedic square uses a reasonable share of the FPGA device: the implementation takes a total of 397 logic cells, which indicates an efficient implementation. A comparison histogram of the timing delays of the 16X16 bit square with respect to the multipliers is given in Figure 7. The result shows that the Vedic square dominates over the Vedic multiplier, array, Wallace tree and Booth multipliers.
Table 2. Comparison of square with multipliers
FIGURE 7. Comparison of square with multipliers with respect to delay in Spartan 3E FPGA
VI. CONCLUSIONS
The need for high speed processing has been increasing as a result of expanding signal processing and computer applications. Since a computer spends a considerable amount of its processing time performing multiplication, an improvement in multiplication speed is highly desirable. Compared to other conventional methods, Vedic mathematical methods, derived from ancient systems of computation, are computationally faster and easy to perform. This work concludes that a 16X16 multiplier and square based on Vedic algorithms is more efficient in performance than the array, Wallace tree and Booth multipliers. The performance parameters are timing delay and the area of the target device utilized in the design.
The speed improvements are gained by parallelizing the generation of partial products with their concurrent summation. It is demonstrated that this design is quite efficient in terms of silicon area and speed. Such a design should enable substantial savings of FPGA resources when used for image/video processing applications. Thus, we have shown that the proposed multiplier and square design has been successfully implemented on an FPGA.
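To put the reported delays in relative terms, the maximum combinational path delays quoted in Section V can be compared directly. The short Python sketch below only restates those published figures; the dictionary `DELAYS_NS` and the helper `reduction_percent` are hypothetical names introduced for illustration.

```python
# Maximum combinational path delays (ns) as reported in Section V.
DELAYS_NS = {
    "Vedic square": 28.357,
    "Vedic multiplier": 32.850,
    "Booth": 37.041,
    "Array": 43.946,
    "Wallace tree": 47.046,
}


def reduction_percent(new_delay, old_delay):
    """Percentage by which `new_delay` is shorter than `old_delay`."""
    return 100.0 * (old_delay - new_delay) / old_delay


def print_reductions(reference):
    """Print how much shorter the reference design's delay is than each other one."""
    reference_delay = DELAYS_NS[reference]
    for name, delay in DELAYS_NS.items():
        if name != reference:
            saving = reduction_percent(reference_delay, delay)
            print(f"{reference} vs {name}: {saving:.1f}% shorter delay")


if __name__ == "__main__":
    print_reductions("Vedic multiplier")
    print_reductions("Vedic square")
```

A negative percentage in the output simply means that the reference design is slower than the design it is compared with, as happens when the Vedic multiplier is compared with the Vedic square.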
REFERENCES
[1] Chidgupkar, P. D. and Karad, M. T., The Implementation of Vedic Algorithms in Digital Signal Processing, Global Congress on Engineering Education, Vol. 8, No. 2, 2004.
[2] Thapliyal, H. and Arabnia, H. R., A Time-Area-Power Efficient Multiplier and Square Architecture Based on Ancient Indian Vedic Mathematics.
[3] Singh, B., Kumar, L. and Rana, D. R., Design and Hardware Implementation of 8 bit * 8 bit Multiplication Algorithm.
[4] Sjoholm, S. and Lindh, L., VHDL for Designers, Prentice-Hall PTR (1997).
[5] Jagadguru Swami Sri Bharati Krisna Tirthaji Maharaja, Vedic Mathematics: Sixteen Simple Mathematical Formulae from the Veda, Delhi (1965).
[6] VedicMaths.org (2004), http://www.vedicmaths.org
[7] Deschamps, Jean-Pierre and Sutter, D. Gustavo, Synthesis of Arithmetic Circuits: FPGA, ASIC and Embedded Systems, John Wiley & Sons Inc. (2006).
[8] Perry, Douglas, VHDL Programming by Example, McGraw Hill (2002).
[9] De Mori, R. and Cardin, R., "Iterative Parallel Multipliers Based on Multiplexers", Signal Processing, 1984, 6, pp. 213-223.
[10] De Mori, R. and Cardin, R., "A Recursive Algorithm for Binary Multiplication and Its Implementation", ACM Trans. Comput. Syst., 1985, 3, (4), pp. 294-314.
[11] Vishal Verma and Himanshu Thapliyal, High Speed Efficient N X N Bit Multiplier Based on Ancient Indian Vedic Mathematics, Proceedings of the 2003 International Conference on VLSI (VLSI03), Las Vegas, Nevada, June 2003.
[12] Himanshu Thapliyal and Vishal Verma, High Speed Efficient Signed/Unsigned N X N Bit Multiplier Based on Ancient Indian Vedic Mathematics, Proceedings of the 7th IEEE VLSI Design & Test Workshop, Bangalore, August 2003.
[13] Beiu, Microprocessor and a digital signal processor including adder and multiplier circuits employing logic gates having discrete and weighted inputs, United States Patent 6,516,331, February 4, 2003.
[14] A. P. Nicholas, K. R. Williams, J. Pickles, Application of Urdhava Sutra, Spiritual Study Group, Roorkee (India), 1984.
[15] A. P. Nicholas, K. R. Williams, J. Pickles, Lectures on Vedic Mathematics, Spiritual Study Group, Roorkee (India), 1982.
[16] Ait-Boudaoud, D., Ibrahim, M. K. and Hayes-Gill, B. R., "Novel Pipelined Serial/Parallel Multiplier", Electron. Lett., 1990, 26, pp. 582-583.
[17] Santoro, M. R. and Horowitz, M., "SPIM: A Pipelined 64 x 64-Bit Iterative Multiplier", IEEE J. Solid-State Circuits, 1989, 24, pp. 487-493.
[18] Ciminiera, L. and Valenzano, A., "Low cost Serial Multiplier of High Speed Specialized Processors", IEE Proc. E, 1988, 135, (5), pp. 259-265.