CHAPTER 5

IMPLEMENTATION OF MULTIPLIERS USING VEDIC MATHEMATICS

5.1 INTRODUCTION TO VHDL

VHDL stands for VHSIC (Very High Speed Integrated Circuits) Hardware Description Language. The other widely used hardware description language is Verilog. Both are powerful languages that allow the description and simulation of complex digital systems.

A hardware description language is inherently parallel: its statements, which correspond to logic gates, are evaluated in parallel as soon as a new input arrives. An HDL program mimics the behavior of a physical, usually digital, system. It also allows timing specifications (gate delays) to be incorporated and a system to be described as an interconnection of different components.

VHDL allows one to describe a digital system at the structural or the behavioral level. The behavioral level can be further divided into two kinds of models: dataflow and algorithmic. The dataflow representation describes how data moves through the system, typically in terms of data flow between registers (register transfer level). The dataflow model uses concurrent statements, which are executed in parallel as soon as data arrives at the input. VHDL supports both concurrent and sequential signal assignments, and the kind of assignment determines how a statement is executed.
5.2 VEDIC IMPLEMENTATION OF MULTIPLIERS

The Carry Save Array and Wallace Tree structural multipliers and the Booth algorithmic multiplier were first implemented independently. Then the Dvandva Yoga - Urdhva Tiryakbhyam sutra of Vedic Mathematics was used to obtain the product terms in the multiplication modules of the Carry Save Array, Wallace Tree and Booth multipliers separately. The synthesis and FPGA implementation results were obtained using Xilinx Project Navigator. Similarly, for wide bit multipliers, the Karatsuba Ofman algorithm was first implemented independently and then combined with the Urdhva Tiryakbhyam sutra of Vedic Mathematics. The simulation results and the implementation results are given separately in the following sections.

5.3 IMPLEMENTATION RESULTS

Figures 5.1 to 5.8 show the simulation results of 8 bit, 16 bit, 32 bit and 64 bit Karatsuba multiplication without and with Vedic Mathematics. In each waveform, the first two signals are the inputs (multiplicand and multiplier) and the third is the final product. The intermediate signals show the partial products. The values 600 ns, 800 ns etc. mark the scale of the x axis, i.e. the time up to which the output is observed.
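The Urdhva Tiryakbhyam ("vertically and crosswise") sutra named in Section 5.2 forms each product column from all crosswise bit products of equal weight. The chapter does not list its VHDL source, so the following is only a minimal software sketch of that column-wise scheme, written in Python for illustration. The function name `urdhva_tiryakbhyam` and the `width` parameter are assumptions made for this sketch, not identifiers from the original design.

```python
def urdhva_tiryakbhyam(a: int, b: int, width: int = 4) -> int:
    """Multiply two unsigned `width`-bit integers column by column.

    Illustrative software model only; the name and signature are hypothetical.
    """
    limit = 1 << width
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("operands must fit in `width` bits")

    # Bit lists, least significant bit first.
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]

    product = 0
    carry = 0
    # Column k collects every crosswise bit product a[i] AND b[j] with i + j == k.
    # In hardware all of these AND terms can be generated at the same time.
    for k in range(2 * width - 1):
        column = carry
        first = max(0, k - width + 1)
        last = min(k, width - 1)
        for i in range(first, last + 1):
            column += a_bits[i] & b_bits[k - i]
        product |= (column & 1) << k  # low bit of the column sum is product bit k
        carry = column >> 1           # the remainder carries into the next column
    # Whatever carry is left becomes the most significant product bit.
    product |= carry << (2 * width - 1)
    return product
```

For example, `urdhva_tiryakbhyam(13, 11)` evaluates seven columns and returns 143. The loop over `k` is sequential only because this is software; the point of the sutra in hardware is that the column terms are independent and can be formed in parallel.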
Figure 5.1 8-Bit Karatsuba Multiplication
Figure 5.2 8-Bit Vedic Karatsuba Multiplication
Figure 5.3 16-Bit Karatsuba Multiplication
Figure 5.4 16-Bit Vedic Karatsuba Multiplication
Figure 5.5 32-Bit Karatsuba Multiplication
Figure 5.6 32-Bit Vedic Karatsuba Multiplication
Figure 5.7 64-Bit Karatsuba Multiplication
Figure 5.8 64-Bit Vedic Karatsuba Multiplication
5.4 OBSERVATIONS AND RESULTS

The multipliers were implemented on the target device Virtex E XCV300e-bg432-8 (speed grade 8), and the implementation results are presented here. Tables 5.1 to 5.3 give the power consumption (in mW) and delay (in ns) of the Carry Save, Wallace and Booth multipliers, each without and with the Vedic architecture. The delay and power consumption of the Karatsuba multiplier without and with Vedic multiplication are shown in Table 5.4. Figures 5.9 and 5.10 depict the results of Table 5.4 graphically.

Target Device: Virtex E XCV300e-bg432-8

Table 5.1 Comparison of Carry Save and Vedic Carry Save multipliers (8 bit)

Sl.No   Type of multiplier        Delay (in ns)   Power (in mW)
1       Carry-save Array          25.487          81.10
2       Vedic Carry Save Array    28.251          69.98

Table 5.2 Comparison of Wallace and Vedic Wallace algorithm (8 bit)

Sl.No   Type of multiplier        Delay (in ns)   Power (in mW)
1       Wallace Tree              25.487          80.43
2       Vedic Wallace             29.949          67.00
Table 5.3 Comparison of Booth and Vedic Booth algorithm (8 bit)

Sl.No   Type of multiplier        Delay (in ns)   Power (in mW)
1       Booth                     31.241          83.61
2       Vedic Booth               26.081          83.18

Table 5.4 Delay and power consumption of conventional Karatsuba multipliers and Vedic Karatsuba multiplier

S No    Algorithm                              Delay (in ns)   Power (in mW)
1       Karatsuba Algorithm I (8 bit)          24.963          -------
2       Karatsuba Algorithm II (8 bit)         31.029          63.18
3       Vedic Karatsuba Algorithm (8 bit)      18.695          58.42
4       Karatsuba Algorithm I (16 bit)         39.766          -------
5       Karatsuba Algorithm II (16 bit)        46.811          208.2
6       Vedic Karatsuba Algorithm (16 bit)     27.810          129.95
7       Karatsuba Algorithm II (32 bit)        82.834          433.53
8       Vedic Karatsuba Algorithm (32 bit)     49.864          216.72
9       Karatsuba Algorithm II (64 bit)        150.922         923.65
10      Vedic Karatsuba Algorithm (64 bit)     92.448          429.24

Note: The conventional Karatsuba multipliers were implemented using two different coding techniques. The first (Algorithm I) codes the 8 bit and 16 bit multipliers independently, which is the conventionally used method. The second (Algorithm II) calls the smaller bit modules to build up the wide bit modules. In addition, the Karatsuba Ofman algorithm was implemented using the Vedic sutra for 8 and 16 bit multiplication.

Figure 5.9 Power comparison of Karatsuba Multipliers
Figure 5.10 Delay comparison of Karatsuba Multiplier
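The second coding technique in the note to Table 5.4, building wide multipliers by calling smaller ones, matches the recursive structure of the Karatsuba Ofman algorithm. The sketch below is a software illustration of that structure in Python and is not the thesis's VHDL. The names `karatsuba_ofman` and `base_multiply`, and the choice of 4 bits as the smallest module, are assumptions made for this example.

```python
from typing import Callable


def karatsuba_ofman(
    a: int,
    b: int,
    width: int,
    base_multiply: Callable[[int, int], int] = lambda x, y: x * y,
    base_width: int = 4,
) -> int:
    """Multiply unsigned integers by recursive Karatsuba Ofman splitting.

    `base_multiply` stands in for the smallest multiplier module (hypothetical).
    A Vedic multiplier could be plugged in here instead of the default.
    """
    if width <= base_width:
        return base_multiply(a, b)

    half = width // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half

    # Three half-width multiplications instead of four.
    low = karatsuba_ofman(a_lo, b_lo, half, base_multiply, base_width)
    high = karatsuba_ofman(a_hi, b_hi, half, base_multiply, base_width)
    # The sums can be one bit wider than `half`. Python integers absorb the
    # extra bit; a hardware module would need a slightly wider multiplier here.
    mixed = karatsuba_ofman(a_lo + a_hi, b_lo + b_hi, half, base_multiply, base_width)

    # mixed - high - low equals a_hi*b_lo + a_lo*b_hi, the middle term.
    middle = mixed - high - low
    return (high << (2 * half)) + (middle << half) + low
```

Calling `karatsuba_ofman(x, y, 16)` splits into 8 bit and then 4 bit multiplications, mirroring how a 16 bit module instantiates smaller modules. The three sub-products at each level do not depend on one another, so a hardware version can compute them side by side.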
5.5 INFERENCE

When implemented using Vedic Mathematics, the structural multipliers show a reduction in power consumption, but their delay does not improve. In fact, the delay of the Carry Save Array and Wallace Tree multipliers increases with Vedic Mathematics (from 25.487 ns to 28.251 ns and 29.949 ns respectively), while the power consumption falls considerably (from 81.10 mW to 69.98 mW and from 80.43 mW to 67.00 mW respectively). For the Booth algorithm, the power consumption improves only very slightly (83.61 mW to 83.18 mW), but the delay falls considerably (31.241 ns to 26.081 ns).

The 8 bit multiplier using the conventional Karatsuba algorithm has a delay of 24.963 ns and 31.029 ns for the two coding techniques respectively. For a 16 bit Karatsuba multiplier using the same techniques, the delay rises by a factor of about 1.5 to 1.6, to 39.766 ns and 46.811 ns respectively. The Karatsuba algorithm implemented using the Vedic sutra has a delay of 18.695 ns for 8 bit and 27.810 ns for 16 bit multiplication. The power consumption of the Vedic Karatsuba multiplier is likewise lower than that of the conventional Karatsuba multiplier at every bit width in Table 5.4.

As the bit width increases, the delay and power consumption of the Vedic multiplier grow more slowly in absolute terms than those of the conventional multiplier, which indicates that the Vedic multiplier scales well. This makes it well suited to wide bit multiplication, such as in public key cryptosystems.

The Vedic architecture used for implementing the Vedic multipliers can be used for any target device and is technology independent. The entire partial sub-product is obtained in one clock cycle since all the 4 x 4 modules operate in parallel. This removes the race and data dependency problem found in other multipliers.
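The comparison drawn above can be checked directly from Table 5.4. The short Python sketch below computes the percentage reduction in delay and power of the Vedic Karatsuba multiplier relative to Karatsuba Algorithm II at each bit width. The numbers are copied from Table 5.4; the names `TABLE_5_4`, `reduction_percent` and `vedic_savings` are introduced only for this illustration.

```python
# (delay in ns, power in mW) per bit width, copied from Table 5.4.
TABLE_5_4 = {
    "Karatsuba II": {
        8: (31.029, 63.18),
        16: (46.811, 208.2),
        32: (82.834, 433.53),
        64: (150.922, 923.65),
    },
    "Vedic Karatsuba": {
        8: (18.695, 58.42),
        16: (27.810, 129.95),
        32: (49.864, 216.72),
        64: (92.448, 429.24),
    },
}


def reduction_percent(conventional: float, vedic: float) -> float:
    """Relative saving of the Vedic value against the conventional one."""
    return 100.0 * (conventional - vedic) / conventional


def vedic_savings(width: int) -> tuple[float, float]:
    """Return (delay reduction %, power reduction %) for one bit width."""
    conv_delay, conv_power = TABLE_5_4["Karatsuba II"][width]
    ved_delay, ved_power = TABLE_5_4["Vedic Karatsuba"][width]
    return (
        reduction_percent(conv_delay, ved_delay),
        reduction_percent(conv_power, ved_power),
    )


for width in (8, 16, 32, 64):
    delay_saving, power_saving = vedic_savings(width)
    print(f"{width:2d} bit: delay reduced {delay_saving:.1f} %, "
          f"power reduced {power_saving:.1f} %")
```

On these figures the delay reduction stays at roughly 39 to 41 percent across all four widths, while the power reduction grows from about 8 percent at 8 bit to about 54 percent at 64 bit.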