Low Power and Area Efficient Implementation of BCD Adder on FPGA

Shambhavi Mishra#1, Gaurav Verma*2
#M.Tech. Scholar, Department of Electronics & Communication, *Assistant Professor, Department of Electronics & Communication, Jaypee University, A-10, Sector-62, Noida (U.P.), India.
1 shambhavi10@gmail.com, 2 gaurav.iitkg@gmail.com

978-1-4799-1607-8/13/$31.00 ©2013 IEEE

Abstract- Decimal adders and multipliers are basic building blocks of the arithmetic and logic units and barrel shifters in today's high-end processors and controllers. In this paper, an efficient BCD adder is designed using low-power synthesis techniques at the architectural level. Power can be minimized at several levels of abstraction, but low-power techniques at the architectural level have more impact than circuit-level approaches. Two architectural approaches for minimizing power consumption are discussed: pipelining and parallelism. The proposed designs are described in VHDL and are implemented and tested with Xilinx ISE 10.1 targeting the Xilinx XC5VLX30-3 FPGA. The results show the optimization of power, delay and area for the different designs. A comparison with existing designs in the literature is also provided.

Keywords- Low power, Pipelining, Parallelism, VHDL, BCD Adder.

I. INTRODUCTION
Addition plays a significant role in digital arithmetic operations such as multiplication and division, and optimizing the speed, power and area of adders is a challenging task. Existing decimal arithmetic software libraries perform very poorly compared to hardware: software emulation is 100 to 1000 times slower than a hardware implementation [1]. Currently, decimal arithmetic is implemented in software, while binary arithmetic is usually implemented in hardware [2].

Most papers focus primarily on increasing the computation speed of the BCD adder, and designers have proposed several enhancements to the basic BCD addition algorithm. Examples of such refinements include direct decimal addition [3], decimal speculative addition [4], [5] and conditional speculative decimal addition [6]. However, the gain in achievable speed comes from additional hardware. This leads to higher power consumption, which keeps rising with advances in VLSI technology. Higher performance also raises power dissipation, so suitable packaging and cooling techniques are needed to remove the heat from the processor, and these in turn increase cost. Many architectures and algorithms have been proposed to date for BCD addition [2]-[10], but none has focused on this issue. Power reduction can be attempted at every level of the design hierarchy: algorithm, architecture, logic and circuit [7]. In the following sections, we discuss techniques that reduce power consumption at the architectural level and make the BCD adder more efficient in terms of area and power.

II. OVERVIEW OF BCD ADDITION
In electronic systems, BCD is an encoding of decimal numbers in which each digit is represented by its own binary sequence. It allows easy conversion to decimal digits and results in faster calculations. When BCD numbers are added, each sum digit must be adjusted to skip the six unused codes.

For instance, two BCD digits may be added together with a possible carry from the previous, less significant pair of digits. The largest possible result is then 9 + 9 + 1 = 19. The binary sum therefore lies in the range 0 to 19, which is 00000 to 10011 in binary and 0 0000 to 1 1001 in BCD (the leading 1 is the carry and the next four bits are the BCD sum digit). For a binary sum less than or equal to 1001, the corresponding BCD digit is correct. When the binary sum exceeds 1001, however, the result is an invalid BCD digit. Adding 6 (0110) to the binary sum converts it to the correct digit and also produces the carry. For example, 7 + 8 gives 1111, and adding 0110 yields 1 0101, which is a decimal carry and the digit 5. Fig. 1 shows the block diagram of a one-digit BCD adder based on this method.

The input digits in binary are $A_3A_2A_1A_0$ and $B_3B_2B_1B_0$. $S'_3S'_2S'_1S'_0$ are the outputs of the first-stage 4-bit adder. In the second stage, the correction bits 0110 (6) are added to them to produce the BCD digit $S_3S_2S_1S_0$, given by (2)-(5), along with the carry output $C_N$, given by (1). In these equations a prime marks a first-stage sum bit and an overline denotes the complement. $C_N$ is 1 when the binary sum exceeds 9 and 0 otherwise.

$$C_N = C_{out} + S'_3S'_2 + S'_3S'_1 \tag{1}$$
$$S_0 = S'_0 \tag{2}$$
$$S_1 = \overline{S'_3}\,S'_1 + S'_3S'_2\,\overline{S'_1} \tag{3}$$
$$S_2 = \overline{S'_3}\,S'_2 + S'_2S'_1 \tag{4}$$
$$S_3 = S'_3\,\overline{S'_2}\,\overline{S'_1} \tag{5}$$

Fig. 1: Block Diagram of BCD Adder

III. PROPOSED BCD ADDER
A. Power Efficiency
The proposed BCD adder is designed efficiently at the architectural level [8]. The architecture level refers to the register transfer level (RTL), where a circuit is represented in terms of building blocks such as adders, multipliers, ROMs and register files. High-level synthesis transforms a behavioural specification into an RTL realization. Low-power synthesis at the architectural level is envisaged to have a greater impact than gate-level approaches. The following subsections discuss and compare two possible architectural approaches: parallelism and pipelining.

1. Parallelism: Parallel processing is traditionally used to improve performance at the expense of larger chip area and higher power dissipation. The basic idea is to operate multiple copies of hardware resources, such as ALUs and processors, in parallel. Parallel processing can also be used to reduce power instead of improving performance. Unfortunately, the power savings come at the expense of performance or, more precisely, of the maximum operating frequency, as follows from

$$f_{max} \propto \frac{(V_{dd}-V_t)^2}{V_{dd}} = V_{dd}\left[1-\frac{V_t}{V_{dd}}\right]^2 \tag{6}$$

If the threshold voltage is scaled by the same factor as the supply voltage, the maximum operating frequency depends roughly linearly on the supply voltage. Reducing the supply voltage therefore forces the circuit to run at a lower frequency. In simple terms, halving the supply voltage reduces the switching power at a given frequency to one quarter, but it also halves the performance. Parallel processing can compensate for this loss by splitting the computation into two independent tasks that run in parallel. This has the potential to reduce power by half without any reduction in performance. The basic approach is to trade area for power while maintaining the same throughput.

Fig. 2: Parallel Realization of BCD Adder

The reference architecture and all its parameters, such as power supply voltage, frequency of operation and power dissipation, are denoted by the subscript ref [8].
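The two-stage addition of Section II and Fig. 1 can be checked exhaustively in software. The Python below is a minimal behavioural sketch written for illustration only. It is not the authors' VHDL, and the function names are hypothetical. It forms the first-stage binary sum, derives the decimal carry with Eq. (1) and applies the 0110 correction. It then compares Eqs. (2)-(5) against that correction, assuming the equations take the first-stage sum bits $S'_3..S'_0$ as their inputs.

```python
"""Behavioural sketch of the one-digit BCD adder of Fig. 1 (illustrative, not the authors' VHDL)."""


def bcd_digit_add(a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """Add two BCD digits and a carry-in; return (decimal carry CN, BCD sum digit)."""
    if not (0 <= a <= 9 and 0 <= b <= 9 and cin in (0, 1)):
        raise ValueError("a and b must be BCD digits (0-9) and cin must be 0 or 1")
    z = a + b + cin                        # first stage: 4-bit binary adder, result 0..19
    cout = z >> 4                          # binary carry out of the first stage
    s3, s2, s1 = (z >> 3) & 1, (z >> 2) & 1, (z >> 1) & 1  # first-stage bits S'3 S'2 S'1
    cn = cout | (s3 & s2) | (s3 & s1)      # Eq. (1)
    digit = (z + 6 * cn) & 0xF             # second stage: add 0110 when CN = 1, keep 4 bits
    return cn, digit


def second_stage_eqs(s3: int, s2: int, s1: int, s0: int) -> int:
    """Eqs. (2)-(5) with the first-stage sum bits S'3..S'0 as inputs (hypothetical helper)."""
    ns3, ns2, ns1 = 1 - s3, 1 - s2, 1 - s1  # complemented bits
    d0 = s0                                 # Eq. (2)
    d1 = (ns3 & s1) | (s3 & s2 & ns1)       # Eq. (3)
    d2 = (ns3 & s2) | (s2 & s1)             # Eq. (4)
    d3 = s3 & ns2 & ns1                     # Eq. (5)
    return (d3 << 3) | (d2 << 2) | (d1 << 1) | d0


if __name__ == "__main__":
    # Exhaustive check over all valid digit pairs and carry-ins.
    for a in range(10):
        for b in range(10):
            for cin in (0, 1):
                cn, digit = bcd_digit_add(a, b, cin)
                assert 10 * cn + digit == a + b + cin
    # Eqs. (2)-(5) match the +6 correction for every first-stage sum 0..15.
    for z in range(16):
        bits = [(z >> i) & 1 for i in (3, 2, 1, 0)]
        assert second_stage_eqs(*bits) == (z + 6 * (z > 9)) & 0xF
    print(bcd_digit_add(9, 9, 1))  # 9 + 9 + 1 = 19 -> carry 1, digit 9
```

The exhaustive check makes one point visible. Eqs. (2)-(5) depend only on $S'_3..S'_0$, so they reproduce the corrected digit for first-stage sums of 0 to 15. For sums of 16 to 19 ($C_{out} = 1$), the first-stage bits are 0000 to 0011 while the correct digits are 6 to 9. A complete second stage therefore also needs $C_{out}$, and the sketch handles this by adding 6 arithmetically.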
The switching power is

$$P = C_{eff}\,V_{dd}^2\,f \tag{7}$$

where $C_{eff}$ is the total effective switching capacitance: the sum of the products of the switching activities and the node capacitances. For the proposed adder, the design summary of the XPower Analyser software gives this capacitance as $1.805 \times 10^{-11}$ F (Table 1).

Table 1: Synthesis Result from XPower Analyser Tool (power summary: current I (mA) and power P (mW) for the total estimated power consumption, Vccint 1.80 V, Vcco33 3.30 V, quiescent Vccint 1.80 V and quiescent Vcco33 3.30 V; thermal summary: estimated junction temperature, ambient temperature, case temperature and Theta J-A range)

For the reference architecture, (7) gives

$$P_{ref} = C_{ref}\,V_{ref}^2\,f_{ref} \tag{8}$$

Lowering the supply voltage cannot reduce the power dissipation unless the clock frequency is also reduced. However, the parallel architecture of Fig. 2 can maintain the same throughput (number of operations per unit time). Here the adder is duplicated, but the input registers are clocked at half the reference frequency $f_{ref}$. Each adder then has twice as long to complete its operation, so the supply voltage can be reduced to about half of $V_{ref}$ with the same adder. Duplicating the adder doubles the capacitance. Because of the extra routing to both adders, the effective capacitance is about 2.2 times $C_{ref}$. The estimated power dissipation of the parallel implementation is therefore

$$P_{par} = C_{par}V_{par}^2 f_{par} = (2.2\,C_{ref})\left(\frac{V_{ref}}{2}\right)^2\frac{f_{ref}}{2} \approx 0.275\,P_{ref}$$

Table 2 lists the estimated power dissipation of the parallel implementation.

Table 2: Estimated Power Dissipation
| Architecture | Frequency (MHz) | Voltage (V) | Switching Power (mW) |
| Reference | 200 | 5 | 90.25 |
| Parallel | 100 | 2.5 | 25 |

2. Pipelining: Instead of reducing the clock frequency, the pipelined approach reduces the delay through the critical path of the functional unit, which allows the supply voltage to be lowered to minimize power. In this realization, each stage performs a 2-bit addition instead of a 4-bit addition. Each 2-bit adder can therefore operate from a reduced supply voltage of $V_{ref}/2$ (Fig. 3). The area penalty is much smaller than in the parallel implementation, leading to

$$C_{pipe} = 1.15\,C_{ref} \tag{9}$$

The estimated power dissipation of the pipelined implementation is therefore

$$P_{pipe} = C_{pipe}V_{pipe}^2 f_{pipe} = (1.15\,C_{ref})\left(\frac{V_{ref}}{2}\right)^2 f_{ref} \approx 0.29\,P_{ref} \tag{10}$$

The power reduction is very close to that of the parallel implementation, with the additional bonus of a reduced area overhead. Table 3 gives the estimated power dissipation of the pipelined implementation.
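The estimates in (7)-(10) are simple enough to tabulate with a few lines of code. The sketch below assumes the reference-design figures of Tables 2 and 3 (a 5 V supply and a 200 MHz clock) and the effective capacitance of $1.805 \times 10^{-11}$ F quoted from XPower. It also assumes the scaling factors described above:

- Parallel version: about 2.2 times the capacitance, half the voltage and half the frequency.
- Pipelined version: 1.15 times the capacitance and half the voltage at the full clock rate.

The class and function names are hypothetical, and this is a back-of-envelope model rather than a tool-based estimate.

```python
"""Dynamic-power estimate P = C * V**2 * f for the three realizations (illustrative sketch)."""
from dataclasses import dataclass

C_REF = 1.805e-11  # F, effective switching capacitance quoted from the XPower summary
V_REF = 5.0        # V, reference supply voltage (assumed from Tables 2 and 3)
F_REF = 200e6      # Hz, reference clock frequency (assumed from Tables 2 and 3)


@dataclass(frozen=True)
class Realization:
    """Scaling of C, V and f relative to the reference design (hypothetical helper type)."""
    name: str
    c_factor: float
    v_factor: float
    f_factor: float


def switching_power_mw(r: Realization) -> float:
    """Eq. (7) evaluated for one realization, returned in milliwatts."""
    c = r.c_factor * C_REF
    v = r.v_factor * V_REF
    f = r.f_factor * F_REF
    return c * v**2 * f * 1e3  # W -> mW


REALIZATIONS = (
    Realization("reference", 1.0, 1.0, 1.0),   # Eq. (8)
    Realization("parallel", 2.2, 0.5, 0.5),    # two adders plus routing, V_ref/2, f_ref/2
    Realization("pipelined", 1.15, 0.5, 1.0),  # Eqs. (9)-(10): 2-bit stages, V_ref/2, f_ref
)

if __name__ == "__main__":
    p_ref = switching_power_mw(REALIZATIONS[0])
    for r in REALIZATIONS:
        p = switching_power_mw(r)
        print(f"{r.name:<10} {p:7.2f} mW  ({p / p_ref:.4f} x P_ref)")
```

Because P scales with $V^2$, halving the supply dominates both results, giving ratios of about 0.275 and 0.2875 of $P_{ref}$. The pipelined version runs at the full clock rate, but its smaller capacitance overhead nearly offsets this. Keeping the factors as named fields makes it easy to see how sensitive the comparison is to, for example, the routing overhead assumed for the parallel version.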
Table 3: Estimated Power Dissipation
| Architecture | Frequency (MHz) | Voltage (V) | Switching Power (mW) |
| Reference | 200 | 5 | 90.25 |
| Pipelined | 200 | 2.5 | 25.7 |

Fig. 3: Pipelined Realization of BCD Adder

B. Area Efficiency
In this approach [2], a direct BCD digit adder is designed as a combinational circuit with nine inputs and five outputs. The nine inputs are the two BCD input digits A and B plus the decimal carry input Cin. The five outputs are the BCD digit of the decimal sum S plus the decimal carry output Cout. The combinational logic is built by extracting the Boolean expressions for the BCD addition result directly from the BCD input operands. The most significant output bit is the decimal carry generated by the addition, and the other bits form the BCD sum digit. This is therefore a correction-free technique: the result is already in BCD form, and the need for correction is resolved internally through the Boolean expressions of the addition result.

The truth table for all output functions is constructed over all possible input combinations. With nine inputs, there are 2^9 = 512 combinations. Many of them are not valid, since a decimal digit is less than (10)_10 while a 4-bit number can take any value from 0 to (15)_10. For such invalid input combinations, the output is set to don't care. The truth table is then used to generate a VHDL description of the entire design. The functionality of this design is verified by simulation using Xilinx ISE 10.1 targeting the Xilinx XC5VLX30-3 FPGA. Table 4 gives the synthesis analysis.

Table 4: Synthesis Results for Area and Delay
| Parameter | Value |
| Total delay | 6.2 ns |
| No. of logic elements used | 4 |
| Utilization | 2% |
| Total equivalent gate count for design | 95 |

C. Results and Discussion
Table 5 compares our design techniques with other designs proposed in the literature. The power reduction obtained by pipelining is very close to that of the parallel implementation, with the additional bonus of a reduced area overhead. We then compare our design with other designs in terms of area (number of logic elements) and delay, using results from the synthesis tool. Table 6 shows that the proposed BCD adder requires comparable area while its delay is reduced drastically.

Table 5: Comparison of Power Dissipation with Other Designs
| BCD Adder | Power (mW) |
| Correction free BCD Adder [8] | 173.06 |
| Proposed BCD Adder using parallelism | 25 |
| Proposed BCD Adder using pipelining | 25.7 |
| BCD Adder using CSLA [9] | 175.39 |
| BCD Adder using CSA [9] | 169.2 |

Table 6: Comparison of Area and Delay with Other Designs
| BCD Adder | Area (no. of logic elements) | Delay (ns) |
| Correction Free BCD Adder [8] | 58 | 16.36 |
| Proposed BCD Adder | 4 | 6.2 |

IV. CONCLUSION
In this paper, we have proposed a BCD adder designed with low-power synthesis techniques at the architectural level, namely parallel processing and pipelining. A comparison based on synthesis results shows that the proposed BCD adder outperforms previous designs in terms of power consumption, area utilization and delay.
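The truth-table construction used for the correction-free adder of Section III-B can be sketched in a few lines. The Python below is illustrative only. The function name and the packing of A, B and Cin into one 9-bit word are assumptions, and the sketch stops at the table itself rather than generating VHDL. It enumerates all 2^9 = 512 input combinations and marks those with an operand above 9 as don't care. For the remaining rows, it stores the 5-bit output Cout S3 S2 S1 S0.

```python
"""Truth table of a direct (correction-free) BCD digit adder: 9 inputs, 5 outputs."""
from itertools import product
from typing import Optional


def direct_bcd_truth_table() -> dict[int, Optional[int]]:
    """Map each 9-bit input word A3..A0 B3..B0 Cin to Cout S3..S0, or None for don't care."""
    table: dict[int, Optional[int]] = {}
    for a, b, cin in product(range(16), range(16), range(2)):
        key = (a << 5) | (b << 1) | cin   # assumed packing of the nine inputs
        if a > 9 or b > 9:
            table[key] = None             # invalid BCD operand: output is don't care
            continue
        cout, digit = divmod(a + b + cin, 10)
        table[key] = (cout << 4) | digit  # 5-bit output: Cout S3 S2 S1 S0
    return table


if __name__ == "__main__":
    tt = direct_bcd_truth_table()
    dont_care = sum(v is None for v in tt.values())
    print(len(tt), dont_care, len(tt) - dont_care)
```

Of the 512 rows, 312 are don't cares and 200 are valid (10 x 10 x 2). A logic minimizer can exploit the don't-care rows when deriving the Boolean expressions for each output.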
REFERENCES
[1] M. F. Cowlishaw, "Decimal FAQ," http://www.hursley.ibm.com/decimal/decifaq1.html.
[2] O. D. Al-Khaleel, N. H. Tulie, and K. M. Mhaidat, "FPGA implementation of binary coded decimal digit adders and multipliers," 8th International Symposium on Mechatronics and its Applications (ISMA), 2012, pp. 1-4.
[3] M. S. Schmookler and A. Weinberger, "High speed decimal addition," IEEE Transactions on Computers, vol. C-20, pp. 862-866, 1971.
[4] H. Wetter, W. Bultmann, W. Haller, and A. Worner, "Binary and decimal adder unit," 2001.
[5] J. Thompson, N. Karra, and M. J. Schulte, "A 64-bit decimal floating-point adder," in Proc. IEEE Computer Society Annual Symposium on VLSI, 2004, pp. 297-298.
[6] A. Vazquez and E. Antelo, "Conditional speculative decimal addition," Nancy, France, 2006, pp. 47-57.
[7] M. R. Stan and W. P. Burleson, "Bus-invert coding for low-power I/O," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 3, no. 1, pp. 49-58, March 1995.
[8] O. Al-Khaleel, M. Al-Khaleel, Z. Al-Qudah, C. A. Papachristou, K. Mhaidat, and F. G. Wolff, "Fast binary/decimal adder/subtractor with a novel correction-free BCD addition," 18th IEEE International Conference on Electronics, Circuits and Systems, 2011, pp. 455-459.
[9] K. N. Vijeyakumar, V. Sumathy, A. Dinesh Babu, S. Elango, and S. Saravanakumar, "FPGA implementation of low power hardware efficient flagged binary coded decimal adder," International Journal of Computer Applications, vol. 46, no. 14, May 2012.