Low Power and Area Efficient Implementation of BCD Adder on FPGA

Shambhavi Mishra #1, Gaurav Verma *2
# M.Tech. Scholar, Department of Electronics & Communication
* Assistant Professor, Department of Electronics & Communication
Jaypee University, A-10, Sector-62, Noida (U.P.), India
1 shambhavi10@gmail.com, 2 gaurav.iitkg@gmail.com

Abstract- Decimal adders and multipliers are basic building blocks of the arithmetic and logic units and barrel shifters in today's high-end processors and controllers. In this paper, an efficient BCD adder is designed using low power synthesis techniques at the architectural level. Power can be minimized at several levels of abstraction, but low power techniques at the architectural level have more impact than circuit-level approaches. Two approaches, pipelining and parallelism, are discussed as ways to minimize power consumption at the architectural level. The proposed designs are described in VHDL and implemented and tested with Xilinx ISE 10.1 targeting a Xilinx XC5VLX30-3 FPGA. The results show the optimization of power, delay and area for the different designs, and a comparison with existing designs from the literature is provided.

Keywords- Low power, Pipelining, Parallelism, VHDL, BCD Adder.

978-1-4799-1607-8/13/$31.00 ©2013 IEEE

I. INTRODUCTION

Addition has a significant role in digital arithmetic operations such as multiplication and division. Optimizing the speed, power and area of adders is a challenging task. Compared to hardware speeds, the performance of existing decimal arithmetic software libraries is very poor: software emulation is slower than a hardware implementation by 100 to 1000 times [1]. Currently, decimal arithmetic is implemented in software, while binary arithmetic is usually implemented in hardware [2].

In most papers the prime focus has been on increasing the computation speed of the BCD adder. Designers have proposed several enhancements to the basic BCD addition algorithm. Direct decimal addition [3], decimal speculative addition [4] [5] and conditional speculative decimal addition [6] are examples of such refinements. However, the increase in achievable speed is obtained only through additional hardware, which leads to high power consumption, and this consumption keeps growing with developments in VLSI technology. Higher power consumption in turn requires suitable packaging and cooling techniques to remove heat from the processor, which increases cost. Many architectures and algorithms have been proposed to date for BCD addition [2]-[10], but none has focused on this issue. Power reduction can be attempted at all levels of the design hierarchy: algorithm, architecture, logic and circuit levels [7]. In the following sections, we discuss some techniques that reduce power consumption at the architectural level to make the BCD adder more efficient in terms of area and power.

II. OVERVIEW OF BCD ADDITION

In electronic systems, BCD is an encoding for decimal numbers in which each digit is represented by its own binary sequence. It allows easy conversion to decimal digits and results in faster calculations. When BCD numbers are added, each sum digit must be adjusted to skip the six unused codes.

For instance, the addition of two decimal digits in BCD, together with a possible carry from the previous, less significant pair of digits (assuming the maximum value for the input digits), viz. 9 + 9 + 1, results in 19. The equivalent binary sum is therefore in the range 0 to 19, represented in binary as 0000 to 10011 and in BCD as 0000 to 1 1001 (the first 1 being the carry and the next four bits being the BCD digit sum). When the binary sum is less than or equal to 1001, the corresponding BCD digit is correct. When the binary sum exceeds 1001, the result is an invalid BCD digit. Adding 6 (0110) to the binary sum converts it to the correct digit and also produces the carry. Fig. 1 shows the block diagram of a 1-digit BCD adder based on this methodology.

The input digits in binary are $A_3A_2A_1A_0$ and $B_3B_2B_1B_0$. $S_3' S_2' S_1' S_0'$ are the outputs of the first-stage 4-bit adder, to which the correction bits 0110 (6) are added in the second stage to produce the BCD digit $S_3S_2S_1S_0$ given in equations (2)-(5), along with the carry output $C_N$ given in equation (1). The carry $C_N$ is one for digits exceeding 9 and zero otherwise. In equations (2)-(5) an overbar denotes the logical complement.

$$C_N = C_{OUT} + S_3' S_2' + S_3' S_1' \qquad (1)$$
$$S_0 = B_0 \qquad (2)$$
$$S_1 = \overline{B_3}\,B_1 + B_3 B_2 \overline{B_1} \qquad (3)$$

$$S_2 = \overline{B_3}\,B_2 + B_2 B_1 \qquad (4)$$
$$S_3 = B_3\,\overline{B_2}\,\overline{B_1} \qquad (5)$$

Fig. 1: Block Diagram of BCD Adder

III. PROPOSED BCD ADDER

A. Power Efficiency

The BCD adder proposed in this paper is designed for efficiency at the architectural level [8]. The architecture level refers to the Register Transfer Level (RTL), where a circuit is represented in terms of building blocks such as adders, multipliers, ROMs and register files. High-level synthesis transforms a behavioural-level specification into an RTL realization. Low power synthesis at the architectural level is expected to have a greater impact than gate-level approaches. The possible architectural approaches are parallelism and pipelining; each is discussed and compared in the following subsections.

1. Parallelism: Parallel processing is traditionally used to improve performance at the expense of larger chip area and higher power dissipation. The basic idea is to use multiple copies of hardware resources, such as ALUs and processors, operating in parallel to provide higher performance. Instead of being used to improve performance, parallel processing can also be used to reduce power. Power is saved by lowering the supply voltage; unfortunately, these savings come at the expense of performance or, more precisely, of maximum operating frequency. This follows from the equation:

$$f_{max} \propto \frac{(V_{dd} - V_t)^2}{V_{dd}} = V_{dd}\left[1 - \frac{V_t}{V_{dd}}\right]^2 \qquad (6)$$

If the threshold voltage is scaled by the same factor as the supply voltage, the maximum frequency of operation depends roughly linearly on the supply voltage. Reducing the supply voltage therefore forces the circuit to operate at a lower frequency. In simple terms, if the supply voltage is reduced by half, the power is reduced to one fourth and the performance is lowered by half. The loss in performance can be compensated by parallel processing, which involves splitting the computation into two independent tasks running in parallel. This has the potential to reduce the power by half without a reduction in performance. The basic approach is to trade area for power while maintaining the same throughput.

Fig. 2: Parallel Realization of BCD Adder

The reference architecture and all its parameters, such as supply voltage, frequency of operation and power dissipation, are denoted by the subscript ref [8].
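The scaling behaviour described by equation (6) can be checked numerically. The following is a minimal Python sketch, not part of the paper's VHDL flow; the function name `relative_fmax` and the threshold voltage of 0.8 V are illustrative assumptions, and only the proportionality in equation (6) is taken from the text.

```python
def relative_fmax(vdd: float, vt: float) -> float:
    """Return a quantity proportional to f_max, i.e. (Vdd - Vt)^2 / Vdd (equation (6))."""
    if vdd <= vt:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return (vdd - vt) ** 2 / vdd


V_REF = 5.0   # reference supply voltage in volts
VT_REF = 0.8  # threshold voltage in volts (assumed value, not from the paper)

full = relative_fmax(V_REF, VT_REF)

# Case 1: threshold scaled by the same factor as the supply.
# f_max then scales linearly with the supply voltage.
ratio_scaled = relative_fmax(V_REF / 2, VT_REF / 2) / full

# Case 2: threshold left unchanged. The frequency drops by more than half.
ratio_fixed = relative_fmax(V_REF / 2, VT_REF) / full

print(f"f_max ratio, Vt scaled with Vdd: {ratio_scaled:.3f}")
print(f"f_max ratio, Vt held constant:   {ratio_fixed:.3f}")
```

With the threshold scaled along with the supply, halving the supply halves the attainable frequency, which is the case the text relies on. With a fixed threshold the loss is larger, so the halved clock of the parallel design would be harder to meet.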

Table 1: Synthesis Result from XPower Analyser Tool
Power summary (columns: I (mA), P (mW)); rows: Total Estimated Power Consumption; Vccint 1.80 V; Vcco33 3.30 V; Quiescent Vccint 1.80 V; Quiescent Vcco33 3.30 V.
Thermal summary (column: Value); rows: Estimated Junction Temperature; Ambient Temperature; Case Temperature; Theta J-A Range.
[The numeric entries of this table are not recoverable from the transcript.]

The switching power of the reference architecture is

$$P_{ref} = C_{ref}\,V_{ref}^{2}\,f_{ref} \qquad (7)$$

where $C_{ref}$ is the total effective switching capacitance, that is, the sum of the products of the switching activities and the node capacitances. The capacitance comes out to be $1.805 \times 10^{-11}$ farad, as seen from the design summary of the XPower Analyser software.

Without reducing the clock frequency, the power dissipation cannot be reduced by reducing the supply voltage. However, the same throughput (number of operations per unit time) can be maintained by the parallel architecture shown in Fig. 2. Here the adder has been duplicated, but the input registers are clocked at half the frequency $f_{ref}$. This allows the supply voltage to be reduced: with the same adder, the supply can be lowered to about half of $V_{ref}$. Because of the duplication of the adder, the capacitance increases by a factor of two. However, because of the extra routing to both adders, the effective capacitance is about 2.2 times $C_{ref}$. Therefore, the estimated power dissipation of the parallel implementation is

$$P_{par} = 2.2\,C_{ref}\left(\frac{V_{ref}}{2}\right)^{2}\frac{f_{ref}}{2} \approx 0.277\,P_{ref} \qquad (8)$$

The estimated power dissipation of the parallel implementation is given in Table 2.

Table 2: Estimated Power Dissipation (parallel implementation)
Frequency (MHz)    Reference Voltage    Switching Power (mW)
200                5 V                  90.25
100                2.5 V                25

2. Pipelining: Instead of reducing the clock frequency, the pipelined approach reduces the delay through the critical path of the functional unit so that the supply voltage can be lowered to minimize power. In this realization, 2-bit addition is performed in each stage instead of 4-bit addition. The 2-bit adder can therefore operate at a reduced supply voltage of $V_{ref}/2$. In this realization the area penalty is much smaller than in the parallel implementation, leading to

$$C_{pipe} = 1.15\,C_{ref} \qquad (9)$$

For this realization, equation (7) becomes

$$P_{pipe} = 1.15\,C_{ref}\left(\frac{V_{ref}}{2}\right)^{2} f_{ref} \approx 0.28\,P_{ref} \qquad (10)$$

The power reduction is evidently very close to that of the parallel implementation, with the additional bonus of reduced area overhead. The estimated power dissipation of the pipelined implementation is given in Table 3.
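The estimates in equations (7), (8) and (10) can be reproduced with a few lines of arithmetic. The sketch below is an illustrative Python model under stated assumptions: the capacitance is the XPower figure quoted in the text, while the reference point of 5 V and 200 MHz is our reading of Table 2, and the helper name `switching_power` is hypothetical.

```python
C_REF = 1.805e-11  # effective switching capacitance in farads (XPower figure from the text)
V_REF = 5.0        # reference supply voltage in volts (as read from Table 2)
F_REF = 200e6      # reference clock frequency in hertz (as read from Table 2)


def switching_power(c: float, v: float, f: float) -> float:
    """Dynamic switching power P = C * V^2 * f, in watts (equation (7))."""
    return c * v * v * f


# Reference design: single adder at full voltage and full clock rate.
p_ref = switching_power(C_REF, V_REF, F_REF)

# Parallel design: 2.2x capacitance, half the voltage, half the clock (equation (8)).
p_par = switching_power(2.2 * C_REF, V_REF / 2, F_REF / 2)

# Pipelined design: 1.15x capacitance, half the voltage, full clock (equation (10)).
p_pipe = switching_power(1.15 * C_REF, V_REF / 2, F_REF)

for name, p in (("reference", p_ref), ("parallel", p_par), ("pipelined", p_pipe)):
    print(f"{name:>9}: {p * 1e3:6.2f} mW  ({p / p_ref:.4f} x P_ref)")
```

The exact ratios produced by these expressions are 0.275 for the parallel design and 0.2875 for the pipelined design, close to the rounded factors used in the text. Both designs cut switching power to a little over a quarter of the reference, and the pipelined one does so with a much smaller capacitance (area) overhead.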

Table 3: Estimated Power Dissipation (pipelined implementation)
Frequency (MHz)    Reference Voltage    Switching Power (mW)
200                5 V                  90.25
200                2.5 V                25.27

Fig. 3: Pipelined Realization of BCD Adder

B. Area Efficiency

In this approach [2], the idea is to design a direct BCD digit adder as combinational logic with a nine-bit input and a five-bit output. The nine input bits are the two BCD input digits A and B plus the decimal carry input Cin, and the five output bits are the BCD digit of the decimal sum S plus the decimal carry out Cout. The combinational logic of this adder is constructed by extracting the Boolean expressions for the BCD addition result directly from the BCD input operands. The most significant output bit is the decimal carry generated by the addition, while the other bits are the BCD sum digit. This is therefore a correction-free technique: the addition result is already in BCD form, and the need for correction is resolved internally through the Boolean expressions of the result.

The truth table for all output logic functions is constructed for all possible combinations of the inputs. Since there are nine inputs, the number of possible combinations is $2^9 = 512$. Many of these combinations are not valid, since a decimal digit is less than $(10)_{10}$ while a 4-bit number can take any value from 0 to $(15)_{10}$. When an input combination is not valid, the output is set to don't care. The truth table is then used to generate a VHDL description of the entire design. The functionality of this design is verified by simulation using Xilinx ISE 10.1 targeting a Xilinx XC5VLX30-3 FPGA. The analysis is given in Table 4.

Table 4: Synthesis Results for Area and Delay
Total delay                                  6. ns
No. of logic elements used                   4
Utilization                                  %
Total equivalent gate count for design       95

C. Results and Discussion

In Table 5, we compare our design techniques with other designs proposed in the literature. The power reduction of the pipelined design is very close to that of the parallel implementation, with the additional bonus of reduced area overhead. We then compare our design with other designs in terms of area (number of logic elements) and delay, using results obtained from the synthesis tool. As shown in Table 6, the proposed BCD adder takes a comparable area while its delay is reduced drastically.

Table 5: Comparison of power dissipation with other designs
BCD Adder                                  Power (mW)
Correction free BCD Adder [8]              173.06
Proposed BCD Adder using parallelism       25
Proposed BCD Adder using pipelining        25.27
BCD Adder using CSLA [9]                   175.39
BCD Adder using CSA [9]                    169.

Table 6: Comparison of area and delay with other designs
BCD Adder                          Area (no. of logic elements)    Delay (ns)
Correction Free BCD Adder [8]      58                              16.36
Proposed BCD Adder                 4                               6.

IV. CONCLUSION

In this paper, we have proposed a BCD adder designed using low power synthesis techniques at the architectural level, namely parallel processing and pipelining. A comparison based on synthesis results shows that the proposed BCD adder outperforms previous designs in terms of power consumption, area utilization and delay.
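The truth-table construction of Section III-B, and its agreement with the add-6 correction of Section II, can be illustrated with a short software model. This is a hedged Python sketch rather than the authors' VHDL: the function names `direct_bcd_row` and `corrected_bcd_add` are hypothetical, and invalid input combinations are represented by `None` to stand for the don't-care entries.

```python
from itertools import product


def direct_bcd_row(a: int, b: int, cin: int):
    """One row of the 9-input truth table: (cout, sum_digit), or None for a don't-care."""
    if a > 9 or b > 9:
        return None  # one of the 4-bit inputs is not a valid BCD digit
    total = a + b + cin
    return total // 10, total % 10


def corrected_bcd_add(a: int, b: int, cin: int):
    """Two-stage method: 4-bit binary addition, then add 6 when the sum exceeds 9."""
    binary = a + b + cin                  # 0..19 for valid digits
    cout = binary >> 4                    # carry out of the first-stage adder
    s = binary & 0xF                      # first-stage sum bits S3' S2' S1' S0'
    s3, s2, s1 = (s >> 3) & 1, (s >> 2) & 1, (s >> 1) & 1
    cn = cout | (s3 & s2) | (s3 & s1)     # carry logic of equation (1)
    if cn:
        s = (s + 6) & 0xF                 # second stage adds 0110
    return cn, s


# Enumerate all 2^9 input combinations: two 4-bit operands and one carry-in bit.
table = {
    (a, b, cin): direct_bcd_row(a, b, cin)
    for a, b, cin in product(range(16), range(16), range(2))
}
valid = {k: v for k, v in table.items() if v is not None}

# Every valid row of the direct table must match the correction-based adder.
mismatches = [k for k, v in valid.items() if corrected_bcd_add(*k) != v]

print(f"rows: {len(table)}, valid: {len(valid)}, don't-care: {len(table) - len(valid)}")
print(f"mismatches against add-6 correction: {len(mismatches)}")
```

Only 200 of the 512 rows carry defined outputs, which gives a logic minimizer considerable freedom when the table is turned into Boolean expressions. The cross-check confirms that the direct table and the two-stage correction scheme describe the same function on valid inputs.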

REFERENCES

[1] M. F. Cowlishaw, "Decimal FAQ," http://www.hursley.ibm.com/decimal/decifaq1.html.
[2] O. D. Al-Khaleel, N. H. Tulie, and K. M. Mhaidat, "FPGA implementation of Binary Coded Decimal Digit Adders and Multipliers," 8th International Symposium on Mechatronics and its Applications (ISMA), 2012, pp. 1-4.
[3] M. S. Schmookler and A. Weinberger, "High speed decimal addition," IEEE Transactions on Computers, vol. 20, pp. 862-866, 1971.
[4] H. Wetter, W. Bultmann, W. Haller, and A. Worner, "Binary and decimal adder unit," 2001.
[5] J. Thompson, N. Karra, and M. J. Schulte, "A 64-bit decimal floating point adder," in Proc. of the IEEE Computer Society Annual Symposium on VLSI, 2004, pp. 297-298.
[6] A. Vazquez and E. Antelo, "Conditional speculative decimal addition," Nancy, France, 2006, pp. 47-57.
[7] M. R. Stan and W. P. Burleson, "Bus-invert Coding for Low-Power I/O," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 3, no. 1, pp. 49-58, March 1995.
[8] O. Al-Khaleel, M. Al-Khaleel, Z. Al-Qudah, C. A. Papachristou, K. Mhaidat, and F. G. Wolff, "Fast Binary/Decimal Adder/Subtractor with a Novel Correction Free BCD Addition," 18th IEEE International Conference on Electronics, Circuits and Systems, pp. 455-459, 2011.
[9] K. N. Vijeyakumar, V. Sumathy, A. Dinesh Babu, S. Elango, and S. Saravanakumar, "FPGA Implementation of Low Power Hardware Efficient Flagged Binary Coded Decimal Adder," International Journal of Computer Applications, vol. 46, no. 14, May 2012.