<Explanation of Improved the Quality of ALU And Ten Different Types of Designs for Decreasing Power Dissipation>


Jihang Li
Department of Electrical and Computer Engineering
University of Central Florida
Orlando, FL 32816-2362

Abstract
This paper explains what an arithmetic logic unit (ALU) is and how its quality can be improved. It covers the basics of the ALU and the one-bit Full Adder, the different types of adders and, more importantly, the requirements for a higher-quality ALU. The key requirements are the width of the data bus, the ITRS technology node, the execution time and the energy consumption. The paper also discusses ten different ALU designs. Most of them aim at lower energy consumption at different technology nodes. For instance, design number 10 in the table is a 32-bit ALU in 180 nm CMOS technology whose goal is to decrease power dissipation.

Keywords
CMOS technology node, Adder, Multiplier, Floating point, Data path width, operand, Model of chip, execution time, power dissipation, energy consumption.

I. INTRODUCTION
The arithmetic logic unit (ALU) is the execution unit inside the central processing unit (CPU). It is the core of all central processing, and it is formed from logic units that carry out arithmetic using gates such as AND and OR. Its main function is to calculate on binary code, for example addition and subtraction. All of its operations are directed by the control unit. Today, essentially all CPU architectures represent data in binary form. The ALU is built around integer arithmetic, and it relies on circuits inside the chip to carry out each calculation. In other words, the ALU is the digital circuitry dedicated to performing arithmetic and logical operations. It is the main part of the central processor; even the smallest microprocessor needs the counting function of an ALU.
Early computers used several different number systems for their calculations, including ones' complement and sign-magnitude. Most processors now use two's-complement binary code because it simplifies addition and subtraction. The vast majority of computer instructions are executed by the ALU. It pulls data from registers, operates on the data, and stores the result in its output register. For example, to add the two numbers 3 and 4, the operand 3 is first loaded into the accumulator register. When processing begins, the ALU adds the two numbers to obtain 7 and writes the result back to the accumulator, replacing the original number 3. Other components are responsible for transferring data between registers and memory. The control unit drives the ALU through control circuits that issue the instructions.

The two basic arithmetic operations are addition and subtraction. The logic operations include AND, OR, NOR and XOR. The ALU operates on two inputs for addition, subtraction and so on.

The design of the ALU is based on Full Adders. A one-bit Full Adder is a combination of two XOR gates, two AND gates and one OR gate. It has two inputs A and B with a carry in Ci, and one output S with a carry out Cout. The output functions are $S = A \oplus B \oplus C_i$ and $C_{out} = A \cdot B + C_i \cdot (A \oplus B)$. Several common adders can be used inside the ALU, such as the Half Adder, the Ripple-carry Adder, the Carry-save Adder and the Lookahead carry unit. A Carry Select Adder design is among the fastest, but it requires more gates. A Ripple-carry Adder requires the fewest gates, but it is slower.

Truth table of the one-bit Full Adder:

Ci  A  B  |  S  Cout
0   0  0  |  0  0
0   0  1  |  1  0
0   1  0  |  1  0
0   1  1  |  0  1
1   0  0  |  1  0
1   0  1  |  0  1
1   1  0  |  0  1
1   1  1  |  1  1

The capability of an ALU depends on the number of operations it supports and on its operating speed; it is also a measure of the capability of the computer. The basic operation is addition: adding zero to a number simply passes that number through unchanged.
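The Full Adder equations and the Ripple-carry Adder described above can be sketched in a few lines of Python. This is only an illustrative software model, not a hardware description; the function names `full_adder` and `ripple_carry_add` and the default width of 4 bits are assumptions made for the example.

```python
def full_adder(a, b, c_in):
    """One-bit full adder; returns (sum_bit, carry_out)."""
    p = a ^ b                        # first XOR gate
    s = p ^ c_in                     # second XOR gate: S = A xor B xor Ci
    c_out = (a & b) | (c_in & p)     # two AND gates feeding one OR gate
    return s, c_out


def ripple_carry_add(x, y, width=4):
    """Add two unsigned integers bit by bit, least significant bit first.

    The carry out of each one-bit adder feeds the next one, which is why
    the delay of a ripple-carry adder grows with the operand width.
    Returns (sum, carry_out), with the sum truncated to `width` bits.
    """
    carry = 0
    total = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry


if __name__ == "__main__":
    # The 3 + 4 example from the text, on hypothetical 4-bit operands.
    print(ripple_carry_add(3, 4))
```

Looping `full_adder` over all eight input combinations reproduces the truth table, which is a quick way to check the gate-level equations.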
Adding a non-zero number is equivalent to subtracting a non-zero number, namely its negative. Subtracting two numbers can also be used to compare their sizes. Multiplication and division have a higher cost and are more complicated. Multiplication can be built on the addition operation by adding a number to itself repeatedly; for instance, multiplying 7 by 7 is the same as adding seven 7s together. The same holds for division, which can be carried out by repeated subtraction. Multiplication and division are therefore slower than addition and subtraction. Multiplication and division can also be carried out by shifting: a shift left multiplies by two, and a shift right divides by two.
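The two ways of multiplying described above, repeated addition and shifting, can be compared with a short Python sketch. The function names are hypothetical, and the code assumes non-negative integer operands.

```python
def multiply_repeated_add(a, b):
    """Multiply by adding `a` to a running total `b` times."""
    result = 0
    for _ in range(b):
        result += a
    return result


def multiply_shift_add(a, b):
    """Shift-and-add multiplication.

    For every set bit of `b`, the correspondingly shifted copy of `a`
    is added. The loop runs once per bit of `b` rather than `b` times.
    """
    result = 0
    while b:
        if b & 1:        # lowest bit of the multiplier is set
            result += a
        a <<= 1          # shift left: multiply the multiplicand by two
        b >>= 1          # shift right: halve the multiplier
    return result


if __name__ == "__main__":
    # The 7 x 7 example from the text.
    print(multiply_repeated_add(7, 7), multiply_shift_add(7, 7))
```

The repeated-addition version needs as many additions as the value of the multiplier, while the shift-and-add version needs at most one addition per bit, which illustrates why shifting is attractive in hardware.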

The ALU has privileged, direct access to the control unit, memory and input-output, which increases the speed of its operations. Input and output are carried over buses. The most common ALU has two inputs and one output. The two input operands receive data from the user, and they can be 1 bit, 4 bits, 64 bits and so on. The input command contains an instruction word, which is a machine instruction word. The output is the result of the operation. The goal of the ALU is to process the input data from the user.

The quality of an ALU depends on its ITRS technology node, the width of the data bus, the execution time, the power dissipation and the energy consumption of the processor. The data bus is a group of lines that transmit information from one component to another. The wider the data bus, the more information can be transferred into the ALU. The width of the data bus can be 1 bit, 32 bits or 128 bits. However, a wider data bus is more expensive to implement.

The ITRS technology node indicates the length of the transistor gate inside the ALU; the smaller the technology node, the more transistors the ALU can contain. The node length decreases roughly every two years. For instance, the semiconductor device fabrication node went from 10 µm in 1971 to 14 nm in 2014.

Next, the execution time measures the speed of the ALU. There are several main ways to reduce the execution time. Firstly, using dedicated multiplication and division instead of repeated addition and subtraction saves time. Secondly, increasing the width of the data bus increases the number of bits that can be sent to the ALU at once. Lastly, a Carry Select Adder design increases the speed of the ALU.

Power dissipation is another measure of the quality of an ALU. The ITRS technology node affects the power dissipation, and so does the choice of adder. For instance, the more gates the ALU requires, the larger its power dissipation.
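The speed-versus-gates trade-off of the Carry Select Adder can be made concrete with a small Python model. This is a sketch under stated assumptions, not any cited design: the names `ripple_block` and `carry_select_add`, the 16-bit width and the 4-bit block size are all chosen only for illustration.

```python
def ripple_block(x, y, c_in, width):
    """Ripple-carry addition of two `width`-bit values; returns (sum, carry_out)."""
    carry, total = c_in, 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        total |= (a ^ b ^ carry) << i
        carry = (a & b) | (carry & (a ^ b))
    return total, carry


def carry_select_add(x, y, width=16, block=4):
    """Carry-select addition of two unsigned `width`-bit values.

    Every block is added twice, once assuming carry-in 0 and once assuming
    carry-in 1. In hardware both sums are formed in parallel and a
    multiplexer picks one when the real carry arrives, so speed is bought
    with duplicated adder logic, that is, more gates.
    """
    assert width % block == 0, "this sketch assumes equally sized blocks"
    mask = (1 << block) - 1
    total, carry = 0, 0
    for lo in range(0, width, block):
        xb, yb = (x >> lo) & mask, (y >> lo) & mask
        sum0, carry0 = ripple_block(xb, yb, 0, block)   # speculative: carry-in 0
        sum1, carry1 = ripple_block(xb, yb, 1, block)   # speculative: carry-in 1
        s, carry = (sum1, carry1) if carry else (sum0, carry0)  # multiplexer
        total |= s << lo
    return total, carry


if __name__ == "__main__":
    print(carry_select_add(1234, 4321))
```

The duplicated `ripple_block` call per block is the software counterpart of the extra gates, which is also why such a design tends to dissipate more power than a plain Ripple-carry Adder.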
The energy consumption of a processor depends on power dissipation and execution time through the formula $E = P \cdot t$.

There are several different kinds of arithmetic units. One that can process all of the bits at the same time is called a parallel arithmetic unit. One that can only process one bit at a time is called a serial arithmetic unit. Some can process 4 bits or 8 bits at a time. The more bits that are processed at the same time, the higher the speed. Moreover, the more capable the ALU, the better the CPU. The next section covers the ten different designs, with their different nodes, data path widths and operation times.

II. LITERATURE REVIEW
The first design in the table is a Floating-point Unit at the 45 nm and 15 nm ITRS nodes. [1] The data path width is 32 bits, and the design operates on IEEE-754 Single Precision values. The goal of the design is to analyze the power consumption at the two nodes. The power consumption is calculated from the total dynamic count, the static count and the quantity of gates of different sizes. The power consumption is 2.048 mW at the 45 nm ITRS node and 0.6340 mW at 15 nm. From this result, we can conclude that the smaller the ITRS node, the lower the power dissipation.

The second design, the Ultra-area-efficient fault-tolerant QCA full adder, has a narrower data path than the first design and a cell area of 18 nm^2, and it results in lower power dissipation. [2] The conclusion of the second design is that testing different chip areas with a small operand size decreases the power dissipation.

The choice of multiplier and the size of the operands also affect the power dissipation and energy consumption, as designs [3] and [5] show. Design [3] used an Energy-Efficient Multiplier with 128-bit operands, and the model of chip used is a Virtex 6 FPGA device (XC6VLX550t). [3] The energy consumption of this design is 166.63 nJ.
Design [5] used 9-bit operands with DSP48 multipliers, and the models of chip used are FPGA, PRM, DCT and HWICAP. Design [5] also decreases the power dissipation: it is 0.023 mW for PRM, 0.061 mW for HWICAP and 0.081 mW for DCT.

Design [4] used Fine-Grained Pipelining Adders and Multipliers with 16-bit operands. The result is very successful, with high throughput and low power dissipation. With 0.24 µm static CMOS, the design saves around 62.5% of the power dissipation of the original design. Design [6] likewise decreases power dissipation with a multiplier. Design [6] used a Carry Select Adder and a Barrel Shift Rotator with 16-bit operands at the 28 nm ITRS node. The low-power result was tested using Verilog HDL, and the simulation shows a decrease in execution time. From the results of these two designs, we can conclude that with a multiplier the execution time decreases, and therefore the energy consumption decreases. However, the costs of these two designs are higher than those of the other designs.

Designs [7], [8], [9] and [10] use only adders to decrease the power dissipation. Design [7] used Carry Out Adders and Carry In Adders with 8-bit operands. Considering the number of clocks, logic gates and signals at different frequencies, the energy consumption of this design on a 90 nm Spartan-3 is low. Comparing design [8] with design [9], design [9] is the more successful. Design [8] used a Sparse-tree semi-dynamic Adder and a 32b radix-2 sparse-tree Adder with 64-bit operands. Its power dissipation in 90 nm CMOS is 300 mW. Design [9] used an ELM Adder with 8-bit operands. Its power dissipation on a 40 nm Virtex-6 FPGA is 88 mW. Comparing the two, design [9] has the lower power dissipation, which makes sense given the node lengths that designs [8] and [9] use. The last design, [10], used a 32-bit Adder with 32-bit operands.
Across the range of CMOS nodes from 180 nm to 65 nm, its power dissipation also decreased as the ITRS node length decreased.
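The relation E = P·t and the figures quoted for designs [1] and [8] can be combined in a short worked calculation. The sketch below is only illustrative arithmetic on numbers that appear in this paper: pairing the 300 mW and 30 ps entries listed for design [8] to obtain an energy per operation is an assumption made here, not a result reported by the cited work, and the function name `energy_joules` is hypothetical.

```python
def energy_joules(power_watts, time_seconds):
    """Energy from the relation E = P * t used in the text."""
    return power_watts * time_seconds


# Power figures quoted for design [1] at the two ITRS nodes.
power_45nm = 2.048e-3    # watts
power_15nm = 0.6340e-3   # watts
node_ratio = power_45nm / power_15nm   # how many times lower the 15 nm power is

# Power and operation time listed for design [8] in Table I.
# Multiplying them is an assumption made for illustration only.
energy_design_8 = energy_joules(300e-3, 30e-12)

if __name__ == "__main__":
    print(f"45 nm / 15 nm power ratio: {node_ratio:.2f}")
    print(f"Design [8] energy per operation: {energy_design_8 * 1e12:.1f} pJ")
```

A calculation like this only makes designs comparable when both power and operation time are reported, which several of the ten designs do not do.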

III. DATA ANALYSIS
Metrics covered by the various papers that are suitable for plotting: data bus width (bits) vs. year; ITRS technology node (nm) vs. year; execution time per ALU or Floating Point Unit operation (ns) vs. year; and power or energy vs. year.

From the ITRS technology node graph, we can see that not every design reports an ITRS technology node. Designs [2], [3] and [5] are not included in the graph because they do not have an ITRS node. The designs in the graph are, in order, [4], [10], [8], [7], [9], [6] and [1]. We can conclude that the ITRS node decreases every few years.

The designs in the execution time graph are, in order, [8], [7], [5] and [3]. The dots in 2015-2017 are treated as zero, since those designs do not report an ALU execution time. Although some of the designs do not report an execution time, it can be inferred from the table by comparing the types of operations used in each design. From the graph, we can conclude that execution time increases every few years.

In the data bus width graph, two or three columns appear together because the designs are from the same year; in that case they are ordered by month if the paper gives it. The designs in the graph are, in order, [4], [10], [8], [7], [9], [5], [3], [2], [6] and [1]. Since design [2] uses a 1-bit operand, it is hard to see in the graph. We can conclude that the data bus width increases every few years. The graph also shows that the designs do not use similar data bus widths; every design requires its own width.

The designs in the power graph are, in order, [8], [9] and [1]. The other designs do not report a power dissipation for their ITRS node. From this graph, we can conclude that not every design has a power dissipation value, because some designs express power dissipation as a percentage.

IV. CONCLUSION
In conclusion, from these ten designs we can see that the data path width, the type of adder, the ITRS technology node and the model of chip used are the keys to decreasing the power dissipation and energy consumption. The larger the data path width, the faster the ALU runs, and this decreases the power dissipation. Likewise, using a multiplier instead of an adder increases the speed of the ALU. Moreover, as the ITRS node shrinks, more devices can be used in the ALU, which increases its speed. However, the implementation cost is higher. Of all these designs, I think design [3] stands out: it has the widest operands, it has a multiplier that increases the speed of calculation, and its energy consumption of 166.63 nJ is very small. It is followed by design [8], which has 64-bit operands in 90 nm CMOS, which decreases the space inside the ALU. Last, design [1] also has very low power dissipation with a small ITRS node, and its data path width is large.

REFERENCES
[1] S. Salehi and R. F. DeMara, "Energy and Area Analysis of a Floating-Point Unit in 15nm CMOS Process Technology," in Proceedings of IEEE SoutheastCon 2015 (SECon-2015), Fort Lauderdale, FL, April 9-12, 2015.
[2] A. Roohi, R. F. DeMara, and N. Khoshavi, "Design and Evaluation of an Ultra-Area-Efficient Fault-Tolerant QCA Full Adder," Microelectronics Journal, vol. 46, no. 6, pp. 531-542, June 2015.
[3] M. Alawad, Y. Bai, R. F. DeMara, and M. Lin, "Energy-Efficient Multiplier-Less Discrete Convolver through Probabilistic Domain Transformation," in Proceedings of 22nd ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA-14), pp. 185-188, Monterey, California, USA, February 27-28, 2014.
[4] J. Di, J. S. Yuan, and R. DeMara, "High Throughput Power-aware FIR Filter Design based on Fine-grain Pipeline Multipliers and Adders," in Proceedings of the 2003 IEEE Annual Symposium on VLSI (ISVLSI-03), pp. 260-261, Tampa, Florida, U.S.A., February 20-21, 2003.
[5] N. Imran, J. Lee, and R. F. DeMara, "Fault Demotion Using Reconfigurable Slack (FaDReS)," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 21, no. 7, pp. 1364-1368, July 2013.
[6] P. Trivedi and R. P. Tripathi, "Design & analysis of 16 bit RISC processor using low power pipelining," in 2015 International Conference on Computing, Communication & Automation (ICCCA), pp. 1294-1297, May 15-16, 2015.
[7] B. Pandey, J. Yadav, M. Pattanaik, and N. Rajoria, "Clock gating based energy efficient ALU design and implementation on FPGA," in 2013 International Conference on Energy Efficient Technologies for Sustainability (ICEETS), pp. 93-97, April 10-12, 2013.
[8] S. K. Mathew, M. A. Anders, B. Bloechel, T. Nguyen, R. K. Krishnamurthy, and S. Borkar, "A 4-GHz 300-mW 64-bit integer execution ALU with dual supply voltages in 90-nm CMOS," IEEE Journal of Solid-State Circuits, vol. 40, no. 1, pp. 44-51, Jan. 2005.
[9] B. Pandey, J. Yadav, Y. K. Singh, R. Kumar, and S. Patel, "Energy efficient design and implementation of ALU on 40nm FPGA," in 2013 International Conference on Energy Efficient Technologies for Sustainability (ICEETS), pp. 45-50, April 10-12, 2013.
[10] B. Chatterjee, M. Sachdev, and R. Krishnamurthy, "A CPL-based dual supply 32-bit ALU for sub 180nm CMOS technologies," in Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED '04), ACM, pp. 248-251, New York, NY, USA, 2004.

TABLE I. TEN DIFFERENT TYPES OF DESIGNS FOR DECREASING THE POWER DISSIPATION

Columns: ALU or Floating Point Architecture Name; Datapath width (bits), given as the number of bits in the operands; Time for Operation; Design Type (Adder, Multiplier, Floating Point); ITRS Technology Node (nm) or Area; Model of Chip used; Energy/Power Consumption (W or J, otherwise indicated as low or high). Entries that a design does not report are omitted.

[1] Energy and Area Analysis of a Floating-Point Unit
    Datapath width: 32 bits (operands)
    Floating Point: IEEE-754 Single Precision
    ITRS Technology Node: 45 nm and 15 nm
    Energy/Power Consumption: 2.048 mW (45 nm), 0.6340 mW (15 nm)

[2] Ultra-area-efficient fault-tolerant QCA full adder
    Datapath width: 1 bit (operands)
    Adder: Ultra-area-efficient fault-tolerant QCA full adder
    Area: 18 nm^2 (cell area)
    Energy/Power Consumption: low

[3] Energy-Efficient Multiplier-Less Discrete Convolver through Probabilistic Domain Transformation
    Datapath width: 128 bits (operands)
    Time for Operation: 4.09 µs
    Multiplier: Energy-Efficient Multiplier
    Model of Chip used: Virtex 6 FPGA devices (XC6VLX550t)
    Energy/Power Consumption: 166.63 nJ

[4] High Throughput Power-aware FIR Filter Design based on Fine-grain Pipeline Multipliers and Adders
    Datapath width: 16 bits (operands)
    Adder: Fine-Grained Pipelining Adders
    Multiplier: Fine-Grained Pipelining Multipliers
    ITRS Technology Node: 0.24 µm static CMOS

[5] Fault Demotion Using Reconfigurable Slack (FaDReS)
    Datapath width: 9 bits (operands)
    Time for Operation: 200 ms
    Multiplier: DSP48 multipliers
    Model of Chip used: FPGA, PRM, DCT and HWICAP

[6] Design & analysis of 16 bit RISC processor using low power pipelining
    Datapath width: 16 bits (operands)
    Adder: Carry Select Adder
    Multiplier: Barrel Shift Rotator
    ITRS Technology Node: 28 nm
    Model of Chip used: XILINX KINTEX (XC7K1607-3fbg676)

[7] Clock gating based energy efficient ALU design and implementation on FPGA
    Datapath width: 8 bits (operands)
    Time for Operation: 1 ps
    Adder: Carry Out Adder, Carry In Adder
    ITRS Technology Node: 90 nm
    Model of Chip used: Spartan-3

[8] A 4-GHz 300-mW 64-bit integer execution ALU with dual supply voltages in 90-nm CMOS
    Datapath width: 64 bits (operands)
    Time for Operation: 30 ps
    Adder: Sparse-tree semi-dynamic adder and 32b radix-2 sparse-tree adder
    ITRS Technology Node: 90 nm (CMOS)
    Energy/Power Consumption: 300 mW

[9] Energy efficient design and implementation of ALU on 40nm FPGA
    Datapath width: 8 bits (operands)
    Adder: ELM Adder
    ITRS Technology Node: 40 nm (Virtex-6 FPGA)
    Energy/Power Consumption: 88 mW

[10] A CPL-based dual supply 32-bit ALU for sub 180nm CMOS technologies
    Datapath width: 32 bits (operands)
    Adder: 32-bit adder
    ITRS Technology Node: 180 nm to 65 nm (CMOS)