Analyzing Metrics of ALU Designs Traversing from Years 2002 to 2015

Brianna V. Thomason
Department of Electrical and Computer Engineering
University of Central Florida
Orlando, FL 32816-2362
Email: brianna.thomason@knights.ucf.edu

Abstract

This paper compares ten architecture papers published between 2002 and 2015. Each implements a unique arithmetic logic unit design built from fundamental components such as adders, multipliers and floating-point units, which are examined here. The designers also discuss how power consumption and energy efficiency are affected, which are compared in Table I, and why they are important to consider when implementing a new design. Clock rate, the number of clock cycles a CPU can perform per second, and supply voltage, the voltage supplied to power the circuit, are among the other metrics compared in the table.

Keywords: ALU, Power Consumption, Supply Voltage, ITRS Node, Execution Time, CPU, Energy Efficiency, Clock Cycle

I. INTRODUCTION

One of the main components of the computer's central processing unit (CPU) is the arithmetic logic unit (ALU). The ALU is an intricate electronic circuit that performs arithmetic and bitwise logical operations on operands loaded from registers. As shown in Fig. 1, the datapath of the ALU includes the input and output registers. The incoming control path carries the function select, while the outgoing control path carries the flag bits, including zero and overflow. Internally, the ALU is composed of the essential logic gates that perform addition, subtraction and multiplication operations and any Boolean instructions. The processor executes a large set of instructions. These instructions tell the CPU which arithmetic or logical calculation the ALU will execute. Furthermore, they tell the CPU the locations of the data and where to store the results.
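The ALU behavior described above can be sketched as a small behavioral model: a function select chooses the operation, and the zero and overflow flags are derived from the result. This is only an illustrative sketch, not a design from any of the reviewed papers. The 8-bit width, the operation names and the `AluResult` type are assumptions made for the example, and multiplication is left out to keep it short.

```python
from dataclasses import dataclass

WIDTH = 8                  # assumed operand width in bits (illustrative only)
MASK = (1 << WIDTH) - 1    # keeps values inside the datapath width
SIGN = 1 << (WIDTH - 1)    # position of the two's-complement sign bit


@dataclass
class AluResult:
    value: int      # result written to the output register
    zero: bool      # zero flag: result is all zeros
    overflow: bool  # overflow flag: signed result does not fit in WIDTH bits


def alu(function: str, a: int, b: int) -> AluResult:
    """Behavioral model of an ALU with a hypothetical function select."""
    a &= MASK
    b &= MASK
    overflow = False
    if function == "ADD":
        value = (a + b) & MASK
        # Signed overflow: both operands differ in sign from the result.
        overflow = bool((a ^ value) & (b ^ value) & SIGN)
    elif function == "SUB":
        value = (a - b) & MASK
        # Signed overflow: operands differ in sign and result sign differs from a.
        overflow = bool((a ^ b) & (a ^ value) & SIGN)
    elif function == "AND":
        value = a & b
    elif function == "OR":
        value = a | b
    elif function == "XOR":
        value = a ^ b
    else:
        raise ValueError(f"unsupported function: {function}")
    return AluResult(value, value == 0, overflow)


if __name__ == "__main__":
    print(alu("ADD", 0x7F, 0x01))  # largest positive value plus one sets overflow
    print(alu("SUB", 5, 5))        # equal operands set the zero flag
```

Keeping the flags in the returned record mirrors the outgoing control path in Fig. 1, where the flag bits leave the ALU alongside the result.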
The registers provide the sources of the data needed to execute the calculations and the destination for the results. The data bus sends and receives data from memory. Occasionally, a CPU will have a data bus that is narrower than the ALU, which reduces the cost of the chip.

Although energy and power are distinct from each other, they are interconnected. The CPU consumes energy as it runs, and the rate of that consumption is its power dissipation. The goal for engineers is to design processors that use less power and conserve energy, which reduces total cost and is better for the environment. Effective ways to reduce power dissipation include decreasing the clock rate, lowering the supply voltage, and using multiple slower cores in the design. The following equations are used to determine the CPU's execution time, power dissipation and energy consumption:

(1) CPU Time = Instruction Count * CPI * Clock Cycle Time
(2) Power = Capacitance * Voltage^2 * Frequency
(3) Energy = Power * Running Time
(4) Energy Efficiency = Processor Throughput / Energy Consumed

The International Technology Roadmap for Semiconductors (ITRS) is a fifteen-year assessment of the future technology requirements of the semiconductor industry. Moore's Law states that the number of transistors on a chip will double nearly every two years. The trend is expected to end around the year 2022, when the ITRS node reaches 5 nanometers.

Ten arithmetic logic unit designs, published between 2002 and 2015 and each with a unique implementation, are reviewed in Section II. The authors discuss their projects in terms of energy, power and ITRS technology. The following section compares and contrasts their research and designs.

Fig. 1 Design of the Arithmetic Logic Unit
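Equations (1) through (4) can be exercised with a short script. The sketch below is a minimal illustration. Every numeric input (instruction count, CPI, switched capacitance) is a hypothetical value chosen for the example, not a measurement from the reviewed papers, and the function names are likewise assumptions.

```python
def cpu_time(instruction_count: float, cpi: float, clock_cycle_time_s: float) -> float:
    """Equation (1): execution time in seconds."""
    return instruction_count * cpi * clock_cycle_time_s


def dynamic_power(capacitance_f: float, voltage_v: float, frequency_hz: float) -> float:
    """Equation (2): power in watts from capacitance, supply voltage and frequency."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


def energy(power_w: float, running_time_s: float) -> float:
    """Equation (3): energy in joules."""
    return power_w * running_time_s


def energy_efficiency(throughput: float, energy_j: float) -> float:
    """Equation (4): throughput delivered per joule consumed."""
    return throughput / energy_j


if __name__ == "__main__":
    # Hypothetical workload and circuit parameters (assumptions for illustration).
    instructions = 1e9
    cpi = 1.5
    frequency = 200e6            # Hz
    capacitance = 1e-9           # farads of switched capacitance

    time_s = cpu_time(instructions, cpi, 1.0 / frequency)
    for voltage in (1.1, 0.8):
        power = dynamic_power(capacitance, voltage, frequency)
        joules = energy(power, time_s)
        efficiency = energy_efficiency(instructions / time_s, joules)
        print(f"{voltage} V: {power:.4f} W, {joules:.4f} J, "
              f"{efficiency:.3e} instructions/s per J")
```

Because voltage enters equation (2) squared, lowering the supply voltage at a fixed frequency and capacitance reduces power by the square of the voltage ratio, which is why lowering the voltage is listed among the effective ways to reduce power dissipation.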

II. LITERATURE REVIEW

In 2015, Soheil Salehi et al. examined the power consumption and cell area of an IEEE-754 Single Precision Floating-Point Unit (Fig. 2) in 15 nm and 45 nm Complementary Metal-Oxide Semiconductor (CMOS) technologies, using those two metrics to compare the technologies. Their results revealed that the 15 nm design used four times less energy at a supply voltage of 0.8 V, while the 45 nm design had a supply voltage of 1.1 V (Fig. 5) [1].

Fig. 2 FPU Functional Elements [1]

In 2015, Arman Roohi et al. designed a Quantum-Dot Cellular Automata (QCA) full adder. Analysis of latency, complexity and area showed that the proposed full adder had advantages in all three over preceding designs [2].

In 2014, Mohammed Alawad et al. presented a high-performance reconfigurable discrete convolver designed specifically for FPGA-based image and video processors. Whereas the typical multiplier-based design attains a runtime of O(n^2), the most significant benefit of the proposed design is that it achieves approximately O(n) in algorithmic convolution, making it more scalable and energy efficient [3].

In 2013, Naveed Imran et al. proposed an active dynamic redundancy-based fault-handling approach that exploits the partial dynamic reconfiguration capability of static random-access memory-based field-programmable gate arrays. Experimental testing of the FaDReS algorithm produced favorable results in fault-handling situations [4].

In 2013, Bishwajeet Pandey et al. applied latch-free clock gating techniques to an ALU implemented on the 90 nm Spartan-3 to reduce clock power and dynamic power consumption (Fig. 5). Although the clock gating techniques increase area, they reduce the clock power and dynamic power consumption of the overall design [7].

In 2009, Jian Huang et al. proposed a field-programmable gate array-based scalable architecture for discrete cosine transform (DCT) computation using FPGA dynamic partial reconfiguration. The authors analyzed specifications of their proposed design such as power consumption, processing clock cycles and reconfiguration overhead, and provided the detailed trade-offs. Power was reported at a clock rate of 41.79 MHz (Fig. 6). A low-precision implementation with reduced ROM size can be beneficial in terms of hardware and power consumption [5].

In 2009, M. Ramalatha et al. demonstrated the efficiency of the Urdhva Tiryagbhyam Vedic method of multiplication, which differs from conventional approaches in the multiplication process itself. Applying these techniques in the computation algorithms of the coprocessor reduces complexity, execution time, area and power, and yields a high-speed, power-efficient multiplier [6].

In 2009, Kui Yi et al. analyzed the structure and algorithm of a floating-point ALU implementing multiplication and division operations. The ALU supports floating-point numbers in the IEEE-754 standard format, and the results show that it realizes the expected function. The ALU uses a 4-level pipeline structure, with each stage acting as a single module. The pipeline allows operations to proceed in parallel and improves system performance [9].

In 2004, Bhaskar Chatterjee et al. presented a high-performance 32-bit ALU for low-power applications that minimizes total energy. It was implemented in 180 nm to 65 nm CMOS technologies. The results showed that it is possible to reduce the ALU total energy by 18-24% with little delay and a reduction in leakage power [10].

In 2002, A. Srivastava et al. designed a high-speed 4-bit ALU (Fig. 4), incorporating a ripple carry adder (Fig. 3), to show the effectiveness of the Back-Gate Forward Substrate Bias (BGFSB) method in 1.2 μm N-well CMOS technology. The method is emphasized for its low-voltage and high-speed applications [8].

Fig. 3 Block diagram of a 4-bit ripple carry adder showing worst case delay [3]
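The ripple carry adder mentioned for the 4-bit ALU chains one full adder per bit, with each stage's carry-out feeding the next stage's carry-in. The sketch below is a bit-level software model written for illustration only. It is not the circuit from [8], and the function names and default width are assumptions.

```python
def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum bit, carry-out bit)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def ripple_carry_add(a: int, b: int, width: int = 4, carry_in: int = 0) -> tuple[int, int]:
    """Add two width-bit values by rippling the carry from bit 0 upward.

    Returns (width-bit sum, final carry-out).
    """
    carry = carry_in
    result = 0
    for i in range(width):
        # Each stage must wait for the carry produced by the stage below it.
        sum_bit, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= sum_bit << i
    return result, carry


if __name__ == "__main__":
    # 1111 + 0001 makes the carry travel through every stage, which is the
    # worst case for delay in this structure.
    print(ripple_carry_add(0b1111, 0b0001))
    print(ripple_carry_add(0b0101, 0b0110))
```

The loop makes the delay property visible: the most significant sum bit cannot be resolved until the carry has passed through all lower stages, so worst-case delay grows linearly with the operand width.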

III. DATA ANALYSIS

Fig. 4 The number of bits in the operands of the designs in the indicated papers.

Fig. 5 Power consumption (mW) of the designs in the indicated papers.

Fig. 6 Clock rate of the designs in the indicated papers.

Fig. 7 The supply voltage of each design in the indicated papers.

IV. CONCLUSION

After comparing the ten arithmetic logic unit designs, published between 2002 and 2015 and each with a unique implementation, several design metrics were found to be related. The designs vary in type and include adders, multipliers and floating-point units. The authors discuss their projects in terms of energy, power and ITRS technology nodes, while this paper also compares the clock rates and supply voltages of each.

REFERENCES

[1] S. Salehi and R. F. DeMara, "Energy and Area Analysis of a Floating-Point Unit in 15nm CMOS Process Technology," in Proceedings of IEEE SoutheastCon 2015 (SECon-2015), Fort Lauderdale, FL, April 9-12, 2015.

[2] A. Roohi, R. F. DeMara, and N. Khoshavi, "Design and Evaluation of an Ultra-Area-Efficient Fault-Tolerant QCA Full Adder," Microelectronics Journal, vol. 46, no. 6, pp. 531-542, June 2015.

[3] M. Alawad, Y. Bai, R. F. DeMara, and M. Lin, "Energy-Efficient Multiplier-Less Discrete Convolver through Probabilistic Domain Transformation," in Proceedings of the 22nd ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA-14), pp. 185-188, Monterey, California, USA, February 27-28, 2014.

[4] N. Imran, J. Lee, and R. F. DeMara, "Fault Demotion Using Reconfigurable Slack (FaDReS)," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 21, no. 7, pp. 1364-1368, July 2013.

[5] J. Huang, M. Parris, J. Lee, and R. F. DeMara, "Scalable FPGA-based Architecture for DCT Computation Using Dynamic Partial Reconfiguration," ACM Transactions on Embedded Computing Systems, vol. 9, no. 1, art. 9, pp. 1-18, October 2009.
[6] M. Ramalatha, K. D. Dayalan, P. Dharani, and S. D. Priya, "High speed energy efficient ALU design using Vedic multiplication techniques," in International Conference on Advances in Computational Tools for Engineering Applications (ACTEA '09), pp. 600-603, July 15-17, 2009.

[7] B. Pandey, J. Yadav, M. Pattanaik, and N. Rajoria, "Clock gating based energy efficient ALU design and implementation on FPGA," in 2013 International Conference on Energy Efficient Technologies for Sustainability (ICEETS), pp. 93-97, April 10-12, 2013.

[8] A. Srivastava and D. Govindarajan, "A Fast ALU Design in CMOS for Low Voltage Operation," VLSI Design, vol. 14, no. 4, pp. 315-327, 2002.

[9] K. Yi and Y.-H. Ding, "32 bit Multiplication and Division ALU Design Based on RISC Structure," in International Joint Conference on Artificial Intelligence (JCAI '09), pp. 761-764, April 25-26, 2009.

[10] B. Chatterjee, M. Sachdev, and R. Krishnamurthy, "A CPL-based dual supply 32-bit ALU for sub 180nm CMOS technologies," in Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED '04), ACM, pp. 248-251, New York, NY, USA, 2004.

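TABLE I. COMPARISON OF 10 ARITHMETIC LOGIC UNIT DESIGNS

Fields per design: datapath width (bits in operands); time for operation; design type; ITRS technology node (nm), area, or model of chip used; energy/power consumption (W or J, otherwise indicated as low or high); clock rate; supply voltage. A field is omitted where the table gives no value.

[1] Energy and Area Analysis of a Floating-Point Unit
Datapath width: 32 bits
Design type: Floating point, IEEE-754 single precision
Technology node / chip: 45 nm and 15 nm (ITRS technology)
Energy/power consumption: 2.048 mW (45 nm), 0.6340 mW (15 nm)
Clock rate: 200 MHz
Supply voltage: 1.1 V (45 nm), 0.8 V (15 nm)

[2] Design and Evaluation of an Ultra-Area-Efficient Fault-Tolerant QCA Full Adder
Datapath width: 1 bit
Design type: Ultra-area-efficient fault-tolerant QCA full adder
Area: 18 nm^2
Energy/power consumption: Low

[3] Energy-Efficient Multiplier-Less Discrete Convolver through Probabilistic Domain Transformation
Datapath width: 128 bits
Time for operation: 4.09 μs
Design type: Energy-efficient
Technology node / chip: 40 nm Virtex-6 FPGA devices (XC6VLX550t)
Energy/power consumption: 166.63 nJ
Clock rate: 250 MHz
Supply voltage: 1.0 V

[4] Fault Demotion Using Reconfigurable Slack (FaDReS)
Datapath width: 32 bits
Design type: DSP48
Technology node / chip: 90 nm Virtex-4 FX60 FPGA
Energy/power consumption: 1541 mW
Clock rate: 108 MHz
Supply voltage: 1.2 V

[5] Scalable FPGA-based Architecture for DCT Computation Using Dynamic Partial Reconfiguration
Datapath width: Adaptive, 1 to 8 bits
Design type: Reconfigurable PE for DCT using ROM Shifters (RS)
Technology node / chip: 90 nm Virtex-4 SX35 FPGA
Energy/power consumption: 24.03-26.27 mW
Clock rate: 41.79 MHz (power), 100 MHz (SelectMap)
Supply voltage: 1.2 V

[6] High Speed Energy Efficient ALU Design using Vedic Multiplication Techniques
Datapath width: Adaptive, 8 to 64 bits
Time for operation: 15 ns, 45 ns
Design type: Optimized Vedic multiplier, Vedic MAC unit
Energy/power consumption: Low

[7] Clock Gating Based Energy Efficient ALU Design and Implementation on FPGA
Datapath width: 8 bits
Design type: Carry Out and Carry In s
Technology node / chip: 90 nm Spartan-3 FPGA
Energy/power consumption: 23889 mW (1 THz), 2433 mW (100 GHz), 36 mW (1 GHz), 3 mW (100 MHz)
Clock rate: 1 THz, 100 GHz, 1 GHz, 100 MHz
Supply voltage: 1.2 V

[8] A Fast ALU Design in CMOS for Low Voltage Operation
Datapath width: 4 bits
Design type: Ripple carry adder
Technology node / chip: 1200 nm N-well CMOS
Energy/power consumption: Low
Clock rate: 0.5 MHz
Supply voltage: 1 V

[9] 32-bit Multiplication and Division ALU Design Based on RISC Structure
Datapath width: 32 bits
Design type: RISC structure, IEEE-754 standard
Technology node / chip: GW48 EDA system

[10] A CPL-Based Dual Supply 32-bit ALU for Sub 180nm CMOS Technologies
Datapath width: 32 bits
Design type: Propagate-generate unit, MUX
Technology node / chip: 180 nm to 65 nm CMOS
Energy/power consumption: Reduction in energy of 18-24%
Clock rate: 4.2 GHz
Supply voltage: 0.7-1.0 V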
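The Table I entries for design [1] can be related to equation (2) with a few lines of arithmetic. The sketch below takes the reported power and supply voltage at the 45 nm and 15 nm nodes and separates the power ratio into the part explained by the squared voltage term and a residual. It assumes the simple model of equation (2) and the same 200 MHz clock at both nodes. Attributing the residual to switched capacitance is an assumption of that model, not a finding of the paper, and the function name is hypothetical.

```python
def split_power_ratio(p_old_mw: float, p_new_mw: float,
                      v_old: float, v_new: float) -> tuple[float, float, float]:
    """Return (power ratio, voltage-squared factor, residual factor).

    Under Power = Capacitance * Voltage^2 * Frequency at a fixed frequency,
    the residual is the part of the ratio not explained by voltage alone.
    """
    power_ratio = p_old_mw / p_new_mw
    voltage_factor = (v_old / v_new) ** 2
    residual = power_ratio / voltage_factor
    return power_ratio, voltage_factor, residual


if __name__ == "__main__":
    # Values for design [1] as listed in Table I.
    ratio, v_factor, rest = split_power_ratio(2.048, 0.6340, 1.1, 0.8)
    print(f"power ratio 45 nm / 15 nm: {ratio:.2f}")
    print(f"explained by voltage squared: {v_factor:.2f}")
    print(f"residual (other terms, e.g. capacitance): {rest:.2f}")
```

Under these assumptions the supply voltage change accounts for a factor of a little under two, so the remaining reduction would have to come from the other terms in equation (2).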