The Metrics and Designs of an Arithmetic Logic Function over 2002-2015

Jimmy Vallejo
Department of Electrical and Computer Engineering
University of Central Florida
Orlando, FL 32816-2362

Abstract

There have been many modifications made to arithmetic logic functions throughout 2002-2015. The arithmetic logic function is like the heart of the CPU: if it is removed, there is no way of getting information across to memory. It has been modified by implementing floating point, clock gating, pipeline gating, FIR structures, and the QCA full adder. Every implementation is designed to be faster and better than the last. The less power and area a design needs, the more superior it becomes. For example, the ALU floating-point unit has a 32-bit word width in IEEE 754 single precision and consumes 2.048 mW at 45 nm versus 0.6340 mW at 15 nm. It faced two difficult challenges for CMOS devices, power density and area, but the results show that the 15 nm technology is better than the 45 nm technology. The 15 nm technology offers a 3-4 fold improvement in energy efficiency over the 45 nm technology and also gives about 30% less cell area.

Keywords: power consumption, word width, ALU, clock gating, execution time, cell area, and clock rate.

I. INTRODUCTION

ALU stands for arithmetic logic unit. It allows the computer to add, subtract, and perform basic logical operations such as AND and OR. It is an electrical circuit that performs arithmetic and bitwise logical operations on binary numbers. The ALU is a very important piece of a CPU (central processing unit), an FPU (floating point unit), and even a GPU (graphics processing unit); every CPU, FPU, or GPU can contain multiple ALUs. Sometimes the ALU is subdivided into two units, for example one for fixed-point operations and the other for floating-point operations. Every ALU has direct input and output access to the processor controller, main memory, and input/output devices.
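The operations listed above (add, subtract, AND, OR on binary words) can be illustrated with a small software model. This is only a sketch under stated assumptions: the names `alu` and `AluResult`, the string operation selectors, the 8-bit default width, and the carry and zero flags are all hypothetical choices made for illustration, not taken from any of the surveyed designs.

```python
from dataclasses import dataclass


@dataclass
class AluResult:
    value: int   # result truncated to the word width
    carry: bool  # carry out of ADD, or borrow in SUB; False for logic ops
    zero: bool   # True when the truncated result is 0


def alu(op: str, a: int, b: int, width: int = 8) -> AluResult:
    """Apply one operation to two unsigned `width`-bit operands."""
    mask = (1 << width) - 1  # e.g. 0xFF for an 8-bit word
    a &= mask
    b &= mask
    if op == "ADD":
        raw = a + b
        carry = raw > mask   # result did not fit in the word
    elif op == "SUB":
        raw = a - b
        carry = a < b        # a borrow was needed
    elif op == "AND":
        raw, carry = a & b, False
    elif op == "OR":
        raw, carry = a | b, False
    else:
        raise ValueError(f"unsupported operation: {op}")
    value = raw & mask       # wrap around to the word width
    return AluResult(value, carry, value == 0)


if __name__ == "__main__":
    print(alu("ADD", 200, 100))        # overflows an 8-bit word
    print(alu("AND", 0b1100, 0b1010))
```

A hardware ALU computes all of this combinationally in one pass; the model only mirrors the input-to-output behaviour.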
Input is the data fed into the ALU for it to operate on. The output is the result obtained once the operation has completed. Inputs and outputs move along an electrical path called a bus. The input consists of an instruction that contains an operation code and sometimes a format code. The operation code, also referred to as the opcode, tells the ALU what operation to perform. For instance, two operands might be added, subtracted, or compared logically. The format code is combined with the opcode and states whether it is a fixed-point or a floating-point instruction. Afterwards the output is placed in a storage register, together with settings that indicate whether the operation completed successfully.

Some of the key metrics discussed in this paper are important to every computer system's performance. The data bus word width determines how much information may be carried in a single instruction. For instance, a 16-bit processor has a 16-bit data bus word width. An ITRS technology node is the half pitch between two adjacent DRAM metal lines, although a company may instead be referring to the Lmin of a MOSFET, for example 130 nm. Execution time is the time that it takes for a single instruction to be executed. It also makes up the last half of the instruction cycle. To calculate the execution time of a program, multiply the instruction count by the cycles per instruction (CPI) and by one over the clock rate: Execution Time = [Instr. Count] x [CPI] x [1/Clock Rate]. Power dissipation occurs when central processing units consume electrical energy and dissipate it through the action of the switching devices or as energy lost due to heating of the material. It is calculated by multiplying current by voltage: P = I x V. Energy consumption of a processor is the amount of energy, that is, power drawn over time, that the computer needs in order to operate properly.

The design of the ALU is a critical part of the processor, and new approaches to speeding up instructions are still being developed today.
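The execution-time and power relations defined above can be written out as a short sketch. The function names and the sample numbers (one million instructions, a CPI of 2, a 100 MHz clock, 0.5 A at 1.2 V) are hypothetical values chosen only to exercise the formulas. The `energy` helper uses the standard relation energy = power x time, which the text implies but does not state as a formula.

```python
def execution_time(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """Execution time in seconds: [Instr. Count] x [CPI] x [1 / Clock Rate]."""
    return instruction_count * cpi * (1.0 / clock_rate_hz)


def power(current_a: float, voltage_v: float) -> float:
    """Power in watts: P = I x V."""
    return current_a * voltage_v


def energy(power_w: float, time_s: float) -> float:
    """Energy in joules for a constant power drawn over a time interval."""
    return power_w * time_s


if __name__ == "__main__":
    # Hypothetical workload: 1,000,000 instructions, CPI of 2, 100 MHz clock.
    t = execution_time(1_000_000, 2.0, 100e6)
    # Hypothetical supply: 0.5 A drawn at 1.2 V.
    p = power(0.5, 1.2)
    print(f"execution time: {t * 1e3:.1f} ms")
    print(f"power: {p:.2f} W")
    print(f"energy: {energy(p, t) * 1e3:.1f} mJ")
```

Halving the CPI or doubling the clock rate halves the execution time, which is why both appear as design targets in the surveyed papers.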
Many engineers build Ripple Carry Adders (RCA), Carry Look Ahead Adders (CLA), and Carry Select Adders (CSA) into their designs to make addition, subtraction, multiplication, and division more efficient. An RCA adds two 32-bit numbers by chaining 32 full adders, each passing its carry-out to the next bit position. Every ALU is designed differently in order for it to function faster and more efficiently. The faster the ALU operates, the better. Section II reviews ten ALU designs spanning 2002-2015. I will discuss some of the metrics implemented into ALUs over the years and also their functionality.

II. LITERATURE REVIEW

The energy analysis of a floating-point unit was presented in 2015 at IEEE SoutheastCon [1]. The analysis covered a 32-bit floating-point unit in IEEE 754 single precision that consumes 2.048 mW at 45 nm versus 0.6340 mW at 15 nm. Power density and area were two difficult challenges for CMOS devices, but the results show that the 15 nm technology is better than the 45 nm technology. The 15 nm technology offers a 3-4 fold improvement in energy efficiency over the 45 nm technology and also gives about 30% less cell area.

Design and Evaluation of an Ultra-Area-Efficient Fault-Tolerant QCA Full Adder was published in 2015 in Microelectronics Journal. It involves a 1-bit operand word width with an ultra-area-efficient fault-tolerant QCA full adder and a cell area of 18nm^2. The full adder achieves significant advances over previous designs in terms of cell count and area. The effectiveness of the adder is verified through the implementation of a 4-bit carry save adder [2].

Design & Analysis of a 16 bit RISC Processor Using Low Power Pipelining was presented at a 2015 international conference; it explains the pipelining scheme for high-throughput FIR [6]. By implementing pipelined multipliers and adders, the design achieves very high throughput. With a 2-dimensional pipeline gating technique, the designed FIR becomes power-aware with respect to the accuracy of the operands. This model was operated with a 16-bit operand word width, a Carry Save Adder, and a Virtex 6 FPGA device (XC6VLX 550t).

Energy-Efficient Multiplier-Less Discrete Convolver through Probabilistic Domain Transformation was presented in 2014 in Monterey, California. The design, implemented on a Virtex 6 FPGA device (XC6VLX550t), requires 4.09 µs to perform a 128 x 128 convolution and dissipates only 166.63 nJ at 250 MHz [3]. Computing in the probabilistic domain also allows more basic operations to be used, which achieves higher energy efficiency at a lower hardware cost. This makes the probabilistic convolver even more valuable as the problem size increases.

Clock Gating Based Energy Efficient ALU Design and Implementation on FPGA was presented in an international conference paper in 2013 [7].
This design uses a 4-bit datapath width with a clock gating based energy efficient ALU, a 90 nm (ITRS node) Spartan-3 device, and low power consumption. Without clock gating, clock power is 50%, 41.46%, 51.30%, 55.15%, and 55.78% of total dynamic power when the device operating frequency is 100 MHz, 1 GHz, 10 GHz, 100 GHz, and 1 THz, respectively. After clock gating is applied to the ALU, clock power falls to 17.85%, 23.39%, 26.49%, and 27.19% of total dynamic power when the device operating frequency is 1 GHz, 10 GHz, 100 GHz, and 1 THz, respectively. With clock gating there is a 72.77% reduction in clock power, a 38.88% reduction in I/O power, and a 44% reduction in dynamic power compared with the power consumption without clock gating.

Improving Power-awareness of Pipelined Array Multipliers using 2-Dimensional Pipeline Gating and its Application to FIR Design was published in Integration, the VLSI Journal in 2006 [5]. This pipelined array design uses 16-bit operands, an array multiplier, a 0.24 µm chip, and low power consumption. The technique gates the clocks to the registers in both the vertical and the horizontal direction. For multipliers using 2's complement representation, sign extension, which tends to waste power and lengthen delays, can be avoided by using this scheme. Simulation results show that an average power saving of 66% and a latency reduction of 47% can be achieved under this implementation.

A 4-GHz 300-mW 64-bit integer execution ALU with dual supply voltages in 90-nm CMOS was published in the IEEE Journal of Solid-State Circuits in 2005 [9]. This design has a 64-bit operand word width, a carry adder, a 0.073mm^2 cell area, and a power consumption of 300 mW, as stated in its title.

A high performance 32-bit ALU for programmable logic was presented at the 12th International Symposium on Field Programmable Gate Arrays in 2004 [10].
It operates with a 32-bit operand word width, a high performance 32-bit ALU, Altera's NIOS 2.0 processor logic, and low power consumption.

High Throughput Power-aware FIR Filter Design based on Fine-grain Pipeline Multipliers and Adders was presented at the 2003 IEEE Annual Symposium on VLSI in Tampa, Florida [4]. The design has a 16-bit operand word width, an FIR structure multiplier, a cell area of 0.24 µm, and low power consumption.

A Fast ALU Design in CMOS for Low Voltage Operation was published in VLSI Design in 2002 [8]. This implementation has a 4-bit operand word width, a Ripple Carry Adder, a technology node of 1.2nm n-well CMOS, and low power consumption.

III. DATA ANALYSIS

Figure 1. The bits in operands for different processors from 2002-2015.

Figure 2. Node range from 2013 to 2015.

Figure 3. The area of different processors throughout the years 2002-2015.

Figure 4. No execution time.

Figure 5. The power and energy consumption graph from 2005 to 2015. The other years gave no numeric value and could not be graphed.

Metrics covered by various papers which are suitable for plotting:
Data bus width (bits) vs. Year
ITRS technology node (nm) vs. Year
Execution time per ALU or Floating Point Unit operation (nsec) vs. Year
Power or Energy vs. Year
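As a rough illustration of the first metric, data bus width versus year, the operand widths reported in the literature review and Table I can be grouped by publication year before plotting. The sketch below is an assumption about how such data might be organised; the names `SURVEYED` and `width_by_year` are hypothetical, and the entry for [7] is a datapath width rather than an operand width.

```python
# (year, reference number, operand or datapath width in bits),
# as reported in the literature review and Table I.
SURVEYED = [
    (2015, 1, 32),
    (2015, 2, 1),
    (2015, 6, 16),
    (2014, 3, 128),
    (2013, 7, 4),    # datapath width
    (2006, 5, 16),
    (2005, 9, 64),
    (2004, 10, 32),
    (2003, 4, 16),
    (2002, 8, 4),
]


def width_by_year(rows):
    """Group widths by publication year, oldest year first."""
    grouped = {}
    for year, _ref, width in sorted(rows):
        grouped.setdefault(year, []).append(width)
    return grouped


if __name__ == "__main__":
    for year, widths in width_by_year(SURVEYED).items():
        print(year, widths)
```

The grouped dictionary can be handed to any plotting tool; years with several papers simply contribute several points.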

IV. CONCLUSION

In this study I have learned many ways to implement a design in order to reduce power consumption, across a wide range of designs from 2002-2015. For instance, for a floating-point unit in CMOS with a 32-bit data bus, it is better to use a 15 nm node instead of a 45 nm node because it uses up to 30% less cell area and achieves a 3 to 4 fold improvement in energy efficiency. Also, for an ALU, one way to reduce cell count and area is to use a QCA full adder. The survey shows how ALUs have been implemented over the years using pipeline gating, floating point, clock gating, FIR structures, and, most interesting of all, the QCA full adder.

REFERENCES

[1] S. Salehi and R. F. DeMara, "Energy and Area Analysis of a Floating-Point Unit in 15nm CMOS Process Technology," in Proceedings of IEEE SoutheastCon 2015 (SECon-2015), Fort Lauderdale, FL, April 9-12, 2015.
[2] A. Roohi, R. F. DeMara, and N. Khoshavi, "Design and Evaluation of an Ultra-Area-Efficient Fault-Tolerant QCA Full Adder," Microelectronics Journal, vol. 46, no. 6, pp. 531-542, June 2015.
[3] M. Alawad, Y. Bai, R. F. DeMara, and M. Lin, "Energy-Efficient Multiplier-Less Discrete Convolver through Probabilistic Domain Transformation," in Proceedings of the 22nd ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA-14), pp. 185-188, Monterey, California, USA, February 27-28, 2014.
[4] J. Di, J. S. Yuan, and R. DeMara, "High Throughput Power-aware FIR Filter Design based on Fine-grain Pipeline Multipliers and Adders," in Proceedings of the 2003 IEEE Annual Symposium on VLSI (ISVLSI-03), pp. 260-261, Tampa, Florida, USA, February 20-21, 2003.
[5] J. Di, J. S. Yuan, and R. F. DeMara, "Improving Power-awareness of Pipelined Array Multipliers using 2-Dimensional Pipeline Gating and its Application to FIR Design," Integration, the VLSI Journal, vol. 39, no. 2, pp. 90-112, March 2006.
[6] P. Trivedi and R. P. Tripathi, "Design & Analysis of 16 bit RISC Processor Using Low Power Pipelining," in 2015 International Conference on Computing, Communication & Automation (ICCCA), pp. 1294-1297, May 15-16, 2015.
[7] B. Pandey, J. Yadav, M. Pattanaik, and N. Rajoria, "Clock Gating Based Energy Efficient ALU Design and Implementation on FPGA," in 2013 International Conference on Energy Efficient Technologies for Sustainability (ICEETS), pp. 93-97, April 10-12, 2013.
[8] A. Srivastava and D. Govindarajan, "A Fast ALU Design in CMOS for Low Voltage Operation," VLSI Design, vol. 14, no. 4, pp. 315-327, 2002.
[9] S. K. Mathew, M. A. Anders, B. Bloechel, T. Nguyen, R. K. Krishnamurthy, and S. Borkar, "A 4-GHz 300-mW 64-bit Integer Execution ALU with Dual Supply Voltages in 90-nm CMOS," IEEE Journal of Solid-State Circuits, vol. 40, no. 1, pp. 44-51, Jan. 2005.
[10] P. Metzgen, "A High Performance 32-bit ALU for Programmable Logic," in Proceedings of the ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays (FPGA '04), pp. 61-70, New York, NY, USA, 2004.

TABLE I

ALU or Floating Point Architecture Name | Datapath width (bits) or #bits in operands | Time for Operation | Design Type (Adder, Multiplier, or Floating Point) | ITRS Technology Node (nm), Area, or Model of Chip used | Energy/Power Consumption (W or J), else indicate low or high
Energy and Area Analysis of a Floating-Point Unit [1] | 32 bits (Operands) | - | IEEE-754 Single Precision | 45nm and 15nm (ITRS Node) | 2.048mW (45nm), 0.6340mW (15nm)
Ultra-area-efficient fault-tolerant QCA full adder [2] | 1 bit (Operands) | - | Ultra-area-efficient fault-tolerant QCA full adder | 18nm^2 (Cell Area) | low
Energy-Efficient Multiplier-Less Discrete Convolver through Probabilistic Domain Transformation [3] | 128 bits (Operands) | 4.09 µs | Energy-Efficient Multiplier | Virtex 6 FPGA devices (XC6VLX550t) (Model of Chip used) | 166.63 nJ
Design & Analysis of 16 bit RISC Processor Using Low Power Pipelining [6] | 16 bit (Operands) | - | Carry Save Adder | Virtex 6 FPGA devices (XC6VLX 550t) (Model of Chip used) | -
Clock Gating Based Energy Efficient ALU Design and Implementation on FPGA [7] | 4 bit (Datapath) | - | Clock Gating Based Energy Efficient ALU Design | 90nm (ITRS Node), Spartan-3 | -
A 4-GHz 300-mW 64-bit integer execution ALU with dual supply voltages in 90-nm CMOS [9] | 64 bit (Operands) | - | Carry | 0.073mm^2 | 300mW
High Throughput Power-aware FIR Filter Design based on Fine-grain Pipeline Multipliers and Adders [4] | 16 bit (Operands) | - | FIR Structure | 0.24 µm | -
Improving Power-awareness of Pipelined Array Multipliers using 2-Dimensional Pipeline Gating and its Application to FIR Design [5] | 16 bit (Operands) | - | Array Multiplier | 0.24 µm | -
A Fast ALU Design in CMOS for Low Voltage Operation [8] | 4 bit (Operands) | - | Ripple Carry Adder | 1.2nm n-well CMOS | -
A high performance 32-bit ALU for programmable logic [10] | 32 bit (Operands) | - | A high performance 32-bit ALU for programmable logic | Altera's NIOS 2.0 Processor | -
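Two of the numeric entries in Table I can be cross-checked with simple arithmetic. The sketch below computes the 45 nm to 15 nm power ratio for the floating-point unit [1] and the average power implied by the convolver figures [3]. The function names are hypothetical, and the second calculation assumes that the 166.63 nJ is the energy spent over the full 4.09 µs convolution, which the source suggests but does not state outright.

```python
def improvement_ratio(old: float, new: float) -> float:
    """How many times smaller `new` is than `old` (same units)."""
    return old / new


def average_power_w(energy_j: float, time_s: float) -> float:
    """Average power in watts for a given energy spent over a given time."""
    return energy_j / time_s


# Floating-point unit [1]: 2.048 mW at 45 nm, 0.6340 mW at 15 nm.
fpu_ratio = improvement_ratio(2.048, 0.6340)

# Convolver [3]: 166.63 nJ over 4.09 microseconds (assumed to be the same run).
convolver_power = average_power_w(166.63e-9, 4.09e-6)

if __name__ == "__main__":
    print(f"45 nm / 15 nm power ratio: {fpu_ratio:.2f}")
    print(f"convolver average power: {convolver_power * 1e3:.2f} mW")
```

The ratio comes out a little above 3, which is consistent with the 3-4 fold improvement quoted for the 15 nm technology.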