Class Project: Low power Design of Electronic Circuits (ELEC 6970) 1


Power Minimization using Voltage Reduction and Parallel Processing

Sudheer Vemula
Dept. of Electrical and Computer Engineering
Auburn University, Auburn, AL

Goal of the project:-
To reduce the power consumed by the x array multiplier by including parallel processing, without any increase in the delay of the critical path.

Problem Statement:-
The array multiplier already contains a lot of parallelism, i.e., the processing is done simultaneously in most of its blocks. The initial intuition is therefore that adding further parallelism to the same circuit may not bring any further improvement.

Design Approach:-
Including parallelism should improve the speed of operation of the circuit. The supply voltage can then be reduced, which lowers the power dissipation but increases the delay of the circuit. The added delay should be compensated by the included parallelism, so that the circuit still works at its normal frequency of operation. Including parallelism may require additional circuitry, which adds area overhead and consumes power of its own; how much depends on the type of overhead circuitry. The overhead must be chosen so that the final power consumption of the circuit is lower than the initial value.

Introduction:-
Concurrent execution of several programs or several blocks of a program is known as parallel processing [1]. There are two ways of including parallelism in a circuit:
1) Data Parallelism
2) Control Parallelism
Data Parallelism is the parallel execution of a single expression on data distributed over multiple processors [2]. Control Parallelism is the parallelism achieved by the simultaneous execution of multiple threads [3], i.e., performing different operations on the same data simultaneously.

Design Techniques:-
There are two ways of including parallelism in the circuits.
First, the same core can be replicated several times as shown in Fig.1. This is known as multi-core architecture. The incoming inputs are applied to the different cores in sequence, so each individual core can be slowed down by voltage scaling, which in turn reduces the power consumption of the whole circuit. The area overhead of the multi-core architecture is very high, but the architecture works for any circuit, independent of the combinational logic present in the circuit.
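The trade-off in the multi-core technique can be put into numbers with a small sketch. It assumes the common dynamic power model P = C x V^2 x f, which the report itself does not state, and it treats the multiplexer and clock logic as a hypothetical extra switched capacitance (`overhead_fraction`). The voltages in the usage example are illustrative values, not measurements from this project.

```python
def parallel_power_ratio(v_ref, v_scaled, n_copies, overhead_fraction=0.0):
    """Return (power of the N-copy design) / (power of the single reference core).

    Assumptions (not taken from the report):
      * dynamic power P = C * V^2 * f dominates;
      * each copy has capacitance C and is clocked at f / n_copies;
      * the mux and clock logic add overhead_fraction * C, switched at f;
      * every block runs at the scaled supply v_scaled.
    """
    if n_copies < 1:
        raise ValueError("n_copies must be at least 1")
    if v_ref <= 0 or v_scaled <= 0:
        raise ValueError("supply voltages must be positive")
    # N copies, each switching C at f/N, switch the same C*f in total as one core.
    core_activity = n_copies * (1.0 / n_copies)
    total_activity = core_activity + overhead_fraction
    return (v_scaled / v_ref) ** 2 * total_activity


# Illustrative numbers only: 5 V reference, 3 V scaled supply, 15% overhead.
ratio = parallel_power_ratio(v_ref=5.0, v_scaled=3.0, n_copies=2, overhead_fraction=0.15)
print(f"power relative to reference: {ratio:.3f}")
```

The sketch does not model how far the voltage can actually be lowered for a given N; that depends on the delay-versus-voltage behaviour of the technology and has to come from circuit simulation.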

Fig.1 Multi-core parallel Architecture [4]. Each of the N copies processes every Nth input and operates at a reduced supply voltage V N (V 1 = V ref); an N to 1 multiplexer, controlled by a multiphase clock generator, selects the output. N is the degree of parallelism.

In the second design technique, the big circuit is partitioned into several small circuits to include parallelism. This is also a type of pipelining, but it is different from the pipelining shown for the first design technique; it is similar to the pipelining used in a data-path architecture. The area overhead of this technique is much smaller than that of the first one. Because the first technique is independent of the combinational logic, it is more a matter of implementation than of design, whereas the second technique leaves scope for both design and implementation. So, I have concentrated mostly on the second design technique.

Architecture of the Design:-
The basic idea used in the design of my pipelined multiplier architecture is to compute partial products rather than computing the whole product at once. Once the partial products are computed, they are added together after applying their respective shift values. This is illustrated by an example with two 2-digit integers.

Ex.: A = 98 and B = 76
AxB = (90x76) + (8x76)
    = (9x76)x10 + (8x76)
    = (9x7)x100 + (9x6)x10 + (8x7)x10 + (8x6)

As the binary representation takes more digits to represent the same value, this method becomes more effective. The given x array multiplier can be partitioned in two different ways to include parallelism:
1) Horizontal Partition
2) Vertical Partition
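The partial-product idea from the 98 x 76 example can be sketched in a few lines. The function names and the `half_width` and `base` parameters are illustrative choices, not part of the original design; with `base=10` the sketch reproduces the decimal example, and with `base=2` it splits binary operands the way a multiplier built from four smaller multipliers would.

```python
def quarter_products(a, b, half_width, base=2):
    """Split a and b into high/low halves and return the four partial
    products as (value, shift) pairs, where shift counts digits in `base`."""
    scale = base ** half_width
    a_hi, a_lo = divmod(a, scale)
    b_hi, b_lo = divmod(b, scale)
    return [
        (a_hi * b_hi, 2 * half_width),  # high x high
        (a_hi * b_lo, half_width),      # high x low
        (a_lo * b_hi, half_width),      # low x high
        (a_lo * b_lo, 0),               # low x low
    ]


def split_multiply(a, b, half_width, base=2):
    """Recombine the four partial products by shifting and adding."""
    return sum(value * base ** shift
               for value, shift in quarter_products(a, b, half_width, base))


# Decimal example from the text: 98 x 76 split into single digits.
print(quarter_products(98, 76, half_width=1, base=10))
# [(63, 2), (54, 1), (56, 1), (48, 0)]
print(split_multiply(98, 76, half_width=1, base=10))  # 7448

# Binary case: two 16-bit operands split into 8-bit halves.
print(split_multiply(0xBEEF, 0x1234, half_width=8) == 0xBEEF * 0x1234)  # True
```

The four quarter products are independent of each other, which is what allows the hardware to compute them in parallel before the shift-and-add step.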

Fig.2 Horizontal and Vertical Partitions of the circuit

The vertical partition reduces the delay of the critical path by a larger amount than the horizontal partition. The red lines show one of the possible critical paths in the circuit. Both vertical and horizontal partitions can be applied to the same circuit to decrease the delay of the critical path further.

Delay Overhead:-
A x multiplier built from four x multipliers is shown in Fig. 3. The bit full adder contributes a delay overhead of only one full adder, because the ripple carry adder performs the addition in sequence as soon as it receives its values, and the outputs of the multipliers are always available to the full adder. The delay overhead due to the bit Carry Look Ahead (CLA) adder and the bit Ripple Carry (RC) adder together is equal to the delay of the bit CLA adder, because the inputs to the RC adder arrive in sequence before the inputs to the CLA adder, and both inputs to the CLA adder arrive at the same time.

The delay of the critical path can be further reduced by implementing the x multiplier with four 8x8 bit multipliers, and reduced again by implementing each 8x8 multiplier with four 4x4 multipliers. RC adders have been chosen because they have the lowest power consumption of all the adders; wherever speed is important, CLA adders have been used [5]. For a x multiplier designed with 4x4 multipliers, the delay is expected to be reduced by 68% with only 17% area overhead. The detailed calculations of delay and area overhead can be found in the presentation slides.

Implementation:-
First, a generic NxM array multiplier was designed in VHDL. The circuit was simulated and verified using the Mentor Graphics Modelsim simulation tool. ELDO, a circuit level simulation tool, was used to find the actual delay of the circuit. ELDO accepts only Verilog files, so the VHDL source code has to be converted to Verilog. This conversion is done using Leonardo, a synthesis tool. Leonardo performs the synthesis and can also provide its output in several different formats.
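A bit-level software model is a convenient golden reference when verifying a generic NxM multiplier in simulation. The following is a minimal behavioural sketch in Python, not the author's VHDL: it forms each partial-product bit with an AND and accumulates the rows through a chain of full adders (a ripple-carry arrangement, chosen here for simplicity). All names are illustrative.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def array_multiply(x, y, n_bits, m_bits):
    """Multiply an n_bits-wide x by an m_bits-wide y, one bit at a time."""
    if not (0 <= x < (1 << n_bits)) or not (0 <= y < (1 << m_bits)):
        raise ValueError("operand does not fit in the declared width")
    x_bits = [(x >> i) & 1 for i in range(n_bits)]
    y_bits = [(y >> j) & 1 for j in range(m_bits)]
    acc = [0] * (n_bits + m_bits)  # running sum, least significant bit first

    for j, y_bit in enumerate(y_bits):
        carry = 0
        for i, x_bit in enumerate(x_bits):
            partial = x_bit & y_bit  # AND gate forms the partial-product bit
            acc[i + j], carry = full_adder(acc[i + j], partial, carry)
        # Position j + n_bits has not been written yet, so the carry lands there.
        acc[j + n_bits] = carry

    return sum(bit << k for k, bit in enumerate(acc))


print(array_multiply(13, 11, n_bits=4, m_bits=4))  # 143
```

Comparing the HDL simulation output against such a model for random or exhaustive inputs is one way to check both the plain and the partitioned multiplier against the same reference.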

Fig.3 x multiplier with four x multipliers (block diagram: the outputs of the four sub-multipliers are combined by half adder, full adder and CLA stages into two 48 bit intermediate results and the final 64 bit result)

Next, the proposed circuit with parallelism was designed in VHDL, and the same procedure as explained above was followed to import the circuit into ELDO. For the circuit with parallelism, the Full Adders, Half Adders and Multiplexers were designed separately and then combined.

Comparison of Delay using Leonardo:-

Circuit    Normal Circuit Delay    Parallelized Circuit Delay
8x8        7.42 units              6.54 units
x          .7 units                12.73 units
x          35.27 units             25.13 units

In the ELDO simulation, the circuit was simulated for the all 0's to all 1's transition. For a x normal multiplier the delay was 8.7 ns, and for the parallelized circuit the delay was 12.07 ns. The parallelized circuit showed the higher delay in this run because the delay produced by a given input differs from one multiplier design to another, so this transition does not necessarily exercise the critical path. The actual delay of the critical path is expected to be smaller.
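The relative improvement behind the Leonardo figures can be worked out directly from the two rows of the table whose values survive in full. The sketch below is plain arithmetic on those numbers; the row label "largest" is a placeholder for the circuit whose dimensions are missing from the table.

```python
def delay_reduction_percent(normal_delay, parallel_delay):
    """Percentage by which the parallelized delay is below the normal delay."""
    if normal_delay <= 0:
        raise ValueError("normal_delay must be positive")
    return 100.0 * (normal_delay - parallel_delay) / normal_delay


# (normal, parallelized) delays in Leonardo's units, copied from the table.
leonardo_rows = {
    "8x8": (7.42, 6.54),
    "largest": (35.27, 25.13),  # dimensions not recoverable from the table
}

for name, (normal, parallel) in leonardo_rows.items():
    print(f"{name}: {delay_reduction_percent(normal, parallel):.1f}% lower delay")
```

On these two rows the reduction grows from roughly 12% to roughly 29% as the multiplier gets larger, which is the delay margin that voltage reduction would then trade for power.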

Here each multiplier has been implemented with 4 mini multipliers of the respective dimensions. The CLA adder has not been implemented, so it is replaced by an RC adder. The power consumption for a x multiplier was around 1 watt (peak power for different vectors).

Points of Interest:-
1) The ELDO simulation tool hasn't been totally developed. It required some modifications to the Verilog file even when the source code was totally correct.
2) The documentation for the tool is not really good. It takes some time to figure things out.

Lessons Learned:-
1) I was mainly concentrating on improving the delay and I didn't note the area values provided by the synthesis tool. So, I couldn't provide any results on area overhead.
2) I didn't find the vector that activates the critical path, so I couldn't provide any valuable results from the circuit level simulation.

Future Work:-
The parallelism in the circuit has been implemented at only four levels. The number of levels can be increased to get better results. The CLA adder can be implemented and used to improve the results. There are still quite a few improvements that can be made to this circuit. An LFSR can be used to provide the inputs required for power estimation.

Conclusion:-
The Leonardo synthesis tool showed a considerable improvement in delay. This delay margin can be traded for a lower supply voltage, so power can be reduced by implementing parallelism in the array multiplier.

References:-
[1] dspvillage.ti.com/docs/catalog/dspplatform/details.jhtml
[2] www.llnl.gov/casc/overture/henshaw/documentation/app/manual/node0.html
[3] books.nap.edu/html/up_to_spedd/appd.html
[4] Class Slides
[5] J. M. Rabaey & M. Pedram, Low Power Design Methodologies, Kluwer Academic Publishers, Boston MA, 1996.