Resource Efficient Reconfigurable Processor for DSP Applications


International Journal of Innovative Research in Science, Engineering and Technology, Volume 3, Special Issue 3, March 2014. ISSN (Online): 2319-8753, ISSN (Print): 2347-6710.
2014 International Conference on Innovations in Engineering (ICIET'14), 21st & 22nd March, organized by K.L.N. College of Engineering, Madurai, Tamil Nadu, India.

P.C.Franklin 1, M.Ramya 2
Department of Electronics and Communication Engineering, S.A Engineering College, Chennai, India 1
Department of Electronics and Communication Engineering, S.A Engineering College, Chennai, India 2

ABSTRACT--A reconfigurable processor configures its architecture based on the application. In general, a processor consists of a data path, a control unit and a memory unit. In the proposed system, a CSLA (Carry Select Adder) with BEC (Binary to Excess-1 Converter) and a CSLA with D-latch, along with a Wallace tree multiplier, were developed to enhance the performance of the MAC (Multiplication and Accumulation) unit in the data path. The Wallace tree and the CSLA are used to reduce the size of the MAC unit. The multiplication and addition performed in the MAC operation can be applied to FIR (Finite Impulse Response) filter applications. The MAC operation uses the 16-bit CSLA with BEC and CSLA with D-latch architectures along with an 8-bit Wallace tree, which effectively reduces resource utilization. To make the function faster, the Wallace tree is replaced by a Dadda tree. Reconfiguration of the control unit is also done for various functions using the CSLA with the Dadda tree. The control unit is designed to control the operations of the data path unit. By changing the data path and control unit architecture, resource utilization, power, delay and interconnects are reduced, which mostly supports multimedia and DSP applications.

KEYWORDS- CSLA, RCA, BEC, D-latch, Wallace and Dadda.

I. INTRODUCTION
Multimedia and DSP applications depend mainly on speed and performance. To improve these parameters, the processor used in DSP and multimedia devices must also be very efficient.
Generally, a processor consists of a data path, a control unit and a memory unit [1]. The data path unit contains the MAC unit, which performs arithmetic and logical functions. Using efficient adders and other components in the MAC unit improves the performance of the processor. This efficient MAC unit is enhanced for FIR filter applications. Multipliers and adders are applied to FIR filters to eliminate the delay in data transitions [2]. An area-efficient CSLA using RCA and BEC is presented in [3][7]. A Wallace tree multiplier with low area and less delay is presented in [4]. The design of an FIR filter using computational sharing based on the carry select adder technique is presented in [5]. This paper proposes two modified FIR filter designs using a CSLA and a Dadda tree multiplier: one uses the CSLA with BEC along with the Dadda multiplier, and the other uses the CSLA with D-latch along with the Dadda multiplier.

Section 2 gives the design of the MAC unit. Section 3 presents the design of the adder unit. Section 4 describes the design of the multiplier unit. Section 5 presents the mathematical concepts of FIR filter design. Section 6 gives the design of the FIR filter. Section 7 describes the simulation and synthesis results. Section 8 presents the conclusion of the project.

II. DESIGN OF MAC UNIT
A multiply and accumulate (MAC) operation calculates the product of two numbers and adds that product to an accumulator. The MAC unit consists of a multiplier followed by an adder and an accumulator register that stores the result. The output of the register is fed back to one input of the adder, so that on each clock cycle the output of the multiplier is added to the register contents and stored back in the register, as described in Fig. 1. In the MAC unit, the adder is replaced with a carry select adder [3] and the multiplier is replaced with a Dadda multiplier [11]. This modified MAC unit is enhanced for FIR filter applications.

Copyright to IJIRSET www.ijirset.com
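The multiply-accumulate loop of Section II can be illustrated with a small behavioural model. This is only a sketch in Python, not the authors' VHDL: the class name `Mac`, the 8-bit operand width, the 16-bit accumulator and the wrap-around on overflow are assumptions chosen to match the adder and multiplier widths mentioned in the abstract.

```python
class Mac:
    """Behavioural model of a multiply-accumulate unit (illustrative names).

    Assumptions: 8-bit unsigned operands and a 16-bit accumulator register
    that wraps around on overflow.
    """

    def __init__(self, acc_bits=16):
        self.mask = (1 << acc_bits) - 1
        self.acc = 0  # accumulator register, cleared at reset

    def clock(self, a, b):
        """One clock cycle: multiply, add to the fed-back register, store."""
        product = (a & 0xFF) * (b & 0xFF)            # multiplier stage
        self.acc = (self.acc + product) & self.mask  # adder + register
        return self.acc


mac = Mac()
mac.clock(3, 4)          # accumulator now holds 12
print(mac.clock(5, 6))   # 12 + 30 = 42
```

Each call to `clock` plays the role of one clock edge: the register output is fed back to the adder, which is the behaviour the text attributes to the MAC unit.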

Fig. 1. Structure of MAC unit

III. ADDER DESIGN
The MAC unit has an adder section which uses an efficient carry select adder. A carry select adder consists of two sets of Ripple Carry Adders (RCA), one for carry-in being zero and another for carry-in being one, and a multiplexer which selects the result passed to the next stage according to whether the actual carry is zero or one.

A. BASIC 16-BIT CARRY SELECT ADDER
The CSLA has two inputs A and B of 16 bits each. As Fig. 2 shows, the 16-bit inputs are split into groups and each group is given to its own RCA. The carry output of the first RCA is fed as the carry input to the next stage, and its sum is taken as a direct output. The remaining stages produce the carry and the rest of the 16-bit sum in the same way. The CSLA structure consists of two sets of ripple carry adders [3][7]. The upper RCA, for Cin = 0, and the lower RCA, for Cin = 1, produce different carry outputs. The carry outputs of the upper and lower RCA are connected to a multiplexer which selects the carry input for the next stage, as shown in Fig. 2.

Fig. 2. 16-bit carry select adder (bit groups [1:0], [3:2], [6:4], [10:7] and [15:11]; each group above [1:0] uses one RCA per carry-in value and a multiplexer of size 6:3, 8:4, 10:5 and 12:6, respectively)

B. 16-BIT CSLA USING BEC
The basic CSLA structure has a larger delay and high resource utilization because it uses two sets of RCA. To avoid this problem, the CSLA uses a Binary to Excess-1 Converter (BEC) [3][7]. In the basic structure one RCA computes for Cin = 0 and another for Cin = 1. Here a BEC is used instead of the RCA with Cin = 1, as shown in Fig. 3. Because the second RCA is avoided, the delay of each operation is reduced. This structure also occupies fewer resources than the basic CSLA. The remaining operations are the same as in the basic CSLA.
Fig. 3. 16-bit CSLA using BEC, the Binary to Excess-1 Converter (same bit groups as the basic CSLA; the Cin = 1 RCAs are replaced by 3-bit, 4-bit, 5-bit and 6-bit BECs feeding the 6:3, 8:4, 10:5 and 12:6 multiplexers)
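The two adder structures of Sections III.A and III.B can be compared with a bit-level Python model. This is a hedged sketch rather than the paper's VHDL: the function names `rca`, `bec` and `csla16` are hypothetical, the group widths 2, 2, 3, 4, 5 are read off the bit ranges in the figures, and the BEC is modelled simply as "add one to the carry-and-sum word of the Cin = 0 adder".

```python
GROUPS = [2, 2, 3, 4, 5]  # widths of bit groups [1:0], [3:2], [6:4], [10:7], [15:11]


def rca(a, b, cin, width):
    """Ripple carry adder: chain of full adders, returns (sum, carry_out)."""
    s, c = 0, cin
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        s |= (x ^ y ^ c) << i
        c = (x & y) | (c & (x ^ y))
    return s, c


def bec(value, width):
    """Binary to excess-1 converter: value + 1 on `width` bits."""
    return (value + 1) & ((1 << width) - 1)


def csla16(a, b, use_bec=False):
    """16-bit carry select adder; returns (sum, carry_out)."""
    result, carry, pos = 0, 0, 0
    for k, width in enumerate(GROUPS):
        mask = (1 << width) - 1
        ga, gb = (a >> pos) & mask, (b >> pos) & mask
        s0, c0 = rca(ga, gb, 0, width)          # result assuming Cin = 0
        if k == 0:
            s, c = s0, c0                       # first group: single RCA
        else:
            if use_bec:
                # BEC on {carry, sum} of the Cin = 0 adder gives the Cin = 1 result
                v1 = bec((c0 << width) | s0, width + 1)
                s1, c1 = v1 & mask, v1 >> width
            else:
                s1, c1 = rca(ga, gb, 1, width)  # second RCA with Cin = 1
            s, c = (s1, c1) if carry else (s0, c0)  # multiplexer
        result |= s << pos
        carry = c
        pos += width
    return result, carry


print(csla16(0x1234, 0x0FF0))                # (8740, 0), i.e. 0x2224
print(csla16(0x1234, 0x0FF0, use_bec=True))  # same result with the BEC variant
```

Both variants return the same sum and carry; the difference the paper relies on is in hardware cost, since the BEC needs fewer gates than a second ripple carry adder.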

The basic function of the CSLA with BEC is shown in Fig. 4. It consists of a 4-bit BEC and an 8:4 multiplexer. The multiplexer receives two inputs, one for carry-in zero and another for carry-in one. The zero case is the direct input of the multiplexer and the one case is the BEC output. The BEC adds one to its input. The multiplexer selects either the direct input or the BEC output for the next stage.

Fig. 4. 4-bit BEC operation

C. 16-BIT CSLA USING D-LATCH
The CSLA with D-latch architecture depends on a clock signal [9]. It produces a carry signal only when the clock input is enabled; otherwise it does not produce an output, so no separate carry inputs are needed for its operation. When En = 1, the RCA calculates the output for Cin = 1 and stores this value in the D-latch. When En = 0, the RCA calculates the output for Cin = 0, while the D-latch output does not change and stays in its previous state, as shown in Fig. 5. This produces less delay than the CSLA built from two RCAs.

Fig. 5. 16-bit CSLA using D-latch (same bit groups as the basic CSLA; a D-latch in each group above [1:0] takes the place of the second RCA, ahead of the 6:3, 8:4, 10:5 and 12:6 multiplexers)

IV. MULTIPLIER DESIGN
Multipliers are among the more energy-consuming elements in processor design. By selecting efficient components it is possible to reduce the delay as well as the resource utilization. This project describes two multipliers, the Wallace tree and the Dadda tree. Both techniques are used to reduce the partial products in the multiplication operation.

A. WALLACE TREE MULTIPLIER
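To make the partial-product reduction used by the tree multipliers of Section IV concrete, the following Python sketch generates the partial-product bits of an 8 × 8 multiplication, compresses each column with full adders (3:2) and half adders (2:2) until at most two rows remain, and then adds the two rows. It is a simplified Wallace-style model under stated assumptions, not the authors' design: the function names are hypothetical, the grouping is done column by column, and the final carry-propagate adder is modelled by ordinary integer addition. The helper `dadda_heights` only lists the column-height targets of a Dadda reduction and does not implement the Dadda multiplier itself.

```python
def partial_products(a, b, n=8):
    """cols[w] holds the AND bits of weight 2**w for an n x n multiply."""
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    return cols


def full_adder(x, y, z):
    """3:2 compressor: returns (sum, carry)."""
    return x ^ y ^ z, (x & y) | (y & z) | (x & z)


def half_adder(x, y):
    """2:2 compressor: returns (sum, carry)."""
    return x ^ y, x & y


def reduce_stage(cols):
    """One reduction layer: threes go to full adders, a leftover pair to a half adder."""
    out = [[] for _ in range(len(cols) + 1)]
    for w, bits in enumerate(cols):
        i = 0
        while len(bits) - i >= 3:
            s, c = full_adder(*bits[i:i + 3])
            out[w].append(s)
            out[w + 1].append(c)   # carry moves to the next higher weight
            i += 3
        if len(bits) - i == 2:
            s, c = half_adder(bits[i], bits[i + 1])
            out[w].append(s)
            out[w + 1].append(c)
        elif len(bits) - i == 1:
            out[w].append(bits[i])  # single wire passes through unchanged
    return out


def wallace_multiply(a, b, n=8):
    """Reduce to two rows, then add them (stand-in for the final adder)."""
    cols = partial_products(a, b, n)
    while max(len(c) for c in cols) > 2:
        cols = reduce_stage(cols)
    row0 = sum(c[0] << w for w, c in enumerate(cols) if len(c) > 0)
    row1 = sum(c[1] << w for w, c in enumerate(cols) if len(c) > 1)
    return row0 + row1


def dadda_heights(n):
    """Dadda target heights below n (d1 = 2, next = floor(1.5 * d)), largest first."""
    d = [2]
    while d[-1] * 3 // 2 < n:
        d.append(d[-1] * 3 // 2)
    return d[::-1]


print(wallace_multiply(13, 11))  # 143
print(dadda_heights(8))          # [6, 4, 3, 2]
```

For an 8 × 8 multiplier the initial maximum column height is 8, and `dadda_heights(8)` gives the four successive targets 6, 4, 3 and 2, which corresponds to the four reduction stages mentioned for the Dadda tree.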

A Wallace tree is an implementation of an adder tree designed mainly to reduce the propagation delay of each stage. It has three stages: partial product generation, compression and reduction. Fig. 6 shows the operation of the Wallace tree multiplier. An (8 × 8) Wallace tree multiplier [4] is used here. Each bit of one argument is multiplied by each bit of the other, which generates 8 rows of partial products. Depending on the position of the bits, the wires carry different weights. The number of partial products is reduced by layers of full adders and half adders. Here the full adder is implemented as a 3:2 compressor and the half adder as a 2:2 compressor. The wires are grouped in twos and threes for the half and full adders respectively, and the remaining rows are added using a carry propagation adder. A minimum number of carry propagation adders is used for the final reduction stage. In the carry propagation method, the carry of the previous stage is added to the sum of the next stage. The algorithm was developed mainly to reduce the propagation delay of each stage compared with existing multiplier techniques.

Fig. 6. (8 × 8) Wallace Tree Multiplier

B. DADDA TREE MULTIPLIER
In the Wallace tree method the partial products are reduced as soon as possible. In contrast, Dadda's tree does the minimum reduction at each level needed to perform the reduction in the same number of levels as required by the Wallace tree, as shown in Fig. 7. The Dadda tree algorithm [8] follows three steps like the Wallace tree: generation of partial products, compression and reduction. Unlike the Wallace tree, the Dadda algorithm uses more carry look-ahead adders at the final reduction level; thus the operation is faster and the delay is less than with the Wallace tree algorithm. Four reduction stages take place, with heights h = 8, 6, 4, 3 and 2.

Fig. 7. (8 × 8) Dadda Tree Multiplier

V. MATHEMATICAL CONCEPTS USED IN FIR FILTER DESIGN
Filters are a very important part of signal processing applications.
Filters are used for signal separation and for signal restoration. In general filtering is described by simple convolution operation

$$y(n) = x(n) * f(n) = \sum_{k \ge 0} f(k)\, x(n-k) = \sum_{k \ge 0} x(k)\, f(n-k) \qquad (1)$$

The straightforward way of implementing an LTI (Linear Time Invariant) FIR filter is the finite convolution of the input series x(n) with the impulse response coefficients, which is given by

$$y(n) = x(n) * f(n) \qquad (2)$$

$$y(n) = \sum_{k=0}^{L-1} f(k)\, x(n-k) \qquad (3)$$

Here L is the length of the FIR filter, f(n) (also written h(n)) is the filter impulse response, x(n) is the input sequence and y(n) is the output of the FIR filter. The above equations can also be expressed in the Z domain as

$$Y(z) = X(z)\, H(z) \qquad (4)$$

where H(z) is the transfer function of the FIR filter, X(z) is the Z-transform of the input and Y(z) is the Z-transform of the output.

VI. DESIGN OF FIR FILTER
FIR filters are used in signal processing applications. The filter structure consists of delay elements, adders and multiplier elements. The adder is replaced by a carry select adder and the multiplier by a Dadda tree multiplier, as shown in Fig. 8.

Fig. 8. 4-tap FIR filter

Two structures are developed: one uses the CSLA with BEC along with a Dadda tree, and the other uses the CSLA with D-latch along with a Dadda tree. Here X(n) is the filter input and Y(n) is the filter output. Both produce less delay and consume fewer resources.

VII. SIMULATION AND SYNTHESIS RESULTS
We perform simulation and synthesis and summarize the results for all adders and multipliers. Functional verification of all the adders and multipliers is performed, the modified architectures are applied in a 4-tap FIR filter, and finally the results are summarized.

Fig. 9. Simulation output for FIR filter using CSLA with BEC and Dadda tree

Fig. 9 shows the output of the 4-tap FIR filter using the CSLA with BEC and a Dadda tree. Here X is the 8-bit input, which is multiplied with the 8-bit filter coefficients h0, h1, h2 and h3 to produce 16-bit products. These are summed together to produce the filter output Y. Here the multiplier unit uses a Dadda tree and the adder unit uses the CSLA with BEC.

Fig. 10 shows the output of the 4-tap FIR filter using the CSLA with D-latch and a Dadda tree. Here X is the 8-bit input, which is multiplied with the 8-bit filter coefficients h0, h1, h2 and h3 to produce 16-bit products. These are summed together to produce the filter output Y. Here the multiplier unit uses a Dadda tree and the adder unit uses the CSLA with D-latch.
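The finite convolution that the 4-tap FIR filter computes can be checked with a short Python reference model. This is a sketch under assumptions: the function name `fir_filter` is hypothetical, the input is taken to be zero before n = 0, plain integer arithmetic stands in for the CSLA and Dadda hardware, and the coefficient values in the usage lines are illustrative, not the ones used in the paper's simulations.

```python
def fir_filter(x, h):
    """Direct-form FIR: y(n) = sum over k = 0..L-1 of h[k] * x[n - k].

    Samples before n = 0 are assumed to be zero.
    """
    L = len(h)
    y = []
    for n in range(len(x)):
        acc = 0                           # accumulator of the MAC unit
        for k in range(L):
            if n - k >= 0:
                acc += h[k] * x[n - k]    # one multiply-accumulate per tap
        y.append(acc)
    return y


h = [1, 2, 3, 4]                          # illustrative 4-tap coefficients h0..h3
print(fir_filter([1, 0, 0, 0, 0], h))     # impulse response: [1, 2, 3, 4, 0]
print(fir_filter([1, 2, 3], [1, 1, 1, 1]))  # running sum: [1, 3, 6]
```

Feeding a unit impulse returns the coefficients themselves, which is a quick sanity check that the taps are applied in the order h0, h1, h2, h3.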

Fig. 10. Simulation output for FIR filter using CSLA with D-latch and Dadda tree

COMPARISON OF ADDER AND MULTIPLIER ARCHITECTURES
After observation of the simulation waveforms, synthesis is performed to calculate delay and area, and the adder and multiplier architectures are compared in terms of area and delay in the tables below. From the comparison it is clear that area and delay are much lower in the proposed adder and multiplier techniques. These modified units are used in the MAC unit, which is enhanced for FIR filter applications.

TABLE 1
COMPARISON OF ADDER ARCHITECTURES

Parameters              Basic CSLA    CSLA using BEC    CSLA using D-latch
Number of gates used    30            8                 18
Destination paths       66            437               365
Delay (ns)              7.195         7.370             5.984

From the above table, delay and resource utilization are less in the CSLA with BEC and the CSLA with D-latch compared with the basic CSLA.

TABLE 2
COMPARISON OF MULTIPLIER UNITS

Parameters              Wallace Tree Multiplier    Dadda Tree Multiplier
Number of gates used    64                         56
Destination paths       8867                       5943
Delay (ns)              11.531                     9.377

From the above table, delay and resource utilization are less in the Dadda multiplier compared with the Wallace tree multiplier.

TABLE 3
COMPARISON OF FIR FILTER STRUCTURES

Parameters                                             Number of gates used    Delay (ns)
FIR filter using CSLA-BEC and Wallace tree             333                     16.957
FIR filter using CSLA-D-latch and Wallace tree         31                      15.889
FIR filter using CSLA-BEC and Dadda tree               308                     13.566
FIR filter using CSLA-D-latch and Dadda tree           7                       1.999

From the above table it is clear that delay and resource utilization are less in the FIR filter with CSLA-BEC or CSLA-D-latch and a Dadda tree than in the FIR filter with CSLA-BEC or CSLA-D-latch and a Wallace tree.

VIII. CONCLUSION
An area-efficient MAC unit for the data path unit is designed and implemented in VHDL using the Xilinx ISE 10.1 tool, and the results are compared in terms of delay and area. Using the MAC unit, two FIR filter structures are developed: one with the CSLA with BEC along with a Dadda tree, and another with the CSLA with D-latch along with a Dadda tree. The improved MAC unit is therefore high speed and efficient for VLSI hardware implementation.

REFERENCES
[1] Sohan Purohith, Sai Rahul Chalamacheti, Martin Margala and Wim Vanderbauwhede, "Throughput/Resource Efficient Reconfigurable Processor for Multimedia Applications", IEEE Transactions on VLSI Systems, Vol. 21, No. 7, 2013.
[2] A. Senthilkumar, A.M. Natarajan, S. Subha, "Design and Implementation of Low Power Digital FIR Filters relying on Data Transition Power Diminution Technique", DSP Journal, Volume 8, pp. 1-9, 2008.
[3] Ram Kumar B., Harish M Kittur, "Low Power and Area Efficient CSA", IEEE Transactions on VLSI Systems, Vol. 20, No. 2, 2012.
[4] Thapliyal H., Gobi N., Kumar K.K.P., Srinivas M.B., "Low Power Hierarchical Multiplier and Carry Look Ahead Architecture", IEEE International Conference on Computer Systems and Applications, 2006.
[5] Karunakaran S., Kasthuri N., "VLSI Implementation of FIR filter using computational sharing multiplier based on high speed carry select adder", American Journal of Applied Sciences, Vol. 9, No. 1, 2012.
[6] Oklobdzija V. G., "High-Speed VLSI Arithmetic Units: Adders and Multipliers", in Design of High-Performance Microprocessor Circuits, edited by A. Chandrakasan, IEEE Press, 2000.
[7] B. Ramkumar, H.M. Kittur, and P. M. Kannan, "ASIC implementation of modified faster carry save adder", Eur. J. Sci. Res., vol. 42, no. 1, pp. 53-58, 2010.
[8] P. Samundiswary, K. Anitha, "Design and analysis of CMOS based Dadda multiplier", IJCEM, vol. 16, issue 6, 2013.
[9] Laxman Shanigarapu, Bhavana P. Shrivastava, "Low-Power and High Speed Carry Select Adder", IJSRP, volume 3, issue 8, 2013.