CHAPTER 6 IMPLEMENTATION OF DIGITAL FIR FILTER

6.1 INTRODUCTION

Digital FIR filters are common components in many digital signal processing (DSP) systems. Applications include high-speed/low-power error control [81] and high-performance processors [82]. DSP algorithms rely heavily on the efficient computation of inner products. Very efficient methods have been developed for implementing digital filters in FPGAs and custom ICs. There are several basic structures for FIR filters, such as the canonical, pipelined and inverted forms. The key functional units in a digital filter are delay elements, adders and multipliers, of which the multiplier dominates the hardware complexity. It is well known that by representing filter coefficients as sum-of-powers-of-two (SOPOT) terms, each multiplication in filtering can be replaced with simple shift-and-add operations [71-73]. The complexity of the FIR multiplier is then dominated by the number of adders or subtractors employed in the coefficient multipliers.

In this research work, the implementation of FIR filters is presented using a distributed arithmetic (DA) scheme with canonical signed digit (CSD) coefficient representation. It is shown that filter coefficients coded using CSD are advantageous in terms of area, speed and power as compared to the SOPOT representation [74]. An efficient coefficient coding scheme using CSD representation for implementing FIR filters is proposed. A new reversible logic based hardware reduction technique based on the DA method is also proposed, which is efficient in terms of device utilization and speed.
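To make the shift-and-add idea concrete, the following minimal Python sketch converts an integer (fixed-point) coefficient to CSD digits and multiplies by it using only shifts, additions and subtractions. The function names `to_csd` and `csd_multiply` and the example coefficient 23 are illustrative assumptions; they are not taken from the implementation described in this chapter.

```python
def to_csd(value):
    """Return the CSD digits (-1, 0, +1) of an integer, least significant first."""
    digits = []
    while value != 0:
        if value & 1:
            # value mod 4 == 1 -> digit +1, value mod 4 == 3 -> digit -1.
            # This choice forces the next digit to be zero (no adjacent non-zeros).
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_multiply(x, digits):
    """Multiply integer x by a CSD-coded constant using shifts and adds only."""
    acc = 0
    for shift, digit in enumerate(digits):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc


coeff = 23                                   # hypothetical integer coefficient
csd = to_csd(coeff)
print("CSD digits (LSB first):", csd)
print("non-zero digits: binary =", bin(coeff).count("1"),
      ", CSD =", sum(1 for d in csd if d != 0))
print("5 * 23 via shift-and-add =", csd_multiply(5, csd))
```

For the value 23 (binary 10111, four non-zero bits) the CSD form has three non-zero digits (32 - 8 - 1), so the coefficient multiplier needs one adder/subtractor fewer.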

6.2 DIGITAL FIR FILTER

The processing of digital signals includes the design and implementation of systems called filters. Filters are linear time-invariant (LTI) systems; three elements are needed to describe digital filter structures: adders, multipliers and delay elements. Figure 6.1 shows the addition, multiplication and delay computations for an LTI system. Figure 6.2 shows the basic setup of an FIR filter.

Figure 6.1 Addition, multiplication and delay computation for LTI system
[Diagram: adder with inputs x1(n), x2(n) and output x1(n)+x2(n); multiplier with input x(n), gain a and output ax(n); delay 1/z with input x(n) and output x(n-1)]

Figure 6.2 Basic setup of FIR filter

A finite-duration impulse response filter has a system function of the form,

\[ H(z) = b_0 + b_1 z^{-1} + \cdots + b_{M-1} z^{1-M} = \sum_{n=0}^{M-1} b_n z^{-n} \tag{6.1} \]

Hence the impulse response h(n) is,

\[ h(n) = \begin{cases} b_n, & 0 \le n \le M-1 \\ 0, & \text{else} \end{cases} \]

The difference equation representation is given in equation (6.2), which is a linear convolution of finite support.

\[ y(n) = b_0 x(n) + b_1 x(n-1) + \cdots + b_{M-1} x(n-M+1) \tag{6.2} \]

The order of the filter is M-1, while the length of the filter is M. Figure 6.3 shows the structure of a 4-tap FIR filter, whose equation can be written as,

\[ H(z) = \sum_{n=0}^{3} h(n) z^{-n} \tag{6.3} \]

Figure 6.3 4-tap FIR filter structure
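The difference equation (6.2) maps directly onto a tapped delay line. The short Python sketch below is one way to model it in software; the function name `fir_filter` and the 4-tap coefficient values are hypothetical and chosen only so that the impulse response is easy to read off.

```python
def fir_filter(b, x):
    """Apply y(n) = sum_k b[k] * x(n - k), assuming x(n) = 0 for n < 0."""
    M = len(b)
    delay_line = [0.0] * M              # holds x(n), x(n-1), ..., x(n-M+1)
    y = []
    for sample in x:
        # Shift the delay line by one position and insert the new sample.
        delay_line = [sample] + delay_line[:-1]
        # One multiplier per tap followed by the adder chain.
        y.append(sum(bk * xk for bk, xk in zip(b, delay_line)))
    return y


b = [1.0, 2.0, 3.0, 4.0]                # hypothetical 4-tap coefficients
impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(fir_filter(b, impulse))           # the impulse response reproduces b, then zeros
```

Feeding a unit impulse returns the coefficients themselves followed by zeros, which is the finite-support property stated above.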

6.2.1 Direct form of FIR filter

The difference equation of an FIR filter is implemented as a tapped delay line since there are no feedback paths. In general, for a filter H(z) = B(z)/A(z) with input x[n] and A(z) = 1 + \sum_{k=1}^{N} a[k] z^{-k}, the output y[n] is,

\[ y[n] = \sum_{k=0}^{M} b[k]\,x[n-k] - \sum_{k=1}^{N} a[k]\,y[n-k] \tag{6.4} \]

For an FIR filter A(z) = 1, so the feedback sum vanishes. Direct forms use the coefficients a[k] and b[k] directly. Direct form 1 is a direct implementation of the difference equation; it can be viewed as B(z) followed by 1/A(z). Figure 6.4 shows the direct form 1 FIR filter structure. Direct form 2 implements 1/A(z) followed by B(z). Figure 6.5 shows the direct form 2 FIR filter structure.

Figure 6.4 Direct form 1 FIR filter structure

Figure 6.5 Direct form 2 FIR filter structure

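6.2.2 Cascade form of FIR filter

The cascade form can be written as,

\[ H(z) = b_0 + b_1 z^{-1} + \cdots + b_{M-1} z^{1-M} \tag{6.5} \]

\[ = b_0 \left( 1 + \frac{b_1}{b_0} z^{-1} + \cdots + \frac{b_{M-1}}{b_0} z^{1-M} \right) \tag{6.6} \]

\[ = b_0 \prod_{k=1}^{K} \left( 1 + B_{k,1} z^{-1} + B_{k,2} z^{-2} \right); \quad K = \lfloor M/2 \rfloor \tag{6.7} \]

6.2.3 Linear phase form of FIR filter

For frequency-selective filters (e.g., low-pass filters) it is generally desirable to have a phase response that is a linear function of frequency. That is,

\[ \angle H(e^{j\omega}) = \beta - \alpha\omega, \quad -\pi < \omega \le \pi, \quad \beta = 0 \text{ or } \pm\frac{\pi}{2} \tag{6.8} \]

For a causal FIR filter with impulse response over the interval [0, M-1], the linear-phase conditions are,

\[ h(n) = h(M-1-n); \quad \beta = 0, \quad 0 \le n \le M-1 \tag{6.9} \]

\[ h(n) = -h(M-1-n); \quad \beta = \pm\frac{\pi}{2}, \quad 0 \le n \le M-1 \tag{6.10} \]

Consider the difference equation with a symmetric impulse response; y(n) is given as,

\[ y(n) = b_0 x(n) + b_1 x(n-1) + \cdots + b_1 x(n-M+2) + b_0 x(n-M+1) \tag{6.11} \]

\[ = b_0 [x(n) + x(n-M+1)] + b_1 [x(n-1) + x(n-M+2)] + \cdots \tag{6.12} \]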
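The practical benefit of equation (6.12) is that paired input samples are added before they are multiplied, so a symmetric filter of length M needs roughly M/2 multiplications per output instead of M. The Python sketch below compares such a folded structure with the plain convolution; the function names and the coefficient values are assumptions made for illustration only.

```python
def fir_direct(h, x):
    """Plain convolution: M multiplications per output sample."""
    M = len(h)
    return [sum(h[k] * x[n - k] for k in range(M) if n - k >= 0)
            for n in range(len(x))]


def fir_symmetric(h, x):
    """Folded structure for h(n) = h(M-1-n): add paired samples, then multiply."""
    M = len(h)
    half = M // 2

    def sample(n):
        # Samples before the start of the sequence are taken as zero.
        return x[n] if n >= 0 else 0.0

    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(half):
            # One multiplication serves the two taps k and M-1-k.
            acc += h[k] * (sample(n - k) + sample(n - (M - 1 - k)))
        if M % 2:
            # Odd length: the centre tap has no partner.
            acc += h[half] * sample(n - half)
        y.append(acc)
    return y


h = [0.25, 0.5, 0.5, 0.25]              # hypothetical symmetric 4-tap filter
x = [1.0, 2.0, -1.0, 0.5, 4.0, 0.0, 0.0, 0.0]
print(fir_direct(h, x))
print(fir_symmetric(h, x))              # same values, half the multiplications
```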

6.2.4 Frequency sampling form of FIR filter

In this form the system function H(z) of an FIR filter is reconstructed from its samples on the unit circle,

\[ H(z) = \frac{1 - z^{-M}}{M} \sum_{k=0}^{M-1} \frac{H(k)}{1 - W_M^{-k} z^{-1}} \tag{6.13} \]

where W_M^{-k} are the roots (k = 0, ..., M-1) and H(k) are the residues (k = 0, ..., M-1). It is also interesting to note that the FIR filter described by the above equation has a recursive form similar to an IIR filter because it contains both poles and zeros. Using the symmetry properties of the DFT and the W_M^{-k} factor, H_k(z) can be written as,

\[ H_k(z) = \frac{H_k}{1 - p_k z^{-1}} + \frac{H_k^{*}}{1 - p_k^{*} z^{-1}} \tag{6.14} \]

\[ = 2|H_k| \, \frac{\cos(\angle H_k) - z^{-1}\cos(\angle H_k - 2\pi k/M)}{1 - 2 z^{-1}\cos(2\pi k/M) + z^{-2}} \tag{6.15} \]

Let, with H_k = H(k),

\[ p_k = W_M^{-k} = \exp(j 2\pi k/M) \tag{6.16} \]

\[ = \cos(2\pi k/M) + j\sin(2\pi k/M) \tag{6.17} \]

\[ H_k = |H_k| \exp(j\angle H_k) \tag{6.18} \]

\[ = |H_k| \left( \cos(\angle H_k) + j\sin(\angle H_k) \right) \tag{6.19} \]
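The step from equation (6.14) to equation (6.15) combines a complex-conjugate pair of first-order sections into one second-order section with real coefficients. A quick numerical check is sketched below in Python; the function names, the sample value chosen for H_k and the test point z are arbitrary assumptions used only to compare the two expressions.

```python
import cmath
import math


def conjugate_pair(Hk, k, M, z):
    """Equation (6.14): two first-order sections with conjugate poles."""
    p = cmath.exp(2j * math.pi * k / M)          # p_k = W_M^{-k}
    return Hk / (1 - p / z) + Hk.conjugate() / (1 - p.conjugate() / z)


def second_order_section(Hk, k, M, z):
    """Equation (6.15): equivalent second-order section with real coefficients."""
    mag, ang = abs(Hk), cmath.phase(Hk)
    theta = 2 * math.pi * k / M
    num = math.cos(ang) - math.cos(ang - theta) / z
    den = 1 - 2 * math.cos(theta) / z + z ** -2
    return 2 * mag * num / den


Hk = 0.8 * cmath.exp(0.3j)                       # arbitrary frequency sample
z = 1.3 * cmath.exp(0.7j)                        # arbitrary point off the unit circle
print(conjugate_pair(Hk, 2, 8, z))
print(second_order_section(Hk, 2, 8, z))         # agrees up to rounding error
```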

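Then,

\[ H(z) = \frac{1 - z^{-M}}{M} \left[ \sum_{k=1}^{L} H_k(z) + \frac{H(0)}{1 - z^{-1}} + \frac{H(M/2)}{1 + z^{-1}} \right] \tag{6.20} \]

where L = (M-1)/2 for M odd and L = M/2 - 1 for M even; the H(M/2) term is present only when M is even.

6.3 FILTER IMPLEMENTATION

The input-output relation of the digital FIR filter is,

\[ y(n) = \sum_{k=0}^{M-1} h(k)\,x(n-k) \tag{6.21} \]

The zero-phase frequency response of a Type 1 linear phase FIR filter can be expressed as,

\[ H(\omega) = h(0) + 2\sum_{n=1}^{N} h(n)\cos(\omega n) \tag{6.22} \]

where h(n) is indexed about the centre of symmetry of a filter of length 2N+1. Figure 6.6 shows the proposed digital FIR filter structure and Figure 6.7 shows the proposed digital FIR filter structure implemented using the multiplierless technique.

Figure 6.6 Proposed digital FIR filter structure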

Figure 6.7 Proposed digital FIR filter structure implemented using multiplierless technique

6.3.1 Comparison results

The FIR filter has been implemented using reversible gates and its delay has been computed. As shown in Figure 6.8, the delay for the NTG logic is lower than for the Rgate logic. The NTG logic based FIR filter designed using the Baugh-Wooley multiplier provides 1% reduced delay as compared to the Rgate logic based design.

Figure 6.8 Delay comparison for 4-tap FIR filters
[Plot: delay (ns) for Array, Braun, Baugh-Wooley and Wallace multipliers; series: Rgate logic, NTG logic (proposed)]

Figure 6.9 Power consumption comparison for 4-tap FIR filters
[Plot: power consumption (mW) for Array, Braun, Baugh-Wooley and Wallace multipliers; series: Rgate logic, NTG logic (proposed)]

The designed reversible multiplier structures are used to implement the FIR filter. Figure 6.9 shows that the power consumption of the FIR filters using Rgate logic is higher than with the proposed logic. The reversible concept adds an advantage to the filter design in terms of lower power. The NTG logic based FIR filter designed using the Wallace tree multiplier has a 17% improvement in power as compared to the FIR filter using the other logic.

Figures 6.10-6.15 show comparison results for the number of LUTs used, number of gates used, number of occupied slices, delay, power consumption and power delay product (PDP), respectively, for 4-tap FIR filters designed using various add and shift multiplier techniques implemented using reversible logic. The optimization results are also compared by implementing the multipliers used in digital FIR filters using VLSI strength reduction techniques.

Figure 6.10 Comparison results for number of LUTs used in design of 4-tap FIR filters using add and shift multipliers
[Plot: number of LUTs used; multiplier labels: CSM CSD Radix-4(3a) Booth; series: proposed logic, Rgate logic]

Figure 6.11 Comparison results for number of gates used in design of 4-tap FIR filters using add and shift multipliers
[Plot: total number of gates used; multiplier labels: CSM CSD Radix-4(3a) Booth; series: proposed logic, Rgate logic]

Figure 6.12 Comparison results for number of slices used in design of 4-tap FIR filters using add and shift multipliers
[Plot: number of slices used; multiplier labels: CSM CSD Radix-4(3a) Booth; series: proposed logic, Rgate logic]

Figure 6.13 Delay comparison results for design of 4-tap FIR filters using add and shift multipliers
[Plot: delay (ns); multiplier labels: CSM CSD Radix-4(3a) Booth; series: proposed logic, Rgate logic]

Figure 6.14 Power consumption results for design of 4-tap FIR filters using add and shift multipliers
[Plot: power consumption (mW); multiplier labels: CSM CSD Radix-4(3a) Booth; series: proposed logic, Rgate logic]

Figure 6.15 Comparison of power delay product for design of 4-tap FIR filters using add and shift multipliers
[Plot: power delay product; multiplier labels: CSM CSD Radix-4(3a) Booth; series: proposed logic, Rgate logic]

Figure 6.16 Comparison results for number of LUTs used in design of 4-tap FIR filters implemented using SE as strength reduction
[Plot: number of LUTs used; multiplier labels: CSM CSD Radix-4(3a) Booth; series: Rgate logic, proposed logic]

Figure 6.17 Comparison results for number of gates used in design of 4-tap FIR filters implemented using SE as strength reduction
[Plot: number of gates used; multiplier labels: CSM CSD Radix-4(3a) Booth; series: Rgate logic, proposed logic]

Figure 6.18 Comparison results for number of occupied slices used in design of 4-tap FIR filters implemented using SE as strength reduction
[Plot: number of slices used; multiplier labels: CSM CSD Radix-4(3a) Booth; series: Rgate logic, proposed logic]

Figure 6.19 Delay comparison for 4-tap FIR filters implemented using SE as strength reduction
[Plot: delay (ns); multiplier labels: CSM CSD Radix-4(3a) Booth; series: Rgate logic, proposed logic]

Figure 6.20 Power consumption comparison for 4-tap FIR filters implemented using SE as strength reduction
[Plot: power consumption (mW); multiplier labels: CSM CSD Radix-4(3a) Booth; series: Rgate logic, proposed logic]

Figure 6.21 Comparison of power delay product for 4-tap FIR filters implemented using SE as strength reduction
[Plot: power delay product; multiplier labels: CSM CSD Radix-4(3a) Booth; series: Rgate logic, proposed logic]

Figure 6.22 Comparison results for number of LUTs used in design of 4-tap FIR filters implemented using LT as strength reduction
[Plot: number of LUTs used; multiplier labels: CSM CSD Radix-4(3a) Booth; series: Rgate logic, proposed logic]

Figure 6.23 Comparison results for number of gates used in design of 4-tap FIR filters implemented using LT as strength reduction
[Plot: number of gates used; multiplier labels: CSM CSD Radix-4(3a) Booth; series: Rgate logic, proposed logic]

Figure 6.24 Comparison results for number of occupied slices used in design of 4-tap FIR filters implemented using LT as strength reduction
[Plot: number of slices used; multiplier labels: CSM CSD Radix-4(3a) Booth; series: Rgate logic, proposed logic]

Figure 6.25 Delay comparison for 4-tap FIR filters implemented using LT as strength reduction
[Plot: delay (ns); multiplier labels: CSM CSD Radix-4(3a) Booth; series: Rgate logic, proposed logic]

Figure 6.26 Power consumption comparison for 4-tap FIR filters implemented using LT as strength reduction
[Plot: power consumption (mW); multiplier labels: CSM CSD Radix-4(3a) Booth; series: Rgate logic, proposed logic]

Figure 6.27 Comparison of power delay product for 4-tap FIR filters implemented using LT as strength reduction
[Plot: power delay product; multiplier labels: CSM CSD Radix-4(3a) Booth; series: Rgate logic, proposed logic]

Thus, it is shown that strength reduction promises further complexity reduction and gives better results in terms of delay, power consumption and the overall performance metric, i.e. the power delay product. Figures 6.16-6.21 show the comparison results for the number of LUTs used, number of gates used, number of occupied slices, delay, power consumption and PDP, respectively, among 4-tap FIR filters implemented using subexpression elimination (SE). Figures 6.22-6.27 show the comparison results for device utilization, delay, power consumption and PDP, respectively, among 4-tap FIR filters implemented using the linear transformation (LT) approach of strength reduction.

6.4 DA BASED IMPLEMENTATION OF DIGITAL FIR FILTER

The computational complexity of finite impulse response (FIR) filters used in signal processing blocks is dominated by the number of adders or subtractors employed in the multipliers. Software defined radio (SDR) technology is predicted to replace many of the traditional methods for implementing transmitters and receivers while offering a wide range of advantages including adaptability, reconfigurability and multifunctionality, encompassing modes of operation, radio frequency bands, air interfaces and waveforms. Research in this field is mainly directed towards improving the architecture and the computational efficiency of SDR systems. The most computationally intensive part of an SDR receiver is the channelizer, since it operates at the highest sampling rate.

6.4.1 Distributed arithmetic

To reduce the hardware requirement and to implement large filters with high throughput, distributed arithmetic [85] is often preferred. Distributed arithmetic (DA) is a multiplication-free method applicable to fixed-point data, based on table lookups of pre-calculated partial products [86]. DA filters achieve these advantages while retaining full precision, unlike filters using reduced sums and differences of powers of two. Figure 6.28 illustrates the basic concept of DA. DA provides multiplier-free multiplication through bit-serial computation, storing all possible combination sums of the filter weights in a LUT. DA is a reliable option for low-power applications because it allows costly multiply operations to be replaced with shifts and table lookups [86].

A major concern today is the battery lifetime of portable electronics as more functionality is incorporated into these devices. Therefore, low-power circuits for signal processing applications are in demand. The signal processing functions employed in these devices include FIR filters, discrete cosine transforms (DCTs) and discrete Fourier transforms (DFTs). The common feature of these functions is that they are all based on the inner product. Digital signal processing applications are presented in [87]. DSP implementations typically make use of multiply-accumulate (MAC) units to calculate these operations, and the computation time increases linearly as the length of the input vector grows.

Figure 6.28 Basic concept of distributed arithmetic

6.4.2 SOPOT representation of filter coefficients

The general representation in sum-of-powers-of-two (SOPOT) terms [72, 73] for the i-th filter coefficient is,

\[ h_i = \sum_{j=0}^{B-1} 2^{a_{ij}} \tag{6.23} \]

where B is the number of digits in the power-of-two representation. The expression for h_i can be written as,

\[ h_i = 2^{a_{i0}} \left[ \sum_{j=0}^{B-1} 2^{a_{ij} - a_{i0}} \right] = 2^{a_{i0}} \left[ \sum_{j=0}^{B-1} 2^{c_{ij}} \right] \tag{6.24} \]

where c_{ij} = a_{ij} - a_{i0}. The term a_{i0} is known as the upper limit value shift. The bracketed term is known as the normalized value. The shift and the normalized value are analogous to the exponent and mantissa in true floating point representations.

Consider two 2's complement W-bit numbers X and Y,

\[ X = -x_{W-1} + \sum_{i=1}^{W-1} x_{W-1-i}\, 2^{-i} \tag{6.25} \]

\[ Y = -y_{W-1} + \sum_{i=1}^{W-1} y_{W-1-i}\, 2^{-i} \tag{6.26} \]

The (2W - 1)-bit ideal product P_I can be expressed as,

\[ P_I = MP + LP \tag{6.27} \]

\[ MP = -p_{2W-2} + \sum_{i=1}^{W-1} p_{2W-2-i}\, 2^{-i} \tag{6.28} \]

\[ LP = \sum_{i=W}^{2W-2} p_{2W-2-i}\, 2^{-i} \tag{6.29} \]

\[ P_Q = MP + \sigma_Q \cdot 2^{-(W-1)} \tag{6.30} \]

where MP and LP are the most significant and least significant parts of the product, and \sigma_Q is the correction term added at the least significant position of the quantized product P_Q.

6.4.3 Example system - Digital down converter

Software radio receivers [88] require mixing, filtering and down-sampling of received signals to allow data to be processed at a suitable rate. Part of this process can be achieved in FPGAs using a digital down converter (DDC). After mixing the incoming real signal from the analog-to-digital converter (ADC) to extract the complex signal, a DDC must filter the complex signal to reject the image components introduced by the mixing process and then down-sample it. DDCs are most commonly implemented in logic in field-programmable gate arrays or application-specific integrated circuits. While software implementations are also possible, the operations in the DDC, the multipliers and the input stages of the low-pass filters all run at the sampling rate of the input data. This data is commonly taken directly from ADCs sampling at tens or hundreds of MHz, which is beyond the real-time computational capabilities of software processors. For maximum software radio flexibility, the ADC, mixer and filters should sample as quickly as possible.

Figure 6.29 shows the architecture of the DDC. Multiplication of the intermediate frequency with the input signal creates images centered at the sum and difference frequencies (which follows from the frequency shifting property of the Fourier transform). The low-pass filters pass the difference (i.e. baseband) frequency while rejecting the sum frequency image, resulting in a complex baseband representation of the original signal. In its new form, the signal can readily be down-sampled and is more convenient for many DSP algorithms. The sample frequency is now much higher than required for the maximum frequency in the band, so it can be reduced (decimated) without any loss of information. Hence, if the DDC is implemented on an FPGA, fully parallel techniques can be used to reach the required sampling rates.

The low-pass filter coefficients for the DDC specifications used in this research work are calculated using MATLAB, where the sampling frequency is 2MHz, the cutoff frequency is 4MHz and an attenuation band of 6dB is designed using a Kaiser window. Tables 6.1 and 6.2 give the filter coefficient values for the 4-tap and 8-tap filters respectively. The magnitude and phase responses of the 4-tap and 8-tap filters are shown in Figures 6.30 and 6.31 respectively.

Table 6.1 Calculated 4-tap filter coefficients

Coefficient    Value
H(0)           .1691
H(1)           .33998
H(2)           .33998
H(3)           .1691

Table 6.2 Calculated 8-tap filter coefficients

Coefficient    Value
H(0)           -.8265
H(1)
H(2)           .22832
H(3)           .379824
H(4)           .379824
H(5)           .22832
H(6)
H(7)           -.8265

Figure 6.29 A digital down converter architecture with low pass filter
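The three DDC stages described in this section (mixing, low-pass filtering and decimation) can be modelled in a few lines of Python. The sketch below is only an illustration under stated assumptions: the function name `ddc`, the sampling and local-oscillator frequencies, and the 4-tap moving-average filter are hypothetical stand-ins and are not the Kaiser-window coefficients of Tables 6.1 and 6.2.

```python
import cmath
import math


def ddc(samples, f_lo, fs, taps, decimation):
    """Mix to baseband, low-pass filter, then keep every `decimation`-th sample."""
    # 1) Mixer: multiply by a complex exponential at the local-oscillator frequency.
    mixed = [s * cmath.exp(-2j * math.pi * f_lo * n / fs)
             for n, s in enumerate(samples)]
    # 2) Low-pass FIR filter on the complex (I/Q) signal rejects the sum-frequency image.
    filtered = []
    for n in range(len(mixed)):
        acc = 0j
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * mixed[n - k]
        filtered.append(acc)
    # 3) Decimator: reduce the sample rate.
    return filtered[::decimation]


fs = 1.0e6                               # hypothetical sampling frequency (Hz)
f_lo = 0.25e6                            # hypothetical local-oscillator frequency (Hz)
x = [math.cos(2 * math.pi * f_lo * n / fs) for n in range(32)]   # real tone at f_lo
taps = [0.25, 0.25, 0.25, 0.25]          # stand-in low-pass filter (moving average)
baseband = ddc(x, f_lo, fs, taps, 4)
print(baseband)
```

With these particular values the mixer produces a DC term of amplitude 0.5 plus an image at fs/2, which falls on a null of the moving-average filter, so after the initial transient the decimated output settles at 0.5.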

Figure 6.30 (a) 4-tap low pass FIR filter magnitude response (b) 4-tap low pass FIR filter phase response

Figure 6.31 (a) 8-tap low pass FIR filter magnitude response (b) 8-tap low pass FIR filter phase response

6.4.4 Distributed arithmetic based filtering scheme

Distributed arithmetic was first proposed by Croisier [89] and was later extended to cover signed data systems. It was then introduced into FPGA design to save MAC blocks as FPGA technology developed. High performance FIR filters based on DA using an LMS architecture are implemented in [90, 91]. If h[n] is the filter coefficient and x[n] is the input sequence to be processed, the N-length FIR filter can be described as,

\[ y = \langle h, x \rangle = \sum_{n=0}^{N-1} h[n]\,x[n] \tag{6.31} \]

Distributed arithmetic is introduced into the design of FIR filters as follows. In the two's complement system, x[n] can be described as,

\[ x[n] = -2^{B} x_B[n] + \sum_{b=0}^{B-1} x_b[n]\,2^{b} \tag{6.32} \]

where x_b[n] is bit b of x[n] and x_B[n] is the sign bit. Substituting equation (6.32) into equation (6.31) yields,

\[ y = \sum_{n=0}^{N-1} \left( -2^{B} x_B[n]\,h[n] + h[n] \sum_{b=0}^{B-1} x_b[n]\,2^{b} \right) \tag{6.33} \]

The second term of equation (6.33) can be changed into another form by exchanging the order of summation,

\[ \sum_{n=0}^{N-1} h[n] \sum_{b=0}^{B-1} x_b[n]\,2^{b} = \sum_{b=0}^{B-1} 2^{b} \sum_{n=0}^{N-1} h[n]\,x_b[n] \tag{6.34} \]

Substituting equation (6.34) into equation (6.33) yields the final form of distributed arithmetic,

\[ y = -2^{B} \sum_{n=0}^{N-1} x_B[n]\,h[n] + \sum_{b=0}^{B-1} 2^{b} \sum_{n=0}^{N-1} h[n]\,x_b[n] \tag{6.35} \]
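Equation (6.35) can be exercised with a small bit-level software model. The Python sketch below pre-computes a table of all partial sums of the coefficients and then accumulates one table lookup per input bit, with the sign bit weighted by -2^B. It assumes integer (fixed-point) coefficients and (B+1)-bit two's complement inputs; the function names `build_lut` and `da_inner_product` and all numeric values are hypothetical, and the model says nothing about the reversible-logic hardware itself.

```python
def build_lut(h):
    """LUT[addr] = sum of h[n] over every tap n whose address bit is 1."""
    N = len(h)
    return [sum(h[n] for n in range(N) if (addr >> n) & 1)
            for addr in range(1 << N)]


def da_inner_product(lut, x, B):
    """Compute sum_n h[n] * x[n] for (B+1)-bit two's complement integers x[n]."""
    acc = 0
    for b in range(B + 1):
        # Collect bit b of every input sample into one LUT address.
        addr = 0
        for n, sample in enumerate(x):
            addr |= ((sample >> b) & 1) << n
        term = lut[addr] << b            # shift realises the weight 2^b
        # The sign bit (b == B) carries the weight -2^B.
        acc += -term if b == B else term
    return acc


h = [3, -1, 4, 2]                        # hypothetical integer coefficients
x = [5, -7, 0, -128]                     # 8-bit two's complement samples (B = 7)
lut = build_lut(h)
print(da_inner_product(lut, x, 7))       # matches the direct sum of products
print(sum(hn * xn for hn, xn in zip(h, x)))
```

The table has 2^N entries for N taps, which is why the table size, rather than the number of multipliers, becomes the cost that has to be controlled as the filter grows.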

The values of \sum_{n=0}^{N-1} h[n]\,x_b[n] can be stored in a LUT unit, and the relevant value can be read out according to the input data in order to save MAC blocks. Then, the weighted sum \sum_{b=0}^{B-1} 2^{b} \sum_{n=0}^{N-1} h[n]\,x_b[n] is calculated from these values through shift registers. In a signed system, the sign bit should be taken into consideration, so -2^{B} \sum_{n=0}^{N-1} x_B[n]\,h[n] is also added. As a result, the final form of distributed arithmetic is defined as equation (6.35), and the implementation can be achieved on an FPGA through LUT units. Figure 6.32 shows the DA based filtering scheme using reversible logic.

Figure 6.32 Concept of distributed arithmetic filtering

6.4.5 Proposed DA based filtering scheme using multiplexer

The basic LUT-DA scheme on an FPGA consists of three main components: the input registers, the 4-input LUT unit and the shifter/accumulator unit. Additionally, it requires a control unit to manage the filter operation, and an adder tree unit to perform the addition of partial filter results. In the proposed approach, the 4-input LUT unit is not accessed directly; instead, a 2-input LUT is used, chosen by a multiplexer select. The use of the multiplexer brings savings in terms of adder offsets and results in an overall improvement in area and performance metrics.

The proposed architecture achieves high throughput and low complexity in two ways, as explained in Figure 6.33: first, by using multiplexer based distributed arithmetic, which allows the accumulation to complete using only a single accumulator; and second, by parallelizing the coefficient computations for higher-tap filters. The combination of the multiplexer with distributed arithmetic helps to compute higher filter taps without increasing the complexity of the filter. Thus, the switching logic for both the input samples and the coefficients becomes faster. In addition, since the lookup tables contain all the possible combinations of the coefficients, the coefficients are stored in an already multiplexed fashion, so extra logic is not required for computation.

Figure 6.33 Concept of multiplexer based DA filtering scheme

The multiplexer based DA filtering using reversible logic is shown in Figure 6.34. The particular 2-input LUT is selected which represents all the possible sum combinations of filter coefficients. It implies about 5% reduction in the number of LUTs used, with increased speed. To evaluate the performance of the proposed scheme, 4-tap and 8-tap low-pass FIR filters are implemented in VHDL and synthesis is carried out in Xilinx ISE 8.1i. The results are compared with the Rgate logic [41, 42] based DA implementation using the SOPOT and CSD methods for coefficient representation.

Figure 6.34 Multiplexer based DA filtering scheme using proposed logic

6.4.6 Comparison results

The device utilization of the proposed DA architecture is summarized in the graphs below. It is also clear that the CSD representation gives better results than the SOPOT representation. Figures A 3.1-A 3.14 of Appendix 2 show the simulation results.

Figure 6.35 reports the comparison of the number of LUTs used among the various filter architectures designed using Rgate [41, 42] based DA and the proposed multiplexer based DA method. The proposed multiplexer based DA filter using NTG logic improves on the existing DA based low-pass FIR filter using Rgate logic: the number of LUTs is reduced by 3% for the 4-tap low-pass FIR filter design and by 25% for the 8-tap low-pass FIR filter design. This is because the multiplexer based technique in DA reduces the size of the LUTs, so device utilization is improved. It is also clear that the CSD coefficient representation gives better device utilization.

Figure 6.36 plots the comparison of the number of occupied slices among the various filter architectures designed using Rgate [41, 42] based DA and the proposed method. The proposed multiplexer based DA filter using NTG logic has 45% fewer occupied slices than the existing DA based filter using Rgate logic for the 4-tap low-pass FIR filter design, and this reduction is 4% for the 8-tap low-pass FIR filter design, on average over the SOPOT and CSD coefficient representation methods.

Figure 6.35 Comparison of number of LUTs used among filter architectures
[Plot: number of LUTs used for 4-tap DA, 4-tap proposed mux based DA, 8-tap DA and 8-tap proposed mux based DA; series: SOPOT coefficients, CSD coefficients]

Figure 6.37 reports the comparison of the number of gates used among the various filter architectures designed using Rgate [41, 42] based DA and the proposed gate based DA method. The proposed architecture has 4% fewer gates for the 4-tap low-pass filter design and 32% fewer gates for the 8-tap low-pass filter design as compared to the existing scheme for designing DA based FIR filters.
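The reduction in LUT size mentioned above can be illustrated with a generic table-partitioning sketch in Python: the taps are split into groups of two, each group gets its own 4-entry table, and the small tables together reproduce every entry of the full table. This is a plain software model of LUT partitioning under assumed names (`split_luts`, `split_lookup`) and assumed coefficient values; it recombines the partial sums with an addition and does not reproduce the multiplexer select logic or the reversible gates of the proposed architecture.

```python
def build_lut(h):
    """Full DA table: entry addr holds the sum of h[n] for every set bit n of addr."""
    return [sum(h[n] for n in range(len(h)) if (addr >> n) & 1)
            for addr in range(1 << len(h))]


def split_luts(h, group):
    """One small table per group of `group` consecutive taps."""
    return [build_lut(h[i:i + group]) for i in range(0, len(h), group)]


def split_lookup(luts, addr, group):
    """Recombine the partial sums selected by each slice of the address."""
    mask = (1 << group) - 1
    return sum(lut[(addr >> (i * group)) & mask] for i, lut in enumerate(luts))


h = [3, -1, 4, 2]                        # hypothetical 4-tap integer coefficients
full = build_lut(h)                      # one 4-input table
small = split_luts(h, 2)                 # two 2-input tables
print("entries:", len(full), "versus", sum(len(t) for t in small))
print(all(split_lookup(small, a, 2) == full[a] for a in range(len(full))))
```

For four taps the single table holds 16 entries while the two 2-input tables hold 8 in total, and the gap widens quickly because the full table doubles with every additional tap.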

Figure 6.36 Comparison of number of occupied slices among filter architectures
[Plot: number of occupied slices for 4-tap DA, 4-tap proposed mux based DA, 8-tap DA and 8-tap proposed mux based DA; series: SOPOT coefficients, CSD coefficients]

Figure 6.37 Comparison of number of gates used among filter architectures
[Plot: number of gates used for 4-tap DA, 4-tap proposed mux based DA, 8-tap DA and 8-tap proposed mux based DA; series: SOPOT coefficients, CSD coefficients]

Figure 6.38 Delay comparison among filter architectures
[Plot: delay (ns) for 4-tap DA, 4-tap proposed mux based DA, 8-tap DA and 8-tap proposed mux based DA; series: SOPOT coefficients, CSD coefficients]

Figure 6.38 shows the delay comparison for the 4-tap and 8-tap filters designed using Rgate [41, 42] based DA and the proposed gate based DA method. The proposed method achieves a 15% speed improvement. Figure 6.39 shows the power calculation results. The proposed method has slightly higher power consumption than the Rgate [41, 42] based DA method, because the multiplexer based LUT increases the number of lookup accesses and thus the switching activity. Using CSD coefficients reduces the power consumption as compared to SOPOT coefficients, because the CSD computation requires fewer 1's.

Figure 6.40 compares the delay and power consumption of the existing DA based 4-tap FIR filter and the proposed DA based 4-tap FIR filter using CSD coefficients. A 14% improvement in delay is achieved using the proposed method. Similar results are obtained for the 8-tap FIR filter. The proposed method has 1% higher power consumption than the existing DA based method because switching activity increases with the number of accesses per lookup table. However, the overall performance metric (i.e. the product of power, delay and number of gates) for the proposed design is improved by 45%, as shown in Figure 6.41.

Figure 6.39 Power consumption comparison among filter architectures
[Plot: power consumption (mW) for 4-tap DA, 4-tap proposed mux based DA, 8-tap DA and 8-tap proposed mux based DA; series: SOPOT coefficients, CSD coefficients]

Figure 6.40 Delay and power consumption comparison among filter architectures
[Plot: delay and power consumption (mW) for the existing DA based 4-tap FIR filter and the proposed DA based 4-tap FIR filter]

Figure 6.41 Power, delay and number of gates product performance among filter architectures
[Plot: power, delay and number of gates product (W-sec) for the existing DA based 4-tap FIR filter and the proposed DA based 4-tap FIR filter]

6.5 SUMMARY

An efficient reversible logic based DA scheme is presented, which is used to implement FIR filters using SOPOT and CSD representations for the filter coefficients. The device utilization of the proposed architecture is relatively low since it uses a split LUT technique with multiplexer select logic. The 4-tap and 8-tap FIR filters designed in this research work can be extended to more taps. A high-speed, low-area implementation is achieved. The simulation results indicate that the filter designed using the proposed distributed arithmetic works stably at high speed and can save almost 4 percent of hardware resources. Moreover, the filter is easy to port to other applications by modifying the filter order and other parameters, and it therefore has great practical value in digital signal processing.

This research work discusses the FPGA implementation of finite impulse response (FIR) filters based on reversible logic, which has a fault detection property, using their application in digital down converters (DDCs) for software radio as background. The implementation is based on distributed arithmetic (DA), which substitutes multiply-and-accumulate operations with a series of look-up-table (LUT) accesses. Canonical signed digit (CSD) representation is used for the coefficients and is compared with the sum-of-powers-of-two (SOPOT) technique of coefficient representation. The proposed DDC is implemented in VHDL and verified via simulation. The proposed method offers an average reduction of 3% in the number of LUTs, a 42% reduction in occupied slices and a 38% reduction in the number of gates needed for low-pass FIR filter implementation. The proposed design shows a 14% reduction in delay as compared to the Rgate logic based DA architecture. Although there is a power trade-off, there is a significant improvement in the overall performance of the FIR filter, with 4% reduced hardware resources.