OPTIMIZATION OF RNS FIR FILTERS FOR 6-INPUTS LUT BASED FPGAS

G.C. Cardarilli, M. Re, A. Salsano
University of Rome Tor Vergata, Department of Electronic Engineering
Via del Politecnico 1 / 00133 / Rome / ITALY
{marco.re, g.cardarilli}@ieee.org, salsano@ing.uniroma2.it

S. Pontarelli
(ASI) Italian Space Agency
Viale Liegi 26, 00198 Rome, ITALY
pontarelli@ing.uniroma2.it

ABSTRACT

In this paper, optimized Residue Number System (RNS) arithmetic blocks that better exploit some of the architectural characteristics of last-generation FPGAs are presented. The implementation of modulo m adders, constant and general multipliers, and input and output converters is presented. These architectures are based on moduli sets chosen to make optimal use of the six-input Look-Up Tables (LUTs) available in the Configurable Logic Blocks (CLBs) of new-generation FPGAs. Experiments based on the implementation of Finite Impulse Response (FIR) filters with different numbers of taps and wordlengths show that the RNS, together with suitable moduli sets, fits the six-input LUTs of last-generation FPGA architectures optimally.

1. INTRODUCTION

The trend in silicon integrated circuits is characterized by a steady reduction in feature size combined with a steady rise in density and speed, as shown in [1]. In the last twenty years FPGAs have evolved rapidly in terms of complexity and architecture, from the first FPGA, the Xilinx XC2064 with a complexity of 1,000 gates [2], to the newest generations. The major evolution concerned the structure of the interconnect, the topology of the basic cell (the LE, i.e. the Logic Element), and the introduction of full-custom processing elements such as multipliers, hardware processor cores, MAC units, and very high speed serial I/O blocks. One of the latest innovations in FPGA architecture has been the introduction of 6-input LUTs as the main block for the implementation of combinatorial functions [3], [4].
Moreover, changes in FPGA architecture require changes in the synthesis algorithms in order to guarantee an optimum mapping onto the available resources. In this paper, an RNS representation based on suitable moduli sets is used to implement the basic arithmetic operators optimally with six-input LUTs. In particular, it is shown that moduli represented by five bits offer the best results in terms of used resources and delay. For this reason, the moduli set used for the synthesis experiments is composed of five-bit moduli, with the biggest modulus chosen as a power of two. In this way, dynamic ranges of up to 34 bits are obtained.

The paper is organized as follows: in Section II a background on RNS arithmetic is given. In Section III the architectures and performance of 6-input LUT based implementations of modulo m arithmetic operators, such as adders, constant multipliers and general multipliers, are discussed. Section IV illustrates the implementation of the input and output converters, while in Section V a set of experiments based on the implementation of FIR filters is shown and the obtained area and speed results are discussed. Conclusions are drawn in Section VI.

2. BACKGROUND ON RESIDUE NUMBER SYSTEM

A Residue Number System (RNS) is defined by a set of relatively prime integers $\{m_1, m_2, \ldots, m_P\}$. The dynamic range of the system is given by the product of the moduli, $M = \prod_{i=1}^{P} m_i$. Any integer $X \in [0, M-1]$ has a unique RNS representation given by

$$X \xrightarrow{\mathrm{RNS}} \left(\langle X\rangle_{m_1}, \langle X\rangle_{m_2}, \ldots, \langle X\rangle_{m_P}\right) \tag{1}$$

where $\langle X\rangle_{m_i} = X \bmod m_i$. A comprehensive description of RNS theory and its application to computer systems can be found in [6], [7], and [8]. In the RNS representation, operations such as addition and multiplication are executed in parallel on the different moduli:

$$Z = X \;\mathrm{op}\; Y \;\xrightarrow{\mathrm{RNS}}\; \begin{cases} \langle Z\rangle_{m_1} = \left\langle \langle X\rangle_{m_1} \;\mathrm{op}\; \langle Y\rangle_{m_1} \right\rangle_{m_1} \\ \quad\vdots \\ \langle Z\rangle_{m_P} = \left\langle \langle X\rangle_{m_P} \;\mathrm{op}\; \langle Y\rangle_{m_P} \right\rangle_{m_P} \end{cases} \tag{2}$$

Eq. (2) is valid if the final result, prior to its conversion into two's complement representation (TCS), belongs to the range $[0, M-1]$. The conversion of Z into TCS is accomplished by means of the Chinese Remainder Theorem (CRT).

Clearly, the conversions from binary representation to RNS, and vice versa, constitute an overhead for systems based on the RNS representation. However, efficient methods to perform those conversions have been presented in [9], [10], and [11]. The input conversion is obtained by reducing the input samples $x(n)$ modulo $m_i$, providing the residue digits $\langle x(n)\rangle_{m_i}$. The mod $m_i$ RNS filters compute the residues $\langle y(n)\rangle_{m_i}$ defined in eq. (4), while the CRT-based output conversion computes back $y(n)$.

3. MODULO OPERATIONS BASED ON SIX-INPUT LUTS

In FPGAs, the LEs are based on LUTs and, in particular, last-generation FPGAs are characterized by LEs containing six-input LUTs (useful to implement six-input, one-output combinatorial functions) that can also be configured as two five-input LUTs (useful to implement five-input, two-output combinatorial functions) [4], [5]. Consequently, in this paper the moduli set is chosen such that the moduli belong to the interval [17, 64]; moreover, they must be coprime, and it is usually convenient to use a power of two (such as $2^n$ with n = 5 or n = 6) as the biggest modulus. In the rest of the paper the following arithmetic blocks are analyzed:

1. Modulo m adders
2. Constant multipliers (constant coefficient FIR filters)
3. General multipliers (variable coefficient FIR filters)

3.1. Modulo m adders

If a modulus m is chosen such that $2^{n-1} < m \le 2^n$, the results generated by operations mod m are in the range $[0, 2^n - 1]$, and therefore n bits are used to represent them. A fast architecture can be used to implement addition modulo m, as shown in Fig. 1. It is composed of a two-operand adder, a three-operand adder and a multiplexer.
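As a behavioural sketch (our own Python, not the authors' HDL; all function names are hypothetical), the parallel adder of Fig. 1 and the ROM based adder of Fig. 2, in which the plain (n+1)-bit sum S = X + Y addresses a ROM storing $\langle S\rangle_m$, can be compared exhaustively against $\langle X + Y\rangle_m$. The LUT count is an assumed first-order estimate (one six-input LUT per output bit for every 64 ROM words, ignoring the dedicated wide multiplexers); it reproduces the 5, 12 and 6 LUT figures quoted in Section 3 for the $2^6 \times 5$, $2^7 \times 6$ and $2^6 \times 6$ ROMs.

```python
def mod_bits(m):
    """Number of bits n with 2**(n-1) < m <= 2**n."""
    return (m - 1).bit_length()


def parallel_mod_add(x, y, m):
    """Fig. 1, behaviourally: S1 = x + y and S2 = x + y - m are formed side by side.
    S2 is chosen when it is non-negative; the hardware reads this from the carry out of S2."""
    s1 = x + y
    s2 = x + y - m
    return s2 if s2 >= 0 else s1


def rom_mod_add_table(m):
    """Fig. 2: a 2**(n+1) x n ROM addressed by the plain (n+1)-bit sum S, storing <S>_m."""
    return [s % m for s in range(2 ** (mod_bits(m) + 1))]


def rom_mod_add(x, y, table):
    """ROM based modular adder: an ordinary adder followed by one table look-up."""
    return table[x + y]


def rom_luts(addr_bits, data_bits, lut_inputs=6):
    """Assumed first-order LUT count for a ROM: one LUT per output bit for every
    2**lut_inputs words (the dedicated wide-function multiplexers are not counted)."""
    return data_bits * 2 ** max(0, addr_bits - lut_inputs)


if __name__ == "__main__":
    for m in (19, 31, 35, 63):  # the moduli of Table 1
        table = rom_mod_add_table(m)
        assert all(parallel_mod_add(x, y, m) == rom_mod_add(x, y, table) == (x + y) % m
                   for x in range(m) for y in range(m))
    print(rom_luts(6, 5), rom_luts(7, 6), rom_luts(6, 6))  # 5 12 6
```

Selecting on the sign of S2 is only a stand-in for the carry-out logic; the comparison shows that both adders return the same residue, so choosing between them is purely an area and delay trade-off.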
The two-operand n-bit adder computes $S_1 = X + Y$, the three-input n-bit adder computes $S_2 = X + Y - m$, and the 2:1 multiplexer selects $S_1$ or $S_2$ depending on the carry out of $S_2$.

Fig. 1. The parallel modulo m adder.

In this paper, a different architecture to compute $\langle X + Y\rangle_m$ is presented, which obtains a delay comparable to that of the parallel architecture while using fewer resources. The architecture is shown in Fig. 2: X and Y are added, obtaining S, and this (n+1)-bit value is used to address a ROM (based on six-input LUTs) containing $\langle S\rangle_m$.

Fig. 2. The ROM based modulo m adder.

For a 5-bit modulus the size of the ROM is $2^6 \times 5$, corresponding to 5 six-input LUTs, while for a 6-bit modulus the size of the ROM is $2^7 \times 6$, corresponding to 12 six-input LUTs. The ROM size grows exponentially, but for m up to 64 this structure is slightly advantageous with respect to the parallel implementation, as shown in Table I, which reports the synthesis results in terms of number of LUTs and delay for different five- and six-bit moduli, in comparison with the parallel implementation.

| m | Parallel adder delay (ns) | Parallel adder #LUT | ROM based adder delay (ns) | ROM based adder #LUT |
|---|---|---|---|---|
| 19 | 1.59 | 15 | 1.53 | 10 |
| 31 | 1.62 | 15 | 1.55 | 10 |
| 35 | 1.77 | 18 | 1.77 | 18 |
| 63 | 1.71 | 16 | 1.62 | 15 |

Table 1. Area and delay of parallel and ROM based modulo adders implemented on a Xilinx Virtex-5 FPGA.

For five-bit moduli this implementation gives 33% resource savings while maintaining the same delay, whereas for six-bit moduli the results in terms of used resources and delay are similar and there are no advantages.

3.2. Modulo m multipliers: variable coefficients, constant coefficients

In this section, constant coefficient and general multipliers are analyzed.

1. Modulo m constant multipliers. They are used to implement RNS FIR filters with constant coefficients. If n is the number of bits used to represent m, $\langle K \cdot X\rangle_m$ requires n output bits. If n = 6, it can be implemented with a $2^6 \times 6$ ROM which, on a Xilinx Virtex-5 FPGA, uses 6 six-input LUTs and has a critical path of about 0.8 ns.

2. General multipliers. When m is a prime number (of at most 6 bits), the isomorphism technique [6] can be used to perform the multiplication. This technique is based on the algebraic properties of the structure formed by the numbers in the interval $[0, m-1]$ together with modulo m addition and multiplication. Since m is prime, this ring is a finite field, and therefore (a) each nonzero element has a multiplicative inverse, and (b) there exists an element of the field, called α (a primitive root), such that $\forall x \in [1, m-1]\ \exists i : \langle \alpha^i\rangle_m = x$, and $\langle \alpha^m\rangle_m = \alpha$. The modulo m multiplication of two numbers then becomes $\langle x \cdot y\rangle_m = \langle \alpha^i \cdot \alpha^j\rangle_m = \langle \alpha^{i+j}\rangle_m$. Because $\langle \alpha^m\rangle_m = \alpha$, i.e. $\langle \alpha^{m-1}\rangle_m = 1$, the addition $i + j$ is performed modulo $m - 1$. The architecture of the isomorphic multiplier is shown in Fig. 3.

Fig. 3. The modulo m multiplier based on the isomorphism technique.

The blocks named Log (based on LUTs) perform the association between the values x and y and the corresponding indexes i and j, while the block $\alpha^k$ performs the inverse association between the result $k = \langle i + j\rangle_{m-1}$ and the value $\alpha^k$. Some additional logic handles the case in which one or both operands are zero. The modulo $m - 1$ adder can be implemented by either the parallel or the ROM based modulo adder. If the ROM based modulo adder is used, two ROMs are needed, the first performing the operation $\langle\cdot\rangle_{m-1}$ and the second performing the inverse isomorphism, as shown in Fig. 4.

Fig. 4. The isomorphism based modulo m multiplier with ROM based modulo addition.

The two ROMs can be combined into a single ROM performing both operations. This implementation requires about 30 LUTs, with a maximum delay of 3.04 ns.
Instead, the parallel implementation requires 36 LUTs with a maximum delay of 3.78 ns. Therefore, by embedding the two ROMs in a single ROM, the architecture is about 20% faster and yields about 15% resource savings.

4. FIR FILTER IMPLEMENTATION

An N-tap FIR filter is described by

$$y(n) = \sum_{k=0}^{N-1} h_k\, x(n-k) \tag{3}$$

Its fixed-point implementation, in transposed or direct form, is obtained using multipliers, adders and registers. In parallel implementations, in particular, the used resources are usually reduced by truncating the multiplier outputs. The number of truncated bits is the result of a fixed-point optimization phase based on a trade-off between resource savings and signal-to-noise ratio degradation. The implementation of RNS FIR filters is a direct consequence of eq. (2), and eq. (3) becomes

$$\langle y(n)\rangle_{m_1} = \left\langle \sum_{k=0}^{N-1} \langle h_k\rangle_{m_1} \langle x(n-k)\rangle_{m_1} \right\rangle_{m_1}, \quad \ldots, \quad \langle y(n)\rangle_{m_P} = \left\langle \sum_{k=0}^{N-1} \langle h_k\rangle_{m_P} \langle x(n-k)\rangle_{m_P} \right\rangle_{m_P} \tag{4}$$

The filter is implemented in RNS by decomposing it into P FIR filters working in parallel, as sketched in Fig. 5 (P = 3).

4.1. Modulo m_i filters

The architecture of the mod $m_i$ filters (based on eq. (4)) is depicted in Fig. 6, where the shaded area is the basic building block of the filter (the mod $m_i$ tap).
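Eqs. (1) to (4) and the CRT-based output conversion can be checked end to end with a short sketch (Python, hypothetical names, not the authors' implementation). It splits a filter into P independent mod $m_i$ channels as in Fig. 5, recombines them with the CRT, and compares the result with eq. (3). Two assumptions go beyond the text: negative samples and coefficients are handled by reading a CRT result $v \ge M/2$ as $v - M$ (the signed converter of [11] may work differently), and the FIR1 moduli set of Table 2 is used only as an example, with inputs small enough that every output stays within the dynamic range.

```python
import random
from math import prod

MODULI = (64, 31, 29, 23)  # moduli set of FIR1 in Table 2 (M = 1,323,328)


def to_rns(v, moduli):
    """Input conversion: residue digits <v>_{m_i}; Python's % maps negatives into [0, m-1]."""
    return [v % m for m in moduli]


def from_rns(residues, moduli):
    """Output conversion with the CRT; the upper half of [0, M-1] is read as negative."""
    M = prod(moduli)
    v = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        v += r * Mi * pow(Mi, -1, m)  # pow(a, -1, m) is the modular inverse (Python 3.8+)
    v %= M
    return v - M if v >= (M + 1) // 2 else v


def fir_mod(xr, hr, m):
    """One channel of eq. (4): <y(n)>_m = < sum_k <h_k>_m <x(n-k)>_m >_m."""
    return [sum(hr[k] * xr[n - k] for k in range(len(hr)) if n >= k) % m
            for n in range(len(xr))]


def rns_fir(x, h, moduli=MODULI):
    """P independent mod-m_i filters (Fig. 5) followed by sample-by-sample CRT."""
    xs = [to_rns(v, moduli) for v in x]
    hs = [to_rns(c, moduli) for c in h]
    channels = [fir_mod([r[i] for r in xs], [r[i] for r in hs], m)
                for i, m in enumerate(moduli)]
    return [from_rns([ch[n] for ch in channels], moduli) for n in range(len(x))]


def fir_reference(x, h):
    """Eq. (3) in ordinary integer arithmetic, zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n >= k) for n in range(len(x))]


if __name__ == "__main__":
    rng = random.Random(0)
    h = [rng.randint(-128, 127) for _ in range(16)]  # 8-bit signed coefficients, 16 taps
    x = [rng.randint(-128, 127) for _ in range(200)]
    assert rns_fir(x, h) == fir_reference(x, h)
    print("RNS filter matches eq. (3)")
```

With 8-bit signed operands and 16 taps, $|y(n)|$ never exceeds $16 \times 128 \times 128 = 262{,}144$, well inside the half range $M/2 = 661{,}664$ of the {64, 31, 29, 23} set, which is the condition under which eq. (2) holds.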

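Each mod $m_i$ tap needs a modular product between the input residue and a coefficient. For variable coefficients this is the isomorphism multiplier of Section 3.2, which the following sketch models behaviourally (Python, hypothetical names). Assumptions: m is an odd prime, the generator α is found by brute force (adequate for moduli below 64), zero operands are handled by an explicit test standing in for the "additional logic", and the single combined ROM indexed by the plain sum i + j reflects the option of merging the modulo $m - 1$ reduction and the $\alpha^k$ look-up into one ROM.

```python
def primitive_root(m):
    """Smallest alpha whose powers alpha^0 .. alpha^(m-2) cover [1, m-1] (m an odd prime)."""
    for a in range(2, m):
        if len({pow(a, i, m) for i in range(m - 1)}) == m - 1:
            return a
    raise ValueError("m must be an odd prime")


def iso_tables(m):
    """Log ROM (x -> i with <alpha^i>_m = x) and one combined ROM that maps the plain
    sum i + j (0 .. 2m - 4) directly to alpha^(<i + j>_(m-1)) mod m."""
    alpha = primitive_root(m)
    log_rom = {pow(alpha, i, m): i for i in range(m - 1)}
    exp_rom = [pow(alpha, s % (m - 1), m) for s in range(2 * m - 3)]
    return log_rom, exp_rom


def iso_mul(x, y, m, tables):
    """<x * y>_m via the isomorphism; zero operands are handled by the extra logic."""
    log_rom, exp_rom = tables
    if x % m == 0 or y % m == 0:
        return 0
    return exp_rom[log_rom[x % m] + log_rom[y % m]]


if __name__ == "__main__":
    for m in (17, 19, 23, 29, 31, 61):
        t = iso_tables(m)
        assert all(iso_mul(x, y, m, t) == x * y % m for x in range(m) for y in range(m))
    print(primitive_root(17), primitive_root(23))  # 3 5
```

Only additions and look-ups remain on the data path, which is why the technique maps well onto six-input LUTs for the small prime moduli used here.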
Fig. 5. RNS implementation of a FIR filter: the input x(n) is reduced modulo $m_1$, $m_2$ and $m_3$, processed by the mod $m_1$, mod $m_2$ and mod $m_3$ filters, and converted back by an RNS-to-binary converter to give y(n).

Fig. 6. Architecture of a modulo $m_i$ FIR filter.

The filter tap computes the following equation:

$$s_{out}(j) = \left\langle \langle x\rangle_{m_i} h_j + s_{in} \right\rangle_{m_i} \tag{5}$$

where $s_{in} = s_{out}(j-1)$ and $m_i \le 2^n$. Also in this case, the filter tap has been optimized using a method similar to that used for the general multiplier presented in the previous section. Moreover, in the following the analysis is restricted to prime moduli, in order to make the use of the isomorphism technique possible.

For constant coefficient filters, the filter tap (Fig. 6) requires a ROM and a modular adder, which can be either a parallel or a ROM based adder. Equation (5) can be rewritten as

$$s_{out}(j) = \left\langle h_j \left(\langle x\rangle_{m_i} + h_j^{-1} s_{in}\right) \right\rangle_{m_i} = \left\langle h_j\, s'_{out}(j) \right\rangle_{m_i} \tag{6}$$

where $s'_{out}(j) = \left\langle \langle x\rangle_{m_i} + h_j^{-1} s_{in} \right\rangle_{m_i}$ and $h_j^{-1}$ is the multiplicative inverse of $h_j$ mod $m_i$. For consecutive slices, the filter coefficients $h_{j+1}^{-1}$ and $h_j$ can be combined as

$$s_{out}(j+1) = \left\langle h_{j+1}\left(\langle x\rangle_{m_i} + h_{j+1}^{-1} s_{out}(j)\right)\right\rangle_{m_i} = \left\langle h_{j+1}\left(\langle x\rangle_{m_i} + h_{j+1}^{-1} h_j \left(\langle x\rangle_{m_i} + h_j^{-1} s_{out}(j-1)\right)\right)\right\rangle_{m_i} = \left\langle h_{j+1}\left(\langle x\rangle_{m_i} + h'_j\, s'_{out}(j)\right)\right\rangle_{m_i} \tag{7}$$

where $h'_j = \langle h_{j+1}^{-1} h_j\rangle_{m_i}$. In this way, the intermediate slices can be implemented as depicted in Fig. 7. The operation $h'_j(\langle x\rangle_{m_i} + s_j)$, where $h'_j$ is a constant factor, is implemented using a ROM based modular adder whose ROM stores the precomputed values of $\left\langle h'_j(\langle x\rangle_{m_i} + s_j)\right\rangle_{m_i}$. For a 5-bit modulus the resource usage is 10 LUTs and the delay is about 1.5 ns, while for a 6-bit modulus the resource usage is around 16 LUTs and the delay is about 1.7 ns.

Fig. 7. Optimized slice of a modulo $m_i$ FIR filter with constant coefficients.

In Fig. 8 the architecture of a tap in the case of variable coefficient filters is shown.
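Returning to the constant-coefficient slice of Fig. 7, the folding of eqs. (6) and (7) can be checked numerically. In the sketch below (Python, hypothetical names, a behavioural model rather than the authors' design), each slice is a ROM based modular adder storing $\langle h'_j S\rangle_{m}$, where S is the plain sum of the input residue and the delayed partial sum $s_j$. Details the text does not spell out are our assumptions: slices are ordered from $h_{N-1}$ down to $h_0$, the first slice has no incoming partial sum and therefore reduces to a constant-multiplier ROM, the last slice uses $h_0$ itself instead of a folded factor so that it delivers $\langle y(n)\rangle_{m}$, and the delay elements start at zero. Every coefficient residue must be nonzero, because the folding needs $h_j^{-1}$.

```python
import random


def rom_adder_table(k, m):
    """ROM of a ROM based modular adder with constant factor k: addressed by the plain
    sum S of two residues (S <= 2m - 2), it stores <k * S>_m.
    Only the 2m - 1 reachable words are kept (the hardware ROM has 2**(n+1) words)."""
    return [(k * s) % m for s in range(2 * m - 1)]


def fir_direct_mod(x, h, m):
    """Reference channel of eq. (4), zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n >= k) % m
            for n in range(len(x))]


def fir_folded_mod(x, h, m):
    """Transposed-form mod-m FIR whose slices compute s_{j+1} = <h'_j (<x>_m + s_j)>_m."""
    c = [hk % m for hk in reversed(h)]   # chain order: c[0] = h_{N-1}, ..., c[-1] = h_0
    if any(cj == 0 for cj in c):
        raise ValueError("folding needs every coefficient residue to be invertible mod m")
    n_slices = len(c)
    # Folded constants h'_j = <c_{j+1}^{-1} c_j>_m as in eq. (7); the last slice keeps h_0.
    folded = [pow(c[j + 1], -1, m) * c[j] % m for j in range(n_slices - 1)]
    roms = [rom_adder_table(k, m) for k in folded + [c[-1]]]
    regs = [0] * (n_slices - 1)          # delay elements between consecutive slices
    y = []
    for sample in x:
        xr = sample % m                   # input converter
        # Slice 0 has no incoming partial sum, so it acts as a constant-multiplier ROM.
        addrs = [xr] + [xr + r for r in regs]
        outs = [rom[a] for rom, a in zip(roms, addrs)]
        y.append(outs[-1])                # the last slice produces <y(n)>_m
        regs = outs[:-1]                  # clock edge: load the delay elements
    return y


if __name__ == "__main__":
    rng = random.Random(1)
    for m in (31, 29, 23, 19, 17):
        h = [rng.randrange(1, m) for _ in range(16)]
        x = [rng.randrange(0, 256) for _ in range(100)]
        assert fir_folded_mod(x, h, m) == fir_direct_mod(x, h, m)
    print("folded slices reproduce eq. (4) for every modulus")
```

A coefficient that is a multiple of the modulus has no inverse, so such a tap would need separate treatment, which this sketch does not attempt.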
Fig. 8. Optimized architecture of a slice for a modulo $m_i$ variable coefficient FIR filter.

The Log operators are implemented by $2^n$-word ROMs, the E operator is a $2^{n+1}$-word ROM performing modulo reduction and exponentiation, the $\langle\cdot\rangle_{m_i}$ operator is ROM based, and the adders are n-bit adders; the critical path is composed of two adders and three ROMs. The first optimization consists in sharing the Log operator, which is the same for all the slices composing the modulo $m_i$ filter. The second optimization is obtained by balancing the paths of the slices, moving the ROM implementing the $\langle\cdot\rangle_{m_i}$ operator after the delay element. In this way the critical path is reduced to two ROMs and two adders.

5. FIR FILTERS EXPERIMENTS

In this section a set of experiments for the characterization of FIR filters is described. Two cases have been selected, 8 bits and 12 bits, for both the coefficients and the input samples, while the number of filter taps varies from 16 to 256. The analysis has been restricted to constant coefficient filters, but it can easily be extended to variable coefficient filters. The set of moduli is composed of a power-of-two modulus ($2^n$, with n up to 9), while the remaining moduli are prime numbers that can be represented by 5 bits. The set of synthesized filters is shown in Table II. For dynamic ranges up to 23 bits 4 moduli have been used, while for the biggest dynamic range (32 bits) 7 moduli are required.

| FIR | Input/Coeff (bits) | N. taps | M (bits) | Moduli set |
|---|---|---|---|---|
| FIR1 | 8 | 16 | 20 | 64, 31, 29, 23 |
| FIR2 | 8 | 32 | 21 | 128, 31, 29, 23 |
| FIR3 | 8 | 64 | 22 | 256, 31, 29, 23 |
| FIR4 | 8 | 128 | 23 | 512, 31, 29, 23 |
| FIR5 | 8 | 256 | 24 | 64, 31, 29, 23, 19 |
| FIR6 | 12 | 16 | 28 | 64, 31, 29, 23, 19, 17 |
| FIR7 | 12 | 32 | 29 | 128, 31, 29, 23, 19, 17 |
| FIR8 | 12 | 64 | 30 | 256, 31, 29, 23, 19, 17 |
| FIR9 | 12 | 128 | 31 | 512, 31, 29, 23, 19, 17 |
| FIR10 | 12 | 256 | 32 | 64, 31, 29, 23, 19, 17, 13 |

Table 2. Description of the set of FIR filter synthesis experiments.

Table III lists the results in terms of resources and speed performance for the set of synthesized filters. The maximum frequency of the 8-bit filters (FIR1 to FIR5) is bounded by the maximum operating frequency of the filter tap (about 435 MHz), while for the 12-bit filters (FIR6 to FIR10) the maximum working frequency is limited by the speed of the input converter (300 MHz).

| FIR | Max. freq. (MHz) | Taps (#LUTs) | In converter (#LUTs) | Out converter (#LUTs) | Total resources (#LUTs) |
|---|---|---|---|---|---|
| FIR1 | 400 | 592 | 30 | 182 | 804 |
| FIR2 | 400 | 1248 | 30 | 182 | 1460 |
| FIR3 | 400 | 2688 | 30 | 182 | 2900 |
| FIR4 | 400 | 6144 | 30 | 182 | 6356 |
| FIR5 | 400 | 12032 | 40 | 224 | 12296 |
| FIR6 | 303 | 912 | 70 | 270 | 1252 |
| FIR7 | 303 | 1888 | 70 | 270 | 2228 |
| FIR8 | 303 | 3968 | 70 | 270 | 4308 |
| FIR9 | 303 | 8704 | 70 | 270 | 9044 |
| FIR10 | 303 | 17152 | 84 | 309 | 17545 |

Table 3. Resource usage and speed for the experiments.

Table III also shows that the resources needed by the input and output converters become less than 10% of the total for N ≥ 64 (see FIR3 and FIR8).

Finally, the results of the synthesis of the RNS filters have been compared with a TCS implementation (without truncation). As indicated in Section IV, truncation is usually applied to limit the resources of TCS filters, but it has been shown in the literature [15] that truncation does not offset the advantages of an RNS implementation. Moreover, the RNS representation is often used to design filters with error detection and correction capabilities [13], [14]; if truncation is used, these error detection techniques cannot be applied. The results are presented in Table IV.

| Exp. name | Canonical TCS (#LUTs) | RNS (#LUTs) | Saving (%) |
|---|---|---|---|
| FIR1 | 788 | 804 | -2 |
| FIR2 | 1800 | 1460 | 18 |
| FIR3 | 3632 | 2900 | 20 |
| FIR4 | 6966 | 6356 | 8 |
| FIR5 | 15203 | 12296 | 19 |
| FIR6 | 1899 | 1252 | 34 |
| FIR7 | 3338 | 2228 | 33 |
| FIR8 | 6555 | 4308 | 34 |
| FIR9 | 14043 | 9044 | 35 |
| FIR10 | 29234 | 17545 | 40 |

Table 4. Comparison of RNS and TCS filters.

The resource savings obtained by using RNS are always greater than 30% when the input samples and coefficients are 12 bits wide, while in the 8-bit case the advantage depends on the number of taps. For FIR1 there are no savings, but rather a small increase in resource usage due to the overhead of the conversion blocks, while savings of up to 20% are obtained in the FIR5 and FIR3 experiments. The experimental results show that the presented techniques offer interesting advantages for FIR filters characterized by a high dynamic range and a high number of taps, especially when full-custom multipliers are not available in the target FPGA architecture or must be used for different purposes.

6. CONCLUSION

The optimization of Residue Number System (RNS) arithmetic to better exploit some of the architectural characteristics of last-generation FPGAs has been presented. Using an approach based on ROM modular adders, different optimization techniques for the basic modular operations and for the basic blocks of RNS filters have been discussed. The choice of 5-bit moduli makes it possible to implement high-speed RNS filters with low resource occupation, as shown by the set of experiments discussed in the paper.

7. REFERENCES

[1] 2007 International Technology Roadmap for Semiconductors, http://public.itrs.net/.
[2] http://www.xilinx.com/company/history.htm#begin
[3] http://www.altera.com/products/devices/stratix3/st3-index.jsp
[4] Virtex-5 Family Overview LX, LXT, and SXT Platforms.
[5] Logic Array Blocks and Adaptive Logic Modules in Stratix III Devices, chapter in volume 1 of the Stratix III Device Handbook.
[6] I. Vinogradov, An Introduction to the Theory of Numbers. New York: Pergamon Press, 1955.
[7] N. Szabo and R. Tanaka, Residue Arithmetic and its Applications in Computer Technology. New York: McGraw-Hill, 1967.
[8] M. Soderstrand, W. Jenkins, G. A. Jullien, and F. J. Taylor, Residue Number System Arithmetic: Modern Applications in Digital Signal Processing. New York: IEEE Press, 1986.

[9] T. V. Vu, Efficient implementation of the Chinese remainder theorem for sign detection and residue decoding, IEEE Trans. Circuits Systems-I, vol. 45, pp. 667-669, June 1985.
[10] S. Piestrak, A high-speed realization of a residue to binary number system converter, IEEE Trans. Circuits Systems-II: Analog and Digital Signal Processing, vol. 42, pp. 661-663, Oct. 1995.
[11] G. Cardarilli, M. Re, and R. Lojacono, A residue to binary conversion algorithm for signed numbers, European Conference on Circuit Theory and Design (ECCTD'97), vol. 3, pp. 1456-1459, 1997.
[12] S. Bandyopadhyay, G. A. Jullien, A. Sengupta, A Systolic Array for Fault
[13] M. H. Etzel and W. K. Jenkins, Redundant Residue Number Systems for Error Detection and Correction in Digital Filters, IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-28, no. 5, pp. 538-544, October 1980.
[14] S. Pontarelli, G. C. Cardarilli, M. Re, A. Salsano, Totally Fault Tolerant RNS based FIR Filters, to be published in IEEE International On-Line Testing Symposium 2008.
[15] A. Nannarelli, M. Re, G. C. Cardarilli, Tradeoffs between Residue Number System and Traditional FIR Filters, IEEE International Symposium on Circuits and Systems, ISCAS 2001, vol. II, pp. 305-308, Sydney (Australia), May 6-9, 2001.