FIR_NTAP_MUX: N-Channel Multiplexed FIR Filter

Key Design Features

- Synthesizable, technology independent VHDL Core
- N-channel FIR filter core implemented as a systolic array for speed and scalability
- Support for one or more independent channels (8 maximum)
- Configurable and independent coefficient sets for each channel
- Configurable data width and number of taps
- Symmetric arithmetic rounding limits DC-bias problems
- Output saturation or wrap modes
- Much cheaper than implementing separate FIR filters in parallel as hardware resources are shared between all channels
- Supports 550 MHz+ sample rates (550/N MHz per channel) [1]

[1] Xilinx Virtex 6 FPGA used as a benchmark

Applications

- High-speed filtering applications where hardware resources are limited, e.g. when it becomes impractical to use multiple FIR filters in parallel
- Dual-channel inputs such as complex valued I/Q in digital communications systems
- Parallel DSP processor architectures
- General purpose FIR filters with odd or even numbers of taps
- Filters with arbitrary sets of coefficients (e.g. non-symmetrical)

Block Diagram

Figure 1: N-Channel FIR filter architecture

Generic Parameters

Generic name | Description | Type | Valid range
num_channels | Number of filter channels (N) | integer | 1 ≤ N ≤ 8
num_taps | Number of filter taps | integer | > 2
dw | Width of input/output data samples | integer | ≥ 2
cw | Width of coefficients | integer | ≥ 2
fw | Number of coefficient fraction bits | integer | ≥ 0 (fw < cw)
coeff_a to coeff_h [num_taps-1:0] | Filter coefficients (one coefficient set per filter channel) | integer array x N | Any integer in range +/- 2^(cw-1)
USE_ROUNDING | Use symmetric arithmetic rounding (not truncate) | Boolean | TRUE/FALSE
USE_SATURATE | Saturate outputs (not wrap) | Boolean | TRUE/FALSE

Pin-out Description

Pin name | I/O | Description | Active state
clk | in | System clock (F_S) | rising edge
reset | in | System reset | low
en | in | Clock enable | high
x_val | in | Filter inputs valid (coincident with first valid input sample at x0_in) | high
xn_in [dw - 1:0] | in | Filter input samples (signed number) | data
y_val | out | Filter outputs valid (coincident with first valid output sample at y0_out) | high
yn_out [dw - 1:0] | out | Filter output samples (signed number) | data

General Description

FIR_NTAP_MUX is an N-channel multiplexed FIR filter designed for high sample rate applications where hardware resources are limited. The main filter core is organized as a scalable systolic array permitting the user to specify large order filters without compromising maximum attainable clock-speed.

Copyright 2013 www.zipcores.com

Essentially the filter functions as if it were N separate FIR filters. Each input sample is multiplexed into the filter at a sample rate equal to F_S/N, where F_S is the sampling frequency of the main filter core. Likewise, output samples are updated at a frequency of F_S/N.

The first sample into the filter is aligned by asserting the signal x_val high. The signal y_val is asserted with the first valid output sample. Data samples are advanced in the pipeline on the rising edge of clk when en is high. When en is low, all data samples are stalled. The clock-enable signal may be used to temporarily disable the filter, or possibly to modify the effective sampling frequency of the filter relative to the system clock. If the clock-enable is not needed, it is recommended that this signal be tied high as this will improve overall circuit performance.

Mathematically, the filter implements the difference equation:

$$y[n] = h_0 x[n] + h_1 x[n-1] + \dots + h_N x[n-N]$$

In the above equation, the input signal is x[n], the output signal is y[n] and h_0 to h_N represent the filter coefficients. In this equation N is the filter order (not the number of channels), the number of filter taps being equal to N+1.

Filter coefficients and I/O specification

Filter coefficients [2] are defined as signed fixed-point numbers in [cw fw] format, where cw is the total number of coefficient bits and fw is the number of bits in the fractional part. In all cases, cw must be at least 2 bits and fw must be less than cw to accommodate the sign bit. For instance, a coefficient in [10 8] format has 10 bits in total: 2 integer bits (including the sign bit) followed by 8 fractional bits.

Filter latency

The latency of the filter defined here is the latency in system clock cycles from the point at which the first input sample is valid to the point at which the first output sample is valid.
The total latency is defined by the following formula:

$$\mathrm{Lat}_{TOT} = (N \times \mathrm{Taps}) + (N \times 2) + \mathrm{Lat}_{RND} + \mathrm{Lat}_{SAT} + 2$$

where:

N = Number of channels
Taps = Number of filter taps
Lat_RND = 2 if rounding enabled, 0 otherwise
Lat_SAT = 1 if saturate enabled, 0 otherwise

As an example, consider a 4-channel, 50-tap filter with rounding and saturation enabled. The total latency would be calculated as: (4*50) + (4*2) + 2 + 1 + 2 = 213 clock cycles.

Sampling frequency considerations

The system clock frequency is the sampling frequency of the internal filter core. Let this be denoted as F_S. It follows that the sampling frequency of the input and output samples depends on the number of multiplexed channels, N. In particular, the following formula must be observed for correct filter operation:

$$F_S(\text{one channel}) = \frac{F_S}{N}$$

Functional Timing

The number of bits in the input and output samples is controlled by the parameter dw. Inputs and outputs are signed values (their format is purely relative). Unused inputs should be tied to zero. For instance, if a filter design only requires four channels, then inputs x4_in to x7_in should be tied low.

Figure 2 shows a sequence of input samples for an 8-channel filter. Note that the signal x_val is used to align the first data sample at the filter input. From that point onwards, the remaining inputs are sampled sequentially in turn. If the user wishes to re-align the filter inputs, then a system reset must be performed before x_val is reasserted with the new first sample.

Filter implementation options

Output samples may be truncated to dw bits or rounded depending on the implementation option USE_ROUNDING. If the rounding option is selected, then symmetric arithmetic rounding is used. This means that the binary fraction 0.1000... is added to positive numbers and 0.0111... is added to negative numbers before the result is truncated.
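The rounding rule just described can be illustrated with a short Python sketch. This is a behavioural model only, not the VHDL rounding block; the function names and the choice to treat zero as a positive number are assumptions made for illustration.

```python
def truncate(value: int, frac_bits: int) -> int:
    """Discard frac_bits LSBs with an arithmetic shift (rounds toward -infinity)."""
    return value >> frac_bits


def round_symmetric(value: int, frac_bits: int) -> int:
    """Symmetric rounding: add 0.1000... (positive) or 0.0111... (negative), then truncate.

    Assumption: zero is treated as a positive number.
    """
    if frac_bits == 0:
        return value
    half = 1 << (frac_bits - 1)                  # binary 0.1000...
    offset = half if value >= 0 else half - 1    # binary 0.0111... for negatives
    return (value + offset) >> frac_bits


if __name__ == "__main__":
    # Values carry 2 fractional bits, so 6 represents 1.5 and -5 represents -1.25.
    for v in (6, -6, 5, -5):
        print(v / 4, "truncated:", truncate(v, 2), "rounded:", round_symmetric(v, 2))
```

With this rule, a value and its negative always round to results of equal magnitude, whereas plain truncation always rounds toward negative infinity and therefore biases the output.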
Note that filters implemented with the rounding option will help to reduce the small amplitude offset introduced at DC (0 Hz baseband frequency) attributable to rounding error.

In addition, the option USE_SATURATE determines what will happen if the output samples are too large. If the saturate option is enabled, then in the event of an overflow, the output samples will saturate to the largest positive or negative number permitted by dw. With the saturate option disabled, the output samples will simply wrap around. Note that depending on the format of the coefficients and the data width relative to the magnitude of the input samples, the filter outputs may not overflow. In this case, the user may not require the saturation logic.

[2] The design is supplied with Matlab scripts for the easy generation of coefficient sets using FDAtool. Please see application note: app_note_zc002.pdf for more details.

Figure 2: Input timing - all 8 channels active

The output samples follow a similar pattern. Figure 3 shows the corresponding outputs for an 8-channel filter. From the point at which y_val is asserted, the downstream circuit must sample the filter outputs on consecutive clock cycles.
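The difference between the saturate and wrap output modes described in this section can be sketched in Python as follows. This is an illustrative model of two's-complement behaviour for a dw-bit output, not the VHDL saturation block; the function names are hypothetical.

```python
def wrap(value: int, dw: int) -> int:
    """Keep only the dw LSBs and reinterpret them as a two's-complement number."""
    value &= (1 << dw) - 1
    if value & (1 << (dw - 1)):      # sign bit set: value is negative
        value -= 1 << dw
    return value


def saturate(value: int, dw: int) -> int:
    """Clamp to the largest positive or negative number representable in dw bits."""
    largest = (1 << (dw - 1)) - 1
    smallest = -(1 << (dw - 1))
    return max(smallest, min(largest, value))


if __name__ == "__main__":
    # With dw = 8 the representable range is -128 to 127.
    for v in (100, 200, -200):
        print(v, "saturate:", saturate(v, 8), "wrap:", wrap(v, 8))
```

Values that already fit in dw bits pass through both functions unchanged, which is why the saturation logic can be omitted when the outputs are known not to overflow.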

Figure 3: Output timing - all 8 channels active

Source File Description

All source files are provided as text files coded in VHDL. The following table gives a brief description of each file. Note that all generic parameters (including coefficients) are defined in the package 'fir_ntap_mux_pack.vhd'. Coefficient sets for the 8 possible filter channels are labelled coeff_a, coeff_b, coeff_c, etc. in the package.

Source file | Description
fir_ntap_mux_pack.vhd | Package containing all generic parameters - including coefficients
fir_ntap_mux_in.vhd | Input sample multiplexer
fir_ntap_mux_out.vhd | Output sample de-multiplexer
fir_ntap_mux_val_pipe.vhd | Control / valid pipeline
fir_ntap_mux_mad.vhd | Multiply-add block
fir_ntap_mux_reg.vhd | Register delay element
fir_ntap_mux_rnd.vhd | Rounding block
fir_ntap_mux_sat.vhd | Saturation block
fir_ntap_mux.vhd | Top-level component
fir_ntap_mux_bench.vhd | Top-level test bench

Functional Testing

An example VHDL testbench is provided for use in a suitable VHDL simulator. The compilation order of the source code is as follows:

1. fir_ntap_mux_pack.vhd
2. fir_ntap_mux_in.vhd
3. fir_ntap_mux_out.vhd
4. fir_ntap_mux_val_pipe.vhd
5. fir_ntap_mux_mad.vhd
6. fir_ntap_mux_reg.vhd
7. fir_ntap_mux_rnd.vhd
8. fir_ntap_mux_sat.vhd
9. fir_ntap_mux.vhd
10. fir_ntap_mux_bench.vhd

The VHDL testbench instantiates the FIR filter component and the user may modify the generic parameters in the file 'fir_ntap_mux_pack.vhd' as required.

The test provided is configured for an eight-channel 31-tap FIR design with each channel having a different low-pass filter characteristic. A sampling frequency of 480 MHz has been chosen for the test, meaning that each FIR channel has a sample rate of 60 MHz. The magnitude responses of the first and last filters are shown in figures 4 and 5 respectively. The other 6 filters have different low-pass responses with intermediate cut-off frequencies between those of FIR channel 0 and FIR channel 7. (Note that the filter responses have been arbitrarily chosen for demonstration purposes only. In practice, the user may choose any FIR filter characteristic to suit their requirements.)

Figure 4: Low-pass filter response for channel 0

Figure 5: Low-pass filter response for channel 7

The simulation must be run for at least 1 ms, during which time the impulse response and step response of each multiplexed filter are tested. The simulation generates a series of 8 text files called 'fir0_out.txt', 'fir1_out.txt', 'fir2_out.txt', etc. These files contain the output samples for all 8 filter channels captured during the course of the test.
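The impulse-response test described above relies on the fact that an FIR filter driven by a unit impulse reproduces its own coefficients. The following Python sketch models a channel-interleaved FIR filter to show this. It is a simplified reference model under stated assumptions, not the VHDL core or its testbench: pipeline latency, rounding and saturation are ignored, and the function name and the small integer coefficient sets are hypothetical.

```python
from collections import deque


def mux_fir(interleaved, coeff_sets):
    """Filter a channel-interleaved stream.

    With N = len(coeff_sets), channel c owns samples c, c+N, c+2N, ...
    Each channel has its own coefficient set and its own delay line.
    """
    n_ch = len(coeff_sets)
    delay = [deque([0] * len(h), maxlen=len(h)) for h in coeff_sets]
    out = []
    for i, x in enumerate(interleaved):
        ch = i % n_ch
        delay[ch].appendleft(x)  # newest first: x[n], x[n-1], ...
        out.append(sum(h_k * x_k for h_k, x_k in zip(coeff_sets[ch], delay[ch])))
    return out


if __name__ == "__main__":
    # Two hypothetical channels with three taps each.
    coeffs = [[1, 2, 3], [4, 5, 6]]
    # Unit impulse on both channels, followed by zeros.
    stimulus = [1, 1, 0, 0, 0, 0]
    result = mux_fir(stimulus, coeffs)
    print("channel 0:", result[0::2])
    print("channel 1:", result[1::2])
```

De-interleaving the output with a stride of N recovers each channel's response, which mirrors the role of the output de-multiplexer in the source file list.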

Figures 6 and 7 respectively demonstrate the impulse response and step response outputs for the given test example.

Figure 6: Impulse responses for all 8 independent channels

Figure 7: Step response for all 8 independent channels

Synthesis

The files required for synthesis and the design hierarchy are shown below:

fir_ntap_mux_pack.vhd
fir_ntap_mux.vhd
fir_ntap_mux_in.vhd
fir_ntap_mux_val_pipe.vhd
fir_ntap_mux_mad.vhd
fir_ntap_mux_reg.vhd
fir_ntap_mux_rnd.vhd
fir_ntap_mux_sat.vhd
fir_ntap_mux_out.vhd

The VHDL core is designed to be technology independent. However, as a benchmark, synthesis results have been provided for the Xilinx Virtex 6 and Spartan 6 FPGA devices. Synthesis results for other FPGAs and technologies can be provided on request.

Smaller and faster designs will be achieved by setting the parameters USE_ROUNDING and USE_SATURATE to FALSE. In addition, choosing filters with similar sets of coefficients should result in small optimizations during synthesis. Also, fixing the clock-enable signal to logic '1' will generally result in a faster and more compact filter implementation. Unused data inputs should be tied to logic '0'.

Trial synthesis results are shown with the generic parameters set to: num_channels = 8, num_taps = 31, dw = 16, cw = 10, fw = 9, USE_ROUNDING = FALSE, USE_SATURATE = FALSE. Resource usage is specified after Place and Route.

VIRTEX 6

Resource type | Quantity used
Slice register | 1863
Slice LUT | 1099
Block RAM | 0
DSP48 | 31
Occupied Slices | 314
Clock frequency (approx) | 590 MHz

SPARTAN 6

Resource type | Quantity used
Slice register | 1867
Slice LUT | 1094
Block RAM | 0
DSP48 | 31
Occupied Slices | 345
Clock frequency (approx) | 250 MHz
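As a worked check, the latency formula and the F_S/N relation given earlier in this document can be applied to the benchmark figures above. The Python sketch below simply evaluates those two formulas; the function names are hypothetical and the clock frequencies are the approximate values from the tables.

```python
def total_latency(num_channels: int, num_taps: int,
                  use_rounding: bool, use_saturate: bool) -> int:
    """Latency in system clock cycles: (N*Taps) + (N*2) + Lat_RND + Lat_SAT + 2."""
    lat_rnd = 2 if use_rounding else 0
    lat_sat = 1 if use_saturate else 0
    return num_channels * num_taps + num_channels * 2 + lat_rnd + lat_sat + 2


def per_channel_rate_mhz(fs_mhz: float, num_channels: int) -> float:
    """Sample rate of one multiplexed channel: F_S / N."""
    return fs_mhz / num_channels


if __name__ == "__main__":
    # Datasheet example: 4 channels, 50 taps, rounding and saturation enabled.
    print("example latency:", total_latency(4, 50, True, True), "cycles")

    # Trial synthesis configuration: 8 channels, 31 taps, no rounding or saturation.
    cycles = total_latency(8, 31, False, False)
    for device, fs_mhz in (("Virtex 6", 590.0), ("Spartan 6", 250.0)):
        rate = per_channel_rate_mhz(fs_mhz, 8)
        latency_us = cycles / fs_mhz
        print(f"{device}: {rate} MHz per channel, {cycles} cycles = {latency_us:.3f} us")
```

The first call reproduces the 213-cycle example from the latency section; the trial synthesis configuration evaluates to 266 clock cycles.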

Revision History

Revision | Change description | Date
1.0 | Initial revision | 08/02/2011
1.1 | Added filter output plots | 23/03/2011
1.2 | Design now supports different coefficient sets per channel. Updated synthesis results for Xilinx 6 series FPGAs | 21/01/2013