Implementing DDC with the HERON-FPGA Family

HUNT ENGINEERING
Chestnut Court, Burton Row, Brent Knoll, Somerset, TA9 4BP, UK
Tel: (+44) (0)1278 760188, Fax: (+44) (0)1278 760199
Email: sales@hunteng.demon.co.uk
URL: http://www.hunteng.co.uk

Rev 1.0 R.Weir 8-01-2001

Introduction

The HERON-FPGA family is ideal for many of the building blocks of digital communications. Its modules provide large, easily programmed gate arrays, often combined with interface elements such as ADCs or DACs, and can be used to implement many system components.

A fundamental part of many communications systems is Digital Down Conversion (DDC). This allows a signal to be shifted from its carrier (or IF) frequency down to baseband. The technique greatly reduces the amount of effort required for subsequent processing of the signal without loss of any of the information carried.

This note describes how a DDC can be implemented in the HERON-FPGA or HERON-IO family. While the technology looks complex, it is actually very simple, and can be implemented extremely quickly. Theory of operation is not addressed here; a separate paper from Hunt Engineering introduces the concepts of the DDC.

What Are We Implementing?

Essentially, a DDC comprises an oscillator, generating a quadrature signal; a mixer, which multiplies the incoming signal by the local oscillator; and then filtering. This combination frequency-shifts the signal. Typically the frequency shift is the frequency of the local oscillator. Many systems will try to match the local oscillator to the carrier, shifting the received signal back to baseband.

[Figure 3 - A Theoretical DDC Block Diagram. Blocks shown: RF, Fast ADC, Complex Multiplier, Low Pass Filter, Decimator, Complex Oscillator; the multiplier, oscillator, filter and decimator form the DDC Block.]

Getting the Signal from the ADC

The ADC will be implemented externally to the FPGA. If you're using a module with an integrated ADC, the support library supplied should provide a macro for interfacing to the ADC(s).
If the module does not have an ADC, you will need to gather the data using the FIFO interfaces, again using macros from the library supplied. The details of this are module dependent, so are not discussed further here.

Implementing The Local Oscillator & Mixer

This is the first stage of the DDC. The local oscillator is a quadrature oscillator, generating a highly accurate and stable version of the carrier. The Xilinx Core Generator tool offers a Direct Digital Synthesiser (DDS) which will implement this function.
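
As an illustration of what the oscillator and mixer stage computes, the following is a minimal software sketch, not the Xilinx core itself. It assumes the common DDS structure of a phase accumulator addressing a sine table; the accumulator width, table size, clock rate and carrier frequency are all hypothetical values chosen for the example.

```python
import math

ACC_BITS = 32    # phase accumulator width (assumed; sets frequency resolution)
LUT_BITS = 10    # sine table address width (assumed; affects oscillator noise)
LUT = [math.sin(2 * math.pi * i / (1 << LUT_BITS)) for i in range(1 << LUT_BITS)]


def tuning_word(f_out, f_clk):
    """Phase increment per clock for the requested output frequency."""
    return round(f_out / f_clk * (1 << ACC_BITS)) % (1 << ACC_BITS)


def quadrature_oscillator(f_out, f_clk, n):
    """Yield n (cos, sin) pairs from a phase accumulator and one sine table."""
    step = tuning_word(f_out, f_clk)
    quarter = 1 << (LUT_BITS - 2)                 # 90 degrees in table entries
    phase = 0
    for _ in range(n):
        addr = phase >> (ACC_BITS - LUT_BITS)     # top bits address the table
        yield LUT[(addr + quarter) % len(LUT)], LUT[addr]
        phase = (phase + step) % (1 << ACC_BITS)  # wraps like a hardware adder


def mix(samples, f_lo, f_clk):
    """Multiply real samples by cos and -sin of the local oscillator (I, Q)."""
    lo = quadrature_oscillator(f_lo, f_clk, len(samples))
    return [(x * c, -x * s) for x, (c, s) in zip(samples, lo)]


F_CLK = 100e6        # hypothetical sample clock
F_CARRIER = 25e6     # hypothetical carrier, exactly F_CLK / 4
carrier = [math.cos(2 * math.pi * F_CARRIER * n / F_CLK) for n in range(64)]
baseband = mix(carrier, F_CARRIER, F_CLK)

# Averaging stands in for the low-pass filter that follows the mixer.
mean_i = sum(i for i, _ in baseband) / len(baseband)
mean_q = sum(q for _, q in baseband) / len(baseband)
resolution_hz = F_CLK / (1 << ACC_BITS)   # smallest frequency step available
```

With the oscillator matched to the carrier, the averaged I output settles at half the carrier amplitude and the Q output at zero, which is the signal shifted to baseband. In this model a wider accumulator gives finer frequency control and a larger table gives a cleaner output, both at the cost of more logic.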

There are a number of parameters to consider:

1) How accurate should the frequency control of the oscillator be? With a digital oscillator, you can specify very fine control at the expense of increased FPGA utilisation.

2) Do you need to program the phase of the oscillator? To what resolution?

3) How much noise can you tolerate in the oscillator's output? Any noise will be passed through the mixer into the DDC's filters, and will add to the system's noise floor. You may want to analyse this.

With this information, you can move to generating a core with the Xilinx toolset. The best approach is to generate the biggest DDS you can squeeze into the FPGA, but record the parameters you use. If you need to optimise the design later, you can regenerate the core with (for example) reduced frequency accuracy to increase speed.

Similarly, the mixer is implemented as a pair of hard-coded multipliers. It is usually not worth trying to implement the mixer in a single multiplier. Ensure that the outputs are registered, and select the pipelined multiplier option. It should now be a simple matter to connect each of the oscillator's two outputs to one input of its own multiplier, with the ADC data driving the other input of both.

Filtering & Decimation

There are two main classes of DDC, wideband and narrowband, differentiated by their decimation ratios. As a rough guide, if the decimation ratio is less than 32, consider the DDC wideband; if 32 or more, the DDC is narrowband.

The filtering we will perform is different for narrowband and wideband, so each is tackled separately. However, the decimators can be treated identically for wideband and narrowband systems.

Note also that in some systems it may make sense to combine wideband and narrowband DDCs. For example, in a GSM system which uses 8 carriers, a wideband DDC could be used to shift the carriers down to a moderate frequency. This could be done using a simple oscillator, with no complex components. Eight narrowband DDCs could then be used to select the individual carriers.
The theory is the same.

Filtering for Wideband DDCs

With a wideband signal, we are typically reducing the sampling rate by a small amount, and the data output rate is large. Note that the output rate of a wideband DDC should be checked as part of your overall system design. In some systems that data rate will be significant, and could saturate a DSP processor if that is meant to be receiving it.

The main challenge of a wideband receiver is getting enough processing to filter the signal. All the processing is performed at a fairly high rate, often 20-40 MHz. Because of this, the filters tend to be very gate-intensive; a single wideband channel will typically consume more of an FPGA than several narrowband channels.

Each design has different requirements. However, the following is a rough guide to implementing the filter.

The filter is best implemented as an FIR, and in fact the best approach is to use a multi-rate FIR. This may sound complex, but a multi-rate FIR is simply an efficient way of implementing large filters with decimation.

Imagine we need to implement a large filter at a high sampling rate, before decimating the signal. We could implement a 128-tap filter at 100 MHz, but this would require a lot of multipliers and a huge FPGA. However, suppose we split the filter. The first filter can perform enough filtering to allow us to perform some decimation. The second filter is now operating at a much reduced sampling rate. Typically, by splitting the filter in this way we can reduce the number of taps in the filter, and reduce the sampling rate that some of these taps operate at. Both reduce the amount of FPGA resource we require to build the filter.

So, to implement your filter, start with a simple FIR. This stage should have a small number of taps: if you are operating at a high sampling rate, each FPGA multiplier will implement a very small number of taps.
Use symmetric FIRs here: the processing load is about half that of a non-symmetric filter, and the Core Generator provides this as an option. Regardless of which FPGA you are using, this stage will take a lot of silicon! The filter's bandwidth should match the output bandwidth of the DDC, but don't worry too much about the filter's performance; later filters will improve this. (Note: the output bandwidth is the band that we're interested in, typically only a few hundred kHz wide.)

Immediately after this filter, decimate the signal by 2, and implement a larger filter. Again, the filter's bandwidth should match the DDC output bandwidth. This will improve the response of the first filter. You can afford to have more taps in this stage, as the sampling rate is lower.
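
The saving from a symmetric, decimating FIR can be seen in a short software model. This is a sketch of the arithmetic only, not of the Core Generator's implementation; the function names and the small integer coefficient sets are made up for the example.

```python
def symmetric_decimating_fir(samples, half_taps, odd_length, decimation):
    """Filter with a linear-phase FIR and keep every `decimation`-th output.

    half_taps holds the first half of the coefficients (including the centre
    tap when odd_length is True); the rest is implied by symmetry.
    """
    n_taps = 2 * len(half_taps) - (1 if odd_length else 0)
    history = [0] * n_taps              # delay line, newest sample at index 0
    out = []
    for n, x in enumerate(samples):
        history = [x] + history[:-1]
        if n % decimation:              # discarded outputs are never computed
            continue
        acc = 0
        for k, h in enumerate(half_taps):
            mirror = n_taps - 1 - k
            if mirror == k:             # centre tap of an odd-length filter
                acc += h * history[k]
            else:                       # pre-add the pair, then one multiply
                acc += h * (history[k] + history[mirror])
        out.append(acc)
    return out


def direct_fir(samples, taps):
    """Reference: plain convolution with zero initial state, no decimation."""
    return [sum(h * samples[n - k] for k, h in enumerate(taps) if n - k >= 0)
            for n in range(len(samples))]


x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3]
odd_result = symmetric_decimating_fir(x, [1, 3, 5], True, 2)    # taps 1 3 5 3 1
even_result = symmetric_decimating_fir(x, [2, 7], False, 3)     # taps 2 7 7 2
```

Each output of the 5-tap filter costs three multiplies instead of five, and only every second output is calculated at all, so the multiply rate is well under half that of the direct form followed by a separate decimator.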

If you are operating at a low decimation ratio, this could be all you have to do. In that case these two filters need more taps than they would if an additional stage were available. Experiment by trying cascades of filters with varying numbers of taps; you will probably have to do this iteratively, using the Xilinx tools to try several different scenarios.

For higher decimation ratios (e.g. 8 and up), you can afford to use a third stage filter. This can have significantly more taps than the first two, as each multiplier here can implement at least 4x as many taps as in the first stage. Again, you will want to experiment with the layout of the filters to see what gives best performance.

Filtering for Narrowband DDCs

Narrowband DDCs have a different set of challenges. With these, we need filters that can allow large decimation ratios without consuming too much of the FPGA.

A very useful filter here is the Cascaded Integrator-Comb filter, or CIC filter. This filter has remarkable properties: it can implement decimation within the filter, and it provides a steep cut-off for relatively few stages. Best of all, it is implemented using only adders and delays, which makes it very well suited to FPGA implementation.

The CIC has one drawback: its frequency response is poor, with a lot of droop in the passband and serious ripples in the stopband. However, we can compensate for these with additional filtering of its output.

Because of the need for additional filtering on a CIC's output, it is at its best with large decimation ratios. The larger the decimation ratio, the smaller the overhead of the filters used to compensate for the CIC. This makes it unsuitable for the wideband DDC we looked at earlier, as the compensation filter becomes significant. However, for the narrowband DDC, it is ideal as a first stage.

We would then follow that with a multi-rate FIR filter, as with the wideband DDC. Now, we can use as many taps as a single multiplier will allow.
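
To show how little arithmetic the CIC needs, here is a small behavioural model of a CIC decimator. It is a sketch written for this explanation, not the CIC shipped with the example; it assumes a differential delay of 1 in each comb, and the function name and the `acc_bits` parameter are hypothetical.

```python
def cic_decimate(samples, stages, ratio, acc_bits=64):
    """Decimate integer samples with an N-stage CIC (differential delay 1)."""
    mask = (1 << acc_bits) - 1
    sign = 1 << (acc_bits - 1)
    integrators = [0] * stages
    comb_delays = [0] * stages
    out = []
    for n, x in enumerate(samples):
        # Integrator section runs at the input rate; wrap-around is allowed.
        value = x
        for i in range(stages):
            integrators[i] = (integrators[i] + value) & mask
            value = integrators[i]
        if (n + 1) % ratio:
            continue                    # decimator: keep one sample in `ratio`
        # Comb section runs at the output rate: y = v - previous v.
        for i in range(stages):
            diff = (value - comb_delays[i]) & mask
            comb_delays[i] = value
            value = diff
        # Reinterpret the wrapped result as a signed number.
        out.append(value - (1 << acc_bits) if value & sign else value)
    return out


step_response = cic_decimate([1] * 20, stages=3, ratio=4)
narrow_response = cic_decimate([1] * 20, stages=3, ratio=4, acc_bits=8)
```

The only operations are additions, subtractions and delays. For a constant input the output settles at ratio ** stages times the input (64 here), which is the DC gain that drives the wordlength growth. The 8-bit run gives the same answer even though the integrators wrap, because the final result still fits in the register; a register too small for the final result would overflow.
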
Generally a two-stage FIR works well, decimating by 2 between the stages. For many applications, 23 taps will work well in the first filter, and should be realisable with a single multiplier design, while 63 would be ideal in the second. Again, use symmetric FIRs to reduce the processing load.

Designing the Filters

Everybody has their own favourite way of designing filters. Some will simulate the filters in Matlab or one of the other commercial packages; others may use one of the on-line tools, such as the filter design package at http://www.nauticom.net/www/jdtaft/.

In all cases, studying the datasheets of commercially available DDC devices can give some guidance. Many of these detail the procedures for designing a DDC filter chain, and often list the coefficients. These filters may be copyright protected, but they will give good guidance on design.

The CIC doesn't really need to be designed; you just need to calculate the word length required in the integrators' accumulators. Be careful here: the CIC needs surprisingly big accumulators due to huge wordlength growth. The size of the accumulator depends on the number of stages in the filter and the decimation ratio; check the references if you need help on this.

Implementing the Filters

Once you have the filters designed, implement them. For the CIC, it is best to implement this from scratch, although there is a 5-stage CIC included in the example for this white paper. Be aware, though, that the CIC's wordlength depends on the decimation ratio: the example CIC will overflow on larger decimation ratios. It is easily modified for longer wordlengths.

For the FIRs, we will again use the Xilinx Core Generator tool. This allows you to specify the parameters of the filter, and it will generate an FPGA core to match. Make sure you keep a record of the settings you use; this may well be an iterative process if you need to optimise the design for speed or size, and knowing what you asked for will help immensely!
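
The CIC accumulator sizing mentioned above can be estimated with the standard result from Hogenauer's paper: the register width is the input width plus the base-2 logarithm of the filter's DC gain, rounded up. The sketch below applies that formula; the 12-bit input width is an assumption for illustration, not a property of any particular HERON module, and the function name is made up.

```python
def cic_register_bits(stages, ratio, input_bits, diff_delay=1):
    """Register width needed so an N-stage CIC decimator cannot overflow."""
    gain = (ratio * diff_delay) ** stages     # DC gain of the filter
    growth = (gain - 1).bit_length()          # exact ceil(log2(gain))
    return input_bits + growth


INPUT_BITS = 12                               # assumed ADC word length
widths = {ratio: cic_register_bits(5, ratio, INPUT_BITS)
          for ratio in (4, 32, 128)}
```

For a 5-stage filter this gives 22 bits at a decimation ratio of 4, 37 bits at 32 and 47 bits at 128, which illustrates why a CIC sized for one range of ratios overflows when the ratio is increased.
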
The Xilinx tool will even generate decimating FIRs for you. Specify the decimation ratio, and the output will signal a valid sample at the decimated output rate: an incredibly powerful tool for this type of work.

Note that the Xilinx FIR tool also allows you to specify the number of clocks per output sample. Make sure you specify the maximum number of clocks; that way, the tools will be able to use the least possible amount of FPGA resource to implement your filter. Note that in a multi-rate filter, the later stages will have more clocks per output than the early stages!

Once the filters have been generated, they'll be in the library for your Xilinx design. Build up the DDC's processing chain. It is worth registering the data along the chain (specify the register output option on the filters). Make sure the system clock is connected to all elements; we'll see how to control the DDC in just a moment.
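
The clocks-per-output figure for each stage follows directly from the decimation ratios ahead of it. The sketch below works this out for a hypothetical chain (a CIC decimating by 16, then the 23-tap and 63-tap symmetric FIRs each decimating by 2), assuming one input sample per system clock. The multiplier estimate uses an idealised model of one multiply-accumulate per multiplier per clock, which may not match how the generated core is actually built.

```python
def clocks_per_output(stage_decimations, clocks_per_input_sample=1):
    """Return the clock cycles available per output sample of each stage."""
    clocks = clocks_per_input_sample
    budget = []
    for decimation in stage_decimations:
        clocks *= decimation
        budget.append(clocks)
    return budget


def multipliers_needed(taps, clocks, symmetric=True):
    """Idealised count: one multiply-accumulate per multiplier per clock."""
    multiplies = (taps + 1) // 2 if symmetric else taps
    return -(-multiplies // clocks)           # ceiling division


# (name, decimation, taps); taps is None for the multiplier-free CIC
CHAIN = [("CIC", 16, None), ("FIR 1", 2, 23), ("FIR 2", 2, 63)]
budget = clocks_per_output([d for _, d, _ in CHAIN])
report = [(name, clocks,
           None if taps is None else multipliers_needed(taps, clocks))
          for (name, _, taps), clocks in zip(CHAIN, budget)]
```

Under these assumptions the first FIR has 32 clocks per output and the second has 64, so each fits in a single multiplier, whereas the same 23-tap filter running at one output per clock would need 12.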

The Decimator

The final decimator can be implemented within the last FIR stage, if the decimation ratio is an integer. To do this, simply specify the required decimation ratio as part of the filter design.

However, you may require a non-integer decimation ratio, or you may require a variable decimation ratio. In these cases, we must treat the decimator as a separate block. For non-integer decimation, run the FIR Core Generator one more time. Select an interpolating filter; this will automatically increase the sampling rate by the given amount. Place this in your design.

The decimator can now be implemented as a register & counter. An example of this type of decimator can be found in the middle of the CIC filter used in this example.

The Output Stages

Finally, we will pass the output of the DDC to the HERON FIFOs. There is a decision to be made here, depending on your system design: should you use blocking or non-blocking communications? Generally this is a system issue.

In BLOCKING communications, the DDC will not send data to the FIFO if the FIFO is full. This prevents any data loss between the DDC and the FIFO. However, the knock-on effect is that data will be lost at the ADC.

In NON-BLOCKING communications, the DDC will keep sending data to the FIFO even if the FIFO is full. This will result in a FIFO overflow. Again, data will be lost. Control over the DDC is simpler, but the data in the FIFO is now invalid.

For most systems, the designer will work hard to ensure that FIFO overflow never occurs. In view of this, we'd recommend non-blocking mode; however, blocking mode may allow easier testing of the system in non-real time.

Controlling the Data Flow

For most DDC systems, there is no need to control the data flow; simply connect the output of each stage to the input of the next. Then, on the high-speed sections, samples are processed on every clock, while on the lower-speed sections, the Data Ready strobes of each stage can be used to clock the data into the next stage.
If, however, the blocking communications model is adopted (as in the case of the example), we need to be able to stop the DDC's processing chain. This can be achieved using the Clock Enable inputs of each block. Use the example to see how this is achieved.
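
The behaviour of a clock-enabled chain feeding a FIFO in blocking mode can be modelled in a few lines. This is a simplified software sketch, not a description of the example's logic: the class and function names are hypothetical, the whole DDC is reduced to a single register-and-counter decimator, and the FIFO reader is assumed to take one word every `reader_period` clocks.

```python
from collections import deque


class CounterDecimator:
    """Register-and-counter decimator with a clock enable input."""

    def __init__(self, ratio):
        self.ratio = ratio
        self.count = 0
        self.register = None

    def clock(self, sample, enable=True):
        """One clock edge. Returns True when the register holds a new output."""
        if not enable:
            return False                # state is frozen while disabled
        self.count += 1
        if self.count == self.ratio:
            self.count = 0
            self.register = sample
            return True
        return False


def run_blocking(adc_samples, ratio, fifo_depth, reader_period):
    """Feed a bounded FIFO, freezing the chain whenever the FIFO is full."""
    dec = CounterDecimator(ratio)
    fifo = deque()
    received, lost_at_adc = [], 0
    for tick, sample in enumerate(adc_samples, start=1):
        if tick % reader_period == 0 and fifo:
            received.append(fifo.popleft())   # downstream reader takes a word
        enable = len(fifo) < fifo_depth        # clock enable = FIFO not full
        if not enable:
            lost_at_adc += 1                   # chain frozen: ADC sample dropped
        elif dec.clock(sample, enable):
            fifo.append(dec.register)
    received.extend(fifo)                      # flush what is left at the end
    return received, lost_at_adc


fast_reader = run_blocking(list(range(1, 13)), ratio=3, fifo_depth=4,
                           reader_period=1)
slow_reader = run_blocking(list(range(1, 9)), ratio=1, fifo_depth=2,
                           reader_period=4)
```

With a reader that keeps up, nothing is lost. With a slow reader, every word that reaches the FIFO is valid, but the samples arriving while the chain is frozen are dropped at the ADC, which is the trade-off described above for blocking mode.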

The Example

The example implements a simple DDC. It has not been optimised, and was implemented by someone unfamiliar with the Xilinx toolset as a self-teach tutorial. Including learning the tools, it took a couple of days to implement and test.

Features of the example include:

  - Narrowband DDC for HERON-FPGA
  - Local oscillator with quadrature output and fine control over frequency
  - Local oscillator programmable via HERON FIFO interface
  - 5-stage CIC filter, suitable for decimation in the range of 4-32
  - CIC decimation ratio programmable over FIFO interface
  - 1st stage symmetric FIR with 23 taps and fixed decimate by 2
  - 2nd stage symmetric FIR with 63 taps, again fixed decimate by 2
  - Blocking-mode interface to the HERON communications FIFOs

This example, while implemented very quickly, is a versatile DDC with a feature set not dissimilar to several commercial DDC chips. It is suitable for implementation in a 150K-gate Xilinx Virtex device, although it could be implemented with a lower specification in smaller devices.

Features that could be added include an interpolator on the output for non-integer decimation ratios, and an output decimator. However, these are simple to add and are quite application specific.

The example is on the Hunt Engineering CD, in the "getting started with FPGA" section under "general examples".
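
For the non-integer decimation mentioned above, the interpolation and decimation factors of the added output stage follow from simple fraction arithmetic. The sketch below is an illustration only; the function name is made up, and the fixed decimation of 32 assumes the CIC is set to 8 and followed by the two decimate-by-2 FIRs.

```python
from fractions import Fraction


def output_resampler(overall_decimation, fixed_decimation):
    """Return (interpolate_by, decimate_by) for the added output stage.

    overall_decimation may be an int or a decimal string such as "37.5".
    """
    residual = Fraction(overall_decimation) / Fraction(fixed_decimation)
    # residual = M / L: interpolate by L, then keep one sample in every M
    return residual.denominator, residual.numerator


FIXED = 8 * 2 * 2                      # CIC by 8, then two decimate-by-2 FIRs
to_40 = output_resampler(40, FIXED)
to_37_5 = output_resampler("37.5", FIXED)
```

An overall decimation of 40 needs the output interpolated by 4 and then decimated by 5, while 37.5 needs interpolation by 64 and decimation by 75. Large factors like the latter suggest changing the CIC ratio first so that the output stage stays small.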

Further Reading

There are many aspects of DDCs and the HERON system that we have touched on lightly in this tutorial. Try the following for more information:

System Documentation. Every HERON system is supplied with a system configuration document describing the modules installed and the FIFO interfaces used. This is an excellent starting point for understanding your system.

Manuals. Make sure you read the manuals for the carrier and FPGA module you are using. If you are using a DSP module for the tutorial, you should read the documentation for that too. Specifically, look at the sections on HERON FIFOs, and the configurations your system supports.

CIC Filters. The original paper on CIC filters is a mathematical work of art; however, that doesn't make it useful for engineering purposes! A lighter introduction is available at:
DSPGuru http://www.dspguru.com/info/tutor/cic.htm
This references Hogenauer's original paper, but it's not strictly necessary to understand all the maths to get a CIC working!

DDC Chips. Take a look at how the fixed-function DDC chips are structured. Many of these are overkill; they have to be, as they need to match the needs of many users, while your design can be application specific. You can still learn from them; Graychip in particular have excellent datasheets:
Graychip http://www.graychip.com

Xilinx Documentation. We use the Xilinx FPGA tools to build & modify the FPGA program, and these will be your main development environment as you move forward with your own HERON-FPGA programs. These are powerful tools, well beyond the scope of this document to describe.