Stratix II DSP Performance


White Paper

January 2005, ver. 2.0
WP-STXIIDSP-2.0

Introduction

Stratix II devices offer several digital signal processing (DSP) features that provide exceptional performance for DSP applications. These features include DSP blocks, TriMatrix memory, and three-input adder support. They make Stratix II devices well suited to implementing the entire data path, or to serving as FPGA coprocessors, in wireless infrastructure, broadcast television, medical imaging, and other signal and image processing applications.

This white paper discusses the performance of Stratix II devices by examining the device architecture, Quartus II software support, and DSP MegaCore intellectual property (IP) support. Several examples show the benefits of Stratix II devices using readily available DSP IP from Altera.

Stratix II Device Architecture

Stratix II DSP blocks deliver up to 420-MHz performance on fully pipelined 18-bit × 18-bit multipliers and can be configured to support the following modes:

- Up to eight 9-bit × 9-bit multipliers
- Up to four 18-bit × 18-bit multipliers
- Up to one 36-bit × 36-bit multiplier

Stratix II devices contain up to 96 DSP blocks, or 384 18-bit × 18-bit multipliers, more than four times the number of DSP blocks found in Stratix devices.

The Stratix II adaptive logic modules (ALMs) in the logic structure can be configured to efficiently support three-input adders, so that a single ALM adds two bits of three operands. Figure 1 shows the benefits of Stratix II devices in designs such as filtering, which require large adder trees. The three-input adders reduce the adder count by nearly 50%, which translates directly into logic resource savings.
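The adder counts for the 128-input case in Figure 1 can be reproduced with a short calculation. The following is a minimal sketch, not taken from the white paper: the function name `adder_tree` is hypothetical, and it assumes that operands which do not fill a complete adder at one level are simply passed through to the next level.

```python
def adder_tree(num_inputs, fan_in):
    """Count adders and levels for a tree built from fan_in-input adders.

    Assumption: leftover operands that do not fill a complete adder are
    passed through unchanged to the next level of the tree.
    """
    adders = 0
    levels = 0
    n = num_inputs
    while n > 1:
        full, leftover = divmod(n, fan_in)
        if full == 0:
            # Fewer than fan_in operands remain: one final, partly used adder.
            full, leftover = 1, 0
        adders += full
        n = full + leftover
        levels += 1
    return adders, levels


binary = adder_tree(128, 2)    # two-input adders, as in Stratix
ternary = adder_tree(128, 3)   # three-input adders, as in Stratix II
print(binary, ternary)         # (127, 7) (64, 5)
```

Under these assumptions the ternary tree needs 64 adders in 5 levels against 127 adders in 7 levels for the binary tree, which matches the totals given in Figure 1.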

Figure 1. Stratix II & Stratix Comparison in Adder Tree Designs

[Figure: two adder trees, each with 128 inputs. Stratix adder tree implementation (add 2 bits per logic element): levels of 64, 32, 16, 8, and 4 adders shown; 127 adders, 7 levels in total. Stratix II adder tree implementation (two 3-bit adds per ALM): levels of 42, 14, 5, and 2 adders shown; 64 adders, 5 levels in total.]

The DSP blocks also have hardware support to perform optional saturation and rounding after each 18 × 18 multiplier for Q1.15 input formats. This feature enables critical logic savings for algorithms such as speech compression. For more information on device architecture features, refer to the DSP Block section of the Stratix II Device Handbook, Volume 2.

Quartus II Software Support

The Quartus II software, version 4.2, has been optimized to support Stratix II device features such as DSP blocks and three-input adders. There are two distinct methods for implementing the various modes of the DSP block in a design: instantiation and inference. Both methods use the following three Quartus II megafunctions:

- lpm_mult
- altmult_add
- altmult_accum

With instantiation, the megafunctions are instantiated directly in the Quartus II software to use the DSP block. With inference, the designer creates an HDL design and synthesizes it using a third-party synthesis tool such as LeonardoSpectrum or Synplify, or using Quartus II native synthesis. The synthesis tool infers the appropriate megafunction by recognizing multipliers, multiplier adders, and multiplier accumulators. With either method, the Quartus II software maps the functionality to the DSP blocks during compilation.
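To illustrate what saturation and rounding after a multiplier mean for Q1.15 operands, the following is a minimal software model. It is a sketch under stated assumptions rather than a description of the DSP block hardware: the white paper does not specify the rounding mode, so round-half-up is assumed here, and the name `q15_mul` is hypothetical.

```python
Q15_MAX = 0x7FFF    # largest Q1.15 value, just under +1.0
Q15_MIN = -0x8000   # smallest Q1.15 value, exactly -1.0


def q15_mul(a, b):
    """Multiply two Q1.15 integers, round to nearest, saturate to Q1.15.

    Assumption: rounding is round-half-up; the actual hardware rounding
    mode is not stated in this paper.
    """
    product = a * b                          # Q2.30 intermediate result
    rounded = (product + (1 << 14)) >> 15    # add half an LSB, drop 15 fraction bits
    return max(Q15_MIN, min(Q15_MAX, rounded))


half = 0x4000                           # 0.5 in Q1.15
print(q15_mul(half, half))              # 8192, which is 0.25 in Q1.15
print(q15_mul(Q15_MIN, Q15_MIN))        # 32767: (-1.0) x (-1.0) saturates
```

The second call shows why saturation matters: (-1.0) × (-1.0) equals +1.0, which is not representable in Q1.15, so the result clamps to the largest positive value instead of wrapping to a negative number.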

Refer to the Quartus II software online help for instructions on using the megafunctions and the MegaWizard Plug-In Manager. For more information about synthesis tool inference support, refer to AN 193: Design Guidelines for Using DSP Blocks in the Synplify Software and AN 194: Design Guidelines for Using DSP Blocks in the LeonardoSpectrum Software.

Both the Parallel Adder megafunction (parallel_add) and synthesis inference are available for implementing the three-input adder. For example, the code d = a + b + c automatically maps to the three-input adder structure.

DSP MegaCore IP Support

Altera and its partners in the Altera Megafunction Partners Program (AMPP℠) offer a large selection of off-the-shelf megafunctions optimized for Altera devices. Designers can implement these parameterized blocks of IP easily, reducing design and test time. Altera has optimized the following DSP MegaCore functions for Stratix II device features such as DSP blocks, TriMatrix memory, and three-input adders:

- Finite Impulse Response (FIR) Compiler
- Fast Fourier Transform (FFT) Compiler
- Numerically Controlled Oscillator (NCO) Compiler
- Viterbi Compiler
- Turbo Encoder/Decoder
- Reed Solomon Encoder/Decoder

Examples Using Altera's DSP MegaCore IP

The following sections present examples that take advantage of the Stratix II device architecture and the optimized DSP MegaCore IP. The examples are based on three very common DSP functions: FIR, NCO, and FFT.

The first example is a digital down converter for 3G wireless systems. A digital down converter, shown in Figure 2, is used in wireless systems to translate signals from an intermediate frequency (IF) to a baseband frequency. A digital down converter typically consists of an NCO, a mixer, an FIR filter (a pulse-shaping decimation filter), and an optional cascaded integrator comb (CIC) filter.

Figure 2. Block Diagram for a Generic Digital Down Converter
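The signal flow of the generic digital down converter in Figure 2 can be sketched in floating-point software. This is an illustrative model only, not the MegaCore implementation: the function name `ddc` is hypothetical, a plain moving average stands in for the CIC and FIR stages, and the IF of one quarter of the sample rate is chosen purely to keep the arithmetic easy to check. The 61.44-MHz sample rate is the system clock rate quoted in the FIR example of this paper.

```python
import cmath
import math


def ddc(samples, f_if, f_s, decimation):
    """Mix real IF samples to baseband, then low-pass filter and decimate.

    Assumption: the low-pass filter is a moving average of length
    `decimation`, standing in for the CIC and FIR stages of Figure 2.
    """
    # NCO and mixer: multiply by a complex exponential at -f_if.
    mixed = [x * cmath.exp(-2j * math.pi * f_if * n / f_s)
             for n, x in enumerate(samples)]
    # Filter and decimate: average each block of `decimation` samples.
    out = []
    for start in range(0, len(mixed) - decimation + 1, decimation):
        block = mixed[start:start + decimation]
        out.append(sum(block) / decimation)
    return out


f_s = 61.44e6     # sample rate in Hz
f_if = f_s / 4    # hypothetical IF, chosen for a clean example
tone = [math.cos(2 * math.pi * f_if * n / f_s) for n in range(64)]
baseband = ddc(tone, f_if, f_s, decimation=16)
print(len(baseband), baseband[0])
```

A real cosine at the IF mixes down to a constant of amplitude 0.5 plus an image at twice the IF; the averaging filter removes the image, so each of the four output samples is approximately 0.5 + 0j.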

NCO

Because of the stringent requirements of 3G wireless systems, NCOs must generate very high-quality signals, as measured by spurious free dynamic range (SFDR). A typical SFDR requirement for 3G is greater than 110 dB. This block can be implemented with Altera's NCO Compiler 2.2.0 using the following parameters, based on the NCO MegaCore Function User Guide:

- Phase accumulator precision: 32 bits
- Angular precision: 18 bits
- Magnitude precision: 18 bits
- Multiplier-based algorithm
- Dithering: ON

Figure 3 shows a frequency response that meets the 3G wireless requirements for an NCO.

Figure 3. NCO Frequency Response

As Figure 3 shows, the NCO meets the SFDR requirement of greater than 110 dB. This NCO design implemented in Stratix II devices gives a performance improvement of 25% over Stratix devices because of the increased performance of the Stratix II DSP blocks, the M4K memory, and the new logic structure.
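The effect of the phase accumulator and angular precision parameters listed above can be sketched as follows. This is a generic phase-accumulator model, not the NCO Compiler's internal design: the function names are hypothetical, the 61.44-MHz clock is borrowed from the FIR example in this paper, the 15.36-MHz output frequency is an arbitrary choice, and dithering and the sine/cosine generation are not modeled.

```python
ACC_BITS = 32     # phase accumulator precision from the parameter list
ANGLE_BITS = 18   # angular precision from the parameter list


def tuning_word(f_out, f_clk):
    """Phase increment per clock cycle for the requested output frequency."""
    return round(f_out / f_clk * (1 << ACC_BITS)) % (1 << ACC_BITS)


def frequency_resolution(f_clk):
    """Smallest frequency step: one accumulator LSB per clock cycle."""
    return f_clk / (1 << ACC_BITS)


def phase_samples(word, count):
    """Yield the truncated phase (top ANGLE_BITS bits) for each clock cycle."""
    acc = 0
    for _ in range(count):
        yield acc >> (ACC_BITS - ANGLE_BITS)
        acc = (acc + word) & ((1 << ACC_BITS) - 1)   # wrap modulo 2**32


f_clk = 61.44e6                       # assumed clock rate in Hz
word = tuning_word(15.36e6, f_clk)    # hypothetical output frequency
print(word)                           # 1073741824, one quarter of 2**32
print(frequency_resolution(f_clk))    # roughly 0.0143 Hz
print(list(phase_samples(word, 5)))   # [0, 65536, 131072, 196608, 0]
```

With a 32-bit accumulator the frequency step at this clock rate is roughly 0.014 Hz, while only the upper 18 bits of the phase address the waveform generator in this model.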

Table 1 summarizes the resource utilization and performance of Stratix II devices.

Table 1. NCO Resource Utilization and Performance in Stratix II Devices

Parameter                     Stratix II
LEs (1)                       236
Memory usage (M4K)            6
DSP block 9-bit elements      8
Speed (fmax)                  361 MHz

Note to Table 1:
(1) The LE count in Stratix II devices refers to the ALUTs reported in the Quartus II software.

FIR

The decimating FIR filter for 3G applications requires over 65 dB of out-of-band signal rejection. A typical system clock rate is a multiple of the 3.84-MHz channel bandwidth of 3G systems. Many systems use 61.44 MHz (3.84 MHz × 16) as the system clock rate. The following example shows how to achieve those specifications using Altera's FIR Compiler 3.2.

As an engineering experiment, the following parameters for a 3G decimating filter were chosen to meet the design requirements:

- 14-bit data
- 14-bit coefficients
- 120-tap filter
- Pipeline = 1
- Number of channels = 1
- Multi-bit serial architecture

Figure 4 shows a typical frequency response curve for an FIR filter.

Figure 4. FIR Frequency Response

Table 2 compares the common parameters that characterize this FIR design in Stratix and Stratix II devices.

Table 2. FIR Benchmark Comparison Between Stratix & Stratix II Devices

Parameter                 Stratix       Stratix II
Adder tree type           Binary        Ternary
Number of adders          123           73
LEs (1)                   3,691         1,776
Memory usage (M512)       28            21
Speed (fmax)              248 MHz       373 MHz

Note to Table 2:
(1) The LE count in Stratix II devices refers to the ALUTs reported in the Quartus II software.

The FIR Compiler 3.2 provides a multi-bit serial architecture, which allows users to optimize their designs for a specific performance requirement. Adding multiple serial engines achieves higher performance by adding parallelism. The number of serial engines required is defined by the following equation, in which the result is rounded up to the next whole unit:

# of Serial Units = Round (Data Bit Width × Required Filter Performance / Filter fmax)

Stratix:    # of Serial Units = Round (15 × 61.44 MHz / 248 MHz) = 4
Stratix II: # of Serial Units = Round (15 × 61.44 MHz / 373 MHz) = 3

In this way, the 24% increase in fmax performance permits a 20% smaller design. The ternary adder capability within Stratix II devices reduces the number of adders within each serial unit, leading to an additional reduction in logic resource usage. A Stratix II architecture can implement this design with 50% fewer logic resources and 25% fewer memory resources than Stratix architectures.

This design equates to 8.9 GMACs (over twice the speed of the fastest DSP processors available, at 720 MHz × four 16 × 16 MACs) and uses less than 1% of an EP2S180 device's logic resources (1,776 of 179,400 logic elements).
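The serial-unit equation and the resource comparison in Table 2 can be checked with a few lines of arithmetic. This is a sketch, not part of the FIR Compiler: the function name `serial_units` is hypothetical, rounding up is used because it is the only interpretation that yields 3 units for Stratix II, and the GMACs calculation is one possible reconstruction of the 8.9 figure, since the paper does not show how that number was derived.

```python
import math


def serial_units(data_bits, sample_rate_mhz, fmax_mhz):
    """Number of serial engines needed, rounded up to a whole unit."""
    return math.ceil(data_bits * sample_rate_mhz / fmax_mhz)


stratix = serial_units(15, 61.44, 248)      # 3.72 rounds up to 4
stratix_ii = serial_units(15, 61.44, 373)   # 2.47 rounds up to 3

# Hypothetical reconstruction of the GMACs figure: the peak sample rate
# that three Stratix II engines could sustain, multiplied by 120 taps.
peak_rate_mhz = 373 * stratix_ii / 15       # about 74.6 Msamples/s
gmacs = 120 * peak_rate_mhz / 1000          # about 8.95

logic_saving = 1 - 1776 / 3691              # about 0.52
memory_saving = 1 - 21 / 28                 # 0.25
logic_fraction = 1776 / 179400              # just under 0.01

print(stratix, stratix_ii, round(gmacs, 2))
print(round(logic_saving, 2), memory_saving, round(logic_fraction, 4))
```

Under these assumptions the figures agree with the text: about half the logic, 25% less memory, and just under 1% of the EP2S180 logic resources.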

FFT

An FFT is used in applications such as OFDM modems, filtering in the frequency domain, and spectrum analysis. Some of the most advanced spectrum analysis applications require a 1K-point FFT to be calculated in less than 2 µs at 16 bits of precision. The FFT core generated by the FFT Compiler 2.1 targeting Stratix II devices can now achieve this using a dual-engine, quad-output, buffered burst architecture. Figure 5 shows the multiplier and memory options that are available with the FFT.

Figure 5. FFT Implementation Options

Table 3 summarizes the resource utilization and performance for Stratix II devices with this FFT design.

Table 3. 1K-Point FFT Resource Utilization & Performance for Stratix II Devices

Parameter                        Stratix II
LEs (1)                          6,660
Memory usage (M4K)               14
DSP block 9-bit elements         36
Speed (fmax)                     300 MHz
Cycles (excluding data load)     557 cycles
Transform calculation time       1.9 µs

Note to Table 3:
(1) The LE count in Stratix II devices refers to the ALUTs reported in the Quartus II software.

The Stratix II implementation provides a 21% performance improvement over Stratix devices for this FFT design. Using Altera's quad-engine, quad-output architecture, this same FFT can be implemented in Stratix II devices with a transform calculation time of only 1.35 µs. Furthermore, when compared to a 720-MHz discrete DSP processor, the FFT core is seven times faster and can fit in the smallest Stratix II device, the EP2S15. Figure 6 shows the contrast in speed among various devices.

Figure 6. 16-Bit Fixed-Point, 1,024-Point FFT Performance Comparison
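The transform calculation time in Table 3 follows directly from the cycle count and the clock rate. The snippet below is a simple arithmetic check; the function name `transform_time_us` is hypothetical, and the cycle budget is derived here from the 2-µs requirement rather than quoted from the paper.

```python
def transform_time_us(cycles, fmax_mhz):
    """Transform time in microseconds: clock cycles divided by clock rate in MHz."""
    return cycles / fmax_mhz


t = transform_time_us(557, 300)    # about 1.86, reported as 1.9 in Table 3
cycle_budget = 2.0 * 300           # cycles available within 2 microseconds at 300 MHz

print(round(t, 2), cycle_budget)   # 1.86 600.0
```

At 300 MHz the 557-cycle transform uses most of the 600-cycle budget implied by the 2-µs requirement.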

Summary

Stratix II devices deliver DSP performance in the range of 300 to 400 MHz, providing significant speed increases in the DSP blocks and logic resources compared with Altera's Stratix family of devices. In many cases, such as the FIR filter example, this increase can lead to a significant reduction in logic resources for a given performance requirement, resulting directly in a lower cost design. Refer to the Stratix II Device Performance & Logic Efficiency Analysis white paper for detailed information on the performance of Stratix II devices and the increased efficiency that the new logic structure offers.

101 Innovation Drive
San Jose, CA 95134
(408) 544-7000
www.altera.com

Copyright 2005 Altera Corporation. All rights reserved. Altera, The Programmable Solutions Company, the stylized Altera logo, specific device designations, and all other words and logos that are identified as trademarks and/or service marks are, unless noted otherwise, the trademarks and service marks of Altera Corporation in the U.S. and other countries. All other product or service names are the property of their respective holders. Altera products are protected under numerous U.S. and foreign patents and pending applications, maskwork rights, and copyrights. Altera warrants performance of its semiconductor products to current specifications in accordance with Altera's standard warranty, but reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use of any information, product, or service described herein except as expressly agreed to in writing by Altera Corporation. Altera customers are advised to obtain the latest version of device specifications before relying on any published information and before placing orders for products or services.