High Performance DSP Solutions for Ultrasound By Hong-Swee Lim Senior Manager, DSP/Embedded Marketing Hong-Swee.Lim@xilinx.com 12 May 2008
DSP Performance Gap
[Figure: performance (algorithmic and processor forecast) versus time, comparing algorithm complexity with the DSP/GPP performance limit of traditional processor architectures. Performance levels marked: 5 GMACs, 30 GMACs and 350 GMACs. FPGA families marked: Spartan-DSP and Virtex-DSP. Application labels: 3D medical imaging, wireless base stations, HD audio/video broadcast, radar and sonar, HD video surveillance, mobile software defined radio, MIMO, high-end ultrasound, low-end ultrasound, pico/femto base stations, consumer video, SD/HD video surveillance. Source: Jan Rabaey, BWRC.]
High Performance DSP Solutions 2
Agenda
- The Demand for DSP in Medical Imaging
- FPGAs: The Programmable Ultra High DSP Performance Platform
- The DSP48E Slice
- Essential DSP Building Blocks
- Imaging Algorithms
- Digital Beamforming
- High Level Development Tools
- Conclusion
A Little Ultrasound History
First ultrasound machines, introduced in the mid-1950s:
- Analog processing chain
- Low ultrasound frequencies
- 2-D images
- Small image sizes
- Black and white
Latest ultrasound machines:
- Digital processing chain
- Higher ultrasound sample frequencies (50 MHz)
- Portable
- 2-D, 3-D and 4-D
- Larger image sizes
- Colour images
- Higher quality
- Elastography
- Tissue harmonic imaging
Trend: more and more data being processed faster and faster.
Photos courtesy of Dynamic Imaging Limited and Siemens Medical.
FPGAs: The Programmable High Performance DSP Platform
Spartan-3A DSP Overview
- Two devices: XC3SD3400A (over 30 GMACS) and XC3SD1800A (over 20 GMACS)
- Built on the cost-effective, industry-accepted Spartan platform
- Superset of the Spartan-3A platform
- Increased capacity of DSP resources, memory and logic
- Signal processing, memory capacity, bandwidth
- Integrated, cost-optimized XtremeDSP DSP48A slice
Spartan-3A DSP: DSP48A Slice
- 250 MHz operation in the lowest-cost speed grade
The Virtex-5 DSP Messages
- Higher performance (352 GMAC/s, a 38% improvement over Virtex-4)
- Optimized ratio of circuit functions (logic, memory and DSP)
- Expanded functionality (higher precision, SIMD)
- Lower power (35% reduction in dynamic power over Virtex-4)
Virtex-5 DSP48E Slice
[Figure: DSP48E slice block diagram. 25-bit A and 18-bit B inputs, each with a 2-deep register; multiplier with M register; X, Y and Z multiplexers (including the A:B concatenation and 17-bit shift paths) selected by the 7-bit OpMode; 48-bit ALU controlled by the 4-bit ALUMode and CarryIn; 48-bit C register and P register; pattern detect against C or MC; cascade ports ACIN/ACOUT, BCIN/BCOUT and PCIN/PCOUT.]
- 450 MHz operation in the slowest speed grade
Common Functions DSP Designers Need
The DSP48E provides the key functions for DSP:
- Multiplier: P = A x B, opmode = 0000101
- Multiply accumulate: P = P + A x B, opmode = 0100101
- Adder / accumulator: P = P + C, opmode = 0110010
- Multiply add: P = A x B + C, opmode = 0110101
These form the building blocks for the majority of arithmetic functions required for DSP. Cascading capabilities for multiply-add, accumulate and adder chains are also a requirement for performance-driven designs.
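The opmode values on this slide can be read as three multiplexer selects. The sketch below decodes them under the assumption that bits [1:0] select X, bits [3:2] select Y and bits [6:4] select Z, with ALUMode = 0000 so that the slice computes Z + X + Y. The encoding tables are recalled from the Virtex-5 DSP48E documentation rather than taken from this deck, so they should be checked against the user guide; `decode_opmode` is a hypothetical helper name.

```python
# Assumed DSP48E multiplexer encodings (verify against the Virtex-5 user guide).
X_MUX = {0b00: "0", 0b01: "M", 0b10: "P", 0b11: "A:B"}
Y_MUX = {0b00: "0", 0b01: "M", 0b10: "all ones", 0b11: "C"}
Z_MUX = {0b000: "0", 0b001: "PCIN", 0b010: "P", 0b011: "C",
         0b100: "P", 0b101: "PCIN >> 17", 0b110: "P >> 17"}


def decode_opmode(opmode: str) -> str:
    """Describe what a 7-bit opmode string computes when ALUMode = 0000."""
    value = int(opmode, 2)
    x = X_MUX[value & 0b11]          # bits [1:0]
    y = Y_MUX[(value >> 2) & 0b11]   # bits [3:2]
    z = Z_MUX[(value >> 4) & 0b111]  # bits [6:4]
    if x == "M" and y == "M":
        # X and Y together carry the two partial products of the multiplier.
        return "A*B" if z == "0" else f"{z} + A*B"
    terms = [t for t in (z, x, y) if t != "0"]
    return " + ".join(terms) if terms else "0"


for code in ("0000101", "0100101", "0110010", "0110101", "0010101"):
    print(code, "->", decode_opmode(code))
```

Under these assumptions, 0010101 (used in the filter slides) decodes to a multiply whose result is added to the cascade input PCIN.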
DSP48E Slice Power Savings
[Figure: average power (mW, 0 to 70) versus frequency (MHz, 0 to 600) for four multiplier configurations.]
- Virtex-4 25x25: 14.3 mW per 100 MHz
- Virtex-5 25x25: 3.6 mW per 100 MHz
- Virtex-4 18x18: 3.0 mW per 100 MHz
- Virtex-5 25x18: 1.8 mW per 100 MHz
- Dynamic power saving: 40% per DSP48E slice, 70% per 25x25 multiplier
- Conditions: 25 C, nominal Vcc, fully pipelined, 50% input toggle rate; dynamic power consumption based on hardware test results
Larger Memories Benefit Imaging
- Each block RAM can be used as:
  - one 36K BRAM / FIFO,
  - two independent 18K BRAMs, or
  - one 18K FIFO and one 18K BRAM
- The 36K BRAM is double the size of the Virtex-4 block RAM
- Significant for FFTs, beamformer delays and image buffers
Essential DSP Building Block
Key Building Block DSP Functions in Ultrasound
- FIR filters
- 2-D FIR filters
- FFTs
- Floating-point operators
- CIC filters
- Adaptive filtering
- Video mixing and general video functions
(Legend on the slide: "Addressed in this presentation".)
High Performance Filters
[Figure: sample rate (MHz, log scale, 0.5 to 500) versus number of coefficients N (log scale, 1 to 1000), showing the regions covered by parallel, semi-parallel and sequential FIR filters.]
- FPGAs can implement a complete spectrum of filter performance using the DSP48E.
- High-performance filters with sample rates above 50 MHz are of most interest in medical imaging.
- High-performance structures must use multiple DSP48E slices in parallel to achieve the required compute speed.
Parallel Systolic FIR Filter
Filter specification: sampling frequency = 450 MHz, coefficients = 31
[Figure: chain of DSP48E slices with 18-bit input x(n), coefficients labelled K0, K1, K2, ... K30, K31 from left to right, a 0 on the first cascade input and a 41-bit output y(n). First slice: opmode = 0000101; following slices: opmode = 0010101.]
- The input time-delay series is created inside the DSP slice, for maximum performance irrespective of the number of coefficients.
- Although this structure is referred to as a systolic FIR filter, it is really a direct form with one extra stage of pipelining.
- Coefficients run from left to right, so the latency is large and grows with the number of coefficients.
- Dedicated cascade connections (PCOUT and PCIN) are exploited to achieve maximum performance.
- Max sample rate = clock rate
- Filter size: 31 DSP48E slices
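A small behavioural model can illustrate why the systolic structure matches a direct-form FIR apart from added latency. The sketch below is a simplification, not the slice's actual register set: it assumes the input advances two registers per tap and that each slice registers the sum of its product and the previous slice's P output. The function names are hypothetical.

```python
def direct_fir(x, h):
    """Reference direct-form FIR: y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def systolic_fir(x, h):
    """Simplified systolic model: one multiply-add slice per coefficient."""
    taps = len(h)
    x_line = [0] * (2 * taps)   # input delay line, two registers per tap
    p_regs = [0] * taps         # registered P output of each slice
    out = []
    for sample in x:
        x_line = [sample] + x_line[:-1]      # x_line[j] now holds x[n - j]
        new_p = []
        for k in range(taps):
            cascade_in = p_regs[k - 1] if k > 0 else 0   # PCIN from last cycle
            new_p.append(cascade_in + h[k] * x_line[2 * k])
        p_regs = new_p
        out.append(p_regs[-1])
    return out


coeffs = [1, 2, 3, 4]
samples = [5, -1, 0, 2, 7, 3, -4, 6, 1, 0]
latency = len(coeffs) - 1
print(systolic_fir(samples, coeffs)[latency:])
print(direct_fir(samples, coeffs)[:len(samples) - latency])
```

In this model the systolic output equals the direct-form output delayed by N - 1 cycles for N coefficients, which is the latency growth the slide refers to.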
4-Multiplier Semi-Parallel FIR Filter
Specification: sampling frequency = 100 MHz, coefficients = 16
[Figure: input x(n) feeding four SRL16E input buffers and four 18-bit coefficient memories holding K0 to K15; four DSP48E multiply-add slices (opmode = 0000101 for the first, opmode = 0010101 for the rest); an accumulator slice (opmode = 0010010); and a clock-enabled capture register producing the 40-bit output y(n).]
- The input time-delay series is created outside the XtremeDSP slice, because SRL16Es are required to store the set of inputs that drives each engine.
- Each SRL16E and coefficient memory buffer has identical addressing.
- The adder-chain pipeline register is compensated for in the addressing of the memories: each buffer's address is delayed by one clock cycle relative to the previous one.
- An extra XtremeDSP slice is required to accumulate the results over the 4 clock cycles needed before the slower capture register grabs the final output.
- Max sample rate = clock rate x number of multipliers / number of taps
- Filter size: 5 DSP48E slices, 208 LUT6-FF pairs (24 for control)
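The sample-rate relation on this slide is simple enough to check numerically. The sketch below applies it to the slide's specification (16 taps, 4 multipliers, 100 MHz sample rate); the function names are illustrative only.

```python
def semi_parallel_max_sample_rate(clock_hz, multipliers, taps):
    """Max sample rate = clock rate x number of multipliers / number of taps."""
    return clock_hz * multipliers / taps


def required_clock(sample_rate_hz, multipliers, taps):
    """Clock rate needed to sustain a given sample rate."""
    return sample_rate_hz * taps / multipliers


# Slide specification: 100 MHz sample rate, 16 coefficients, 4 multipliers.
clock = required_clock(100e6, multipliers=4, taps=16)
cycles_per_output = 16 // 4
print(f"required clock: {clock / 1e6:.0f} MHz, {cycles_per_output} cycles per output")
```

With these numbers each multiplier handles four taps per output sample, which matches the four accumulation cycles mentioned on the slide, and the required clock works out to 400 MHz.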
FIR Compiler v3.1
Max performance at the push of a button
- Delivers maximum performance and smallest area in a simple-to-use wizard flow
- Covers all aspects of the FIR filter algorithm:
  - Number of taps
  - Number of channels
  - Single, multi or fractional rate
  - Bit widths
- Clock frequency control enables a trade-off between performance and area
- Resource estimation panel enables rapid resource analysis
- Verify the system specification for implementation
FFT v4.1 Architectures
Delivered through Core Generator and System Generator
[Figure: architectures placed on a sample-rate axis (MSPS) marked 0, 25, 50, 100, 200 and 450.]
- Pipelined FFTs (input and output a sample every clock cycle)
  - Radix-2 Single Delay Feedback (SDF), streaming I/O
  - 1K FFT resources: 16 DSP48, 6 BRAM, 3374 LUT6-FF pairs
- Loop-engine FFTs (a single butterfly processes each rank)
  - Radix-4 dragonfly loop engine (max throughput ~85 MSPS); 1K FFT resources: 12 DSP48, 6 BRAM, 1748 LUT6-FF pairs
  - Radix-2 butterfly loop engine (max throughput ~50 MSPS); 1K FFT resources: 4 DSP48, 3 BRAM, 868 LUT6-FF pairs
  - Radix-2 Lite butterfly loop engine (max throughput ~25 MSPS); 1K FFT resources: 2 DSP48, 3 BRAM, 742 LUT6-FF pairs
Typical FFTs in Ultrasound
Below are two typical FFTs used in ultrasound imaging systems and the requirements they place on the hardware implementation. Note the low performance and area requirements.
Number of points                      256        512
Sample rate                           10 MHz     200 kHz
Buffer size (words)                   512        1,024
Number of stages                      8          9
Butterflies per stage                 128        256
Total number of butterflies           1,024      2,304
Number of multiplications             4,096      9,216
Clock cycles (300 MHz)                7,680      768,000
How many multipliers are required?    1          1
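The table entries follow from a few lines of arithmetic. The sketch below reproduces them, assuming a radix-2 FFT with N/2 butterflies per stage, four real multiplications per butterfly (one complex multiply), and a cycle budget equal to the time taken to collect one frame at the given sample rate. These assumptions are inferred from the numbers, not stated on the slide, and `fft_workload` is a hypothetical helper.

```python
import math


def fft_workload(points, sample_rate_hz, clock_hz=300e6):
    """Butterfly, multiplication and clock-cycle budget for one radix-2 FFT frame."""
    stages = points.bit_length() - 1              # log2(points) for a power of two
    butterflies_per_stage = points // 2
    total_butterflies = stages * butterflies_per_stage
    multiplications = 4 * total_butterflies       # assumed: 4 real multiplies each
    clock_cycles = round(points * clock_hz / sample_rate_hz)  # cycles per frame
    multipliers = math.ceil(multiplications / clock_cycles)
    return {
        "stages": stages,
        "butterflies_per_stage": butterflies_per_stage,
        "total_butterflies": total_butterflies,
        "multiplications": multiplications,
        "clock_cycles": clock_cycles,
        "multipliers": multipliers,
    }


print(fft_workload(256, 10e6))
print(fft_workload(512, 200e3))
```

Because the cycle budget per frame exceeds the multiplication count in both cases, a single time-shared multiplier is sufficient, which is the point the slide makes.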
Radix-2 Loop Engine
- Supports data rates from 25 MSPS to 45 MSPS
- Burst interface (can be streaming with FIFO buffering)
[Figure: input data written to two data dual-port memories (DPM 0 and DPM 1), which feed a 2-input radix-2 butterfly engine with a ROM for twiddles and produce the output data.]
Latest Architecture Lowers Area by 30%
- The reduced architecture (Radix-2 Lite loop engine) is the smallest (~30% smaller)
- Data is stored in a single RAM
- Twiddle ROM: sine one cycle, cosine the next
- Butterfly: multiply real one cycle, imaginary the next
- Generates one output each cycle
[Figure: input data, data memories DPM 0 and DPM 1, ROM for twiddles, radix-2 butterfly and output data.]
DSP48 Enables Complex Multiplier and Butterfly
- Large adders (>16 bits) do NOT reach top clock performance.
- The SIMD mode of the DSP48E enables maximum speed in the butterfly at an efficient cost.
- The 2-cycle engine enables time sharing of DSP48Es between the butterfly addition and the complex multiplier. Lower cost!
[Figure: sin/cos LUT and 18-bit inputs RXm, RYm, IXm and IYm feeding DSP48E slices (one with opmode = 0000101, two with opmode = 0010101) and an add/sub stage.]
FFT v4.1
Complete FFT at the push of a button
- Delivers high performance and minimal area in a simple-to-use wizard flow
- Covers all aspects of the FFT algorithm:
  - Transform length
  - Number of channels
  - Rounding and scaling
  - Bit widths
- Clock frequency control enables a trade-off between performance and area
- Resource estimation panel enables rapid resource analysis
Virtex-5 Single-Precision Floating-Point Adder
[Figure: adder pipeline with stages implemented in DSP48E slices or logic: input select, exponent alignment and addition, LOD, normalization and round, output conditioning.]
- The 25x18 DSP48E reduces resources by 50%
- Floating-point adder size and performance: 2 DSP48E slices, 366 LUT6-FF pairs, 410 MHz, latency = 12 cycles
Floating Point Operators v3.0
Floating point is actually possible
- Delivers high performance and minimal area in a simple-to-use wizard flow
- Comprehensive set of arithmetic operators:
  - Add / subtract
  - Multiply
  - Compare
  - Fixed / float conversion
  - Divide
  - Square root
Virtex-4 vs Virtex-5 Floating Point
Resource usage (LUT-FF pairs / DSP48E), one column per operator; the operator column headings are not recoverable from the slide text:
Single precision, V-5    177 / 3     375 / 2     80 / 0     226 / 0    237 / 0    1370 / 0    787 / 0
Single precision, V-4    235 / 5     466 / 4     94 / 0     233 / 0    238 / 0    1370 / 0    787 / 0
Double precision, V-5    654 / 13    967 / 3     142 / 0    504 / 0    446 / 0    6002 / 0    3234 / 0
Double precision, V-4    759 / 17    1220 / 4    161 / 0    565 / 0    523 / 0    6002 / 0    3234 / 0
[Figure: performance comparison. Single precision: V-5 is 22% faster than V-4. Double precision: V-5 is 28% faster than V-4.]
Note: maximum-latency cores used.
How Much Floating Point Can Virtex-5 Do?
[Figure: resource utilization (LUTs, FFs and DSP48E) for the Virtex-5 SX35T, SX50T and SX95T.]
- More than 50 GFLOPs is possible in a 5VSX95T
Summary of Building Blocks
DSP algorithm                                          DSP48E slices   BRAM   LUT6-FF pairs   Clock performance
FIR filter, 450 MSPS, 31 tap, 18-bit                   31              0      0               450 MHz
FIR filter, 100 MSPS, 16 tap, 18-bit                   5               0      208             450 MHz
FFT, 300 MSPS, 1K point, 18-bit                        36              7      3,742           305 MHz
FFT, 300 MSPS, 4K point, 18-bit                        44              19     4,560           280 MHz
Floating-point mult / add, single precision            5               0      552             410 MHz
Floating-point complete operator set, single precision 5               0      1,436           365 MHz
Key Imaging Algorithms
Modalities and Algorithms: Ultrasound
- Digital beamforming
- Demodulation
- Image forming
- Image reconstruction
- B-mode
- Doppler
- Colour flow processing
- M-mode
- Elastography
- 2-D noise filtering
- 3-D and 4-D imaging
- Video functions
Ultrasound System Overview
[Figure: system block diagram. Front end (to ADC / DAC) with TX beamformer, RX beamformer and beamformer control (Tx and Rx not at the same time); demodulator; image pre-processing; B-mode, M-mode, Doppler and colour flow processing; gray-level image reconstruction and manipulation techniques; tissue analysis and diagnoses; video scaling; MPEG-2 encoding for DVD; backplane to PCI / PCIe; 3-D graphics (GPU); host PC and display. Data rates marked along the chain: 50 MSPS, 200 MSPS, 50 MSPS, slow (KSPS).]
Digital Beamforming: A Compute Problem
Ultrasound Rx beamformer
[Figure: transducer signals enter 12-bit multi-channel serial ADCs. Each channel passes through a serial-to-parallel converter (1 bit in), a low-pass filter (LPF), a variable delay and an apodization stage; the channels are then combined and passed to the demodulator.]
Key questions:
1. How many channels can I fit into a single FPGA?
2. What is the cost and power per channel?
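The per-channel delay and apodization stages in this diagram amount to a delay-and-sum beamformer. The sketch below is a minimal software model of that idea, assuming integer sample delays and one fixed apodization weight per channel; the real design interpolates to obtain finer delays and updates both values per beam. `delay_and_sum` is a hypothetical name.

```python
def delay_and_sum(channels, delays, weights):
    """Delay each channel by an integer number of samples, weight it, and sum."""
    length = len(channels[0])
    beam = [0.0] * length
    for samples, delay, weight in zip(channels, delays, weights):
        for t in range(delay, length):
            beam[t] += weight * samples[t - delay]
    return beam


# The same echo reaches channel 0 two samples earlier than channel 1,
# so channel 0 is delayed by two samples to line the echoes up.
channel_data = [
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
]
print(delay_and_sum(channel_data, delays=[2, 0], weights=[0.5, 0.5]))
```

After alignment the two echoes add coherently at a single output sample, which is the purpose of the variable delays.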
A High Performance Beamformer Architecture
[Figure: two channel pairs, each with 1-bit serial inputs, serial-to-parallel converters (12-bit outputs), a 20-tap interpolation filter (interpolate by 4), two 2K variable delays and an 18-bit window stage. Data rates marked along the chain: ~600 MSPS, ~50 MSPS, ~100 MSPS, ~400 MSPS, ~200 MSPS, ~400 MSPS, ~200 MSPS.]
- Serial inputs greatly reduce the number of FPGA pins required.
- Double Data Rate (DDR) I/Os and serial-to-parallel converters slow the input data stream down to manageable rates.
- Two channels are interleaved to exploit the available FPGA performance, reducing cost.
- The interpolation filter enables finer control of individual beams.
- 2K-deep delays fit perfectly in the Virtex-5 block RAM and provide good beam-steering ability.
Multi-Channel Multi-Rate Filter
2-channel, 20-tap, interpolate-by-4 filter
[Figure: 12-bit input x(n) feeding five 24 x 16 input buffers and 8 x 16 / 8 x 18 reloadable coefficient memories, five cascaded DSP48E slices (slice 1: opmode = 0000101, ALUMode = 0000; slice 2 onwards: opmode = 0010101, ALUMode = 0000) and a re-order buffer producing the 18-bit output y(n).]
- The input data stream is 2-channel time-division multiplexed (TDM).
- Reloadable coefficient memories are built from small dual-port distributed memories, capable of storing 3 different coefficient sets.
- A simple output re-order buffer makes sure the output is TDM like the input.
- Dedicated cascade connections (PCOUT and PCIN) are exploited to achieve maximum performance.
- Only a single phase of the interpolator is implemented, and each clock cycle yields a new result from a 5-tap polyphase arm. Each channel is processed in order.
- Filter size: 5 DSP48E slices, 250 LUT6-FF pairs (80 for control), 400 MHz
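The 5-tap polyphase arm follows from splitting the 20 coefficients into four phases. The sketch below compares a polyphase interpolate-by-4 against the textbook zero-stuff-then-filter form for a single channel. It does not model the two-channel TDM interleaving or the re-order buffer, and the coefficient values are placeholders, not a designed filter.

```python
def interpolate_reference(x, h, factor):
    """Insert factor-1 zeros after each sample, then apply the full FIR."""
    up = []
    for sample in x:
        up.append(sample)
        up.extend([0] * (factor - 1))
    return [sum(h[k] * up[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(up))]


def interpolate_polyphase(x, h, factor):
    """Compute each output from one short polyphase arm: h[p], h[p+factor], ..."""
    arms = [h[p::factor] for p in range(factor)]
    y = []
    for m in range(len(x)):
        for arm in arms:   # one arm per output phase, evaluated in turn
            y.append(sum(c * x[m - j] for j, c in enumerate(arm) if m - j >= 0))
    return y


coeffs = list(range(1, 21))             # 20 placeholder taps
samples = [3, -1, 4, 1, -5, 9, 2, -6]
print(len(coeffs[0::4]), "taps per arm")
print(interpolate_polyphase(samples, coeffs, 4) == interpolate_reference(samples, coeffs, 4))
```

Each output sample needs only five multiplies, so five DSP48E slices running at the output rate can time-share the four phases by switching coefficient sets each cycle.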
Variable Delay Element
[Figure: 18-bit samples x(n) at 200 MHz written into a 2K x 18 variable-delay memory. A counter (0 to 2047) and the beam delay value each supply an 11-bit address. The 18-bit output runs at (50 x number of output beams) MHz.]
- Interpolated samples are streamed into the variable delay.
- The dual-port nature of the block RAMs makes them excellent delay elements.
- Beam delay values are written into a small memory, which enables a rapid update rate.
- 2K-deep delays are a perfect fit for ultrasound beam steering and Virtex-5 memories.
- Size: 2 block RAMs, 100 LUT6-FF pairs (80 for control), 250 MHz
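A dual-port memory used as a delay element behaves like a circular buffer with a free-running write counter and a read address offset by the delay value. The sketch below models that behaviour in software; the class name and the per-sample `step` interface are hypothetical, and the hardware's per-beam read rate is not modelled.

```python
class VariableDelay:
    """Circular-buffer delay line modelling a 2K-deep dual-port memory."""

    def __init__(self, depth=2048):
        self.depth = depth
        self.memory = [0] * depth
        self.write_addr = 0            # free-running counter, 0 .. depth - 1

    def step(self, sample, delay):
        """Write one sample and read the one written `delay` steps earlier."""
        self.memory[self.write_addr] = sample
        read_addr = (self.write_addr - delay) % self.depth   # wraps like 11-bit math
        delayed = self.memory[read_addr]
        self.write_addr = (self.write_addr + 1) % self.depth
        return delayed


line = VariableDelay()
print([line.step(value, delay=3) for value in range(1, 7)])
```

Changing `delay` from one call to the next changes the read address immediately, which is how a new beam delay value would take effect without flushing the memory.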
What is the Total Cost?
Structure                                                               LUT6-FF pairs   DSP48   BRAM
Serial-to-parallel converter                                            24              0       0
2-channel interpolation filter                                          250             5       0
Variable delays                                                         100             0       2
Windowing function                                                      120             1       1
Summation                                                               24              0       0
Total for 2 channels                                                    538             6       3
Miscellaneous functions (control interface, DDR memory controller, DMA) 3500            0       0
64-channel beamformer (400 MHz):
- 192 DSP48E slices
- 96 BRAM
- 20,716 LUT6-FF pairs
128-channel beamformer (400 MHz):
- 384 DSP48E slices (60% of 5VSX95T)
- 192 BRAM (79% of 5VSX95T)
- 37,932 LUT6-FF pairs (64% of 5VSX95T)
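The channel-count totals can be reproduced by scaling the two-channel row and adding the miscellaneous logic once. The sketch below does this using the slide's stated two-channel total of 538 LUT6-FF pairs; note that the five line items as listed sum to 518, so one of them may be misprinted, but 538 is the figure the 64- and 128-channel totals are built on. The names are illustrative.

```python
RX_PER_PAIR = {"lut_ff": 538, "dsp48e": 6, "bram": 3}   # slide's "Total for 2 Channels"
RX_OVERHEAD = {"lut_ff": 3500}                          # miscellaneous functions


def beamformer_resources(channels, per_pair, overhead):
    """Scale the two-channel cost by channels / 2 and add the shared overhead once."""
    pairs = channels // 2
    return {name: pairs * cost + overhead.get(name, 0)
            for name, cost in per_pair.items()}


for count in (64, 128):
    print(count, "channels:", beamformer_resources(count, RX_PER_PAIR, RX_OVERHEAD))
```

The same function can be used to explore other channel counts when asking how many channels fit into a given device.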
Potential Architecture
[Figure: 16 ADCs (8 channels each, 12 bits; sample rate labelled @40 MHz and @50 MHz in the diagram) feeding four 5VSX35T FPGAs (FPGA 1 to FPGA 4) over 1-bit serial links, with a 24-bit (50 MHz) data path and an output to demodulation.]
- 32 channels per chip, 128 channels in total
- Estimated total power for the 128-channel digital receiver beamformer:
  - FPGA: ~2.7 W x 4 = 10.8 W
  - ADC and VGA: ~0.8 W x 16 = 12.8 W
  - Total: 23.6 W
- Further investigation needed
Other Aspects to Consider
Ultrasound Rx single-channel beamformer
[Figure: one channel (serial-to-parallel converter, LPF, variable delay, apodization) with a delay calculator and, for each output beam, an apodization calculator and apodization stage.]
- Every channel and output beam also needs a delay calculator and an apodization calculator.
- These can be implemented with tables stored in external memory, or calculated dynamically.
- We would like to work with you on beamforming as we consider Virtex-6: what is your target cost and power per channel?
TX Signal Flow Block Diagram @ 80 Msps
[Figure: a counter (9-bit address) reads each channel's pulse storage (1K x 18). Each channel, from 0 to N-1, applies stored gain values (registers, 10-bit) and drives a DAC and transducer. A control interface loads the storage and gain registers.]
- Each pulse is read out of storage on a programmable count value.
- Each channel's pulses have their own unique storage.
What is the Total Cost?
Structure                                   LUT6-FF pairs   DSP48   BRAM
Transmit waveform storage                   0               0       2
Control                                     50              0       0
Complex gain and summation                  34              1       0
DAC interface                               50              0       0
Total for 2 channels                        134             1       2
Miscellaneous functions (control interface) 500             0       0
64-channel Tx beamformer (200 MHz):
- 32 DSP48E slices
- 64 BRAM
- 4,788 LUT6-FF pairs
128-channel Tx beamformer (200 MHz):
- 64 DSP48E slices (10% of 5VSX95T)
- 128 BRAM (52% of 5VSX95T)
- 9,576 LUT6-FF pairs (16% of 5VSX95T)
Pin count is the main concern.
Ultrasound System Overview
[Figure: the system block diagram from the earlier Ultrasound System Overview slide, repeated with the same data rates: 50 MSPS, 200 MSPS, 50 MSPS, slow (KSPS).]
Key questions:
1. What is the cost and power per demodulation channel?
2. What is the rate change of the demodulator?
3. What are the filter specifications?
Demodulation
[Figure: the 17-bit input is mixed with cos(2.π.f1.t) and sin(2.π.f1.t) from a DDS (DDS Compiler). The I and Q paths each pass through FIR1 (100 tap, decimate by 10) and FIR2 (48 tap), both built with FIR Compiler, giving 17-bit I and Q outputs. Sample rates: 50 MHz at the input, 1 MHz at the output.]
Key questions:
- What are the input and output clock frequencies?
- Is the IP being used?
- How many channels?
- Is the rate change programmable?
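The first half of this chain (mix with the DDS outputs, then filter and decimate by 10) can be sketched in a few lines. The example below is a floating-point illustration only: the 100-tap moving average stands in for FIR1's real coefficients, FIR2 is omitted, and the 5 MHz carrier is an arbitrary choice for the demonstration, none of which comes from the slide.

```python
import math


def demodulate(x, f1_hz, fs_hz, taps, decimation):
    """Mix x with cos/sin at f1, then low-pass filter and decimate each path."""
    phase = [2 * math.pi * f1_hz * n / fs_hz for n in range(len(x))]
    i_mixed = [s * math.cos(p) for s, p in zip(x, phase)]
    q_mixed = [s * math.sin(p) for s, p in zip(x, phase)]

    def fir_decimate(v):
        # Only every `decimation`-th output is computed, once the filter is full.
        return [sum(taps[k] * v[n - k] for k in range(len(taps)))
                for n in range(len(taps) - 1, len(v), decimation)]

    return fir_decimate(i_mixed), fir_decimate(q_mixed)


FS = 50e6                     # input sample rate from the slide
F1 = 5e6                      # assumed carrier frequency for this demonstration
fir1 = [1 / 100] * 100        # placeholder 100-tap low-pass (moving average)
tone = [math.cos(2 * math.pi * F1 * n / FS) for n in range(1000)]
i_out, q_out = demodulate(tone, F1, FS, fir1, decimation=10)
print(round(i_out[-1], 6), round(q_out[-1], 6), len(i_out))
```

For an input tone exactly at the mixing frequency, the I path settles to half the tone amplitude and the Q path to zero, since the double-frequency products average out over the filter length.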
High Level Design Tools
DSP Tools and Flows Accelerate DSP Design
[Figure: tool flow showing ISE, Platform Studio, AccelDSP and System Generator.]
DSP Development Environment
Offers a complete DSP design flow from The MathWorks MATLAB/Simulink model-based design environment
- AccelDSP Synthesis: MATLAB to FPGAs
  - MATLAB algorithm acceleration
- System Generator for DSP: Simulink to FPGAs
  - Simulink algorithm acceleration
  - DSP system design
  - RTL verification
- DSP IP and reference designs
- Hardware platforms
[Figure: AccelDSP (MATLAB to gates) and System Generator (Simulink to gates).]
System Generator for DSP
- Visual data flow paradigm
- Polymorphic block libraries
- Bit- and cycle-true modeling
- Seamlessly integrated with Simulink and MATLAB
  - Test bench and data analysis
- Automatic code generation
  - Synthesizable VHDL
  - IP cores
  - HDL test bench
  - Project and constraint files
Hardware Accelerated Simulation
- System Generator supports automated hardware-in-the-loop (HIL) flows to an extensive set of commercially available boards
- Up to 1000x simulation performance improvement
- Offers an easy way to accelerate algorithms for data effect analysis
- Automatically creates the FPGA bitstream from Simulink
- Transparent use of the FPGA implementation tools
Embedded Processor Design
- DSP software components can be quickly implemented on an embedded processor
- Integration with Platform Studio
- Interface details are abstracted away through a shared memory interface
[Figure: System Generator design exported to Platform Studio as a Platform Studio pcore.]
System Integration Platform
- System Generator provides a common platform for integrating the RTL, algorithm, software, interface and processor components of a DSP system
- Co-simulate in a DSP modeling environment
- Single flow to implementation
[Figure: echo canceller example combining VHDL / Verilog models, C/C++ models, MATLAB models and System Generator models. Blocks: DRAM interface, page buffers, ulaw/alaw conversion, speech and tone detection, echo canceller, NLP, adaptive algorithm and echo estimation, and system control.]
AccelDSP Design Flow
Customer proven to increase productivity up to 20X!
- Typical MATLAB DSP design flow: floating-point algorithm, fixed-point conversion, architecture definition, create / integrate IP blocks, create RTL design, refine architecture, verify RTL, RTL synthesis
- AccelDSP design flow: floating-point algorithm, AccelDSP, RTL synthesis
- AccelDSP replaces the manual steps with an integrated design flow
"We saw a 30% reduction in the design cycle time. This equated to an overall project development reduction of 15 percent, which provides two very significant benefits: we get our products to market faster and our teams are freed up to work on other projects sooner." Dr. Paul Turner, Principal Systems Engineer, Powerwave Technologies
Floating- to Fixed-point Conversion
- Floating-point MATLAB models are automatically converted into fixed point
  - Fixed-point bit widths
  - Binary point conversion
  - Saturation and rounding logic
- The process is user-interactive and controllable
- Fixed-point hardware is automatically generated
- Analysis features help address reduced-precision arithmetic errors
  - Signal probes, fixed-point reports, histograms, overflow and underflow reporting
[Figure: Accel Probe.]
Summary
- Shortened verification time for RTL models of DSP applications
- Accelerate DSP designs developed using MATLAB or Simulink algorithms in FPGA hardware
- Create complete DSP systems using embedded processors or FPGA co-processors