Design and Implementation of Signal Processing Systems: An Introduction

Yu Hen Hu
(c) 1997-2013 by Yu Hen Hu

Outline
- Course objectives, outline, and conduct
- What is signal processing?
- Implementation options and design issues:
  - General purpose (micro)processor (GPP)
  - Multimedia enhanced extension (native signal processing)
  - Programmable digital signal processors (PDSP)
  - Multimedia signal processors (MSP)
  - Application specific integrated circuit (ASIC)
  - Re-configurable signal processors
  - Multi-core architecture
  - System on chip

Course Objectives
- A survey of embedded system platforms and design methodologies for multimedia signal processing and wireless communication
- An introduction to modern multimedia and wireless communication algorithms, with a focus on implementation considerations
- In-depth discussion of the interactions between algorithm formulation and the underlying implementation platform:
  - Formulate the algorithm to match the architecture.
  - Customize the architecture to match the algorithm.

Personal Information Appliances
- Ubiquitous: anytime, anywhere communication
- Impacts on society and humanity
- Social networks, virtual world

Embedded System-on-Chip
[Figure: MMSoC for H.264 baseline encoding, from www.opencores.org]
- Integration of multiple subsystems on a single chip:
  - Processors (cores, DSPs, µP)
  - Memories (flash, RAM)
  - IPs (special purpose licensable functional blocks)
  - Peripherals, I/O controls
- Benefits: fewer parts, smaller size, lower power, higher performance, shorter time to market (TTM) using IPs
- Challenges: higher NRE (non-recurring engineering) design cost, lower yield (bigger chip)
- Suitable for embedded applications

DIOPSIS 740 System-on-Chip
- Dual-core DSP
- Dual-core system integrating an ARM7TDMI ARM Thumb processor core and a magic DSP for audio, communication, and beamforming applications

Multimedia/Communication Applications
- Multimedia applications
  - Audio/video/image codec
  - Graphics, rendering, visualization, virtual environment
  - Content analysis
  - Characteristics:
    - Data intensive rather than control intensive
    - Bit operations
    - High-speed, real-time operations
    - Continuous rather than intermittent operations
- Communication applications
  - Software defined radio
  - Cell phone base station
  - Wireless LAN (WiFi, WiMAX)
  - Ad hoc network (Bluetooth)
  - Characteristics:
    - Bit operations
    - High speed
    - Programmability
    - Portability
    - Low power

Observations
- Embedded, low-power multimedia/communication processing systems are emerging applications that demand a solution based on an SoC platform.
- The high level of integration and complexity of an SoC requires a close match between the algorithm and the architecture.
- Multimedia/communication SoC design issues:
  - Algorithm design
  - Hardware/software co-design
  - Communication and interface

Course Objectives
- Understand multimedia and wireless communication algorithms, especially those in state-of-the-art standards such as H.264 and LTE-A.
- Become familiar with modern algorithm-level design and implementation alternatives: vectoring, unrolling, retiming, parallelism exploitation, numerical accuracy, recurrent equations.
- Understand how different platforms (GPU, SoC) affect the way an algorithm is implemented.
- Gain exposure to system-level design methodology, especially the use of SystemC.

Course Outline
- Signal processing computing algorithms
  - Image and video coding standards (JPEG2000, MPEG, H.264 AVC): DCT and DWT, motion estimation, entropy coding
  - Communication standards (802.11g, Bluetooth, ZigBee, WiMAX): OFDM, convolutional coding, RS coding, synchronization, channel estimation, Viterbi (maximum likelihood) decoding
- Algorithm representations and transformations: retiming, unfolding, folding
- Systolic arrays and design methodologies
- Native signal processing and multimedia extension
- Programmable DSPs, Very Long Instruction Word (VLIW) architecture
- Re-configurable, SoC, and multi-core architectures
- Signal processing arithmetic: CORDIC and distributed arithmetic

Course Conduct
- The instructor will give an introduction to each topic.
- PowerPoint notes will be published on the web.
- Three to four homework assignments
- A take-home final examination
- Final project presentation during the last week of the semester

Signal Processing: An Overview

What is Signal Processing?
- Signal processing addresses the theory and application of filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals by digital or analog devices or techniques.
- The term "signal" includes audio, video, speech, image, communication, geophysical, sonar, radar, medical, musical, and other signals.
- Source: http://www.signalprocessingsociety.org/

Signal and Signal Processing
- Signal
  - A function of time or spatial coordinates
  - Scalar or vector dimension, real or complex value, batch or sequential (stream) processing
  - Often contains noise due to acquisition, processing (and quantization), transmission, etc.
- Signal processing (computational perspective)
  - Numerical computations (most frequent)
  - Symbolic processing: often for coding purposes
  - High throughput for real-time applications
  - Repetitive, predictable operations (inherent parallelism)
  - Can tolerate error!
[Figure: GPS L5 signal plots of spectral flux density versus frequency and amplitude versus time for the Q channel, from http://www.dlr.de/en/desktopdefault.aspx/tabid-5105/8598_read-16927/]

Signal Processing Applications
- Communications: modulation/demodulation (modem); channel estimation, equalization; channel coding; source coding (compression)
- Imaging: digital camera, scanner; HDTV, DVD
- Audio: 3D sound, surround sound
- Speech: coding, recognition, synthesis, translation
- Virtual reality, animation
- Control: hard drive, motor

Signal Processing Algorithms
- Mathematical equations
  - Convolution, FIR filtering:
    $$ y(n) = \sum_{j=0}^{J-1} h(j)\, x(n-j) $$
  - Discrete Fourier transform (DFT) and its inverse:
    $$ X(k) = \sum_{n=0}^{N-1} x(n) \exp\left[-j\,\frac{2\pi nk}{N}\right] $$
    $$ x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) \exp\left[j\,\frac{2\pi nk}{N}\right] $$
  - Often can be expressed in matrix-vector form
    - Concise representation
    - Inherent parallelism needs to be exploited to expedite processing
- Symbolic processing (coding)
  - Huffman encoding: symbol A -> 10, symbol B -> 0010, etc. (variable length binary bit stream)
  - Bit level manipulation, Boolean logic operation
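The FIR sum and the matrix-vector view of the DFT can be sketched in a few lines of plain Python. This is only an illustration: the function names (`fir_filter`, `dft_matrix`, `mat_vec`) and the sample data are made up for this example, and the input is assumed to be zero for n < 0.

```python
import cmath

def fir_filter(h, x):
    """y(n) = sum_{j=0}^{J-1} h(j) x(n-j), assuming x(n) = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for j in range(len(h)):
            if n - j >= 0:
                acc += h[j] * x[n - j]   # one multiply-accumulate per tap
        y.append(acc)
    return y

def dft_matrix(N):
    """N x N matrix W with W[k][n] = exp(-j 2 pi n k / N)."""
    return [[cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N)]
            for k in range(N)]

def mat_vec(W, x):
    """Matrix-vector product; every output row is independent of the others."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

h = [0.5, 0.5]                    # hypothetical two-tap moving average
x = [1.0, 2.0, 3.0, 4.0]
y = fir_filter(h, x)
X = mat_vec(dft_matrix(len(x)), x)
```

Each row of the matrix-vector product can be computed independently, which is one place where the inherent parallelism mentioned on the slide shows up.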

Signal Processing Algorithms: What an Implementer Should Know
- The purpose of applying a signal processing algorithm to a given set of data, and the associated performance goal
- There are often different ways (alternatives) to achieve the same signal processing goal
- 100% accuracy is often not required for signal processing
- This leaves plenty of room for design space exploration!

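Graphic Representation
- Block diagram
  - D or z^{-1}: delay by 1 time unit, using a register
  - Arrow: direction of signal
  - +, X: operations (addition, multiplication)
- Signal flow graph
  - A branch labeled z^{-1} delays the signal; a branch labeled a scales x(n) to a x(n); branches meeting at a node are added
- Example: FIR filter
[Figure: a 3-tap FIR filter with coefficients h(0), h(1), h(2), drawn as a block diagram and as a signal flow graph, with input x(n), delayed samples x(n-1) and x(n-2), and output y(n)]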

FIR, IIR Digital Filter
- Let {h(n)}: impulse response, {x(n)}: input, {y(n)}: output
- Finite impulse response (FIR) filter:
  $$ y(n) = \sum_{j=0}^{J-1} h(j)\, x(n-j) $$
  - Computation is the same as convolution.
  - Impulse input: $\delta(n) = 1$ for $n = 0$, and $\delta(n) = 0$ for $n \neq 0$.
  - If $x(n) = \delta(n)$, then $y(n) = h(n)$ is the impulse response, which has finite extent.
- Infinite impulse response (IIR) filter:
  $$ y(n) = \sum_{i=1}^{P} a(i)\, y(n-i) + \sum_{k=0}^{Q} b(k)\, x(n-k) $$
  - The length of {y(n)} may be infinite!
  - The recursive formula affects the choice of computation method.
  - Stability concerns: the magnitude of y(n) may grow without bound even if all x(n) are finite! Stability depends on the coefficient values and on quantization error.
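A small sketch of the IIR recursion makes the stability concern concrete. The helper `iir_filter` and the two first-order coefficient sets are hypothetical, and the sign convention (feedback terms added) is an assumption; signals are taken to be zero for n < 0.

```python
def iir_filter(a, b, x):
    """y(n) = sum_{i=1}^{P} a(i) y(n-i) + sum_{k=0}^{Q} b(k) x(n-k).

    a holds a(1)..a(P), b holds b(0)..b(Q); signals are zero for n < 0.
    """
    y = []
    for n in range(len(x)):
        # Feed-forward part: depends only on the input samples.
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        # Feedback part: depends on previously computed outputs.
        acc += sum(a[i - 1] * y[n - i] for i in range(1, len(a) + 1) if n - i >= 0)
        y.append(acc)
    return y

impulse = [1.0] + [0.0] * 9
stable = iir_filter([0.5], [1.0], impulse)     # y(n) = 0.5 y(n-1) + x(n)
unstable = iir_filter([2.0], [1.0], impulse)   # y(n) = 2 y(n-1) + x(n)
```

With a finite impulse as input, the first response decays as 0.5^n while the second grows as 2^n, even though every input sample is bounded.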

Digital Filter Implementation Issues
- Specifications: What is the tolerable range of deviation from the frequency domain specification?
- Accuracy: How accurate should the output be? Error accumulates with iterations, cascaded stages, and overflows.
- Speed
  - Latency
  - Throughput
- Robustness
  - To soft failure
  - To missing or erroneous input data
- Design space parameters
  - Structures and coefficients: FIR or IIR? Filter structures, coefficient quantization
  - Register length
  - Arithmetic algorithm
  - Overflow handling method
  - Quantization method
  - Hardware/software partitions
  - Batch vs. sequential processing

Discrete Fourier Transform
- Discrete Fourier transform and its inverse:
  $$ X(k) = \sum_{n=0}^{N-1} x(n) \exp\left[-j\,\frac{2\pi nk}{N}\right] $$
  $$ x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) \exp\left[j\,\frac{2\pi nk}{N}\right] $$
  - Computing the N frequencies $\{X(k);\ 0 \le k \le N-1\}$ requires $N^2$ complex multiplications.
- Fast Fourier transform (FFT)
  - Reduces the computation to $O(N \log_2 N)$ complex multiplications
  - Makes it practical to process large amounts of digital data
  - Many computations can be sped up using the FFT
  - Dawn of modern digital signal processing
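The difference between the direct sum and a fast algorithm can be sketched as follows. The radix-2 decimation-in-time recursion shown here is one common FFT variant, chosen for illustration and not necessarily the one the course uses; the function names and test signal are made up, and the input length is assumed to be a power of 2.

```python
import cmath

def dft(x):
    """Direct evaluation of X(k): N terms for each of N outputs."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return list(x)
    even = fft(x[0::2])          # DFT of even-indexed samples
    odd = fft(x[1::2])           # DFT of odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]   # twiddle multiplication
        out[k] = even[k] + t
        out[k + N // 2] = even[k] - t
    return out

x = [complex(n % 3, 0) for n in range(8)]
N = len(x)
max_err = max(abs(a - b) for a, b in zip(dft(x), fft(x)))

direct_mults = N * N                              # N^2 for the direct sum
fft_mults = (N // 2) * (N.bit_length() - 1)       # (N/2) log2 N twiddle products
```

For N = 8 the counts are 64 versus 12; the gap widens quickly as N grows.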

Discrete Wavelet Transform
- $H_0(z)$, $H_1(z)$: low-pass and high-pass FIR digital filters
- The number of output samples equals the number of input samples
- ↓2: down-sampling by a factor of 2
[Figure: three-level analysis filter bank. The input x(n) passes through a cascade of three H_0(z), ↓2 stages; an H_1(z), ↓2 branch taps off at each level; the outputs are y_1(n), y_2(n), y_3(n), y_4(n)]
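The filter bank can be sketched with the simplest possible filter pair. The Haar filters used here are an assumption made for illustration (the slide does not say which H_0 and H_1 are used), and the names `analysis_stage` and `dwt` are hypothetical.

```python
import math

def analysis_stage(x):
    """One two-channel stage: Haar low-pass and high-pass filters,
    each followed by down-sampling by 2. len(x) must be even."""
    s = 1.0 / math.sqrt(2.0)
    low = [s * (x[2 * i] + x[2 * i + 1]) for i in range(len(x) // 2)]
    high = [s * (x[2 * i] - x[2 * i + 1]) for i in range(len(x) // 2)]
    return low, high

def dwt(x, levels):
    """Repeatedly split the low-pass branch, keeping each high-pass output."""
    bands = []
    low = list(x)
    for _ in range(levels):
        low, high = analysis_stage(low)
        bands.append(high)
    bands.append(low)            # final low-pass band
    return bands

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
bands = dwt(x, 3)
total_samples = sum(len(b) for b in bands)
energy_in = sum(v * v for v in x)
energy_out = sum(v * v for b in bands for v in b)
```

Eight input samples produce bands of length 4, 2, 1, and 1, so the sample count is preserved; with these orthonormal filters the signal energy is preserved as well.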

Constraints and Performance Measures
- BIBO stability
  - If $|x(n)| < \infty$, it is required that $|y(n)| < \infty$.
  - Poles should be inside the unit circle: $|p_j| < 1$ (for causal systems, where $h(n) = 0$ for $n < 0$).
- Dynamic range overflow
  - Intermediate or final results should not cause overflow.
- Quantization error
  - Should be bounded.
  - Should not cause instability.
- Speed: throughput rate and number of operations per data sample
- Hardware: memory I/O, address calculation, register footprint, special hardware, etc.
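The pole condition can be checked directly for a second-order recursion. This sketch assumes the hypothetical form y(n) = a1 y(n-1) + a2 y(n-2) + b0 x(n), whose poles are the roots of z^2 - a1 z - a2 = 0; the function names and coefficient values are illustrative only.

```python
import cmath

def poles_second_order(a1, a2):
    """Roots of z^2 - a1*z - a2 = 0 via the quadratic formula."""
    disc = cmath.sqrt(a1 * a1 + 4.0 * a2)
    return (a1 + disc) / 2.0, (a1 - disc) / 2.0

def is_bibo_stable(a1, a2):
    """A causal filter is BIBO stable when every pole lies inside the unit circle."""
    return all(abs(p) < 1.0 for p in poles_second_order(a1, a2))

inside = is_bibo_stable(1.0, -0.5)    # poles at 0.5 +/- 0.5j, magnitude about 0.707
outside = is_bibo_stable(2.5, -1.0)   # poles at 2 and 0.5
```

The same check can be repeated after coefficient quantization, since rounding a1 and a2 moves the poles and can push a marginal design outside the unit circle.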

Signal Processing Platforms

Evolution of the Microprocessor
- A microprocessor implements a central processing unit on a single chip.
- Performance improved from 1 MFLOPS (1983) to 1 GFLOPS or above.
- Word length (number of bits for registers, data bus, address space, etc.) increased from 4 bits to 64 bits today.
- Clock frequency increased from 100 kHz to 1 GHz.
- Number of transistors increased from 1K to 50M.
- Power consumption increased much more slowly, thanks to lower supply voltage: from 5 V down to 1.5 V.

Native Signal Processing
- Use the GPP to perform signal processing tasks with no additional hardware.
  - Examples: soft modem, soft DVD player, soft MPEG player
  - Reduces hardware cost!
  - May not be feasible for extremely high throughput tasks.
  - Interferes with other tasks, since the GPP is tied up with NSP tasks.
- MMX (multimedia extension instructions): special instructions for accelerating multimedia tasks
  - May share the same data path with other instructions, or work on special hardware modules.
  - Make use of sub-word parallelism to improve numerical calculation speed.
  - Implement DSP-specific arithmetic operations, e.g. saturation arithmetic.
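Saturation arithmetic and sub-word parallelism can be illustrated with a plain-Python model. This is only a behavioral sketch: a real multimedia extension performs the lane-wise operation in one instruction, whereas the list comprehension below merely mimics the result. The 16-bit lane width and all names are assumptions.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def wrap_add16(a, b):
    """Ordinary two's-complement 16-bit addition: overflow wraps around."""
    s = (a + b) & 0xFFFF
    return s - 0x10000 if s & 0x8000 else s

def sat_add16(a, b):
    """Saturating 16-bit addition: overflow clamps to the nearest limit."""
    return max(INT16_MIN, min(INT16_MAX, a + b))

def packed_sat_add16(lanes_a, lanes_b):
    """Models one packed instruction operating on four 16-bit sub-words."""
    return [sat_add16(a, b) for a, b in zip(lanes_a, lanes_b)]

wrapped = wrap_add16(30000, 10000)
clamped = sat_add16(30000, 10000)
packed = packed_sat_add16([30000, -30000, 100, 0], [10000, -10000, 23, 0])
```

Wrapping turns a large positive sum into a large negative value, which is audible or visible as a glitch; clamping keeps the error small, which is why DSP-oriented instruction sets provide it.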

ASIC: Application Specific ICs
- Custom or semi-custom IC chips or chip sets developed for specific functions
- Suitable for high-volume, low-cost production
- Examples: MPEG codec, 3D graphics chip, etc.
- ASICs became popular due to the availability of IC foundry services. Fabless design houses turn innovative designs into profitable chip sets using CAD tools.
- Design automation is a key enabling technology for a fast design cycle and a shorter time to market.

Programmable Digital Signal Processors (PDSPs)
- Microprocessors designed for signal processing applications
- Special hardware support for:
  - Multiply-and-accumulate (MAC) operations
  - Saturation arithmetic operations
  - Zero-overhead loop operations
  - Dedicated data I/O ports
  - Complex address calculation and memory access
  - Real-time clock and other embedded processing support
- PDSPs were developed to fill a market segment between GPP and ASIC:
  - GPP: flexible, but slow
  - ASIC: fast, but inflexible
- As VLSI technology improved, the role of the PDSP changed over time.
  - Cost: design, sales, maintenance/upgrade
  - Performance

Re-configurable Computing Using FPGA
- An FPGA (field programmable gate array) is a derivative of the PLD (programmable logic device).
- FPGAs are hardware that can be configured to behave differently under different configurations.
- Slower than an ASIC, but faster than a PDSP.
- Once configured, an FPGA behaves like an ASIC module.
- Uses of FPGAs:
  - Rapid prototyping: run at a fraction of ASIC speed without the fabrication delay.
  - Hardware accelerator: use the same hardware to realize different function modules, saving hardware.
  - Low-quantity system deployment

SoC (System-on-Chip)
- With the continued scaling of modern IC devices, it is now possible to incorporate the following into the same chip to form a comprehensive system:
  - Microprocessor cores + ASIC function blocks
  - Analog + digital components
  - Computation + communication functions
  - I/O, memory + processor
- Hence the notion of a system-on-chip (SoC).
- An SoC uses intellectual properties (IPs), which are pre-designed modules. Designing an SoC thus becomes a task of system integration.
- Challenges in SoC design:
  - Interfaces among IPs from different vendors
  - Verification of function
  - Physical design challenges

Multi-Core Processors
[Figure: IBM POWER4 chip with 2 cores]
- A multi-core processor (or chip-level multiprocessor, CMP) combines two or more CPU cores on a single silicon chip composed of a single integrated circuit (IC), called a die.

Implementation of Signal Processing Systems

Implementation of DSP Systems
- Platforms:
  - Native signal processing (NSP) with general purpose processors (GPP)
    - Multimedia extension (MMX) instructions
  - Programmable digital signal processors (PDSP)
    - Media processors
  - Application-specific integrated circuits (ASIC)
  - Re-configurable computing
  - System on chip
  - Multi-core
- Requirements:
  - Real time: processing must be done before a pre-specified deadline.
  - Streamed numerical data: sequential processing, fast arithmetic processing
  - High throughput: fast data input/output, fast manipulation of data

How Fast is Enough for DSP?
- It depends!
- Real-time requirements:
  - Example: the data capture speed must match the sampling rate. Otherwise, data will be lost.
  - Example: in verbal conversation, the response delay cannot exceed 50 ms end-to-end.
  - Processing must be done by a specific deadline: a constraint on throughput.
- Different signals require different throughput rates. Throughput is tied to the sampling rate:
  - CD music: 44.1 kHz
  - Speech: 8-22 kHz
  - Video (depends on frame rate, frame size, etc.): ranges from hundreds of kHz to MHz
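A back-of-the-envelope calculation shows how a sampling rate turns into a processing budget. The 64-tap filter length and the processor rates are made-up numbers for illustration; only the 44.1 kHz CD sampling rate comes from the slide.

```python
def required_macs_per_second(taps, fs_hz):
    """A J-tap FIR filter needs J multiply-accumulates per input sample."""
    return taps * fs_hz

def sample_period_us(fs_hz):
    """Time available to finish processing one sample, in microseconds."""
    return 1e6 / fs_hz

def meets_deadline(taps, fs_hz, macs_per_second_available):
    """True when the platform can sustain the required MAC rate."""
    return required_macs_per_second(taps, fs_hz) <= macs_per_second_available

cd_rate = 44100                                    # CD music sampling rate in Hz
needed = required_macs_per_second(64, cd_rate)     # hypothetical 64-tap filter
budget_us = sample_period_us(cd_rate)
ok_fast = meets_deadline(64, cd_rate, 10_000_000)  # hypothetical 10 M MAC/s platform
ok_slow = meets_deadline(64, cd_rate, 1_000_000)   # hypothetical 1 M MAC/s platform
```

Under these assumptions the filter needs about 2.8 million MACs per second, and all 64 of them must complete within roughly 22.7 microseconds per sample.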

Design Issues
- Given a DSP application, which implementation option should be chosen?
- For a particular implementation option, how can an optimal design be achieved? Optimal in terms of what criteria?
- Software design: NSP/MMX, PDSP/MSP
  - Algorithms are implemented as programs.
  - Often still requires manual programming at the assembly level.
- Hardware design: ASIC, FPGA
  - Algorithms are implemented directly in hardware modules.
- Software/hardware co-design: system-level design methodology

Design Process Model
- Design is the process that links an algorithm to its implementation.
- Algorithm
  - Operations
  - Dependencies between operations determine a partial ordering of execution.
  - Can be specified as a dependence graph.
- Implementation
  - Assignment: each operation can be realized with
    - one or more instructions (software), or
    - one or more function modules (hardware).
  - Scheduling: dependence relations and resource constraints lead to a schedule.
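A dependence graph and the partial order it implies can be sketched for a 3-tap FIR filter. The operation names (`m0`, `a1`, and so on), the unit-latency assumption, and the as-soon-as-possible (ASAP) rule are illustrative choices, not the course's prescribed method; resource limits are ignored here.

```python
def asap_schedule(deps):
    """Earliest time step for each operation, assuming every operation takes
    one step and there is no limit on hardware resources.
    deps maps each operation to the list of operations it depends on."""
    step = {}

    def visit(op):
        if op not in step:
            # An operation can start one step after its latest predecessor.
            step[op] = 1 + max((visit(p) for p in deps[op]), default=-1)
        return step[op]

    for op in deps:
        visit(op)
    return step

# y(n) = h(0)x(n) + h(1)x(n-1) + h(2)x(n-2)
fir3 = {
    "m0": [],            # h(0) * x(n)
    "m1": [],            # h(1) * x(n-1)
    "m2": [],            # h(2) * x(n-2)
    "a1": ["m0", "m1"],  # m0 + m1
    "a2": ["a1", "m2"],  # a1 + m2 = y(n)
}
schedule = asap_schedule(fir3)
```

The three multiplications have no mutual dependence and can share step 0, while the two additions must follow in steps 1 and 2.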

Observations
- Eventually, an implementation is realized with hardware.
- However, by using the same hardware to realize different operations at different times (scheduling), we have a software program!
- Bottom line: hardware/software co-design.
  - There is a continuum between hardware and software implementation.
  - A design must explore both simultaneously to achieve the best performance/cost trade-off.

A Theme
- Matching hardware to algorithm
  - The hardware architecture must match the characteristics of the algorithm.
  - Example: an ASIC architecture is designed to implement a specific algorithm, and hence can achieve superior performance.
- Formulating the algorithm to match hardware
  - An algorithm must be formulated so that it can best exploit the potential of the architecture.
  - Example: GPP and PDSP architectures are fixed. One must formulate the algorithm properly to achieve the best performance, e.g. to minimize the number of operations.

Algorithm Reformulation
- Matching the algorithm to architectural features
  - Similar to optimizing assembly code
  - Exploiting equivalence between different operations
- Reformulation methods
  - Equivalent ordering of execution: (a+b)+c = a+(b+c)
  - Equivalent operation with a particular representation: a*2 is the same as left-shifting a by 1 bit in binary representation
  - Algorithmic level equivalence: different filter structures implementing the same specification!
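The three kinds of equivalence can be checked on small integer examples. The direct-form and transposed-form FIR structures are used here as one instance of "different filter structures implementing the same specification"; that choice, the function names, and the data are assumptions for illustration.

```python
def fir_direct(h, x):
    """Direct form: delay the input, then multiply and sum."""
    return [sum(h[j] * x[n - j] for j in range(len(h)) if n - j >= 0)
            for n in range(len(x))]

def fir_transposed(h, x):
    """Transposed form: multiply the current input by every coefficient,
    then pass partial sums through a chain of registers."""
    state = [0] * len(h)         # state[j] holds the partial sum from the last step
    y = []
    for xn in x:
        state = [h[j] * xn + (state[j + 1] if j + 1 < len(h) else 0)
                 for j in range(len(h))]
        y.append(state[0])
    return y

h = [1, 2, 3]
x = [1, 0, 2, -1]
y_direct = fir_direct(h, x)
y_transposed = fir_transposed(h, x)

a, b, c = 7, 11, 13
reorder_ok = (a + b) + c == a + (b + c)   # holds exactly for integers
shift_ok = a * 2 == a << 1                # multiply by 2 as a left shift
```

Both structures produce the same output, but they differ in where the registers sit, so they map differently onto hardware. Note that the reordering identity is exact for integers; with floating-point or saturating arithmetic, a different order can give a slightly different result.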

Algorithm Reformulation (2)
- Exploiting parallelism
  - Regular iterative algorithms and loop reformulation: well studied in parallel compiler technology
  - Signal flow/data flow representation: suitable for specification of pipelined parallelism

Mapping Algorithm to Architecture
- Scheduling and assignment problem
  - Resources: hardware modules and time slots
  - Demands: operations (algorithm) and throughput
- Constrained optimization problem
  - Minimize resources (objective function) to meet demands (constraints)
- For regular iterative algorithms and regular processor arrays: algebraic mapping
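The "minimize resources to meet demands" formulation can be sketched with a toy search. This uses a greedy list scheduler, a simple heuristic picked for illustration, on a 3-tap FIR dependence graph; it is not the algebraic mapping method mentioned on the slide. All names, the unit latencies, the single adder, and the 3-step deadline are assumptions.

```python
def list_schedule(deps, kind, capacity):
    """Greedy list scheduling with unit-latency operations.
    kind[op] names the resource type an operation needs and
    capacity[type] is how many units of that type exist.
    Assumes an acyclic graph and at least one unit of every type."""
    done = {}
    t = 0
    while len(done) < len(deps):
        used = {k: 0 for k in capacity}
        for op in deps:
            if op in done:
                continue
            # Ready only if every predecessor finished in an earlier step.
            ready = all(p in done and done[p] < t for p in deps[op])
            if ready and used[kind[op]] < capacity[kind[op]]:
                done[op] = t
                used[kind[op]] += 1
        t += 1
    return done

def min_multipliers(deps, kind, deadline, max_units=3):
    """Smallest multiplier count whose schedule fits within the deadline."""
    for m in range(1, max_units + 1):
        sched = list_schedule(deps, kind, {"mul": m, "add": 1})
        if max(sched.values()) + 1 <= deadline:
            return m
    return None

deps = {"m0": [], "m1": [], "m2": [], "a1": ["m0", "m1"], "a2": ["a1", "m2"]}
kind = {"m0": "mul", "m1": "mul", "m2": "mul", "a1": "add", "a2": "add"}

one_mul = list_schedule(deps, kind, {"mul": 1, "add": 1})
steps_one_mul = max(one_mul.values()) + 1
best = min_multipliers(deps, kind, deadline=3)
```

With one multiplier the schedule takes 4 steps; two multipliers already reach the 3-step bound set by the dependence chain, so a third multiplier would add cost without improving throughput.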

Mapping Algorithms to Architectures
- Irregular multi-processor architecture:
  - Linear programming
  - Heuristic methods
  - Algorithm reformulation for recursions
- Instruction level parallelism
  - MMX instruction programming
  - Related to optimizing compilation