Lecture 3. FIR Design and Decision Feedback Equalization


Mark Horowitz
Computer Systems Laboratory, Stanford University
horowitz@stanford.edu

Copyright 2007 by Mark Horowitz, with material from Stefanos Sidiropoulos and Bora Nikolic

Readings

Readings (for next lecture on adders):
- Chandrakasan, Chapter 10.1-10.2.10
- Harris, Taxonomy of adders (either the paper on the web or WH 10.2 to 10.2.2.9)

Overview:
- Finish up some timing issues from high-speed links.
- Your project will be the design of a decision feedback equalizer (DFE), but most of the hardware will be the same as a normal FIR filter. So the lecture will start by talking about FIR filter design, and then will go into the added issues with building a DFE.
- WARNING: I am not an expert in this area, so there might be better ideas out there (and some bugs in these notes).
- The FIR notes are from Bora Nikolic at UCB.

I/O Clocking Issues

- Remember the clocking issues:
  - Long path constraint (setup time)
  - Short path constraint (hold time)
- Need to worry about them for I/O as well
- For I/O, need to worry about a number of delays:
  - Clock skew between chips
  - Data delay between chips
    - Can be larger than a clock cycle (speed of light)
  - Clock skew between external clock and internal clock
    - This can be very large if not compensated
    - It is essentially the insertion delay of the clock tree

System Clocking: Simple Synchronous Systems

[Figure: system clock CK_X distributed through delays d1 and d2 to chips with internal clocks CK_C1 and CK_C2, data inputs D_I, and on-chip logic; timing diagram of CK_C1 and CK_C2.]

- Long bit times compared to on-chip delays
- Rely on buffer delays to achieve adequate timing margin

PLLs: Creating Zero Delay Buffers

[Figure: PLL/DLL between external clock CK_X and on-chip clock CK_C, with data input D_I and on-chip logic; timing diagram of CK_X, D_I, and CK_C.]

- On-chip clock might be a multiple of the system clock: synthesize the on-chip clock frequency
- On-chip buffer delays do not match: cancel the clock buffer delay

Used to Argue About PLLs vs DLLs

PLL [Figure: loop with ref clk, PD, filter, VCO, and divide-by-N, producing clk]
- Second/third order loop: stability is an issue
- Frequency synthesis easy
- Ref. clk jitter gets filtered
- Phase error accumulates

DLL [Figure: loop with ref clk, PD, filter, and VCDL, producing clk]
- First order loop: stability guaranteed
- Frequency synthesis problematic
- Ref. clk jitter propagates
- Phase error does not accumulate

After Many Years of Research

- And many papers and products
- One can mess up either a DLL or a PLL
  - Each has its own strengths and weaknesses
- If designed correctly, either will work well
  - Jitter will be dominated by other sources
- Many good designs have been published
  - It is now a building block that is often reused
  - We all have our favorites; mine is the dual-loop design
- And yes, people use ring oscillators
  - It is still an open question how much LC helps (in a system)

Clocking Structures

- Synchronous: same frequency and phase
  - Conventional buses
- Mesochronous: same frequency, unknown phase
  - Fast memories
  - Internal system interfaces
  - MAC/packet interfaces
- Plesiochronous: almost the same frequency
  - Mostly everything else today

[Figure: one diagram per case: a shared clock F_0 with the same delay t to both ends; a shared clock F_0 with delays t_A and t_B; separate clocks F_1 and F_2.]

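Source Synchronous Systems

[Figure: source clock CK_SRC sent with the data; a PLL/DLL uses CK_SRC as its ref and produces CK_RCV for the data receiver and logic; timing diagram of CK_SRC, data D_0 to D_3, and CK_RCV.]

- Position the on-chip sampling clock at the optimal point, i.e. maximize timing margin

Serial Link Circuit

[Figure: D_IN drives the receiver and a CDR that produces CK_R for the receiver and logic; timing diagram of D_IN (D_0, D_1) and CK_R.]

- Recover the incoming data fundamental frequency
- Position the sampling clock at the optimal point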

Finite Impulse Response Filters

- In DSP, filters are done in the discrete time domain
  - Instead of x(t), x_n
- Filter is formed by convolution of the input with the filter h(t)
- Output at every point is the sum:

$y[n] = a_0 x[n] + a_1 x[n-1] + a_2 x[n-2] + \dots + a_{N-1} x[n-N+1]$

- This is generally called an FIR filter
  - Finite impulse response filter (output depends only on input)
- IIR filters have output depend on prior output
  - Infinite impulse response (like RC circuits)

Transversal Filter

[Figure: transversal filter structure computing the same sum, $y[n] = a_0 x[n] + a_1 x[n-1] + a_2 x[n-2] + \dots + a_{N-1} x[n-N+1]$.]
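The sum above can be written out as a short behavioral model. This is only a sketch: the function name `fir_direct`, the use of Python lists, and the assumption that x[n] = 0 for n < 0 are illustrative choices, not part of the lecture.

```python
def fir_direct(coeffs, samples):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * x[n-k].

    Assumes x[n] = 0 for n < 0 (an illustrative choice).
    """
    n_taps = len(coeffs)
    delay_line = [0] * n_taps          # delay_line[k] holds x[n-k]
    outputs = []
    for x in samples:
        # Shift the tapped delay line and insert the newest sample at tap 0.
        delay_line = [x] + delay_line[:-1]
        # One multiply per tap, then add all the products together.
        outputs.append(sum(a * d for a, d in zip(coeffs, delay_line)))
    return outputs


# A unit impulse at the input reads the coefficients back out, then zeros.
print(fir_direct([3, -1, 2], [1, 0, 0, 0, 0]))   # prints [3, -1, 2, 0, 0]
```

Because the output depends only on a finite window of inputs, the response to the impulse dies out after N samples, which is where the name "finite impulse response" comes from.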

Critical Path

Digital FIR: $T = T_{mult} + (N-1)\,T_{add}$

One Point To Keep In Mind

- We are working with small signal values
  - For binary (2-PAM), x is in {0,1}
  - For 4-PAM, x is in {0,1,2,3}
- So multiplication is generally not an issue
  - For 2-PAM it is trivial
  - For 4-PAM it is one shift and add
- The problem is the adds
  - While x is one or two bits, the a are larger
  - Generally larger than the input precision
  - Since you need to add many of them up and keep the quantization errors small
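To see why the multiplies are cheap, the two cases can be sketched with integer coefficients. The function names `mul_2pam` and `mul_4pam` are hypothetical, and modeling the coefficient as a Python integer (rather than a fixed-width word) is an assumption made for brevity.

```python
def mul_2pam(a, x):
    """x in {0, 1}: the 'multiplier' is just a select (an AND gate per bit)."""
    return a if x else 0


def mul_4pam(a, x):
    """x in {0, 1, 2, 3}: at most one shift and one add."""
    low = a if (x & 1) else 0            # bit 0 of x selects a
    high = (a << 1) if (x & 2) else 0    # bit 1 of x selects 2*a (a shifted left by one)
    return high + low


print(mul_4pam(5, 3))   # prints 15
```

The width of the result is set by the coefficient, not by the one- or two-bit input, so the cost moves into the adders that combine these products.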

Pipelining

- Pipelining can be used to increase throughput
  - True for digital and mixed-signal implementations
- Pipelining: adding the same number of delay elements in each forward cutset (in the data-flow graph), from the input to the output
  - Cutset: set of edges that, if removed, makes the graph disjoint
  - Forward cutset: cutset from input to output over all edges
- Plus: increases frequency
- Minus: increases latency and register overhead (power, area)

Pipelining 3-tap FIR

[Figure: pipelined 3-tap FIR.]
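A cycle-by-cycle sketch can make the throughput/latency trade concrete. It assumes one particular forward cutset, chosen here only for illustration: one register after every multiplier, between the multipliers and the adders. The names `fir_reference` and `fir_pipelined` are hypothetical, and this is not necessarily the cut used in the lecture's figure.

```python
def fir_reference(coeffs, samples):
    """Unpipelined direct-form FIR, used only as a reference."""
    line = [0] * len(coeffs)
    out = []
    for x in samples:
        line = [x] + line[:-1]
        out.append(sum(a * d for a, d in zip(coeffs, line)))
    return out


def fir_pipelined(coeffs, samples):
    """Direct-form FIR with one pipeline register after every multiplier."""
    n_taps = len(coeffs)
    delay_line = [0] * n_taps      # x[n], x[n-1], ...
    product_regs = [0] * n_taps    # pipeline registers on the cutset
    outputs = []
    for x in samples:
        # Logic after the registers: add the products captured last cycle.
        outputs.append(sum(product_regs))
        # Clock edge: shift the delay line and capture the new products.
        delay_line = [x] + delay_line[:-1]
        product_regs = [a * d for a, d in zip(coeffs, delay_line)]
    return outputs


x = [1, 0, 2, 3, 0, 1]
print(fir_reference([3, -1, 2], x))
print(fir_pipelined([3, -1, 2], x))   # same values, one cycle later
```

The pipelined output is the reference output delayed by one cycle: the function is unchanged, the multiplier and the adders now sit in separate clock cycles, and the price is one cycle of latency plus N extra registers.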

Pipelined Direct FIR

Critical path: $T = T_{mult} + T_{add}$

Multi-Operand Addition

- Adders form a tree

$T = T_{mult} + (\log_2 N)\,T_{add}$
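The difference between a chain of N-1 adders and a tree can be checked with a small model that sums the products level by level and counts the levels. This is a sketch using plain two-input adders; the name `adder_tree` is hypothetical, and for N that is not a power of two the level count is the ceiling of log2 N.

```python
def adder_tree(values):
    """Sum a non-empty sequence pairwise, level by level.

    Returns (total, number_of_adder_levels).
    """
    level = list(values)
    depth = 0
    while len(level) > 1:
        # Add neighbours in pairs; each pass is one level of adders.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])   # an odd element passes straight through
        level = nxt
        depth += 1
    return level[0], depth


total, levels = adder_tree([1, 2, 3, 4, 5, 6, 7, 8])
print(total, levels)   # prints 36 3
```

For eight products the tree is 3 adders deep, where the chain in the direct form would be 7 adders deep.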

Multi-Operand Addition

- Using 3:2 or 4:2 compression
- This is the same as a multiplier tree (in two lectures)
- Optional pipelining, 1-2 stages

Transposing FIR

- Transposition:
  - Reversing the direction of all the edges in a signal-flow graph
  - Interchanging the input and output ports
- Functionality unchanged

Transposed FIR

- Represent as a signal-flow graph
- Critical path shortened
- Input loading increased

$T = T_{mult} + T_{add}$
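A behavioral sketch of the transposed form shows both properties: the input is broadcast to every multiplier (hence the higher input loading), and each register-to-register path contains one multiplier and one adder. The name `fir_transposed` and the zero initial state are assumptions made for illustration.

```python
def fir_transposed(coeffs, samples):
    """Transposed-form FIR: registers sit between the adders."""
    n_taps = len(coeffs)
    state = [0] * (n_taps - 1)     # state[k] feeds the adder of tap k
    outputs = []
    for x in samples:
        # The same input sample drives every multiplier.
        products = [a * x for a in coeffs]
        # Output adder: tap-0 product plus the first register.
        y = products[0] + (state[0] if state else 0)
        # Each register captures its tap's product plus the next register.
        new_state = []
        for k in range(n_taps - 1):
            carry_in = state[k + 1] if k + 1 < n_taps - 1 else 0
            new_state.append(products[k + 1] + carry_in)
        state = new_state
        outputs.append(y)
    return outputs


print(fir_transposed([3, -1, 2], [1, 0, 0, 0, 0]))   # prints [3, -1, 2, 0, 0]
```

The impulse response is the coefficient list, the same as for the direct form, which is the "functionality unchanged" property of transposition.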

Parallel FIR

- Feed-forward algorithms are easy to parallelize
- Processing element representation of a transversal filter

[Figure: a processing element (shown with coefficient a_1) and a transversal filter built from such elements, with inputs x[n], x[n-1], x[n-2], coefficients a_0, a_1, a_2, an initial sum of 0, and output y[n].]

Parallel FIR

- Two parallel paths
- Two cycles to complete operation
- Can be extended to more

[Figure: two-parallel-path FIR and its processing element.]
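One way to picture the two-path arrangement is a model that consumes two input samples per iteration and produces y[2k] and y[2k+1] side by side, so each path has two sample periods to finish its sum. This is a behavioral sketch only: the name `fir_two_parallel`, the even-length input requirement, and the zero initial history are assumptions, and the lecture's processing-element structure may differ.

```python
def fir_two_parallel(coeffs, samples):
    """Two-parallel FIR: each iteration produces y[2k] and y[2k+1]."""
    if len(samples) % 2:
        raise ValueError("this sketch assumes an even number of samples")
    n_taps = len(coeffs)
    history = [0] * (n_taps - 1)     # x[2k-1], x[2k-2], ... newest first
    outputs = []
    for k in range(0, len(samples), 2):
        x_even, x_odd = samples[k], samples[k + 1]
        window_even = [x_even] + history                # x[2k], x[2k-1], ...
        window_odd = [x_odd, x_even] + history[:-1]     # x[2k+1], x[2k], ...
        # The two sums are independent, so they can run in parallel paths.
        outputs.append(sum(a * x for a, x in zip(coeffs, window_even)))
        outputs.append(sum(a * x for a, x in zip(coeffs, window_odd)))
        # Keep the newest n_taps - 1 samples for the next pair.
        history = ([x_odd, x_even] + history)[:n_taps - 1]
    return outputs


print(fir_two_parallel([3, -1, 2], [1, 0, 0, 0]))   # prints [3, -1, 2, 0]
```

Because there is no feedback, neither path waits on the other's result, which is why the same idea extends to more than two paths.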

Table Lookup

- If the input data is only one or two bits
  - There are not that many input combinations
- Rather than adding the numbers together
  - Add them beforehand, and just store the results in an SRAM
  - The address of the SRAM is just the sequence of inputs to the filter

  x_n x_{n-1} x_{n-2} x_{n-3} x_{n-4}    Value in memory
  00000                                  0
  00001                                  a4
  00010                                  a3
  00011                                  a3+a4

- Replaces adds and multipliers by memory
  - But it grows exponentially with the number of bits needed

Decision Feedback Equalization

- The main problem with DFE
  - You need the output of the FIR filter NOW
  - Need it to generate the next bit
- Latency in the FIR filter is a problem
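The table and the feedback loop can be sketched together. The code below builds the lookup table for binary inputs using the addressing shown in the table (newest bit in the MSB, so address 00001 holds a4), then uses such a table for the feedback taps of a small DFE. The function names, the 0.5 decision threshold, and the example channel taps 1, 0.5, 0.25 are all hypothetical choices for illustration, and the model is noise-free.

```python
def build_lut(coeffs):
    """Precompute the sum of coefficients selected by every bit pattern."""
    n = len(coeffs)
    table = []
    for addr in range(1 << n):
        total = 0
        for k, a in enumerate(coeffs):
            if (addr >> (n - 1 - k)) & 1:   # bit for x[n-k]; k = 0 is the MSB
                total += a
        table.append(total)
    return table


def fir_lookup(table, n_taps, bits):
    """FIR on binary inputs: one memory read per output, no adds."""
    addr = 0
    out = []
    for b in bits:
        # The newest bit enters at the MSB; older bits shift toward the LSB.
        addr = (addr >> 1) | (b << (n_taps - 1))
        out.append(table[addr])
    return out


def dfe_with_lut(received, feedback_coeffs, threshold=0.5):
    """DFE sketch: subtract ISI predicted from past decisions, then slice."""
    n = len(feedback_coeffs)
    table = build_lut(feedback_coeffs)
    addr = 0                                 # past decisions, newest in the MSB
    decisions = []
    for r in received:
        corrected = r - table[addr]          # this read must fit in one bit time
        d = 1 if corrected > threshold else 0
        decisions.append(d)
        addr = (addr >> 1) | (d << (n - 1))  # the decision feeds the next lookup
    return decisions


bits = [1, 0, 1, 1, 0, 0, 1, 0]
channel = [1.0, 0.5, 0.25]                   # hypothetical main tap + 2 post-cursors
received = fir_lookup(build_lut(channel), len(channel), bits)
print(dfe_with_lut(received, channel[1:]) == bits)   # prints True
```

The loop in `dfe_with_lut` shows the timing problem directly: the decision for one bit forms part of the address for the very next lookup, so any latency in the feedback filter delays the next decision. The table has 2^N entries for N binary taps, which is the exponential growth noted above.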

Practical Digital Equalizers

- Mita, ISSCC 96: two parallel paths; 150Mb/s, 0.7µm BiCMOS
- Moloney, JSSC 7/98: 2 parallel paths, 3:2 Wallace; 150Mb/s, 0.7µm BiCMOS
- Wong, Rudell, Uehara, Gray, JSSC 3/95: 4 parallel paths; 50Mb/s, 1.2µm CMOS
- Thon, ISSCC 95: transposed filter; 240Mb/s, 0.8µm 3.7V CMOS, 150mW; semi-static coefficients, Booth-encoded
- Staszewski, JSSC 8/00: 2 parallel transposed paths, Booth-encoded data; 550Mb/s, 0.21µm CMOS, 36mW
- Rylov, ISSCC 01: distributed arithmetic; 2.3Gb/s, 1.2W, 0.18µm domino CMOS
- Tierno, ISSCC 02: 1.3Gb/s, 450mW, 0.18µm 2.1V domino CMOS

TI DFE Design

- ISSCC 07
- Uses memory lookup
- Runs at 12Gs/s
- Binary
- Check it out

References from Bora Nikolic

- R. Jain, P.T. Yang, T. Yoshino, "FIRGEN: a computer-aided design system for high performance FIR filter integrated circuits," IEEE Transactions on Signal Processing, vol. 39, no. 7, pp. 1655-1668, July 1991.
- R.A. Hawley, B.C. Wong, T.-J. Lin, J. Laskowski, H. Samueli, "Design techniques for silicon compiler implementations of high-speed FIR digital filters," IEEE Journal of Solid-State Circuits, vol. 31, no. 5, pp. 656-667, May 1996.
- W.L. Abbott, et al., "A digital chip with adaptive equalizer for PRML detection in hard-disk drives," IEEE International Solid-State Circuits Conference, Digest of Technical Papers, ISSCC 94, San Francisco, CA, Feb. 16-18, 1994, pp. 284-285.
- D.J. Pearson, et al., "Digital FIR filters for high speed PRML disk read channels," IEEE Journal of Solid-State Circuits, vol. 30, no. 12, pp. 1517-1523, May 1995.
- S. Mita, et al., "A 150 Mb/s PRML chip for magnetic disk drives," IEEE International Solid-State Circuits Conference, Digest of Technical Papers, ISSCC 96, San Francisco, CA, Feb. 8-10, 1996, pp. 62-63, 418.
- D. Moloney, J. O'Brien, E. O'Rourke, F. Brianti, "Low-power 200-Msps, area-efficient, five-tap programmable FIR filter," IEEE Journal of Solid-State Circuits, vol. 33, no. 7, pp. 1134-1138, July 1998.
- C.S.H. Wong, J.C. Rudell, G.T. Uehara, P.R. Gray, "A 50 MHz eight-tap adaptive equalizer for partial-response channels," IEEE Journal of Solid-State Circuits, vol. 30, no. 3, pp. 228-234, March 1995.
- L.E. Thon, P. Sutardja, F.-S. Lai, G. Coleman, "A 240 MHz 8-tap programmable FIR filter for disk-drive read channels," IEEE International Solid-State Circuits Conference, Digest of Technical Papers, ISSCC 95, San Francisco, CA, Feb. 15-17, 1995, pp. 82-83, 343.
- R.B. Staszewski, K. Muhammad, P. Balsara, "A 550-MSample/s 8-tap FIR digital filter for magnetic recording read channels," IEEE Journal of Solid-State Circuits, vol. 35, no. 8, pp. 1205-1210, August 2000.
- S. Rylov, et al., "A 2.3 GSample/s 10-tap digital FIR filter for magnetic recording read channels," IEEE International Solid-State Circuits Conference, Digest of Technical Papers, ISSCC 01, San Francisco, CA, Feb. 5-7, 2001, pp. 190-191.
- J. Tierno, et al., "A 1.3 GSample/s 10-tap full-rate variable-latency self-timed FIR filter with clocked interfaces," IEEE International Solid-State Circuits Conference, Digest of Technical Papers, ISSCC 02, San Francisco, CA, Feb. 3-7, 2002, pp. 60-61, 444.