QAM Receiver Reference Design V1.0
Copyright 2011-2012 Xilinx
Revision
date        ver   author               note
9-28-2012   0.1   Alex Paek, Jim Wu
Overview
The goals of this QAM receiver reference design are:
- Easily scalable, parameterizable generic design:
  - The design can easily adapt to changes in algorithm or specification through the use of parameters and a sensible hierarchy.
  - As the data rate changes, the resource utilization changes accordingly.
- Efficient design methodology and tools: use the right tool for the nature of each module (Sysgen, Vivado HLS, or RTL).
- SoC: easily integrate both the logic fabric and the processor within a single chip (to be included in the next version).
The specifications for the reference design:
- Programmable QAM setting: QPSK, 16-, 64-, or 256-QAM, with a variable symbol rate up to 6.25 Msps
- IF input stream centered at 12.5 MHz (= Fs/4, Fs = 50 Msps)
- No DC offset or AGC control
- Timing recovery loop
- Carrier/phase recovery loop
- LMS-based adaptive equalizer with 16 symbol-spaced taps
- Blind acquisition for the carrier loop and the equalizer
- No FEC
The target device: Z7020 (has 220 DSP48, XX FF, XX LUTs, XX BRAMs)
Clock rate: 200 MHz
Block Diagram
Design Considerations
Select the right methodology using the following criteria:
- Time to prototype:
  - Utilize existing IP and reference designs, such as FIR Compiler, DDS Compiler, and FFT.
  - Use the right type of design entry: RTL, C, or Sysgen.
- Maintainability: change is inevitable (specification, algorithms, target device, etc.), so the design should be easy to understand and to modify. Consider:
  1. Make it as generic as possible through the use of parameters.
  2. Create a sensible hierarchy.
- Debug capability: create a high-level golden model of the design (MATLAB/Simulink or C) to support cross-checking at the module level and at the top level.
Recommendation for the demo design:
- Use Sysgen as the top-level tool where all the modules are integrated, for its ability to create a sophisticated testbench.
- At the module level, use:
  - Vivado HLS: where the complexity of the algorithm is high
  - RTL: for any control-type design, and for any existing RTL design where it makes sense to use it as-is
  - Sysgen block: FIR, FFT, etc.
  - Embedded processor: for MAC and higher-layer processing
Utilize the right resources:
- Use data/coefficient widths as wide as FPGA macros such as the DSP48 allow (25x18).
- Take advantage of BRAM configurations (i.e., 36Kx1, 18Kx2, etc.). BRAM usage is often low in this type of design.
Sysgen Top Level
All the RTL modules generated by Vivado HLS are imported into the Sysgen top level as black boxes.
Sysgen Top Level
The default data width between the major blocks is: 16.13
Digital Down Converter (DDC)
Includes:
- Mixer: shifts the frequency down by Fs/4
  - The real input stream is multiplied by the repeating sequence 1, 0, -1, 0 to produce the I output, and by the repeating sequence 0, 1, 0, -1 to produce the Q output.
- Decimate-by-2 filter, 21 taps
  - Takes advantage of the zeros in the input stream
- Sysgen model shown below (using FIR Compiler; RTL for the mixer)
Resource:
Processing clock rate: 200 MHz
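The Fs/4 mixer described above can be sketched in a few lines of Python. This is an illustrative golden-model fragment, not the Sysgen/RTL implementation: the function name `fs4_mix`, the test tone, and the use of a plain average in place of the decimation filter are all assumptions made for the example.

```python
import cmath
import math

# Repeating Fs/4 mixer sequences as listed on the slide.
# (The sign of the Q sequence is a convention; it is copied as given.)
I_SEQ = (1, 0, -1, 0)
Q_SEQ = (0, 1, 0, -1)

def fs4_mix(samples):
    """Multiply a real sample stream by the repeating I and Q sequences."""
    i_out = [x * I_SEQ[n % 4] for n, x in enumerate(samples)]
    q_out = [x * Q_SEQ[n % 4] for n, x in enumerate(samples)]
    return i_out, q_out

# Hypothetical stimulus: a real tone exactly at Fs/4 with an arbitrary phase.
PHASE = 0.3
tone = [math.cos(2 * math.pi * n / 4 + PHASE) for n in range(16)]
i_out, q_out = fs4_mix(tone)

# A plain average stands in for the low-pass decimation filter here:
# it removes the image at Fs/2 and leaves the baseband (DC) term.
dc = complex(sum(i_out), sum(q_out)) / len(tone)
```

With these sequences every other I sample and every other Q sample is exactly zero, which is the property the decimate-by-2 filter exploits. The tone at Fs/4 lands at DC with half its amplitude.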
DDC
- Decimate-by-2 filter response
- Input spectrum to the DDC (with added white Gaussian noise, in blue) and the output of the DDC
Timing Recovery Loop (TREC)
- Designed in C++ and synthesized to RTL by Vivado HLS
- Processing clock: 100 MHz
- Includes:
  - Interpolation filter: 64 phases, 4-tap filter
  - Phase NCO: generates the 1x and 2x symbol enables for the rest of the RX
  - Square-root raised cosine (SRRC) filter, 48 taps
  - Timing error detector
  - PI loop filter
Resource:
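As a rough illustration of two of the blocks listed above, the sketch below models a PI loop filter and a phase NCO that produces the 2x and 1x symbol enables. It is a floating-point behavioral sketch, not the Vivado HLS source. The class names, the gain values, and the nominal increment of 0.5 (which assumes a 25 Msps input after the DDC and a 12.5 Msps 2x-symbol rate) are assumptions; the timing error detector is not modeled because the slides do not say which detector is used.

```python
class PILoopFilter:
    """Proportional-plus-integral loop filter: out = kp * err + integrator."""

    def __init__(self, kp, ki):
        self.kp = kp
        self.ki = ki
        self.integrator = 0.0

    def step(self, err):
        self.integrator += self.ki * err
        return self.kp * err + self.integrator


class PhaseNCO:
    """Modulo-1 phase accumulator; each wrap marks one 2x-symbol strobe."""

    def __init__(self, nominal_inc):
        self.nominal_inc = nominal_inc  # 2x-symbol strobes per input sample
        self.phase = 0.0
        self.toggle = False

    def step(self, correction=0.0):
        """Advance one input sample; return (en_2x, en_1x, residual phase)."""
        self.phase += self.nominal_inc + correction
        en_2x = self.phase >= 1.0
        en_1x = False
        if en_2x:
            self.phase -= 1.0                 # keep the fractional part
            self.toggle = not self.toggle
            en_1x = self.toggle               # every second 2x strobe
        return en_2x, en_1x, self.phase


# Hypothetical open-loop run: no timing correction applied.
nco = PhaseNCO(nominal_inc=0.5)
strobes = [nco.step() for _ in range(1000)]
count_2x = sum(1 for en_2x, _, _ in strobes if en_2x)
count_1x = sum(1 for _, en_1x, _ in strobes if en_1x)
```

In a closed loop, the loop filter output would be passed as `correction`, and the residual phase would typically select one of the interpolation filter phases.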
SRRC (square-root raised cosine) filter
- Data rate = 2x symbol rate (12.5 Msps)
- 33 symmetric taps
- RMS ISI = -38 dB
- Peak ISI = -33 dB
TREC Implementation
Show VHLS implementation
TREC Behavior
- The constellation at the SRRC output; the transmitted source is 16-QAM with a 50 ppm offset with respect to the symbol rate
- The output of the integrator term in the PI loop filter
- The output spectrum of the SRRC filter (sampled at 2x the symbol rate)
Carrier Recovery Loop (CREC)
- Designed in C++ and synthesized to RTL by Vivado HLS
- Provides a mode that bypasses the EQ while the CREC is in acquisition mode, for faster acquisition; helpful when the carrier offset is high
- CREC and EQ can operate autonomously under the Acquisition/Tracking Control block
- Uses RCA (Reduced Constellation Algorithm) for blind acquisition
- Processing clock: 100 MHz
- Includes:
  - De-rotator
  - Slicer
  - PI loop filter
  - Phase detection
  - VCO
  - Acquisition/Tracking control
Resource:
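The way the de-rotator, slicer, phase detector, PI loop filter, and VCO fit together can be sketched as below. This is a decision-directed (tracking mode) floating-point sketch only; the RCA blind acquisition and the Acquisition/Tracking control are not modeled. The function names, the 16-QAM grid of ±1, ±3, the loop gains, and the stimulus (a hypothetical 1.25 kHz carrier offset at 6.25 Msps, noise-free) are assumptions for illustration.

```python
import cmath
import math
import random

LEVELS = (-3.0, -1.0, 1.0, 3.0)  # assumed 16-QAM grid

def slice_16qam(z):
    """Slicer: return the nearest point on the assumed 16-QAM grid."""
    re = min(LEVELS, key=lambda v: abs(v - z.real))
    im = min(LEVELS, key=lambda v: abs(v - z.imag))
    return complex(re, im)

def carrier_loop(samples, kp=0.05, ki=0.002):
    """Second-order decision-directed carrier loop (illustrative gains)."""
    phase = 0.0        # VCO phase in radians
    integrator = 0.0   # integral term of the PI loop filter
    outputs, integ_trace = [], []
    for x in samples:
        y = x * cmath.exp(-1j * phase)                # de-rotator
        d = slice_16qam(y)                            # slicer
        err = (y * d.conjugate()).imag / abs(d) ** 2  # phase detector
        integrator += ki * err                        # PI loop filter
        phase += kp * err + integrator                # VCO phase update
        outputs.append(y)
        integ_trace.append(integrator)
    return outputs, integ_trace

# Hypothetical stimulus: 16-QAM symbols rotating by a fixed offset per symbol.
rng = random.Random(0)
dw = 2 * math.pi * 1.25e3 / 6.25e6   # radians per symbol
tx = [complex(rng.choice(LEVELS), rng.choice(LEVELS)) for _ in range(3000)]
rx = [s * cmath.exp(1j * dw * n) for n, s in enumerate(tx)]
outputs, integ_trace = carrier_loop(rx)
```

Once the loop has locked, the integrator term settles at the per-symbol phase increment `dw`, which is why the integrator output is a convenient signal for observing the carrier offset.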
CREC Behavior
- Input and output of the CREC; the input source is 100 ppm off the IF frequency (1.25 kHz = 100e-6 * 12.5e6)
- VCO output and CREC loop filter integrator term output; the X axis is in units of symbols
CREC Implementation
Show VHLS implementation
Adaptive Equalizer (EQ)
- Symbol-spaced EQ, 16 taps
- 24-bit coefficients
- DLMS (Delayed LMS) algorithm, which allows the error feedback term of the LMS update to be pipelined
- Switches between blind acquisition mode, using MMA (multimodulus algorithm), and tracking (decision-directed) mode, based on the average slicer error
Resource:
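The mode switch between the MMA error and the decision-directed error can be sketched as follows. This is an assumption-laden illustration, not the reference design's control logic: the class name `ModeControl`, the threshold, the averaging constant, the starting value of the average, and the 16-QAM grid of ±1, ±3 are all hypothetical, and a practical design would likely add hysteresis.

```python
LEVELS = (-3.0, -1.0, 1.0, 3.0)  # assumed 16-QAM grid

# MMA dispersion constant per rail: R^2 = E[a^4] / E[a^2] (8.2 for this grid).
R2 = sum(a ** 4 for a in LEVELS) / sum(a ** 2 for a in LEVELS)

def slice_16qam(y):
    """Nearest point on the assumed 16-QAM grid."""
    re = min(LEVELS, key=lambda v: abs(v - y.real))
    im = min(LEVELS, key=lambda v: abs(v - y.imag))
    return complex(re, im)

def mma_error(y):
    """Blind (MMA) error, computed separately on the I and Q rails."""
    return complex(y.real * (R2 - y.real ** 2), y.imag * (R2 - y.imag ** 2))

def dd_error(y):
    """Decision-directed error: sliced decision minus equalizer output."""
    return slice_16qam(y) - y

class ModeControl:
    """Select the error term from a leaky average of the slicer error power."""

    def __init__(self, threshold=0.1, alpha=0.01):
        self.threshold = threshold   # hypothetical switch point
        self.alpha = alpha           # hypothetical averaging constant
        self.avg = 1.0               # start high so the loop begins in blind mode
        self.tracking = False

    def select_error(self, y):
        sl_err = dd_error(y)
        self.avg += self.alpha * (abs(sl_err) ** 2 - self.avg)
        self.tracking = self.avg < self.threshold
        return sl_err if self.tracking else mma_error(y)

# Hypothetical run: a clean equalizer output drives the average down.
ctl = ModeControl()
first = ctl.select_error(complex(3, 1))
started_blind = not ctl.tracking
for _ in range(500):
    ctl.select_error(complex(3, 1))
```

Note that the MMA error is generally nonzero even at ideal constellation points of a multilevel constellation, which is one reason to switch to the decision-directed error once the slicer error is small.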
DLMS Algorithm
- The critical path in an LMS equalizer consists of computing the filter output y(n) and the error term e(n), which is multiplied by the step size and used to compute the next set of coefficients applied in the FIR operation.
- In DLMS, delays are introduced in the computation of the error term: instead of applying e(n) to compute the next set of coefficients C(n+1), the error term from D samples earlier, e(n-D), is used, which allows D pipeline stages.
- When e(n-D) is used, the equalizer input must be aligned accordingly, so that e(n-D) is paired with X(n-D).
Below is the DLMS algorithm:
\[
\begin{aligned}
C(n+1) &= C(n) + \mu \, e(n-D) \, X^{*}(n-D) \\
e(n-D) &= d(n-D) - y(n-D) \\
y(n) &= C(n) \cdot X(n)
\end{aligned}
\]
C(n): coefficient array
X(n): data array (X^{*} is its complex conjugate)
y(n): equalizer output
d(n): desired value, which can be training data or sliced data
e(n): error term
\mu: update step size
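A minimal floating-point sketch of the delayed update is shown below, assuming a trained (not blind) run. The function name `dlms_equalize`, the center-spike initialization, the step size, the delay of 4, and the two-path test channel are all assumptions for illustration; the reference design itself uses fixed-point arithmetic and 24-bit coefficients.

```python
import random
from collections import deque

def dlms_equalize(x, d, n_taps=16, delay=4, mu=0.004):
    """Delayed-LMS FIR equalizer; returns (outputs, errors, final taps)."""
    c = [0j] * n_taps
    c[n_taps // 2] = 1 + 0j            # center-spike start (assumption)
    line = [0j] * n_taps               # X(n), newest sample first
    pipe = deque(maxlen=delay + 1)     # models the D pipeline stages
    ys, errs = [], []
    for xn, dn in zip(x, d):
        line = [xn] + line[:-1]        # shift the tapped delay line
        y = sum(ck * xk for ck, xk in zip(c, line))   # y(n) = C(n) . X(n)
        e = dn - y                                    # e(n) = d(n) - y(n)
        pipe.append((line, e))
        if len(pipe) == delay + 1:
            x_old, e_old = pipe[0]                    # X(n-D) and e(n-D)
            c = [ck + mu * e_old * xk.conjugate()
                 for ck, xk in zip(c, x_old)]
        ys.append(y)
        errs.append(e)
    return ys, errs, c

# Hypothetical training run: QPSK symbols through a mild two-path channel.
rng = random.Random(1)
sym = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(4000)]
rx = [sym[n] + (0.3 * sym[n - 1] if n else 0) for n in range(len(sym))]
# The desired signal is delayed to line up with the center tap.
ref = [sym[n - 8] if n >= 8 else 0j for n in range(len(sym))]
ys, errs, taps = dlms_equalize(rx, ref)
mse_start = sum(abs(e) ** 2 for e in errs[8:208]) / 200
mse_end = sum(abs(e) ** 2 for e in errs[-200:]) / 200
```

The `pipe` queue holds each error together with the data vector that produced it, so the update always pairs e(n-D) with X(n-D). Setting `delay=0` reduces the sketch to the standard LMS update.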
EQ Behavior
- The input source is 16-QAM passed through a slight multipath channel with AWGN
- Spectrum at the RX input and at the output of the SRRC
- Input and output of the equalizer
EQ Behavior
- The equalizer coefficient updates
- Slicer SNR and CREC integral term
EQ Implementation
Show VHLS implementation
HW Co-simulation
- ZC702: x speed up
- Describe several options to speed up the simulation:
  - MATLAB-callable HW co-simulation
  - Using a script for batch simulation
  - Using frame-based input/output
  - Real-time HW platform
Overall Resource
Reference
1. UG902, Vivado Design Suite User Guide: High-Level Synthesis
2. DS795, FIR Compiler v6.3 Data Sheet
3. J. Yang, J. J. Werner, and G. A. Dumont, "The multimodulus blind equalizer and its generalized algorithms," IEEE Journal on Selected Areas in Communications, vol. 20, no. 5, pp. 997-1015, June 2002
4. G. Long, "The LMS algorithm with delayed coefficient adaptation," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, 1989
5. R. Poltman, "Conversion of the delayed LMS algorithm into the LMS algorithm," IEEE Signal Processing Letters, vol. 2, 1995
Backup slides