Xilinx Answer Link Tuning For UltraScale and UltraScale+

Xilinx Answer 70918 Link Tuning For UltraScale and UltraScale+

Important Note: This downloadable PDF of an Answer Record is provided to enhance its usability and readability. It is important to note that Answer Records are Web-based content that are frequently updated as new information becomes available. You are reminded to visit the Xilinx Technical Support Website and review (Xilinx Answer 70918) for the latest version of this Answer.

Introduction

High-speed serial I/O must support data rates that were inconceivable only ten years ago. Even though adaptive equalizers do much of the work, there is still room for the engineer, working as a craftsperson, to optimize the link and make signal transmission more robust.

This Long Answer Record is an introduction to link tuning of the UltraScale and UltraScale+ GTH and GTY transceivers. Starting from a deeper knowledge of the receiver architecture, it defines a standard process for link optimization.

The first part of the AR summarizes the knobs available for link tuning, gives a priority order for the tuning parameters and ports, and reviews the adaptation blocks, which are the key elements of a good tuning procedure. Link tuning can be performed directly on the hardware, either manually or automatically, or with IBIS-AMI simulations. Both methods are addressed.

Xilinx Answer 70918 Link Tuning 1

Table of Contents
Xilinx Answer 70918
Introduction
List of controllable knobs and priority
  User controllable knobs
Receiver Architecture
  RX Termination
  CTLE Stage
    CTLE1 KH
    CTLE2 KL
    CTLE3 - AGC
  DFE
Adaptation
  VP UT loops
  H(n)
  KH loop for CTLE1
  KL loop for CTLE2
    DFE mode
    LPM mode
  AGC loop for CTLE3
Link tuning
  Link Tuning based on the adaptation coefficients
    Tuning of Target VP
    Tuning of Far End Transmitter
  Link Tuning based on IBIS-AMI Simulation
References
Revision History

List of controllable knobs and priority

When optimizing a link, start from a known, well-documented set of parameters and ports; do not try to reverse engineer or hack the reserved transceiver parameters. The main reason hacking the GT is never a good idea is that the final configuration must work at all temperature corners, with all silicon devices, and under all power conditions. Only the characterized and documented configurations guarantee optimal transceiver performance across the PVT space.

Always start a new design from the Wizard GUI. The Wizard asks for basic information about the GT setup and the channel behavior. If you let the Wizard decide, the result is unlikely to be the optimal choice for your channel. For example, the channel insertion loss (IL) at the Nyquist frequency (half the data rate) drives the choice between the LPM and DFE equalizers, so two completely different receiver configurations can be generated depending on the initial IL value. It is highly recommended to invest some effort in modelling the channel and to select the equalizer manually. The CTLE adaptation mode is also configured according to IL: the low-band and high-band filters adapt together at low losses and independently at high insertion losses.

User controllable knobs

Basic knobs can fix most problems. Always start by tuning this basic list of user-controllable ports and parameters. The first one (Target VP) belongs to the receiver; the other three are the main transmitter settings.

Definition     UltraScale            UltraScale+
Target VP      RXDFE_GC_CFG1[6:0]    RXDFE_GC_CFG2[6:0]
Main Cursor    TXDIFFCTRL            TXDIFFCTRL
Pre Cursor     TXPRECURSOR           TXPRECURSOR
Post Cursor    TXPOSTCURSOR          TXPOSTCURSOR
Table 1. Basic User controllable knobs
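When tuning is scripted, it can help to keep Table 1 as data so the same script works on both families. The minimal Python sketch below does that; the `BASIC_KNOBS` dictionary and the `knobs_for()` helper are hypothetical names, and only the port and attribute names are taken from Table 1.

```python
# Hypothetical lookup of the basic tuning knobs listed in Table 1.
# Order follows the table: the RX Target VP first, then the three TX settings.
BASIC_KNOBS = {
    "Target VP":   {"UltraScale": "RXDFE_GC_CFG1[6:0]", "UltraScale+": "RXDFE_GC_CFG2[6:0]"},
    "Main Cursor": {"UltraScale": "TXDIFFCTRL",         "UltraScale+": "TXDIFFCTRL"},
    "Pre Cursor":  {"UltraScale": "TXPRECURSOR",        "UltraScale+": "TXPRECURSOR"},
    "Post Cursor": {"UltraScale": "TXPOSTCURSOR",       "UltraScale+": "TXPOSTCURSOR"},
}


def knobs_for(family: str) -> dict:
    """Return {knob: port or attribute name} for one transceiver family."""
    if family not in ("UltraScale", "UltraScale+"):
        raise ValueError(f"unknown family: {family!r}")
    return {knob: names[family] for knob, names in BASIC_KNOBS.items()}


if __name__ == "__main__":
    for knob, name in knobs_for("UltraScale+").items():
        print(f"{knob:12s} -> {name}")
```

Only the Target VP attribute differs between the two families; the three TX ports keep the same names.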

Advanced controllable knobs

In rare cases, manually overriding or holding the receiver equalizers can help. Refer to the User Guide (UG576), table 4-10, for the *HOLD and *OVRDEN ports.

Port name                            Equalizer  Description ({HOLD, OVRDEN})
{RXOSHOLD, RXOSOVRDEN}               DFE        2'b00: OS offset cancellation loop adapt
                                                2'b10: Freeze current adapt value
                                                2'bx1: Override OS value
{RXDFEAGCHOLD, RXDFEAGCOVRDEN}       DFE        2'b00: Automatic gain control (AGC) loop adapt
                                                2'b10: Freeze current AGC adapt value
                                                2'bx1: Override AGC value
{RXDFELFHOLD, RXDFELFOVRDEN}         DFE        2'b00: KL low-frequency loop adapt
                                                2'b10: Freeze current KL adapt value
                                                2'bx1: Override KL value
{RXDFEUTHOLD, RXDFEUTOVRDEN}         DFE        2'b00: UT unrolled threshold loop adapt
                                                2'b10: Freeze current UT adapt value
                                                2'bx1: Override UT value
{RXDFEVPHOLD, RXDFEVPOVRDEN}         DFE        2'b00: VP voltage peak loop adapt
                                                2'b10: Freeze current VP adapt value
                                                2'bx1: Override VP value
{RXDFETAP*HOLD, RXDFETAP*OVRDEN}     DFE        2'b00: TAP* loop adapt
                                                2'b10: Freeze current TAP* adapt value
                                                2'bx1: Override TAP* value
{RXDFECFOKHOLD, RXDFECFOKOVREN}      DFE        UltraScale+ only
                                                2'b00: CFOK adapt
                                                2'b10: Freeze current CFOK adapt value
                                                2'bx1: Override CFOK value
{RXDFEKHHOLD, RXDFEKHOVRDEN}         DFE        UltraScale+ only
                                                2'b00: KH high-frequency loop adapt
                                                2'b10: Freeze current KH adapt value
                                                2'bx1: Override KH value
Table 2. DFE related advanced knobs
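Every {HOLD, OVRDEN} pair in Tables 2 and 3 uses the same 2-bit encoding, with HOLD as the MSB, so a debug script can decode them uniformly. The helpers below are a hypothetical sketch for such a script, not part of any Xilinx tool API.

```python
def decode_hold_ovrden(hold: int, ovrden: int) -> str:
    """Decode a {HOLD, OVRDEN} port pair using the encoding of Tables 2 and 3.

    HOLD is the MSB and OVRDEN the LSB of the concatenation:
      2'b00 -> the loop adapts
      2'b10 -> the current adapted value is frozen
      2'bx1 -> the value is overridden (HOLD is a don't-care)
    """
    for name, bit in (("HOLD", hold), ("OVRDEN", ovrden)):
        if bit not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1, got {bit!r}")
    if ovrden:
        return "override"
    return "freeze" if hold else "adapt"


def encode_mode(mode: str) -> tuple:
    """Return (HOLD, OVRDEN) for a desired mode; HOLD=0 is chosen for override."""
    table = {"adapt": (0, 0), "freeze": (1, 0), "override": (0, 1)}
    return table[mode]
```

Because OVRDEN takes precedence, both 2'b01 and 2'b11 decode to an override.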

Port name                            Equalizer  Description ({HOLD, OVRDEN})
{RXLPMLFHOLD, RXLPMLFKLOVRDEN}       LPM        2'b00: KL low-frequency loop adapt
                                                2'b10: Freeze current adapt value
                                                2'bx1: Override KL value
{RXLPMHFHOLD, RXLPMHFOVRDEN}         LPM        2'b00: KH high-frequency loop adapt
                                                2'b10: Freeze current adapt value
                                                2'bx1: Override KH value
{RXLPMOSHOLD, RXLPMOSOVRDEN}         LPM        2'b00: OS offset cancellation loop adapt
                                                2'b10: Freeze current adapt value
                                                2'bx1: Override OS value
{RXLPMGCHOLD, RXLPMGCOVRDEN}         LPM        2'b00: Gain control loop adapt
                                                2'b10: Freeze current adapt value
                                                2'bx1: Override GC value
Table 3. LPM related advanced knobs

Special knobs

The following parameters are listed for completeness. The user should not bypass the Termination Calibration block unless the PVT variation is low, for example with a single device at a controlled temperature. This is a last resort when the PCB calibration resistor has been omitted by mistake. The RX termination can be overridden by setting TERM_RCAL_OVRD=1 and specifying its value with TERM_RCAL_CFG.

Receiver Architecture

Understanding the complex receiver architecture is fundamental for link optimization. A brief description of the main blocks follows.
• RX Termination: provides a wide-range RX termination with automatic calibration
• Analog passive 3-stage CTLE:
  o AGC: wide-bandwidth equalizer
  o KL: low-frequency range emphasis
  o KH: high-frequency range emphasis
• Capture FF
  o Analog FFs as slicer inputs
• DFE taps
• Adaptation blocks controlling all blocks (apart from the RX termination, which is calibrated once after FPGA configuration)

Building blocks of the GT receiver

RX Termination

The RX termination provides a wide-range RX resistance and is needed to minimize the impedance mismatch between the channel and the receiver and, consequently, the reflections (return loss). Its code ranges from 0 to 31, and the corresponding resistance goes from 150 ohm to 90 ohm. The auto-calibration after FPGA configuration covers the PVT variation of the resistors. The resistor value can change slightly with temperature: keep this in mind when the automatic calibration is performed at extremely low or high temperatures. Although the calibration circuit can span a wide termination range, the package impedance is fixed, so the RX termination should not be set to values other than 100 ohm differential.

RX termination vs. RCAL: differential termination resistance vs. RCAL code at -40C, 25C and 100C (7 Series GTX; the UltraScale GT has similar values)

CTLE Stage

The CTLE block consists of three passive CTLE stages:
• CTLE Stage 1: KH, boosting the high-frequency range
• CTLE Stage 2: KL, boosting the middle-frequency range
• CTLE Stage 3: AGC, boosting DC
All figures below referring to CTLE transfer functions are indicative only.
Three stages of the linear equalizer

CTLE1 KH

A traditional CTLE that compensates high-frequency energy. The peaking frequency is around 14 GHz. There are 32 codes, from a flat response (code 0) to the highest high-frequency boost (code 31).

KH transfer function

CTLE2 KL

This stage boosts the middle-frequency range, from 100 MHz to 10 GHz. There are 32 codes, from a flat response (code 0) to the highest middle-frequency boost (code 31).
KL transfer function

CTLE3 - AGC

This is the broadband DC gain amplifier. It adjusts the DC gain to achieve the best swing level at the summing node. The 3 dB bandwidth is around 27 GHz. Peaking occurs at AGC codes lower than 16. There are 32 codes, from 6 dB gain (code 31) to 10 dB attenuation (code 0).
AGC transfer function

DFE

The UltraScale GTY has a 15-tap, half-rate DFE architecture. H1 = UT is the unrolled (speculative) voltage used to relax the tight timing constraint of the first feedback tap. A mux selects between +h1 and -h1 according to the previous bit value.
DFE conceptual diagram
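The unrolled first tap can be modeled in a few lines. This is a behavioral sketch under simplifying assumptions (ideal comparators, no noise, NRZ symbols of +1/-1, a known bit before the first sample); the function name is hypothetical. Both comparators evaluate every sample and the previous decision only drives the mux, which is why no feedback subtraction has to settle within one UI.

```python
def unrolled_dfe_decisions(samples, h1, first_prev=1):
    """Speculative (unrolled) first-tap DFE, as a behavioral sketch.

    One comparator uses threshold +h1 (valid if the previous bit was +1),
    the other uses -h1 (valid if it was -1). A mux picks the result with
    the previously decided bit.
    """
    decisions = []
    prev = first_prev  # assumed value of the bit before the first sample
    for x in samples:
        if_prev_pos = 1 if x > h1 else -1    # comparator with threshold +h1
        if_prev_neg = 1 if x > -h1 else -1   # comparator with threshold -h1
        bit = if_prev_pos if prev == 1 else if_prev_neg  # the mux
        decisions.append(bit)
        prev = bit
    return decisions


if __name__ == "__main__":
    bits = [1, -1, -1, 1, 1, -1, 1, -1]
    h1 = 1.2  # first post-cursor larger than the main cursor (1.0)
    prev = [1] + bits[:-1]
    rx = [b + h1 * p for b, p in zip(bits, prev)]  # main cursor + first-tap ISI
    print(unrolled_dfe_decisions(rx, h1) == bits)     # True
    print([1 if x > 0 else -1 for x in rx] == bits)  # False: plain slicer at 0
```

With a first post-cursor this large, a plain zero-threshold slicer makes errors, while the unrolled thresholds recover the bit pattern.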

Tap          Code          Notes
Tap 1 (UT)   0 ~ 127       Unrolled speculation voltage; positive values only
Tap 2        0 ~ 63        Positive values only
Tap 3        -31 ~ +31
Tap 4        -15 ~ +15
Tap 5        -15 ~ +15
Tap 6        -15 ~ +15
Tap 7        -15 ~ +15
Tap 8        -15 ~ +15
Tap 9        -15 ~ +15
Tap A        -15 ~ +15
Tap B        -15 ~ +15
Tap C        -15 ~ +15
Tap D        -15 ~ +15
Tap E        -15 ~ +15
Tap F        -15 ~ +15
Table 4. Range of DFE taps

The main goal of the DFE is to remove inter-symbol interference (ISI). The DFE can also be effective in correcting noise due to reflections. From a time-domain point of view, it numerically corrects the impulse response by cancelling its tail. (Figure 8)

Impulse response modified by DFE

Adaptation

The adaptation block is a combination of various digital control circuits with different adaptation speeds:
• Baseline wander cancellation: optimizes the p-n transition crossing point (fastest loop)
• CDR loop: allows the CDR to track the incoming data phase and frequency
• VP loop: continuously measures the signal envelope amplitude
• UT loop: checks the VP measurement and sets the UT value
• H(n) loops: set the H(n) taps for an ideal impulse response
• KL loops
• KH loops
• AGC loops (slowest loop)

GTH and GTY in UltraScale and UltraScale+ have two different adaptation modes:
1. LPM mode
   a. KL loop
   b. KH loop
   c. AGC is hardcoded to 16 in GTH and to 32 in GTY
2. DFE mode
   a. KL loop
   b. KH loop
   c. AGC is hardcoded to 16 in GTH and to 32 in GTY
In DFE mode, depending on IL, the KH and KL adaptation loops can be linked (low IL) or independent (high IL).

VP UT loops

The VP loop and the UT loop are closely linked. The VP loop constantly measures the average peak of the signal envelope and finds two separate values:

• VP1 is measured when the previous bit is equal to 1
• VP0 is measured when the previous bit is equal to 0
Once VP1 and VP0 are known, the UT loop searches for the unrolled threshold value that brings VP1 and VP0 closest together (Figure 9). This is the condition that guarantees the largest open area.
Original eye (left); two eyes colored based on the previous bit value (middle); the UT loop succeeds in making VP1 = VP0 (right)

The UT loop modifies the capture flip-flop threshold. If the current symbol is +1, the tail of its impulse response at the tap-1 location (one UI later) is positive due to ISI, and this needs a negative DFE H1 tap correction. As a consequence, an equivalent positive threshold voltage, +h1_ut, is set in the capture FF for the next decision. If the current symbol is -1, the impulse tail at the tap-1 location is negative due to ISI, and this needs a positive DFE H1 tap; in this case, the capture FF threshold is set to -h1_ut. (Figure 10)
UT activity on FF thresholds, depending on the previous symbol value

H(n)

The H(n) loop is the traditional DFE tap loop to the summing node. It uses the MMSE method to minimize the error term between the average of the peak values and the expected peak value. Each H(n) loop has an averaging period that determines its loop bandwidth. If the current bit is located inside the High Correlated Area, the DFE Tap(n) is still under-correcting (for a bit n UIs ago equal to 1). If the current bit is located inside the Less Correlated Area, the DFE Tap(n) is over-correcting. If the current bits are split 50%-50% after averaging, DFE H(n) is optimized. (Figure 11)
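The correlation rule above can be illustrated with a generic sign-sign LMS update, a common hardware-friendly approximation of MMSE adaptation. This is a behavioral sketch, not the GT's actual circuit: it assumes the transmitted bits are known (instead of slicer decisions), a noiseless channel with a hypothetical pulse response, and an arbitrary step size `mu`.

```python
import random


def adapt_dfe_taps(bits, samples, num_taps, vp_target, mu=0.002):
    """Sign-sign LMS adaptation of post-cursor DFE taps h1..hN (behavioral sketch).

    error  = equalized sample - expected peak (vp_target * current bit)
    update = h[n] += mu * sign(error) * bit[k - n]
    A consistent positive correlation between sign(error) and the bit n UIs ago
    means tap n is under-correcting; a negative one means it is over-correcting;
    a 50/50 split leaves the tap where it is.
    """
    h = [0.0] * (num_taps + 1)  # h[0] unused so that h[n] is tap n
    for k in range(num_taps, len(bits)):
        isi = sum(h[n] * bits[k - n] for n in range(1, num_taps + 1))
        error = (samples[k] - isi) - vp_target * bits[k]
        sign = 1 if error > 0 else -1
        for n in range(1, num_taps + 1):
            h[n] += mu * sign * bits[k - n]
    return h[1:]


if __name__ == "__main__":
    rng = random.Random(1)
    pulse = [1.0, 0.30, 0.15]  # hypothetical main cursor, h1 and h2
    bits = [rng.choice((-1, 1)) for _ in range(5000)]
    samples = [
        sum(c * bits[k - i] for i, c in enumerate(pulse) if k - i >= 0)
        for k in range(len(bits))
    ]
    print([round(t, 3) for t in adapt_dfe_taps(bits, samples, 2, vp_target=1.0)])
```

Once converged, each tap dithers by a few `mu` around the channel's post-cursor value; a larger `mu` converges faster but dithers more, which mirrors the trade-off set by the averaging period of each H(n) loop.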

Interpretation of H(n) Loop

KH loop for CTLE1

The KH loop covers the same high-frequency range as the DFE. It checks the correlation between the previous bit, the current bit, and the right crossing point. (Figure 12)
The KH loop checks the correlation with the right crossing point of the signal: right crossing data is used instead of the error term.

KL loop for CTLE2

DFE mode
The adaptation algorithm is similar to the H(n) loop. It checks the correlation between the previous n-th bit, the current bit, and the error term (the difference between the average of the peak values and the expected peak value).

LPM mode
Similarly to the KH loop, the KL loop in LPM mode checks the correlation between the previous n-th bit, the current bit, and the right crossing point. It covers the low-frequency range that the high-speed DFE cannot cover. The adaptation loop cancels the ISI residual after DFE equalization, thus becoming an active part of the equalization process.

AGC loop for CTLE3

The role of the AGC loop is to maintain an optimized swing level at the summing node. It compares the merged VP from the VP/UT loop with the Target VP and then scales the entire eye amplitude up or down toward the target. There are cases where, even after signal scaling, the swing is not ideal but is still in an acceptable range (Figure 13). The Target VP determines the goal of the AGC loop:
• Higher Target VP: good for low-loss channels; usually gives a higher eye height. It results in a smaller DFE equalization power because of the relatively small tap size.
• Lower Target VP: good for high-loss channels. It results in a higher DFE equalization power because of the relatively large tap size.

Ideal and practical role of AGC loop
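A bang-bang loop is one simple way to picture how the AGC drives the measured VP toward the Target VP, and why it saturates when the target is unrealistic for the channel. The sketch assumes a hypothetical one-code-per-iteration update and a toy linear VP model; the step size, averaging and timing of the real loop are not described in this Answer Record.

```python
def agc_step(agc_code, vp_measured, target_vp, code_min=0, code_max=31):
    """One bang-bang AGC update: move the gain code one step toward the Target VP."""
    if vp_measured < target_vp:
        agc_code += 1          # swing too small: more gain
    elif vp_measured > target_vp:
        agc_code -= 1          # swing too large: less gain
    return max(code_min, min(code_max, agc_code))  # the code saturates at the rails


def run_agc(vp_of_code, target_vp, start_code=16, iterations=64):
    """Iterate the AGC loop against a model vp_of_code(code) -> measured VP."""
    code = start_code
    for _ in range(iterations):
        code = agc_step(code, vp_of_code(code), target_vp)
    return code


def toy_channel(code):
    """Hypothetical channel: measured VP grows by 1 per AGC code."""
    return 40 + code


if __name__ == "__main__":
    print(run_agc(toy_channel, target_vp=96))  # unreachable target: AGC pinned at 31
    print(run_agc(toy_channel, target_vp=64))  # reachable target: settles at 24
```

With a target the channel cannot reach, the code is pinned at 31 (0x1F), the same kind of AGC saturation that appears in the first steps of the VP/UT tuning example.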

Link tuning

This last part of the Answer Record focuses on link tuning itself. There are several procedures, each with benefits and drawbacks:
• Find the best driver setting based on eye scan analysis: this method requires repeating the scan procedure many times, so a more efficient way is needed.
• Find the best driver setting based on the adaptation coefficients: this method determines the best far-end driver setting from the near-end RX coefficients. It is a smart method that requires insightful knowledge of the adaptation mechanisms.
• Find the best driver setting based on trend analysis of IBIS-AMI simulation: time consuming and highly dependent on the precision of the models (TX, package, channel, RX).
• Verify the worst-case PVT variation using distribution analysis of IBIS-AMI simulation.

Link Tuning based on the adaptation coefficients

Link tuning based on adaptation coefficients moves mostly in three directions:
• tune the Target VP
• tune the far-end TX
• avoid saturation
Tuning the Target VP means giving the receiver the right expectation of the channel insertion loss. With this information, the receiver can better leverage the AGC and DFE resources. Tuning the far-end transmitter means pre-distorting the signal to achieve the best condition at the receiver pins. Avoiding saturation lets you take advantage of the wider dynamic range of the adaptation loops.

Tuning of Target VP

The Target VP is mapped to the attribute RXDFE_GC_CFG1[6:0] in UltraScale transceivers (RXDFE_GC_CFG2[6:0] in UltraScale+, see Table 1). The expected Target VP value per insertion loss is summarized in Table 5.
Table 5. Target VP (decimal) per IL (dB)
However, the best value for the Target VP can be found by reading the RX coefficients: reduce the Target VP until the UT value is no larger than the VP value. A UT value that is too high reveals itself as a shifted eye.
By default, the Target VP is set to 96 (decimal) in IBERT, in the IBIS-AMI model, and in the GT Wizard, but this value must be changed according to the channel insertion loss. All the adaptation coefficients of the UltraScale GT are visible in IBERT since Vivado 2016.1, which makes IBERT a simple tool for manual analysis. Automatic tuning based on Tcl scripts can also be achieved with GT_Debugger (XAPP1295 and XAPP1322). The Tcl-based approach has the advantage of flexibility: it lets you manipulate important characteristics of the measurements, make extrapolations, and run measurements in the background on a large number of channels. (Figure 14)

Example of multichannel analysis with GT_Debugger (please refer to AR 70915)
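The Target VP rule (step the Target VP down until UT is no larger than VP) maps naturally onto a scripted loop. The minimal Python sketch below assumes a hypothetical `measure()` callback; in practice it would wrap whatever IBERT or GT_Debugger Tcl commands program the attribute and read back the averaged coefficients. The candidate list and the function name are illustrative only.

```python
from typing import Callable, Iterable, Optional, Tuple


def tune_target_vp(
    candidates: Iterable[int],
    measure: Callable[[int], Tuple[int, int]],
) -> Optional[int]:
    """Lower the Target VP until the adapted UT is no larger than the adapted VP.

    measure(target_vp) is a hypothetical callback that programs the Target VP,
    waits for adaptation to settle, and returns the averaged
    (DFE_VP_AVG, DFE_UT_AVG) codes. Returns the first passing Target VP, or None.
    """
    for target in sorted(candidates, reverse=True):  # start high, step down
        vp, ut = measure(target)
        if ut <= vp:
            return target
    return None
```

Replaying the read-backs of the VP/UT tuning example through such a `measure()` stops at 0x30, the Target VP at which UT equals VP.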

Example: VP/UT tuning

Let's review, step by step, the process to tune VP/UT. We change the Target VP until the UT value is acceptable compared to VP.

Step 1:
Eye Scan:
Coefficients analysis:
Target VP  DFE_VP_AVG  DFE_AGC_AVG  DFE_UT_AVG  UT/VP  Eye Open Area  Symmetric
0x60       0x46        0x1F         0x68        1.48   1392           No
Analysis and action: the eye is asymmetric and UT is bigger than VP. UT is the main reason for the eye offset. We also notice that the AGC is saturating (0x1F), which indicates sub-optimal equalization. We reduce the Target VP with the goal of reducing the UT/VP ratio.

Step 2:
Eye Scan:
Coefficients analysis:
Target VP  DFE_VP_AVG  DFE_AGC_AVG  DFE_UT_AVG  UT/VP  Eye Open Area  Symmetric
0x50       0x46        0x1F         0x68        1.48   1392           No
Analysis and action: the eye is still asymmetric and UT is still bigger than VP. The AGC is still saturating. We further reduce the Target VP with the goal of reducing the UT/VP ratio.

Step 3:
Eye Scan:
Coefficients analysis:
Target VP  DFE_VP_AVG  DFE_AGC_AVG  DFE_UT_AVG  UT/VP  Eye Open Area  Symmetric
0x40       0x40        0x1C         0x5E        1.46   1172           No
Analysis and action: the eye is still asymmetric and UT is still bigger than VP. We further reduce the Target VP with the goal of reducing the UT/VP ratio.

Step 4:
Eye Scan:
Coefficients analysis:
Target VP  DFE_VP_AVG  DFE_AGC_AVG  DFE_UT_AVG  UT/VP  Eye Open Area  Symmetric
0x38       0x38        0x16         0x48        1.28   1162           Yes
Analysis and action: the eye is now symmetric, but UT is still bigger than VP. We further reduce the Target VP with the goal of reducing the UT/VP ratio.

Step 5:
Eye Scan:
Coefficients analysis:
Target VP  DFE_VP_AVG  DFE_AGC_AVG  DFE_UT_AVG  UT/VP  Eye Open Area  Symmetric
0x30       0x30        0x0C         0x30        1.0    1006           Yes
Analysis and action: UT is equal to VP. This is the condition for the best Target VP.
Further consideration: the eye open area is not relevant for eye qualification and channel margin, even though it dropped from 1392 to 1006 during this procedure. What really matters in the margin calculation is the horizontal and vertical distance from the eye limit to the data sampling point (middle of the UI). Thus, a symmetric eye is always the preferred condition. (Figure 15)
Measurement of the channel time margin: the closest eye corner should be used.
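The UT/VP column can be recomputed from the hex read-backs, which is a quick sanity check when logging many channels. In the minimal sketch below, the ratio is truncated to two decimals because that matches the values printed in the five tables (104/70 appears as 1.48, not 1.49); the `summarize()` helper and its field names are hypothetical.

```python
import math

# Coefficients read back in Steps 1-5 (hex codes copied from the tables above).
# Columns: Target VP, DFE_VP_AVG, DFE_AGC_AVG, DFE_UT_AVG
STEPS = [
    (0x60, 0x46, 0x1F, 0x68),
    (0x50, 0x46, 0x1F, 0x68),
    (0x40, 0x40, 0x1C, 0x5E),
    (0x38, 0x38, 0x16, 0x48),
    (0x30, 0x30, 0x0C, 0x30),
]
AGC_MAX = 0x1F  # AGC codes run from 0 to 31


def ut_vp_ratio(vp: int, ut: int) -> float:
    """UT/VP truncated to two decimals (the tables appear to truncate, not round)."""
    return math.floor(ut / vp * 100) / 100


def summarize(steps):
    """Yield one record per tuning step with the derived quantities."""
    for target, vp, agc, ut in steps:
        yield {
            "target_vp": target,
            "ut_vp": ut_vp_ratio(vp, ut),
            "agc_saturated": agc >= AGC_MAX,
            "done": ut <= vp,  # stop condition reached in Step 5
        }


if __name__ == "__main__":
    for row in summarize(STEPS):
        print(f"0x{row['target_vp']:02X}  UT/VP={row['ut_vp']:.2f}  "
              f"AGC saturated={row['agc_saturated']}  done={row['done']}")
```

The AGC saturation flag clears as soon as the Target VP drops to a value the channel can reach (Step 3), while the UT/VP criterion is only met at Step 5.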

Low loss channels with high reflections

This case belongs to the Target VP tuning discussion. When the channel insertion loss is low but crosstalk and reflections are present, a large amount of non-attenuated noise is added to the signal. In cases like this, the DFE equalizer might be the best solution, and a lower Target VP can give better results.
With high crosstalk and reflection a lower Target VP might help

Tuning of Far End Transmitter

The transmitter setup indirectly changes the adapted receiver AGC, UT, VP and Hx values. By measuring the AGC value, or the UT/VP and UT/KH ratios, it is possible to modify the transmitter equalization for a better channel margin. The transmitter consists of a main cursor, controlled by TXDIFFCTRL, and post- and pre-cursors, controlled by the TXPOSTCURSOR and TXPRECURSOR ports. For a complete description of the transmitter programmable driver, refer to (UG576) and (UG578).

TXDIFFCTRL tuning

The rules for main cursor tuning are:
• The adapted VP needs to be close to the Target VP.
• Avoid AGC codes lower than 16 in high-crosstalk applications.
The actions are summarized in Figure 17 and organized in a flow chart in Figure 18.
Conditions driving the TXDIFFCTRL tuning
Flow diagram for TXDIFFCTRL tuning

TXPOSTCURSOR tuning

Again, neither the UT loop nor the H2 loop may saturate. This is summarized in Figure 19.
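The TX tuning rules can be turned into simple checks on the read-back coefficients. The sketch below only flags which rule is violated; the corrective actions live in Figures 17 to 19, which are not reproduced here. `RxCoefficients`, `tx_tuning_flags()` and the `vp_tolerance` threshold for "close to the Target VP" are hypothetical; the saturation limits are the upper ends of the Tap 1 and Tap 2 ranges in Table 4.

```python
from dataclasses import dataclass

UT_MAX = 127  # Tap 1 (UT) code range is 0..127 (Table 4)
H2_MAX = 63   # Tap 2 code range is 0..63 (Table 4)


@dataclass
class RxCoefficients:
    target_vp: int
    vp: int
    agc: int
    ut: int
    h2: int


def tx_tuning_flags(c: RxCoefficients, high_crosstalk: bool, vp_tolerance: int = 4):
    """List which TX tuning rules the read-back coefficients violate.

    vp_tolerance is a hypothetical threshold for "close to the Target VP".
    Only the upper rail is treated as saturation here (an assumption).
    """
    flags = []
    if abs(c.vp - c.target_vp) > vp_tolerance:
        flags.append("TXDIFFCTRL: adapted VP is not close to the Target VP")
    if high_crosstalk and c.agc < 16:
        flags.append("TXDIFFCTRL: AGC code below 16 in a high-crosstalk link")
    if c.ut >= UT_MAX:
        flags.append("TXPOSTCURSOR: UT loop saturated")
    if c.h2 >= H2_MAX:
        flags.append("TXPOSTCURSOR: H2 loop saturated")
    return flags
```

Keeping the checks separate from the actions makes it easy to log them across many channels before deciding how to move TXDIFFCTRL or TXPOSTCURSOR.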

RX adaptation loops analysis: actions on TX emphasis
TX emphasis tuning flow chart

Example: Acting on TXPOSTCURSOR for optimal BER

Figure 21 shows a link tuning history in which H2 is initially saturated. Following the diagram of Figure 20, increasing TXPOSTCURSOR brings H2 out of saturation and finally to a proper UT/VP ratio, resulting in an error-free channel.
Progressive tuning of the TXPOSTCURSOR, driven by H2, UT and VP values
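The progressive TXPOSTCURSOR tuning in this example can be sketched as a loop that raises the far-end post-cursor one code at a time and re-reads the near-end coefficients. The `measure()` callback, the toy model in the usage block, and `max_code=31` (which assumes a 5-bit TXPOSTCURSOR port; check UG576/UG578) are all assumptions, not values from this Answer Record.

```python
def tune_postcursor(measure, start=0, max_code=31, h2_max=63):
    """Raise TXPOSTCURSOR until H2 leaves saturation and UT <= VP (sketch).

    measure(code) is a hypothetical callback that programs the far-end
    TXPOSTCURSOR, lets the near-end RX re-adapt and returns (VP, UT, H2).
    Returns (chosen code or None, history of (code, VP, UT, H2) readings).
    """
    history = []
    for code in range(start, max_code + 1):
        vp, ut, h2 = measure(code)
        history.append((code, vp, ut, h2))
        if h2 < h2_max and ut <= vp:
            return code, history
    return None, history


if __name__ == "__main__":
    def toy_link(code):
        """Hypothetical link: more post-cursor lowers H2 and UT."""
        h2 = min(63, max(0, 70 - 5 * code))
        ut = 48 + max(0, 5 - code) * 3
        return 48, ut, h2

    best, steps = tune_postcursor(toy_link)
    for step in steps:
        print(step)
    print("chosen TXPOSTCURSOR:", best)
```

Keeping the history mirrors the progressive tuning plot: the first readings show H2 pinned at its maximum, then UT falling until it reaches VP.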

Link Tuning based on IBIS-AMI Simulation

Garbage in, garbage out

Simulation is a good option for link tuning, as long as the models are accurate and the user analyzes the simulation results critically. Apart from the model precision caveat, this analysis is often performed as a parameter-sweep simulation and can require a large amount of computation time. In a trend analysis, the model parameters are progressively updated, and the measurements of interest are saved in a spreadsheet. Usually the sweep covers the transmitter main cursor and emphasis, while the receiver amplitude and time margins after equalization are recorded. The method has the benefit of quickly showing the trend of the quantities of interest for a channel. (Figure 22)
Example of IBIS-AMI trend analysis
A good habit is to include the crosstalk model and not to forget the transmitter and receiver jitter, power supply noise, and PVT variations. The IBIS-AMI PVT distribution must represent the worst case; for this reason, the simulation result can be conservative. (Figure 23)
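The trend analysis described above is essentially a nested sweep that writes one spreadsheet row per simulation. The sketch below assumes a hypothetical `simulate()` wrapper around one IBIS-AMI run that returns eye height and width after RX equalization; how that wrapper drives the simulator is tool-specific and not covered here. Rows are written as CSV so any spreadsheet can open them.

```python
import csv
import io
from itertools import product


def trend_sweep(simulate, diffctrl_codes, postcursor_codes, out):
    """Sweep TX main cursor and post-cursor and log RX margins as CSV rows.

    simulate(diffctrl, postcursor) is a hypothetical wrapper around one
    IBIS-AMI run that returns (eye_height, eye_width) after RX equalization.
    """
    writer = csv.writer(out)
    writer.writerow(["TXDIFFCTRL", "TXPOSTCURSOR", "eye_height", "eye_width"])
    rows = []
    for diff, post in product(diffctrl_codes, postcursor_codes):
        height, width = simulate(diff, post)
        writer.writerow([diff, post, height, width])
        rows.append((diff, post, height, width))
    return rows


if __name__ == "__main__":
    def fake_sim(d, p):
        """Toy stand-in for the simulator, for illustration only."""
        return 2 * d - abs(p - 4), 0.5 - 0.05 * abs(p - 4)

    buf = io.StringIO()
    rows = trend_sweep(fake_sim, [8, 12], [0, 4, 8], buf)
    print(buf.getvalue())
    print("best by eye height:", max(rows, key=lambda r: r[2]))
```

Each point still costs a full simulation, so the grid should stay coarse at first and be refined only around the promising region.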

The distribution of IBIS-AMI PVT corners is the worst-case distribution

References
Hong Ahn: Link Tuning presentation (STG)
Erik Schidlack: XAPP1295: Automatic Insertion of Debug Logic for Transceivers in Synthesis DCP
Giovanni Guasti, Antonello Di Fresco: XAPP1322: Transceiver Link Tuning

Revision History
05/22/2018 - Initial release