WA 17.6: A Variable-Frequency Parallel I/O Interface with Adaptive Power Supply Regulation

Gu-Yeon Wei, Jaeha Kim, Dean Liu, Stefanos Sidiropoulos¹, Mark Horowitz¹

Computer Systems Laboratory, Stanford University, Stanford, CA
¹ also with Rambus Inc., Mountain View, CA

Adaptive power supply regulation reduces power dissipation in DSP and microprocessor cores [1,2]. A technique extends this concept to a high-performance parallel input/output (I/O) interface. An inverter, used as the basic delay element in the core of a dual-loop delay-locked loop (DLL), has delay controlled by the supply voltage [3]. This control voltage is replicated by a high-efficiency switching supply to power the rest of the interface and to maximize energy-efficient operation.

Figure 17.6.1 presents a detailed diagram of the dual-loop DLL and an adaptive power supply regulator. The core loop locks the delay through the delay line to half a reference clock period, which is equal to the TX_clk period, by adjusting its supply voltage [3]. A unity-gain amplifier buffers the charge pump voltage to drive the delay line, and an efficient switching regulator replicates this voltage to the rest of the system.

To align the on-chip receiver clocks with the incoming data, a finite state machine (FSM), embedded in the peripheral loop of the DLL, controls multiplexors that select adjacent pairs of evenly spaced edges from the core loop. From those edges, digital interpolators generate quadrature phased clock edges [3]. Adjusting the relative drive strength between two sets of tristate inverters enables variable-weight interpolation. A phase-detecting receiver in the feedback path of the peripheral loop optimally places the receive clock by cancelling out the data receiver set-up time.

A performance limitation of this adaptive regulation technique may arise from the supply ripple induced by the switching regulator, which could cause additional timing jitter.
Because the peripheral loop responds with a slew rate higher than that of the induced power supply ripple, however, the implemented DLL design completely compensates for this potential source of jitter, as confirmed by the end-to-end receiver timing margin of the link measured with and without supply ripple.

The transmitter consists of a pmos current-mode driver with a digitally configurable, parallel set of binary-weighted nmos termination transistors (Figure 17.6.2). All swings are referenced to ground to enable communication between chips with different supply voltages. Parallel binary-weighted transmitter legs in the pmos current source adjust output swing magnitude and compensate for lower output currents due to a reduction in gate overdrive at lower supply voltages. Since the transmitter and preceding circuitry in the signal path all operate off of the regulated supply and gate delays track the DLL inverter delay, delays and transition times of edges remain a fixed percentage of the cycle, providing automatic slew rate control over a wide range of operating frequencies. The transmitters can be configured either in single-ended mode, by transmitting a reference voltage along with the single-ended signals, or as pseudo-differential pairs that transmit complementary signals.

A preamplifier followed by a regenerative latch, shown in Figure 17.6.2, receives the transmitted data. All components of the receiver operate off of the regulated supply, and a replica self-biasing scheme adjusts the preamplifier current to ensure its swing remains a fixed percentage of the supply [4]. Receiver bandwidth is set by the product of the output impedance (1/g_m) of the supply-connected nmos loads and the capacitive loading on outputs a and ā. Since the transconductance (g_m) of the loads tracks the g_m of an inverter across voltage, the bandwidth of the preamplifier tracks an inverter delay to first order, and hence the bit rate.
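The first-order tracking argument above can be written out as a short derivation. This is a sketch under stated assumptions rather than an analysis taken from the paper: the load capacitance C_L, the inverter capacitance C_inv, the ratio k between the load and inverter transconductances, and N (the number of inverter delays the DLL holds per bit time) are symbols introduced here for illustration only.

```latex
% Preamplifier time constant and bandwidth (single-pole approximation)
\tau_{\mathrm{pre}} = \frac{C_L}{g_{m,\mathrm{load}}}, \qquad
f_{3\mathrm{dB}} = \frac{1}{2\pi\,\tau_{\mathrm{pre}}}
                 = \frac{g_{m,\mathrm{load}}}{2\pi\,C_L}

% Assumptions: the load g_m tracks the inverter g_m, and the inverter
% delay is set to first order by its own C/g_m
g_{m,\mathrm{load}} = k\,g_{m,\mathrm{inv}}, \qquad
t_{\mathrm{inv}} \approx \frac{C_{\mathrm{inv}}}{g_{m,\mathrm{inv}}}
\;\Longrightarrow\;
f_{3\mathrm{dB}} \approx \frac{k}{2\pi}\,\frac{C_{\mathrm{inv}}}{C_L}\,
                         \frac{1}{t_{\mathrm{inv}}}

% Assumption: the DLL holds the bit time at N inverter delays
T_{\mathrm{bit}} = N\,t_{\mathrm{inv}}
\;\Longrightarrow\;
f_{3\mathrm{dB}} \approx \frac{k\,N}{2\pi}\,\frac{C_{\mathrm{inv}}}{C_L}\,
                         \frac{1}{T_{\mathrm{bit}}}
```

Under these assumptions the prefactor depends only on ratios of capacitances and transconductances, so the bandwidth stays a fixed multiple of the bit rate as the regulated supply changes.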
Setting the preamplifier bandwidth close to the clock rate of the incoming data filters out unwanted high-frequency noise. A pair of nmos gates configures the preamplifier to receive either differential signals or a single-ended signal with a reference shared by all the receivers.

The I/O interface is fabricated in a MOSIS 0.35µm process. The prototype die micrograph is shown in Figure 17.6.3 and performance is summarized in Figure 17.6.4. The test chip consists of four parallel sets of data receiver and transmitter pairs and one clock link. 8b data sequence generators and a 20b pseudo-random bit sequence (PRBS) generator and verifier test functionality and performance of the links.

A digital PID controller matches the regulated supply to the input control voltage through on-chip power transistors and an off-chip inductor and capacitor pair that form a buck converter [1]. A combination of shifters, adders, and latches implements the PID control functions on binary representations of the control and regulated voltages, as shown in Figure 17.6.5. Effective A/D conversion is achieved by counting pulses out of a long delay chain ring, operating at the control voltage, over a fixed period to generate the higher-order bits of the digital output. The lower-order bits are determined by reading out the contents of the delay chain at the end of each period to detect how far an edge has propagated through it [2]. The D/A conversion from the PID output to a pulse-width-modulated (PWM) square wave input to the buck converter employs a similar mechanism. This technique improves the resolution of the A/D and D/A to a fraction of a ring-oscillator period and significantly reduces switching activity, consuming only 1.5mW of total overhead power.

The I/O interface successfully operates over a frequency range from 200Mb/s to 1Gb/s, with regulated voltages ranging from 1.2V to 3.2V.
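The two-step A/D conversion described above (counting ring pulses for the higher-order bits, then reading the delay chain for the lower-order bits) can be illustrated with a small behavioral model. This is a minimal sketch, not the authors' circuit: the function name `ring_adc`, the 16-stage ring, the power-of-two stage count, the picosecond units, and the simplification of one counted pulse per lap of the ring are all assumptions made for illustration.

```python
def ring_adc(stage_delay_ps: int, window_ps: int, stages: int = 16) -> tuple[int, int, int]:
    """Behavioral model of a counter-plus-delay-chain A/D conversion.

    stage_delay_ps: delay of one ring stage at the sampled voltage
                    (a higher voltage gives a shorter delay); assumed input.
    window_ps:      fixed measurement period.
    stages:         number of stages in the ring (assumed power of two).

    Returns (code, coarse, fine).
    """
    if stages <= 0 or stages & (stages - 1):
        raise ValueError("stages must be a positive power of two")
    if stage_delay_ps <= 0 or window_ps < 0:
        raise ValueError("delays must be positive")

    fine_bits = stages.bit_length() - 1

    # Total number of stage delays the edge traverses during the window.
    total_stage_delays = window_ps // stage_delay_ps

    # Higher-order bits: complete laps of the ring, i.e. counted pulses.
    coarse = total_stage_delays // stages

    # Lower-order bits: how far the edge has propagated into the chain
    # when its contents are read out at the end of the window.
    fine = total_stage_delays % stages

    code = (coarse << fine_bits) | fine
    return code, coarse, fine


if __name__ == "__main__":
    # A slower stage (lower voltage) yields a smaller code than a faster one.
    print(ring_adc(100, 10_000))
    print(ring_adc(80, 10_000))
```

In this model, counting laps alone would resolve the voltage only to one full trip around the ring (16 stage delays), while reading the edge position refines the result to a single stage delay, which mirrors the "fraction of a ring-oscillator period" resolution stated in the text.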
Figure 17.6.6 plots the total regulated power consumed per link operating in differential and single-ended configurations versus clock frequency and presents the power savings potential of adaptive supply regulation. It also shows that single-ended transmitters require more than double the swing, and hence consume more power than differential transmitters, which require a pair of links. A breakdown of the power consumed by the test chip in Figure 17.6.7 shows that the CV²f power of the clock distribution, test logic, and DLL dominates. Therefore, considerable energy savings are possible when operating at lower speeds.

Figure 17.6.8 plots an eye diagram of a PRBS transmitted at 0.8Gb/s. The transmitters are designed with rising and falling transitions that each consume approximately 30% of the cycle time. The figure also lists transition times as a percentage of the cycle time versus bit rate, showing the automatic slew rate control enabled by adaptive regulation of the supply with frequency.

References:
[1] Wei, G., M. Horowitz, "A Fully Digital, Energy Efficient, Adaptive Power-supply Regulator," IEEE JSSC, pp. 520-528, April 1999.
[2] Chandrakasan, A., et al., "Data Driven Signal Processing: An Approach for Energy Efficient Computing," IEEE ISLPED, pp. 347-352, Aug. 1996.
[3] Sidiropoulos, S., M. Horowitz, "A Semidigital Dual Delay-locked Loop," IEEE JSSC, pp. 1683-1692, Nov. 1997.
[4] Johnson, M., "Bias Circuit and Differential Amplifier having Stabilized Output Swing," US Patent 5452898, Sep. 1995.
Figure 17.6.1: Dual-loop DLL and adaptive power supply regulator.
Figure 17.6.2: pmos current mode transmitter and frequency tracking receiver.
Figure 17.6.3: Test chip micrograph.
Figure 17.6.4: Test chip performance summary.
Figure 17.6.5: Digitally-controlled power supply regulator.
Figure 17.6.6: Regulated power per link and minimum swing vs. frequency. Overhead represents average power consumed by the DLL, clock distribution and test circuitry amortized across 5 links.
Figure 17.6.7: Total I/O power breakdown at 0.8Gb/s and 2.7V regulated supply.
Figure 17.6.8: Data-eye diagram and slew rate vs. bit rate.