Distributed clock generator for synchronous SoC using ADPLL network

Eldar Zianbetov, Dimitri Galayko, François Anceau, Mohammad Javidan, Chuan Shan, Olivier Billoint, Anton Korniienko, Eric Colinet, Gérard Scorletti, Jean-Michel Akre, et al.

To cite this version: Eldar Zianbetov, Dimitri Galayko, François Anceau, Mohammad Javidan, Chuan Shan, et al. Distributed clock generator for synchronous SoC using ADPLL network. Custom Integrated Circuits Conference (CICC), 2013 IEEE, Sep 2013, San Jose, CA, United States. IEEE, pp. 1-4, 2013.

E. Zianbetov¹, D. Galayko¹, F. Anceau¹, M. Javidan², C. Shan¹, O. Billoint³, A. Korniienko², E. Colinet³, G. Scorletti², J. M. Akré¹, J. Juillard⁴
¹UPMC, LIP6 lab, Paris, France
²Ampère lab, Lyon, France
³CEA-LETI, Grenoble, France
⁴Supélec, Gif-sur-Yvette, France
eldar.zianbetov@lip6.fr

Abstract
This paper presents a novel on-chip clock generation architecture in which a network of oscillators is synchronized by a network of all-digital PLLs (ADPLLs). In the implemented prototype, 16 local clock generators are synchronized by the ADPLL network, with an output frequency of 1.1-2.4 GHz. The synchronization error between neighboring clock domains is less than 60 ps. The fully digital architecture offers flexibility and efficient synchronization control, which makes it suitable for synchronous SoCs.

I. INTRODUCTION
Clock generation and distribution are among the main challenges in the design of modern large-scale SoCs [1]. The growing relative dimensions of digital SoCs, together with power limitations, make centralized clocking techniques prohibitive. Chip-wide clock distribution requires long transmission lines whose delays must be precisely controlled. This is very difficult to achieve without unacceptable power consumption. It is the main reason for the popularity of globally asynchronous SoC architectures. These architectures, however, have fundamental drawbacks related to reliability and verification, and they suffer from reduced communication speed.

Distributed clock generators are based on local oscillators spread over the chip area. Each oscillator drives a local clock tree and is coupled to its immediate neighbors so that all oscillators have the same phase.
The network of oscillators is dense enough that: (i) the geometric distance between neighboring oscillators is small enough for the delays of the network links to be negligible, (ii) the clock signal inside each clocking domain can be distributed by conventional techniques without difficulty, and (iii) synchronous communication between neighboring zones is possible as long as the corresponding local oscillators are synchronous in phase. The long clock distribution lines of conventional architectures are thus replaced by short local network links that connect small local clock trees. When the local oscillators are coupled in the phase domain, the system is called a distributed PLL network. Because it is more compatible with digital circuits, such a network is a good candidate for on-chip distributed clock generation.

This paper presents a first digital implementation of an array of 16 oscillators coupled through a network of all-digital phase-locked loops (ADPLLs), intended for distributed clock generation. The proof-of-concept chip, which generates a 1-. GHz clock, is implemented in 65 nm CMOS technology. The theoretical basis for this system was provided by the studies [2], [3], [4]. The only previous silicon implementation of this concept employed a network of analog PLLs [5].

Fig. 1. Architecture of the ADPLL network and of a single node (i, j). Each node comprises PFDs, an error combiner, a loop filter, the local oscillator and a /N divider.

II. DISTRIBUTED CLOCKING ARCHITECTURE
A. System description
The structure of the clocking network is presented in Fig. 1. The local clocks are generated by digitally controlled oscillators (DCOs). 24 digital phase-frequency detectors (PFDs) measure the timing error between each pair of neighboring DCOs.
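The neighbor structure of a 4 × 4 array can be made concrete with a small Python sketch. The sketch rests on several assumptions. Nodes are indexed (row, column), with node (0, 0) in the upper-left corner next to the reference PFD. Each pair of horizontally or vertically adjacent nodes shares one PFD. The start-up unidirectional configuration of Section II.C is encoded as "each node listens only to its upper and left neighbors". The function names are illustrative and do not come from the design. The sketch counts the inter-node PFDs and the number of PFDs feeding each node. It then uses Kahn's topological sort to check that the unidirectional configuration has no directed cycle while the bidirectional one does.

```python
from collections import Counter
from itertools import product

N = 4  # 4 x 4 array of nodes, as in the 16-node prototype


def neighbor_links(n):
    """Undirected links between horizontally or vertically adjacent nodes.

    Each link stands for one PFD comparing two neighboring DCOs.
    The first node of every pair is the upper or the left one.
    """
    links = []
    for i, j in product(range(n), range(n)):
        if j + 1 < n:                          # right neighbor
            links.append(((i, j), (i, j + 1)))
        if i + 1 < n:                          # lower neighbor
            links.append(((i, j), (i + 1, j)))
    return links


def directed_edges(n, bidirectional):
    """Edges src -> dst meaning 'dst uses the error measured against src'.

    Unidirectional: every node listens only to its upper and left neighbors,
    so information flows from the upper-left node toward the lower-right one.
    Bidirectional: the reverse links are enabled as well.
    """
    edges = []
    for upper_left, other in neighbor_links(n):
        edges.append((upper_left, other))
        if bidirectional:
            edges.append((other, upper_left))
    return edges


def has_cycle(nodes, edges):
    """Kahn's algorithm: the graph is acyclic iff every node can be sorted."""
    indegree = {v: 0 for v in nodes}
    successors = {v: [] for v in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    ready = [v for v in nodes if indegree[v] == 0]
    sorted_count = 0
    while ready:
        v = ready.pop()
        sorted_count += 1
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    return sorted_count != len(nodes)


nodes = list(product(range(N), range(N)))
links = neighbor_links(N)
pfds_per_node = Counter(v for link in links for v in link)

print(len(links))                                       # 24
print(sorted(Counter(pfds_per_node.values()).items()))  # [(2, 4), (3, 8), (4, 4)]
print(has_cycle(nodes, directed_edges(N, False)))       # False
print(has_cycle(nodes, directed_edges(N, True)))        # True
```

The per-node counts leave out the reference PFD at node (0, 0). Including it gives that corner node three error inputs, which is still within the "up to 4" inputs handled by each loop filter.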
The network is coupled to the external reference clock through a PFD placed in the upper-left corner of the network. The digital error signals from the PFDs are processed by digital proportional-integral (PI) loop filters. Each filter processes the weighted sum of the errors from up to 4 PFDs, depending on the topological position of the node, and generates the digital control code for the DCO. The control objective of the filter is to keep the sum of the errors close to zero. If properly designed, such a network synchronizes to the phase of the reference clock.

Fig. 2. Mode-lock elimination technique: (a) power-up in unidirectional configuration and, after convergence, switching to (b) bidirectional configuration.

B. Stability issues
The paramount question of the stability of such a complex dynamical system was addressed in theoretical studies using control-theory tools [3], [4]. These studies proposed a formal stability proof, together with an algorithm for choosing the block parameters. In addition, several studies of PLL networks highlight the existence of multiple synchronized modes. In these modes, all oscillators have the same frequency and fixed (zero or nonzero) phase errors. Only the mode with zero phase error is suitable for the clocking application. Selecting the desired synchronized mode is particularly easy in a digital PLL network because the network can be reconfigured.

C. Desirable mode selection
The proposed method is based on on-the-fly dynamic reconfiguration of the network [8]. The reconfiguration procedure is performed during start-up and consists of two steps.

Step 1. The clocking network is powered up and programmed into a unidirectional configuration by enabling or disabling the feedback links between nodes. For example, each node receives error information only from its upper and left neighbors (Fig. 2(a)). In this mode, information about the reference frequency and phase propagates from the upper-left node to the lower-right corner of the network. Because this configuration contains no cycles of information propagation, undesired locking cannot occur. However, in this operating mode the suppression of perturbations is weak [8], and the clocking network accumulates errors.
These errors grow with the distance from the reference point and introduce an undesired skew.

Step 2. Once the network is synchronized, even with some timing errors, it is programmed into a bidirectional configuration (Fig. 2(b)). In this mode the reverse links are activated. The network then operates in a fully synchronous mode in which distributed feedback (coupling) maintains the synchronization. Since the required synchronization mode has already been selected, the timing errors between neighboring local clocks are then close to zero.

Fig. 3. PFD: (a) block diagram and (b) its transfer function.

III. NODE ARCHITECTURE
A. Phase-frequency detector overview
The PFD is an analog-to-digital converter that quantizes the synchronization error into a 5-bit signed digital number. As its transfer function shows (Fig. 3(b)), its range is limited to ±φr. This bound is derived from precision and hardware-complexity constraints. The detailed block diagram of the PFD is shown in Fig. 3(a). The PFD consists of two parts: a bang-bang phase detector (BB), which measures the sign of the phase error, and a time-to-digital converter (TDC), which quantizes the absolute time error between the two clocks. The arithmetic block combines the outputs of these two blocks. It produces two signed binary signals (straight and inverted), which are then used by the local and neighboring nodes.

The bang-bang detector is a state machine that decides which of the two input clock rising edges arrived first. An arbiter circuit and a metastability filter make it robust to metastability effects []. The TDC is based on a tapped delay line followed by a sampling register. Its delay elements are CMOS buffers with a delay of 3 ps.

B. Digitally controlled oscillators
The implemented DCOs are CMOS ring oscillators that use a width-modulation technique for digital frequency tuning [6], [7]. Their structure is based on a 7-stage ring oscillator (Fig. 4) with a parallel connection of tuning inverters (Fig. 4, CTI0-CTI6 and FTI0-FTI2) in each stage. The main inverters (Fig. 4, MI0-MI6) are always active and define the lowest oscillation frequency. The tuning inverters are distributed over all stages of the oscillator and divided into two arrays: 256 coarse-tuning (CTI) and 3 fine-tuning (FTI) inverters. They provide frequency tuning steps of 6 MHz and 1.5 MHz, respectively, for a total of 256 × 4 = 1024 steps. The cells are controlled by three thermometer codes obtained from binary-to-thermometer decoders. An appropriate choice of the control algorithm guarantees the monotonicity of the code-frequency characteristic. The oscillator, designed in 65 nm technology, has a frequency tuning range of 1-. GHz.

Fig. 4. Digitally controlled oscillator based on a 7-stage CMOS ring oscillator.

C. Error processing
Each node processes its errors in two steps: first an error-combining block, then a loop filter (Fig. 5). The first block receives up to four 5-bit two's-complement errors. These errors pass through four variable-gain blocks and are then summed by a four-input adder. The weighting coefficients of the variable gains, Kw1 to Kw4, are programmable. Each gain independently takes a value from the set {0, 1, 2, 4} and is implemented as a binary shift, which introduces very little delay. Programming these coefficients therefore controls the connectivity between the nodes of the network; a weight of 0 disconnects the corresponding link. The four-input adder operates on 7-bit operands and produces a 9-bit sum, which is buffered in a register. Note that each node is a self-sampled system: the filter is sampled with the locally generated clock divided by 8, and the PFDs compare the clocks at this rate.

The PI filter processes the 9-bit sum of the errors. It has coefficients Kp and Ki, programmable with respectively and 1-bit resolution. Both blocks were designed in a standard digital design flow using standard cells.

Fig. 5. The error combiner and proportional-integral filter.

The measured initial frequencies of the oscillators are distributed within a 4 MHz range, which provides good conditions for a fast start-up of the network. This spread is explained mainly by the sensitivity of the oscillators to the supply voltage, measured to be 900 MHz/V.
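The following short Python sketch shows what the measured 900 MHz/V sensitivity means in practice. It estimates how far a DCO would drift at a fixed control code under a ±10 % supply variation, and whether the loop could still compensate. It relies on several assumptions:

- a nominal supply of 1.2 V;
- a linear sensitivity that does not depend on the control code;
- the 1.6 GHz operating point used in the measurements;
- a tuning range of roughly 1.1-2.4 GHz.

The supply value in particular is an illustrative assumption, not a figure stated in this text. The 6 MHz coarse step is the one quoted for the DCO.

```python
# Back-of-the-envelope check of the DCO supply sensitivity.
# Values marked ASSUMED are illustrative and not taken from the paper.

SENSITIVITY_MHZ_PER_V = 900.0        # measured DCO supply sensitivity
SUPPLY_TOLERANCE = 0.10              # +/-10 % supply variation
VDD_NOMINAL_V = 1.2                  # ASSUMED nominal supply voltage
F_TARGET_MHZ = 1600.0                # operating frequency used in the measurements
TUNING_RANGE_MHZ = (1100.0, 2400.0)  # ASSUMED approximate DCO tuning range
COARSE_STEP_MHZ = 6.0                # coarse tuning step of the DCO


def supply_shift_mhz(sensitivity, vdd, tolerance):
    """Frequency shift at a fixed control code for a +/-tolerance supply change,
    assuming a linear, code-independent sensitivity."""
    return sensitivity * vdd * tolerance


shift = supply_shift_mhz(SENSITIVITY_MHZ_PER_V, VDD_NOMINAL_V, SUPPLY_TOLERANCE)

# To keep the output at F_TARGET_MHZ, the loop must select a code whose
# nominal-supply frequency is F_TARGET_MHZ -/+ shift.
needed_low = F_TARGET_MHZ - shift    # supply high: DCO runs fast, ask for less
needed_high = F_TARGET_MHZ + shift   # supply low: DCO runs slow, ask for more
within_range = TUNING_RANGE_MHZ[0] <= needed_low and needed_high <= TUNING_RANGE_MHZ[1]
coarse_steps = round(shift / COARSE_STEP_MHZ)

print(f"shift: +/-{shift:.0f} MHz")                             # shift: +/-108 MHz
print(f"codes needed: {needed_low:.0f}-{needed_high:.0f} MHz")  # codes needed: 1492-1708 MHz
print("within tuning range:", within_range)                     # within tuning range: True
print("coarse steps to compensate:", coarse_steps)              # coarse steps to compensate: 18
```

Under these assumptions, the drift is about ±108 MHz, or some 18 coarse steps, which stays well inside the tuning range. An actual check would use the measured code-frequency characteristic instead of a linear model.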
Despite this supply sensitivity, the desired frequency adjustment range and the convergence of the network are ensured under ±10% supply variation.

IV. TEST CIRCUIT DESIGN
A prototype of the distributed clock generator with 16 nodes has been designed and manufactured in 65 nm CMOS technology. It has an area of 2 mm², of which the clocking network itself occupies 0.8 × 0.9 mm² (Fig. 6). Besides the clocking network, the on-chip digital circuitry includes a design-for-test (DFT) block, plus a separate PFD and bang-bang detector used to characterize these blocks. The microphotograph of the fabricated silicon prototype is presented in Fig. 6.

Fig. 6. Die microphotograph.

V. MEASUREMENT RESULTS
The goals of the experiments were to characterize the phase synchronization between the DCOs and to investigate the sensitivity of the network to various perturbations.

The achieved average nominal frequency in the clocking domains is 144 MHz, while the tuning range is 1100-2380 MHz. Fig. 7 presents the captured waveforms of the local clocks, divided by 16, when the network is synchronized. The observed timing errors between neighboring clocks were in the range of 30-60 ps at a 1.6 GHz output frequency. This is less than 10% of the 625 ps clock period. Increasing the PFD resolution would improve this result. As predicted by theory, the error is a zero-centered random process, i.e., the systematic skew is zero.

Fig. 7. Captured phase-locked divided clocks (output voltage [V] versus time [ns]).

An important result is that over 00 network start-ups we did not observe any undesired locking state. This result contradicts modeling and theory. The possibility of mode-locking is one of the particular properties of PLL networks: it is always mentioned in theoretical studies [2] and was reproduced in a prototype [9]. The search for mode-lock states was therefore repeated with a reduced network configuration, in which mode-locking should theoretically occur with high probability. In this configuration, over 00 perturbation attempts by global reset, we observed mode-locking 4 times. One of these states was captured and is shown in Fig. 8.

Fig. 8. Captured divided clocks in mode-lock state.

Fig. 9 shows the transient process in one of the oscillators. To study the robustness of the network, a perturbation was introduced at t = 8 µs. After this perturbation, the clocking network returns to the reference frequency and phase after 1 µs. Techniques more efficient than simple PI filtering could increase the frequency acquisition speed.

Fig. 9. Transient process in node 11 (output frequency [MHz] versus time [µs]; phases: reference frequency acquisition, reference phase tracking, perturbation at 8 µs).

TABLE I
NETWORK TEST CHIP MEASUREMENTS SUMMARY AND COMPARISON
Parameter | [5] | This work
Number of nodes | 16 | 16
Central frequency of the SCA, MHz | 1200 | 144
Frequency range, MHz | 1100-1300 | 1100-2380
Timing error between neighbor nodes, ps | 30 | < 60 (@ Fclk = 1.6 GHz)
Power consumption, mW | 390 | ≈186
Technology, nm | 350 | 65
Clocking core area, mm² | - | 0.72
Chip area, mm² | 9 | 2.04
Circuitry nature | analog | digital

Several experiments were carried out to check the robustness of the network operation when the block parameters vary.
In particular, the network was tested under a 10% variation of the filter coefficients, and no degradation in the quality of oscillator synchronization was observed. The power consumption of the clocking network was measured at a 1.6 GHz oscillation frequency under a 1. V supply voltage. The PFDs and PI filters consume 32 mW (2 mW per node). The DCO consumption is 9.8 mA per node (6.1 mW/GHz). Power optimization of the DCO was not an objective of this prototype, and a more elaborate design could achieve better results. Table I summarizes the measured performance and compares it with the existing implementation of a distributed clock generator [5].

VI. CONCLUSION
A distributed clock generator for synchronous SoCs, based on a network of oscillators coupled in phase, has been demonstrated. The oscillators are synchronized by the ADPLL network. The problem of undesirable synchronization modes is solved by dynamically reconfiguring the network interconnection topology at start-up. The proposed system is compatible with the digital environment, is flexible to reconfigure, and allows advanced control over clock generation. The fabricated prototype has demonstrated the reliability of the proposed clock generation methodology. It has 16 nodes and operates in a frequency range of 1.1-2.4 GHz. The measured timing error between neighboring clocking domains of the circuit is less than 60 ps.

ACKNOWLEDGMENTS
The authors would like to thank the members of CEA-Leti for their help with the experiments on the fabricated circuit. This work was supported by the French National Research Agency within the framework of the HERODOTOS research project.

REFERENCES
[1] N. Kurd, P. Mosalikanti, M. Neidengard, J. Douglas, and R. Kumar, "Next generation Intel Core micro-architecture (Nehalem) clocking," IEEE J. Solid-State Circuits, vol. 44, no. 4, pp. 1121-1129, 2009.
[2] G. Pratt and J. Nguyen, "Distributed synchronous clocking," IEEE Trans. Parallel and Distributed Systems, vol. 6, no. 3, pp. 314-328, 1995.
[3] A. Korniienko et al., "Control law synthesis for distributed multi-agent systems: Applications to active clock distribution network," Automatic and Control Conference, San Francisco, CA, 2010.

[4] J. M. Akre, J. Juillard, D. Galayko, and E. Colinet, "Synchronization analysis of networks of self-sampled all-digital phase-locked loops," IEEE Trans. Circuits and Systems I, vol. 59, no. 4, pp. 708-720, 2012.
[5] V. Gutnik and A. P. Chandrakasan, "Active GHz clock network using distributed PLLs," IEEE J. Solid-State Circuits, vol. 35, no. 11, pp. 1553-1560, 2000.
[6] E. Zianbetov et al., "A digitally controlled oscillator in a 65-nm CMOS process for SoC clock generation," IEEE Int. Symp. on Circuits and Systems (ISCAS), pp. 84-848, 2011.
[7] J. A. Tierno et al., "A wide power supply range, wide tuning range, all static CMOS all digital PLL in 65 nm SOI," IEEE J. Solid-State Circuits, vol. 43, no. 1, Jan. 2008.
[8] M. Javidan et al., "All-digital PLL array provides reliable distributed clock for SOCs," IEEE Int. Symp. on Circuits and Systems (ISCAS), pp. 89-9, 2011.
[9] C. Shan et al., "FPGA implementation of reconfigurable ADPLL network for distributed clock generation," IEEE Int. Conf. on Field-Programmable Technology (FPT), pp. 1-4, 2011.