JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.17, NO.6, DECEMBER, 2017
ISSN(Print) 1598-1657, ISSN(Online) 2233-4866
https://doi.org/10.5573/jsts.2017.17.6.825

An All-digital Delay-locked Loop using a Lock-in Pre-search Algorithm for High-speed DRAMs

Jongsun Kim

Abstract: A new harmonic-free, fast-locking, all-digital delay-locked loop (DLL) that uses a lock-in pre-search (LPS) algorithm is presented for DDR3 and DDR4 SDRAMs. By adopting a new LPS algorithm that changes the propagation delay of the coarse delay line (CDL) in five delay steps, the DLL finds the approximate locking point before normal operation. The DLL then performs a binary search and a sequential search to achieve fast locking without the harmonic lock problem. Fabricated in a 0.13-µm CMOS process, the simple digital DLL architecture achieves a wide frequency range of 0.15 to 2.2 GHz and a measured peak-to-peak clock jitter of 7 ps at 2.2 GHz. It achieves a maximum locking time of 52 clock cycles, consumes 3.1 mW at 1 GHz from a 1.2 V supply, and occupies an active area of 0.046 mm².

Index Terms: Delay-locked loop, DLL, DRAM, digital DLL, DDR3, DDR4, SDRAM

Manuscript received Feb. 19, 2017; accepted Aug. 11, 2017
School of Electronic and Electrical Engineering, Hongik University
E-mail: js.kim@hongik.ac.kr

I. INTRODUCTION

Currently, double data rate 3 (DDR3) and double data rate 4 (DDR4) synchronous dynamic random access memories (SDRAMs) are the most widely used low-cost main memory solutions for personal computers, servers, and other embedded system applications. One of the main challenges in the design of DDR3 and DDR4 SDRAMs is the implementation of a low-cost, all-digital delay-locked loop (DLL) that can operate over a wide frequency range from 0.3 GHz to 1.6 GHz. In addition, the DDR3/DDR4 DLL [1-4, 7] requires a fast locking time of less than 512 clock cycles and a supply voltage of 1.2 V or lower, while maintaining small area, low power consumption, and low jitter. Table 1 compares the key specifications of DDR-x SDRAMs.

Table 1. Comparison of DDR-x SDRAM specifications

                                  DDR1      DDR2        DDR3              DDR4
Min./Max. Clock Frequency (MHz)   100/200   200/400     400/800           800/1600
Data Rate per pin (Mbps)          200-400   400-800     800-1600          1600-3200
Max. Transfer Rate (GB/s)         3.2       6.4         14.9              21.3
Supply Voltage (V)                2.5       1.8         1.5 (1.35 DDR3L)  1.2
Memory Interface Standard         SSTL_2    SSTL_1.8    SSTL_1.5          POD_1.2
DLL Locking Time (tDLLK)          -         200 cycles  512 cycles        597 cycles @1333 MHz
tCK, DLL Enabled                  -         -           300-800 MHz       667-1600 MHz
Release (year)                    2000      2003        2007              2013

Although many digital DLLs have been introduced for DDR3 and DDR4 SDRAMs [1-4], only a handful can support the frequency ranges of both DDR3 and DDR4 simultaneously. The maximum operating frequency of Ref. [2] is only 800 MHz, so it cannot cover the operating frequency range required by DDR3 and DDR4. A major drawback of Ref. [4] is that it does not consider fast locking issues. Therefore, the DLL architecture of Ref. [4] is not applicable to DDR4 because it cannot satisfy the fast locking time specification.

Since the DDR-x SDRAMs have a long clock distribution network (CDN) connected to the output drivers (DQs) of the SDRAM, the replica clock path (RCP), located in the feedback path of the DLL, should be considered in the design of DLL architectures for skew cancellation [2]. Unfortunately, many DLLs, such as [5, 6, 8], did not consider the RCP overhead in their architecture design. Moreover, the harmonic lock problem must be eliminated to reduce the power consumption and the clock jitter [2, 5, 8-10].

Fig. 1. Proposed all-digital DLL using a lock-in pre-search (LPS) algorithm: (a) overall architecture, (b) locking process when the locking point is between Step 3 and Step 4.

This paper proposes a new low-cost, wide-range, all-digital DLL that is suitable for use in both DDR3 and DDR4 SDRAMs [13]. To achieve both a wide frequency range and fast locking capability without the harmonic lock problem, the proposed DLL uses a new lock-in pre-search (LPS) algorithm. Compared to the anti-harmonic algorithms introduced in [5, 8], which are very complex and sensitive to supply noise, the proposed LPS is simple and noise-tolerant, and it finds the approximate locking point more easily. Regardless of the propagation delay in the RCP, the proposed DLL achieves a wide frequency range from 0.15 GHz to 2.2 GHz with a maximum locking time of only 52 clock
cycles without the harmonic lock problem.

This paper is organized as follows. Section II describes the proposed all-digital DLL architecture. Section III shows the implementation results of the fabricated DLL chip. Finally, the conclusions are given in Section IV.

II. PROPOSED DIGITAL DLL ARCHITECTURE

Fig. 1(a) illustrates the proposed all-digital DLL, which consists of a digitally controlled delay line (DCDL) comprising a coarse delay line (CDL) and a fine delay line (FDL), DCDL control logic, a 1/4 frequency divider, and a phase detector (PD). The DCDL control logic includes a 4-bit ring counter, a 9-bit successive approximation register (SAR), a mode control block, and a 5-to-32 thermometer decoder.

Fig. 1(b) shows the initial locking process of the proposed DLL using the LPS and binary search (BS) modes. When the DLL begins operation, it first enters the LPS mode: the DCDL control logic finds the approximate locking point by changing R[3:0] of the 4-bit ring counter from [0000] to [1000] in five steps, as shown in Fig. 1(b). In Fig. 1(b), we assume that the locking point is located between Step 3 and Step 4. The R[3:0] code bits are loaded into the 4 most significant bits (MSBs) of the 9-bit SAR, S[8:5], at every rising edge of CLK_4, which runs at one quarter of the frequency of CLK_IN. Then the 5 MSBs of the SAR, S[8:4], are converted to the thermometer code C[31:0] by the 5-to-32 decoder, with an initial value of S[4] = [1].

Fig. 2. Proposed CDL structure and the number of active DEs in the LPS mode.

Fig. 2 depicts the operation of the LPS mode in greater detail. The CDL consists of 32 cascaded NAND-based delay elements (DEs), and the C[31:0] bits control the number of active DEs. Depending on the C[31:0] bits, the number of active DEs changes from 1 (at Step 1) to 16 (at Step 5). The LPS algorithm is simple and insensitive to supply noise.
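
As an illustration only, the LPS code sequence can be modeled in a few lines of Python. This is a minimal behavioral sketch, not the fabricated control logic. The names LpsStep, lps_steps and lps_search are hypothetical, and the intermediate codes follow the step-by-step walkthrough of Fig. 1 and Fig. 2 in this section. The sketch assumes an ideal PD whose Comp output goes high once the CDL delay reaches the required delay, a thermometer code with one '1' per active DE, and a DE delay of about 120 ps. The behavior when Comp is already high at Step 1, or never goes high within five steps, is also an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

T_DE_PS = 120.0  # approximate delay of one DE in ps (value quoted in the paper)


@dataclass(frozen=True)
class LpsStep:
    step: int         # LPS step number, 1..5
    r: int            # ring-counter code R[3:0]
    s_msb: int        # SAR bits S[8:4]
    active_des: int   # number of active DEs in the CDL
    thermometer: int  # C[31:0]; assumed to hold one '1' per active DE


def lps_steps() -> List[LpsStep]:
    """Enumerate the five LPS steps: R[3:0] = 0000, 0001, 0010, 0100, 1000."""
    steps = []
    for i in range(5):
        r = 0 if i == 0 else 1 << (i - 1)
        s4 = 1 if r == 0 else 0      # S[4] = 1 only in the initial state (Step 1)
        s_msb = (r << 1) | s4        # S[8:5] is loaded from R[3:0]; S[4] is the LSB
        active = s_msb               # gives 1, 2, 4, 8, 16 active DEs
        steps.append(LpsStep(i + 1, r, s_msb, active, (1 << active) - 1))
    return steps


def lps_search(required_delay_ps: float) -> Tuple[Optional[LpsStep], LpsStep]:
    """Return (step at which Comp first goes high, step that seeds the binary search).

    Comp is modeled as high once the CDL delay reaches the required delay,
    i.e. CLK_OUT lags CLK_IN. This ideal PD is an assumption of the sketch.
    """
    steps = lps_steps()
    for i, s in enumerate(steps):
        if s.active_des * T_DE_PS >= required_delay_ps:
            return s, steps[max(i - 1, 0)]  # fall back to the previous step
    return None, steps[-1]                  # Comp never went high in five steps


if __name__ == "__main__":
    for s in lps_steps():
        print(f"Step {s.step}: R={s.r:04b} S[8:4]={s.s_msb:05b} DEs={s.active_des}")
    hit, start = lps_search(700.0)
    print("Comp high at Step", hit.step, "-> binary search starts from Step", start.step)
```

With a required delay of 700 ps, which lies between four and eight DE delays, the model sees Comp go high at Step 4 and seeds the binary search from Step 3, matching the case assumed in Fig. 1.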
With this algorithm, the harmonic lock problem is inherently avoided because the DLL searches for the locking point by monotonically increasing the CDL delay in five steps before normal operation.

Referring to Fig. 1 and Fig. 2, the DLL starts at Step 1 with one active DE (#1), where the control codes are R[3:0] = [0000] and S[8:4] = [00001]. In Step 1, CLK_OUT is at point A, as shown in Fig. 2. Since CLK_OUT leads CLK_IN at point A, the PD output signal Comp remains at logic low, which means the DLL needs to increase the DCDL delay for phase locking. Therefore, the ring counter moves to Step 2 by increasing the number of active DEs to two (#1 and #2) with R[3:0] = [0001] and S[8:4] = [00010], which moves CLK_OUT to point B. Since Comp remains low in Step 2, the LPS mode moves to Step 3 (S[8:4] = [00100]), where the number of active DEs becomes four (#1 to #4) and CLK_OUT is at point C. When the LPS mode moves to Step 4 (S[8:4] = [01000]) with eight active DEs (#1 to #8), CLK_OUT is at point D and Comp becomes logic high. This means that CLK_OUT lags CLK_IN and the proper locking point lies between points C and D. Then the binary search mode enable (BSM_EN) signal becomes logic high, and this completes the LPS mode
and performs the BS mode by resetting the S[8:4] bits to the previous state (Step 3, with S[8:4] = [00100]) at the rising edge of CLK_4. Here, when the DLL enters the BS mode with S[8:0] = [001000000], only the six least significant bits (LSBs), S[5:0], are used for controlling the binary search, because the highest bit set to 1 is S[6]. After the BS mode, the 9-bit SAR is transformed into a sequential counter and the DLL performs the sequential search, maintaining a closed loop to preserve the fine phase locking.

The variable propagation delay of the FDL is controlled by the 4 LSBs of the SAR, S[3:0]. To support the frequency ranges (0.3 to 1.6 GHz) of both DDR3 and DDR4, the programmable DLL delay needs to cover the range from 0.625 ns to 3.33 ns, that is, one clock period at 1.6 GHz and at 0.3 GHz, respectively. The propagation delay of a DE (t_DE) is around 120 ps and the programmable delay range of the FDL equals one t_DE, resulting in a small delay resolution of t_DE/2^4 = 7.5 ps in this design. The LPS mode requires a maximum of five CLK_4 cycles and the BS mode requires a maximum of eight CLK_4 cycles. Since one CLK_4 cycle spans four CLK_IN cycles, this worst case of 5 + 8 = 13 CLK_4 cycles is consistent with the maximum locking time of 52 input clock cycles.

Fig. 3. Schematic of the DCDL with CDL and FDL.

Fig. 3 shows the schematic of the DCDL, which consists of a CDL and an FDL. The CDL is a cascade of lattice delay units (LDUs), each of which is a NAND-based delay cell [9-11]. The CDL contains thirty-two conventional LDUs. The FDL consists of two inverters (INV1 and INV2) and a 4-bit feedback delay element (FDE) introduced in [12]. This FDE uses positive feedback to achieve a variable switching threshold, resulting in a digitally adjustable linear propagation delay [12]. The DCDL shows monotonic delay behavior with respect to the 9-bit digital control input.

III. EXPERIMENTAL RESULTS

Fig. 4. (a) Die microphotograph and layout, (b) test CoB of the proposed DLL, (c) schematic of the test CoB.

The proposed all-digital DLL was implemented in a 0.13-µm 1.2 V CMOS process and tested in a chip-on-board (CoB) assembly. Fig.
4 shows a die microphotograph, the layout, and the test CoB of the proposed DLL, which occupies an active area of 0.046 mm². Fig. 4(c) displays the schematic of the test CoB used to measure the CLK_IN and CLK_OUT signals.

Fig. 5 shows the simulated locking process of the proposed digital DLL in detail. Fig. 5(a) shows the case where the initial locking point is between Step 3 and Step 4 at 500 MHz, where phase locking takes 44 input clock cycles. Fig. 5(b) is the simulation result when the initial locking point is after Step 5 at 250 MHz. As shown in Fig. 5(b), if the initial locking point is not found in the LPS mode, the mode
proceeds and a maximum locking time of 52 input clock cycles is required. After locking, the DLL maintains lock-in status in the sequential mode.

Fig. 5. Simulated locking process: (a) when the initial locking point is located between Step 3 and Step 4 at 500 MHz, (b) when the initial locking point is located after Step 5 at 250 MHz.

Fig. 6 shows the measured locking process of the proposed DLL, which likewise remains locked in the sequential mode after locking. As shown in Fig. 7, the measured peak-to-peak (p-p) output clock jitters are 20 ps and 7.0 ps at 0.15 GHz and 2.2 GHz, respectively. The proposed DLL achieves a frequency range of 0.15 to 2.2 GHz and dissipates 3.1 mW and 7.2 mW at 1 GHz and 2.2 GHz, respectively. As shown in Table 2, compared with the state-of-the-art DDR3/DDR4 DLLs, the proposed DLL achieves lower jitter and lower power consumption, while maintaining a fast locking time of at most 52 cycles without exhibiting any harmonic lock problem. Although other recent digital DLL architectures [6, 7] achieve good performance, [6] cannot deal with the RCP delay and does not consider the harmonic lock issue. Although [7] achieves the highest operating frequency of 5 GHz, its operating range extends down only to 1.5 GHz.

Fig. 6. Measured locking process.

Fig. 7. Measured peak-to-peak output clock jitter: (a) 0.15 GHz, (b) 2 GHz.

Table 2. Performance summary and comparison of state-of-the-art DDR3/DDR4 DLLs

                        [1] TCAS-I '12  [2] TCAS-II '15  [3] JSSC '13   [4] TCAS-II '16  This Work
Application             DDR3            DDR3             DDR4           DDR 3&4          DDR 3&4
Process & Supply        45 nm, 1.1 V    65 nm, 1.1 V     30 nm, 1.14 V  65 nm, 1.2 V     130 nm, 1.2 V
Active area (mm²)       0.01            0.017            N/A            0.04             0.046
Frequency range (GHz)   0.4-0.8         0.4-0.8          1.65           0.12-2.0         0.15-2.2
Locking time (cycles)   N/A             41               500            N/A              52
Anti-harmonic lock      O               O                X              O                O
p-p jitter (ps) @ GHz   17.8 @ 0.8      26.1 @ 0.8       < 50 @ 1.3     14 @ 2           7.0 @ 2.2
Power (mW) @ GHz        3.3 @ 0.8       3.52 @ 0.8       N/A            6.6 @ 2          3.1 @ 1, 7.2 @ 2.2

IV. CONCLUSION

A harmonic-free, fast-locking, wide-frequency-range, all-digital DLL for DDR3 and DDR4 is presented in this paper. By adopting a new LPS algorithm, the DLL achieves a wide operating frequency range without incurring the harmonic lock problem. The proposed LPS algorithm is very simple but effective in finding the approximate locking point before the normal binary search tracking, enabling noise-tolerant, anti-harmonic, wide-range frequency operation. The proposed all-digital DLL can be easily adopted in DDR3 and DDR4 SDRAMs.

ACKNOWLEDGMENTS

This work was supported by the KIAT grant funded by the Korean government (MOTIE: Ministry of Trade, Industry & Energy, HRD Program for Software-SoC convergence, No. N0001883). The EDA tools were supported by IDEC.

REFERENCES

[1] H. Kang, et al., Process variation tolerant all-digital 90° phase shift DLL for DDR3 interface, IEEE Trans. Circuits Syst. I, 59, pp. 2186-2196, 2012.
[2] D. Jung, et al., All-digital fast-locking delay-locked loop using a cyclic-locking loop for DRAM, IEEE Trans.
Circuits Syst. II, 62, pp. 1023-1027, 2015.
[3] K. Sohn, et al., A 1.2 V 30 nm 3.2 Gb/s/pin 4 Gb DDR4 SDRAM with dual-error detection and PVT-tolerant data-fetch scheme, IEEE J. Solid-State Circuits, 48, pp. 168-177, 2013.
[4] J. Lim, et al., A delay locked loop with a feedback edge combiner of duty-cycle corrector with a 20%-80% input duty cycle for SDRAMs, IEEE Trans. Circuits Syst. II, 63, pp. 141-145, 2016.
[5] R. Yang, et al., A 40-550 MHz harmonic-free all-digital delay-locked loop using a variable SAR algorithm, IEEE J. Solid-State Circuits, 42, pp. 361-373, 2007.
[6] J. Wang, et al., An all-digital delay-locked loop using an in-time phase maintenance scheme for low-jitter gigahertz operations, IEEE Trans. Circuits Syst., 62, pp. 395-404, 2015.
[7] D. Lee and Jongsun Kim, 5 GHz all-digital delay-locked loop for future memory systems beyond
double data rate 4 synchronous dynamic random access memory, IET Electronics Letters, 51, pp. 1973-1975, 2015.
[8] L. Wang, et al., An implementation of fast-locking wide-range 11-bit reversible SAR DLL, IEEE Trans. Circuits Syst. II, 57, pp. 421-425, 2010.
[9] S. Han and Jongsun Kim, A high-resolution wide-range dual-loop digital delay-locked loop using a hybrid-search algorithm, IEEE Asian Solid State Circuits Conference, pp. 293-296, 2012.
[10] Jongsun Kim and S. Han, A high-resolution dual-loop digital DLL, J. of Semiconductor Technology and Science, 16, pp. 520-527, 2016.
[11] Rong-Jyi Yang and Shen-Iuan Liu, A 40-550 MHz harmonic-free all-digital delay-locked loop using a variable SAR algorithm, IEEE J. Solid-State Circuits, 42, pp. 361-373, 2007.
[12] S. Han, T. Kim, and Jongsun Kim, A 0.1-1.5 GHz all-digital phase inversion delay-locked loop, IEEE Asian Solid State Circuits Conference, pp. 341-344, 2013.
[13] D. Park, G. Park, and Jongsun Kim, A 0.15 to 2.2 GHz all-digital delay-locked loop, IEEE International NEWCAS Conference, pp. 261-264, 2017.

Jongsun Kim received his Ph.D. degree in electrical engineering from the University of California, Los Angeles (UCLA) in 2006 in the field of Integrated Circuits and Systems. He was a postdoctoral fellow at UCLA from 2006 to 2007. From 1994 to 2001 and from 2007 to 2008, he was with Samsung Electronics as a senior research engineer in the DRAM Design Team, where he worked on the design and development of Synchronous DRAMs, SGDRAMs, Rambus DRAMs, DDR3 and DDR4 DRAMs. Dr. Kim joined the School of Electronic & Electrical Engineering, Hongik University in March 2008. Professor Kim's research interests are in the areas of high-performance mixed-signal circuits and systems design.
His current research areas include high-speed and low-power transceiver circuits for chip-to-chip communications, clock recovery circuits (PLLs/DLLs/CDRs), frequency synthesizers, signal integrity and power integrity, ultra-low-power memories, power-management ICs (PMICs), RF-interconnect circuits, and low-power memory interface circuits and systems.