Reduction of Minimum Operating Voltage (VDDmin) of CMOS Logic Circuits with Post-Fabrication Automatically Selective Charge Injection


Kentaro Honda, Katsuyuki Ikeuchi, Masahiro Nomura*, Makoto Takamiya, and Takayasu Sakurai
University of Tokyo, Tokyo, Japan
*Semiconductor Technology Academic Research Center (STARC), Yokohama, Japan

Abstract: To reduce the minimum operating voltage (VDDmin) of CMOS logic circuits, we propose a new method, along with novel circuitry that exploits it. The method reduces the within-die random threshold voltage (VTH) variation of transistors by post-fabrication automatically selective charge injection using substrate hot electrons (SHE). In the new circuit, switches are added to combinational logic circuits to turn them into latch loops. To reduce VDDmin, design guides for the optimal (1) loop topology, (2) number of stages in a loop, (3) VTH shift per charge injection, and (4) number of charge injection trials are explored through simulations. The proposed scheme is applied to a 96-stage inverter chain fabricated in 65-nm CMOS, and a reduction of the measured VDDmin from 94 mV to 74 mV is demonstrated for the first time.

I. INTRODUCTION

Energy-efficient operation of CMOS logic circuits through a reduced power supply voltage (VDD) is strongly required, and many sub/near-threshold logic circuits have been reported [1]-[5]. VDD scaling, however, is limited by the minimum operating voltage (VDDmin) [6] of CMOS logic gates. VDDmin is the minimum supply voltage at which the circuits operate without functional errors. Timing errors are not considered in this paper. VDDmin increases with the number of logic gates and with CMOS technology down-scaling, because it is determined by random transistor variations [6]. This upward trend in VDDmin is a serious problem for the design of future ultra-low-voltage (VDD < 0.4 V) logic circuits.
A straightforward way to reduce random transistor variations is to increase the transistor size, but this is not practical. As an alternative, post-fabrication self-convergence schemes for suppressing random variability have been proposed [7], [8]. For SRAM cells, the VTH variation is reduced by substrate hot electron (SHE) stress [7] or BTI stress [8]. For logic transistors, it is reduced by drain avalanche hot carrier (AHC) stress [7]. SHE or BTI stress is effective only for the two-inverter latch in an SRAM cell and not for logic circuits, because two-inverter latches are difficult to form in random logic circuits. AHC is not practical for logic circuits for two reasons: it requires biasing the gates of all transistors at half the supply voltage, and it draws a large DC current during the stress. In this paper, to reduce VDDmin of CMOS logic circuits, we propose a new method, along with novel circuitry that exploits it, that reduces the within-die random VTH variation by post-fabrication automatically selective charge injection using SHE.

Fig. 1. Automatically selective charge injection scheme in an SRAM cell. (a) Schematic of the SRAM cell. (b) Waveforms applied to the SRAM cell for the automatically selective charge injection scheme. (c) Dependence of VTH1 and VTH2 on the number of charge injection trials. Initially VTH1 > VTH2 is detected, so VTH2 increases by VSHIFT per trial due to SHE while VTH1 stays constant.

978-1-61284-66-6/11/$26.00 ©2011 IEEE

The remainder of this paper is organized as follows. Section II presents the concept of the proposed post-fabrication automatically selective charge injection scheme and the proposed circuit. Section III presents design guides for the proposed circuit covering the optimal (1) loop topology, (2) number of stages in a loop, (3) VTH shift per charge injection, and (4) number of charge injection trials. Section IV describes the 96-stage inverter chain test chips fabricated in 65-nm CMOS and the measured reduction of VDDmin. Finally, Section V concludes the paper.

II. PROPOSED POST-FABRICATION AUTOMATICALLY SELECTIVE CHARGE INJECTION SCHEME

This section first explains the original concept of the automatically selective charge injection scheme in an SRAM cell. It then extends the concept to logic circuit applications.

A. Original Concept of Automatically Selective Charge Injection Scheme for SRAM Cell

Fig. 1(a) shows a schematic of an SRAM cell, and Fig. 1(b) shows the waveforms applied to the SRAM cell for the automatically selective charge injection scheme [7]. A negative p-well bias (Vpwell, e.g., -7 V) is applied to M1 and M2. VDD is then raised from 0 V to a high voltage (e.g., 3.5 V) and held there for a while (e.g., 1 min). Suppose the VTH of M2 (VTH2) is lower than that of M1 (VTH1). Then V1 goes to 0 V during the VDD ramp, and 3.5 V is applied to V2 instead of V1. As a result, only VTH2 is increased by the SHE stress. This is why the injection is called automatically selective: whichever of M1 and M2 has the lower VTH is selected automatically, and its VTH is raised by the SHE charge injection. The VTH shift due to charge injection is nonvolatile. As shown in Fig. 1(c), repeating the charge injection process reduces the mismatch between VTH1 and VTH2 [8].

B. Proposed Automatically Selective Charge Injection Scheme for Logic Circuits

Fig. 2(a) shows a schematic of a normal logic circuit.
To apply the SRAM-cell concept of automatically selective charge injection to a logic circuit, latch loops must be introduced into the logic circuit. Figs. 2(b) and 2(c) show schematics of the proposed logic circuit with the automatically selective charge injection scheme, in which switches are added to the combinational logic to turn it into latch loops. Fig. 2(b) shows the normal logic operation mode, and Fig. 2(c) shows the latch mode for automatically selective charge injection. Ideally, all logic gates should be included in the latch loops. The inputs of each latch loop must be properly clamped to VDD or VSS so that every gate acts as an inverter and the loop can latch. For example, the other input of a 2-input NAND (2NAND) is clamped to VDD, and the other input of a 2-input NOR (2NOR) is clamped to VSS. How to systematically add switches to arbitrary random combinational logic to form latch loops is beyond the scope of this paper. Repeating the charge injection process as shown in Figs. 1(b) and 1(c) reduces the within-die random VTH variation and thereby reduces VDDmin of the logic circuit. Because the charge injection is nonvolatile, it could be performed during pre-shipment testing.

Fig. 2. Schematic of a logic circuit. (a) Normal logic circuit. (b) Proposed logic circuit with the automatically selective charge injection scheme in normal logic operation mode. (c) Proposed logic circuit in latch mode.
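The convergence shown in Fig. 1(c) can be illustrated with a minimal Python sketch. It assumes an idealized latch that always resolves toward the lower-VTH transistor, and a VSHIFT that is identical and nonvolatile for every trial. The function name `selective_injection` and the integer millivolt values in the example are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def selective_injection(vth1: int, vth2: int, vshift: int,
                        trials: int) -> List[Tuple[int, int]]:
    """Idealized Fig. 1 model: each trial raises only the lower V_TH.

    Assumptions (not from the paper): the latch always resolves toward the
    transistor with the lower V_TH (ties go to M1), and every injection
    adds exactly `vshift`, which is nonvolatile.
    """
    history = [(vth1, vth2)]
    for _ in range(trials):
        if vth2 < vth1:
            vth2 += vshift  # M2 wins the latch race and receives SHE stress
        else:
            vth1 += vshift  # M1 wins (or tie) and receives SHE stress
        history.append((vth1, vth2))
    return history


if __name__ == "__main__":
    # Hypothetical values in mV: V_TH1 = 400, V_TH2 = 350, V_SHIFT = 20.
    for step, (v1, v2) in enumerate(selective_injection(400, 350, 20, 5)):
        print(step, v1, v2, abs(v1 - v2))
```

In this toy run the lower threshold climbs until it overtakes the other. After that point the two alternate, and the mismatch stays within one VSHIFT. This is why a smaller VSHIFT gives a finer residual mismatch but needs more trials.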

Because the high voltages shown in Fig. 1(b) would be supplied from a tester, on-chip high-voltage generators are not required.

III. OPTIMAL IMPLEMENTATION OF AUTOMATICALLY SELECTIVE CHARGE INJECTION SCHEME

To reduce VDDmin effectively, this section uses simulations to explore design guides for the optimal (1) loop topology, (2) number of stages in a loop, (3) VTH shift per charge injection, and (4) number of charge injection trials.

Two loop topologies for the charge injection scheme are compared: Fig. 3 shows a cascaded loop topology, and Fig. 4 shows a staggered loop topology. Each latch loop contains 2n inverter stages. In Figs. 3 and 4, the combinational logic circuit is simplified to an inverter chain. In Fig. 3, the latch loops are connected in series, and the cascaded loop has only one latch mode. In contrast, the staggered loop in Fig. 4 has two latch modes. Fig. 4(a) shows the normal logic operation mode, Fig. 4(b) the odd-loop latch mode, and Fig. 4(c) the even-loop latch mode.

To investigate the VDDmin reduction achieved by the charge injection scheme, the VTH variation of the nMOS transistors is evaluated with a Monte Carlo simulation in MATLAB. Reducing the VTH variation of either the nMOS or the pMOS transistors is sufficient, because VDDmin of each logic gate is determined by the balance between its nMOS and pMOS transistors [9]. The automatically selective charge injection is therefore applied only to the nMOS transistors. Fig. 5 shows the simulated VTH distributions of the nMOS transistors for different numbers of charge injection trials (m) in a staggered loop with n = 1.

Fig. 3. Cascaded loop topology.

Fig. 5. Simulated distributions of VTH of nMOS with different numbers of charge injection trials (m) in the staggered loop with n = 1 and VSHIFT/σINIT = 4%. The distribution shifts by mVSHIFT/2 as m increases.
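The Monte Carlo procedure behind Figs. 5-9 (the four simulation steps described in this section) can be approximated by the Python sketch below. It is not the authors' MATLAB code, and the names `inject_once` and `simulate` are hypothetical. The sketch makes four assumptions, each also noted in the comments:

- VTH is expressed in units of σINIT.
- The chain wraps around, so every transistor belongs to exactly one loop.
- The staggered topology is modeled by shifting the loop boundaries by n stages on alternate trials.
- The sample count is an arbitrary choice.

```python
import random
import statistics
from typing import List


def inject_once(vth: List[float], n: int, vshift: float, offset: int) -> None:
    """One idealized charge-injection trial, applied in place.

    The chain is split into latch loops of 2n consecutive nMOS transistors
    starting at `offset`.  Indices wrap around (an assumption) so that every
    transistor belongs to exactly one loop.  In each loop the lowest-V_TH
    transistor sets the latched state; it and every other transistor in the
    loop (same position parity, i.e. gates at the high level) get +vshift.
    """
    size = 2 * n
    total = len(vth)
    if total % size:
        raise ValueError("chain length must be a multiple of 2n")
    for start in range(offset, offset + total, size):
        loop = [(start + k) % total for k in range(size)]
        k_min = min(range(size), key=lambda k: vth[loop[k]])
        for k in range(k_min % 2, size, 2):
            vth[loop[k]] += vshift


def simulate(num: int = 10_000, n: int = 1, trials: int = 20,
             vshift_ratio: float = 0.04, staggered: bool = True,
             seed: int = 0) -> List[float]:
    """Return nMOS V_TH values (in units of sigma_INIT) after `trials` trials."""
    rng = random.Random(seed)
    vth = [rng.gauss(0.0, 1.0) for _ in range(num)]  # initial: N(0, 1)
    for m in range(trials):
        # Staggered loop (modeled assumption): odd- and even-loop latch modes
        # alternate, with even loops offset by n stages.  Cascaded: fixed loops.
        offset = n if (staggered and m % 2 == 1) else 0
        inject_once(vth, n, vshift_ratio, offset)
    return vth


if __name__ == "__main__":
    sigma0 = statistics.pstdev(simulate(trials=0))
    for staggered in (True, False):
        sigma = statistics.pstdev(simulate(trials=20, staggered=staggered))
        label = "staggered" if staggered else "cascaded"
        print(f"{label}: sigma/sigma_INIT = {sigma / sigma0:.3f}")
```

Each trial shifts n of the 2n transistors in every loop, so the mean VTH rises by exactly VSHIFT/2 per trial. This matches the mVSHIFT/2 offset noted for Fig. 5. Comparing the two printed ratios is one way to explore why fixed (cascaded) loop boundaries limit how far σ can fall.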
The initial VTH distribution is assumed to be normal. The initial and current standard deviations of VTH are defined as σINIT and σ, respectively. As shown in Fig. 1(c), the VTH shift per charge injection is defined as VSHIFT, and VSHIFT/σINIT = 4% is assumed in Fig. 5. The simulation steps used in MATLAB to calculate the VTH distributions are: (1) 1k random numbers are generated; (2) the random numbers are divided into groups of 2n numbers; (3) the minimum number in each group is selected; and (4) the minimum number and every other number in its group are increased by VSHIFT. In Fig. 5, σ is successfully reduced by increasing m, while the average VTH increases by mVSHIFT/2. In the proposed charge injection scheme, this increase in average VTH is compensated by forward body bias applied to the nMOS transistors.

Fig. 6 shows the simulated dependence of σ/σINIT on the number of charge injection trials for the cascaded loop and the staggered loop at n = 1 and VSHIFT/σINIT = 4%. The σ/σINIT of the staggered loop is 42% lower than that of the cascaded loop, because the cascaded loop cannot compensate for inter-loop mismatch. Therefore, only the staggered loop is used in the rest of this paper.

Fig. 7 shows the simulated dependence of σ/σINIT on the number of charge injection trials for different n at VSHIFT/σINIT = 4%. The minimum σ/σINIT at n = 1 is 29%, while the minimum σ/σINIT values at n = 2, 3, and 6 are 87%, 94%, and 99%, respectively. The large difference between n = 1 and n = 2 is investigated in detail below. Fig. 8 shows the simulated dependence of σ/σINIT on the number of charge injection trials for different VSHIFT/σINIT at n = 1 (Fig. 8(a)) and n = 2 (Fig. 8(b)). To clarify the difference between n = 1 and n = 2, the minimum σ/σINIT point of each curve is extracted from Fig. 8 and plotted in Fig. 9. Fig. 9 thus shows the simulated dependence of the minimum σ/σINIT on the optimum number of charge injection trials for different VSHIFT/σINIT at n = 1 and n = 2. At n = 1, the minimum σ/σINIT decreases with decreasing VSHIFT. The minimum σ/σINIT is 6.2% at VSHIFT/σINIT = 2%, but the optimum number of charge injection trials is then 3515. That is not practical, because a large number of charge injection trials increases the pre-shipment test cost. Either of two operating points is a practical choice: a minimum σ/σINIT of 52% at VSHIFT/σINIT = 1% with 9 trials, or a minimum σ/σINIT of 29% at VSHIFT/σINIT = 4% with 4 trials. In contrast, at n = 2 the minimum σ/σINIT exceeds 8% even when VSHIFT/σINIT is 2%, because the mismatch within each loop is not completely compensated at n = 2. Therefore, n = 1 is used in the rest of this paper.

Fig. 4. Staggered loop topology. (a) Normal logic operation mode. (b) Odd-loop latch mode. (c) Even-loop latch mode.

Fig. 6. Simulated dependence of σ/σINIT on the number of charge injection trials for the cascaded loop and the staggered loop at n = 1 and VSHIFT/σINIT = 4%.

Fig. 7. Simulated dependence of σ/σINIT on the number of charge injection trials for different n (n = 1, 2, 3, 6) at VSHIFT/σINIT = 4%.

Fig. 8. Simulated dependence of σ/σINIT on the number of charge injection trials for different VSHIFT/σINIT. (a) n = 1. (b) n = 2.

Fig. 9. Simulated dependence of the minimum σ/σINIT on the optimum number of charge injection trials for different VSHIFT/σINIT in the staggered loop at n = 1 and n = 2.

IV. MEASUREMENT RESULTS

The proposed automatically selective charge injection scheme is verified by measurement. Fig. 10 shows the measured dependence of drain current on gate voltage for an nMOS transistor in a 1.2-V 65-nm CMOS process, before and after charge injection by SHE. A ΔVTH of 36 mV was obtained under the charge injection condition VGS = 3.5 V, VDS = 0 V, Vpwell = -7 V, and 5 min.

Fig. 10. Measured dependence of drain current on gate voltage for an nMOS transistor in a 1.2-V 65-nm CMOS process, before and after charge injection by SHE (ΔVTH = 36 mV after a 5-min injection at VGS = 3.5 V and Vpwell = -7 V).

Fig. 11 shows a schematic of the fabricated 96-stage inverter chain with the staggered loop. The chain has two control signals:

- When both are H, the circuit operates in the normal logic operation mode, as in Fig. 4(a).
- When the first is H and the second is L, the circuit operates in the odd-loop latch mode, as in Fig. 4(b).
- When the first is L and the second is H, the circuit operates in the even-loop latch mode, as in Fig. 4(c).

The charge injection is applied at VDD = 3.2 V and Vpwell = -7 V, for 1 min per injection. To form the staggered loop, 191 CMOS transfer gates are added to the original 96 inverters of the chain.

Fig. 11. Fabricated 96-stage inverter chain with the staggered loop. 191 CMOS transfer gates are added to the original 96 inverters of the chain.

The chip micrograph and core layout of the 96-stage inverter chain in Fig. 11 are shown in Fig. 12.

Fig. 12. Chip micrograph and core-area layout of the 96-stage inverter chain.

The area penalty of the proposed circuit for the automatically selective charge injection scheme is now considered. The proposed circuit occupies about three times the area of the original 96-stage inverter chain, because the number of logic gates increases from 96 to 287. According to the Pelgrom plot, tripling the transistor area reduces σ/σINIT to 1/√3 (≈ 0.58). The proposed charge injection scheme is therefore worthwhile only when it brings σ/σINIT below 1/√3. As shown in Fig. 9, σ/σINIT below 1/√3 is achieved when the optimum number of charge injection trials is larger than 9. Under that condition, the proposed scheme reduces σ more effectively than simply increasing the transistor area.
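The area argument above can be written out with Pelgrom's mismatch law, σVTH = AVT/√(WL). Here AVT is a process-dependent matching coefficient; it is a standard model, not a parameter reported in the paper. As in the text, each added transfer gate is counted as roughly one gate of area:

```latex
\sigma_{V_{TH}} = \frac{A_{VT}}{\sqrt{WL}}
\;\Rightarrow\;
\frac{\sigma_{V_{TH}}\big|_{3WL}}{\sigma_{V_{TH}}\big|_{WL}} = \frac{1}{\sqrt{3}} \approx 0.58,
\qquad
\frac{\mathrm{Area}_{\text{proposed}}}{\mathrm{Area}_{\text{original}}} \approx \frac{96 + 191}{96} = \frac{287}{96} \approx 2.99,
\qquad
\text{break-even: } \frac{\sigma}{\sigma_{INIT}} < \frac{1}{\sqrt{3}}
```

In other words, the injection scheme pays off only when it beats spending the same extra area on larger transistors.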
The test chip was implemented in a 1.2-V 65-nm CMOS process. The core measures 32 µm by 8 µm. Fig. 13 shows the measured dependence of VDDmin of the inverter chain in Fig. 11 on Vpwell for various numbers of charge injection trials. Charge injection trials in the odd-loop latch mode and the even-loop latch mode are performed alternately. VDDmin is defined as the minimum operating VDD, judged by whether a 1-Hz rectangular wave is observed at the output of the inverter chain. To compensate for the global variation between pMOS and nMOS, Vpwell is tuned to find the minimum VDDmin. The Vpwell that gives the minimum VDDmin increases with the number of trials, because the average VTH of the nMOS transistors increases.

Fig. 13. Measured VDDmin of the inverter chain versus Vpwell for various numbers of charge injection trials (initial, 4 times, and 6 times). The minimum VDDmin is reduced by 21%.
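The measurement procedure has three steps: sweep VDD at each Vpwell, take the lowest passing VDD as VDDmin, then take the minimum over Vpwell. It can be expressed as a short Python sketch. `passes(vdd_mv, vpwell_mv)` is a hypothetical tester hook, not an interface described in the paper. The sketch assumes pass/fail is monotonic in VDD, so each downward sweep can stop at the first failure. The toy pass criterion in the usage block is invented purely for illustration; only the 94 mV and 74 mV figures come from the paper.

```python
from typing import Callable, Iterable, Optional, Tuple


def find_min_vddmin(passes: Callable[[int, int], bool],
                    vdd_grid_mv: Iterable[int],
                    vpwell_grid_mv: Iterable[int]) -> Optional[Tuple[int, int]]:
    """Return (minimum VDDmin, Vpwell that achieves it) in mV, or None.

    `passes(vdd_mv, vpwell_mv)` is a hypothetical hook reporting whether the
    expected rectangular wave appears at the chain output.  Pass/fail is
    assumed to be monotonic in VDD, so each downward sweep stops at the
    first failure.
    """
    vdd_desc = sorted(vdd_grid_mv, reverse=True)
    best: Optional[Tuple[int, int]] = None
    for vpwell in vpwell_grid_mv:
        vddmin = None  # VDDmin at this Vpwell (None if even max VDD fails)
        for vdd in vdd_desc:
            if not passes(vdd, vpwell):
                break
            vddmin = vdd
        if vddmin is not None and (best is None or vddmin < best[0]):
            best = (vddmin, vpwell)
    return best


def percent_reduction(before_mv: float, after_mv: float) -> float:
    """Relative reduction in percent, e.g. initial vs. best VDDmin."""
    return 100.0 * (before_mv - after_mv) / before_mv


if __name__ == "__main__":
    # Toy pass criterion, invented for illustration: best bias at 200 mV.
    toy = lambda vdd, vpwell: vdd >= 80 + abs(vpwell - 200) // 10
    print(find_min_vddmin(toy, range(40, 201, 2), range(0, 401, 10)))
    print(round(percent_reduction(94, 74), 1))  # VDDmin values from the paper
```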

To clarify the trend of the minimum VDDmin, the minimum VDDmin points are extracted from Fig. 13 and plotted in Fig. 14. For simplicity, not all measured points are shown in Fig. 13. Fig. 14 shows the measured dependence of the minimum VDDmin on the number of charge injection trials. The minimum VDDmin is lowest after 6 charge injection trials. The initial minimum VDDmin is 94 mV at Vpwell = 12 mV. After 6 charge injection trials, the minimum VDDmin is 74 mV at Vpwell = 25 mV. VDDmin is therefore reduced by 21%, from 94 mV to 74 mV.

Fig. 14. Measured dependence of the minimum VDDmin on the number of charge injection trials (best: -21% relative to the initial value). The minimum VDDmin points are extracted from Fig. 13.

V. CONCLUSION

To reduce the minimum operating voltage (VDDmin) of CMOS logic circuits, a new method, along with novel circuitry that exploits it, has been proposed. The method reduces the within-die random threshold voltage (VTH) variation of transistors by post-fabrication automatically selective charge injection using substrate hot electrons (SHE). The charge injection could be performed during pre-shipment testing. The circuit with the staggered loop topology and n = 1 is the best implementation of the automatically selective charge injection scheme. A minimum σ/σINIT of 29% at VSHIFT/σINIT = 4% with 4 trials is one practical design choice. Applying the proposed scheme to a 96-stage inverter chain fabricated in 65-nm CMOS reduced the measured VDDmin by 21%, from 94 mV to 74 mV.

ACKNOWLEDGMENT

This work was carried out as part of the Extremely Low Power (ELP) project supported by the Ministry of Economy, Trade and Industry (METI) and the New Energy and Industrial Technology Development Organization (NEDO).

REFERENCES

[1] J. Kwong, Y. Ramadass, N. Verma, M. Koesler, K. Huber, H. Moormann, and A. Chandrakasan, "A 65 nm sub-Vt microcontroller with integrated SRAM and switched capacitor DC-DC converter," IEEE J. Solid-State Circuits, vol. 44, pp. 115-126, Jan. 2009.
[2] Y. Pu, J.P. Gyvez, H. Corporaal, and H. Yajun, "An ultra-low-energy/frame multi-standard JPEG co-processor in 65nm CMOS with sub/near-threshold power supply," International Solid-State Circuits Conference (ISSCC), pp. 146-147, Feb. 2009.
[3] A. Agarwal, S.K. Mathew, S.K. Hsu, M.A. Anders, H. Kaul, F. Sheikh, R. Ramanarayanan, S. Srinivasan, R. Krishnamurthy, and S. Borkar, "A 320mV-to-1.2V on-die fine-grained reconfigurable fabric for DSP/media accelerators in 32nm CMOS," International Solid-State Circuits Conference (ISSCC), pp. 328-329, Feb. 2010.
[4] N. Lotze and Y. Manoli, "A 62mV 0.13μm CMOS standard-cell-based design technique using Schmitt-trigger logic," International Solid-State Circuits Conference (ISSCC), pp. 340-341, Feb. 2011.
[5] M. Seok, D. Jeon, C. Chakrabarti, D. Blaauw, and D. Sylvester, "A 0.27V 30MHz 17.7nJ/transform 1024-pt complex FFT core with super-pipelining," International Solid-State Circuits Conference (ISSCC), pp. 342-343, Feb. 2011.
[6] T. Niiyama, P. Zhe, K. Ishida, M. Murakata, M. Takamiya, and T. Sakurai, "Increasing minimum operating voltage (VDDmin) with number of CMOS logic gates and experimental verification with up to 1Mega-stage ring oscillators," International Symposium on Low Power Electronics and Design (ISLPED), pp. 117-122, Aug. 2008.
[7] M. Suzuki, T. Saraya, K. Shimizu, T. Sakurai, and T. Hiramoto, "Post-fabrication self-convergence scheme for suppressing variability in SRAM cells and logic transistors," IEEE Symposium on VLSI Technology, pp. 148-149, June 2009.
[8] J. Wang, S. Nalam, Z. Qi, R. Mann, M. Stan, and B. Calhoun, "Improving SRAM Vmin and yield by using variation-aware BTI stress," IEEE Custom Integrated Circuits Conference (CICC), pp. 5-8, Sep. 2010.
[9] H. Fuketa, S. Iida, T. Yasufuku, M. Takamiya, M. Nomura, H. Shinohara, and T. Sakurai, "A closed-form expression for estimating minimum operating voltage (VDDmin) of CMOS logic gates," ACM/IEEE Design Automation Conference (DAC), Session 53.1, June 2011.