CMOS System-on-a-Chip Voltage Scaling beyond 50nm

Azeez J Bhavnagarwala, Blanca Austin, Ashok Kapoor and James D Meindl
Microelectronics Research Center and School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta GA 30332
LSI Logic Corporation, Milpitas CA 95035

Abstract

The limits on CMOS energy dissipation imposed by subthreshold leakage currents and by wiring capacitance are investigated for CMOS generations beyond 50nm at NTRS projected local and global clock rates for high performance processors. Physical short-channel MOSFET models that consider high-field effects, threshold voltage roll-off and reverse subthreshold swing roll-off are employed in tandem with stochastic interconnect distributions to calculate the optimal supply voltage, threshold voltage and gate sizes that minimize total CMOS power dissipation. The minimum is found by exploiting trade-offs between saturation drive current and subthreshold leakage current, and between device size and wiring capacitance. CMOS power dissipation, at its lower limit, increases exponentially with clock frequency, imposing limits on performance set by heat removal. Heat removal constraints at high local clock rates, which limit the average wire length and device size within a local zone of synchrony, or macrocell, in a short-wire cellular array architecture, are used to project the maximum macrocell size and count for generations beyond 100nm.

1. Introduction

The supply voltage for future gigascale integrated systems is projected to scale to 0.37V for the 35nm, 17GHz generation [1] to reduce electric field strengths and power dissipation (Fig. 1), increases of which are projected to be driven by higher clock rates, higher overall capacitance and larger chip sizes.
A key challenge in the design of bulk Si CMOS logic circuits will be to meet the projected performance given the competing requirements of high performance and low standby power at low voltages [1,2,3], in the presence of threshold voltage reductions due to short-channel effects and subthreshold swing increases due to the 2D electrostatic charge coupling between the gate and the source/drain terminals of the MOSFET. A methodology [4] that simultaneously considers the device, circuit and system levels of the design hierarchy, and distinguishes local and global clock rates, is employed to minimize the total power dissipated by a static CMOS critical path gate during a clock cycle. This methodology assumes a realistic environment of chip size, logic gate count, clock frequency, wiring capacitance, critical path depth and range of operating temperature.

This analysis uses physical and stochastic models, verified by HSPICE, MEDICI and actual microprocessor implementations, to investigate opportunities to scale Vdd to the optimal point corresponding to the limits of CMOS power dissipation, where leakage power balances switching power dissipation and device capacitance balances wiring capacitance. The analysis considers Retrograde Doped (RD) MOSFETs (Fig. 2), the bulk Si alternative to a Uniformly Doped (UD) MOSFET that promises higher performance and superior scalability [5] (Fig. 3).

This work was supported by the Defense Advanced Research Projects Agency (Contract: F3361595C1623) and the Semiconductor Research Corporation (SJ-374-002).

2. Circuit and Device Models

The performance of a generic CMOS processor is modeled assuming a global critical path of 15 [6] 2-way NAND stages, each stage driving average wire lengths (Fig. 4). Average wire lengths, in units of gate pitches, are determined (Table 1) from stochastic interconnect distributions [7], derived recursively using Rent's rule, and verified for an actual microprocessor in Fig. 5.
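Rent's rule, on which the stochastic wire-length distribution of [7] is built, relates the number of signal terminals T of a logic block to its gate count N as T = k * N^p. The sketch below only evaluates that power law; it does not reproduce the recursive derivation in [7]. The exponent p = 0.6 is the Rent's exponent listed with the figure parameters of this paper, while the coefficient k = 4.0 and the function name are assumptions chosen for illustration.

```python
def rent_terminals(n_gates, k=4.0, p=0.6):
    """Rent's rule: terminal count T = k * N**p for a block of N gates.

    k (terminals per gate) is a hypothetical value; p = 0.6 is the
    Rent's exponent quoted with the paper's figure parameters.
    """
    if n_gates < 1:
        raise ValueError("a block must contain at least one gate")
    return k * n_gates ** p


# Terminal demand grows sub-linearly with block size because p < 1.
block_sizes = [1, 100, 10_000, 1_000_000]
terminal_counts = [rent_terminals(n) for n in block_sizes]
```

Because p is below 1, terminals per gate fall as the block grows, which is the property the stochastic distribution uses to count connections between groups of gates at each separation.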
In logic-intensive CMOS chips, packing densities are interconnect limited [8], so the effective size of a gate is determined by its wireability [9]. The gate pitch is estimated from NTRS projections for microprocessor chip size and logic transistor count, after discounting the extrapolated increases in cache size and cache area for high performance processors (Fig. 6). Assuming equal interconnect cross-sectional dimensions, and that neighboring wiring planes in a multi-level network provide an approximate ground plane, the total capacitance per unit length, including fringing effects, is estimated using the analytical models in [10].

Device performance is modeled using compact low-voltage Transregional MOSFET models [11,12] (Figs 7, 8, 9) that predict circuit performance in the subthreshold, saturation and linear regions of operation, providing continuous and smooth transitions across region boundaries. High-field effects on carrier mobility are incorporated by adopting the mobility reduction model in [13]. Smoothness and continuity of the drain current expressions in the triode, saturation and subthreshold regions are obtained by requiring differentiability and continuity of the product of the effective mobility and the areal charge density of inversion layer carriers. The dependence of low-field mobility on temperature and doping concentration is estimated using the empirical models reported in [14].

The doping profile for the RD structure is selected as the one that yields the smallest depletion depth, corresponding to the least DIBL (Drain Induced Barrier Lowering) effects for a given Vto and gate oxide thickness [15]. Increases in leakage current due to DIBL effects are calculated using 2D subthreshold models [6] that accurately predict the dependence of threshold voltage roll-off and subthreshold swing increase (Fig. 10) on supply voltage, device geometries and doping profile.
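To see why threshold voltage roll-off and subthreshold swing increase matter together, the standard first-order relation I_off ∝ 10^(-Vto/S) can be evaluated before and after both shifts. This is a minimal sketch of that textbook relation, not the 2D subthreshold model used in the paper; the function name and all numerical inputs in the example are hypothetical.

```python
def leakage_multiplier(vto_nominal, swing_nominal, delta_vto, delta_swing):
    """Factor by which off-current grows when Vto rolls off and S increases.

    Uses the first-order relation I_off ~ 10**(-Vto / S).
    vto_nominal   : long-channel threshold voltage, V
    swing_nominal : long-channel subthreshold swing, V/decade
    delta_vto     : threshold voltage roll-off (a reduction), V
    delta_swing   : subthreshold swing increase, V/decade
    """
    nominal_decades = -vto_nominal / swing_nominal
    degraded_decades = -(vto_nominal - delta_vto) / (swing_nominal + delta_swing)
    return 10.0 ** (degraded_decades - nominal_decades)


# Hypothetical device: Vto = 0.2 V, S = 80 mV/dec, 40 mV roll-off, +5 mV/dec.
example = leakage_multiplier(0.2, 0.080, 0.040, 0.005)
```

A roll-off equal to one full swing raises the off-current by a decade on its own, and any swing increase compounds that by flattening the exponential over the whole remaining threshold voltage.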
The 2-way NAND gate, as a basic circuit building block in the critical path, has a performance that parallels that of any other circuit actually used in processor critical paths in reflecting technology improvements [16]. The improved dependence of delay on fan-in at short channel lengths [17], due to a smaller reduction in the saturation drain current with a rise in the source voltage of the topmost series-connected MOSFET, is modeled physically by calculating the fractional reduction of the normalized saturation drain current for the series-connected structure [18].

3. Minimum Power CMOS Random Logic Networks

The power drain of a static CMOS gate is minimized by scaling the supply voltage while meeting the required performance by scaling the threshold voltage and increasing the channel widths. This continues until a further decrease in threshold voltage increases total power because of a dominating static component [3] (Figs 10, 11), and a further increase in device size increases total power because of larger gate sizes [19] (Fig. 12). The optimal supply voltage (Fig. 13), device threshold voltage and gate sizes are calculated as the simultaneous solution at these minima (Table 2).

For a given wiring load, the performance of a static CMOS gate increases asymptotically with increasing (W/L) ratios, with gate delays reaching past the knee of the asymptotic dependence of delay on channel width for wiring capacitance less than or equal to 40% of the total load capacitance. This point corresponds to minimum power with respect to gate size: further increases in gate size increase power linearly while permitting only asymptotic reductions in supply voltage.

Critical path gates clocked at high local frequencies are assumed to be only 5 stages long and to drive wire lengths averaged within a macrocell of a short-wire cellular array architecture (Fig. 14). Assuming gates are sized so that wiring capacitance is 40% of the total load, the cell count (Table 3) is calculated by imposing a maximum heat removal coefficient of 50 W/cm^2 on the average wire length of the cell, calculated using the stochastic interconnect distribution.
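The trade-off between switching power and leakage power described above can be illustrated with a toy grid search. This sketch swaps in the alpha-power law for drive current and a simple exponential for leakage; it is not the Transregional model or the methodology of [4], and it holds the gate size fixed. The activity (0.05), logic depth (5), clock rate (10 GHz) and skew factor (0.9) are taken from the parameter list printed with the paper's figures, with the skew factor interpreted here as the usable fraction of the cycle. Every other constant and all function names are hypothetical.

```python
# Constants marked "hypothetical" are illustrative assumptions only.
ALPHA = 1.3        # alpha-power-law exponent (hypothetical)
K_DRIVE = 3.0e-4   # drive current prefactor, A/V**ALPHA (hypothetical)
I0_LEAK = 1.0e-5   # off-current extrapolated to Vto = 0, A (hypothetical)
SWING = 0.100      # subthreshold swing, V/decade (hypothetical)
C_LOAD = 6.4e-15   # wiring plus device load per stage, F (hypothetical)
ACTIVITY = 0.05    # switching activity (paper's figure parameters)
F_CLK = 10.0e9     # local clock rate, Hz (paper's figure parameters)
N_CP = 5           # logic depth (paper's figure parameters)
SKEW = 0.9         # usable fraction of the cycle (interpretation assumed)


def stage_delay(vdd, vto):
    """Delay of one stage, C*Vdd / (2*Ion), with alpha-power-law Ion."""
    i_on = K_DRIVE * (vdd - vto) ** ALPHA
    return C_LOAD * vdd / (2.0 * i_on)


def total_power(vdd, vto):
    """Switching power plus subthreshold leakage power of one gate, W."""
    switching = ACTIVITY * C_LOAD * vdd ** 2 * F_CLK
    leakage = vdd * I0_LEAK * 10.0 ** (-vto / SWING)
    return switching + leakage


def meets_timing(vdd, vto):
    """True if N_CP stages fit in the usable part of one clock cycle."""
    return N_CP * stage_delay(vdd, vto) <= SKEW / F_CLK


def optimize(step=0.01, vdd_max=1.2, vto_max=0.5):
    """Exhaustive search for the (Vdd, Vto) pair with minimum total power."""
    best = None
    for i in range(1, round(vdd_max / step) + 1):
        vdd = i * step
        for j in range(1, round(vto_max / step) + 1):
            vto = j * step
            # Skip points where the device never turns on or timing fails.
            if vto >= vdd or not meets_timing(vdd, vto):
                continue
            power = total_power(vdd, vto)
            if best is None or power < best[0]:
                best = (power, vdd, vto)
    return best  # (power in W, Vdd in V, Vto in V), or None if infeasible
```

Calling `optimize()` returns the lowest-power grid point that still meets the cycle time. Lowering Vto lets Vdd drop at constant delay, which cuts switching power quadratically, until the exponential leakage term takes over; a grid search is used here only because it is easy to verify.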
Total CMOS power for a given generation increases exponentially with clock frequency (Fig. 15). The cause is an exponential rise in the supply voltage needed to meet shrinking cycle times, together with the accompanying increases in leakage current due to threshold voltage reductions and subthreshold swing increases. The maximum heat removal coefficients of the package thus impose limits on CMOS performance.

4. Summary and Conclusions

The limits on CMOS energy dissipation, shown to be imposed by static power and by wiring capacitance, are investigated using a methodology that conjointly employs physical short-channel MOSFET drain current, threshold voltage roll-off and subthreshold swing roll-up models in tandem with stochastic wiring distributions. Optimum supply voltages, device threshold voltages and device channel widths corresponding to minimum total power are calculated out to year 2014 for local and global critical paths. These projections are consistent with technology and cycle time forecasts by the NTRS. Limits on the performance of CMOS logic circuits are shown to be imposed by total power dissipation, which increases exponentially with clock frequency. Limits on the cycle time performance imposed by power dissipation are projected for the same period. Constraints imposed by NTRS projected package heat removal coefficients permit local clock rates to apply only within a macrocell, whose size and total number are calculated using the stochastic distribution.

5. References

[1] The 1997 NTRS, Semiconductor Industry Association, Dec 1997.
[2] J D Meindl, "Low Power Microelectronics - Retrospect and Prospect", Proceedings of the IEEE, Vol 83, No 4, Apr 1995, pg 619.
[3] J Burr and J Shott, "A 200mV Encoder-Decoder circuit Using Stanford Ultra Low Power CMOS", ISSCC Dig Tech Papers, Feb 1994, pp 84-85.
[4] A Bhavnagarwala, V De, B Austin and J Meindl, "Circuit Techniques for Low Power CMOS GSI", IEEE ISLPED, Aug 1996 Dig, pp 193-197.
[5] B Agrawal, V De and J Meindl, "Opportunities for Scaling FETs for Gigascale Integration", Proceedings of the 23rd ESSDERC, Sept 1993, pp 919-926.
[6] P E Gronowski et al, "High performance microprocessor design", IEEE JSSC, Vol 33, No 5, May 1998, pp 676-686.
[7] J Davis, V De and J Meindl, "A stochastic wire-length distribution for gigascale integration (GSI) parts I & II", IEEE TED, Vol 45, No 3, March 1998, pp 580-597.
[8] R W Keyes, "The Wire Limited Logic Chip", IEEE JSSC, Vol SC-17, Dec 1982, pp 1232-1233.
[9] B Bakoglu, Circuit Interconnections and Packaging for VLSI, Addison Wesley, 1990.
[10] J Chern et al, "Multilevel Metal Capacitance Models for CAD Design Synthesis Systems", IEEE EDL, Vol 13, No 1, Jan 1992, pg 32.
[11] R Swanson and J Meindl, "Ion-Implanted Complementary MOS Transistors in Low Voltage Circuits", IEEE JSSC, Vol SC-7, Apr 1972, pp 146-153.
[12] B Austin, K Bowman, Xinghai Tang and J D Meindl, "A Low Power Transregional MOSFET Model for Complete Power-Delay Analysis of CMOS Gigascale Integration (GSI)", Proc. of the 11th Annual IEEE Intl. ASIC Conf., Sept 1998, pp 125-129.
[13] C Sodini, P Ko and J Moll, "The Effect of High Fields on MOS Device and Circuit Performance", IEEE TED, Vol ED-31, No 10, October 1984, pp 1386.
[14] C Jacoboni et al, "A review of some charge transport properties in silicon", Solid State Electronics, No 20, Vol 77, 1977.
[15] B Agrawal, V De and J Meindl, "Device Parameter Optimization for Reduced Short Channel Effects in Retrograde Doped MOSFETs", IEEE TED, Vol 43, No 2, Feb 1996, pg 365.
[16] G Sai Halasz, "Performance Trends in High-end Processors", Proceedings of the IEEE, Vol 83, Jan 1995, pp 20-36.
[17] T Sakurai and R Newton, "Delay Models for Series Connected MOSFET Structures", IEEE JSSC, Vol 28, No 1, Jan 1993, pg 40.
[18] A Bhavnagarwala, B Austin and J Meindl, "Minimum Supply Voltage for bulk Si CMOS GSI", IEEE ISLPED, Aug 1998 Dig, pp 100-103.
[19] A Chandrakasan, S Sheng and R Brodersen, "Low-Power CMOS Digital Design", IEEE JSSC, Vol 27, No 4, April 1992, pp 473-484.

Figure 1: Historical trends with 1997 NTRS projections.

Figure 2: Shallow junction Uniform Doped (UD) and Retrograde Doped (RD) MOSFETs.

Figure 3: Calculated Vto roll-off for bulk Si at NTRS projected gate oxide thickness [6].

Figure 4: Subthreshold swing increases accompany threshold voltage reductions, increasing stand-by currents substantially.

Figure 5 [7]: Stochastic wiring distribution comparison with an actual microprocessor implementation. The distribution is used to calculate the average interconnect length between two logic gates.

Figure 6: Cache size extrapolations to discount SRAM cell transistors from the total transistor count when calculating the average wire length of a logic network.

Figure 7: Comparison of 0.25 micron CMOS HSPICE gate characteristics with the Transregional model (TRM). W=0.5µm

Figure 8: Comparison of 0.25 micron CMOS HSPICE drain characteristics with the Transregional model (TRM). W=0.5µm

Figure 9: Comparison of 0.25 micron CMOS HSPICE simulations with propagation delay models used from [4].

Year            97     99     02     05     08     11     14
F (µm)          0.25   0.18   0.13   0.10   0.07   0.05   0.035
Tox (Å)         45     32     22     15     11     8      6
fclk (GHz)      0.75   1.2    1.6    2.0    2.5    3.0    3.7
Vtopt (V)       0.22   0.21   0.2    0.18   0.17   0.16   0.16
Vddopt (V)      1.23   1.01   0.91   0.72   0.64   0.52   0.41
-ΔVto (V)       103    95     88     75     54     39     42
ΔS (mV/dec)     2.0    2.2    2.7    3.2    3.2    3.0    5.0
Ptotal (mw)     17.1   14.3   11.4   8.6    6.3    4.9    3.8

Table 2: Optimal Vdd, Vtopt, (W/L)n,p for across-chip global clock rates. NTRS projected gate oxide thicknesses are assumed.

Year    F (µm)   Chip size (cm^2)   N gates (x10^6)   Cw (fF)
1997    0.25     3.0                1.07              33.4
1999    0.18     3.4                3.1               24.3
2002    0.13     4.3                9.1               18.1
2005    0.10     5.2                25.7              15.5
2008    0.07     6.2                66.7              12.1
2011    0.05     7.5                177.5             9.4
2014    0.035    9.0                465.5             7.9

Table 1: Average wiring capacitance estimates for NTRS generations using the stochastic interconnect distribution.

Year            05     08     11     14
F (µm)          0.10   0.07   0.05   0.035
Tox (Å)         15     11     8      6
fclk (GHz)      3.5    6.0    10.0   16.9
Cw (fF)         15.6   11.1   7.9    5.7
N cells         72     266    1105   4412
Vddopt          1.05   0.75   0.55   0.51
Vtopt           0.19   0.18   0.16   0.14

Table 3: Average wire lengths and wiring capacitance imposed by heat removal for the sub-100nm generations. Size and number of macrocells are calculated using the stochastic wiring distribution [7]. Q=50W/cm^2

Figure 10: Physical drain current and short channel MOSFET threshold voltage roll-off models are used with stochastic interconnect distributions to project optimal critical path gate designs minimizing total power dissipated by CMOS logic circuits for each NTRS technology generation.

Parameters shown with Figures 11 and 12:
L (min feature size) = 50nm
fclk (local clock rate) = 10GHz
tox (gate oxide thickness) = 8 Å
α (% switching activity) = 0.05
β (clock skew) = 0.9
ncp (logic depth) = 5
Cw (average wire cap) = 4.4fF
fin (average fan-in) = 2
fout (average fan-out) = 2
TMAX (maximum temperature) = 400 °K
P (Rent's exponent) = 0.6
Vddopt (optimal Vdd) = 0.6V
Vtopt (optimal Vto) = 0.17V
(W/L)n (optimal NFET W/L) = 14
(W/L)p (optimal PFET W/L) = 16

Figures 11 and 12: Total power dissipation and its components' dependence on supply voltage, threshold voltage and NFET channel width. PFET channel width is calculated for equal rise and fall times.

Figure 13: A short-wire cellular array architecture with local and global clock frequencies, where local clocks apply only within the boundary of a macrocell. (Figure label: Boundary of Macrocell clocked at local clock rates.)

Figure 14: Optimal Vdd and NTRS projections.

Figure 15: Exponential increase in power with clock frequency imposes limits on CMOS performance.