Design Challenges in Multi-GHz Microprocessors

Bill Herrick, Director, Alpha Microprocessor Development
www.compaq.com

Introduction
- Moore's Law (the trend that the demand for IC functions, and the capability of the semiconductor industry to meet that demand, will double every 1.5 to 2 years) has worked well during the last 30 years
- Difficult challenges face the industry attempting to maintain the pace
- With collaboration, understanding, vision and innovation this trend can continue for high performance microprocessors

Topics
- Historical Trends
  - Intel
  - Alpha chips and design style
  - Observations and trends
- Technology Predictions
  - ITRS 1999
- Key Design Challenges
  - Clocking and power - how Alpha has managed
  - Clocking and power - long term solutions

Historical Trends: Then and Now

                 Circa 1970       Circa 2000
Process          12µ PMOS         0.18µ CMOS
Transistors      1000             10-100 million
Die size         5-10 mm²         300-400 mm²
Supply           10 V             2.5 V
Frequency        500-1000 kHz     500-1000 MHz
Power            100-200 mW       50-100 W
Package          16 pin DIPs      500-1000 pin BGAs

[Figure: Intel Performance History. MIPS (log scale) versus date of introduction, 1971 to 1999, for the 4004, 8080, 8086, 80286, 386, 486, Pentium, Pentium Pro, Pentium II and Pentium III Xeon.]

Intel Trends
- The 4004 (1971)
  - 2300 transistors in a 10µ process
  - 108 kHz operation, executing 0.06 MIPS
- Pentium III (1999)
  - 28 million transistors in a 0.18µ process
  - 733 MHz operation, executes 2000 MIPS
- Over nearly 30 years
  - performance has increased 30,000x
  - transistor count has increased 10,000x
  - frequency has increased 7,000x
  - die size has increased only 25x
  - Moore's Law predicts 30,000x to 1,000,000x improvement over this period
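The Moore's Law range quoted on the Intel Trends slide can be checked with a few lines of arithmetic. The sketch below is not from the talk; it assumes the slide figures read as 2300 transistors, 108 kHz and 0.06 MIPS for the 4004, and 28 million transistors, 733 MHz and 2000 MIPS for the Pentium III, and it assumes a 30 year span. The function name `moore_factor` is made up for illustration.

```python
# Growth factor implied by doubling every `period` years over `years` years.
def moore_factor(years: float, period: float) -> float:
    return 2.0 ** (years / period)

# Assumed slide values: Intel 4004 (1971) versus Pentium III (1999).
transistors_ratio = 28e6 / 2300      # roughly 12,000x
frequency_ratio = 733e6 / 108e3      # roughly 6,800x
mips_ratio = 2000 / 0.06             # roughly 33,000x

low = moore_factor(30, 2.0)          # doubling every 2 years
high = moore_factor(30, 1.5)         # doubling every 1.5 years

print(f"Moore's Law over 30 years: {low:,.0f}x to {high:,.0f}x")
print(f"transistors {transistors_ratio:,.0f}x, "
      f"frequency {frequency_ratio:,.0f}x, MIPS {mips_ratio:,.0f}x")
```

Under these assumptions the two doubling periods give 2^15 and 2^20, which round to the 30,000x and 1,000,000x bounds on the slide.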

Alpha Architecture
- Alpha is a true 64-bit load/store RISC architecture
- Alpha is designed for high clock speed
  - Simple, fixed length (32-bit) instructions
  - Minimal instruction ordering constraints
  - No condition codes
  - No branch delay slots
- Chip micro-architecture is carefully chosen to maximize performance without impacting cycle time

[Figure: Alpha Performance History. SPECint95 (log scale) versus date of introduction, 1992 to 1999, for EV4-200, EV45-275, EV5-300, EV56-400, EV56-500, EV56-600, EV6-575 and EV67-700.]

EV4 Chip Overview
- 0.75µm 3LM N-well CMOS, L eff = 0.5µm, T ox = 10.5nm
- 3.3V Vdd
- 200MHz @ 100°C & 3.3V
- 16 gate delays per cycle
- 30W @ 200MHz & 3.3V
- 13.9mm x 16.8mm (233 mm²)
- 1.7 million transistors (~0.85 million logic transistors)
- 431 pin PGA (291 signals)

EV4 Micro-Architecture
- Dual in-order instruction issue
  - single-issue integer & single-issue FP
- Fully pipelined (except integer MUL and FP DIV)
  - 7-stage integer and 10-stage FP pipelines
- 1-bit branch prediction: 2k-entry BHT
- 8kB direct-mapped I-Cache and 8kB direct-mapped write-through D-Cache
- 32 integer and 32 FP registers, 64b/entry
- Flexible external interface: shared 128b/64b data, 34b address L2 cache and system interface

EV5 Chip Overview
- 0.5µm 4LM N-well CMOS, L eff = 0.365µm, T ox = 9.0nm
- 3.3V Vdd
- 350MHz @ 100°C & 3.3V
- 14 gate delays per cycle
- 60W @ 350MHz & 3.3V
- 16.5mm x 18.1mm (298 mm²)
- 9.3 million transistors (~2.5 million logic transistors)
- 499 pin PGA (294 signals)

EV5 Micro-Architecture
- Quad in-order instruction issue
  - dual-issue integer & dual-issue FP
- 7-stage integer and 9-stage FP pipelines
  - FP latencies reduced by 2 cycles
- 2-bit branch prediction: 2k-entry BHT
- 8kB I-Cache and 8kB write-through D-Cache
- 96kB unified on-chip L2 cache
- Improved external interface supports a non-blocking cache scheme

EV6 Chip Overview
- 0.35µm 6LM N-well CMOS, L eff = 0.25µm, T ox = 6.0nm
- 2.2V Vdd
- 575MHz @ 100°C & 2.2V
- 12 gate delays per cycle
- 90W @ 575MHz & 2.2V
- 16.7mm x 18.8mm (314 mm²)
- 15.2 million transistors (~6 million logic transistors)
- 587 pin PGA (374 signals)

EV6 Micro-Architecture
- Four-wide instruction fetch
- Tournament branch predictor
- Out-of-order execution pipelines
  - Quad-speculative-issue integer pipeline
  - Dual-speculative-issue floating-point pipeline
- 80 in-flight instructions
- Registers: 80 integer, 72 floating point
- Queue entries: 20 integer, 15 floating point
- 2-way 64KB L1 on-chip instruction and data caches
- Up to 16 outstanding off-chip memory references

EV7 Chip Overview
- 0.18µm CMOS technology
- 1.5V Vdd
- Clock frequency >1.0GHz
- 100W
- ~350 mm²
- ~100 million transistors
- EV6 core
- Integrated L2 cache (1.75 MB, 7-way)
- Integrated memory controller (RAMBUS)
- Integrated network interface

EV8 Chip Overview
- Clock frequency range 1.0-2.0GHz
- Leading edge 0.125µm CMOS technology
- ~1.2V Vdd
- <150W
- ~250 million transistors
- Enhanced out-of-order execution
- 8-wide superscalar
- 4-way simultaneous multi-threading (SMT)
- EV7 memory and system enhancements

Alpha Circuit Design Philosophy
- Transistor level circuit design
- Broad range of circuit styles and logic families
  - Complementary CMOS
  - Dynamic logic
  - DCVSL (cascode)
  - Ratioed logic
- Key components to enable high performance
  - On-chip clock generation and distribution (low-skew, fast edge)
  - Latching (low latency)
  - Low noise on-chip power distribution
  - On-chip signal integrity management

Complexity Trends
[Figure: Process Features. Feature dimension (µm) and metal layers for EV4 through EV8.]
[Figure: Chip Features. Transistors (M) and die size (mm²) for EV4 through EV8.]
- Process scaling has continued steadily
- Planarization has enabled an increase in the number of interconnect layers
- Transistor counts have increased dramatically with the L2 cache SRAMs
- Additionally, design team size has increased ~40% per generation
- Opportunities to manage complexity and productivity
  - Fundamental understanding and modeling of process and circuit element behaviors
  - High level design methods
  - CAD
  - Design reuse
  - Micro-architecture

Power Dissipation Trends
[Figure: Power Dissipation. Power (W) and voltage (V) for EV4 through EV8.]
[Figure: Supply Current. Current (A) and voltage (V) for EV4 through EV8.]
- Power consumption is increasing
  - Better cooling technology needed
- Supply current is increasing faster!
- On-chip signal integrity will be a major issue
- Power and current distribution are critical
- Opportunities to slow power growth
  - Accelerate Vdd scaling
  - Low-k dielectrics, thinner Cu interconnect, SOI
  - Circuit innovations
  - Clock system design
  - Micro-architecture

Performance Trends
[Figure: Clock Speed. Relative performance and frequency (MHz) for EV4 through EV8.]
[Figure: Transistor Count. Relative performance and transistors (M) for EV4 through EV8.]
- Performance has increased significantly (7x) faster than frequency
- Performance tracks transistor count when L2 cache is ignored
  - Transistor budget has increased more than performance when L2 cache is considered
  - but benchmarks did not reflect larger applications
- Opportunities to continue performance improvements
  - Continued scaling of devices, interconnect and dielectrics
  - Clock distribution
  - Micro-architecture
  - System design
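The claim that supply current is growing faster than power follows from I = P / Vdd: power rises while Vdd falls. The sketch below is an illustration, not part of the talk. It assumes the chip overview figures read as 30 W at 3.3 V (EV4), 60 W at 3.3 V (EV5), 90 W at 2.2 V (EV6), 100 W at 1.5 V (EV7) and an upper bound of 150 W at about 1.2 V (EV8).

```python
# Assumed (power in W, Vdd in V) per chip, as read from the overview slides.
# The EV8 entry uses the "<150W" bound, so its current is an upper estimate.
chips = {
    "EV4": (30.0, 3.3),
    "EV5": (60.0, 3.3),
    "EV6": (90.0, 2.2),
    "EV7": (100.0, 1.5),
    "EV8": (150.0, 1.2),
}

def supply_current(power_w: float, vdd_v: float) -> float:
    """Average supply current in amperes, I = P / V."""
    return power_w / vdd_v

currents = {name: supply_current(p, v) for name, (p, v) in chips.items()}

power_growth = chips["EV8"][0] / chips["EV4"][0]
current_growth = currents["EV8"] / currents["EV4"]

for name, amps in currents.items():
    print(f"{name}: {amps:6.1f} A")
print(f"power grew {power_growth:.1f}x, current grew {current_growth:.2f}x")
```

With these inputs, power grows 5x from EV4 to EV8 while current grows almost 14x, which is why power and current distribution are called out as critical.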

Micro-Architecture Trends
- Trends have included
  - Wider super-scalar machines, deep pipelines
  - Larger register files, L1 caches
  - On-chip L2 caches
  - Out of order execution
  - Sophisticated branch prediction, predication, speculation
  - Integrated memory and network controllers
  - SMT
  - Less idle logic but more bookkeeping logic
- Future opportunities include
  - Floating point performance improvements
  - Vectors
  - Thread-level speculation
  - More pipelining
  - Better on-chip communications: banking, replicating structures, clustering functional units
  - On-chip SMP

Challenging Design Trends
[Figure: Logic Levels per Cycle. Gate delays per cycle and frequency (MHz) for EV4 through EV8.]
[Figure: Cycles Across Chip. Cycles and frequency (MHz) for EV4 through EV8.]
- Micro-architecture and logic design are stressed as frequency has increased faster than scaling
- Further reducing the number of gate delays per cycle will be difficult
- Cycles to communicate across chip track with frequency
- Clock edge rates are not scaling
- Opportunities to continue performance increases
  - Chip implementation design
  - Clock system design
  - Micro-architecture
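The statement that frequency has increased faster than scaling can be illustrated by splitting the frequency gain into a gate-speed part and a logic-depth part. This is a rough sketch under stated assumptions, not the speaker's analysis: it takes the chip overview figures as 200 MHz with 16 gate delays per cycle (EV4), 350 MHz with 14 (EV5) and 575 MHz with 12 (EV6), and treats one "gate delay" as the cycle time divided by that count.

```python
# Assumed (frequency in MHz, gate delays per cycle) from the overview slides.
chips = {"EV4": (200.0, 16), "EV5": (350.0, 14), "EV6": (575.0, 12)}

def gate_delay_ps(freq_mhz: float, gates_per_cycle: int) -> float:
    """Average delay of one gate in picoseconds."""
    cycle_ps = 1e6 / freq_mhz          # 1 MHz corresponds to a 1e6 ps period
    return cycle_ps / gates_per_cycle

delays = {name: gate_delay_ps(f, g) for name, (f, g) in chips.items()}

freq_gain = chips["EV6"][0] / chips["EV4"][0]       # total frequency gain
gate_gain = delays["EV4"] / delays["EV6"]           # from faster gates
depth_gain = chips["EV4"][1] / chips["EV6"][1]      # from fewer gates per cycle

for name, d in delays.items():
    print(f"{name}: {d:6.1f} ps per gate")
print(f"frequency {freq_gain:.2f}x = gates {gate_gain:.2f}x * depth {depth_gain:.2f}x")
```

Under these assumptions, roughly a third of the EV4 to EV6 frequency gain comes from shortening the cycle to 12 gate delays, a lever the slide says will be hard to pull further.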

ITRS-1999 Key Messages
- No major issues through the 130 nm generation, but significant issues for the 100 nm generation (2005)
- Continued technology scaling will require the introduction of new process materials and new devices
- Transistor densities will continue the historical trends, ~2X / 2 yrs
- Clock frequency increases will slow compared to historical trends

ITRS-1999 The Roadmap

                    1999   2001   2003   2005   2008   2011   2014
Generation (nm)      180    130    130    100     70     50     35
L eff (nm)           140    100     80     65     45     31     21
Devices (M)          110    220    441    882   2494   7053  19949
Chip Size (mm²)      450    450    567    622    713    817    937
Signals              768   1024   1024   1024   1280   1408   1472
Pins                1600   2007   2518   3158   4437   6234   8758

ITRS-1999 The Roadmap (continued)

                    1999   2001   2003   2005   2008   2011   2014
Generation (nm)      180    130    130    100     70     50     35
Clock (MHz)         1200   1454   1724   2000   2500   3000   3600
Local Clk (MHz)     1250   1767   2490   3500   6000  10000  13500
IO (MHz)             480    722    932   1035   1285   1540   1800
Wiring Levels          7      7      8      9      9     10     10
Vdd (V)              1.8    1.5    1.5    1.2    0.9    0.6    0.6
Power (W)             90    115    140    160    170    174    183

ITRS-1999 Highlights
- Transistors
  - Drive currents will remain constant through 2014 at 750 µA/µm for NFETs and 350 µA/µm for PFETs
  - Leakage currents will double every 3 years from 5 nA/µm in 1999 to 160 nA/µm in 2014
- Interconnect
  - Use of Cu and low-k dielectrics will become standard
  - Local interconnect delays will scale with gate delays
  - Global interconnect delays, even with repeaters, will not scale with gate delays
  - Coplanar waveguides, free space RF and optical interconnect may be needed longer term
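The leakage projection is a plain exponential, and the two endpoints on the slide are consistent with the stated doubling period. A minimal sketch, assuming the slide reads 5 nA/µm in 1999 and 160 nA/µm in 2014; the function name and its default arguments are illustrative only.

```python
def leakage_na_per_um(year: int,
                      base_year: int = 1999,
                      base_na_per_um: float = 5.0,
                      doubling_years: float = 3.0) -> float:
    """Projected leakage current, doubling every `doubling_years` years."""
    return base_na_per_um * 2.0 ** ((year - base_year) / doubling_years)

for year in (1999, 2002, 2005, 2008, 2011, 2014):
    print(year, f"{leakage_na_per_um(year):6.1f} nA/um")
```

Fifteen years at one doubling every three years is five doublings, or 32x, which takes 5 nA/µm to 160 nA/µm.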

ITRS-1999 Highlights (2)
- Packaging
  - Maximum junction temperature must be reduced from 100°C to 85°C by 2002 for reliability concerns
  - Significant θja improvements will be required to maintain air cooling system solutions: a 50% reduction by 2002 and another 30% by 2014
- Modeling & Simulation
  - 2D and 3D interconnect models with inductance and transmission line effects will be needed
  - Transistor models of non-quasi-static effects and quantum mechanical gate effects will be needed; gate currents will become important
  - OCV modeling will become necessary
  - CPU efficient and accurate models will be essential

ITRS-1999 Highlights (3)
- Design Productivity
  - Design team sizes will not exceed 300 people
  - Design cycle times will decrease from 36 months in 1999 to 30 months in 2005 to 24 months in 2014
- Verification and Test
  - Verification has become more than 50% of the total design effort
  - Use of formal verification will increase from 15% now to 30% in 2005 to 60% in 2014
  - BIST coverage will increase from 20% now to 40% in 2005 to 70% in 2014

EV4 Clocking
- t cycle = 6.0 ns, t rise = 0.5 ns, t skew = 0.3 ns
- 2 phase single wire clock, distributed globally
  - Low skew
  - Fast edge rate
- 1 clock driver channel
  - 3.5 nF clock load
  - 35 cm final driver width
[Figure: clock waveform and location of clock driver on die.]

EV4 Latches
- First single wire clock implementation
  - Race immune latch
  - Level sensitive design
  - 2 latches per cycle
  - Can build logic into first stage of latch
  - t cycle latch overhead is approximately 25%
[Figure: latch schematic with D, CLK and Q; CLK high loading latch.]

[Figure: EV4 Thermal Gradient. Die temperature map with marked temperatures of 76°C and 46°C.]

EV5 Clocking
- t cycle = 3.3 ns, t rise = 0.35 ns, t skew = 150 ps
- 2 phase single wire clock, distributed globally
- 2 distributed driver channels
  - Reduced RC delay/skew
  - Improved thermal distribution
  - 3.75 nF clock load
  - 58 cm final driver width
- Local inverters for latching
- Conditional clocks in caches to reduce power
- More complex race checking
- Device variation
[Figure: clock waveform and location of pre-driver and final clock drivers on die.]

EV5 Latches
- Reduce t dq and reduce clock load
  - Local clock inverter complicated race issues
  - Level sensitive design
  - 2 latches per cycle
  - Can build logic into first and last stages of latch
  - t cycle latch overhead is approximately 15%
  - Smaller, faster and lower power than EV4 latch
[Figure: latch schematic with D, CLK, CLK_L and Q; CLK high loading latch.]

[Figure: EV5 Thermal Gradient.]

[Figure: EV5 Global Clock Skew.]
[Figure: EV5 Local Clock Skew.]

EV6 Clocking
- t cycle = 1.67 ns, t rise = 0.35 ns, t skew = 50 ps
- 2 phase, with multiple conditional buffered clocks
  - 2.8 nF clock load
  - 40 cm final driver width
- Local clocks can be gated off to save power
- Reduced load/skew
- Reduced thermal issues
- Multiple clocks complicate race checking
[Figure: global clock waveform and PLL location.]

EV6 Latches
- Conditional clocks to reduce power
  - Static design
  - 1 latch per cycle
  - Edge triggered to simplify race rules
  - Can build logic into latch
  - t cycle latch overhead is approximately 15%
[Figure: latch schematic with D, CLK, Q_H and Q_L.]
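The clock loads on the three clocking slides can be turned into a rough dynamic power estimate with P = C * Vdd^2 / t_cycle, which holds for a full-swing node charged and discharged once per cycle. The sketch below is an approximation under stated assumptions, not a figure from the talk: it takes the loads and cycle times as 3.5 nF at 6.0 ns (EV4), 3.75 nF at 3.3 ns (EV5) and 2.8 nF at 1.67 ns (EV6), uses Vdd from the chip overview slides, and ignores driver short-circuit current and any local clock loads not included in the quoted capacitance.

```python
# Assumed (clock load in nF, Vdd in V, cycle time in ns) per chip.
clocks = {
    "EV4": (3.5, 3.3, 6.0),
    "EV5": (3.75, 3.3, 3.3),
    "EV6": (2.8, 2.2, 1.67),
}

def clock_power_w(load_nf: float, vdd_v: float, t_cycle_ns: float) -> float:
    """Dynamic power of a full-swing clock node: C * V^2 * f, with f = 1 / t_cycle."""
    capacitance_f = load_nf * 1e-9
    frequency_hz = 1.0 / (t_cycle_ns * 1e-9)
    return capacitance_f * vdd_v ** 2 * frequency_hz

power = {name: clock_power_w(*params) for name, params in clocks.items()}

for name, watts in power.items():
    print(f"{name}: {watts:5.2f} W in the quoted clock load")
```

On these numbers the EV6 global clock load dissipates less than the EV5 load despite running twice as fast, because both the capacitance and Vdd dropped.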

EV6 Clock Results
[Figure: GCLK skew at Vdd/2 crossings, scale 5 to 50 ps.]
[Figure: GCLK rise times, 20% to 80% extrapolated to 0% to 100%, scale 300 to 345 ps.]

EV7 Clock Hierarchy
Active skew management and multiple clock domains
[Figure: clock hierarchy with a PLL driving GCLK (CPU core), and DLLs driving NCLK (memory controller), L2L_CLK (L2 cache) and L2R_CLK (L2 cache); SYSCLK input.]
+ widely dispersed drivers
+ DLLs compensate static and low-frequency variation
+ divides design and verification effort
+ tailored clocks
- DLL design and verification is added work

Power Consumption
- Clocks consume the largest fraction of power
- Driving inter-unit busses consumes as much power as intra-unit gates and interconnect
[Figure: pie chart of power by category (Clocks, Caches, Execution Units, Control, I/O Drivers) with slices of 15%, 10%, 15%, 40% and 20%.]

EV4 - 3 Metal Layers
- 3rd coarse and thick metal layer added to the technology for EV4 design
- Power supplied from two sides of the die via 3rd metal layer
- 2nd metal layer used to form power grid
- 90% of 3rd metal layer used for power/clock routing
[Figure: metal stack, Metal 3 over Metal 2 over Metal 1.]

EV5 - 4 Metal Layers
- 4th coarse and thick metal layer added to the technology for EV5 design
- Power supplied from four sides of the die
- Grid strapping done all in coarse metal
- 90% of 3rd and 4th metals used for power/clock routing
[Figure: metal stack, Metal 4 over Metal 3 over Metal 2 over Metal 1.]

EV6 - 6 Metal Layers
- 2 reference plane metal layers added to the technology for EV6 design
- Solid planes dedicated to Vdd/Vss
- Significantly lowers resistance of grid
- Lowers on-chip inductance
[Figure: metal stack, RP2/Vdd, Metal 4, Metal 3, RP1/Vss, Metal 2, Metal 1.]

Reference Plane Example
Simulation methodology
- Extract inductance & resistance versus frequency
- Model skin effect both vertically and horizontally
- Construct time-domain SPICE model and simulate with SPICE
- Use FF devices, high Vdd & low temperature to aggravate inductive effects
[Figure: two metal stack cross-sections over the substrate with Metal 3 victims, one with RP2 only and one with both RP1 and RP2.]

Reference Plane Example (continued)
[Figure: simulated waveforms of the M3 victim, M3 aggressors and M1 aggressors versus time (1.0 to 2.0 ns), for RP2 only and for RP1 & RP2.]

De-coupling Capacitor Ratios
- EV4
  - total effective switching capacitance = 12.5 nF
  - 128 nF of de-coupling capacitance
  - de-coupling/switching capacitance ~ 10x
- EV5
  - 13.9 nF of switching capacitance
  - 160 nF of de-coupling capacitance
- EV6
  - 34 nF of effective switching capacitance
  - 320 nF of de-coupling capacitance -- not enough!

EV6 De-coupling Capacitance
- Design for Idd = 25 A @ Vdd = 2.2 V, f = 600 MHz
  - 0.32 µF of on-chip de-coupling capacitance was added
    - Under major busses and around major gridded clock drivers
    - Occupies 15-20% of die area
  - 1 µF, 2 cm² Wirebond Attached Chip Capacitor (WACC) significantly increases near-chip de-coupling
    - 160 Vdd/Vss bondwire pairs on the WACC minimize inductance
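The "~10x" rule of thumb and the "not enough" remark for EV6 can be compared directly by computing the de-coupling to switching ratio for each chip. This is a minimal sketch that assumes the slide values read as 128 nF over 12.5 nF (EV4), 160 nF over 13.9 nF (EV5) and 320 nF over 34 nF (EV6); the 1 µF WACC figure is added for EV6 only to show its effect on the combined ratio.

```python
# Assumed (de-coupling capacitance in nF, switching capacitance in nF).
caps = {
    "EV4": (128.0, 12.5),
    "EV5": (160.0, 13.9),
    "EV6": (320.0, 34.0),
}

def decap_ratio(decap_nf: float, switching_nf: float) -> float:
    """Ratio of de-coupling capacitance to effective switching capacitance."""
    return decap_nf / switching_nf

ratios = {name: decap_ratio(d, s) for name, (d, s) in caps.items()}

# EV6 with the 1 uF (1000 nF) near-chip WACC added to the on-chip 320 nF.
ev6_with_wacc = decap_ratio(320.0 + 1000.0, 34.0)

for name, r in ratios.items():
    print(f"{name}: {r:4.1f}x on-chip")
print(f"EV6 with WACC: {ev6_with_wacc:4.1f}x")
```

On these assumptions the EV6 on-chip ratio is the lowest of the three even with 15-20% of the die spent on capacitors, which is consistent with the move to a near-chip capacitor.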

EV6 WACC
[Figure: package cross-section showing the microprocessor, WACC and heat slug in a 587 IPGA; 389 signal and 198 VDD/VSS pins; 389 signal bondwires; 395 VDD/VSS bondwires; 320 VDD/VSS bondwires to the WACC.]

Clocking Futures
- Frequencies will continue to scale
- Clock edge rates are not scaling as well
- Multiple clock zones required
  - Architectures minimizing global communications
  - Adaptive and passive synchronization techniques
  - DLLs in the near term
  - Clocking schemes utilizing encoding, extraction, multi-state and local phase optimization to compensate for skew and latency
- Asynchronous or quasi-synchronous architectures

Power Futures
- Low power modes
- Power tradeoffs in the micro-architecture
- More emphasis on low power circuits
  - Reducing clock load
  - Low swing differential clocks
  - Low swing buses
  - Adiabatic circuits, clocked drivers, retractile logic
  - Asynchronous design
- Reference planes also help to minimize inductive and wave effects

Conclusion
- Physical technology advances will enable multi-GHz chips
- Key challenges in power, clocking, complexity, and verification must be addressed
- New tools and methods will be needed
- CAD developers and chip designers must collaborate more closely than ever
- With a solid understanding of the fundamentals, a clear vision of the product and ingenuity, we will realize multi-GHz microprocessors