Resonant Clock Circuits for Energy Recovery Power Reductions


Resonant Clock Circuits for Energy Recovery Power Reductions
Riadul Islam, Ignatius Bezzam
School of Engineering

Clocking Challenge
- Synchronous operation needs low clock skew across the chip
- High-performance processors limit skew to 7-8 ps across 21-64 mm²
- Global clock power: 24% in the AMD Piledriver design (10 W)*
- The Clock Distribution Network (CDN) is a critical chip component
*V. S. Sathe et al., Resonant Clock Design for Power-Efficient High-Volume x86-64 Microprocessor, AMD & U of M Ann Arbor

A Commercially Viable VLSI Power Saving Milestone
Publications:
1) ISSCC 2012, Session 3 (Processors), Paper 3.7: Resonant Clock Design for a Power-Efficient High-Volume x86-64 Microprocessor. Visvesh Sathe (1), Srikanth Arekapudi (2), Charles Ouyang (2), Marios Papaefthymiou (3, 4), Alexander Ishii (3), Samuel Naffziger (1). Affiliations: (1) AMD, Fort Collins, CO; (2) AMD, Sunnyvale, CA; (3) Cyclos Semiconductor, Berkeley, CA; (4) University of Michigan, Ann Arbor, MI.
2) IEEE Journal of Solid-State Circuits, vol. 48, no. 1, January 2013: Resonant-Clock Design for a Power-Efficient, High-Volume x86-64 Microprocessor. Visvesh S. Sathe (Member, IEEE), Srikanth Arekapudi (Member, IEEE), Alexander Ishii (Member, IEEE), Charles Ouyang (Member, IEEE), Marios C. Papaefthymiou (Senior Member, IEEE), and Samuel Naffziger (Senior Member, IEEE).

Project Goals Review
- Design a clock tree to drive 500 pF of load capacitance
  - Power = 1 mW ($f C V^2$) @ 1 GHz, 1 pF, 1 V
  - Power = 0.5 W for 500 pF total load (25% of x86)
- Compare power with and without LC resonance
- Power savings variation with changes in frequency, temperature, and voltage
- Can we break the $f C V^2$ barrier?
- How to improve savings and DVS performance
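The two power figures in the goals follow directly from the dynamic power relation. A minimal Python sketch, assuming an activity factor of 1 (the clock toggles every cycle); the function name `dynamic_power` is illustrative and does not come from the slides:

```python
def dynamic_power(c_farads, v_volts, f_hertz, activity=1.0):
    """Conventional (non-resonant) switching power P = a * C * V^2 * f, in watts."""
    return activity * c_farads * v_volts ** 2 * f_hertz

# 1 pF driven at 1 GHz with a 1 V swing
p_unit = dynamic_power(1e-12, 1.0, 1e9)
# 500 pF total clock load at the same frequency and voltage
p_total = dynamic_power(500e-12, 1.0, 1e9)

print(f"{p_unit * 1e3:.1f} mW per pF, {p_total:.2f} W for 500 pF")
```

Since power is linear in C, the 500 pF figure is simply 500 times the 1 pF figure.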

Clock Distribution Network (CDN)
- IBM benchmarks: ISPD 2010 clock synthesis
- CDN grid/mesh to meet the skew spec
- Total clock load ($C_{Load}$) adds up to nanofarads
  - IBM 90 nm Cell Broadband processor: ~2 nF $C_{Load}$
  - AMD 32 nm Piledriver x86 core: ~1 nF $C_{Load}$ per core
  - Intel 45 nm processor, 64 mm²: ~8 nF $C_{Load}$
- Needs a large amount of power

Savings & Barriers
Clocking power components (sinks of power):
- CDN power = 24 W for 8 nF @ 4 GHz / 1 V
- Local buffer power = 1 mW for 1 pF @ 1 GHz / 1 V (x 1000)
Reducing power: reduce each of the terms in $P = a C V_{dd}^2 f$
- a: don't use (clock gating) or shut down (power gating)
- C: smaller transistors and loads (technology scaling)
- $V^2$: dynamic voltage scaling (low-swing designs)
- f: run only as fast as needed (frequency scaling, ACPI)
- Combination of the above: DVFS (Dynamic Voltage & Frequency Scaling)
Can we break the $f C V^2$ barrier? Yes, with resonance:
- Recycle energy instead of throwing it away
- Adiabatic charge/discharge is ideally lossless (heatless)
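To see how the individual terms trade off, the sketch below expresses power relative to a baseline when each term of $P = a C V_{dd}^2 f$ is scaled. The scaling factors (0.8 for voltage and frequency, 0.5 for activity) are arbitrary illustration values, not figures from the slides, and `relative_power` is a hypothetical helper name.

```python
def relative_power(v_scale=1.0, f_scale=1.0, a_scale=1.0, c_scale=1.0):
    """Power relative to baseline when each term of P = a*C*V^2*f is scaled."""
    return a_scale * c_scale * v_scale ** 2 * f_scale

# DVFS: lowering voltage and frequency together compounds (V^2 times f)
dvfs = relative_power(v_scale=0.8, f_scale=0.8)
# Clock gating: halving the activity factor halves the power
gated = relative_power(a_scale=0.5)

print(f"DVFS at 0.8x V and f: {dvfs:.3f} of baseline")
print(f"Gating half the cycles: {gated:.3f} of baseline")
```

All four knobs stay inside the same product, which is why the slides treat resonance as a separate route: it recovers stored energy rather than shrinking a term.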

CMOS Dynamic Power Consumption with Large Capacitive Loads
[Figure: switching model. Pull-up switch S1 connects $V_{DD}$ to OUT, pull-down switch S2 connects OUT to ground, and load capacitor C sits at OUT; each switch is labeled $\frac{1}{2} C V_{dd}^2$.]
Energy equations:
- Total energy from source: $E_{in} = C V_{dd}^2$
- Energy stored in capacitor: $E_C = \frac{1}{2} C V_{dd}^2$
Power consumed at frequency f: $P = a C V_{dd}^2 f$
- a: activity factor (high for clocks, which switch every cycle)
Reducing power: reduce each of the terms
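The energy bookkeeping for one full charge/discharge cycle can be checked numerically. This is a sketch of the standard switched-capacitor model, assuming ideal switches and a complete swing between 0 and $V_{dd}$; the function and field names are illustrative.

```python
def cycle_energy(c_farads, vdd):
    """Energy flow for one charge/discharge cycle of a capacitor driven by CMOS switches."""
    e_in = c_farads * vdd ** 2            # drawn from the supply while charging
    e_stored = 0.5 * c_farads * vdd ** 2  # held on the capacitor after charging
    e_pullup = e_in - e_stored            # dissipated in the pull-up during charging
    e_pulldown = e_stored                 # dissipated in the pull-down during discharge
    return {"in": e_in, "stored": e_stored,
            "pullup_loss": e_pullup, "pulldown_loss": e_pulldown}

e = cycle_energy(1e-12, 1.0)
# Power when this cycle repeats every clock period at 1 GHz
p_watts = e["in"] * 1e9
print(e, p_watts)
```

Everything drawn from the supply ends up as heat in the two switches, which is the loss a resonant scheme tries to recover.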

CDN with Energy Saving Resonance
[Figure: clock driver (pull-up S1, pull-down S2, supply VDD) driving VOUT into the distributed clock load; inductors L with a resonance ON/OFF switch S3; node labels C, $C_L$, and $V_{LB}$.]
- $V_{LB} = V_{dd}/2$
- Turn S3 off at non-resonance frequencies
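The inductor is chosen so that it resonates with the clock load at the operating frequency. A rough sizing sketch, assuming an ideal parallel LC tank whose resonance is set only by L and the load C (that is, any capacitor holding the $V_{dd}/2$ bias is taken to be much larger than C, and parasitics are ignored); the helper names are illustrative:

```python
import math

def resonant_inductance(f_hertz, c_farads):
    """Inductance that resonates with c_farads at f_hertz: L = 1 / ((2*pi*f)^2 * C)."""
    return 1.0 / ((2.0 * math.pi * f_hertz) ** 2 * c_farads)

def resonant_frequency(l_henries, c_farads):
    """Natural frequency of an ideal LC tank: f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henries * c_farads))

# Sizing point used in the slides: 1 GHz clock, 1 pF load
l_needed = resonant_inductance(1e9, 1e-12)
print(f"L = {l_needed * 1e9:.1f} nH")
print(f"check: f0 = {resonant_frequency(l_needed, 1e-12) / 1e9:.3f} GHz")
```

Away from this frequency the tank no longer returns charge at the right time, which is consistent with the instruction to switch S3 off at non-resonance frequencies.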

Efficiency = (Pc - Pr)/Pc
Design sizing in 45 nm: transistor W and inductor L for 1 GHz, 1 pF
[Figure: simulated efficiencies (%) vs. frequency (GHz). Vertical axis from -30% to 50%; frequency ticks at 0.7, 0.8, 1.0, 1.3, and 2.0 GHz; curves: 10X_Room, 10X_HOT, 5x_hot, 5x_room.]

Improvement #1
- Problem #1: less savings from uniform placements
- ROCKS clock synthesis with improved inductor size and placements

ROCKS Clock Synthesis
Simulation results: 45 nm IBM, frequency 1 GHz

Metric          With resonance        Grid
                (07_rocks_test.py)    (03_grid_test_hspice.py)
Buffers         18                    21
Wires           185                   369
Est_Power       13.644 mW             18.224 mW
Act_Power       9.559 mW              17.490 mW
Ambient Skew    11.000                36.000

Power saving: 7.93 mW
Saving with LC resonance = 45%
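The reported saving can be reproduced from the two Act_Power figures with the efficiency definition (Pc - Pr)/Pc. This sketch assumes that the 03_grid_test_hspice.py run is the conventional (non-resonant) baseline Pc and the 07_rocks_test.py run is the resonant case Pr; the variable names are illustrative.

```python
# Act_Power values from the simulation results, in mW
pc_grid_mw = 17.490   # assumed conventional baseline (03_grid_test_hspice.py)
pr_rocks_mw = 9.559   # resonant ROCKS result (07_rocks_test.py)

saving_mw = pc_grid_mw - pr_rocks_mw
efficiency = saving_mw / pc_grid_mw   # (Pc - Pr) / Pc

print(f"Power saving: {saving_mw:.2f} mW")
print(f"Saving with LC resonance: {efficiency:.0%}")
```

The same expression goes negative whenever the resonant power exceeds the conventional power, which is how an efficiency plot can dip below 0%.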

Improvement # 2

A Wide Frequency Resonant Driver
[Figure, two panels. (a) Conventional driver with resonant option; labels: VDD, PullUp S1, PullDown S2, VOUT, Resonance ON/OFF S3, C, L, Vdd/2. (b) Wide frequency range resonant driver; labels: VDD, Refresh S1, Store/Recover S2 (close for $T_{ON}$), VOUT, $i_L$, L, C, $v_c$, Vdd/2.]

Power Savings: Simulation Results with Parasitics
[Figure: power savings (%) vs. frequency (GHz). Vertical axis from 0% to 40%; frequency ticks at 0.01, 0.1, 1, and 10 GHz; curves: 45 nm 0.5 V, 45 nm 1 V, 32 nm 1 V, AMD CDN [3].]
4. Xuchu Hu and Matthew R. Guthaus, Distributed LC Resonant Clock Grid Synthesis, IEEE Transactions on Circuits and Systems, 2012, in press.

References
1. S. C. Chan, P. J. Restle, T. J. Bucelot, J. S. Liberty, S. Weitzel, J. M. Keaty, B. Flachs, R. Volant, P. Kapusta, and J. S. Zimmerman, A Resonant Global Clock Distribution for the Cell Broadband Engine Processor, IEEE Journal of Solid-State Circuits, vol. 44, no. 1, pp. 64-72, January 2009.
2. V. S. Sathe, S. Arekapudi, C. Ouyang, M. C. Papaefthymiou, A. Ishii, and S. Naffziger, Resonant Clock Design for Power-Efficient High-Volume x86-64 Microprocessor, IEEE Solid State Circuits Conference, 2012, vol. 55, pp. 68-69.
3. Xuchu Hu and Matthew R. Guthaus, Distributed LC Resonant Clock Grid Synthesis, IEEE Transactions on Circuits and Systems, 2012, in press.
4. Advanced Configuration and Power Interface (ACPI) is an open industry specification co-developed by Hewlett-Packard, Intel, Microsoft, Phoenix, and Toshiba: http://www.acpi.info

Questions?