Resonant Clock Circuits for Energy Recovery Power Reductions
Riadul Islam, Ignatius Bezzam
SCHOOL OF ENGINEERING

CLOCKING CHALLENGE
  Synchronous operation needs low clock skew across the chip
  High-performance processors limit skew to 7-8 ps across 21-64 mm²
  Global clock power: 24% in the AMD Piledriver design (10 W)*
  The Clock Distribution Network (CDN) is a critical chip component
  * V. S. Sathe, et al., Resonant Clock Design for a Power-Efficient High-Volume x86-64 Microprocessor, AMD & U of M Ann Arbor

A Commercially Viable VLSI Power Saving Milestone
Publications
  1) ISSCC 2012 / Session 3 / Processors / 3.7
     Resonant Clock Design for a Power-Efficient High-Volume x86-64 Microprocessor
     Visvesh Sathe (1), Srikanth Arekapudi (2), Charles Ouyang (2), Marios Papaefthymiou (3,4), Alexander Ishii (3), Samuel Naffziger (1)
     (1) AMD, Fort Collins, CO; (2) AMD, Sunnyvale, CA; (3) Cyclos Semiconductor, Berkeley, CA; (4) University of Michigan, Ann Arbor, MI
  2) IEEE Journal of Solid-State Circuits, Vol. 48, No. 1, January 2013
     Resonant-Clock Design for a Power-Efficient, High-Volume x86-64 Microprocessor
     Visvesh S. Sathe, Member, IEEE, Srikanth Arekapudi, Member, IEEE, Alexander Ishii, Member, IEEE, Charles Ouyang, Member, IEEE, Marios C. Papaefthymiou, Senior Member, IEEE, and Samuel Naffziger, Senior Member, IEEE

Project Goals Review
  Design a clock tree to drive 500 pF of load capacitance
    Power = 1 mW ($fCV^2$) at 1 GHz, 1 pF, 1 V
    Power = 0.5 W for 500 pF total load (25% of x86)
  Compare power with and without LC resonance
  Power savings variation with changes in
    Frequency
    Temperature
    Voltage
  Can we break the $fCV^2$ barrier?
  How to improve savings and DVS performance

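The two power figures above follow directly from $P = fCV^2$. A minimal sketch that reproduces them, assuming an activity factor of 1 (the clock toggles every cycle); the function name `dynamic_power` is illustrative and not part of the project:

```python
def dynamic_power(freq_hz, cap_farads, vdd_volts, activity=1.0):
    """Dynamic switching power P = a * C * V^2 * f, in watts."""
    return activity * cap_farads * vdd_volts ** 2 * freq_hz

# Unit case from the slide: 1 pF at 1 GHz and 1 V.
unit_power = dynamic_power(1e9, 1e-12, 1.0)

# Project target: 500 pF total clock load at the same frequency and voltage.
total_power = dynamic_power(1e9, 500e-12, 1.0)

print(f"1 pF load:   {unit_power * 1e3:.3f} mW")
print(f"500 pF load: {total_power:.3f} W")
```

Because power is linear in capacitance, the 500 pF result is simply 500 times the 1 pF unit case.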
Clock Distribution Network (CDN)
  IBM benchmarks: ISPD 2010 clock synthesis
  CDN grid/mesh to meet skew spec
  Total clock load (C_Load) adds up to nanofarads
    IBM 90 nm Cell Broadband Engine processor: ~2 nF C_Load
    AMD 32 nm Piledriver x86 core: ~1 nF C_Load per core
    Intel 45 nm processor, 64 mm²: ~8 nF C_Load
  Needs a large amount of power

Savings & Barriers
Clocking power components (sinks of power)
  CDN power = 24 W for 8 nF at 4 GHz, 1 V
  Local buffer power = 1 mW for 1 pF at 1 GHz, 1 V (x 1000)
Reducing power: $P = a\,C\,V_{dd}^2\,f$, reduce each of the terms
  $a$: don't use (clock gating) or shut down (power gating)
  $C$: smaller transistors and loads (technology scaling)
  $V_{dd}^2$: dynamic voltage scaling (low-swing designs)
  $f$: run only as fast as needed (frequency scaling, ACPI)
  Combination of the above: DVFS (Dynamic Voltage & Frequency Scaling)
Can we break the $fCV^2$ barrier? Yes, with resonance
  Recycle the energy instead of throwing it away
  Adiabatic charge/discharge is ideally lossless (no heat)

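To see how the individual knobs compare, the sketch below evaluates $P = a\,C\,V_{dd}^2\,f$ for the 1 pF local-buffer case and a few reductions. The scaling factors (activity halved, voltage at 0.8x, frequency at 0.5x) are arbitrary illustrative values, not figures from the slides, and `clock_power` is a hypothetical helper name.

```python
def clock_power(a, c, vdd, f):
    """Dynamic power P = a * C * Vdd^2 * f, in watts."""
    return a * c * vdd ** 2 * f

# Baseline: the 1 pF, 1 GHz, 1 V local buffer case, switching every cycle.
baseline = dict(a=1.0, c=1e-12, vdd=1.0, f=1e9)

# Illustrative (assumed) reductions applied to one or two terms at a time.
scenarios = {
    "clock gating (a x0.5)": dict(baseline, a=0.5),
    "voltage scaling (Vdd x0.8)": dict(baseline, vdd=0.8),
    "frequency scaling (f x0.5)": dict(baseline, f=0.5e9),
    "DVFS (Vdd x0.8, f x0.5)": dict(baseline, vdd=0.8, f=0.5e9),
}

p0 = clock_power(**baseline)
relative = {name: clock_power(**params) / p0 for name, params in scenarios.items()}

for name, ratio in relative.items():
    print(f"{name:28s} -> {ratio:.2f} x baseline power")
```

Voltage enters quadratically, so a 20% voltage reduction saves more than a 20% reduction in any of the linear terms. None of these knobs removes the $fCV^2$ dependence itself, which is what resonance targets.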
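CMOS Dynamic Power Consumption with Large Capacitive Loads
Switching model: pull-up switch S1 between $V_{DD}$ and OUT, pull-down switch S2 at OUT, load capacitor C at OUT
Energy equations
  Total energy from source: $E_{in} = C V_{dd}^2$
  Energy stored in capacitor: $E_C = \tfrac{1}{2} C V_{dd}^2$
  Energy dissipated in pull-up S1 while charging: $\tfrac{1}{2} C V_{dd}^2$
  Energy dissipated in pull-down S2 while discharging: $\tfrac{1}{2} C V_{dd}^2$
Power consumed at frequency $f$: $P = a\,C\,V_{dd}^2\,f$
  $a$: activity factor (high for clocks, which switch every cycle)
Reducing power: reduce each of the terms
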
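The energy split above can be checked numerically. The sketch below models the pull-up switch as a fixed resistance (an assumption; a real transistor is nonlinear) and integrates the RC charging current. The helper name `charge_through_resistor` and the resistance values are hypothetical.

```python
import math

def charge_through_resistor(c, vdd, r, steps=100_000, n_tau=20.0):
    """Integrate the energy drawn from the supply and the energy dissipated
    in the switch resistance while charging C from 0 V towards Vdd."""
    tau = r * c
    dt = n_tau * tau / steps
    e_source = 0.0
    e_resistor = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt                    # midpoint rule
        i = (vdd / r) * math.exp(-t / tau)    # RC charging current
        e_source += vdd * i * dt              # energy delivered by the supply
        e_resistor += i * i * r * dt          # heat in the pull-up switch
    return e_source, e_resistor

c, vdd = 1e-12, 1.0
results = {r: charge_through_resistor(c, vdd, r) for r in (10.0, 1000.0)}

for r, (e_src, e_res) in results.items():
    print(f"R = {r:6.0f} ohm: E_in / (C V^2) = {e_src / (c * vdd**2):.4f}, "
          f"E_switch / (C V^2) = {e_res / (c * vdd**2):.4f}")
```

In this model the dissipated energy comes out at half of $C V_{dd}^2$ for both resistance values: a stronger switch charges the load faster but wastes the same energy, which is why conventional drivers cannot get below $fCV^2$.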
CDN with Energy Saving Resonance
[Figure: clock driver schematic. Labels: VDD, clock driver, pull-up S1, pull-down S2, VOUT, inductors L, resonance ON/OFF switch S3, load C_L, bias node V_LB]
  V_LB = Vdd/2
  Turn S3 off at non-resonant frequencies

Efficiency = (Pc - Pr)/Pc
Design sizing in 45 nm: transistor W and inductor L for 1 GHz, 1 pF
[Figure: simulated efficiencies (%) vs. frequency (GHz); efficiency axis from -30% to 50%, frequency axis from 0.7 to 2.0 GHz; curves: 10X_Room, 10X_HOT, 5x_hot, 5x_room]

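As a first-order starting point for the inductor sizing, the standard ideal LC tank relation $f_0 = 1/(2\pi\sqrt{LC})$ can be inverted for L. This sketch ignores parasitic resistance, the bias network, and the driver, so the value actually used in the design may differ; the function names are illustrative.

```python
import math

def resonant_inductance(freq_hz, cap_farads):
    """Inductance that resonates with cap_farads at freq_hz (ideal LC tank)."""
    return 1.0 / ((2.0 * math.pi * freq_hz) ** 2 * cap_farads)

def resonant_frequency(ind_henries, cap_farads):
    """Resonant frequency f0 = 1 / (2 * pi * sqrt(L * C)) of an ideal LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(ind_henries * cap_farads))

# Design point from the slide: 1 GHz clock, 1 pF load.
inductance = resonant_inductance(1e9, 1e-12)
check_freq = resonant_frequency(inductance, 1e-12)

print(f"L  = {inductance * 1e9:.2f} nH")
print(f"f0 = {check_freq / 1e9:.3f} GHz")
```

For 1 GHz and 1 pF this gives roughly 25 nH. Away from $f_0$ the tank no longer returns charge in step with the clock, which is consistent with the simulated efficiency falling below zero at some frequencies.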
Improvement #1
  Problem #1: Less savings from uniform placements
  ROCKS clock synthesis with improved inductor size and placements

ROCKS Clock Synthesis
Simulation results: 45 nm IBM, frequency 1 GHz

                 07_rocks_test.py       03_grid_test_hspice.py
                 (with resonance)
  Buffers        18                     21
  Wires          185                    369
  Est_Power      13.644 mW              18.224 mW
  Act_Power      9.559 mW               17.490 mW
  Ambient Skew   11.000                 36.000

Power saving: 7.93 mW
% saving with LC resonance = 45%

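The reported saving can be reproduced from the two Act_Power values with Efficiency = (Pc - Pr)/Pc. The sketch assumes the grid result is the baseline Pc and the ROCKS result is Pr, which matches the 7.93 mW figure above; `power_saving` is an illustrative helper name.

```python
def power_saving(p_conventional, p_resonant):
    """Return (absolute saving, fractional saving) for two power figures."""
    saved = p_conventional - p_resonant
    return saved, saved / p_conventional

# Act_Power values in mW, taken from the results above.
grid_act_power_mw = 17.490    # 03_grid_test_hspice.py
rocks_act_power_mw = 9.559    # 07_rocks_test.py (with resonance)

saved_mw, efficiency = power_saving(grid_act_power_mw, rocks_act_power_mw)

print(f"Power saving: {saved_mw:.2f} mW")
print(f"Saving with LC resonance: {efficiency * 100:.0f}%")
```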
Improvement #2

A Wide Frequency Resonant Driver
[Figure, panel (a): Conventional driver with resonant option. Labels: VDD, pull-up S1, pull-down S2, VOUT, resonance ON/OFF switch S3, C_L, L, Vdd/2]
[Figure, panel (b): Wide frequency range resonant driver. Labels: VDD, refresh S1, store/recover S2 (close for T_ON), VOUT, i_L, L, C, v_c, Vdd/2]

Power Savings
Simulation results with parasitics
[Figure: power savings (0% to 40%) vs. frequency (0.01 to 10 GHz); curves: 45 nm 0.5 V, 45 nm 1 V, 32 nm 1 V, AMD CDN [3]]

References
  1. S. C. Chan, P. J. Restle, T. J. Bucelot, J. S. Liberty, S. Weitzel, J. M. Keaty, B. Flachs, R. Volant, P. Kapusta, and J. S. Zimmerman, A Resonant Global Clock Distribution for the Cell Broadband Engine Processor, IEEE Journal of Solid-State Circuits, Vol. 44, No. 1, pp. 64-72, January 2009.
  2. V. S. Sathe, S. Arekapudi, C. Ouyang, M. C. Papaefthymiou, A. Ishii, and S. Naffziger, Resonant Clock Design for a Power-Efficient High-Volume x86-64 Microprocessor, IEEE Solid-State Circuits Conference, 2012, Vol. 55, pp. 68-69.
  3. Xuchu Hu and Matthew R. Guthaus, Distributed LC Resonant Clock Grid Synthesis, IEEE Transactions on Circuits and Systems, 2012, in press.
  4. Advanced Configuration and Power Interface (ACPI) is an open industry specification co-developed by Hewlett-Packard, Intel, Microsoft, Phoenix, and Toshiba: http://www.acpi.info

Questions?