Advanced Domino Circuit Design

Part I: Gates & Sequencing
David Harris, Harvey Mudd College

Outline Domino Circuits Domino Sequencing Nonmonotonic Techniques Advanced Domino Circuit Design Slide 2

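Dynamic Logic
Static CMOS is slow because of its large pMOS input transistors. Dynamic gates instead use a clocked pMOS precharge transistor and operate in two steps: precharge (φ = 0, output Y pulled high) and evaluate (φ = 1, Y conditionally discharged through the nMOS network).
[Figure: static NOR3 (pMOS width 6, nMOS width 1) vs. dynamic NOR3 (input nMOS width 2, precharge transistor, foot transistor); waveforms of φ and Y through precharge, evaluate, precharge.]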

Feet
The foot transistor prevents contention between precharge and evaluation. It can be left off (an unfooted gate) if the inputs are guaranteed to be low during precharge.
[Figure: footed and unfooted dynamic gates, each with clock φ, pulldown network f, and output Y.]

Logical Effort
Dynamic gates are 1.5-2x faster than static gates because of their low input capacitance. Logical effort is g for static gates and g_d (the falling-transition effort) for dynamic gates. Transistor widths are shown in parentheses (P = pMOS, N = nMOS input transistor):
                  Inverter            NAND2               NOR2
Static            3/3 (P 2, N 1)      4/3 (P 2, N 2)      5/3 (P 4, N 1)
Unfooted dynamic  1/3 (N 1)           2/3 (N 2)           1/3 (N 1)
Footed dynamic    2/3 (N 2, foot 2)   3/3 (N 3, foot 3)   2/3 (N 2, foot 2)
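These logical efforts follow directly from the transistor widths. An input's logical effort is the capacitance it drives divided by the input capacitance of a unit static inverter (pMOS 2 + nMOS 1 = 3). The minimal sketch below recomputes the values from the widths on this slide. The helper name `logical_effort` and the `GATES` table are hypothetical. For dynamic gates, the result is g_d, the effort of the falling (evaluate) transition.

```python
from fractions import Fraction

# Input capacitance of the reference static inverter: pMOS width 2 + nMOS width 1.
C_INV = 3


def logical_effort(widths_on_one_input):
    """Capacitance driven by one input, normalized to the unit inverter."""
    return Fraction(sum(widths_on_one_input), C_INV)


# Widths of the transistors that a single input drives.
GATES = {
    "static INV": [2, 1],      # pMOS 2 + nMOS 1
    "static NAND2": [2, 2],    # pMOS 2 + nMOS 2
    "static NOR2": [4, 1],     # pMOS 4 + nMOS 1
    "unfooted INV": [1],       # dynamic inputs drive only their nMOS
    "unfooted NAND2": [2],
    "unfooted NOR2": [1],
    "footed INV": [2],
    "footed NAND2": [3],
    "footed NOR2": [2],
}

for name, widths in GATES.items():
    print(f"{name:15s} g = {logical_effort(widths)}")
```

Logical effort alone does not give the overall speedup, because parasitic delay and the choice of footed or unfooted gates also matter.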

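Monotonicity
Inputs to dynamic gates must be monotonically rising while the gate is in evaluation. An input A can go 0 -> 0, 0 -> 1, or 1 -> 1, but not 1 -> 0. If A violates monotonicity during evaluation (starts high, then falls), the output Y should rise but does not: once the dynamic node has discharged, nothing can restore it until the next precharge.
[Figure: waveforms of φ, A, and Y over precharge, evaluate, precharge.]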
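The rule can be shown with a small behavioral sketch. It has two parts: a checker for monotonically rising inputs, and a dynamic inverter model in which the precharged output can only fall during evaluation. Both function names are hypothetical. The 0/1 lists stand in for an input waveform sampled while φ = 1.

```python
def is_monotonic_rising(samples):
    """True if a 0/1 input sampled during evaluation never falls 1 -> 0."""
    return all(not (prev == 1 and cur == 0) for prev, cur in zip(samples, samples[1:]))


def dynamic_inverter(samples):
    """Output trace of a dynamic inverter during one evaluation phase.

    Y starts precharged high. Any high input discharges it, and nothing
    pulls it back up until the next precharge.
    """
    y = 1
    trace = []
    for a in samples:
        if a == 1:
            y = 0
        trace.append(y)
    return trace


for a in ([0, 0, 1, 1], [1, 1, 0, 0]):
    y = dynamic_inverter(a)
    print(a, "monotonic:", is_monotonic_rising(a),
          "final Y:", y[-1], "expected:", 1 - a[-1])
```

The second input violates monotonicity. Its final Y (0) disagrees with the inverse of its final value (1), which is exactly the failure shown in the waveform.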

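Cascading Dynamic Gates
But dynamic outputs are monotonically falling during evaluation, so dynamic gates with the same clock cannot be cascaded directly. With A = 1, X monotonically falls during evaluation. Y begins evaluating while X is still high, so Y discharges. When X later falls, Y should rise again but cannot.
[Figure: two cascaded dynamic inverters (A -> X -> Y) clocked by φ, with waveforms over precharge, evaluate, precharge.]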

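Domino Logic
Alternate dynamic gates with static inverters so that every dynamic gate sees monotonically rising inputs. Skew the static inverters HI to favor the critical rising outputs. Example: a domino AND is a dynamic NAND followed by a static inverter, giving X = A·B. A second domino stage then computes Z = X·C = A·B·C.
[Figure: domino AND (dynamic NAND + HI-skew static inverter) and a two-stage domino chain with nodes W, X, Y, Z; waveforms over precharge, evaluate, precharge.]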
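A toy unit-delay simulation of one evaluation phase shows why the static inverter fixes cascading. Every stage is logically an inverter: "dyn" is a dynamic inverter that can only discharge during evaluation, and "inv" is a static inverter. The model assumes one time step per gate and an input held constant during evaluation. It is a simplified illustration, not a circuit simulator, and the function names are hypothetical.

```python
def ideal(stages, a):
    """Static logic value: every stage in this model inverts its input."""
    return a if len(stages) % 2 == 0 else 1 - a


def simulate(stages, a, steps=10):
    """Final output of a chain after one evaluation phase (unit gate delay)."""
    # Precharge: dynamic nodes are high; static inverters settle to the
    # complement of whatever drives them.
    vals, prev = [], a
    for kind in stages:
        v = 1 if kind == "dyn" else 1 - prev
        vals.append(v)
        prev = v
    # Evaluate: every gate updates simultaneously from the previous step's values.
    for _ in range(steps):
        inputs = [a] + vals[:-1]
        vals = [
            (0 if inp == 1 else old) if kind == "dyn" else 1 - inp
            for kind, inp, old in zip(stages, inputs, vals)
        ]
    return vals[-1]


cascade = ["dyn", "dyn"]               # dynamic gate driving a dynamic gate
domino = ["dyn", "inv", "dyn", "inv"]  # two domino stages
for a in (0, 1):
    print(f"A={a}: cascade Y={simulate(cascade, a)} (want {ideal(cascade, a)}), "
          f"domino Z={simulate(domino, a)} (want {ideal(domino, a)})")
```

With A = 1 the direct cascade ends at Y = 0 instead of 1. Y discharged during the first step while X was still precharged high. Both domino results match the static logic value.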

Logic in the Static Stage
A domino gate is a dynamic gate followed by a static gate, and the static gate may perform logic as well.
[Figure: 8-input multiplexer built from two 4-input dynamic select stages (S0-S3 with D0-D3, S4-S7 with D4-D7) merged by a HI-skew static gate into output Y.]

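Dual-Rail Domino
Domino computes only noninverting functions (AND, OR). Dual-rail domino can compute all functions. Each signal is carried on a true rail (_h) and a complementary rail (_l), and each output rail is produced by its own domino gate: one computes f and the other its complement. Examples: Y_h = A·B with Y_l = ~(A·B); Y_h = A xor B with Y_l = A xnor B.
[Figure: generic dual-rail domino gate; dual-rail AND/NAND gate on A_h, A_l, B_h, B_l; dual-rail XOR/XNOR gate.]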
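Each output rail is a noninverting AND-OR of input rails, so it can be built as an ordinary domino gate. The minimal logic-level sketch below shows this for XOR/XNOR. It checks values only, not timing, and the function names are hypothetical. It also shows that when every input rail is low (as during precharge), both output rails are low.

```python
from itertools import product


def dual_rail(x):
    """Encode a valid bit as (true rail _h, complement rail _l)."""
    return (x, 1 - x)


def xor_dual_rail(a, b):
    """Dual-rail XOR/XNOR built only from AND and OR of input rails."""
    a_h, a_l = a
    b_h, b_l = b
    y_h = (a_h & b_l) | (a_l & b_h)  # A xor B
    y_l = (a_h & b_h) | (a_l & b_l)  # A xnor B
    return y_h, y_l


for a, b in product((0, 1), repeat=2):
    print(a, b, xor_dual_rail(dual_rail(a), dual_rail(b)))

# During precharge both rails of every input are low, and so are both outputs.
print(xor_dual_rail((0, 0), (0, 0)))  # (0, 0)
```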

Keepers
The dynamic node floats high during evaluation if it is not discharged, so leakage and noise can corrupt it. A keeper provides weak feedback to hold the state.
[Figure: keeper variants: a minimum-width keeper with long channel length L; a keeper driven by a separate weak feedback inverter; cross-coupled keepers between the dual-rail outputs Y_h and Y_l.]

Noise-Tolerant Precharge
Noise-tolerant precharge (NTP) uses very small pMOS input transistors. Compared to keepers, NTP is slower for the same noise margin, but it can (eventually) recover from upsets.
[Figure: NTP gate with inputs A and B; plot of delay (30-65 ps) vs. noise margin (20-30% of V_DD) for footed NTP, footless NTP, footed feedback keeper, and footless feedback keeper.]

Burn-In Keepers
Leakage is a problem during burn-in at high temperature. A strong keeper is needed for burn-in, but it should not slow the gate during normal operation. A separate burn-in keeper, enabled by BI, is therefore added alongside the normal-mode keeper.

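Charge Sharing
Dynamic gates are prone to charge-sharing noise. Suppose an input turns on during evaluation while the rest of the pulldown path is still off. The charge on the precharged output node Y then redistributes onto an uncharged internal node x, pulling Y down. A secondary precharge transistor that also precharges x solves the problem.
[Figure: dynamic gate with series inputs A and B and internal node x; waveform of Y showing charge-sharing noise; the same gate with a secondary precharge transistor on x.]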
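A first-order estimate of the droop follows from charge conservation. Suppose Y (capacitance C_Y) is precharged to V_DD and the internal node x (capacitance C_x) starts at 0 V. After they connect, V_Y = V_DD · C_Y / (C_Y + C_x). The sketch below evaluates that expression. It ignores the keeper, threshold drops, and any discharge path, and the capacitances are hypothetical. Starting x at V_DD models the secondary precharge transistor.

```python
def shared_voltage(vdd, c_y, c_x, v_x0=0.0):
    """Voltage on Y and x after charge sharing (charge conservation)."""
    return (vdd * c_y + v_x0 * c_x) / (c_y + c_x)


VDD = 1.0
C_Y, C_X = 20e-15, 10e-15  # hypothetical: 20 fF output node, 10 fF internal node

droop = VDD - shared_voltage(VDD, C_Y, C_X)
print(f"without secondary precharge: Y droops by {droop:.3f} V")
print(f"with x precharged to VDD:    Y droops by "
      f"{VDD - shared_voltage(VDD, C_Y, C_X, v_x0=VDD):.3f} V")
```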

Multiple-Output Domino Logic
One dynamic gate may drive multiple outputs. This is especially useful for adder carry chains, where C_i = G_i + P_i·C_(i-1): the internal nodes of a single C4 carry-chain pulldown network also provide the intermediate carries C1, C2, and C3.
[Figure: separate dynamic carry gates for C1-C4 vs. a single multiple-output domino chain with generate/propagate inputs G1-G4, P1-P4 and carry-in C0.]
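The carry recurrence shows why one chain suffices: each intermediate carry is a partial result on the way to C4. The sketch below evaluates the chain at the logic level and returns every intermediate node as an output. The function name and the operand values are hypothetical.

```python
def carries(g, p, c0):
    """All carries of one chain, C_i = G_i + P_i * C_(i-1); g[0], p[0] are bit 1."""
    out, c = [], c0
    for gi, pi in zip(g, p):
        c = gi | (pi & c)
        out.append(c)  # each intermediate node is itself a useful output
    return out


a, b, c0 = 0b1011, 0b0110, 0
g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # generate
p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # propagate
print(carries(g, p, c0))  # C1..C4: [0, 1, 1, 1]
```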

Sneak Paths
In multiple-output domino, sneak paths through the shared pulldown network can cause improper evaluation. They are prevented through mutual exclusion.
[Figure: multiple-output carry chain (C1-C4) with input values that create a sneak path.]

NORA & Zipper
NP domino, or NORA (NO-RAce), alternates dynamic nMOS and pMOS stages. This is usually a bad idea: a footed pMOS dynamic gate is worse than a static gate, and NORA is also very sensitive to noise. It was used on the AT&T CRISP microprocessor. Zipper domino is similar.
[Figure: alternating n-logic and p-logic dynamic blocks clocked by φ and its complement, with inputs stable while clk = 1; a dynamic pMOS NAND gate.]

Noise
Domino is sensitive to many noise sources. Each source is listed with its primary remedy:
Leakage: keepers
Charge sharing: secondary precharge
Capacitive coupling: spacing and shielding
Back-gate coupling: circuit design
Power supply noise: a good supply grid
Minority carrier injection: avoid injectors
Soft errors: adequate capacitance

Traditional Domino Sequencing
One half-cycle evaluates while the other precharges, and latches hold the results of the half-cycle that is precharging. Each half-cycle therefore pays a latch delay t_pdq.
[Figure: one cycle T_c split into two half-cycles of clocked domino gates (dynamic gates with static inverters), each half-cycle ending in a latch.]

Clock Skew
A path starts on the latest skewed rising edge of the clock and must complete before the earliest skewed falling edge. Each half-cycle therefore loses both the latch setup time t_setup and the clock skew t_skew.
[Figure: two half-cycles of domino logic ending in latches, with t_setup and t_skew marked.]

Sequencing Overhead
Latch and clock skew overhead appear in each half-cycle: $T_{logic} = T_c - (2t_{setup} + 2t_{skew})$. Traditional domino is also unable to borrow time between half-cycles to balance paths. This overhead is too large for traditional domino sequencing to be practical. Most companies have therefore developed some skew-tolerant alternative that eliminates the latches and their overhead.
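Plugging numbers into the overhead equation shows how much of the cycle is lost. The cycle time, setup time, and skew below are hypothetical values chosen only to illustrate the arithmetic.

```python
def traditional_domino_logic_time(t_c, t_setup, t_skew):
    """Per-cycle logic time with latches at both half-cycle boundaries."""
    return t_c - (2 * t_setup + 2 * t_skew)


T_C = 1000.0                  # hypothetical 1 GHz cycle, in ps
T_SETUP, T_SKEW = 50.0, 50.0  # hypothetical latch setup and clock skew, in ps

t_logic = traditional_domino_logic_time(T_C, T_SETUP, T_SKEW)
print(f"logic time {t_logic:.0f} ps, overhead {1 - t_logic / T_C:.0%}")
```

Under these assumptions a fifth of every cycle goes to sequencing rather than logic. Skew-tolerant domino aims to recover that time.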

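Eliminating Latches
If the clocks overlap, latches are not required to hold the data while a half-cycle precharges, so no latch is needed at the phase boundary. With nonoverlapping clocks φ1 and φ2, the data is lost because the φ1 logic precharges before the φ2 logic has evaluated. With overlapping clocks, the φ2 logic captures the data first.
[Figure: φ1 and φ2 domino stages separated by static gates, with waveforms of nodes a-d for nonoverlapping and overlapping clocks.]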

Skew-Tolerant Domino Sequencing
Skew-tolerant domino techniques use multiple overlapping clock phases and eliminate latches to achieve zero sequencing overhead. There are many ways to do this: OTB, N-phase, Delayed Reset, Self-Resetting, Postcharge, SRCMOS, and Global STP. A full keeper holds the state when the input precharges.
[Figure: dynamic gate with weak full keeper transistors on node X and a HI-skew output inverter.]

Time Borrowing
Skew-tolerant domino permits time borrowing: $t_{logic} = T_c$ and $t_{borrow} = t_{overlap} - t_{skew}$.
[Figure: two-phase skew-tolerant domino pipeline (φ1, φ2) of dynamic gates and static inverters; timing diagram marking t_overlap, t_skew, and t_borrow.]

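Opportunistic Time Borrowing Domino
Used on Itanium: two phases but no latches. Each half-cycle has an undelayed and a delayed clock (clk and clkd; clkb and clkbd). The first gate of each phase uses the undelayed clock and the later gates use the delayed clock.
[Figure: OTB pipeline with phase 1 gates clocked by clk, clkd, clkd, clkd and phase 2 gates clocked by clkb, clkbd, clkbd, clkbd.]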

N-Phase Domino
In general, N overlapping phases provide more overlap.
[Figure: four-phase domino pipeline with two gates per phase (φ1-φ4) separated by static inverters.]

Four-Phase Domino
Itanium 2 uses four-phase skew-tolerant domino. Clock generation at the clock gaters is simple: each phase is delayed by ¼ cycle. Optional clock choppers can increase the overlap.
[Figure: clock gater with enable en producing φ1-φ4 from clk.]
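The overlap that makes time borrowing possible can be estimated for N phases. Assume each phase is high for half the cycle and successive phases are delayed by T_c/N; the four-phase case delays each phase by ¼ cycle, as on this slide. Consecutive phases then overlap by T_c/2 - T_c/N, and the borrowing window is t_overlap - t_skew. The 50% duty cycle, ideal edges, and numbers are assumptions for illustration.

```python
def phase_rise_times(t_c, n):
    """Rising edge of each of n phases, spaced t_c / n apart."""
    return [k * t_c / n for k in range(n)]


def overlap(t_c, n, duty=0.5):
    """Time during which consecutive phases are both high."""
    return max(0.0, duty * t_c - t_c / n)


def borrow(t_c, n, t_skew, duty=0.5):
    """Time available for borrowing across a phase boundary."""
    return max(0.0, overlap(t_c, n, duty) - t_skew)


T_C, T_SKEW = 1000.0, 50.0  # hypothetical cycle time and skew, in ps
for n in (2, 3, 4, 6):
    print(f"N={n}: overlap {overlap(T_C, n):.0f} ps, borrow {borrow(T_C, n, T_SKEW):.0f} ps")
```

Under this model, two complementary 50%-duty phases have no overlap at all. Adding phases, or raising the `duty` parameter, widens the overlap and the borrowing window.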

Delayed Reset
Alternatively, use one phase per gate in the cycle. This approach is called Delayed Reset, Cascaded Reset, or Delayed Clocking (IBM and Sun). It can use both clock edges.
[Figure: chain of dynamic gates and static inverters, each gate clocked by its own phase φ1-φ6 derived from clk.]

More Delayed Reset
Or derive all the phases from just one clk. These N-phase techniques are well suited to unfooted gates.
[Figure: phases φ1-φ6 generated by a delay chain from a single clock, driving a chain of dynamic gates.]

Self-Resetting Domino
Instead of supplying a clock, use a self-resetting gate. The gate precharges itself five gate delays after evaluating. No power is consumed when the gate is idle, but timing analysis is complicated. Ideal for RAM decoders.
[Figure: self-resetting gate with input A, pulldown network f, HI-skew output Y, and a reset timing chain; optional reset of the skewed gate.]

Predicated Self-Reset
Self-resetting domino requires pulsed inputs. Predicated self-resetting domino stretches the output pulse until the input pulse has ended.
[Figure: self-resetting gate with inputs A and B, pulldown network f, output Y, and reset logic conditioned on the inputs.]

Postcharge Logic
Postcharge logic uses unfooted self-resetting n and p dynamic gates. It is used to rapidly amplify the leading edge of the chip select (CS) in RAM chips.
[Figure: pulse generator driven by CS producing phases φ1-φ4 and complements φ1b-φ3b, with annotated transistor widths, driving large clocked nodes.]

SRCMOS
IBM's variant of self-resetting gates, with static evaluation. The se signal turns the gate into a pseudo-nMOS gate during low-speed test.
[Figure: SRCMOS gate with reset, se, dynamic node X, input A, pulldown network f, and HI-skew output Y.]

SRCMOS Example
Amortize the cost of the self-resetting pulse generator across many gates. Use a timing chain to produce the delayed clocks.
[Figure: pulse generator and reset timing chain producing φ1-φ4 for a four-stage SRCMOS datapath (A -> Y1 -> Y2 -> Y3 -> Y4), with power-on reset, se, and a done signal derived from the dual-rail outputs Y1_h and Y1_l.]

Global Self-Terminating Precharge
Intel's variant of self-resetting gates, from the Pentium 4. It derives the initial pulse from a frequency doubler.
[Figure: frequency doubler converting the 2 GHz clk into 4 GHz pulses that drive a reset timing chain (φ1-φ6) for four datapath stages Y1-Y4.]

Summary
Skew-tolerant domino uses overlapping phases to eliminate latches and sequencing overhead. More overlap permits more time borrowing. Ways to generate the clocks:
Global fixed number of phases: the simplest option, easy to analyze.
Inverter chain with one gate per stage: more convenient for unfooted gates.
Self-resetting pulse generator: saves clock power, but a very complicated design.

Nonmonotonic Techniques
Dynamic gates require monotonically rising inputs during evaluation, so dynamic gates with the same clock cannot be cascaded. Domino provides such inputs but only computes noninverting functions. Dual-rail domino computes any function but is costly. Nonmonotonic dynamic techniques cascade dynamic gates with delayed clocks so that the first gate settles before the second evaluates.

NOR-NOR Functions
Dynamic gates make very fast NORs. A NOR-NOR cascade is therefore attractive, since it is equivalent to AND-OR given inverted inputs and an inverted output. It requires nonmonotonic techniques.
[Figure: inputs -> AND plane -> OR plane -> outputs.]
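The equivalence can be checked in a few lines. Feeding complemented literals into the first NOR plane makes each NOR a product term (De Morgan). Inverting the output of the second NOR plane turns it into an OR. The minimal sketch below assumes a hypothetical two-term function f = a·b + ~a·c. It checks logic only, not the nonmonotonic timing that a dynamic NOR-NOR implementation needs.

```python
from itertools import product


def nor(*xs):
    return 0 if any(xs) else 1


def pla_nor_nor(x, terms):
    """Evaluate a sum of products using only NOR planes.

    terms: list of product terms, each a list of (input index, polarity),
    where polarity 1 means the true literal and 0 the complement.
    """
    products = []
    for term in terms:
        literals = [x[i] if pol else 1 - x[i] for i, pol in term]
        # NOR of the complemented literals == AND of the literals.
        products.append(nor(*(1 - lit for lit in literals)))
    # NOR plane followed by an output inverter == OR of the products.
    return 1 - nor(*products)


TERMS = [[(0, 1), (1, 1)], [(0, 0), (2, 1)]]  # f = a*b + ~a*c
for a, b, c in product((0, 1), repeat=3):
    assert pla_nor_nor((a, b, c), TERMS) == (a & b) | ((1 - a) & c)
print("NOR-NOR matches AND-OR on all 8 input combinations")
```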

Clock-Delayed Domino
Delay the clock to the second dynamic gate. Add about 30% margin for process and environmental variation; less margin is needed with replica delay lines. With A = 1, X monotonically falls during evaluation. Y correctly remains high if X falls before φ2 rises. There is also a possible precharge race at the end of the phase, because φ2 is a delayed copy of φ1.
[Figure: dynamic gate clocked by φ1 (A -> X) driving a dynamic gate clocked by delayed φ2 (X -> Y); waveforms over precharge, evaluate, precharge.]
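The timing rule on this slide reduces to two checks. First, set the φ2 delay to the first gate's worst-case evaluation time plus margin. Second, confirm that X, when it falls, has fallen by the time φ2 rises. The minimal sketch below follows those assumptions. The 40 ps evaluation time and the sample fall times are hypothetical, and the model ignores the precharge race.

```python
def phi2_delay(t_eval_worst, margin=0.30):
    """Delay of phi2 after phi1: worst-case first-gate evaluation plus margin."""
    return t_eval_worst * (1 + margin)


def y_correct(x_falls, t_x_fall, t_phi2):
    """Y is a dynamic inverter on X clocked by phi2.

    If X stays high, Y discharges and is correct regardless of timing.
    If X falls, Y stays correctly high only if X is already low when phi2 rises.
    """
    if not x_falls:
        return True
    return t_x_fall <= t_phi2


t_phi2 = phi2_delay(40.0)  # hypothetical 40 ps worst-case evaluation of gate 1
print(f"phi2 delay: {t_phi2:.1f} ps")
for t_x in (35.0, 48.0, 60.0):  # sample X fall times under variation
    print(f"X falls at {t_x:.0f} ps -> Y correct: {y_correct(True, t_x, t_phi2)}")
```

A fall time of 48 ps, slower than nominal but inside the margin, still evaluates correctly. At 60 ps, Y starts evaluating while X is high and discharges erroneously.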

Delay Elements
Ways to generate the delayed clock φ2 from φ1:
Buffers
Switched capacitance (load C_big switched in under control of V_ref)
Current-starved inverter (bias voltages V1-V3)
Digitally controlled current-starved inverter (binary-weighted legs)
Dummy gate: a matched delay that replicates the datapath dynamic gate (inputs S0, S1, D0, D1) and clocks the next dynamic gate

Race-Based Logic
Another common problem is a fast AND function (e.g., a memory decoder). A NAND uses series transistors, while a NOR uses parallel transistors, so it is preferable to recast the AND as a NOR of inverted inputs. But a monotonically rising output is still needed to drive subsequent domino stages. Several tricky circuits depend on races to achieve this: annihilation gates (Itanium 2), latched domino, and the Complementary Signal Generator (Intel).

Annihilation Gates
X = ~(A + B + C + D); Y = ABCD. W begins to pull low, but recovers if X falls rapidly.
[Figure: dynamic NOR logic function (inputs A-D, node X) feeding a domino buffer (node W, output Y), both with weak keepers; waveforms of φ, A, X, W, Y.]

Latched Domino
Latched domino uses a different keeper structure.
[Figure: dynamic logic function (inputs A-D, node X) followed by a second dynamic stage (node W) with weak keepers, producing dual-rail outputs Y_h and Y_l.]

Complementary Signal Generator
The CSG uses a cross-coupled keeper.
[Figure: dynamic logic function (inputs A-D, node X) and complementary node W with weak cross-coupled keepers; waveforms of φ, A, X, W.]

Output Prediction Logic
Race-based logic suggests that inputs need not be monotonically rising as long as the keeper can recover. Output Prediction Logic (OPL) directly cascades dynamic gates, as in clock-delayed domino, but the clock delays are short enough that the stages will glitch. NTP gates are used because they recover well from glitches.
If the delays are too short, the glitches will flip the gate and recovery is very slow.
If the delays are too long, the circuit behaves as clock-delayed domino.
If the delays are just right, the circuit can be very fast.

OPL Waveforms
Path delay depends on the delay-line length and the pMOS widths in the NTP gates.
[Figure: six-gate OPL chain (A-F) clocked by φ1-φ6; node voltages (0-1.8 V) vs. time (0-400 ps) for three clock-delay settings; path delay (0-250 ps) vs. clock delay (0-30 ps) for P = 1, 3, and 5.]

OPL Summary
OPL seems attractive because it is extremely fast. But the best delays are very short: how can they actually be generated, and how much margin must be provided for process and environmental variations? OPL's advantages have yet to be convincingly demonstrated in silicon.

Conclusion
Domino is attractive for a 1.5-2x speedup, but careful design is required for noise and monotonicity. Traditional domino sequencing has much overhead. Skew-tolerant domino eliminates this overhead and comes in many flavors, with clocked and self-timed precharge. Nonmonotonic structures exploit very fast dynamic NORs.

Part II: A Domino Methodology and Some Common Pitfalls
Tom Grutkowski, Intel

Goals for the Remainder of the Tutorial
Discuss in detail a complete domino methodology in use on a production microprocessor.
Outline some of the common pitfalls found in domino design.
Illustrate some actual silicon bugs.
Inspire a little fear.

Itanium 2 Background
Co-developed by Intel and HP; implements the EPIC ISA.
Code names: McKinley (180 nm product) and Madison (130 nm product).
Runs at up to 1.5 GHz, with a 130 W power limit for both the 180 nm and 130 nm products.
Die area: McKinley 421 mm², Madison 374 mm².

Itanium 2 Domino Circuitry
Integer execution unit: 6 pipes, single-cycle bypass.
Multimedia pipes: 6 pipes, two-cycle latency.
2 floating-point FMAC/FMISC units.
Much of the pipe control.
Out-of-order control issue logic for the 2nd-level cache.
Register files: the integer and floating-point register files, plus 60+ miscellaneous register files.

Itanium 2 Methodology
How can we improve on OTB? OTB removes latches and allows time borrowing across clock phases. What would we like?
Small, flexible, and robust circuits.
Scan capability on dynamic latches.
A standard interface from dynamic to static.
A standard interface from static to dynamic.
A limited number of clocks.

Why Scan-Capable Domino?
What is scan? It is the ability to observe and control state elements through a serial chain controlled by the Test Access Port (TAP). Scan enables small portions of the design to be tested and debugged in isolation. There are two varieties:
Destructive: data in the state elements is destroyed during the scan operation.
Non-destructive: data in the state elements is preserved.
Definitions: full scan means all state elements are scanned. A ROSL is a Read-Only Scan Latch, which has no controls.
Scan-based testing is used at wafer sort to isolate manufacturing defects before packaging. Itanium 2 examples: the single-cycle integer execution unit is 100% domino, and the four-cycle floating-point unit is 100% domino. Conclusion: without a scan-capable domino methodology, scan vector coverage can be severely limited.
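The destructive/non-destructive distinction follows from the serial chain being a shift register. The toy model below shifts data in (control) and out (observe) one bit at a time. A plain shift-in overwrites the captured state, while recirculating scan-out back into scan-in preserves it. The class and method names are hypothetical, and the model does not represent the Itanium 2 scan circuits or a TAP controller.

```python
class ScanChain:
    """Toy serial scan chain: state elements connected as a shift register."""

    def __init__(self, n):
        self.state = [0] * n

    def shift(self, sin):
        """One scan clock: shift sin in at the head, return sout from the tail."""
        sout = self.state[-1]
        self.state = [sin] + self.state[:-1]
        return sout

    def load(self, bits):
        """Control: shift bits in (destructive). Returns the bits shifted out.
        The first bit shifted in ends up in the last element."""
        return [self.shift(b) for b in bits]

    def read_nondestructive(self):
        """Observe: shift everything out while feeding each bit back in."""
        out = []
        for _ in range(len(self.state)):
            out.append(self.shift(self.state[-1]))
        return out


chain = ScanChain(4)
chain.load([1, 0, 1, 1])
print(chain.read_nondestructive(), chain.state)  # state survives the read
print(chain.load([0, 0, 0, 0]), chain.state)     # old state shifted out and lost
```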

Scan-Capable Domino Latch
First step: change the half keeper to a full keeper. The full holder allows noh to hold a low value arbitrarily long after the inputs have precharged away.
[Figure: clocked dynamic gate (inputs, nMOS pulldown network, node noh) with a half keeper, and the same gate with a full keeper.]

Where Do We Scan?
We would like to observe and control the noh node of the phase 1 logic, or the noh node of the phase 2 logic.
[Figure: phase 1 and phase 2 dynamic gates (clocks CK/NCK), each with an nMOS pulldown network and node noh.]

Scanning Data In
Observing the state element is relatively easy: attach a ROSL to the node. Forcing data onto noh is the problem, because it is impossible to scan data into noh without a significant drive fight. When CK is low, noh is being pulled high by the precharge FET. When CK is high, the scan driver would fight against the evaluation stack.
[Figure: CK-clocked dynamic gate with nMOS pulldown network and node noh.]

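Latch Converter
[Figure: domino latch converter (DLC): a dynamic gate clocked by RCK and ECK (inputs, nMOS pulldown network, node noh) with shift-controlled scan input sin from the previous DLC, scan output sout to the next DLC, and static output Z; waveforms of CK, ECK, and RCK.]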

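Bolting On a DLC
A domino NAND gate becomes a DLC by adding the latch converter: a complete full keeper on noh, a scan path (sin to sout) controlled by shift and shiftb, and extra drive to raise noh high within a pulse width.
[Figure: domino NAND gate (inputs a, b, clocks RCK and ECK, node noh) with the latch converter attached, producing output z.]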

Generating RCK and ECK
Local generation:
No significant RC concerns.
Area expensive: a generator is needed for each latch.
Used in domino control.
Global generation:
Used in datapath applications.
RC must be controlled, especially on RCK; tolerable skew is ~2-3% of the cycle time.
Area efficient: one generator for each register.
[Figure: CK feeding clock gaters that produce ck, eck, and rck for the domino gates and DLCed gates.]
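To make the ~2-3% figure concrete, take the 1.5 GHz Itanium 2 frequency quoted earlier. The cycle is about 667 ps, so RCK skew must stay within roughly 13-20 ps. The arithmetic as a small sketch (the function name is hypothetical):

```python
def skew_budget_ps(freq_ghz, fraction):
    """Tolerable skew as a fraction of the cycle time, in ps."""
    return fraction * 1000.0 / freq_ghz


for frac in (0.02, 0.03):
    print(f"{frac:.0%} of a 1.5 GHz cycle: {skew_budget_ps(1.5, frac):.1f} ps")
```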

Concerned About the Pulse?
The pulse must be long enough to properly precharge the noh node. It must also have limited overlap with ECK, to avoid excessive short-circuit current and delayed evaluation. Itanium 2 already uses pulse latches for static flip-flops, so a methodology for distributing pulse clocks is established. RCK adds only incremental risk.

Summary of DLC Benefits
Flexibility: any standard dynamic gate can be converted to a DLC.
Small overhead: a DLC consists of as few as 12 transistors.
Static output: a signal generated by a first-phase DLC remains stable throughout the second phase. This removes the need for extra latches or catchers.
Scan capable: enables nearly full scan across the design (sequential depth ~1), a tremendous benefit in silicon debug.

Static-to-Dynamic Interface
Problem: how do we take a static signal and introduce it to a dynamic circuit? A domino input must either be stable at the rising CK edge or be monotonically rising.
[Figure: D latch and static logic driving the input of a CK-clocked dynamic gate (node noh) across cycles 1 and 2.]

Some Options
CK flip-flop: just doesn't work! Its output is neither stable nor monotonic.
NCK flip-flop: the domino input will be stable on rising CK, but the preceding static stage is limited to a single phase, which wastes a phase of logic.
NCK transparent latch: provides a stable input to the CK dynamic gate, at a cost in area and insertion delay.

Enter the Entry Latch
Replace the single clocked FET with simple logic functions, sized appropriately to create a virtual pulse. The entry latch is small and simple, and its ability to do simple logic helps negate any insertion delay.
[Figure: entry latch with clock ck, inputs a and b, node noh, and output z.]

Domino Pitfalls
Domino is sensitive to many noise sources; we look at a few real-world examples. Each source is listed with its primary remedy:
Leakage: keepers
Charge sharing: secondary precharge
Capacitive coupling: spacing and shielding
Back-gate coupling: circuit design
Power supply noise: a good supply grid
Minority carrier injection: avoid injectors
Soft errors: adequate capacitance

Noise Margin Sensitivities
Vt sensitivity: the property that makes domino fast also makes it more dangerous. A static CMOS gate has a trip point higher than a single Vt, while a dynamic gate begins to discharge once noise on its input exceeds the Vt of the pulldown transistor.
Flip once, and lose! Once a dynamic circuit has switched, there is no recovery mechanism. In a static circuit, noise glitches only cause failure when they are captured by latches. A static noise event normally results in frequency degradation, not a dead chip.

Capacitive Coupling
[Figure: dynamic victim node noh between aggressor signals 1 and 2; the coupled noise feeds through to the next circuit z, compared against Vt for a dynamic receiver and against the switching threshold for a static receiver.]
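A first-order estimate of the noise on a weakly held dynamic victim treats coupling as a capacitive divider. An aggressor swing ΔV_aggr couples ΔV_aggr · C_c / (C_c + C_g) onto the victim, where C_c is the coupling capacitance and C_g the victim's capacitance to ground. If that noise reaches a dynamic receiver, the relevant limit is roughly one Vt. The sketch below assumes the victim is completely unheld and that the aggressors switch together. All values are hypothetical.

```python
def coupled_noise(dv_aggr, c_couple, c_ground):
    """Capacitive-divider noise on a floating victim node."""
    return dv_aggr * c_couple / (c_couple + c_ground)


DV_AGGR = 1.2   # hypothetical aggressor swing, V
C_G = 30e-15    # hypothetical victim capacitance to ground, F
C_C = 10e-15    # hypothetical coupling capacitance per aggressor, F
VT = 0.35       # hypothetical nMOS threshold of the receiving dynamic gate, V

for n_aggr in (1, 2):
    v = coupled_noise(DV_AGGR, n_aggr * C_C, C_G)
    print(f"{n_aggr} aggressor(s): {v:.2f} V noise, exceeds Vt: {v > VT}")
```

Under these assumptions one aggressor stays below Vt, but two aggressors switching together push the victim past it. In a dynamic gate, that means a permanent wrong value for the cycle.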

Aside: The Shmoo
Before looking at real-world failures, we need to be familiar with a very important tool: the shmoo, a graphical representation of the performance characteristics of an IC. The shmoo is named after the creatures in the Li'l Abner comic strip. Believe it or not, shmoo is now a registered trademark!

Standard Speed-Path Shmoo
[Figure: standard speed-path shmoo; axes: increasing voltage, decreasing frequency.]

Shmoo Usage
A shmoo normally varies frequency and/or voltage to see how the chip responds at different operating points. Green (passing) is good; red is failing. The shape of the shmoo should be the first clue to the nature of a silicon failure. Other operating parameters can also be varied, for example frequency vs. temperature, or I/O voltage vs. core voltage.
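The sketch below renders a text shmoo from a pass/fail function over a voltage-frequency grid. It shows how the shape encodes the failure mechanism. Both pass/fail models are hypothetical. In the first, a speed path whose maximum frequency rises with voltage fails at low voltage and high frequency. In the second, an added noise failure knocks out the highest-voltage column, similar in shape to the high-voltage failures described next.

```python
def shmoo(voltages_mv, freqs_mhz, passes):
    """Text shmoo: rows are frequencies (highest first), columns are voltages
    (lowest first). '*' marks a passing point, '.' a failing point."""
    rows = []
    for f in sorted(freqs_mhz, reverse=True):
        cells = "".join("*" if passes(v, f) else "." for v in voltages_mv)
        rows.append(f"{f:5d} MHz | {cells}")
    return "\n".join(rows)


def speed_path(v_mv, f_mhz):
    """Hypothetical speed path: fmax (MHz) = 1.5 * V (mV) - 300."""
    return 2 * f_mhz <= 3 * v_mv - 600


def high_voltage_noise(v_mv, f_mhz):
    """Hypothetical coupling failure that appears only at high voltage."""
    return speed_path(v_mv, f_mhz) and v_mv < 1250


V = [900, 1000, 1100, 1200, 1300]
F = [1000, 1200, 1400, 1600]
print(shmoo(V, F, speed_path))
print()
print(shmoo(V, F, high_voltage_noise))
```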

Register File Failure
Failure in the general-purpose register file:
Seen only at high voltage (see shmoo).
First seen in system test.
Appears coupling related.
The test case was transferred to a stored-response tester and scan data was collected. The scan data clearly indicated a coupling issue: bits were failing in only a single direction. Reads are good; writes are failing!

RF Write Failure Shmoo
[Figure: register file write failure shmoo; axes: increasing voltage, decreasing frequency.]

Register File Design
Register file design challenges:
128 entries x 65 bits.
12 read ports, 8 write ports.
Write ports may be written by either the IEU or the MMU.
Must be capable of performing write-back stage bypassing.
Needs to be area efficient.
[Figure: 128-entry register file with IEU and MMU write paths selected by is_ieu and is_mmu.]

Register Bit Line Writes
[Figure: register file write path: 128 entries, each with D latches clocked by nck and pck and ×10 word-line (wl) ports; a large IEU write driver and an MMU write driver share the bitline; CK and PCK waveforms showing a sharp edge.]

Backside Probe Waveforms
[Figure: backside probe waveforms of the bitline and word line, showing the actual drive on the bitline and the coupled drive onto the bitline.]

A Closer Look
[Figure: culprit drivers (clocked by ck, nck, and pck) routed alongside the victim bitline for 1200 µm of minimum width/space Metal 2.]
One FET prevents the victim from being locally held during the noise event. Fix: simply remove this FET, and hold the bitline all the time.

Lessons Learned
Circuit design:
Open-drain buses are subject to failure. They usually feed skewed receivers, and there are no substantial drivers on the victim line during noise events (it is only weakly held).
When designing risky circuits, the design team must stay current with all process file changes.
Engineering trade-offs:
Odds are the circuit you're working on will NOT limit the speed of the chip.
Robustness is much more important than the speed of any particular circuit! How much frequency is that little tweak buying you? Is it worth it?
Always ask the question: can I make this circuit more robust?

2nd-Level Cache Coupling
A failure was seen on almost all patterns, with a shmoo characterized as a "half flying saucer." It was not seen on first silicon, only on a new stepping that was tweaked for speed. The shmoo shape indicates a high-voltage failure, so again a noise issue is suspected. Its reverse speed-path nature indicates a frequency dependency for the coupling event. A straightforward debug process brought the debug team to the read-out of the 2nd-level cache.
[Figure: failure shmoo.]

2nd-Level Cache Design
256 KB + ECC, 8-way cache.
Pseudo 4-ported: if each port is accessing a separate bank, then all 4 ports can be serviced. Control logic prevents bank conflicts.
16 banks.
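The pseudo-4-ported scheme works only while the ports hit different banks. The sketch below illustrates the kind of check the control logic must make. The slide does not give the bank mapping or the arbitration policy. The address-to-bank mapping (64-byte interleave across 16 banks) and the lowest-port-first priority are hypothetical.

```python
NUM_BANKS = 16
LINE_BYTES = 64  # hypothetical interleave granularity


def bank_of(addr):
    """Hypothetical mapping of a byte address to one of 16 banks."""
    return (addr // LINE_BYTES) % NUM_BANKS


def grant(port_addrs):
    """Grant port requests lowest port first; defer any request whose bank
    is already taken this cycle, so no two granted ports share a bank."""
    busy = set()
    granted, deferred = [], []
    for port, addr in enumerate(port_addrs):
        bank = bank_of(addr)
        if bank in busy:
            deferred.append(port)
        else:
            busy.add(bank)
            granted.append(port)
    return granted, deferred


print(grant([0, 64, 128, 192]))   # four different banks: all ports proceed
print(grant([0, 1024, 64, 2048])) # ports 1 and 3 collide with port 0's bank
```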

Cache Read Circuitry
[Figure: bank read, way select, and bank selection (16:1 mux, ×8) producing read data that is driven downstream.]
Scan data clearly indicates that the read data signal is corrupted: 0s are getting transformed into 1s.

A Closer Look (Slide 84)
Figure: way muxes with precharge/pull-down drivers, open-drain data inputs, clock rck, and the Read Data output.
Data lines: each victim data line has 2 lateral attackers, but they have slow edges.
Select lines: each victim has only one lateral attacker, because the select lines are mutex. However, there is significant orthogonal coupling, and this must be accounted for.
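Why orthogonal coupling matters even when only one lateral neighbor switches can be seen from the usual capacitive-divider estimate for a floating victim. This is a minimal Python sketch. All capacitances and the supply voltage are hypothetical, and the model ignores any restoring current from drivers or keepers, so it is a worst case that overstates noise from slow attacker edges.

```python
from typing import List, Tuple


def coupled_noise(v_swing: float, c_ground_ff: float,
                  couplings: List[Tuple[float, bool]]) -> float:
    """Worst-case noise (V) on a floating victim from the capacitive divider
    dV = V_swing * C_switching / C_total.

    couplings: (coupling capacitance in fF, True if that neighbor switches).
    Every neighbor, switching or not, adds load to the victim. Restoring current
    from any driver or keeper is ignored, so slow attacker edges make this pessimistic.
    """
    c_total = c_ground_ff + sum(c for c, _ in couplings)
    c_switching = sum(c for c, switches in couplings if switches)
    return v_swing * c_switching / c_total


VDD = 1.2  # hypothetical supply
# Hypothetical select line: one lateral attacker (the other lateral neighbor is
# quiet because the select lines are mutex) plus 8 fF of orthogonal crossings.
lateral_only = coupled_noise(VDD, 20.0, [(10.0, True), (10.0, False), (8.0, False)])
with_orthogonal = coupled_noise(VDD, 20.0, [(10.0, True), (10.0, False), (8.0, True)])
print(f"lateral only: {lateral_only:.3f} V, with orthogonal: {with_orthogonal:.3f} V")
```

With these made-up values, the switching orthogonal crossings raise the estimate from 0.25 V to 0.45 V, which is the kind of gap an analysis limited to lateral neighbors would miss.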

LVP Waveforms (Slide 85)
Figure: laser voltage probe (LVP) waveforms showing the coupling event.

Fix and Lessons (Slide 86)
Fix: the select-line drivers were changed to static drivers (figure: decoders driving the bank-select muxes, clock nck). The solution gives up a little speed for a robust design.
Lessons: Orthogonal metal coupling was not properly accounted for. Three-dimensional geometries need to be considered. Use reasonably pessimistic initial assumptions for all noise analysis.

Power Grid and Coupling (Slide 87)
Supply and ground on a chip are not equipotential. Any circuit is only as good as its underlying power grid, and this is especially true for domino circuitry. Figure: a domino gate (input in, clock ck, output noh), annotated with distances of 100s or even 1000s of um and supply differences of 10s-100s of millivolts. The power grid most often cannot be verified by inspection; simulations are required.
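A first-order feel for how a few hundred microns of rail turns into tens of millivolts comes from a resistive rail feeding evenly spaced loads. This is a minimal Python sketch with hypothetical sheet resistance, geometry, and tap currents. A real grid is a mesh fed from many points, which is why the slide calls for simulation rather than hand estimates.

```python
def rail_ir_drop(r_sheet_ohm_sq: float, length_um: float, width_um: float,
                 n_taps: int, i_tap_a: float) -> list:
    """IR drop (V) at each of n_taps evenly spaced, equal current sinks along a
    rail fed from one end. Segment k carries the current of every tap at or
    beyond it, so the drop accumulates quadratically toward the far end."""
    r_total = r_sheet_ohm_sq * length_um / width_um  # ohms = R_sq * number of squares
    r_seg = r_total / n_taps
    drops = []
    v = 0.0
    for k in range(n_taps):
        i_segment = (n_taps - k) * i_tap_a  # current flowing through segment k
        v += r_seg * i_segment
        drops.append(v)
    return drops


# Hypothetical: 0.05 ohm/sq, 1000 um x 10 um rail, ten taps drawing 2 mA each.
drops = rail_ir_drop(0.05, 1000.0, 10.0, 10, 2e-3)
print(f"far-end drop: {drops[-1] * 1e3:.1f} mV")
```

With these assumed values the far-end drop is 55 mV, squarely in the 10s-100s of millivolts range noted on the slide.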

Domino and Large Drivers (Slide 88)
Figure: a domino gate (clock CK, NMOS pull-down network, output noh) feeding a large driver that drives a large interconnect load.
Large drivers dump transient current into the power grid. This shifts the apparent trip point of local domino circuitry, making it more sensitive to noise events. Avoid this practice. If forced to use it: smear out the evaluation by having adjacent domino drivers evaluate on skewed clocks, and add bypass capacitance.
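The benefit of smearing out evaluation can be illustrated by summing the supply-current pulses of several drivers with and without clock skew. This is a minimal Python sketch that assumes each driver draws an identical triangular current pulse (a hypothetical 20 ps wide, 1 mA peak) and that adjacent drivers are offset by a fixed skew.

```python
def triangle(t: float, start: float, width: float, peak: float) -> float:
    """Triangular current pulse beginning at `start` and peaking at start + width/2."""
    x = t - start
    if x < 0 or x > width:
        return 0.0
    half = width / 2
    return peak * (x / half if x <= half else (width - x) / half)


def peak_grid_current(n_drivers: int, skew_ps: int, width_ps: int = 20,
                      peak_ma: float = 1.0) -> float:
    """Peak total current (mA) when driver k evaluates k * skew_ps after driver 0.
    Time is sampled on a 1 ps grid, which hits the pulse peaks for these integer inputs."""
    t_end = (n_drivers - 1) * skew_ps + width_ps
    return max(
        sum(triangle(t, k * skew_ps, width_ps, peak_ma) for k in range(n_drivers))
        for t in range(t_end + 1)
    )


for skew in (0, 5, 20):
    print(f"skew {skew:2d} ps -> peak {peak_grid_current(4, skew):.2f} mA")
```

Under these assumptions, four drivers firing together draw a 4 mA peak, a 5 ps stagger halves it, and a stagger of a full pulse width brings it down to a single driver's 1 mA. The price is that the last driver evaluates later.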

Fighting Coupling (Slide 89)
Use caution with highly ratioed gates: for high-skew devices, the performance gain versus noise-margin reduction flattens out somewhere between 4:1 and 6:1. Avoid receiving long routes directly into domino gates. Will a static design work? In most cases you will also save power. Pseudo-NMOS? It offers better noise immunity. Other remedies: anti-Miller devices, statizing a domino node, orthogonal metal fill, and a robust power grid.
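To see how skewing a receiver eats into the noise margin of a precharged-high domino node, one can use the long-channel (square-law) switching-threshold formula V_M = (V_tn + r(V_DD - |V_tp|)) / (1 + r), with r = sqrt(beta_p / beta_n). This is a minimal Python sketch. It reads the slide's ratio as a P:N width ratio, which is an assumption, and it uses hypothetical thresholds, supply, and mobility ratio. It also ignores velocity saturation, so it only indicates the trend.

```python
import math


def switching_threshold(p_to_n_width: float, vdd: float = 1.2, vtn: float = 0.3,
                        vtp_abs: float = 0.3, mobility_ratio: float = 2.0) -> float:
    """Long-channel (square-law) inverter switching threshold V_M.
    r = sqrt(beta_p / beta_n), where beta_p / beta_n = (Wp / Wn) / (mu_n / mu_p)."""
    r = math.sqrt(p_to_n_width / mobility_ratio)
    return (vtn + r * (vdd - vtp_abs)) / (1 + r)


# A precharged-high dynamic node flips the high-skew inverter once noise pulls it
# below V_M, so the low-going noise margin is roughly VDD - V_M.
for ratio in (2, 4, 6, 8):
    vm = switching_threshold(ratio)
    print(f"P:N = {ratio}:1  V_M = {vm:.3f} V  noise margin = {1.2 - vm:.3f} V")
```

Each step up in skew raises V_M and shrinks the margin, while the speed gained from a stronger PMOS keeps diminishing. This is why the slide advises caution beyond moderate ratios.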

Long Interconnect (Slide 90)
Figure: a 4:1 mux (select sel[0]) receiving long routes exposed to attackers (x4), shown once with the route received directly and once with one static gate of noise rejection.

Anti-Miller Devices (Slide 91)
Reference: E. S. Fetzer et al., "A Fully Bypassed Six-Issue Data path and Register File on the Itanium-2 Microprocessor."
Simple and small: an inverter in series with a capacitor, where the capacitor is formed from an NMOS FET with shorted source and drain. Any coupling event is offset by charge of the opposite sense being dumped onto the victim line. The frequency impact is minimal. Anti-Miller devices enabled fully packed metal routing on Itanium 2.
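The charge-cancellation idea can be written as a simple charge balance on a floating victim. This is a minimal Python sketch under a loudly stated assumption: the inverter is taken to be driven by the attacking signal, so its output swings opposite to the attacker. All capacitances are hypothetical, and inverter delay is ignored, so real cancellation would only be partial.

```python
def victim_bump(v_swing: float, c_couple_ff: float, c_ground_ff: float,
                c_anti_miller_ff: float = 0.0) -> float:
    """Voltage bump (V) on a floating victim when an attacker rises by v_swing.

    Hypothetical model: the anti-Miller inverter is driven by the attacker, so its
    output falls by v_swing at the same instant and pushes opposite-sense charge
    onto the victim through c_anti_miller_ff (the shorted source-drain NMOS cap).
    Inverter delay is ignored, so real cancellation is only partial.
    """
    c_total = c_couple_ff + c_ground_ff + c_anti_miller_ff
    q_net = (c_couple_ff - c_anti_miller_ff) * v_swing  # fF * V = fC
    return q_net / c_total


print(victim_bump(1.2, 10.0, 30.0))        # no compensation
print(victim_bump(1.2, 10.0, 30.0, 10.0))  # matched compensation: 0.0
print(victim_bump(1.2, 10.0, 30.0, 5.0))   # partial compensation
```

Matching the anti-Miller capacitor to the coupling capacitance cancels the bump in this idealized model. Any mismatch leaves a residual bump proportional to the difference.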

Statizing a Domino Circuit (Slide 92)
Figure: a domino gate with inputs a and b (clock CK, output noh), each exposed to two attackers; the b input is failing the noise check. Adding a small PFET adds significant noise immunity while only minimally reducing circuit speed.

Typical Charge Share Issue (Slide 93)
Figure: a domino gate (clock CLK) with its storage node and an interstitial parasitic capacitance inside the pull-down stack. The high-capacitance interstitial node draws charge from the lower-capacitance storage node, forcing the domino gate to flip. This is easy to avoid: add interstitial precharge devices, and place one-hot signals on the top FETs of the stacks.
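Charge sharing follows directly from charge conservation: when the precharged storage node connects to an interstitial node, the two settle to V = (C_s V_s + C_i V_i) / (C_s + C_i). This is a minimal Python sketch with hypothetical capacitances and a hypothetical inverter trip point. It shows both the failure and why precharging the interstitial node removes it.

```python
def charge_share(c_store_ff: float, c_inter_ff: float,
                 v_store: float = 1.2, v_inter: float = 0.0) -> float:
    """Final voltage when the precharged storage node connects to an interstitial
    node (charge conservation, no other current paths such as keepers)."""
    return (c_store_ff * v_store + c_inter_ff * v_inter) / (c_store_ff + c_inter_ff)


TRIP = 0.6  # hypothetical trip point of the domino output inverter (V)
for c_inter, v_inter, label in [(5.0, 0.0, "small, discharged"),
                                (15.0, 0.0, "large, discharged"),
                                (15.0, 1.2, "large, precharged")]:
    v = charge_share(10.0, c_inter, v_inter=v_inter)
    print(f"{label:18s} -> {v:.3f} V, flips: {v < TRIP}")
```

The discharged interstitial node that is larger than the storage node pulls the storage node below the trip point. Precharging the same interstitial node leaves it at the rail.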

Charge Share + Coupling (Slide 94)
Figure annotations: ~7 fF, ~7 fF, an attacker, and ~45 fF. The circuit switches between two banked registers. If the switchback signal is slow enough, the memory cell easily overcomes the charge share. At the fastest corner of the newest process, however, the circuit fails. Coupling onto the d1 node combines with the charge share to make the problem much worse.

Leakage Failures (Slide 95)
Leakage is becoming a more significant concern as we move to tighter geometries; it grows 2-4x per generation. The biggest effect is on power dissipation. Leakage share of total power:
180nm: < 5%
130nm: 10-30%
90nm: 50%?
Circuits need to work in the face of this reality. Keeper sizing: 1% at 180nm, 6% at 90nm. Burn-in exacerbates the situation: temperature is saturated at the high end of the spec, and the voltage is 1.2x to 1.3x the use voltage, which adds DIBL effects.
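The pressure on keeper sizing follows from a simple current budget: the keeper must out-supply the combined leakage of every off leg of the pull-down network while the dynamic node is meant to stay high. This is a minimal Python sketch. The leg count, widths, per-micron currents, and the 2x margin are all hypothetical, and 3x per generation is simply one value inside the 2-4x range on the slide.

```python
def min_keeper_width_um(n_legs: int, w_leg_um: float, i_leak_ua_per_um: float,
                        i_keeper_ua_per_um: float, margin: float = 2.0) -> float:
    """Smallest keeper width whose on-current covers `margin` times the worst-case
    leakage of n_legs off pull-down legs (all legs off, node supposed to stay high)."""
    i_leak_total = n_legs * w_leg_um * i_leak_ua_per_um
    return margin * i_leak_total / i_keeper_ua_per_um


# Hypothetical numbers: 8 legs of 2 um, keeper delivering 100 uA/um.
leak = 0.5  # uA/um, e.g. a hot burn-in condition
for gen in range(3):
    w = min_keeper_width_um(8, 2.0, leak, 100.0)
    print(f"generation {gen}: leakage {leak:.1f} uA/um -> keeper >= {w:.2f} um")
    leak *= 3.0  # one value inside the 2-4x per generation range
```

The required keeper grows in proportion to leakage. A stronger keeper also fights the pull-down during evaluation, so each increase costs speed.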

Leakage Failure Shmoo (Slide 96)
Figure: shmoo plot with axes labeled Increasing Voltage and Decreasing Frequency.

Cache Dump Circuit (Slide 97)
The dump circuit combines 4-16 local bitlines into a global bitline. It is a highly utilized circuit, with 10s of thousands of usages, and similar versions are found in nearly all caches. Figure: the selected local bitline is read as 1 while one or more of the others may be 0 (example values 1 0 0 0). The node is held by a stacked weak keeper, and leakage paths are marked (inputs din, clock nck). The global bitline has predischarge logic.

Layout Geometries (Slide 98)
Figure: keeper FETs and column-select FETs laid out in odd and even rows. The S-shaped FETs are pinched, causing higher leakage.
Highly reproduced circuits, such as cache circuits, need to have unquestioned robustness. Use statistical analysis (Monte Carlo simulation). Unexpected processing issues will cause a marginal design to fail. Make a risk/reward assessment (e.g., for the predischarge logic).
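The Monte Carlo approach can be sketched for the dump circuit's keeper-versus-leakage check. Each leg's threshold voltage is drawn from a Gaussian, its subthreshold leakage is scaled by 10^(-dVt/S), and we count how often the total leakage exceeds the keeper current. This is a minimal Python sketch. The leg count, nominal leakage, sigma, subthreshold slope S, and keeper current are hypothetical.

```python
import random


def mc_fail_rate(n_trials: int, n_legs: int, nominal_leak_ua: float,
                 sigma_vt_mv: float, keeper_ua: float,
                 subthreshold_slope_mv: float = 100.0, seed: int = 1) -> float:
    """Fraction of sampled circuit instances whose total leg leakage exceeds the
    keeper current. Each leg leaks nominal * 10**(-dVt / S) with dVt ~ N(0, sigma),
    so legs with a lower threshold leak exponentially more."""
    rng = random.Random(seed)  # seeded for repeatable runs
    fails = 0
    for _ in range(n_trials):
        total = 0.0
        for _ in range(n_legs):
            dvt = rng.gauss(0.0, sigma_vt_mv)
            total += nominal_leak_ua * 10 ** (-dvt / subthreshold_slope_mv)
        if total > keeper_ua:
            fails += 1
    return fails / n_trials


# Hypothetical: 16 local bitlines, 0.5 uA nominal leakage each, 30 mV sigma.
print(f"fail rate: {mc_fail_rate(2000, 16, 0.5, 30.0, keeper_ua=12.0):.3f}")
```

A real analysis would use correlated process variables and SPICE-level device models. The structure is the same, though: sample, evaluate, and count failures across far more instances than any corner run covers.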

Pulse Precharge (Slide 99)
Figure: a pulse precharge circuit. The load here determines how the feedback will shut off.

Pulse Precharge (Slide 100)
No-load case: the sout node easily makes it to full rail within the RCK pulse.
Worst-case load: sout is unable to fully precharge within the pulse.
An experienced designer includes a switch to get a wide pulse!
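The no-load versus worst-case-load behavior matches a first-order RC charge, V(t) = V_DD (1 - e^(-t/RC)). This is a minimal Python sketch that treats the precharge path as a single hypothetical resistance charging a lumped load. The pulse width, resistance, and capacitances are made-up values.

```python
import math


def precharge_level(pulse_ps: float, r_ohm: float, c_ff: float,
                    vdd: float = 1.2) -> float:
    """Voltage reached at the end of the precharge pulse (single-pole RC model)."""
    tau_ps = r_ohm * c_ff * 1e-3  # 1 ohm * 1 fF = 1e-15 s = 1e-3 ps
    return vdd * (1 - math.exp(-pulse_ps / tau_ps))


def min_pulse_ps(r_ohm: float, c_ff: float, fraction: float = 0.95) -> float:
    """Pulse width needed to reach `fraction` of VDD with the same model."""
    tau_ps = r_ohm * c_ff * 1e-3
    return -tau_ps * math.log(1 - fraction)


# Hypothetical: 1 kohm precharge path, 60 ps pulse, light vs worst-case load.
for label, c in (("light load", 20.0), ("worst-case load", 100.0)):
    print(f"{label:15s}: {precharge_level(60.0, 1000.0, c):.3f} V, "
          f"needs {min_pulse_ps(1000.0, c):.0f} ps for 95%")
```

The required pulse width scales linearly with the load. This is why a fixed pulse that is comfortable for the light load can leave a heavily loaded node well short of the rail, and why a wide-pulse option is worth having.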

Some Final Words (Slide 101)
Domino is here to stay. High-performance designs demand the performance, and domino offers density benefits for register files, large CAMs, and muxes.
Methodology: Robustness is Job #1. Standardize: at most, solve a problem once per product. Minimize silicon debug issues.
When to use domino? Consider static or pseudo-NMOS, and make the proper tradeoffs.

A Few More Words (Slide 102)
Multiple factors often combine to cause a silicon failure: power grid, charge sharing, noise events, layout geometries, leakage, etc. Develop tools and an understanding that address the interplay of these factors. Problems need to be found pre-silicon; post-silicon failures are very expensive. Make the proper choices: each risk should be balanced by sufficient benefit. Simulate, re-simulate, and then question.
Future process implications: increased leakage; coupling (faster edges and tighter geometries, hopefully mitigated by low-k dielectrics); increased process variability. Design for the future. Good luck!

Backup (Slide 103)
Tom Grutkowski

Itanium 2 Die Photo (Slide 104)