CS152 Computer Architecture and Engineering, Lecture 3: Review, Technology & Delay Modeling. September 3, 1997


Dave Patterson (http://cs.berkeley.edu/~patterson)
Lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/
CS 152, UC Berkeley, Fall 1997

Outline of Today's Lecture
- Review (1 minute)
- ISA, Performance Wrap-up (5 minutes)
- Performance and Technology (10 minutes)
- Administrative Matters and Questions (2 minutes)
- Delay Modeling and Gate Characterization (20 minutes)
- Questions and Break (5 minutes)
- Clocking Methodologies and Timing Considerations (25 minutes)

Summary: Salient features of MIPS I
- 32-bit fixed-format instructions (3 formats)
- 32 32-bit GPRs (R0 contains zero) and 32 FP registers (plus HI and LO), partitioned by software convention
- 3-address, register-register arithmetic instructions
- Single address mode for load/store: base + displacement; no indirect or scaled addressing
- 16-bit immediate plus LUI
- Simple branch conditions: compare against zero, or compare two registers for = and ≠; no integer condition codes
- Delayed branch: the instruction after the branch (or jump) executes even if the branch is taken (the compiler can fill a delayed branch slot with useful work about 50% of the time)

Summary: Instruction set design (MIPS)
- Use general-purpose registers with a load-store architecture: YES
- Provide at least 16 general-purpose registers plus separate floating-point registers: 31 GPR & 32 FPR
- Support basic addressing modes: displacement (with an address offset size of 12 to 16 bits), immediate (size 8 to 16 bits), and register deferred: YES; 16 bits for immediate and displacement (disp = 0 => register deferred)
- All addressing modes apply to all data transfer instructions: YES
- Use fixed instruction encoding if interested in performance, and variable instruction encoding if interested in code size: fixed
- Support these data sizes and types: 8-bit, 16-bit, and 32-bit integers, and 32-bit and 64-bit IEEE 754 floating-point numbers: YES
- Support these simple instructions, since they will dominate the number of instructions executed: load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch (with a PC-relative address at least 8 bits long), jump, call, and return: YES, 16-bit branch offset
- Aim for a minimalist instruction set: YES

Evaluating Instruction Sets?
- Design-time metrics: Can it be implemented, in how long, at what cost? Can it be programmed? Ease of compilation?
- Static metrics: How many bytes does the program occupy in memory?
- Dynamic metrics: How many instructions are executed? How many bytes does the processor fetch to execute the program? How many clocks are required per instruction (CPI)? How "lean" a clock is practical?
- Best metric: time to execute the program! It is set by instruction count, CPI, and cycle time, and these in turn depend on the instruction set, the processor organization, and the compilation techniques.

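Review: Aspects of CPU Performance
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
The three factors are instruction count, CPI, and clock rate. Each is affected by the following levels of the design:
- Program: instruction count
- Compiler: instruction count, CPI
- Instruction set: instruction count, CPI
- Organization: CPI, clock rate
- Technology: clock rate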
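The CPU time equation can be checked numerically with a minimal Python sketch. The program size, CPI, and 200 MHz clock used here are made-up illustrative values, not figures from the lecture.

```python
def cpu_time(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """CPU time (s) = instructions x (cycles / instruction) x (seconds / cycle)."""
    cycle_time_s = 1.0 / clock_rate_hz  # seconds per cycle
    return instruction_count * cpi * cycle_time_s

# Hypothetical program: 1 billion instructions, CPI of 1.5, 200 MHz clock.
t = cpu_time(1_000_000_000, 1.5, 200e6)
print(f"{t:.2f} s")  # 7.50 s
```

Halving the CPI or doubling the clock rate halves the result. This is why the compiler, ISA, organization, and technology all matter: each one moves at least one of the three factors.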

Amdahl's Law
Speedup due to enhancement E:
$$\text{Speedup}(E) = \frac{\text{ExTime without } E}{\text{ExTime with } E} = \frac{\text{Performance with } E}{\text{Performance without } E}$$
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then
$$\text{ExTime}(\text{with } E) = \left((1-F) + \frac{F}{S}\right) \times \text{ExTime}(\text{without } E)$$
$$\text{Speedup}(\text{with } E) = \frac{1}{(1-F) + F/S}$$
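Amdahl's Law translates directly into a small Python helper. The example fractions and factors are illustrative only. The second call shows the limit the law implies: however large S becomes, the speedup cannot exceed 1 / (1 - F).

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction F of execution time is sped up by a factor S."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    # New execution time as a fraction of the old one: (1 - F) + F / S.
    new_time_fraction = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time_fraction

# Speeding up half of the task by 2x gives only about 1.33x overall...
print(round(amdahl_speedup(0.5, 2.0), 2))   # 1.33
# ...and even an essentially infinite speedup of that half is capped near 2x.
print(round(amdahl_speedup(0.5, 1e12), 2))  # 2.0
```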

Performance and Technology Trends
[Figure: relative performance (log scale, 0.1 to 1000) versus year (1965 to 2000) for supercomputers, mainframes, minicomputers, and microprocessors]
- Technology power: 1.2 x 1.2 x 1.2 = 1.7x / year
  - Feature size: shrinks 10% / year => switching speed improves 1.2x / year
  - Density: improves 1.2x / year
  - Die area: 1.2x / year
- The lesson of RISC is to keep the ISA as simple as possible:
  - Shorter design cycle => fully exploit the advancing technology (~3 yr)
  - Advanced branch prediction and pipeline techniques
  - Bigger and more sophisticated on-chip caches
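The "technology power" figure multiplies the three per-year factors together. The short Python sketch below reproduces that arithmetic and compounds it over the roughly 3-year design cycle. Treating the factors as independent and multiplicative follows the slide; it is a rough rule of thumb, not a precise model.

```python
# Per-year improvement factors quoted on the slide.
switching_speed = 1.2   # from the ~10%/year feature-size shrink
density = 1.2
die_area = 1.2

combined_per_year = switching_speed * density * die_area
print(f"combined: {combined_per_year:.3f}x per year")  # combined: 1.728x per year

# Compounded over a ~3-year design cycle: a design that takes longer to finish
# targets technology that has already moved on by this factor.
for years in (1, 2, 3):
    print(years, f"{combined_per_year ** years:.2f}x")
```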

Technology => Performance
[Figure: building blocks from transistors and wires, through CMOS logic gates, up to complex cells]

Range of Design Styles
[Figure: three design styles side by side.
- Custom design: custom control logic, custom ALU, custom register file; compact layout.
- Standard cell: gates, standard ALU, and standard registers, separated by routing channels.
- Gate array/FPGA/CPLD: gates and routing channels; longer wires.
Axes: performance and design complexity (design time), both highest for custom design.]

Basic Technology: CMOS
- CMOS: Complementary Metal Oxide Semiconductor
  - NMOS (N-type Metal Oxide Semiconductor) transistors
  - PMOS (P-type Metal Oxide Semiconductor) transistors
- Supply levels: Vdd = 5V, GND = 0V
- NMOS transistor:
  - Applying a HIGH (Vdd) to its gate turns the transistor into a conductor
  - Applying a LOW (GND) to its gate shuts off the conduction path
- PMOS transistor:
  - Applying a HIGH (Vdd) to its gate shuts off the conduction path
  - Applying a LOW (GND) to its gate turns the transistor into a conductor

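Basic Components: CMOS Inverter
[Figure: inverter symbol and circuit. A PMOS transistor connects Out to Vdd and an NMOS transistor connects Out to GND; both gates are driven by In. When In is low, the PMOS charges Out toward Vdd while the NMOS is open. When In is high, the NMOS discharges Out toward GND while the PMOS is open. The transfer curve plots Vout versus Vin, both ranging from 0 to Vdd.]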

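Basic Components: CMOS Logic Gates
NAND gate:
- A=0, B=0: Out = 1
- A=0, B=1: Out = 1
- A=1, B=0: Out = 1
- A=1, B=1: Out = 0
NOR gate:
- A=0, B=0: Out = 1
- A=0, B=1: Out = 0
- A=1, B=0: Out = 0
- A=1, B=1: Out = 0
[Figure: NAND and NOR circuits between Vdd and GND. The NAND has two PMOS transistors in parallel and two NMOS transistors in series; the NOR has two PMOS transistors in series and two NMOS transistors in parallel.]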
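The truth tables follow from the transistor rules on the CMOS slide. Series transistors must all conduct, while any one of a parallel group is enough. The Python sketch below is an idealized switch-level model that ignores delay and analog behavior. It derives the inverter, NAND2 and NOR2 outputs from their pull-up (PMOS) and pull-down (NMOS) networks. The function names are illustrative only.

```python
from itertools import product

# Idealized switches: an NMOS conducts when its gate is 1, a PMOS when its gate is 0.
def nmos_on(gate: int) -> bool:
    return gate == 1

def pmos_on(gate: int) -> bool:
    return gate == 0

def resolve(pull_up: bool, pull_down: bool) -> int:
    """In static CMOS exactly one network conducts: Out goes to Vdd (1) or GND (0)."""
    assert pull_up != pull_down, "both or neither network conducting"
    return 1 if pull_up else 0

def inverter(a: int) -> int:
    return resolve(pmos_on(a), nmos_on(a))

def nand2(a: int, b: int) -> int:
    # PMOS in parallel (either can pull up), NMOS in series (both must pull down).
    return resolve(pmos_on(a) or pmos_on(b), nmos_on(a) and nmos_on(b))

def nor2(a: int, b: int) -> int:
    # PMOS in series, NMOS in parallel.
    return resolve(pmos_on(a) and pmos_on(b), nmos_on(a) or nmos_on(b))

for a, b in product((0, 1), repeat=2):
    print(a, b, "NAND:", nand2(a, b), "NOR:", nor2(a, b))
```

The `assert` in `resolve` captures the static CMOS property that the two networks are complementary, so the output is never floating or shorted.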

Gate Comparison
[Figure: NAND and NOR gate circuits]
- If PMOS transistors are faster:
  - It is OK to have PMOS transistors in series
  - The NOR gate is preferred
  - The NOR gate is also preferred if H -> L is more critical than L -> H
- If NMOS transistors are faster:
  - It is OK to have NMOS transistors in series
  - The NAND gate is preferred
  - The NAND gate is also preferred if L -> H is more critical than H -> L

Administrative Matters
- CS152 news group: ucb.class.cs152 (email cs152@cory with specific questions)
- Slides and handouts available via WWW: http://www-inst.eecs.berkeley.edu/~cs152/fa97
- Video tapes of lectures available for viewing in 205 McLaughlin
- Prerequisite quiz Friday, September 5: CS 61C, CS 150
  - Review Chapters 1-4, 7.1-7.2, Ap, B of COD:HSI 2nd Edition
- Turn in survey forms with photo

Ideal (CS) versus Reality (EE)
- When the input goes 0 -> 1, the output goes 1 -> 0, but NOT instantly
  - Output going 1 -> 0: the output voltage falls from Vdd (5V) to 0V
- When the input goes 1 -> 0, the output goes 0 -> 1, but NOT instantly
  - Output going 0 -> 1: the output voltage rises from 0V to Vdd (5V)
- Voltage does not like to change instantaneously
[Figure: inverter Vin and Vout versus time; logic 1 => Vdd, logic 0 => GND]

Fluid Timing Model
[Figure: a reservoir at level Vdd connects through switch SW1 to a tank (capacitance Cout) whose water level is Vout; switch SW2 drains the tank into a bottomless sea at level GND. Electrically, SW1 connects Vout to Vdd and SW2 connects Vout to GND.]
- Water <-> electrical charge
- Tank capacity <-> capacitance (C)
- Water level <-> voltage
- Water flow <-> charge flowing (current)
- Size of pipes <-> strength of transistors (G)
- Time to fill up the tank ~ C / G
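The rule of thumb "time to fill up the tank ~ C / G" can be made concrete with a simple Python sketch. It assumes a first-order model in which the switch behaves like a linear conductance G charging Cout. Real transistors are not linear resistors. The 100 fF load and 1e-4 S conductance are hypothetical values.

```python
import math

def time_to_half_vdd(c_farads: float, g_siemens: float, vdd: float = 5.0,
                     dt: float = 1e-13) -> float:
    """Charge a capacitor through a linear conductance (forward-Euler steps of dt)
    and return the time at which Vout first reaches Vdd / 2."""
    v_out, t = 0.0, 0.0
    while v_out < vdd / 2:
        current = g_siemens * (vdd - v_out)   # flow through the "pipe"
        v_out += current / c_farads * dt      # dV = I * dt / C raises the "tank" level
        t += dt
    return t

c, g = 100e-15, 1e-4            # hypothetical: 100 fF load, 1e-4 S (10 kOhm) switch
t_half = time_to_half_vdd(c, g)
print(t_half, (c / g) * math.log(2))      # both about 0.693 ns
print(time_to_half_vdd(2 * c, g) / t_half)  # about 2: delay scales with C / G
```

Under this model the exact 50% crossing time is (C / G) ln 2. Doubling the load capacitance doubles the delay, and doubling the transistor strength halves it.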

Series Connection
[Figure: two inverters in series, Vin -> G1 -> V1 -> G2 -> Vout, with capacitance C1 on node V1 and Cout on the output. Waveforms of Vin, V1, and Vout versus time, with delays d1 and d2 measured at Vdd/2.]
- Total propagation delay = sum of individual delays = d1 + d2
- Capacitance C1 has two components:
  - Capacitance of the wire connecting the two gates
  - Input capacitance of the second inverter

Review: Calculating Delays
[Figure: gate G1 drives node V1 (capacitance C1), which fans out to gate G2 (output V2) and gate G3 (output V3)]
- Sum delays along serial paths:
  - Delay (Vin -> V2) = Delay (Vin -> V1) + Delay (V1 -> V2)
  - Delay (Vin -> V3) = Delay (Vin -> V1) + Delay (V1 -> V3)
  - In general, Delay (Vin -> V2) != Delay (Vin -> V3)
- Critical path = the longest among the N parallel paths
- C1 = wire C + Cin of gate 2 + Cin of gate 3
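The two rules on the slide turn into a longest-path computation over an acyclic netlist. Delays add along a serial path, and where parallel paths end, the critical one is the slowest. The Python sketch below illustrates this. The netlist layout and the per-stage delays are hypothetical numbers for the G1/G2/G3 example, not values from the lecture.

```python
# Hypothetical netlist: node -> list of (successor, stage delay in ns).
EDGES: dict[str, list[tuple[str, float]]] = {
    "Vin": [("V1", 0.3)],
    "V1": [("V2", 0.4), ("V3", 0.7)],
    "V2": [],
    "V3": [],
}

def longest_paths(edges: dict[str, list[tuple[str, float]]], source: str) -> dict[str, float]:
    """Worst-case arrival time at every node reachable from `source` (graph must be a DAG)."""
    # Count incoming edges within the reachable sub-graph.
    indegree: dict[str, int] = {}
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        for succ, _ in edges.get(node, []):
            indegree[succ] = indegree.get(succ, 0) + 1
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    # Kahn's algorithm: visit nodes in topological order, keeping the max arrival.
    arrival = {source: 0.0}
    ready = [source]
    while ready:
        node = ready.pop()
        for succ, delay in edges.get(node, []):
            arrival[succ] = max(arrival.get(succ, float("-inf")), arrival[node] + delay)
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)
    return arrival

arrival = longest_paths(EDGES, "Vin")
critical_node = max(arrival, key=arrival.get)
print(critical_node, f"{arrival[critical_node]:.1f} ns")  # V3 1.0 ns
```

Static timing tools work on the same idea at much larger scale, with separate rise and fall delays for each arc.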

Review: General C/L Cell Delay Model
[Figure: a combinational logic cell with inputs A, B, ..., X driving output Vout with load Cout; a plot of Delay (Va -> Vout) versus Cout whose intercept is the internal delay and whose slope is the delay per unit load, with Ccritical marked on the Cout axis]
- A combinational cell (symbol) is fully specified by:
  - functional (input -> output) behavior: truth table, logic equation, VHDL
  - load factor of each input
  - critical propagation delay from each input to each output for each transition:
    T_HL(A, o) = Fixed Internal Delay + Load-dependent Delay x Load
- The linear model composes

Characterize a Gate
- Input capacitance for each input
- For each input-to-output path:
  - For each output transition type (H->L, L->H, H->Z, L->Z, etc.):
    - Internal delay (ns)
    - Load-dependent delay (ns / fF)
- Example: 2-input NAND gate
  - For A and B: input load = 61 fF
  - For either A -> Out or B -> Out:
    - TPlh = 0.5 ns, TPlhf = 0.0021 ns / fF
    - TPhl = 0.1 ns, TPhlf = 0.0020 ns / fF
[Figure: delay A -> Out (Out: Low -> High) versus Cout, a line with intercept 0.5 ns and slope 0.0021 ns / fF]
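The linear delay model, with the NAND2 characterization above plugged in, fits in a few lines of Python. The class and constant names are illustrative. The load of three NAND2 inputs (wire capacitance ignored) is a hypothetical example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TransitionDelay:
    internal_ns: float   # fixed internal delay
    per_ff_ns: float     # load-dependent delay, ns per fF

    def delay(self, load_ff: float) -> float:
        """Linear model: internal delay + load-dependent delay x load."""
        return self.internal_ns + self.per_ff_ns * load_ff

# 2-input NAND characterization from the slide (same for A -> Out and B -> Out).
NAND2_INPUT_LOAD_FF = 61.0
NAND2_LH = TransitionDelay(internal_ns=0.5, per_ff_ns=0.0021)
NAND2_HL = TransitionDelay(internal_ns=0.1, per_ff_ns=0.0020)

# Hypothetical load: the NAND output drives three other NAND2 inputs (wire C ignored).
load = 3 * NAND2_INPUT_LOAD_FF  # 183 fF
print(f"L->H: {NAND2_LH.delay(load):.4f} ns, H->L: {NAND2_HL.delay(load):.4f} ns")
# L->H: 0.8843 ns, H->L: 0.4660 ns
```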

A Specific Example: 2 to 1 MUX
[Figure: 2-to-1 MUX built from NAND gates. Gate 1 combines A with !S (from an inverter on S, via Wire 0), Gate 2 combines B with S, and Gate 3 combines their outputs (via Wire 1 and Wire 2) into Y = (A and !S) or (B and S).]
- Input Load (IL):
  - A, B: IL (NAND) = 61 fF
  - S: IL (INV) + IL (NAND) = 50 fF + 61 fF = 111 fF
- Load Dependent Delay (LDD): same as Gate 3
  - TAYlhf = 0.0021 ns / fF, TAYhlf = 0.0020 ns / fF
  - TBYlhf = 0.0021 ns / fF, TBYhlf = 0.0020 ns / fF
  - TSYlhf = 0.0021 ns / fF, TSYhlf = 0.0020 ns / fF

2 to 1 MUX: Internal Delay Calculation
[Figure: the same 2-to-1 MUX, Y = (A and !S) or (B and S)]
Internal Delay (ID):
- A to Y: ID G1 + (Wire 1 C + G3 Input C) * LDD G1 + ID G3
- B to Y: ID G2 + (Wire 2 C + G3 Input C) * LDD G2 + ID G3
- S to Y (worst case): ID Inv + (Wire 0 C + G1 Input C) * LDD Inv + Internal Delay A to Y
We can approximate the effect of Wire 1 C by assuming Wire 1 has the same C as all the gate C attached to it. The total C that Gate 1 needs to drive is then 2.0 x Input C of Gate 3.

2 to 1 MUX: Internal Delay Calculation (continued)
Specific example (A rises, so the Gate 1 output falls and Y rises):
TAYlh = TPhl G1 + (2.0 * 61 fF) * TPhlf G1 + TPlh G3 = 0.1 ns + 122 fF * 0.0020 ns/fF + 0.5 ns = 0.844 ns
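The TAYlh arithmetic can be reproduced in Python using the NAND2 numbers and the slide's 2.0x wire approximation. As an illustration, the same recipe is then applied to the opposite transition (TAYhl, one of the "fun exercises"). That second number follows the slide's formula but is not stated in the lecture.

```python
# NAND2 numbers from the gate characterization slide (ns and ns/fF).
TPLH, TPLHF = 0.5, 0.0021
TPHL, TPHLF = 0.1, 0.0020
NAND_INPUT_C_FF = 61.0
WIRE_FACTOR = 2.0  # slide's approximation: wire C equals the gate C it connects to

load_g1_ff = WIRE_FACTOR * NAND_INPUT_C_FF  # 122 fF = Wire 1 C + G3 input C

# A rises -> G1 (NAND) output falls -> G3 (NAND) output Y rises.
t_ay_lh = TPHL + load_g1_ff * TPHLF + TPLH
print(f"TAYlh = {t_ay_lh:.3f} ns")  # TAYlh = 0.844 ns

# A falls -> G1 output rises -> Y falls (same recipe, opposite transitions).
t_ay_hl = TPLH + load_g1_ff * TPLHF + TPHL
print(f"TAYhl = {t_ay_hl:.4f} ns")
```

G3's own load-dependent term is not included here, because it becomes the MUX's load-dependent delay (TAYlhf, TAYhlf) once the MUX is treated as a single cell.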

Abstraction: 2 to 1 MUX
[Figure: the gate-level MUX (Gates 1 to 3) abstracted as a single 2 x 1 Mux symbol with inputs A, B, S and output Y]
- Input load: A = 61 fF, B = 61 fF, S = 111 fF
- Load dependent delay:
  - TAYlhf = 0.0021 ns / fF, TAYhlf = 0.0020 ns / fF
  - TBYlhf = 0.0021 ns / fF, TBYhlf = 0.0020 ns / fF
  - TSYlhf = 0.0021 ns / fF, TSYhlf = 0.0020 ns / fF
- Internal delay:
  - TAYlh = TPhl G1 + (2.0 * 61 fF) * TPhlf G1 + TPlh G3 = 0.1 ns + 122 fF * 0.0020 ns/fF + 0.5 ns = 0.844 ns
- Fun exercises: TAYhl, TBYlh, TSYlh, TSYhl

Break (5 Minutes)

Storage Element's Timing Model
[Figure: Clk, D, and Q waveforms around the trigger clock edge. D must be stable within the setup and hold window and is "don't care" outside it; Q is unknown until the clock-to-Q time has elapsed.]
- Setup time: the input must be stable BEFORE the trigger clock edge
- Hold time: the input must REMAIN stable after the trigger clock edge
- Clock-to-Q time: the output cannot change instantaneously at the trigger clock edge
  - Similar to delay in logic gates, it has two components:
    - Internal clock-to-Q
    - Load-dependent clock-to-Q

CS152 Logic Elements
- NAND2, NAND3, NAND4
- NOR2, NOR3, NOR4
- INV1x (normal inverter)
- INV4x (inverter with large output drive)

CS152 Logic Elements (continued)
- XOR2
- XNOR2
- PWR: source of 1s
- GND: source of 0s
- Fast MUXes (maybe)

CS152 Storage Element
- D flip-flop, negative-edge triggered

Clocking Methodology
[Figure: registers clocked by Clk, with combinational logic between them]
- All storage elements are clocked by the same clock edge
- For the combinational logic blocks:
  - Inputs are updated at each clock tick
  - All outputs MUST be stable before the next clock tick

Critical Path & Cycle Time
[Figure: registers clocked by Clk, with combinational logic paths between them]
- Critical path: the slowest path between any two storage devices
- Cycle time is a function of the critical path; it must be greater than:
  Clock-to-Q + Longest Path through the Combinational Logic + Setup

Clock Skew's Effect on Cycle Time
[Figure: Clk1 and Clk2 waveforms offset by the clock skew]
- The worst-case scenario for cycle time consideration:
  - The input register sees CLK1
  - The output register sees CLK2
- Cycle Time >= CLK-to-Q + Longest Delay + Setup + Clock Skew
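The setup-side constraint amounts to one addition. The Python sketch below computes the minimum clock period and the resulting maximum clock frequency. With `clock_skew` left at 0 it reduces to the no-skew cycle-time rule. All register and logic numbers are hypothetical.

```python
def min_cycle_time(clk_to_q: float, longest_path: float, setup: float,
                   clock_skew: float = 0.0) -> float:
    """Smallest clock period (ns) meeting setup: CLK-to-Q + longest path + setup + skew."""
    return clk_to_q + longest_path + setup + clock_skew

# Hypothetical register and logic numbers, in ns.
period = min_cycle_time(clk_to_q=0.5, longest_path=8.0, setup=0.3, clock_skew=0.2)
print(f"min period {period:.1f} ns -> max clock {1e3 / period:.0f} MHz")
# min period 9.0 ns -> max clock 111 MHz
```

Every nanosecond of skew is added directly to the period, which is why clock distribution matters as much as gate delay in high-frequency designs.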

Tricks to Reduce Cycle Time
- Reduce the number of gate levels
  [Figure: two gate networks combining inputs A, B, C, D with different numbers of gate levels]
- Pay attention to loading
  - One gate driving many gates is a bad idea
  - Avoid using a small gate to drive a long wire
- Use multiple stages to drive a large load
  [Figure: INV4x stages driving a large load Clarge]

How to Avoid Hold Time Violation?
[Figure: registers clocked by Clk, with combinational logic between them]
- Hold time requirement: the input to a register must NOT change immediately after the clock tick
- This is usually easy to meet in the edge-triggered clocking scheme
  - The hold time of most FFs is <= 0 ns
- CLK-to-Q + Shortest Delay Path must be greater than Hold Time

Clock Skew's Effect on Hold Time
[Figure: Clk1 and Clk2 waveforms offset by the clock skew, with combinational logic between two registers]
- The worst-case scenario for hold time consideration:
  - The input register sees CLK2
  - The output register sees CLK1
  - A fast FF2 output must not change the input to FF1 for the same clock edge
- (CLK-to-Q + Shortest Delay Path - Clock Skew) > Hold Time
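The hold constraint can be checked the same way. The Python sketch below evaluates (CLK-to-Q + shortest path - skew) > hold. Unlike the setup constraint, skew here eats into the margin, and a slower clock does not help. The numbers are hypothetical.

```python
def hold_ok(clk_to_q: float, shortest_path: float, hold: float,
            clock_skew: float = 0.0) -> bool:
    """Hold constraint (ns): CLK-to-Q + shortest path - skew must exceed the hold time."""
    return clk_to_q + shortest_path - clock_skew > hold

# Hypothetical FF with 0 ns hold time and a very short path between registers.
print(hold_ok(clk_to_q=0.5, shortest_path=0.1, hold=0.0, clock_skew=0.2))  # True
print(hold_ok(clk_to_q=0.5, shortest_path=0.1, hold=0.0, clock_skew=0.8))  # False
```

The clock period does not appear in the check at all. This is why a hold violation cannot be fixed by lowering the clock frequency; it needs more delay on the short path or less skew.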

Summary
- Performance and Technology Trends
  - Keep the design simple to take advantage of the latest technology
  - CMOS inverter and CMOS logic gates
- Delay Modeling and Gate Characterization
  - Delay = Internal Delay + (Load Dependent Delay x Output Load)
- Clocking Methodology and Timing Considerations
  - Simplest clocking methodology: all storage elements use the SAME clock edge
  - Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew
  - (CLK-to-Q + Shortest Delay Path - Clock Skew) > Hold Time

To Get More Information
- A classic book that started it all: Carver Mead and Lynn Conway, Introduction to VLSI Systems, Addison-Wesley Publishing Company, October 1980.
- A good VLSI circuit design book: Lance Glasser and Daniel Dobberpuhl, The Design and Analysis of VLSI Circuits, Addison-Wesley Publishing Company, 1985. (Mr. Dobberpuhl is responsible for the DEC Alpha chip design.)
- A book on how and why digital ICs work: David Hodges and Horace Jackson, Analysis and Design of Digital Integrated Circuits, McGraw-Hill Book Company, 1983.