CS61C : Machine Structures


inst.eecs.berkeley.edu/~cs61c
CS61C : Machine Structures
Lecture 29: Introduction to Pipelined Execution
Lecturer PSOE Dan Garcia
www.cs.berkeley.edu/~ddgarcia

Bionic Eyes let blind see! Johns Hopkins researchers have announced they have invented a bionic eye with a computer chip on the back of the eye and a small wireless video camera in a pair of glasses. Face recognition? Not yet, but soon!
news.bbc.co.uk/1/hi/health/4411591.stm

CS 61C L30 Introduction to Pipelined Execution (1)

Review: Single-Cycle Datapath
5 steps to design a processor:
1. Analyze instruction set => datapath requirements
2. Select set of datapath components & establish clock methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction to determine setting of control points that effect the register transfer
5. Assemble the control logic
Control is the hard part. MIPS makes that easier:
- Instructions same size
- Source registers always in same place
- Immediates same size, location
- Operations always on registers/immediates
[Figure: Processor (Control, Datapath), Memory, Input, Output]

Review: Datapath (1/3)
Datapath is the hardware that performs operations necessary to execute programs.
Control instructs datapath on what to do next.
Datapath needs:
- access to storage (general purpose registers and memory)
- computational ability (ALU)
- helper hardware (local registers and PC)

Review: Datapath (2/3)
Five stages of datapath (executing an instruction):
1. Instruction Fetch (Increment PC)
2. Instruction Decode (Read Registers)
3. ALU (Computation)
4. Memory Access
5. Write to Registers
ALL instructions must go through ALL five stages.

Review: Datapath (3/3)
[Figure: MIPS datapath (PC, +4 adder, instruction memory, register file with rd/rs/rt, immediate, ALU, data memory) divided into 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Write Back]

Gotta Do Laundry
Ann, Brian, Cathy, Dave (loads A, B, C, D) each have one load of clothes to wash, dry, fold, and put away.
- Washer takes 30 minutes
- Dryer takes 30 minutes
- Folder takes 30 minutes
- Stasher takes 30 minutes to put clothes into drawers

Sequential Laundry
[Figure: timeline from 6 PM to 2 AM; loads A, B, C, D in task order, each going through four 30-minute stages before the next load starts]
Sequential laundry takes 8 hours for 4 loads.

Pipelined Laundry
[Figure: timeline from 6 PM to 2 AM; each load starts washing 30 minutes after the previous one, so the stages of different loads overlap]
Pipelined laundry takes 3.5 hours for 4 loads!
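Both totals come from counting 30-minute slots. The minimal sketch below assumes every stage takes exactly 30 minutes and that a load can enter a stage as soon as that machine is free. The helper names `sequential_minutes` and `pipelined_minutes` are hypothetical.

```python
STAGE_MINUTES = 30  # washer, dryer, folder, stasher all take 30 minutes
NUM_STAGES = 4
NUM_LOADS = 4       # Ann, Brian, Cathy, Dave


def sequential_minutes(loads: int, stages: int, stage_time: int) -> int:
    """Each load finishes every stage before the next load starts."""
    return loads * stages * stage_time


def pipelined_minutes(loads: int, stages: int, stage_time: int) -> int:
    """The first load needs `stages` slots; each later load finishes one slot after the previous one."""
    return (stages + loads - 1) * stage_time


seq = sequential_minutes(NUM_LOADS, NUM_STAGES, STAGE_MINUTES)
pipe = pipelined_minutes(NUM_LOADS, NUM_STAGES, STAGE_MINUTES)
print(f"sequential: {seq / 60} hours")  # sequential: 8.0 hours
print(f"pipelined:  {pipe / 60} hours")  # pipelined:  3.5 hours
```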

General Definitions
Latency: time to completely execute a certain task. For example, the time to read a sector from disk is the disk access time, or disk latency.
Throughput: amount of work that can be done over a period of time.

Pipelining Lessons (1/2)
[Figure: pipelined laundry timeline from 6 PM, loads A to D in task order]
Pipelining doesn't help latency of a single task; it helps throughput of the entire workload.
Multiple tasks operate simultaneously using different resources.
Potential speedup = number of pipe stages.
Time to fill the pipeline and time to drain it reduce speedup: 2.3X vs. 4X in this example.
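The 2.3X figure is sequential time divided by pipelined time for 4 loads. The sketch below shows that the ratio only approaches the number of stages (4X) as the number of loads grows, because fill and drain time is amortized. It also shows that the latency of a single load does not shrink. It assumes equal 30-minute stages, and `speedup` is a hypothetical helper name.

```python
NUM_STAGES = 4
STAGE_MINUTES = 30


def speedup(loads: int, stages: int = NUM_STAGES) -> float:
    """Sequential time divided by pipelined time; the stage time cancels out."""
    return (loads * stages) / (stages + loads - 1)


# One load still passes through every stage, so its latency is unchanged.
latency_minutes = NUM_STAGES * STAGE_MINUTES

# speedup(4) is 16/7, about 2.29: the 2.3X above.
for n in (4, 16, 100, 10_000):
    print(f"{n:>6} loads: speedup {speedup(n):.2f}")
```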

Pipelining Lessons (2/2)
[Figure: pipelined laundry timeline from 6 PM, loads A to D in task order]
Suppose a new Washer takes 20 minutes and a new Stasher takes 20 minutes. How much faster is the pipeline?
Pipeline rate is limited by the slowest pipeline stage.
Unbalanced lengths of pipe stages also reduce speedup.
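One way to answer the question is to simulate the schedule. The sketch below assumes loads go through the stages in order and that a finished load can wait in a basket, so a machine is free as soon as its current load is done. `completion_times` is a hypothetical helper name.

```python
from typing import List


def completion_times(stage_times: List[int], loads: int) -> List[int]:
    """Minute at which each load leaves the last stage (all loads available at time 0)."""
    free_at = [0] * len(stage_times)  # when each machine next becomes free
    finished = []
    for _ in range(loads):
        ready = 0  # when this load can enter its next stage
        for s, t in enumerate(stage_times):
            start = max(ready, free_at[s])
            free_at[s] = start + t
            ready = free_at[s]
        finished.append(ready)
    return finished


balanced = [30, 30, 30, 30]
faster_ends = [20, 30, 30, 20]  # new Washer and new Stasher take 20 minutes

print(completion_times(balanced, 4))     # [120, 150, 180, 210]
print(completion_times(faster_ends, 4))  # [100, 130, 160, 190]
```

Once the pipeline is full, both versions finish one load every 30 minutes, because the 30-minute dryer and folder set the rate. The faster washer and stasher only shorten the fill, saving 20 minutes in total. If the stages instead advanced in lockstep at the slowest stage's 30 minutes, there would be no gain at all.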

Steps in Executing MIPS
1) IFetch: Fetch Instruction, Increment PC
2) Decode Instruction, Read Registers
3) Execute:
   Mem-ref: Calculate Address
   Arith-log: Perform Operation
4) Memory:
   Load: Read Data from Memory
   Store: Write Data to Memory
5) Write Back: Write Data to Register

Pipelined Execution Representation
[Figure: five instructions, each passing through IFtch, Dcd, Exec, Mem, WB, with each instruction starting one clock cycle after the previous one]
Every instruction must take the same number of steps, also called pipeline stages, so some stages will go idle sometimes.
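The sketch below shows which stages sit idle for each instruction class, following the MIPS step list (lw reads memory and writes a register, sw writes memory but no register, and an arithmetic/logical op such as add skips memory). The `USES` table and `idle_stages` helper are illustrative names, not part of any real tool.

```python
STAGES = ("IFetch", "Decode", "Execute", "Memory", "WriteBack")

# Stages in which each instruction class does useful work.
USES = {
    "lw": {"IFetch", "Decode", "Execute", "Memory", "WriteBack"},
    "sw": {"IFetch", "Decode", "Execute", "Memory"},
    "add": {"IFetch", "Decode", "Execute", "WriteBack"},
}


def idle_stages(op: str) -> list:
    """Stages the instruction still passes through in the pipeline but does no work in."""
    return [stage for stage in STAGES if stage not in USES[op]]


for op in USES:
    print(op, idle_stages(op))
# lw []
# sw ['WriteBack']
# add ['Memory']
```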

Review: Datapath for MIPS
[Figure: MIPS datapath divided into 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Write Back]
Use the datapath figure to represent the pipeline: IFtch, Dcd, Exec, Mem, WB map to I$, Reg, ALU, D$, Reg.

Graphical Pipeline Representation
(In Reg, a highlighted right half means read; a highlighted left half means write.)
[Figure: Load, Add, Store, Sub, Or in instruction order against time in clock cycles, each passing through I$, Reg, ALU, D$, Reg and starting one clock cycle after the previous one]
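The graphical representation can be approximated as a text table, one row per instruction and one column per clock cycle. The sketch below assumes an ideal pipeline with no stalls, where instruction k is fetched in cycle k + 1. `stage_in_cycle` and `pipeline_diagram` are hypothetical helper names.

```python
from typing import List, Optional

STAGE_LABELS = ("I$", "Reg", "ALU", "D$", "Reg")


def stage_in_cycle(index: int, cycle: int) -> Optional[str]:
    """Resource used by instruction `index` (0-based) in clock cycle `cycle` (1-based)."""
    stage = cycle - 1 - index
    if 0 <= stage < len(STAGE_LABELS):
        return STAGE_LABELS[stage]
    return None


def pipeline_diagram(names: List[str], width: int = 5) -> str:
    """Text version of the graphical pipeline: one row per instruction, one column per cycle."""
    cycles = range(1, len(names) + len(STAGE_LABELS))
    lines = ["".ljust(width) + "".join(f"C{c}".ljust(width) for c in cycles)]
    for i, name in enumerate(names):
        cells = (stage_in_cycle(i, c) or "" for c in cycles)
        lines.append(name.ljust(width) + "".join(cell.ljust(width) for cell in cells))
    return "\n".join(lines)


print(pipeline_diagram(["lw", "add", "sw", "sub", "or"]))
```

Reading down any column shows which resources are busy in that cycle. In cycle 4, for example, the first instruction is in D$ while the fourth is in I$.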

Example
Suppose 2 ns for memory access, 2 ns for ALU operation, and 1 ns for register file read or write; compute instruction rate.
Nonpipelined execution:
lw: IF + Read Reg + ALU + Memory + Write Reg = 2 + 1 + 2 + 2 + 1 = 8 ns
add: IF + Read Reg + ALU + Write Reg = 2 + 1 + 2 + 1 = 6 ns
Pipelined execution:
Max(IF, Read Reg, ALU, Memory, Write Reg) = 2 ns
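The arithmetic can be checked in a few lines. To turn the times into instruction rates, the sketch below assumes a single-cycle datapath whose clock must fit the slowest instruction (lw). It also assumes the pipelined version completes one instruction per cycle once full. Both are simplifications stated here, not results from the slide.

```python
MEM_NS, ALU_NS, REG_NS = 2, 2, 1  # memory access, ALU operation, register read or write

# Nonpipelined: an instruction's time is the sum of the steps it uses.
lw_ns = MEM_NS + REG_NS + ALU_NS + MEM_NS + REG_NS  # IF + Read Reg + ALU + Memory + Write Reg
add_ns = MEM_NS + REG_NS + ALU_NS + REG_NS          # IF + Read Reg + ALU + Write Reg

# Pipelined: every stage gets one clock cycle, so the clock must fit the slowest stage.
cycle_ns = max(MEM_NS, REG_NS, ALU_NS, MEM_NS, REG_NS)

# Assumption: the single-cycle clock is sized for lw, the slowest instruction.
single_cycle_rate = 1e9 / lw_ns      # instructions per second
pipelined_rate = 1e9 / cycle_ns      # steady state: one instruction completes per cycle
pipelined_latency_ns = 5 * cycle_ns  # each instruction still spends five cycles in flight

print(lw_ns, add_ns, cycle_ns)            # 8 6 2
print(single_cycle_rate, pipelined_rate)  # 125000000.0 500000000.0
print(pipelined_latency_ns)               # 10
```

Throughput improves 4X, yet a single lw now takes 10 ns instead of 8 ns. The 1 ns register stages are padded to a full 2 ns cycle, so pipelining helps throughput rather than latency.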

Pipeline Hazard: Matching socks in later load
[Figure: pipelined laundry timeline from 6 PM to 2 AM, loads A to F in task order, with a bubble in the pipeline]
A depends on D; stall since folder tied up.


Problems for Computers
Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle.
- Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)
- Control hazards: branches & other instructions that change control flow stall the pipeline until the hazard is resolved, leaving "bubbles" in the pipeline
- Data hazards: instruction depends on result of prior instruction still in the pipeline (missing sock)
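Data hazards can be found mechanically by comparing each instruction's source registers with the destinations of the instructions just ahead of it. The sketch below assumes a 5-stage pipeline with no forwarding: registers are read in stage 2 and written in stage 5, with writes in the first half of a cycle and reads in the second (the convention described later in this lecture). The `Inst` record and `data_hazards` function are hypothetical, for illustration only.

```python
from typing import List, NamedTuple, Optional, Tuple


class Inst(NamedTuple):
    text: str
    dest: Optional[str]    # register written, or None (e.g. sw)
    srcs: Tuple[str, ...]  # registers read


# Assumption: with the producer fetched in cycle 1, it writes its register in
# cycle 5. A consumer fetched k instructions later reads in cycle k + 2, which
# is safe only when k + 2 >= 5, i.e. k >= 3.
MIN_SAFE_DISTANCE = 3


def data_hazards(program: List[Inst]) -> List[Tuple[int, int, str]]:
    """(producer index, consumer index, register) for each read of a result still in the pipeline."""
    hazards = []
    for j, consumer in enumerate(program):
        for i in range(max(0, j - MIN_SAFE_DISTANCE + 1), j):
            producer = program[i]
            if producer.dest is not None and producer.dest in consumer.srcs:
                hazards.append((i, j, producer.dest))
    return hazards


program = [
    Inst("lw  $t0, 0($s0)", "$t0", ("$s0",)),
    Inst("add $t1, $t0, $t2", "$t1", ("$t0", "$t2")),
    Inst("sub $t3, $s1, $s2", "$t3", ("$s1", "$s2")),
    Inst("or  $t4, $t0, $t3", "$t4", ("$t0", "$t3")),
]
print(data_hazards(program))  # [(0, 1, '$t0'), (2, 3, '$t3')]
```

The `or` reads `$t0` three instructions after the `lw`, so that pair is not flagged under these assumptions.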

Structural Hazard #1: Single Memory (1/2)
[Figure: Load, Inst 1, Inst 2, Inst 3, Inst 4 in instruction order against time in clock cycles, each passing through I$, Reg, ALU, D$, Reg; Load's D$ access and Inst 3's I$ fetch fall in the same clock cycle]
Read same memory twice in same clock cycle.

Structural Hazard #1: Single Memory (2/2)
Solution: it is infeasible and inefficient to create a second memory (we'll learn about this more next week), so simulate it by having two Level 1 caches. A cache is a temporary, smaller copy of memory, usually of the most recently used data.
- Have both an L1 Instruction Cache and an L1 Data Cache.
- This needs more complex hardware to control what happens when both caches miss.
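The conflict can be located cycle by cycle. In an ideal pipeline, instruction k (0-based) is fetched in cycle k + 1 and reaches the Memory stage in cycle k + 4. With a single memory, any cycle in which a load or store is in Memory while another instruction is being fetched is a structural hazard. The sketch assumes, for simplicity, that split instruction and data caches always hit. `memory_conflicts` is a hypothetical helper name.

```python
from typing import List

MEMORY_OPS = {"lw", "sw"}  # instructions that access data memory in stage 4


def memory_conflicts(ops: List[str], split_caches: bool) -> List[int]:
    """Clock cycles (1-based) in which a fetch and a load/store need the same memory."""
    if split_caches:
        return []  # fetches use the L1 I-cache, loads/stores the L1 D-cache (assuming hits)
    conflicts = []
    for cycle in range(1, len(ops) + 5):
        fetching = cycle - 1  # index of the instruction in IF this cycle
        in_mem = cycle - 4    # index of the instruction in MEM this cycle
        if 0 <= fetching < len(ops) and 0 <= in_mem < len(ops) and ops[in_mem] in MEMORY_OPS:
            conflicts.append(cycle)
    return conflicts


ops = ["lw", "add", "sub", "or", "and"]  # Load followed by Inst 1 through Inst 4
print(memory_conflicts(ops, split_caches=False))  # [4]
print(memory_conflicts(ops, split_caches=True))   # []
```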

Structural Hazard #2: Registers (1/2)
[Figure: sw, Inst 1, Inst 2, Inst 3, Inst 4 in instruction order against time in clock cycles, each passing through I$, Reg, ALU, D$, Reg; sw's final Reg stage and Inst 3's Reg stage fall in the same clock cycle]
Can't read and write to registers simultaneously.

Structural Hazard #2: Registers (2/2)
Fact: register access is VERY fast: it takes less than half the time of the ALU stage.
Solution: introduce a convention:
- always Write to Registers during the first half of each clock cycle
- always Read from Registers during the second half of each clock cycle
Result: can perform Read and Write during the same clock cycle.
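The convention can be modeled with a toy register file that performs one cycle's write before its reads. The `RegisterFile` class and its `write_first` flag are hypothetical. The flag only exists to contrast the opposite ordering, and unwritten registers simply read as 0 in this model.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class RegisterFile:
    """Toy register file, stepped one pipeline clock cycle at a time (hypothetical model)."""

    def __init__(self) -> None:
        self.regs: Dict[str, int] = {}

    def cycle(
        self,
        write: Optional[Tuple[str, int]] = None,
        reads: Iterable[str] = (),
        write_first: bool = True,
    ) -> List[int]:
        """Perform one cycle's write-back and register reads; return the values read."""
        if write_first and write is not None:
            self.regs[write[0]] = write[1]  # first half of the cycle: write-back
        values = [self.regs.get(name, 0) for name in reads]  # second half: reads
        if not write_first and write is not None:
            self.regs[write[0]] = write[1]  # opposite order, for comparison only
        return values


rf = RegisterFile()
rf.cycle(write=("$t0", 1))
print(rf.cycle(write=("$t0", 42), reads=["$t0"]))  # [42]: the read sees this cycle's write
print(rf.cycle(write=("$t0", 7), reads=["$t0"], write_first=False))  # [42]: stale value
```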

Peer Instruction
A. Thanks to pipelining, I have reduced the time it took me to wash my shirt.
B. Longer pipelines are always a win (since less work per stage & a faster clock).
C. We can rely on compilers to help us avoid data hazards by reordering insts.
ABC: 1: FFF 2: FFT 3: FTF 4: FTT 5: TFF 6: TFT 7: TTF 8: TTT

Peer Instruction Answer
A. FALSE: throughput is better, not execution time.
B. FALSE: longer pipelines do usually mean a faster clock, but branches cause problems!
C. FALSE: they happen too often & delay too long. Forwarding! (e.g., Mem to ALU)
So the answer is 1: FFF.

Things to Remember
Optimal pipeline:
- Each stage is executing part of an instruction each clock cycle.
- One instruction finishes during each clock cycle.
- On average, instructions execute far more quickly.
What makes this work?
- Similarities between instructions allow us to use the same stages for all instructions (generally).
- Each stage takes about the same amount of time as all the others: little wasted time.