CS61C : Machine Structures

Election data is now available: Purple America!
inst.eecs.berkeley.edu/~cs61c
CS61C : Machine Structures
Lecture 31 Pipelined Execution, part II
2004-11-10
Lecturer PSOE Dan Garcia
www.cs.berkeley.edu/~ddgarcia
The Incredibles!
www.princeton.edu/~rvdb/java/election2004/
www.usatoday.com/news/politicselections/vote2004/countymap.htm
CS61C L31 Pipelined Execution, part II (1)

Review: Pipeline (1/2)
Optimal Pipeline
Each stage is executing part of an instruction each clock cycle.
One instruction finishes during each clock cycle.
On average, instructions execute far more quickly.
What makes this work?
Similarities between instructions allow us to use the same stages for all instructions (generally).
Each stage takes about the same amount of time as all others: little wasted time.
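A minimal Python sketch of why balanced stages matter, assuming an ideal pipeline with no hazards and hypothetical stage delays (the picosecond values are illustrative, not from the lecture). The clock must fit the slowest stage, and once the pipeline fills, one instruction finishes per cycle.

```python
def pipelined_time(stage_delays_ps: list[int], n_instructions: int) -> int:
    """Time for n instructions on an ideal pipeline clocked at its slowest stage."""
    clock_ps = max(stage_delays_ps)                     # every stage must fit in one cycle
    cycles = len(stage_delays_ps) + n_instructions - 1  # fill the pipe, then one finish per cycle
    return cycles * clock_ps


def unpipelined_time(stage_delays_ps: list[int], n_instructions: int) -> int:
    """Time when each instruction runs through all stages before the next one starts."""
    return n_instructions * sum(stage_delays_ps)


if __name__ == "__main__":
    n = 1000
    # Hypothetical stage delays in ps; both lists sum to 1000 ps.
    for name, delays in [("balanced", [200, 200, 200, 200, 200]),
                         ("unbalanced", [100, 200, 400, 200, 100])]:
        speedup = unpipelined_time(delays, n) / pipelined_time(delays, n)
        print(f"{name}: speedup = {speedup:.2f}")
```

With these made-up numbers the balanced pipeline gets close to the ideal 5x (about 4.98), while the unbalanced one reaches only about 2.49 because every cycle is stretched to the 400 ps stage.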

Review: Pipeline (2/2)
Pipelining is a BIG IDEA: a widely used concept.
What makes it less than perfect?
Structural hazards: suppose we had only one cache? Need more HW resources.
Control hazards: need to worry about branch instructions? Delayed branch.
Data hazards: an instruction depends on a previous instruction?
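To make the "only one cache" question concrete, here is a hypothetical Python sketch. It assumes a single memory shared by instruction fetch (IF) and data access (MEM), one fetch per cycle with no stalls, and that only lw and sw touch data memory; it lists the clock cycles in which both stages would need that memory at once.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def shared_memory_conflicts(program: list[str]) -> list[int]:
    """Cycles (0-based) in which a data access and an instruction fetch collide.

    Instruction i is fetched in cycle i, so it reaches MEM in cycle i + 3.
    If another instruction is being fetched in that same cycle, the single
    memory is needed twice: a structural hazard.
    """
    conflicts = []
    for i, line in enumerate(program):
        opcode = line.split()[0]
        if opcode in ("lw", "sw"):
            mem_cycle = i + STAGES.index("MEM")
            if mem_cycle < len(program):  # instruction number mem_cycle is fetched then
                conflicts.append(mem_cycle)
    return conflicts


if __name__ == "__main__":
    prog = ["lw $t0, 0($t1)", "add $t2, $t3, $t4", "sub $t5, $t6, $t7",
            "or $t8, $t9, $t2", "and $t1, $t2, $t3"]
    print(shared_memory_conflicts(prog))  # [3]
```

Giving instruction fetch and data access separate caches (the I$ and D$ in the pipeline diagrams) removes this conflict, which is one form of the "more HW resources" answer.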

Control Hazard: Branching (1/7)
[Figure: pipeline diagram of beq followed by Inst 1 to Inst 4; instruction order vs. time (clock cycles); stages I$, Reg, D$, Reg]
Where do we do the compare for the branch?

Control Hazard: Branching (2/7)
We put branch decision-making hardware in the ALU stage (Stage 3); therefore two more instructions after the branch will always be fetched, whether or not the branch is taken.
Desired functionality of a branch:
if we do not take the branch, don't waste any time and continue executing normally;
if we take the branch, don't execute any instructions after the branch, just go to the desired label.

Control Hazard: Branching (3/7)
Initial Solution: Stall until decision is made.
Insert "no-op" instructions: those that accomplish nothing, just take time.
Drawback: branches take 3 clock cycles each (assuming the comparator is put in the ALU stage).

Control Hazard: Branching (4/7)
Optimization #1: move asynchronous comparator up to Stage 2.
As soon as the instruction is decoded (opcode identifies it as a branch), immediately make a decision and set the value of the PC (if necessary).
Benefit: since the branch is complete in Stage 2, only one unnecessary instruction is fetched, so only one no-op is needed.
Side Note: This means that branches are idle in Stages 3, 4 and 5.
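A back-of-the-envelope Python sketch of the stall cost, assuming stages numbered 1 (IF) to 5 (WB), a pipeline that simply stalls until the branch resolves, and a hypothetical branch frequency of 20%. A branch resolved at the end of stage s delays the next useful fetch by s - 1 cycles, which reproduces the 3 cycles for a comparator in Stage 3 and the 2 cycles for Stage 2.

```python
def branch_cycles(resolve_stage: int) -> int:
    """Cycles a branch costs when the pipeline stalls until it resolves."""
    bubbles = resolve_stage - 1   # fetches that must wait for the decision
    return 1 + bubbles


def effective_cpi(branch_fraction: float, resolve_stage: int) -> float:
    """Average CPI if every non-branch instruction takes exactly 1 cycle."""
    return 1.0 + branch_fraction * (branch_cycles(resolve_stage) - 1)


if __name__ == "__main__":
    for stage in (3, 2):
        # 20% branches is an assumption for illustration only
        print(f"compare in stage {stage}: {branch_cycles(stage)} cycles per branch, "
              f"CPI = {effective_cpi(0.20, stage):.2f}")
```

Under that assumed mix, moving the compare from Stage 3 to Stage 2 lowers the average CPI from 1.40 to 1.20.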

Control Hazard: Branching (5/7)
Insert a single no-op (bubble)
[Figure: pipeline diagram of add, beq, lw with one bubble inserted after beq; instruction order vs. time (clock cycles); stages I$, Reg, D$, Reg]
Impact: 2 clock cycles per branch instruction: slow

Control Hazard: Branching (6/7)
Optimization #2: Redefine branches
Old definition: if we take the branch, none of the instructions after the branch get executed by accident.
New definition: whether or not we take the branch, the single instruction immediately following the branch gets executed (called the branch-delay slot).
The term "Delayed Branch" means we always execute the instruction after the branch.

Control Hazard: Branching (7/7)
Notes on Branch-Delay Slot
Worst-Case Scenario: can always put a no-op in the branch-delay slot.
Better Case: can find an instruction preceding the branch which can be placed in the branch-delay slot without affecting flow of the program.
- re-ordering instructions is a common method of speeding up programs
- compiler must be very smart in order to find instructions to do this
- usually can find such an instruction at least 50% of the time
- Jumps also have a delay slot
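A simplified Python sketch of how a compiler might pick a filler for the branch-delay slot from the instructions before the branch. It assumes a straight-line block ending in beq/bne, register-only operands and no memory operations; the helper names are hypothetical. A candidate is safe if the branch does not read its result and no instruction between it and the branch reads or overwrites its destination or overwrites one of its sources.

```python
from typing import NamedTuple, Optional


class Inst(NamedTuple):
    op: str
    dest: Optional[str]          # None for branches
    srcs: tuple[str, ...]


def parse(line: str) -> Inst:
    """Hypothetical parser for the register-only forms used in the example."""
    op, rest = line.split(maxsplit=1)
    operands = [x.strip() for x in rest.split(",")]
    if op in ("beq", "bne"):
        return Inst(op, None, tuple(operands[:2]))    # branch label is ignored
    return Inst(op, operands[0], tuple(operands[1:]))


def find_delay_slot_filler(block: list[str]) -> Optional[int]:
    """Index of an instruction before the final branch that can move into its delay slot."""
    insts = [parse(line) for line in block]
    branch = insts[-1]
    for c in range(len(insts) - 2, -1, -1):           # nearest candidates first
        cand = insts[c]
        if cand.dest is None or cand.dest in branch.srcs:
            continue                                   # the branch needs this result first
        between = insts[c + 1:-1]
        safe = all(
            cand.dest not in m.srcs                    # no later reader (RAW)
            and cand.dest != m.dest                    # no later writer (WAW)
            and m.dest not in cand.srcs                # no later overwrite of an input (WAR)
            for m in between
        )
        if safe:
            return c
    return None


if __name__ == "__main__":
    block = ["or $8,$9,$10", "add $1,$2,$3", "sub $4,$5,$6", "beq $1,$4,Exit"]
    print(block[find_delay_slot_filler(block)])  # or $8,$9,$10
```

On the block from the example below, add and sub are rejected because beq reads $1 and $4, so the sketch picks the or, which is the instruction the Delayed Branch version moves into the slot.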

Example: Nondelayed vs. Delayed Branch
Nondelayed Branch:
      or  $8, $9, $10
      add $1, $2, $3
      sub $4, $5, $6
      beq $1, $4, Exit
      xor $10, $1, $11
Exit:
Delayed Branch:
      add $1, $2, $3
      sub $4, $5, $6
      beq $1, $4, Exit
      or  $8, $9, $10
      xor $10, $1, $11
Exit:
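To check that the two versions above compute the same thing, here is a toy Python interpreter for just add, sub, or, xor and beq. It is a hypothetical sketch, not a real MIPS simulator: with delayed=True, a taken branch still executes the next instruction before jumping.

```python
ALU_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}


def run(program: list[str], regs: dict[str, int], delayed: bool) -> dict[str, int]:
    """Execute a tiny MIPS subset and return the final register values."""
    labels, code = {}, []
    for line in program:
        if line.endswith(":"):
            labels[line[:-1]] = len(code)   # label points at the next instruction
        else:
            code.append(line)

    regs = dict(regs)                       # don't mutate the caller's registers
    pc, pending_target = 0, None            # taken delayed branch, applied after the slot
    while pc < len(code):
        op, rest = code[pc].split(maxsplit=1)
        args = [a.strip() for a in rest.split(",")]
        jump_now, pending_target = pending_target, None
        next_pc = pc + 1
        if op == "beq":
            if regs[args[0]] == regs[args[1]]:
                if delayed:
                    pending_target = labels[args[2]]  # run the delay slot first
                else:
                    next_pc = labels[args[2]]
        else:
            regs[args[0]] = ALU_OPS[op](regs[args[1]], regs[args[2]])
        pc = jump_now if jump_now is not None else next_pc
    return regs


NONDELAYED = ["or $8,$9,$10", "add $1,$2,$3", "sub $4,$5,$6",
              "beq $1,$4,Exit", "xor $10,$1,$11", "Exit:"]
DELAYED = ["add $1,$2,$3", "sub $4,$5,$6", "beq $1,$4,Exit",
           "or $8,$9,$10", "xor $10,$1,$11", "Exit:"]

if __name__ == "__main__":
    start = {f"${i}": i for i in range(12)}   # $i holds i (arbitrary test values)
    taken = {**start, "$5": 7, "$6": 2}       # makes $1 == $4, so beq is taken
    for regs in (start, taken):
        assert run(NONDELAYED, regs, delayed=False) == run(DELAYED, regs, delayed=True)
    print("same results")
```

Running the Delayed Branch code with the old, nondelayed semantics would skip the or when the branch is taken, which is exactly why the redefinition matters.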

Data Hazards (1/2)
Consider the following sequence of instructions:
      add $t0, $t1, $t2
      sub $t4, $t0, $t3
      and $t5, $t0, $t6
      or  $t7, $t0, $t8
      xor $t9, $t0, $t10

Data Hazards (2/2)
Dependencies backwards in time are hazards
[Figure: pipeline diagram of add $t0,$t1,$t2; sub $t4,$t0,$t3; and $t5,$t0,$t6; or $t7,$t0,$t8; xor $t9,$t0,$t10; instruction order vs. time (clock cycles); stages IF, ID/RF, EX, MEM, WB (I$, Reg, D$, Reg)]
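A small Python sketch that flags the "backwards in time" dependences in the sequence above. It assumes the 5-stage pipeline with no forwarding, register reads in ID/RF, write-back in WB, and a register file that writes in the first half of a cycle and reads in the second. Under those assumptions a consumer one or two instructions after its producer is a hazard; three or more is not.

```python
def parse(line: str) -> tuple[str, list[str]]:
    """Split an R-format instruction into (destination, sources)."""
    _, rest = line.split(maxsplit=1)
    regs = [r.strip() for r in rest.split(",")]
    return regs[0], regs[1:]


def backward_dependences(program: list[str]) -> list[tuple[int, int, str]]:
    """(producer index, consumer index, register) for each RAW hazard without forwarding."""
    parsed = [parse(line) for line in program]
    hazards = []
    for j, (_, srcs) in enumerate(parsed):
        for reg in srcs:
            for i in (j - 1, j - 2):           # only the two previous instructions matter
                if i >= 0 and parsed[i][0] == reg:
                    hazards.append((i, j, reg))
                    break                       # the nearest writer is the one we need
    return hazards


if __name__ == "__main__":
    seq = ["add $t0, $t1, $t2", "sub $t4, $t0, $t3", "and $t5, $t0, $t6",
           "or $t7, $t0, $t8", "xor $t9, $t0, $t10"]
    for i, j, reg in backward_dependences(seq):
        print(f"{seq[j]!r} needs {reg} from {seq[i]!r}")
```

Only sub and and are reported; or reads $t0 in the same cycle add writes it back, and xor reads it later still.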

Data Hazard Solution: Forwarding
Forward result from one stage to another
[Figure: pipeline diagram of add $t0,$t1,$t2; sub $t4,$t0,$t3; and $t5,$t0,$t6; or $t7,$t0,$t8; xor $t9,$t0,$t10 with results forwarded between stages; stages IF, ID/RF, EX, MEM, WB (I$, Reg, D$, Reg)]
"or" hazard solved by register hardware
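A minimal Python sketch of where the consumer's EX stage could get an ALU result produced d instructions earlier. It assumes forwarding paths from the EX/MEM and MEM/WB pipeline registers (the usual names in the 5-stage design; this slide does not name them) and the write-then-read register file that handles the "or" case. Loads are not covered; they are treated on the following slides.

```python
def forward_source(distance: int) -> str:
    """Where an ALU operand comes from when its producer is `distance` instructions earlier."""
    if distance < 1:
        raise ValueError("producer must come before the consumer")
    if distance == 1:
        return "EX/MEM"          # producer just left EX; result sits in EX/MEM
    if distance == 2:
        return "MEM/WB"          # producer just left MEM; result sits in MEM/WB
    return "register file"       # producer writes back no later than the consumer's register read


if __name__ == "__main__":
    consumers = ["sub $t4,$t0,$t3", "and $t5,$t0,$t6", "or $t7,$t0,$t8", "xor $t9,$t0,$t10"]
    for d, inst in enumerate(consumers, start=1):   # each reads $t0 from add, d instructions back
        print(f"{inst}: $t0 from {forward_source(d)}")
```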

Data Hazard: Loads (1/4)
Dependencies backwards in time are hazards
[Figure: pipeline diagram of lw $t0,0($t1) followed by sub $t3,$t0,$t2; stages IF, ID/RF, EX, MEM, WB]
Can't solve with forwarding.
Must stall instruction dependent on load, then forward (more hardware).

Data Hazard: Loads (2/4)
Hardware must stall pipeline
Called "interlock"
[Figure: pipeline diagram of lw $t0, 0($t1); sub $t3,$t0,$t2; and $t5,$t0,$t4; or $t7,$t0,$t6 with bubbles inserted after the load; stages IF, ID/RF, EX, MEM, WB (I$, Reg, D$, Reg)]

Data Hazard: Loads (3/4)
Instruction slot after a load is called the "load delay slot".
If that instruction uses the result of the load, then the hardware interlock will stall it for one cycle.
If the compiler puts an unrelated instruction in that slot, then no stall.
Letting the hardware stall the instruction in the delay slot is equivalent to putting a nop in the slot (except the latter uses more code space).
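A minimal Python sketch of the interlock's cost, assuming full forwarding so that the only stall is a load-use in the load delay slot (one cycle each). The register extraction is a simplified, hypothetical helper that only understands the instruction forms on these slides.

```python
import re


def registers(line: str) -> list[str]:
    """All register names in an instruction, in operand order."""
    return re.findall(r"\$\w+", line)


def sources(line: str) -> list[str]:
    """Registers an instruction reads (sw/beq/bne read all; others skip the destination)."""
    op = line.split()[0]
    regs = registers(line)
    return regs if op in ("sw", "beq", "bne") else regs[1:]


def load_use_stalls(program: list[str]) -> int:
    """One stall per instruction that reads a load's result in the load delay slot."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        if prev.split()[0] == "lw" and registers(prev)[0] in sources(curr):
            stalls += 1
    return stalls


if __name__ == "__main__":
    naive = ["lw $t0, 0($t1)", "sub $t3,$t0,$t2", "and $t5,$t1,$t4"]
    scheduled = ["lw $t0, 0($t1)", "and $t5,$t1,$t4", "sub $t3,$t0,$t2"]  # unrelated inst fills slot
    print(load_use_stalls(naive), load_use_stalls(scheduled))   # 1 0
```

The second ordering is the compiler's job described above: the same three instructions, but the unrelated and sits in the load delay slot, so the interlock never fires.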

Data Hazard: Loads (4/4)
Stall is equivalent to nop
[Figure: pipeline diagram of lw $t0, 0($t1); nop (a row of bubbles); sub $t3,$t0,$t2; and $t5,$t0,$t4; or $t7,$t0,$t6; stages I$, Reg, D$]

Historical Trivia
First MIPS design did not interlock and stall on load-use data hazard.
Real reason for name behind MIPS: Microprocessor without Interlocked Pipeline Stages
Word play on acronym for Millions of Instructions Per Second, also called MIPS

Administrivia
No lab this week (Wed, Thu or Fri) due to Veterans Day holiday on Thursday.
The lab is posted as a take-home lab; show your TA your results in the following lab.
Grade "freezing" update: through HW4. You have until next Wed to request regrades on HW3, HW4 & P1.
Back to 61C... Advanced Pipelining!
Out-of-order Execution
Superscalar Execution

Review Pipeline Hazard: Stall is dependency
[Figure: laundry timeline from 6 PM to 2 AM; tasks A to F in task order, 30-minute stages, with a bubble]
A depends on D; stall since folder tied up

Out-of-Order Laundry: Don't Wait
[Figure: laundry timeline from 6 PM to 2 AM; tasks A to F in task order, 30-minute stages, with a bubble]
A depends on D; rest continue; need more resources to allow out-of-order

Superscalar Laundry: Parallel per stage
[Figure: laundry timeline from 6 PM to 2 AM; tasks A to F processed in parallel per stage, 30-minute stages; loads of light, dark, and very dirty clothing]
More resources, HW to match mix of parallel tasks?

Superscalar Laundry: Mismatch Mix
[Figure: laundry timeline from 6 PM to 2 AM; tasks A to D, a mix of light and dark clothing loads, 30-minute stages]
Task mix underutilizes extra resources

Peer Instruction
Assume 1 inst/clock, delayed branch, 5-stage pipeline, forwarding, interlock on unresolved load hazards (after 10^3 loops, so pipeline full).
Loop: lw    $t0, 0($s1)
      addu  $t0, $t0, $s2
      sw    $t0, 0($s1)
      addiu $s1, $s1, -4
      bne   $s1, $zero, Loop
      nop
How many pipeline stages (clock cycles) per loop iteration to execute this code?
1  2  3  4  5  6  7  8  9  10

Peer Instruction Answer
Assume 1 inst/clock, delayed branch, 5-stage pipeline, forwarding, interlock on unresolved load hazards. 10^3 iterations, so pipeline full.
Loop: 1. lw    $t0, 0($s1)
      2. (data hazard, so stall)
      3. addu  $t0, $t0, $s2
      4. sw    $t0, 0($s1)
      5. addiu $s1, $s1, -4
      6. bne   $s1, $zero, Loop
      7. nop   (delayed branch, so execute nop)
How many pipeline stages (clock cycles) per loop iteration to execute this code?
Answer: 7 (six instructions plus one load-use stall).
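The count can be reproduced with a short Python sketch that lays out one steady-state iteration cycle by cycle. It assumes, as the question does, forwarding everywhere except a load-use in the load delay slot, and a nop already sitting in the branch-delay slot; the register helper is a simplified, hypothetical parser.

```python
import re

LOOP = [
    "lw $t0, 0($s1)",
    "addu $t0, $t0, $s2",
    "sw $t0, 0($s1)",
    "addiu $s1, $s1, -4",
    "bne $s1, $zero, Loop",
    "nop",                      # branch-delay slot
]


def registers(line: str) -> list[str]:
    return re.findall(r"\$\w+", line)


def issue_slots(body: list[str]) -> list[str]:
    """One entry per clock cycle; 'stall' marks a load-use interlock bubble.

    In steady state the instruction before lw is the previous iteration's nop,
    so the first instruction never needs a stall.
    """
    slots = []
    for k, line in enumerate(body):
        prev = body[k - 1] if k > 0 else None
        if prev is not None and prev.split()[0] == "lw":
            op = line.split()[0]
            reads = registers(line) if op in ("sw", "beq", "bne") else registers(line)[1:]
            if registers(prev)[0] in reads:
                slots.append("stall")
        slots.append(line)
    return slots


if __name__ == "__main__":
    for cycle, slot in enumerate(issue_slots(LOOP), start=1):
        print(cycle, slot)
    print("cycles per iteration:", len(issue_slots(LOOP)))
```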

And in Conclusion...
Pipeline challenge is hazards.
Forwarding helps with many data hazards.
Delayed branch helps with control hazard in the 5-stage pipeline.
More aggressive performance: Superscalar, Out-of-order execution.