CMSC 611: Advanced Computer Architecture

Pipelining

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides.
Some material adapted from Hennessy & Patterson / 2003 Elsevier Science.

Pipeline Hazards
Cases that affect instruction execution semantics and thus need to be detected and corrected.
Hazard types:
- Structural hazard: an attempt to use a resource in two different ways at the same time (e.g., a single memory for instructions and data).
- Data hazard: an attempt to use an item before it is ready (an instruction depends on the result of a prior instruction still in the pipeline).
- Control hazard: an attempt to make a decision before the condition is evaluated (branch instructions).
Hazards can always be resolved by waiting.

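Detecting and Resolving Structural Hazard
[Figure: pipeline diagram over clock cycles 1 to 7, instruction order Load, Inst 1, Inst 2, Stall, Inst 3; the stall travels down the pipeline as a bubble.]
With a single memory for instructions and data, the Load's data access in its MEM stage (cycle 4) would collide with Inst 3's instruction fetch in the same cycle, so Inst 3 is stalled for one cycle.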
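The stall on this slide can be reproduced with a small scheduling sketch. It is a simplified, hypothetical model rather than the lecture's hardware: a 5-stage pipeline with one shared memory port, one fetch per cycle when possible, a load or store touching memory three cycles after its own fetch, and no other hazards. The function name `schedule_fetches` is an illustrative assumption.

```python
# Hypothetical model: 5-stage pipeline (IF, ID, EX, MEM, WB) with ONE memory
# shared by instruction fetch (IF) and data access (MEM). No other hazards.

MEM_OFFSET = 3  # cycles from an instruction's IF to its MEM stage


def schedule_fetches(uses_data_memory):
    """Return the 1-based IF cycle of each instruction.

    uses_data_memory[i] is True if instruction i is a load or store.
    A fetch is delayed while an earlier instruction's MEM-stage access
    occupies the single memory port in that cycle (a structural hazard).
    """
    busy = set()          # cycles in which a data access holds the memory
    fetch_cycles = []
    cycle = 1
    for is_mem_op in uses_data_memory:
        while cycle in busy:  # memory port taken: stall the fetch
            cycle += 1
        fetch_cycles.append(cycle)
        if is_mem_op:
            busy.add(cycle + MEM_OFFSET)
        cycle += 1
    return fetch_cycles


if __name__ == "__main__":
    # Load, Inst 1, Inst 2, Inst 3 as on the slide
    print(schedule_fetches([True, False, False, False]))  # [1, 2, 3, 5]
```

The output places Inst 3's fetch in cycle 5 instead of 4, which is the one-cycle stall in the diagram.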

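Stalls & Pipeline Performance

$$\text{Pipelining speedup} = \frac{\text{Average instruction time unpipelined}}{\text{Average instruction time pipelined}} = \frac{\text{CPI unpipelined}}{\text{CPI pipelined}} \times \frac{\text{Clock cycle unpipelined}}{\text{Clock cycle pipelined}}$$

With an ideal pipelined CPI of 1:

$$\text{CPI pipelined} = \text{Ideal CPI} + \text{Pipeline stall cycles per instruction} = 1 + \text{Pipeline stall cycles per instruction}$$

$$\text{Speedup} = \frac{\text{CPI unpipelined}}{1 + \text{Pipeline stall cycles per instruction}} \times \frac{\text{Clock cycle unpipelined}}{\text{Clock cycle pipelined}}$$

Assuming all pipeline stages are balanced (and ignoring pipeline-register overhead), the product CPI unpipelined × Clock cycle unpipelined / Clock cycle pipelined equals the pipeline depth, so:

$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall cycles per instruction}}$$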
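The two speedup formulas translate directly into code. This is a minimal sketch; the function names and the sample numbers (a 5-stage pipeline averaging 0.25 stall cycles per instruction) are illustrative assumptions, not data from the lecture.

```python
def pipeline_speedup(cpi_unpipelined, clock_unpipelined, clock_pipelined,
                     stall_cycles_per_instruction):
    """General form: CPI_unpipelined / (1 + stalls) * clock_unpipelined / clock_pipelined."""
    cpi_pipelined = 1.0 + stall_cycles_per_instruction  # ideal CPI of 1 plus stalls
    return (cpi_unpipelined / cpi_pipelined) * (clock_unpipelined / clock_pipelined)


def balanced_speedup(pipeline_depth, stall_cycles_per_instruction):
    """Balanced-stage form: pipeline depth / (1 + stalls)."""
    return pipeline_depth / (1.0 + stall_cycles_per_instruction)


if __name__ == "__main__":
    # Hypothetical 5-stage pipeline averaging 0.25 stall cycles per instruction.
    print(balanced_speedup(5, 0.25))            # 4.0
    # Same case through the general form: an unpipelined design needing
    # 5 cycles per instruction at the same (balanced) clock period.
    print(pipeline_speedup(5, 1.0, 1.0, 0.25))  # 4.0
```

Both calls agree, which is the point of the balanced-stage simplification: the unpipelined CPI and clock ratio collapse into the pipeline depth.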

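Data Hazards
[Figure: pipeline diagram (stages IF, ID/RF, EX, MEM, WB over clock cycles) for the sequence below, in instruction order.]
    add r1,r2,r3
    sub r4,r1,r3
    and r6,r1,r7
    or  r8,r1,r9
    xor r10,r1,r11
Every instruction after add reads r1, but add does not write r1 back to the register file until its WB stage.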
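To see which instructions in this sequence actually read r1 too early, the sketch below replays the pipeline timing. It is a hypothetical model with no forwarding and no stalls, and it assumes (a common textbook convention, not stated on the slide) that the register file is written in the first half of a cycle and read in the second half. The name `raw_hazards` is illustrative.

```python
# Hypothetical timing model: instruction k (0-based) is fetched in cycle k + 1,
# reads its registers in ID at cycle k + 2, and writes its result in WB at
# cycle k + 5. Assumption: write in the first half of a cycle, read in the
# second half, so a same-cycle read sees the new value.


def raw_hazards(program):
    """Return (producer, consumer) index pairs where the consumer reads too early.

    program: list of (dest, sources) tuples of register names.
    """
    hazards = []
    for i, (dest, _) in enumerate(program):
        write_cycle = i + 5
        for j in range(i + 1, len(program)):
            later_dest, sources = program[j]
            if dest in sources and j + 2 < write_cycle:
                hazards.append((i, j))
            if later_dest == dest:
                break  # dest redefined: later readers depend on instruction j
    return hazards


program = [
    ("r1", ("r2", "r3")),    # add r1,r2,r3
    ("r4", ("r1", "r3")),    # sub r4,r1,r3
    ("r6", ("r1", "r7")),    # and r6,r1,r7
    ("r8", ("r1", "r9")),    # or  r8,r1,r9
    ("r10", ("r1", "r11")),  # xor r10,r1,r11
]

if __name__ == "__main__":
    print(raw_hazards(program))  # [(0, 1), (0, 2)]
```

Under these assumptions only sub and and read a stale r1; or reads in the same cycle add writes, and xor reads afterwards.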

Three Generic Data Hazards
Read After Write (RAW): Instr J tries to read an operand before Instr I writes it.
    I: add r1,r2,r3
    J: sub r4,r1,r3
Caused by a "data dependence" (in compiler nomenclature). This hazard results from an actual need for communication.

Write After Read (WAR): Instr J writes an operand before Instr I reads it.
    I: sub r4,r1,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
Called an "anti-dependence" in compilers. This results from reuse of the name r1.
Can't happen in the MIPS 5-stage pipeline because all instructions take 5 stages, reads are always in stage 2, and writes are always in stage 5.

Write After Write (WAW): Instr J writes an operand before Instr I writes it.
    I: mul r1,r4,r3
    J: add r1,r2,r3
    K: sub r6,r1,r7
Called an "output dependence" in compilers. This also results from the reuse of the name r1.
Can't happen in the MIPS 5-stage pipeline because all instructions take 5 stages and writes are always in stage 5.
WAR and WAW hazards do appear in more complicated pipelines.
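The three categories depend only on which register names the two instructions read and write, so they can be classified mechanically. A minimal sketch, with the function name `classify_dependences` as an illustrative assumption; it reports name dependences only, and whether one becomes a hazard depends on the pipeline (per the slides, only RAW can in the MIPS 5-stage pipeline).

```python
def classify_dependences(earlier, later):
    """Return which of RAW, WAR, WAW link instruction J (later) to I (earlier).

    Each instruction is a (dest, sources) tuple of register names.
    """
    i_dest, i_sources = earlier
    j_dest, j_sources = later
    kinds = set()
    if i_dest in j_sources:
        kinds.add("RAW")  # J reads what I writes: true data dependence
    if j_dest in i_sources:
        kinds.add("WAR")  # J writes what I reads: anti-dependence
    if j_dest == i_dest:
        kinds.add("WAW")  # J and I write the same name: output dependence
    return kinds


if __name__ == "__main__":
    # I: add r1,r2,r3   J: sub r4,r1,r3
    print(classify_dependences(("r1", ("r2", "r3")), ("r4", ("r1", "r3"))))  # {'RAW'}
```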

Forwarding to Avoid Data Hazard
[Figure: pipeline diagram over clock cycles for the sequence below, in instruction order.]
    add r1,r2,r3
    sub r4,r1,r3
    and r6,r1,r7
    or  r8,r1,r9
    xor r10,r1,r11

HW Change for Forwarding
[Figure: datapath with labels NextPC, Registers, ID/EX, EX/MEM, Data Memory, MEM/WR, and Immediate, with muxes added to select the forwarded values.]
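The added muxes need selection logic. The sketch below is in the spirit of the usual textbook forwarding unit, not a transcription of this slide: the class and function names are hypothetical, each pipeline register is reduced to a write-enable flag and a destination register number, and the EX/MEM value takes priority because it is the most recent result.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineReg:
    """Hypothetical summary of a pipeline register: does it write a register, and which."""
    reg_write: bool
    rd: Optional[int] = None


def forward_source(src: int, ex_mem: PipelineReg, mem_wr: PipelineReg) -> str:
    """Pick the ALU-input mux setting for source register `src` of the instruction in EX."""
    if src == 0:
        return "ID/EX"   # r0 is hardwired to zero in MIPS, so never forward it
    if ex_mem.reg_write and ex_mem.rd == src:
        return "EX/MEM"  # result of the instruction one ahead (newest value)
    if mem_wr.reg_write and mem_wr.rd == src:
        return "MEM/WR"  # result of the instruction two ahead
    return "ID/EX"       # no pending write: the value read in ID is current


if __name__ == "__main__":
    # sub r4,r1,r3 in EX while add r1,r2,r3 sits in EX/MEM
    print(forward_source(1, PipelineReg(True, 1), PipelineReg(False)))  # EX/MEM
```

For the add/sub/and sequence, sub takes r1 from EX/MEM and, one cycle later, and takes it from MEM/WR.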

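Data Hazard Even with Forwarding
[Figure: pipeline diagram over clock cycles for the sequence below, in instruction order.]
    lw  r1,0(r2)
    sub r4,r1,r6
    and r6,r1,r7
    or  r8,r1,r9
The loaded value is available only at the end of lw's MEM stage, but sub needs it at the start of its EX stage in that same cycle, so forwarding alone cannot cover this dependence.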

Resolving Load Hazards
- Adding hardware? How? Where?
- Detection?
- Compilation techniques?
- What is the cost of load delays?

Resolving the Load Data Hazard
[Figure: pipeline diagram for the sequence below with bubbles inserted after lw, so that sub, and, and or each proceed one cycle later.]
    lw  r1,0(r2)
    sub r4,r1,r6
    and r6,r1,r7
    or  r8,r1,r9
How is this different from the instruction issue stall?
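The hardware fix on this slide is an interlock: detect a load followed immediately by a use of its destination, and hold the dependent instruction one cycle. The sketch below computes the resulting EX-stage cycles. It is a hypothetical model that assumes full forwarding, so the load-use case is the only stall; the function name `ex_cycles` is illustrative.

```python
# Hypothetical model: classic 5-stage pipeline (IF, ID, EX, MEM, WB) with full
# forwarding, so the only remaining data hazard is a load followed immediately
# by an instruction that uses the loaded register. The hazard detection logic
# then holds that instruction one cycle, inserting one bubble.


def ex_cycles(program):
    """Return the cycle in which each instruction is in EX (first IF is cycle 1).

    program: list of (opcode, dest, sources) tuples.
    """
    cycles = []
    ex = 2            # becomes 3 for the first instruction (IF=1, ID=2, EX=3)
    prev = None
    for op, dest, sources in program:
        ex += 1       # normally one instruction enters EX per cycle
        if prev is not None and prev[0] == "lw" and prev[1] in sources:
            ex += 1   # load-use hazard: one bubble; everything behind slips too
        cycles.append(ex)
        prev = (op, dest)
    return cycles


if __name__ == "__main__":
    seq = [
        ("lw", "r1", ("r2",)),         # lw  r1,0(r2)
        ("sub", "r4", ("r1", "r6")),   # sub r4,r1,r6
        ("and", "r6", ("r1", "r7")),   # and r6,r1,r7
        ("or", "r8", ("r1", "r9")),    # or  r8,r1,r9
    ]
    print(ex_cycles(seq))  # [3, 5, 6, 7]
```

Without the interlock the EX cycles would be 3, 4, 5, 6; the single bubble shifts sub and every later instruction by one cycle.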

Software Scheduling to Avoid Load Hazards
Try producing fast code for
    a = b + c;
    d = e - f;
assuming a, b, c, d, e, and f are in memory.

Slow code:
    LW   Rb,b
    LW   Rc,c
    ADD  Ra,Rb,Rc
    SW   a,Ra
    LW   Re,e
    LW   Rf,f
    SUB  Rd,Re,Rf
    SW   d,Rd

Fast code:
    LW   Rb,b
    LW   Rc,c
    LW   Re,e
    ADD  Ra,Rb,Rc
    LW   Rf,f
    SW   a,Ra
    SUB  Rd,Re,Rf
    SW   d,Rd
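The benefit of the reordering can be counted. This sketch is a hypothetical model, assuming a 5-stage pipeline with forwarding in which the only stall is one cycle when an instruction uses the register loaded by the instruction right before it, and total cycles = instructions + 4 fill cycles + stalls. The function names are illustrative.

```python
# Hypothetical model: 5-stage pipeline with forwarding; the only stall is one
# cycle when an instruction uses the register loaded by the instruction right
# before it. Total cycles = instructions + (DEPTH - 1) fill cycles + stalls.

DEPTH = 5


def load_use_stalls(code):
    """Count one-cycle load-use stalls in a list of (opcode, dest, sources)."""
    stalls = 0
    for (prev_op, prev_dest, _), (_, _, sources) in zip(code, code[1:]):
        if prev_op == "LW" and prev_dest in sources:
            stalls += 1
    return stalls


def total_cycles(code):
    return len(code) + (DEPTH - 1) + load_use_stalls(code)


# a = b + c; d = e - f;  (memory operands have no register sources here)
slow = [
    ("LW", "Rb", ()), ("LW", "Rc", ()),
    ("ADD", "Ra", ("Rb", "Rc")),   # uses Rc right after its load: stall
    ("SW", None, ("Ra",)),
    ("LW", "Re", ()), ("LW", "Rf", ()),
    ("SUB", "Rd", ("Re", "Rf")),   # uses Rf right after its load: stall
    ("SW", None, ("Rd",)),
]
fast = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()),
    ("ADD", "Ra", ("Rb", "Rc")),
    ("LW", "Rf", ()),
    ("SW", None, ("Ra",)),
    ("SUB", "Rd", ("Re", "Rf")),
    ("SW", None, ("Rd",)),
]

if __name__ == "__main__":
    print(load_use_stalls(slow), total_cycles(slow))  # 2 14
    print(load_use_stalls(fast), total_cycles(fast))  # 0 12
```

Both versions execute the same eight instructions; moving independent loads between each load and its use removes both stalls.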

Instruction Set Connection
What is exposed about this organizational hazard in the instruction set?
- k cycle delay? Bad: CPI is not part of the ISA.
- k instruction slot delay: a load should not be followed by a use of the value in the next k instructions.
- Nothing, but code can reduce run-time delays.
MIPS did the transformation in the assembler.
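With a k-instruction-slot delay, software must guarantee the rule, since the hardware does not interlock. The sketch below shows only the simplest fallback an assembler could apply, padding with NOPs; k = 1 corresponds to a single load delay slot as in early MIPS processors. The function name and tuple encoding are hypothetical, and a real assembler would first try to move independent instructions into the slots, as in the fast code above.

```python
NOP = ("NOP", None, ())


def fill_load_delay_slots(code, k=1):
    """Pad with NOPs so no instruction uses a loaded register within k slots of the load.

    code: list of (opcode, dest, sources) tuples. Returns a new list.
    """
    out = []
    for op, dest, sources in code:
        # Check the k most recently emitted slots, nearest first.
        for distance in range(1, min(k, len(out)) + 1):
            prev_op, prev_dest, _ = out[-distance]
            if prev_op == "LW" and prev_dest in sources:
                # Push this use to slot k + 1 after the load. Any older load it
                # uses is farther back, so this padding satisfies it as well.
                out.extend([NOP] * (k + 1 - distance))
                break
        out.append((op, dest, sources))
    return out


if __name__ == "__main__":
    seq = [
        ("LW", "r1", ("r2",)),         # lw  r1,0(r2)
        ("SUB", "r4", ("r1", "r6")),   # sub r4,r1,r6
        ("AND", "r6", ("r1", "r7")),   # and r6,r1,r7
        ("OR", "r8", ("r1", "r9")),    # or  r8,r1,r9
    ]
    print([op for op, _, _ in fill_load_delay_slots(seq)])
    # ['LW', 'NOP', 'SUB', 'AND', 'OR']
```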