
CSE 30321 Lecture 12: Introduction to Pipelining

Suggested Readings
- H&P: Chapter 4.5-4.7 (over the next 3-4 lectures)

[Figure: course topic map covering processor components, multicore processors and programming, processor comparison, writing more efficient code, the right HW for the right application, and HLL code translation]
Goal: Describe the fundamental components required in a single core of a modern microprocessor as well as how they interact with each other, with main memory, and with external storage media.

Example: We have to build x cars...
...each car takes 6 steps to build:
- Build the frame (~1 hour)
- Build the body (~1.25 hours)
- Install interior (~1.25 hours)
- Put on axles, wheels (~1 hour)
- Paint (~1.5 hours)
- Roll out (~1 hour)

Sequential Car Building... (a lot like multi-cycle)
- Build the frame (~1 hour)
- Build the body (~1.25 hours)
- Install interior (~1.25 hours)
- Put on axles, wheels (~1 hour)
- Paint (~1.5 hours)
- Roll out (~1 hour)
- Total time: 7 hours (~1 hour/stage)

Pipelined Car Building...
- 1 car done ~every 1.5 hours (like multi-cycle, limited by time of the longest stage)

Pipelining Lessons (laundry example)
[Figure: laundry timing diagram; loads A, B, C, D in task order, time axis from 6 PM to 9 PM, segments of 30, 40, 40, 40, 40, 20 minutes]
- Multiple tasks operating simultaneously
- Pipelining doesn't help latency of single task, it helps throughput of entire workload
- Pipeline rate limited by slowest pipeline stage
- Potential speedup = number of pipe stages
- Unbalanced lengths of pipe stages reduce speedup
- Also, need time to fill and drain the pipeline

Pipelining: Some terms
- If you're doing laundry or implementing a µP, each stage where something is done is called a pipe stage
- In laundry example, washer, dryer, and folding table are pipe stages; clothes enter at one end, exit other
- In a µP, instructions enter at one end and have been executed when they leave
- Throughput is how often stuff comes out of a pipeline

On the board (A)
- The math behind pipelining
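The car-building numbers above can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function names are made up for illustration, and the pipelined model assumes every stage is stretched to the slowest stage's time (1.5 hours), which is how the "1 car done ~every 1.5 hours" figure arises.

```python
# Stage durations in hours, taken from the car-building example.
STAGE_HOURS = [1.0, 1.25, 1.25, 1.0, 1.5, 1.0]


def sequential_hours(num_cars, stages=STAGE_HOURS):
    """Finish each car completely before starting the next one."""
    return num_cars * sum(stages)


def pipelined_hours(num_cars, stages=STAGE_HOURS):
    """Assumption: all stages advance in lockstep at the slowest stage's pace."""
    beat = max(stages)  # 1.5 hours, set by the paint stage
    # The first car needs len(stages) beats; each later car adds one beat.
    return (len(stages) + num_cars - 1) * beat


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, sequential_hours(n), pipelined_hours(n))
```

Under this lockstep assumption a single car actually takes longer in the pipeline (6 x 1.5 = 9 hours versus 7), which is the "pipelining doesn't help latency" lesson; the benefit only shows up as the number of cars grows.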

Recap: Pipeline Math
- If times for all S stages are equal to T:
  - Time for one initiation to complete is still ST
  - Time between 2 initiations = T, not ST
  - Initiations per second = 1/T
- Time for N initiations to complete: NT + (S-1)T
- Throughput: time per initiation = T + (S-1)T/N, which approaches T as N grows
- Pipelining: overlap multiple executions of same sequence
  - Improves THROUGHPUT, not the time to perform a single operation

Recap: How much (ideal) speedup?
[Figure: the same logic shown unpipelined and pipelined, with latches between stages]
- Unpipelined:
  - delay for 1 piece of data = 4τ + latch setup (assume small)
  - approximate delay for 1000 pieces of data = 4000τ
- Pipelined:
  - delay for 1 piece of data = 4(τ + latch setup)
  - approximate delay for 1000 pieces of data = 3τ + 1000τ
- Speedup for 1000 pieces of data = 4000/1003 ≈ 4
- Ideal speedup = # of pipeline stages

More technical detail
- Book's approach to drawing pipeline timing diagrams:
  - Time runs left-to-right, in units of stage time
  - Each row below corresponds to a distinct initiation
  - Boundary b/t 2 column entries: pipeline register (i.e. hamper)
  - Look at columns to see what stage is doing what

Time      0       1       2       3       4       5       6
Init. 1   Wash 1  Dry 1   Fold 1  Pack 1
Init. 2           Wash 2  Dry 2   Fold 2  Pack 2
Init. 3                   Wash 3  Dry 3   Fold 3  Pack 3
Init. 4                           Wash 4  Dry 4   Fold 4  Pack 4
Init. 5                                   Wash 5  Dry 5   Fold 5
Init. 6                                           Wash 6  Dry 6

The new look dataflow
- Note: some extra HW needed
- Data must be stored from one stage to the next in pipeline registers/latches
- These hold temporary values between clocks and the info needed for execution
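The pipeline math above translates directly into code. The sketch below is illustrative only (the function names are not from the lecture) and assumes S equal-length stages of time T with negligible latch overhead, as in the recap.

```python
def pipeline_time(n, stages, t):
    """Time for n initiations to complete: N*T + (S-1)*T."""
    return n * t + (stages - 1) * t


def time_per_initiation(n, stages, t):
    """Average time per initiation: T + (S-1)*T/N, approaching T for large N."""
    return pipeline_time(n, stages, t) / n


def ideal_speedup(n, stages, t):
    """Unpipelined time (N*S*T) divided by pipelined time."""
    return (n * stages * t) / pipeline_time(n, stages, t)


if __name__ == "__main__":
    # The 4-stage, 1000-item case from the speedup recap.
    print(pipeline_time(1000, 4, 1))
    print(ideal_speedup(1000, 4, 1))
```

With 4 stages and 1000 items the pipelined time is 1003 stage-times, so the speedup is 4000/1003, just under the ideal value of 4; the gap is the fill/drain term (S-1)T.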

Another way to look at it

Clock number  1     2     3     4     5     6     7     8
Inst. i       IF    ID    EX    MEM   WB
Inst. i+1           IF    ID    EX    MEM   WB
Inst. i+2                 IF    ID    EX    MEM   WB
Inst. i+3                       IF    ID    EX    MEM   WB

[Figure: datapath view of the same pipeline (IM, Reg, ...); program execution order (in instructions) runs top to bottom, time runs left to right]

So, what about the details?
- In each cycle, a new instruction is fetched and begins its 5-cycle execution
- In a perfect world (pipeline), performance improves 5 times over!
- Now, let's talk about overhead (i.e. what else do we have to worry about?)
  - Must know what's going on in every cycle of the machine
  - What if 2 instructions need the same resource at the same time? (LOTS more on this later)
  - Separate instruction/data memories, multiple register ports, etc. help avoid this

Limits, limits, limits
- So, now that the ideal stuff is out of the way, let's look at how a pipeline REALLY works
- Pipelines are slowed b/c of:
  - Pipeline latency
  - Imbalance of pipeline stages
    - (Think: a chain is only as strong as its weakest link)
    - Well, a pipeline is only as fast as its slowest stage
  - Pipeline overhead (from where?)
    - Register delay from pipe stage latches

Let's look at some examples:
- Specifically:
  - (1 instruction sequence with a problem)
  - (2 instruction sequence)
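The clock-number table above follows a simple rule: instruction k (counting from 0) occupies stage s during cycle k + s + 1. The sketch below regenerates the table for an ideal, stall-free 5-stage pipeline; the helper names are hypothetical and chosen only for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(num_instructions, stages=STAGES):
    """Return one row per instruction; each row lists the stage active in each cycle."""
    total_cycles = num_instructions + len(stages) - 1
    rows = []
    for k in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            row[k + s] = name  # instruction k enters stage s in cycle k + s + 1
        rows.append(row)
    return rows


def format_diagram(rows):
    """Render the rows as fixed-width text, one line per instruction."""
    lines = []
    for k, row in enumerate(rows):
        cells = "".join(cell.ljust(6) for cell in row).rstrip()
        lines.append(f"Inst. i+{k}".ljust(12) + cells)
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_diagram(pipeline_diagram(4)))
```

Four instructions need 4 + 5 - 1 = 8 cycles, matching the eight clock columns in the table.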

Executing Instructions in Pipelined Datapath
- Following charts describe 3 scenarios:
  - Processing of load word (lw) instruction
    - Bug included in design (make SURE you understand the bug)
  - Processing of lw
    - Bug corrected (make SURE you understand the fix)
  - Processing of lw followed in pipeline by sub
    - (Sets the stage for discussion of HAZARDS and inter-instruction dependencies)

[Figures: Load word, cycles 1, 2, and 3]

[Figures: Load word, cycles 4 and 5; Load word with the bug fixed]

A 2 instruction sequence
- Examine multiple-cycle & single-cycle diagrams for a sequence of 2 independent instructions (i.e. no common registers b/t them):
  - lw  $10, 9($1)
  - sub $11, $2, $3

[Figures: single-cycle diagrams, cycles 1, 2, 3, and 4]

[Figures: single-cycle diagrams, cycles 5 and 6]

What about control signals?

Questions about control signals
- Following discussion relevant to a single instruction
- Q: Are all control signals active at the same time?
- A: ?
- Q: Can we generate all these signals at the same time?
- A: ?

Passing control w/pipe registers
- Analogy: send instruction with car on assembly line
  - "Install Corinthian leather interior on car 6 @ stage 3"

Pipelined datapath w/control signals
[Figure: pipelined datapath with control signals]

On the board (B): Hazards
- Let's look at hazards...
- ...and how they (generally) impact performance.

The hazards of pipelining
- Pipeline hazards prevent the next instruction from executing during its designated clock cycle
- There are 3 classes of hazards:
  - Structural hazards:
    - Arise from resource conflicts
    - HW cannot support all possible combinations of instructions
  - Data hazards:
    - Occur when a given instruction depends on data from an instruction ahead of it in the pipeline
  - Control hazards:
    - Result from branches and other instructions that change the flow of the program (i.e. change the PC)

How do we deal with hazards?
- Often, the pipeline must be stalled
- Stalling the pipeline usually lets some instruction(s) in the pipeline proceed while another/others wait for data, a resource, etc.
- A note on terminology:
  - If we say an instruction was issued later than instruction x, we mean that it was issued after instruction x and is not as far along in the pipeline
  - If we say an instruction was issued earlier than instruction x, we mean that it was issued before instruction x and is further along in the pipeline

Stalls and performance
- Stalls impede the progress of a pipeline and result in deviation from 1 instruction executing per clock cycle
- Pipelining can be viewed as a way to:
  - Decrease CPI or clock cycle time per instruction
  - Let's see what effect stalls have on CPI
- CPI pipelined
  = Ideal CPI + Pipeline stall cycles per instruction
  = 1 + Pipeline stall cycles per instruction
- Ignoring overhead and assuming stages are balanced:
  - If no stalls, speedup is equal to the # of pipeline stages in the ideal case

Structural hazards
- 1 way to avoid structural hazards is to duplicate resources
  - e.g. an ALU to perform an arithmetic operation and an adder to increment the PC
- If not all possible combinations of instructions can be executed, structural hazards occur
- Most common instances of structural hazards:
  - When a functional unit is not fully pipelined
  - When some resource is not duplicated enough
- Pipelines stall as a result of hazards, and CPI increases from the usual 1
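The CPI relation above, combined with the remark that speedup equals the number of stages when there are no stalls, is consistent with the usual textbook form speedup = pipeline depth / (1 + stall cycles per instruction). The sketch below assumes that form (balanced stages, no latch overhead, ideal CPI of 1); it is a hedged illustration, and the function names are not from the lecture.

```python
def pipelined_cpi(stall_cycles_per_instruction, ideal_cpi=1.0):
    """CPI pipelined = ideal CPI + pipeline stall cycles per instruction."""
    return ideal_cpi + stall_cycles_per_instruction


def speedup_over_unpipelined(depth, stall_cycles_per_instruction):
    """Assumed textbook form: depth / (1 + stall cycles per instruction)."""
    return depth / pipelined_cpi(stall_cycles_per_instruction)


if __name__ == "__main__":
    for stalls in (0.0, 0.25, 1.0):
        print(stalls, speedup_over_unpipelined(5, stalls))
```

For a 5-stage pipeline, zero stalls gives the ideal speedup of 5, while an average of one stall cycle per instruction halves it.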

An example of a structural hazard
[Figure: pipeline diagram over time of a Load followed by Instructions 1, 2, and 3]
- What's the problem here?

How is it resolved?
[Figure: the same sequence (Load, Instructions 1 and 2) with a stall inserted as a row of bubbles before Instruction 3]
- Pipeline generally stalled by inserting a bubble or NOP

Or alternatively

Inst. #     1     2     3     4     5     6     7     8     9     10
LOAD        IF    ID    EX    MEM   WB
Inst. i+1         IF    ID    EX    MEM   WB
Inst. i+2               IF    ID    EX    MEM   WB
Inst. i+3                     stall IF    ID    EX    MEM   WB
Inst. i+4                                 IF    ID    EX    MEM   WB
Inst. i+5                                       IF    ID    EX    MEM
Inst. i+6                                             IF    ID    EX

- LOAD instruction steals an instruction fetch cycle, which causes the pipeline to stall
- Thus, no instruction completes on clock cycle 8

A simple example
- The facts:
  - Data references constitute 40% of an instruction mix
  - Ideal CPI of the pipelined machine is 1
  - A machine with a structural hazard has a clock rate that's 1.05 times higher than a machine without the hazard
- How much does this LOAD problem hurt us?
- Recall: Avg. Inst. Time = CPI x Clock Cycle Time
  = (1 + 0.4 x 1) x (Clock cycle time ideal / 1.05)
  ≈ 1.33 x Clock cycle time ideal
- Therefore the machine without the hazard is better
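The arithmetic in the simple example can be reproduced as follows. This is a small sketch under the example's stated facts (40% data references, one stall cycle per data reference, a 1.05x clock-rate advantage for the machine with the hazard); the variable names are illustrative, and the ideal cycle time is normalized to 1.

```python
def avg_instruction_time(cpi, cycle_time):
    """Average instruction time = CPI x clock cycle time."""
    return cpi * cycle_time


IDEAL_CYCLE = 1.0        # normalized cycle time of the hazard-free machine
LOAD_FRACTION = 0.4      # data references as a fraction of the instruction mix
STALL_PER_LOAD = 1       # stall cycles caused by each data reference
CLOCK_RATE_GAIN = 1.05   # hazard machine's clock rate relative to the other

no_hazard = avg_instruction_time(1.0, IDEAL_CYCLE)
with_hazard = avg_instruction_time(
    1.0 + LOAD_FRACTION * STALL_PER_LOAD,
    IDEAL_CYCLE / CLOCK_RATE_GAIN,
)

if __name__ == "__main__":
    print(no_hazard, with_hazard)
```

The machine with the hazard spends about 1.33 ideal cycle times per instruction versus 1.0 for the machine without it, so the faster clock does not make up for the stalls.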

Remember the common case
- All things being equal, a machine without structural hazards will always have a lower CPI
- But, in some cases it may be better to allow them than to eliminate them
- These are situations a computer architect might have to consider:
  - Is pipelining functional units or duplicating them costly in terms of HW?
  - Does the structural hazard occur often?
  - What's the common case???

What's the realistic solution?
- Answer: add more hardware
  - As we'll see, CPI degrades quickly from our ideal "1" for even the simplest of cases
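One way to make the "common case" question concrete is to ask how rare the hazard must be before tolerating it pays off. The sketch below is an illustration under simple assumptions, not something stated in the slides: ideal CPI is 1, each hazard costs a fixed number of stall cycles, and the only benefit of keeping the hazard is a faster clock. The function names are hypothetical.

```python
def break_even_clock_gain(stall_fraction, stall_cycles=1):
    """Clock-rate ratio at which the hazard machine merely ties the hazard-free one."""
    return 1.0 + stall_fraction * stall_cycles


def hazard_machine_wins(stall_fraction, clock_rate_gain, stall_cycles=1):
    """True if average instruction time is lower on the machine that keeps the hazard."""
    cpi_with_hazard = 1.0 + stall_fraction * stall_cycles
    # Hazard-free machine: CPI 1 at a normalized cycle time of 1.
    return cpi_with_hazard / clock_rate_gain < 1.0


if __name__ == "__main__":
    print(hazard_machine_wins(0.40, 1.05))  # hazard on 40% of instructions
    print(hazard_machine_wins(0.02, 1.05))  # hazard on only 2% of instructions
```

With the hazard hitting 40% of instructions, a 1.05x clock advantage is far from enough (the break-even ratio is about 1.4); if it hit only 2% of instructions, the same clock advantage would make allowing the hazard the better choice.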