CSE 2021: Computer Organization

CSE 2021: Computer Organization
Lecture 11. CPU Design: Pipelining-2 (Review, Hazards)
Shakil M. Khan

IF for Load (Review) CSE-2021 July-14-2011 2

ID for Load (Review)

EX for Load (Review)

MEM for Load (Review)

WB for Load (Review): wrong register number

Corrected Datapath for Load (Review)

Pipelined Control (Review)

Data Hazards in ALU Instructions
Consider this sequence:
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
We can resolve these hazards with forwarding. How do we detect when to forward?

Dependencies & Forwarding

Detecting the Need to Forward
Pass register numbers along the pipeline
    e.g., ID/EX.RegisterRs = register number for Rs sitting in the ID/EX pipeline register
ALU operand register numbers in the EX stage are given by ID/EX.RegisterRs, ID/EX.RegisterRt
Data hazards when
    1a. EX/MEM.RegisterRd = ID/EX.RegisterRs    (fwd from EX/MEM pipeline reg)
    1b. EX/MEM.RegisterRd = ID/EX.RegisterRt    (fwd from EX/MEM pipeline reg)
    2a. MEM/WB.RegisterRd = ID/EX.RegisterRs    (fwd from MEM/WB pipeline reg)
    2b. MEM/WB.RegisterRd = ID/EX.RegisterRt    (fwd from MEM/WB pipeline reg)

Detecting the Need to Forward
But only if the forwarding instruction will write to a register!
    EX/MEM.RegWrite, MEM/WB.RegWrite
And only if Rd for that instruction is not $zero
    EX/MEM.RegisterRd ≠ 0, MEM/WB.RegisterRd ≠ 0
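The four hazard cases and the two qualifying conditions (RegWrite asserted, Rd not $zero) can be checked together. The following is a minimal Python sketch, not the course's implementation; the PipeReg class and its field names are hypothetical stand-ins for the pipeline-register fields named on the slides.

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    """Hypothetical snapshot of the fields carried by one pipeline register."""
    reg_write: bool = False  # RegWrite control bit
    rd: int = 0              # destination register number
    rs: int = 0              # first source register number
    rt: int = 0              # second source register number


def data_hazards(id_ex, ex_mem, mem_wb):
    """Return the set of hazard labels (1a, 1b, 2a, 2b) that currently apply."""
    found = set()
    # Cases 1a/1b: the producer sits in EX/MEM, writes a register, and Rd != $zero.
    if ex_mem.reg_write and ex_mem.rd != 0:
        if ex_mem.rd == id_ex.rs:
            found.add("1a")
        if ex_mem.rd == id_ex.rt:
            found.add("1b")
    # Cases 2a/2b: the producer sits in MEM/WB, with the same two qualifiers.
    if mem_wb.reg_write and mem_wb.rd != 0:
        if mem_wb.rd == id_ex.rs:
            found.add("2a")
        if mem_wb.rd == id_ex.rt:
            found.add("2b")
    return found


# sub $2,$1,$3 is in EX/MEM while and $12,$2,$5 is in ID/EX: case 1a.
sub_then_and = data_hazards(PipeReg(rs=2, rt=5),
                            PipeReg(reg_write=True, rd=2),
                            PipeReg())
# One cycle later: and (Rd = 12) is in EX/MEM, sub is in MEM/WB,
# and or $13,$6,$2 is in ID/EX: case 2b.
sub_then_or = data_hazards(PipeReg(rs=6, rt=2),
                           PipeReg(reg_write=True, rd=12),
                           PipeReg(reg_write=True, rd=2))
```

A producer whose Rd is 0 never reports a hazard, because $zero always reads as 0 regardless of what was "written" to it.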

Forwarding Paths

Forwarding Conditions
EX hazard
    if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 10
    if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 10

Forwarding Conditions
MEM hazard
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 01
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 01

Double Data Hazard
Consider the sequence:
    add $1, $1, $2
    add $1, $1, $3
    add $1, $1, $4
Both hazards occur
    want to use the most recent result
Revise the MEM hazard condition
    only forward if the EX hazard condition isn't true

Revised Forwarding Condition
MEM hazard
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
                 and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 01
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
                 and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 01
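The EX-hazard and revised MEM-hazard conditions together define the select value for one forwarding mux. The sketch below is one way to express that priority in Python. The function name and argument names are hypothetical, and the fall-through value "00" (operand taken from the register file) is an assumption, since the slides only list the "10" and "01" cases.

```python
def forward_select(src, ex_mem_write, ex_mem_rd, mem_wb_write, mem_wb_rd):
    """Return the ForwardA/ForwardB select value for one ALU source register.

    src is ID/EX.RegisterRs (for ForwardA) or ID/EX.RegisterRt (for ForwardB).
    """
    ex_hazard = ex_mem_write and ex_mem_rd != 0 and ex_mem_rd == src
    mem_hazard = mem_wb_write and mem_wb_rd != 0 and mem_wb_rd == src
    if ex_hazard:
        return "10"  # forward the most recent result, from EX/MEM
    if mem_hazard:
        return "01"  # forward from MEM/WB only when there is no EX hazard
    return "00"      # assumed default: no forwarding, use the register file


# Double data hazard: the third add $1,$1,$4 is in EX, the second add is in
# EX/MEM and the first add is in MEM/WB; all three write $1.
double_hazard = forward_select(src=1,
                               ex_mem_write=True, ex_mem_rd=1,
                               mem_wb_write=True, mem_wb_rd=1)
```

Testing the EX hazard first is equivalent to the "and not (EX hazard)" clause in the revised MEM condition: the MEM/WB value is chosen only when the EX/MEM value does not match.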

Datapath with Forwarding

Load-Use Data Hazard
Need to stall for one cycle

Load-Use Hazard Detection
Check when the using instruction is decoded in the ID stage
ALU operand register numbers in the ID stage are given by IF/ID.RegisterRs, IF/ID.RegisterRt
Load-use hazard when
    ID/EX.MemRead and
      ((ID/EX.RegisterRt = IF/ID.RegisterRs) or
       (ID/EX.RegisterRt = IF/ID.RegisterRt))
If detected, stall and insert a bubble
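The load-use condition is a single Boolean expression, so it translates almost directly into code. This is a small Python sketch with hypothetical argument names that mirror the pipeline-register fields on the slide.

```python
def load_use_hazard(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID needs the value a load in EX has not read yet."""
    return bool(id_ex_mem_read and
                (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt))


# Illustrative pair: a load into $2 is in EX and an instruction that reads
# $2 and $5 is in ID, so the pipeline must stall for one cycle.
must_stall = load_use_hazard(id_ex_mem_read=True, id_ex_rt=2,
                             if_id_rs=2, if_id_rt=5)
```

The check uses ID/EX.RegisterRt because a load's destination register is its Rt field.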

How to Stall the Pipeline
Force control values in the ID/EX register to 0
    EX, MEM and WB do nop (no-operation)
Prevent update of the PC and the IF/ID register
    the using instruction is decoded again
    the following instruction is fetched again
    the 1-cycle stall allows MEM to read data for lw
        can subsequently forward to the EX stage

Stall/Bubble in the Pipeline
Stall inserted here

Stall/Bubble in the Pipeline
Or, more accurately

Datapath with Hazard Detection

Stalls and Performance (The BIG Picture)
Stalls reduce performance
    but are required to get correct results
Compiler can arrange code to avoid hazards and stalls
    requires knowledge of the pipeline structure
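To make the compiler's role concrete, the sketch below counts load-use stalls for a short sequence before and after reordering. It assumes a 5-stage pipeline with full forwarding, where only a load immediately followed by a consumer costs one stall. The instruction tuples, register names, and the sequence itself are illustrative and do not come from these slides.

```python
def count_load_use_stalls(program):
    """Count 1-cycle stalls: a lw whose destination is read by the next instruction."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    return stalls


# Each instruction is (opcode, destination register or None, source registers).
original = [
    ("lw",  "$t1", ["$t0"]),
    ("lw",  "$t2", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),   # uses $t2 right after its load
    ("sw",  None,  ["$t3", "$t0"]),
    ("lw",  "$t4", ["$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),   # uses $t4 right after its load
    ("sw",  None,  ["$t5", "$t0"]),
]

# Same work, with the third load hoisted so that no consumer directly
# follows the load that produces its operand.
reordered = [
    ("lw",  "$t1", ["$t0"]),
    ("lw",  "$t2", ["$t0"]),
    ("lw",  "$t4", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),
    ("sw",  None,  ["$t3", "$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw",  None,  ["$t5", "$t0"]),
]

stalls_before = count_load_use_stalls(original)   # 2
stalls_after = count_load_use_stalls(reordered)   # 0
```

The reordering is only legal because the hoisted load does not depend on the instructions it moves past, which is the kind of pipeline-aware analysis the slide attributes to the compiler.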

Branch Hazards
If the branch outcome is determined in MEM
    the instructions fetched after the branch must be flushed (control values set to 0)
    and the PC redirected

Reducing Branch Delay
Move the hardware that determines the outcome to the ID stage
    move the target address adder (easy)
    add a register comparator (hard)
        needs additional forwarding h/w, as operands might depend on a previous instruction

Example: Branch Taken
    36: sub $10, $4, $8
    40: beq $1, $3, 7
    44: and $12, $2, $5
    48: or  $13, $2, $6
    52: add $14, $4, $2
    56: slt $15, $6, $7
        ...
    72: lw  $4, 50($7)
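The listing jumps from address 40 to address 72 because a MIPS beq offset counts words relative to the instruction after the branch. A short Python check of that arithmetic follows; the function name is hypothetical.

```python
def branch_target(branch_pc, offset_words):
    """MIPS beq target: address of the next instruction plus the word offset times 4."""
    return branch_pc + 4 + offset_words * 4


# beq at address 40 with offset 7: 40 + 4 + 7 * 4 = 72, the address of lw.
target = branch_target(40, 7)
```

A negative offset works the same way and yields a backward branch, as in a loop.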

Example: Branch Taken

Example: Branch Taken

Data Hazards for Branches
If a comparison register is a destination of the 2nd or 3rd preceding ALU instruction
    add $1, $2, $3        IF ID EX MEM WB
    add $4, $5, $6           IF ID EX MEM WB
    ...                         IF ID EX MEM WB
    beq $1, $4, target             IF ID EX MEM WB
Can resolve using forwarding

Data Hazards for Branches
If a comparison register is a destination of the preceding ALU instruction or the 2nd preceding load instruction
    need 1 stall cycle
    lw  $1, addr          IF ID EX MEM WB
    add $4, $5, $6           IF ID EX MEM WB
    beq stalled                 IF ID
    beq $1, $4, target             ID EX MEM WB

Data Hazards for Branches
If a comparison register is a destination of the immediately preceding load instruction
    need 2 stall cycles
    lw  $1, addr          IF ID EX MEM WB
    beq stalled              IF ID
    beq stalled                 ID
    beq $1, $0, target             ID EX MEM WB
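The three branch-hazard cases reduce to one rule once the branch is resolved in ID: the stall count depends on the kind of producer and how many instructions back it sits. The Python sketch below is one way to encode that rule. The function name and the two "ready" constants are this example's own encoding of the slides' cases, not values stated in the lecture.

```python
def branch_stall_cycles(producer, distance):
    """Stall cycles for a branch resolved in ID that reads a recently written register.

    producer: "alu" or "load", the kind of instruction writing the register.
    distance: 1 if it immediately precedes the branch, 2 for the 2nd preceding, etc.
    """
    # With forwarding into ID, an ALU result is usable 2 instructions later and
    # a loaded value 3 instructions later (assumed encoding of the slide cases).
    ready_after = {"alu": 2, "load": 3}[producer]
    return max(0, ready_after - distance)


cases = {
    ("alu", 2): branch_stall_cycles("alu", 2),    # forwarding only
    ("alu", 1): branch_stall_cycles("alu", 1),    # preceding ALU instruction
    ("load", 2): branch_stall_cycles("load", 2),  # 2nd preceding load
    ("load", 1): branch_stall_cycles("load", 1),  # immediately preceding load
}
```

The dictionary evaluates to 0, 1, 1 and 2 stall cycles respectively, matching the three slides.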

Dynamic Branch Prediction
In deeper and superscalar pipelines, the branch penalty is more significant
Use dynamic prediction
    branch prediction buffer (aka branch history table)
    indexed by recent branch instruction addresses
    stores outcome (taken/not taken)
    to execute a branch
        check the table, expect the same outcome
        start fetching from fall-through or target
        if wrong, flush the pipeline and flip the prediction

1-Bit Predictor: Shortcoming
Inner loop branches are mispredicted twice!
    outer: ...
    inner: ...
           beq ..., ..., inner
           beq ..., ..., outer
Mispredict as taken on the last iteration of the inner loop
Then mispredict as not taken on the first iteration of the inner loop next time around
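The double misprediction can be reproduced by replaying the inner-loop branch outcomes through a 1-bit predictor. The Python sketch below assumes the bit starts as "taken" and uses an illustrative loop shape: 4 inner iterations per pass and 3 outer passes. Neither number comes from the slides.

```python
def one_bit_mispredictions(outcomes, initial=True):
    """Replay branch outcomes through a 1-bit predictor and count mispredictions."""
    prediction = initial          # True means "predict taken"
    misses = 0
    for taken in outcomes:
        if taken != prediction:
            misses += 1
            prediction = taken    # 1-bit scheme: always remember the last outcome
    return misses


# Inner-loop branch: taken 3 times, then not taken on loop exit, for 3 outer passes.
inner_branch = ([True] * 3 + [False]) * 3
misses = one_bit_mispredictions(inner_branch)   # 1 on the first pass, 2 on each later pass
```

After the first pass, every pass costs two mispredictions (loop re-entry and loop exit), so the total here is 5.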

2-Bit Predictor
Only change prediction on two successive mispredictions
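The rule "change only on two successive mispredictions" can be encoded directly as a prediction plus a "missed once" flag, which together make up the two bits of state. The Python sketch below is a literal encoding of that sentence and is not necessarily the exact state diagram shown on the slide. The initial state and the loop shape (4 inner iterations, 3 outer passes) are assumptions.

```python
def two_bit_mispredictions(outcomes, initial=True):
    """Count mispredictions when the prediction flips only after two misses in a row."""
    prediction = initial      # True means "predict taken"
    missed_once = False       # set after a single misprediction
    misses = 0
    for taken in outcomes:
        if taken == prediction:
            missed_once = False           # a correct guess restores full confidence
        else:
            misses += 1
            if missed_once:
                prediction = not prediction   # second successive miss: flip
                missed_once = False
            else:
                missed_once = True
    return misses


# Inner-loop branch: taken 3 times, then not taken on loop exit, for 3 outer passes.
inner_branch = ([True] * 3 + [False]) * 3
misses = two_bit_mispredictions(inner_branch)   # one miss per loop exit
```

On this trace the predictor misses only at each loop exit, 3 times in total, because the single not-taken outcome never flips the prediction.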

Calculating the Branch Target
Even with a predictor, we still need to calculate the target address
    1-cycle penalty for a taken branch
Branch target buffer
    cache of target addresses
    indexed by PC when the instruction is fetched
    if hit and the instruction is a branch predicted taken, can fetch the target immediately
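A branch target buffer behaves like a small cache looked up with the fetch PC. The Python sketch below assumes a direct-mapped table with a full-PC tag and 16 entries. The class, method names, and sizing are hypothetical and only illustrate the lookup described above.

```python
class BranchTargetBuffer:
    """Direct-mapped cache of branch targets, indexed by the fetch PC."""

    def __init__(self, entries=16):
        self.entries = entries
        self.table = [None] * entries   # each slot holds (branch_pc, target) or None

    def _index(self, pc):
        return (pc >> 2) % self.entries  # drop the byte offset within a word

    def lookup(self, pc):
        """Return the cached target on a hit, or None on a miss."""
        slot = self.table[self._index(pc)]
        if slot is not None and slot[0] == pc:
            return slot[1]
        return None

    def update(self, pc, target):
        """Record the target of a branch once it has been computed."""
        self.table[self._index(pc)] = (pc, target)


def next_fetch_pc(pc, btb, predicted_taken):
    """Fetch from the cached target on a hit predicted taken, else fall through."""
    target = btb.lookup(pc)
    if target is not None and predicted_taken:
        return target
    return pc + 4


btb = BranchTargetBuffer()
btb.update(40, 72)                        # beq at 40 was seen branching to 72
hit_taken = next_fetch_pc(40, btb, True)
hit_not_taken = next_fetch_pc(40, btb, False)
miss = next_fetch_pc(44, btb, True)
```

Storing the full PC as the tag keeps a different instruction that maps to the same slot from being mistaken for the cached branch.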

Concluding Remarks
ISA influences design of datapath and control
Datapath and control influence design of ISA
Pipelining improves instruction throughput using parallelism
    more instructions completed per second
    latency for each instruction not reduced
Hazards: structural, data, control
Multiple issue and dynamic scheduling (ILP)
    dependencies limit achievable parallelism
    complexity leads to the power wall